Temporal instability can reflect either true psychological change or measurement error. I offer several recommendations to improve stability research and enhance the ability to detect measurement error; these include the use of (a) theoretically meaningful retest intervals, (b) larger sample sizes, and (c) benchmark scales that permit comparative tests of stability. I illustrate this approach using retest data on obsessive–compulsive symptoms, dissociative tendencies, trait affectivity, and the Big Five. These data demonstrate that highly correlated measures of the same target constructs show significantly different levels of stability, even over 2-month retest intervals during which true change should be minimal. These discrepancies are not simply due to broad differences in content but reflect more subtle differences in wording, instructions, and response formats.