Article

Stability versus change, dependability versus error: Issues in the assessment of personality over time


Abstract

Temporal instability can reflect either true psychological change or measurement error. I offer several recommendations to improve stability research and enhance the ability to detect error; these include the use of (a) theoretically meaningful retest intervals, (b) larger sample sizes, and (c) benchmark scales that permit comparative tests of stability. I illustrate this approach using retest data of obsessive–compulsive symptoms, dissociative tendencies, trait affectivity, and the Big Five. These data demonstrate that highly correlated measures of the same target constructs show significantly different levels of stability, even over 2-month retest intervals during which true change should be minimal. These discrepancies are not simply due to broad differences in content, but reflect more subtle differences in wording, instructions, and response formats.

... Multiple types of reliability exist, each estimated using a different indicator. Two of the most pertinent with regard to the DES are Cronbach's α (i.e., consistency across items or internal consistency: Cortina, 1993; Cronbach, 1951) and dependability (i.e., short-term test-retest reliability: Cattell et al., 1970; Chmielewski & Watson, 2009; Gnambs, 2014; McCrae et al., 2011; Watson, 2004). ...
... In addition, because transient error can create the appearance of true change when none occurred, transient error reduces the validity of measures assessing trait-like constructs. Because α only assesses consistency across items during a single assessment, it is unable to detect the extent to which a measure is susceptible to these transient errors (Chmielewski et al., 2016; Schmidt et al., 2003; Watson, 2004). ...
... To determine the extent to which a measure is susceptible to transient error, short-term test-retest studies (i.e., dependability analyses) are necessary (Chmielewski et al., 2016; Chmielewski & Watson, 2009; Gnambs, 2014; McCrae et al., 2011; Watson, 2004). In contrast to typical retest studies, which often examine consistency over intervals during which both transient error and true changes in the construct are likely, dependability studies employ short retest intervals over which true trait-level change in a construct (e.g., trait dissociation) is unlikely to occur. ...
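The contrast drawn in these excerpts between Cronbach's α and dependability can be illustrated with a small simulation. This is a minimal sketch, not the analysis used in any of the cited studies: the data-generating model (a stable trait, an occasion-specific transient state shared by all items, and independent item noise), the variance values, the sample size, and all function names are assumptions chosen for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sample_variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    item_vars = [sample_variance([row[j] for row in rows]) for j in range(k)]
    total_var = sample_variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def simulate_occasion(traits, rng, k=10, transient_sd=0.7, item_sd=1.0):
    """One assessment occasion (all parameter values are hypothetical)."""
    rows = []
    for trait in traits:
        # Transient error: one draw per person per occasion, shared by every item.
        state = rng.gauss(0, transient_sd)
        rows.append([trait + state + rng.gauss(0, item_sd) for _ in range(k)])
    return rows


rng = random.Random(0)
traits = [rng.gauss(0, 1) for _ in range(500)]  # the trait does not change
time1 = simulate_occasion(traits, rng)
time2 = simulate_occasion(traits, rng)

alpha_t1 = cronbach_alpha(time1)
retest_r = pearson([sum(r) for r in time1], [sum(r) for r in time2])
print(f"alpha at time 1: {alpha_t1:.2f}")
print(f"retest r:        {retest_r:.2f}")
```

Because the transient state is common to all items within an occasion, it inflates inter-item consistency and therefore α, yet it lowers the correlation between occasions. Under these assumed variances, α should come out well above the retest correlation even though the simulated trait is perfectly stable.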
Article
It is imperative that psychological measures demonstrate strong psychometric properties in order to increase study replicability, develop an accurate understanding of constructs, identify potential mechanisms, and accurately determine treatment efficacy. The Dissociative Experiences Scale (DES) is the most widely used measure of dissociation. However, concerns have been raised about the DES’s response format and items. In addition, the measure has demonstrated poorer dependability (e.g., short-term test–retest reliability) than other dissociation measures. The current research examines these issues across two studies. The goal of Study 1 (N = 163 undergraduates) was to empirically test concerns regarding the DES’s response format and items. Participants’ responses to the DES using the standard response format did not align with their subsequent estimates of how frequently those items occurred. Moreover, participants often did not interpret the DES items in the way intended by the measure. In Study 2 (N = 447 undergraduates, 2-week retest interval), we attempted to improve the dependability of the DES by changing the standard DES’s response format without substantially altering its items. Changing the response format did not appear to improve the dependability of the DES, suggesting other features of the measure are responsible for its poor dependability. In conclusion, the present studies provide empirical evidence for concerns about the DES’s psychometric properties and indicate the DES demonstrates low reliability that appears to result, in part, from item wording. Keywords: Dissociative Experiences Scale, psychometrics, reliability, dependability, retest Supplemental materials: https://doi.org/10.1037/cns0000334.supp
... However, not all measurement error is random (Anastasi & Urbina, 1997; Chmielewski & Watson, 2009; John & Soto, 2007; Schmidt et al., 2003; Taylor, 1999; Watson, 2004). In fact, there are two main classes of measurement error: random errors (as described by Byers-Heinlein) and systematic errors (which are correlated and do not cancel each other out; Chmielewski & Watson, 2009; Krippendorf, 1970; Taylor, 1999; Schmidt et al., 2003; Watson, 2004). The fact that there are different types and sources of error necessitates different indicators of reliability to tap into each (see Table 1). ...
... For this same reason, aggregating across trials on that same day will not cancel out error variance caused by fussiness. In this way, examining consistency across trials on the same occasion can only detect random error and will not detect systematic errors such as transient error (Cattell et al., 1970; Chmielewski & Watson, 2009; Green, 2003; Gnambs, 2014; Schmidt et al., 2003; Watson, 2004). ...
Article
The field of development needs more reliable infant work – improving our measures and methods is at the heart of accurately understanding why children do what they do! However, as scientists, we cannot stop with reliability but must also include measures of validity in our studies. In this commentary, we clarify and expand upon discussions of reliability and measurement error. We also argue for the importance of assessing the validity of our measures and tasks. Indeed, careful considerations of both reliability and validity are necessary for improving infant research. Improving the reliability of infant work is critical; however, reliability by itself is not sufficient for advancing science. Capturing multiple types of reliability and indices of validity will lead to the best measures and tools for researchers. A careful consideration of both reliability and validity is necessary for improving infant research.
... Additionally, test-retest stability was considered to be the most meaningful and important reliability in personality research (Watson, 2004). Up to now, only three published studies have provided evidence of the temporal consistency for different forms of the PID-5. ...
... However, the interpretation of test-retest reliability should take the sample size and retest interval into account. The sample sizes in the above three studies were too small to determine the true level of stability (Watson, 2004). Within a short retest interval, test-retest reliability could reflect the stability of the measure and observed changes could be attributed to measurement error. ...
... Within a short retest interval, test-retest reliability could reflect the stability of the measure and observed changes could be attributed to measurement error. However, there is no clear distinction between short-term and long-term stability of the personality pathological traits (Watson, 2004). To address this gap, more test-retest reliability evidence of the PID-5 in a relatively large sample within an adequate interval is needed. ...
Article
To evaluate the factor structure, reliability, and validity of the Personality Inventory for DSM-5 (PID-5) in Chinese nonclinical adolescents, a total of 1,442 Chinese middle school youths (Mage = 14.85, girls = 52.4%) were recruited in the present study. All the participants completed the full-length 220-item PID-5. Some participants (n = 1,003) were administered adolescents’ social adjustment as a criterion measure at the same time and 236 participants took part in longitudinal assessment of the PID-5 and adolescents’ social adjustment 6 months later. First, exploratory structural equation modeling analyses supported a six-factor structure of the PID-5 in our present sample. Second, Negative Affectivity, Detachment, Antagonistic, and Disinhibition domains had positive correlations with negative social adjustment, and negative correlations with positive social adjustment concurrently and longitudinally, with the exception of Constraint and Psychoticism. Third, Cronbach’s alpha for the PID-5 traits ranged from .57 to .91 in the full sample. The 6-month test–retest reliability by indexes of interclass correlation coefficient showed poor to good stability. As a whole, our findings provided preliminary evidence of the PID-5 as a reliable and valid measure of adolescents’ maladaptive personality traits in mainland China.
... The inclusion of personality, affect, and other psychopathology measures also provided "benchmarks" to aid in the interpretation of the test-retest correlations from our stability analyses (see Watson, 2004). These "benchmarks" permit one to conduct comparative tests of stability across constructs hypothesized to be more or less stable than the constructs of interest. ...
... Several aspects of the Table 5 data are interesting in their own right. For example, consistent with previous research indicating that personality measures show comparatively stronger temporal stability than measures of trait affectivity (Chmielewski & Watson, 2009; Watson, 2004), our data demonstrated that the test-retest coefficient for BFI Extraversion (.90) was significantly stronger than the coefficient for PANAS Positive Affect (.73) and that BFI Neuroticism (.85) produced a significantly higher test-retest correlation than PANAS Negative Affect (.73). More generally, it is noteworthy that all of the BFI scales demonstrated significantly stronger test-retest coefficients than any of the other scales in Table 5, with BFI Extraversion yielding a significantly stronger stability coefficient than any of the other BFI scales. ...
... This may encourage episodic processing, which is more biased than semantic memory (Robinson & Clore, 2002). As a result, we encourage the use of administration formats (e.g., general behavior ratings) that ask participants to rate their generalized self-view and self-concepts (in other words, formats that access semantic rather than episodic memory) whenever possible (Watson, 2004). ...
Preprint
Fluctuations in mood and activity levels are defining features of bipolar disorder, but the temporal stability of measures used to assess symptoms and traits relevant to bipolar disorder is unclear. This study examined the short-term stability of several widely used, contemporary bipolar disorder measures (e.g., Altman Self-Rating Mania Scale, General Behavior Inventory, Hypomanic Personality Scale, Mood Disorder Questionnaire) over a period of roughly 2 weeks (M Retest Interval = 15.17 days) in an undergraduate sample. The stability correlations varied widely, ranging from .49 to .83. As would be expected, measures that were designed to assess traits related to bipolar disorder tended to show stronger stability than scales purporting to assess more transient symptoms of bipolar disorder. Other analyses revealed that—consistent with previous research—some bipolar disorder scales demonstrated moderate to strong positive relations with neuroticism/negative affect and other psychopathology, whereas others related weakly to such measures but showed more robust positive relations with extraversion/positive affect. Taken together, our findings suggest that it is important to consider administration instructions (e.g., trait vs. symptom ratings), subscale properties, and item format when selecting study measures in bipolar disorder research.
... Getting along, in which agreeableness was positively linked to social goals, and getting ahead, in which extraversion and conscientiousness were positively related to economic goals. Getting along and getting ahead as master motives represent two pivotal sources of human striving, such as described in the (neo)socioanalytic model (Hogan & Roberts, 2000, 2004): Getting along, on the one hand, maps onto a desire for social acceptance and approval (Hogan & Roberts, 2000, 2004), refers to the ability to relinquish individuality through participating in larger social networks, and manifests in striving for community, social relationships, intimacy, or altruism (Abele & Wojciszke, 2014; Digman, 1997; Rank, 1945; Sheldon & Cooper, 2008). Getting ahead, on the other hand, reflects a desire for status, power, and control of resources (Hogan & Roberts, 2000, 2004), refers to the capacity to deal with the environment as a separate individual unit, and manifests in goal pursuit as well as in striving for power, fame, or self-expansion (Rank, 1945; Sheldon & Cooper, 2008). As such, everyday social living involves both getting along and getting ahead (Hogan, 1982), but people may differ with regard to their inclination to pursue one over the other. ...
... Getting along and getting ahead as master motives represent two pivotal sources of human striving, such as described in the (neo)socioanalytic model (Hogan & Roberts, 2000, 2004): Getting along, on the one hand, maps onto a desire for social acceptance and approval (Hogan & Roberts, 2000, 2004), refers to the ability to relinquish individuality through participating in larger social networks, and manifests in striving for community, social relationships, intimacy, or altruism (Abele & Wojciszke, 2014; Digman, 1997; Rank, 1945; Sheldon & Cooper, 2008). Getting ahead, on the other hand, reflects a desire for status, power, and control of resources (Hogan & Roberts, 2000, 2004), refers to the capacity to deal with the environment as a separate individual unit, and manifests in goal pursuit as well as in striving for power, fame, or self-expansion (Rank, 1945; Sheldon & Cooper, 2008). As such, everyday social living involves both getting along and getting ahead (Hogan, 1982), but people may differ with regard to their inclination to pursue one over the other. We argue that this inclination would be embedded across the levels of personality. ...
Article
According to the integrative framework for studying people, personality manifests and develops along three separate, but related, levels: the actor (e.g., traits), agent (e.g., goals), and author (i.e., narratives). Although these levels are thought to be conceptually interrelated, few studies have empirically examined such interrelations. To address this gap, the present study tested how traits, goals, and narratives are longitudinally related to each other and whether master motives (getting along and getting ahead) serve as helpful tools to structure these interrelations. Applying a developmental approach, we further explored these interrelations against the background of age-related effects. A sample of 141 participants (14–68 years, M = 35.40 years) completed self-reports on traits and goals at the beginning and end of a 2-year study. In between these measurements, participants took part in a life story interview that assessed narratives. We applied multilevel analyses and found that traits, goals, and narratives were meaningfully related to each other. Interactions with age occurred in less than 20% of the cases, emerged among the majority of variables (except for agreeableness and openness), were most pronounced for narratives, and were mainly found among young and middle-aged participants. The findings are discussed in view of master motives.
... for the APBS total score. Different opinions exist in the psychometric literature on the guidelines for interpreting the adequacy of test-retest coefficients, given that estimates of test-retest reliability are affected by the period between the test and retest (Charter, 2003; Revelle & Condon, 2018; Watson, 2004). In this RG meta-analysis, the studies used time intervals ranging from 52 to 208 weeks (with the exception of one study that had a time interval of 1 day). ...
... In this RG meta-analysis, the studies used time intervals ranging from 52 to 208 weeks (with the exception of one study that had a time interval of 1 day). A considerable test-retest correlation over a long period indicates temporal stability (Revelle & Condon, 2018; Watson, 2004). The prediction interval for temporal stability reliability estimates of the APBS indicates that future test-retest correlations might range between .259 and .884, ...
Article
The Adult Prosocialness Behavior Scale (APBS) is most often used to measure adult prosociality. We conducted a reliability generalization meta-analysis to compute the average APBS reliability and examine the heterogeneity among reliability estimations and the influence of moderator variables. An exhaustive search identified 74 articles that applied the APBS with 16 items assessed on a 5-point Likert-type scale. Of these, 58 had reliability coefficients with the current data, and 76 reliability estimates were provided. Random- and mixed-effects models were used. The average reliability coefficient was .903 for Cronbach’s alpha, .896 for McDonald’s omega, and .674 for test–retest. Moderator analyses were used to create a predictive model in which the target population and study language accounted for 48.7% of the total variability among Cronbach’s alpha coefficients. Although the APBS has shown satisfactory internal consistency, it can vary as a function of several factors.
... Finally, we demonstrate how Prolific can be used to obtain data needed for other psychometrically oriented analyses involving examination of the test-retest stability of measure scores. Use of data from the 377 Prolific participants who completed both study assessments for computing test-retest correlations exceeds even stringent sample size recommendations (e.g., 300 participants; Watson, 2004), which is important to note given that researchers often use very small sample sizes for test-retest analyses if they are conducted at all (e.g., samples of 50-75 participants; Watson, 2004). Test-retest stability coefficients reported as Pearson correlations conformed with theoretical expectations when examining stability over the 2-week study period, as shown in Table 4. ...
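The sample-size point in this excerpt can be made concrete by looking at how the precision of a retest correlation depends on N. The sketch below uses the standard Fisher z approximation for a confidence interval around a Pearson correlation. The retest value of .80 is a hypothetical input, the function name is an assumption, and the sample sizes are the ones mentioned in the excerpt (50, 75, 300); none of this reproduces an analysis from the cited paper.

```python
import math


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for a Pearson r via Fisher's z."""
    z = math.atanh(r)               # r-to-z transform
    se = 1.0 / math.sqrt(n - 3)     # standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


hypothetical_r = 0.80  # illustrative retest correlation, not from the source
for n in (50, 75, 300):
    low, high = fisher_ci(hypothetical_r, n)
    print(f"N = {n:3d}: 95% CI [{low:.2f}, {high:.2f}], width {high - low:.2f}")
```

The interval narrows markedly as N grows, which is one way to see why small retest samples make it hard to tell whether two measures really differ in stability.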
Article
The Prolific platform offers a potentially useful and efficient crowdsourcing option for repeated assessment substance use research, including for psychometric research requiring large samples. We present both (a) a series of practical recommendations for using Prolific and (b) data from multiple samples demonstrating Prolific's potential for efficiently collected repeated measures data. First, we present data from a 5-day daily diary protocol. We recruited a large sample (N = 321 at Day 1) screened for a history of self-identified mental health issues and weekly alcohol use. Participant adherence was good (82%) even without in-person contact. Alcohol use patterns conformed to theoretical expectations: Participants were more likely to drink on Fridays and Saturdays than other days, men drank more than women, and higher Alcohol Use Disorders Identification Test (AUDIT; Saunders et al., 1993) scores were associated with an increased likelihood of use and more overall drinking on a given day. Second, we present data from 429 Prolific participants screened for a history of mental health issues who completed assessments 2 weeks apart with strong retention (N = 377; 88%). We compare these data with the data from undergraduates (N = 529) to demonstrate Prolific's utility for conducting psychometrically oriented substance use research. Internal consistency estimates for measures from the Prolific data matched or exceeded those from the undergraduate data. Furthermore, measure scores showed strong temporal stability, and factor structures (e.g., AUDIT item-level structures) conformed to theoretical expectations. Collectively, these findings indicate that Prolific can be used successfully for repeated measures data collection. (PsycInfo Database Record (c) 2022 APA, all rights reserved).
... In addition, personal self-efficacy also affects the expectation of results [34,35]. It has been well researched that positive emotions act as a restraining mechanism in the relationship between behavior and expected results [36,37]. People who think they have the ability to complete a task will expect a smooth and perfect result when they try to achieve a goal [35]. ...
... Positive emotions can strengthen an individual's ability to solve problems and produce positive actions [33], even in uncertain and risky medical situations [39]. On the contrary, negative emotions suppress personal thoughts, narrow attention and cognition, and instill the belief that certain situations are difficult to control, making people afraid to act [36,37]. In this study, there was no significant difference regarding the factor "negative emotion" between participants who had or had not received PGY training. ...
Article
Background: Taiwan implemented the post-graduate year (PGY) training to reform the medical education system to provide holistic medical care after severe acute respiratory syndrome in 2003. In late 2019, COVID-19 quickly spread across the globe and became a pandemic crisis. This study aimed to investigate whether the establishment of the PGY training had positive effects on the self-efficacy and emotional traits of medical workers. Methods: One hundred and ten physicians, including PGY, residents, and visiting staff, were investigated using the General Self-Efficacy Scale (GSES) and Emotional Trait and State Scale (ETSS), and their feedback and suggestions were collected. An exploratory factor analysis was done to reduce the factor dimensions using the varimax rotation method, which was reduced to four factors: “the ability to cope with ease”, “proactive ability”, “negative emotion”, and “positive emotion”. A comparison with and without PGY training when facing the COVID-19 pandemic was conducted. Results: Those who had received PGY training (n = 77) were younger, had a lower grade of seniority, and had less practical experience than those who had not received PGY (n = 33). Those who had received PGY training had significantly higher scores for the factors “ability to cope with ease”, “proactive ability”, and “positive emotion” than those who had not received PGY training. Conclusion: The study revealed that PGY training may have had positive effects on the personal self-efficacy and emotional traits of physicians coping with the COVID-19 pandemic.
... Further, it must be remembered that the test-retest interval was six weeks. Multiple factors could influence test-retest performance especially longer duration test-retest intervals and changes in an individual's transient state (Polit, 2014; Watson, 2004). Specifically, it is reasonable to suggest that self-perceived balance confidence is a psychological construct that (in general) may be more of a stable trait on the 'stable trait-transient state' continuum (Matthews et al., 2009). ...
... To this end, any researcher that plans to use the ABC or ABC-6VI inventories in youth with VI should provide as many within-sample reliability and/or validity metrics as possible (even if they are not completing a 'psychometric' study) to (a) provide evidence that analyzed scores have integrity and (b) so that within-sample psychometrics can be compared across different samples/studies (Appelbaum et al., 2018). Further, test-retest inquiries which utilize an interval shorter than six weeks (two weeks may be an ideal length; Watson, 2004), minimal detectable difference/change investigations, predictive validity studies whereby population-specific cutoffs for function, fall risk, or related outcomes are proposed (as has been done in older adults; Lajoie & Gallagher, 2004; Myers et al., 1998), or investigations which examine ABC scale modifications (e.g., four- or five-point scale, word adjustments; Filiatrault et al., 2007; Franchignoni et al., 2014) in youth with VI may be most valuable moving forward. Also, concerning the test-retest analyses, it is important to note that none of the participants (n = 8) were multimorbid. ...
Article
Falls are a significant medical and economical concern worldwide. Younger individuals with visual impairment (VI) may be more susceptible to falling and fall-related injuries when compared to peers without a VI. Self-perceived balance confidence is a psychological construct that may predict and/or mediate fall- or other health-related outcomes in youth with VI. However, extensive psychometric vetting of falls-related self-efficacy self-report inventories (such as the Activity-specific Balance Confidence Scale [ABC]) has not occurred in youth with VI. In line with classical test theory, the purposes of this study were to examine the immediate measurement properties of ABC scores in youth with VI and to derive and analyze a short version of the ABC in youth with VI (N=101). Total and item-level ABC (and the newly developed ABC-6VI) scores presented with strong-to-acceptable forms/levels of reliability and validity. ABC-6VI scores appear to have certain psychometric limitations (i.e., increased variability; decreased stability).
... Using big data to study construct validity could help to answer crucial questions in the field of individual differences. For example, when personality change is detected over a life span, is it how people express their personality that changes (i.e. it is an artefact of the measurement tool), or is it the internal structural biological mechanisms that change (Watson, 2004)? A common question on a personality questionnaire to measure extraversion using a Likert scale is "rate how strongly you agree with the statement 'I am the life and soul of the party' on a seven point scale from 'strongly disagree' to 'strongly agree'". ...
... Thus, rather than having one measure of personality, having different behavioural measures of personality which are age and life stage specific, could reveal interesting insights about temporal stability in personality. If personality measures are better calibrated between age and life stages, personality may be more stable over time than current research suggests (Watson, 2004; Roberts et al., 2006; Specht et al., 2011). ...
Article
This thesis investigates what big data can add to the psychological study of human behaviour; and how Psychological theory can inform developments in machine learning models predicting human behaviour. It works through the difficulties that arise when the fields of machine learning and psychology meet. While machine learning models deal well with big datasets, they are designed for prediction, neglecting psychologists' desire to, not just predict, but understand behaviour. Psychology does well at using theory to specify models and explain the variance within a sample, yet can fail to consider how transferable the findings are to new samples. This research harnesses over a million loyalty card transaction records from a high-street health and beauty retailer linked to 12,968 questionnaire responses measuring demographics, shopping motivations, and individual differences. Equipped with real world behavioural records, and information on potential psychological and demographic drivers of behaviour, this thesis explores the ways in which psychological research can be undergone using big data to better understand three main areas: well-being, environmental behaviours, and anxiety symptoms. This thesis has the goal of marrying the strengths of traditional psychological methodology (utilising theoretical knowledge, quantifying uncertainty, and building interpretable models) with the exciting possibilities afforded by big data, all whilst ensuring that the models are generalisable and do not overfit. The following chapters discuss and evaluate novel research in this space, as well as the difficulties encountered, and compromises made, in undertaking `Big Data Psychology’.
... However, these attempts to set standards were criticized as suffering from poor theoretical basis (Charter & Feldt, 2001). Moreover, reliability of a given measurement is influenced by various factors such as (a) population characteristics, with greater heterogeneity resulting in higher reliability (Miller & Ulrich, 2013; Revelle & Condon, 2019); (b) the measured trait's characteristics, in which more stable constructs such as cognitive abilities are more reliable than less stable constructs, such as self-opinions (Conley, 1984); (c) time intervals between the two administrations, in which larger intervals reduce reliability (Revelle & Condon, 2019; Watson, 2004); and (d) behavior sample size, regarding, for example, the number of trials in a given task (Brysbaert, 2019). This multiplicity of sources that affect reliability questions the validity of dichotomous divisions such as "good" versus "bad," thus leaving the question above somewhat unanswered. ...
... However, merging the samples comes at the cost of mixing studies with quite different intersession intervals (a few days in Study 2 and 18 months in Study 3). In this regard, one may argue that the ICC in Study 2 represent dependability between the two sessions, rather than temporal stability of the measured construct (Watson, 2004). ...
Article
Cognitive tasks borrowed from experimental psychology are often used to assess individual differences. A cardinal issue of this transition from experimental to correlational designs is reduced retest reliability of some well-established cognitive effects as well as speed–accuracy trade-off. The present study aimed to address these issues by examining the retest reliability of various methods for speed–accuracy integration and by comparing between two types of task modeling: difference scores and residual scores. Results from three studies on executive functions show that (a) integrated speed–accuracy scoring is generally more reliable as compared with nonintegrated methods: mean response time and accuracy; and (b) task modeling, especially residual scores, reduced reliability. We thus recommend integrating speed and accuracy, at least for measuring executive functions.
... Test-retest reliability and stability both refer to the consistency of a measure's scores over time. Test-retest reliability measures consistency of scores over a few days or weeks, while stability measures consistency of scores over a few months or longer (Watson, 2004). Both metrics are assessed with intraclass correlation coefficients (ICCs) or Pearson's r. ...
... Both metrics are assessed with intraclass correlation coefficients (ICCs) or Pearson's r. In general, higher values indicate higher test-retest reliability and stability, and it is expected that test-retest values are higher than stability values (Watson, 2004; Youngstrom et al., 2019). In line with past guidelines, we used the following criteria to rate measures' test-retest reliability and stability: ICCs > .74 and Pearson's r > .70 are considered Excellent; ICCs = .59 ...
Article
Evidence-based assessment serves several critical functions in clinical child psychological science, including being a foundation for evidence-based treatment delivery. In this Evidence Base Update, we provide an evaluative review of the most widely used youth self-report measures assessing anxiety and its disorders. Guided by a set of evaluative criteria (De Los Reyes & Langer, 2018), we rate the measures as Excellent, Good, or Adequate across their psychometric properties (e.g., construct validity). For the eight measures evaluated, most ratings assigned were Good followed by Excellent, and the minority of ratings were Adequate. We view these results overall as positive and encouraging, as they show that these youth anxiety self-report measures can be used with relatively high confidence to accomplish key assessment functions. Recommendations and future directions for further advancements to the evidence base are discussed.
... By contrast, the trait describes an individual's relatively enduring disposition to be curious (Kashdan & Steger, 2007; Silvia, 2008). The reported temporal stability of trait scores indexed by test-retest reliability (Boyle, 1979; Dai et al., 2021; Kashdan et al., 2018) justifies curiosity as a stable trait (Watson, 2004). The trait approach is suitable for studying curiosity as a catalyst for organizing collective actions as it takes time for collective actions to unfold and develop. ...
Article
Full-text available
Popular press and theoretical conjecture imply that curiosity is not just an individual motivation but also an enabler of collective actions. This study seeks to explicate curiosity as a catalyst for collective actions by examining team supervisors' trait curiosity. We test the idea that trait curiosity predisposes team supervisors to manipulate team-level task structures, which primes certain forms of team regulatory focus and eventually affects team innovation. Two studies using the interest/deprivation (I/D) taxonomy of curiosity revealed that, by predisposing supervisors to create more learning demand, I-type curiosity primes team promotion focus, which facilitates both radical and incremental team innovation. By predisposing supervisors to create more problem-solving demand, D-type curiosity arouses team prevention focus, which facilitates team incremental innovation but hinders radical innovation. The effect of supervisor curiosity is evident only when supervisors have high task authority. This study uncovered a powerful property of curiosity, demonstrating its promising contributions to organizational life.
... In the fifth and last step, we revisited the Prolific sample recruited in STEP 3 to readminister the N-BFI-20, assessed its test-retest reliability and probed its predictive validity (Clifton, 2020). In keeping with previous research (Donnellan et al., 2006;Gosling et al., 2003) and recommended practices (Watson, 2004) we made sure that for all participants a minimum interval of three weeks lay between their initial completion of the N-BFI and the readministration (Q 1 = 24 days, Q 3 = 24 days). Of the original 501 participants (see STEP 3), 449 completed the follow-up survey (M age = 31.46, ...
Article
Full-text available
Measurement is at the heart of scientific research. As many, perhaps most, psychological constructs cannot be directly observed, there is a steady demand for reliable self-report scales to assess latent constructs. However, scale development is a tedious process that requires researchers to produce good items in large quantities. In this tutorial, we introduce, explain, and apply the Psychometric Item Generator (PIG), an open-source, free-to-use, self-sufficient natural language processing algorithm that produces large-scale, human-like, customized text output within a few mouse clicks. The PIG is based on the GPT-2, a powerful generative language model, and runs on Google Colaboratory, an interactive virtual notebook environment that executes code on state-of-the-art virtual machines at no cost. Across two demonstrations and a preregistered five-pronged empirical validation with two Canadian samples (Sample 1: N = 501; Sample 2: N = 773), we show that the PIG is equally well-suited to generate large pools of face-valid items for novel constructs (i.e., wanderlust) and create parsimonious short scales of existing constructs (i.e., Big Five personality traits) that yield strong performances when tested in the wild and benchmarked against current gold standards for assessment. The PIG does not require any prior coding skills or access to computational resources and can easily be tailored to any desired context by simply switching out short linguistic prompts in a single line of code. In short, we present an effective, novel machine learning solution to an old psychological challenge. As such, the PIG will not require you to learn a new language, but will instead speak yours. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
... Hence, with health measures intervals of 1-2 weeks are typically used (Polit, 2014). This is often constrained by practicality and hence, test-retest within validation papers often represents intervals of convenience (Watson, 2004). ...
Article
Full-text available
The Revised Paranormal Belief Scale (RPBS) is the prevailing measure of supernatural credence. However, there exists only limited evidence to support the temporal stability and predictive validity of the instrument over time. Acknowledging this, the present study assessed the test–retest reliability of the RPBS using a large, heterogeneous sample across multiple trials. In addition, predictive validity was tested using a longitudinal statistical model, which focused on allied health outcomes (Perceived Stress and Somatic Complaints). A sample of 1,665 (M age = 54.40, 853 females, 804 males, five non-binary and three not disclosing of gender) completed study measures at three time points separated by 2-month intervals. Prior to assessing temporal stability, assessment of structural validity and longitudinal invariance occurred. Test–retest reliability of the RPBS was in the moderate to high range across time intervals, and good internal consistency was observed. Furthermore, satisfactory stability coefficients existed for RPBS subfactors. Data-model fit for the predictive model was acceptable. Belief in the paranormal explained low variance over time in Perceived Stress and Somatic Complaints (between 2.4 and 4.2%). Findings supported the stability and reliability of the RPBS. In addition, they aligned with the notion that paranormal belief in the absence of high scores on cognitive-perceptual factors (e.g. transliminality and schizotypy), has a benign influence on perceived health.
... In addition, the recording of patient groups would be desirable. Although in both studies a total of more than 180 subjects were tested, the sample size is still below Watson's (2004) recommendation of at least 300 participants. In order to better quantify learning effects, several retests with different time intervals should be conducted. ...
Article
Full-text available
Objectives: The Tower of London – Freiburg version (TOL-F) was developed in three parallel-test versions (A, B, and C) that only differ in their physical appearance by interchanged ball colors, but not in their cognitive demands. We addressed the question whether the test–retest reliability of an identical problem set differs from the parallel test–retest reliability of a structurally identical problem set with a marginally different physical appearance. Methods: Reliabilities were assessed in two samples of young adults over a 1-week interval: In the parallel test–retest sample (n = 93; 49 female), half of the participants completed version A at the first session and version B at the second session, while the other half started with version B in the first session and continued with A in the second session. In the identical test–retest sample (n = 86; 48 female), half of the participants performed on version A in both the first and the second session, while the other half went through the same procedure with version B. Results: For overall planning accuracy, intraclass correlation coefficients for absolute agreement were r = .501 for the parallel test–retest and r = .605 for the identical test–retest sample, with Pearson correlations of r = .559 and r = .708 respectively. Greatest lower bound estimates of reliability were adequate to high in the two samples (ranging between .765 and .854) confirming previous studies. Conclusions: Although the TOL-F revealed only moderate intraclass correlations for absolute agreement, it showed some of the highest psychometric indices compared to repeated assessments with other TOL tests.
... It is worth noting that this idea represents an important change in how reliability is treated within the literature. Traditionally, researchers have argued that the reliability estimates that should be used for reliability adjustments are those in which the expected levels of test scores for respondents remain nearly stationary, that is, where the level of systematic change in the factors affecting respondent scores is negligible (Cattell & Tsujioka, 1964; Chmielewski & Watson, 2009; Gnambs, 2014; Watson, 2004). However, we argue that for the purpose of evaluating test similarity, the goal should be instead to equate the measurement intervals used in the numerator and denominator of Spearman's equation, so that the level of systematic change in the factors affecting respondent scores is matched. ...
Article
Determining whether different items provide the same information or mean the same thing within a population is a central concern when determining whether different scales or constructs are overlapping or redundant. In the present study, we suggest that retest-adjusted correlations provide a valuable means of adjusting for item-level unreliability. More exactly, we suggest dividing the estimated correlation between items X and Y measured over measurement interval |d| by the average retest correlations of the items over the same measurement interval. For instance, if we correlate scores from items X and Y measured 1 week apart, their retest-adjusted correlation is estimated by using their 1-week retest correlations. Using data from four inventories, we provide evidence that retest-adjusted correlations are significantly better predictors of whether two items are consensually regarded as "meaning the same thing" by judges than raw-score correlations. The results may provide the first empirical evidence that Spearman's (1904, 1910) suggested reliability adjustments do, in certain (perhaps very constrained!) circumstances, improve upon raw-score correlations as indicators of the informational or semantic equivalence of different tests. (PsycInfo Database Record (c) 2022 APA, all rights reserved).
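The adjustment described in this abstract can be illustrated with a minimal sketch. It assumes that "average retest correlations" means the arithmetic mean of the two items' retest correlations; the abstract does not spell this out, and Spearman's classical correction for attenuation instead divides by the square root of the product of the two reliabilities. The function and parameter names are hypothetical and do not come from the cited paper.

```python
from statistics import mean


def retest_adjusted_correlation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Divide the X-Y correlation by the average of the two retest correlations.

    r_xy: correlation between items X and Y measured |d| apart.
    r_xx, r_yy: retest correlations of X and Y over the same interval |d|.
    Assumption: "average" is read as the arithmetic mean.
    """
    denominator = mean([r_xx, r_yy])
    if denominator <= 0:
        raise ValueError("average retest correlation must be positive")
    return r_xy / denominator
```

For example, a raw correlation of 0.40 between two items whose 1-week retest correlations are both 0.80 would give an adjusted value of 0.40 / 0.80 = 0.50 under this reading.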
... While they approach the range of stabilities reported for personality traits (i.e., around r_tt of 0.60 or 0.70, cf. Roberts & DelVecchio, 2000; Watson, 2004), they do leave room for individual change. The stability of authenticity was significantly lower than that of challenge, with a difference between r_authenticity = 0.51 and r_challenge = 0.67: z = 2.30, p = .011. ...
Article
Increased global competition and rapid technological advancements have dramatically altered organizational structures and the work environment. The Kaleidoscope Career Model (KCM) was developed to explain how individuals enact their careers within today's complex, dynamic workplace. The KCM is particularly relevant for studying career development activities, such as networking behavior, a key career management strategy. The purpose of this longitudinal study is to examine the relationship between the three parameters of the KCM – authenticity, balance, and challenge – and how individuals target their networking behavior. In addition, we examine the relationship between the KCM parameters and career success outcomes, and whether these outcomes are mediated by networking behavior. Alumni from a Midwestern U.S. university were surveyed in 2012 and again in 2019. Overall, the results of this study showed a link between the parameters of the KCM and how individuals target their networking behaviors to help achieve their career goals.
... Studies with longer intervals like two (Borkenau and Ostendorf, 1993) or four (Robins et al., 2001) years found lower reliabilities, and shorter two-week intervals found comparable results (Robins et al., 2001). A 6-week interval was chosen to reduce the impact of item or answer recall (which may occur in a 2-week interval), to give a more realistic view of stability, and to reflect true change with minimized measurement error (Becker, 2000; Schmidt et al., 2003; Watson, 2004). The findings of the current study show a high level of robustness, without biases of different occasions separated by an interval where no rapid personality changes could be expected. ...
Article
Full-text available
Background: In high-level sports, rapid screening and diagnostic instruments are necessary considering the limited access that researchers have to these athletes. In the area of sport psychological diagnostics, the NEO-FFI is a promising tool to gain information about an athlete's personality traits. The current study investigated the NEO-FFI's scientific quality criteria and general application to elite-level soccer. Methods: Personality traits of 378 elite-level soccer athletes were assessed using the NEO-FFI. Analysis focused on internal consistency, factor structure and gender differences. Additionally, a second measurement with a 6-week interval was conducted with a sub-sample of 86 athletes to analyse test-retest reliability. Results: Overall, the results are in line with previous findings outside high-level sports. For the total sample, alpha-levels from 0.68 to 0.84 and intraclass correlation coefficients (ICC) for test-retest measures from 0.86 to 0.91 could be found. Item-level principal component analysis using both oblimin and oblique rotation showed better stability in neuroticism (N) and conscientiousness (C) than in extraversion (E), openness (O), and agreeableness (A). Gender differences could be found in values of internal consistency, ICC and NEO-FFI traits. Conclusion: The results of this study demonstrate good transferability of the NEO-FFI from settings outside high-level sports into this specific niche of sport psychological assessment. However, the same weaknesses of the applied instrument in general populations were also replicated in the sporting population.
... In addition, some of the interviewers in the study by Buer Christensen et al. (2018) were inexperienced students. A notable limitation of the current test-retest examination is the small sample size (n = 30), which reduces the precision of the analyses to an extent that might be problematic for personality pathology measures (Watson, 2004). The width of the CIs of two subdomains was greater than .50, ...
Article
According to the alternative model for personality disorders (AMPD) of the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5), a moderate or greater impairment in personality functioning is the essential criterion for a personality disorder diagnosis. Personality functioning is operationalized in the Level of Personality Functioning Scale via 4 domains (identity, self-direction, empathy, and intimacy) and 2 higher order dimensions (self and interpersonal functioning). The current study examined the reliability (interrater, test-retest), structure, and validity (convergent, discriminant, and incremental) of the Structured Clinical Interview for the AMPD-Module I (SCID-5-AMPD-I). A clinical sample (n = 121) completed the SCID-5-AMPD-I, along with an interview for DSM-5 Section II personality disorders and self-reports for personality pathology (personality functioning, personality organization, personality structure, and pathological personality traits) and other forms of psychopathology (depression, anxiety, somatization, and general disability). Interrater and test-retest reliability was excellent for overall personality functioning, the higher order dimensions, and the domains, except for the empathy domain in the test-retest condition. Factor analyses suggest that personality functioning is an essentially unidimensional construct. Personality functioning demonstrated high convergence with other forms of personality pathology and showed good discriminant validity in relation to depression, anxiety, and somatization but not in relation to the broader construct of general disability. Personality functioning (Criterion A) showed incremental validity over pathological personality traits (Criterion B) in predicting interview-assessed DSM-5 Section II personality disorders but not in predicting self-reported personality and general psychopathology. 
The present study suggests that the SCID-5-AMPD-I is a viable measure for personality functioning. (PsycInfo Database Record (c) 2022 APA, all rights reserved).
... Nevertheless, it is worth noting that other authors [8,29] have not assessed test-retest reliability either; when a construct is characterized precisely by its fluctuating course, as happens with delirium, test-retest reliability may be hazardous to assess [34]. The sample size was smaller than in the original study, since the estimation was carried out according to the recommendations for health-related questionnaires proposed by Terwee et al. [14]. In any case, it allowed diagnostic accuracy to be estimated even in the smaller subgroups, those with and without cognitive vulnerability. Finally, the number and timing of administrations by each nursing professional who applied the scale was not collected in detail, nor was the potential impact of a learning curve considered. ...
Article
Background and objectives: The 4AT scale is a sensitive tool for screening delirium, which can be applied rapidly in clinical settings without any specific training. It has not been translated, adapted, and validated to assess Spanish older adults. The aims of the study are: to translate and adapt to Spanish culture the 4AT scale, to present evidence of the diagnostic accuracy of this version (4AT-ES) when applied in non-specialized hospital wards, and to assess the loss of diagnostic accuracy in presence of risk factors. Methods: A prospective sample was independently assessed on the 4AT-ES and the reference standard. One hundred and twenty-one inpatients (70+ years) for whom a psychiatric assessment was requested were included. Out of them, 50 were diagnosed with delirium. Nurses without specific training applied the 4AT-ES, and experienced psychiatrists cast the reference standard diagnosis (DSM-V criteria). Results: Patients with delirium were older and had more risk factors (more previous delirium episodes, a higher likelihood of prior dementia/cognitive impairment) than controls. The 4AT-ES had excellent validity, sensitivity (96%), and specificity (83.1%). The area under the curve was 0.918; in the subsample with any of those risk factors, its value did not decrease. Conclusion: The 4AT-ES version of the 4AT scale was developed. When applied by nursing staff without specific training, it showed excellent validity, sensitivity, and specificity, even in a subsample with previous risk factors. All indices were comparable to the original version. We recommend its use for efficient delirium screening in hospitalized older patients with suspected delirium.
... to .70) and did not include a two-month interval for further comparison. Thus, while the exact interval that distinguishes between unreliability and true change is unclear (and perhaps impossible to resolve), we suggest, in line with previous research [e.g., 16,20,26], that two weeks strikes a balance between mitigating both the possibility of true change and the likelihood that participants recall and repeat their responses from the first measurement. ...
Article
Full-text available
Despite the widespread use of the HEXACO model as a descriptive taxonomy of personality traits, there remains limited information on the test-retest reliability of its commonly-used inventories. Studies typically report internal consistency estimates, such as alpha or omega, but there are good reasons to believe that these do not accurately assess reliability. We report 13-day test-retest correlations of the 100- and 60-item English HEXACO Personality Inventory-Revised (HEXACO-100 and HEXACO-60) domains, facets, and items. In order to test the validity of test-retest reliability, we then compare these estimates to correlations between self- and informant-reports (i.e., cross-rater agreement), a widely-used validity criterion. Median estimates of test-retest reliability were .88, .81, and .65 ( N = 416) for domains, facets, and items, respectively. Facets’ and items’ test-retest reliabilities were highly correlated with their cross-rater agreement estimates, whereas internal consistencies were not. Overall, the HEXACO Personality Inventory-Revised demonstrates test-retest reliability similar to other contemporary measures. We recommend that short-term retest reliability should be routinely calculated to assess reliability.
... Participants from sample two were invited to respond to the survey once more, sixteen days later (sample three; Berchtold, 2016). This time lapse was chosen over the two weeks suggested by Watson (2004) to enable participants to withdraw their data from the study, if they so wished, and with the aim of preventing any impact from memory effects (Schmidt et al., 2003). Salient life events such as family bereavement can influence affective traits more so than the more stable characteristics of the Big Five (Vaidya et al., 2008), and sixteen days is a close enough timeframe in which the traits being measured are not expected to change, whereby the effect of contextual factors should be insignificant (Chmielewski & Watson, 2009). ...
Article
Full-text available
There has been an absence of consideration regarding measurement invariance across males and females in the widely available Dark Tetrad (DT) scales which measure psychopathy, Machiavellianism, narcissism and everyday sadism. This has resulted in criticisms of the measures, suggesting that the assessed constructs are not wholly relatable between the groups. This article documents the construction and validation of the Dark Side of Humanity Scale (DSHS), which measures dark personalities from an alternative viewpoint, determined by the constructs as they emerged from the male and female data, whilst aligning with theory and attaining invariance between sex. Across four samples (n = 2409), using a diverse range of statistical methods, including exploratory graph analysis, item response theory and confirmatory factor analysis, a divergence from the widely available DT measures emerged, whereby primary psychopathy and Machiavellianism were unified. This corroborated past research which had discussed the two constructs as being parallel. It further supported the DSHS with a shift away from the traditional DT conceptualisation. The resulting scale encompasses four factors which are sex invariant across samples and time. The first factor represents the successful psychopath, factor two addresses the grandiose form of entitlement, factor three taps into everyday sadism whilst the fourth factor pertains to narcissistic entitlement rage. Construct and external validity of the DSHS across two samples (n = 1338), as well as test-retest reliability (n = 413), was achieved. The DSHS provides an alternative approach to investigating the dark side of human nature, whilst also being sex invariant, thus making it highly suitable for use with mixed sex samples.
... and "excellent" reliability r ≥ 0.90. These thresholds were applied to qualify both ICR and TRR; however, we recognize that lower thresholds may be more suitable for qualifying TRR to the extent that the construct being measured is theorized to be more state- than trait-like (see Chmielewski & Watson, 2009; Watson, 2004). Similarly, different thresholds may be necessary when qualifying the reliability (both forms) of difference and residual scores given known lower reliability relative to constituent scores (Clayson et al., 2021; Meyer et al., 2017; Perkins et al., 2017). ...
Article
Addiction researchers are interested in the ability of neural signals, like the P3 component of the ERP, to index individual differences in liability factors like motivational reactivity to alcohol/drug cues. The reliability of these measures directly impacts their ability to index individual differences, yet little attention has been paid to their psychometric properties. The present study fills this gap by examining within-session internal consistency reliability (ICR) and between-session test–retest reliability (TRR) of the P3 amplitude elicited by images of alcoholic beverages (Alcohol Cue P3) and non-alcoholic drinks (NADrink Cue P3) as well as the difference between them, which isolates alcohol cue-specific reactivity in the P3 (ACR-P3). Analyses drew on data from a large sample of alcohol-experienced emerging adults (session 1 N = 211, 55% female, aged 18–20 yr; session 2 N = 98, 66% female, aged 19–21 yr). Evaluated against domain-general thresholds, ICR was excellent (M ± SD; r= 0.902 ± 0.030) and TRR was fair (r = 0.706 ± 0.020) for Alcohol Cue P3 and NADrink Cue P3, whereas for ACR-P3, ICR and TRR were poor (r = 0.370 ± 0.071; r = 0.201 ± 0.042). These findings indicate that individual differences in the P3 elicited by cues for ingested liquid rewards are highly reliable and substantially stable over 8–10 months. Individual differences in alcohol cue-specific P3 reactivity were less reliable and less stable. The conditions under which alcohol/drug cue-specific reactivity in neural signals is adequately reliable and stable remain to be discovered.
... Understanding the facet-level stability of the FFMRF is important for a variety of reasons. Demonstrating strong test-retest reliability, or dependability, is a critical property for trait measures (Watson, 2004). The level of test-retest reliability also caps the strength with which a measure can be associated with others. ...
Article
Full-text available
Abbreviated measures of personality have the promise of providing concise measurements of the broad domains. Nonetheless, few abbreviated instruments assess the lower-order traits within these models. The Five-Factor Model Rating Form (FFMRF) is one brief instrument that assesses 30 lower-order facets. Given a tradeoff of abbreviated measures is reduced measurement reliability and fidelity, it is important to determine how well the one-item indicators on the FFMRF fare as stand-alone measures. Using a sample of 530 young adults selected for externalizing psychopathology, we investigated the test–retest stability of the FFMRF over a 12-month period, as well as the correspondence between four specific facets of the FFMRF and conceptually linked impulsigenic facets from the UPPS. The FFMRF facets had a median test–retest correlation of 0.47 over 12 months. These values suggest reasonable measurement precision but are approximately 0.10 to 0.20 lower than lengthier measures, quantifying a tradeoff for brief forms. Further, the four FFMRF facets correlated between 0.43 and 0.63 with the relevant UPPS scales and in each case the convergent value was significantly greater than the discriminant ones. The present findings quantify the stability and validity of the FFMRF facet scales, which helps to contextualize the speed/accuracy tradeoffs of brief personality measures.
... Some studies have not used validated measures of personality, whereas others assessed personality at intervals very close to the onset of brain damage, without longitudinal follow-up, and without sufficient time for the patient to experience and integrate major life changes (e.g., loss of occupation, placement in an assisted living facility). Given that personality is fairly stable but is also influenced by major life events (Roberts et al., 2006; Vaidya et al., 2002; Watson, 2004), an important question is whether patients with amnesia would be capable of updating their self-ratings of personality in response to these sorts of significant life changes. Results from a small number of case studies that have attempted to address this question are equivocal. ...
Article
Little is known about the role of declarative memory in the ongoing perception of one’s personality. Seven individuals who developed a rare and severe type of anterograde amnesia following damage to their medial temporal lobes were identified from our neurological patient registry. We examined the stability of their personality ratings on the Big Five Inventory over five retest periods and assessed the accuracy of their ratings via analyses of self–caregiver agreement. The patients portrayed a stable sense of self over the course of 1 year. However, their self-ratings differed from those provided by the caregivers. Intriguingly, these discrepancies diminished when caregivers retrospectively rated the patients’ personalities prior to their brain injury, suggesting that patients’ perceptions of themselves were stuck in the past. We interpret our findings to indicate that the ability to form new declarative memories is not required for maintaining a stable sense of self but may be important for updating one’s sense of self over time.
... (Reese, et al., 2011; . Watson (2004) calls the former test-retest reliability, or "dependability," and the latter "stability"; however, what we regard as "sufficient" depends entirely on the nature of the construct under consideration. By construct I mean which personality characteristic's stability or changeability is being discussed. ...
Book
Full-text available
(Written in Georgian Language) The presented book consists of nine chapters presented in three parts. The first part discusses the theoretical framework and all main concepts, constructs and approached I employ in my research. The second part discusses the ongoing process of life storytelling. It answers the following questions: how, when and why do we start telling our stories? And how do we manage to grow, develop, enhance and change ourselves and at the same time to maintain the sense of self-continuity. The third part of the book tells the story of ongoing transformations in the context of complex person-culture interaction. Hence, next chapters overview the master narrative theoretical framework recently proposed by McLean and Syed (2016). In particular, the criteria and types of master narrative as well as the concept of alternative master narrative and its variations are discussed. The last chapter brings us to the issue of interrelation of master and alternative narratives and generativity,. The epilogue sums up the book underlying the place and importance of studying life narratives in personality and identity studies.
... Test-retest reliability and stability, respectively, refer to the consistency of a measure's scores across a few days or weeks, and across a few months or longer (Watson, 2004). We rated measures' test-retest reliability and stability based on commonly used benchmarks: intraclass correlations (ICCs) > .74 and Pearson's r > .70 are considered Excellent; ICCs = .59-.74 and Pearson's r = .50-.70 are considered Good; and ICCs = .40-.58 and Pearson's r = .30-.50 are considered Adequate (e.g., Cohen, 2013;Landis & Koch, 1977; also see Etkin et al., 2021). ...
Article
This Evidence Base Update of parent-report measures of youth anxiety symptoms is a companion piece to our update on youth self-report anxiety symptom measures (Etkin et al., 2021). We rate the psychometric properties of the parent-report measures as Adequate, Good, or Excellent using criteria developed by Hunsley and Mash (2008) and Youngstrom et al. (2017). Our review reveals that the evidence base for parent-report measures is considerably less developed compared with the evidence base for youth self-report measures. Nevertheless, several measures, the parent-report Screen for Child Anxiety-Related Emotional Disorders, Multidimensional Anxiety Scale for Children, and Spence Children's Anxiety Scale, were found to have Good to Excellent psychometric properties. We conclude our review with suggestions about which parent-report youth anxiety measures are best suited to perform different assessment functions and directions for additional research to expand and strengthen the evidence base.
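The benchmark ratings quoted in the excerpt above (ICC and Pearson's r cutoffs for Excellent, Good, and Adequate) can be sketched as a small lookup. This is only an illustration: the function name `rate_retest`, the label "Below Adequate", and the handling of values that fall exactly on a boundary or in the small gaps between the quoted ranges are assumptions, not part of the cited review.

```python
def rate_retest(value: float, kind: str = "icc") -> str:
    """Map a retest coefficient to the benchmark labels quoted above.

    kind is "icc" or "pearson". Boundary handling is an assumption:
    Excellent requires a value strictly above the top cutoff, and any
    value at or above a lower cutoff receives that lower label.
    """
    # (excellent_above, good_from, adequate_from) for each coefficient type
    cutoffs = {
        "icc": (0.74, 0.59, 0.40),
        "pearson": (0.70, 0.50, 0.30),
    }
    if kind not in cutoffs:
        raise ValueError("kind must be 'icc' or 'pearson'")
    excellent_above, good_from, adequate_from = cutoffs[kind]
    if value > excellent_above:
        return "Excellent"
    if value >= good_from:
        return "Good"
    if value >= adequate_from:
        return "Adequate"
    # Hypothetical label; the excerpt does not name a category below Adequate.
    return "Below Adequate"


if __name__ == "__main__":
    print(rate_retest(0.80, "icc"))
    print(rate_retest(0.55, "pearson"))
```

Keeping the cutoffs in one dictionary makes it easy to swap in a different set of benchmarks without touching the comparison logic.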
... to .70) and did not include a two-month interval for further comparison. Thus, while the exact interval that distinguishes between unreliability and true change is unclear (and perhaps impossible to resolve), we suggest, in line with previous research [e.g., 16,20,26], that two weeks strikes a balance between mitigating both the possibility of true change and the likelihood that participants recall and repeat their responses from the first measurement. ...
Preprint
Full-text available
The HEXACO model of personality is currently one of the most widely used in its field. While numerous studies report HEXACO facet and domain alpha reliabilities (α), few have examined its test-retest reliability (rTT), a fundamental property of psychological tests. We thus report 12-day rTT of the 100-item HEXACO-PI-R domains, facets, and items and compare these to α (of the former two) and cross-rater agreement (rCA). Median rTTs were r = .65, .81, and .88 (n = 416) for items, facets, and factors, respectively, supporting the scale’s reliability. Meanwhile, facet rCA was highly correlated with rTT but unrelated to α. These results indicate that HEXACO-PI-R demonstrates rTT similar to other contemporary measures, and rTT data should be routinely collected for scales.
... Older studies were based on the investigation of manifest variables using test-retest correlations and repeated measures ANOVA (e.g., Arsenian, 1970;Costa et al., 2000;Crawford et al., 1986;Gustavsson et al., 1997;Watson & Walker, 1996). As mentioned above, these results can be distorted by measurement error (Borghuis et al., 2017;Watson, 2004). ...
Preprint
Full-text available
The occurrence of major life events is associated with changes in mental health, well-being, and personality. To better understand these effects, it is important to consider how individuals perceive major life events. Although theories such as Appraisal Theory and Affective Adaptation Theory suggest that event perceptions change over time and that these changes are relevant for personality and well-being, stability and change of the perceptions of major life events have not been systematically examined. The present paper aims to fill this gap using data from a longitudinal study (N = 619 at T1). In this study, participants rated nine characteristics of the same major life event up to five times within one year with the Event Characteristics Questionnaire. We estimated rank-order and mean-level stabilities as well as intraclass correlations of the nine life event characteristics with continuous time models. Furthermore, we computed continuous time models for the stability of affective well-being and the Big Five personality traits to generate benchmarks for the interpretation of the stability coefficients. Rank-order stabilities of the life event characteristics were lower than for the Big Five, but higher than for affective well-being. Furthermore, we found significant mean-level changes for the life event characteristics extraordinariness, change in world views and external control. Most of the variance in life event characteristics was explained by between-person differences. Future research should examine whether these changes in perceived event characteristics are associated with changes in other constructs and which factors contribute to the stability and change of perceived event characteristics.
... Step 3. Conduct retest analyses of items with adequate qualitative and quantitative properties. Test-retest correlations over short spans are particularly good indicators of item quality: for an item to provide reliable and useful information, raters first have to answer it consistently in the short run; that is, they have to be able to agree with themselves on the content of the item. The retest interval can be a couple of months (Watson, 2004), a couple of weeks (Mõttus, Sinick et al., 2019;Soto & John, 2017), a couple of days (Wood et al., 2010), or even a couple of minutes (Lowman et al., 2018;Wood et al., 2018). What makes these estimates so valuable is that they are particularly good predictors of many standard indicators of item validity simultaneously, such as self-other agreement correlations and stability correlations over longer time periods (McCrae, Kurtz, Yamagata, & Terracciano, 2011;Henry & Mõttus, 2020), while also being estimable for single items. ...
Article
Full-text available
In pursuit of a more systematic and comprehensive framework for personality assessment, we introduce procedures for assessing personality traits at the lowest level: nuances. We argue that constructing a personality taxonomy from the bottom up addresses some of the limitations of extant top-down assessment frameworks (e.g., the Big Five), including the opportunity to resolve confusion about the breadth and scope of traits at different levels of the organization, evaluate unique and reliable trait variance at the item level, and clarify jingle/jangle issues in personality assessment. With a focus on applications in survey methodology and transparent documentation, our procedures contain six steps: (1) identification of a highly inclusive pool of candidate items, (2) programmatic evaluation and documentation of item characteristics, (3) test-retest analyses of items with adequate qualitative and quantitative properties, (4) analysis of cross-ratings from multiple raters for items with adequate retest reliability, (5) aggregation of ratings across diverse samples to evaluate generalizability across populations, (6) evaluations of predictive utility in various contexts. We hope these recommendations are the first step in a collaborative effort to identify a comprehensive pool of personality nuances at the lowest level, enabling subsequent construction of a robust hierarchy – from the bottom up.
... For all subscales other than self-motivation, values equaled or exceeded r = .78. The value for self-motivation is lower than would be expected for a trait measure (Chmielewski & Watson, 2009;Watson, 2004). ...
Article
Research on self-reported executive functioning (EF) and personality has largely focused on normative personality traits. While previous research has demonstrated that maladaptive personality traits are associated with performance-based EF, the literature examining the relationship between these traits and self-reported EF is limited. The current study examined the relationship between multiple domains of self-reported EF (Barkley Deficits in Executive Functioning Scale) and both normative (The International Personality Item Pool–NEO–120 Item [IPIP-120]) and maladaptive (Personality Inventory for DSM-5–Short Form [PID-5-SF]) personality traits in an undergraduate student sample (n = 354). Similar to past research, relationships were largest across EF domains for both measures related to neuroticism (i.e., IPIP-120 neuroticism and PID-5-SF negative affectivity) and conscientiousness (i.e., IPIP-120 conscientiousness and PID-5-SF disinhibition). Normative personality traits generally accounted for greater variance in EF when examined alone and were also generally associated with greater incremental validity when compared with maladaptive personality traits. However, multiple regression analyses indicated that maladaptive personality traits added unique predictive variance above and beyond normative personality traits in their association with multiple domains of EF. These results highlight the utility of assessing both normative and maladaptive personality traits as well as multiple domains of EF to more fully understand the relationship between personality and EF.
... A small number of studies have directly investigated how the number of days between tests impacts test-retest and alternate forms reliability coefficients. Watson (2004) examined how a 2-month versus 2½-year retesting period impacted test-retest coefficients for the Big Five and trait affectivity instruments using data from 392 to 462 psychology students. He found that coefficients tended to be higher with a shorter time interval. ...
Article
An essential question when computing test–retest and alternate forms reliability coefficients is how many days there should be between tests. This article uses data from reading and math computerized adaptive tests to explore how the number of days between tests impacts alternate forms reliability coefficients. Results suggest that the highest alternate forms reliability coefficients were obtained when the second test was administered at least 2 to 3 weeks after the first test. Even though reliability coefficients after this amount of time were often similar, results suggested a potential tradeoff in waiting longer to retest, as student ability tended to grow with time. These findings indicate that, if keeping student ability similar is a concern, the best time to retest is shortly after 3 weeks have passed since the first test. Additional analyses suggested that alternate forms reliability coefficients were lower when tests were shorter and that narrowing the first test ability distribution of examinees also impacted estimates. Results did not appear to be largely impacted by differences in first test average ability, student demographics, or whether the student took the test under standard or extended time. It is suggested that for math and reading tests, like the ones analyzed in this article, the optimal retest interval would be shortly after 3 weeks have passed since the first test.
... Step 3. Conduct retest analyses of items with adequate qualitative and quantitative properties. Test-retest correlations over short spans are particularly good indicators of item quality: for an item to provide reliable and useful information, raters first have to answer it consistently in the short run; that is, they have to be able to agree with themselves on the content of the item. The retest interval can be a couple of months (Watson, 2004), a couple of weeks (Mõttus, Sinick et al., 2019;Soto & John, 2017), a couple of days (Wood et al., 2010), or even a couple of minutes (Lowman et al., 2018;Wood et al., 2018). What makes these estimates so valuable is that they are particularly good predictors of many standard indicators of item validity simultaneously, such as self-other agreement correlations and stability correlations over longer time periods (McCrae, Kurtz, Yamagata, & Terracciano, 2011;Henry & Mõttus, 2020), while also being estimable for single items. ...
Preprint
In pursuit of a more systematic and comprehensive framework for personality assessment, we introduce procedures for assessing personality traits at the lowest level: nuances. We argue that constructing a personality taxonomy from the bottom up addresses some of the limitations of extant top-down assessment frameworks (e.g., the Big Five), including the opportunity to resolve confusion about the breadth and scope of traits at different levels of organization, evaluate unique and reliable trait variance at the item level, and clarify jingle/jangle issues in personality assessment. With a focus on applications in survey methodology and transparent documentation, our procedures contain six steps: (1) identification of a highly inclusive pool of candidate items, (2) programmatic evaluation and documentation of item characteristics, (3) test-retest analyses of items with adequate qualitative and quantitative properties, (4) analysis of cross-ratings from multiple raters for items with adequate retest reliability, (5) aggregation of ratings across diverse samples to evaluate generalizability across populations, (6) evaluations of predictive utility in various contexts. We hope these recommendations are the first step in a collaborative effort to identify a comprehensive pool of personality nuances at the lowest level, enabling subsequent construction of a robust hierarchy – from the bottom up.
... Using this interpretation, it can be concluded that the multiple choice questions in the STEM Knowledge Test have good reliability. The lower value obtained compared to other SEMARAK components might be caused by the multidimensional nature of the test (Seybert & Becker, 2019) and inappropriate retest intervals (Watson, 2004). ...
Conference Paper
Full-text available
Purpose: This study aims to determine the validity and reliability of an integrated-STEM water rocket module (known as SEMARAK). SEMARAK is an abbreviation for SElangkah Menuju Angkasa Raya Kita (A Step Towards Our Outer Space). SEMARAK is designed to align with the Kurikulum Standard Sekolah Menengah (KSSM), and the module includes selected science and mathematics learning standards ranging from Form One to Form Five. In addition to inquiry-based instruction, SEMARAK is equipped with 3 instruments that enable the instructor to measure the application of STEM knowledge, skill and value among pupils under their supervision. Methodology: SEMARAK was developed following the Sidek Module Development Model. A pilot test was conducted to determine the reliability of SEMARAK, which involved 34 Form Three pupils from a school in Kuala Lumpur who had just completed their PT3 assessment. The module was carried out in four 2-hour sessions which were designed according to the Engineering Design Process (EDP). The participants' skill (designing skill) and value (innovation and creativity) were assessed using two adapted rubrics throughout the implementation of the module. At the end of the instruction, a STEM knowledge test was administered and the participants' feedback on the module's usability was collected via questionnaire. In addition, experts with various STEM expertise were appointed to evaluate the validity of the module. Findings: SEMARAK showed high reliability (α=0.943) and received a very satisfactory content validity index (CVI). These results showed that SEMARAK can be utilized to inculcate STEM knowledge, designing skills, and the value of creativity and innovation among pupils, using water rocket activity as a medium for STEM teaching and learning.
Significance: The development of SEMARAK can mitigate the shortage of STEM teaching and learning resources for Malaysian teachers while allowing the reinforcement of various KSSM learning standards during the organization of water rocket activity.
... In clinical diagnosis and prognosis, retest reliability is a prerequisite for accurate assessment and the monitoring of interventions over time. Despite the importance of retest reliability, it is rarely measured and reported, a situation that is not unique to cognitive hearing science (Watson, 2004). ...
Article
Full-text available
In this article, we consider the issue of reproducibility within the field of cognitive hearing science. First, we examine how retest reliability can provide useful information for the generality of results and intervention effectiveness. Second, we provide an overview of retest reliability coefficients within three areas of cognitive hearing science (cognition, speech perception, and self-reported measures of communication) and show how the reporting of these coefficients differs between fields. We argue that practices surrounding the provision of retest coefficients are currently most rigorous in clinical assessment and that basic science research would benefit from adopting similar standards. Finally, based on a distinction between direct replications (which aim to keep materials as close to the original study as possible) and conceptual replications (which test the same purported mechanism using different materials), we discuss new initiatives which address the need for both. Using the example of the auditory Stroop task, we provide practical illustrations of how these theoretical issues can be addressed within the context of a multi-lab replication study. By illustrating how theoretical concepts can be put into practice in empirical research, we hope to encourage others to set up and participate in a wide variety of reproducibility-related studies.
... The distinction between relative and absolute reliability is especially useful here (see also Baumgartner, 1989;Hallman, Srinivasan, & Mathiassen, 2015;Maestri et al., 2009;Weir, 2005): Relative reliability (also known as rank order consistency; e.g., Clarke & Clarke, 1984;Roberts & DelVecchio, 2000;Watson, 2004) refers to the extent to which people retain their within-sample rank for repeated measurements (Atkinson & Nevill, 1998;Baumgartner, 1989;Bruton, Conway, & Holgate, 2000;Sole et al., 2007;Weir, 2005). In contrast, absolute reliability informs about "the degree to which repeated measurements vary for individuals" (Atkinson & Nevill, 1998, p. 219), regardless of the individual's relative position. ...
Article
Research on heart rate variability (HRV) has received increasing attention. This study analysed the reliability of the most common HRV parameters for baseline measurements. 103 healthy students (83 women, M = 21.72±3.31 years) participated in five short-term HRV sessions, each including supine, sitting, and standing positions, respectively, spanning a time interval of eleven months. Relative reliability was evaluated by intraclass correlation coefficients, and absolute reliability by standard errors of measurement, smallest real differences, and 95% limits of random variation. No systematic mean differences between measurements emerged. Intraclass correlation coefficients were quite low (supine: .49–.64, sitting: .40–.57, standing: .35–.56). Absolute reliability indicators revealed pronounced variations between test and retest. Influences of posture and time between measurements on reliability were small and unsystematic. We conclude that such high levels of within-subjects variability in HRV measurements (a) hamper the detection of changes over time, and (b) should be considered carefully in future analyses.
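The absolute reliability indicators named in the abstract above (standard error of measurement and smallest real difference) are often derived from an ICC with the textbook formulas SEM = SD × √(1 − ICC) and SRD = 1.96 × √2 × SEM. The sketch below assumes those common formulas; the abstract does not state which variants the study used, and the function names and the example standard deviation of 10 are purely illustrative.

```python
import math


def standard_error_of_measurement(sd: float, icc: float) -> float:
    """SEM = SD * sqrt(1 - ICC), a common textbook form (assumed here)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def smallest_real_difference(sem: float, z: float = 1.96) -> float:
    """SRD = z * sqrt(2) * SEM; z = 1.96 corresponds to a 95% level."""
    return z * math.sqrt(2.0) * sem


if __name__ == "__main__":
    # Hypothetical inputs: between-person SD of 10 units, ICC of .64.
    sem = standard_error_of_measurement(sd=10.0, icc=0.64)
    srd = smallest_real_difference(sem)
    print(round(sem, 2), round(srd, 2))  # about 6.0 and 16.63
```

With these hypothetical inputs, an individual's score would need to change by roughly 17 units before the change exceeds what measurement error alone could produce, which illustrates why low ICCs hamper the detection of change over time.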
... Accurately ascertaining a CD history when diagnosing ASPD in adults may be challenging because CD must often be assessed retrospectively, and the precise details of past cognitive states, emotions, and behaviors may be difficult to recollect accurately (Robinson & Clore, 2002;Watson, 2004). Unfortunately, relying on retrospective recall of CD behaviors may be common and necessary for ASPD diagnosis in many cases, as a prior documented history of CD-relevant behaviors or lack thereof may not be available for many adults being assessed. ...
Article
Many studies have examined the correlates and factor structure of the conduct disorder (CD) criteria in child and adolescent samples, finding that the set of behaviors defining CD are heterogeneous in nature. However, the factor structure of the CD criteria has not been examined in adults, even though the CD criteria often must be assessed retrospectively when diagnosing antisocial personality disorder (ASPD) in adulthood. To advance understanding of assessing CD behaviors retrospectively in adults, we present factor analytic and correlational results from a large sample of adult outpatients (N = 1,793). Our results indicate that CD ratings are defined by at least two latent factors of Rule Violation (e.g., curfew violations) and Aggression (e.g., using a weapon). Ratings of aggressive behaviors tended to show somewhat stronger associations with other externalizing psychopathology than did rule violation ratings. Furthermore, CD dimensions identified in our factor analyses correlated robustly with ASPD but correlated just as strongly with diagnostic ratings of other externalizing psychopathology such as substance use history. We discuss how these findings from a large adult sample parallel results from child and adolescent samples indicating that the CD criteria are defined by distinct dimensions. Furthermore, we interpret these findings within the context of other recent studies suggesting that a CD history may not be a specific precursor to ASPD in adulthood.
... However, assessment of reinforcement sensitivity was only done through self-report methods. As such, some findings may have been partially impacted by other factors associated with clinical severity, such as limited insight, current mood and response style (Chmielewski & Watson, 2009;Klein et al., 2011;Watson, 2004). Future meta-analyses may account for these concerns by including behavioral (e.g., Millgram et al., 2019;Treadway et al., 2012) and biological measures (e.g., DelDonno et al., 2015) in their assessments of reinforcement sensitivity. ...
... To test construct stability, we examined the frequency of weight-control dieting in both waves of data. Out of those who dieted at Wave 3, as many as 65% reported being on a weight-control diet at Wave 2. This suggests high stability (Watson, 2004) over the course of 10 years. To test item validity, we tested the relationships between weight-control dieting and: (1) Perceptions of overweight: Dieters had significantly higher overweight perception than non-dieters. ...
Article
Background: Despite the ever-growing literature on weight-control diets, data about dieting among older adults are scarce. Purpose: To describe the prevalence of weight-control dieting across age groups and weight statuses (from healthy-weight to overweight and obese). To identify cross-sectional associations of perceived health and perceived overweight status with dieting among older adults. Methods: Secondary analyses of the second and third waves of the Midlife in the US study (MIDUS). Sample included 2588 participants (40-93 years old, 54.5% females, age = 64.4 ± 11.1 years, BMI = 28.3 ± 5.9 kg/m2). Logistic regressions were used to predict dieting across age groups (independent variables: BMI, perceived health, perceived overweight status; covariates: BMI change, education, age, race). Results: As many as 15% of participants had reported dieting during the previous year. Older age was associated with less dieting among healthy weight (p = .02) and overweight (p < .001) participants, but not among participants with obesity (p = .36). Among participants younger than 75, overweight perception (vs. healthy-weight perception) was linked with higher likelihood for dieting (40-55 years: OR = 3.94[1.70-9.1]; 55-65 years: OR = 4.11[1.91-8.82]; 65-75 years: OR = 4.50[1.90-10.65]). Nevertheless, among participants older than 75, excellent (vs. good/fair/poor) perceived health was linked with higher likelihood of dieting (good vs. excellent: OR = 0.29[0.09-0.87]; fair/poor vs. excellent: OR = 0.12[0.03-0.54]). Conclusions: Older age is associated with less weight-control dieting among people without obesity. Although overweight perception may have a stronger impact on dieting during younger age, health perception may have a stronger impact on dieting during older age, suggesting that the motivation behind weight-control diets may potentially change throughout the adult lifespan.
Preprint
Full-text available
Introduction: The loss of splenic function is associated with an increased risk of infection in sickle cell disease (SCD); however, spleen function is rarely documented among SCD patients in Africa, due partly to the non-availability of sophisticated techniques such as scintigraphy. Methods of assessing splenic function which may be achievable in resource-poor settings include counting red blood cells (RBC) containing Howell Jolly Bodies (HJB) and RBC containing silver-staining (argyrophilic) inclusions (AI) using a light microscope. We evaluated the presence of HJB- and AI-containing RBC as markers of splenic dysfunction among SCD patients in Nigeria. Methods: We prospectively enrolled children and adults with SCD in steady state attending outpatient clinics at a tertiary hospital in North-East Nigeria. The percentages of HJB- and AI-containing red cells were estimated from peripheral blood smears and compared to normal controls. Results: There were 182 SCD patients and 102 healthy controls. Both AI- and HJB-containing red cells could be easily identified in the participants' blood smears. SCD patients had a significantly higher proportion of red cells containing HJB (1.5%; IQR 0.7% - 3.1%) compared to controls (0.3%; IQR 0.1% - 0.5%) (P = 0.0001). The AI red cell counts were also higher among the SCD patients (47.4%; IQR 34.5% - 66.0%) than the control group (7.1%; IQR 5.1% - 8.7%) (P = 0.0001). The intra-observer reliability for assessment of HJB- (R = 0.92; R² = 0.86) and AI-containing red cells (R = 0.90; R² = 0.82) was high. The estimated intra-observer agreement was better with the HJB count method (95% limits of agreement, -4.5 to 4.3; P = 0.579). Conclusion: We have demonstrated the utility of light microscopy in the assessment of red cells containing HJB and AI inclusions as indices of splenic dysfunction in Nigerian SCD patients. These methods can be easily applied in the routine evaluation and care of patients with SCD to identify those at high risk of infection and initiate appropriate preventive measures.
Article
The Self-Rating Scale (SRS; Hooley et al., 2010), a widely used measure of self-criticism in self-injury research, did not utilize conventional test development methods and has limited psychometric data. We examined the internal consistency, test–retest reliability, and convergent and discriminant validity of the SRS. Participants were 295 psychology undergraduate students. The SRS demonstrated good internal consistency (α = .93), adequate test–retest reliability (r = .76), and satisfactory convergent validity with other measures of self-criticism. Convergent validity was also adequate for expected dimensions of perfectionism (socially prescribed, self-oriented, concerns about mistakes, and doubts about actions), depressive symptoms, and negative and positive affect. The SRS demonstrated adequate discriminant validity with expected constructs of perfectionism (other-oriented, personal standards, and organizational perfectionism). Although the SRS appears to be a psychometrically sound measure of self-criticism, high correlations with depression and perfectionism raise questions regarding the overlap of these constructs.
Article
Using representative panel data sets from the Netherlands and Germany, this study analyzes the long-term stability of two behaviorally validated measures of individuals’ forward-looking attitude: the consideration of future consequences scale and two ultra-short survey items on patience and impulsiveness. Overall, their intra-individual correlation is sufficiently high to consider the measures stable, and a comprehensive list of life events does not correlate with their instability. Past events have only a small and time-restricted effect, if any. Robustness tests indicate that measurement error is the most likely reason for their instability. Although these findings mitigate endogeneity concerns, error-in-variable biases can, nevertheless, be substantial.
Article
The aim of this study was to explore the moderating role of organizational climate on the relationship between psychological hardiness and adherence to criminal investigation procedure (ACIP). A total of 403 Nigerian police investigators, purposively selected from the headquarters of the Criminal Investigation and Intelligence Department (CIID) in the five South Eastern states of Nigeria, were assessed on the predictor variable psychological hardiness, the moderating variable organizational climate, and the outcome variable ACIP using a self-report questionnaire. The results of moderated multiple regression analysis showed that psychological hardiness and organizational climate predicted the variance in ACIP. Similarly, organizational climate moderated the relationship between psychological hardiness and ACIP. The findings show that both psychological hardiness and organizational climate are relevant factors for enhancing ACIP among investigating police officers.
Article
Self-transcendence is thought to increase well-being and is implicitly promoted in contextual cognitive behavioral therapies (CCBTs). This study conceptualizes, develops, and validates the first comprehensive CCBT-informed self-transcendence questionnaire. Using a CCBT-informed theory, we propose four self-transcendence facets: distancing oneself from mental content, distinguishing an observer of mental experience that is separate from the content of experience, experiencing innate connectedness with other beings, and noticing the constantly changing nature of experience. We measured these facets with items from existing relevant questionnaires and novel, expert-informed items. Exploratory factor analyses and bifactor exploratory structural equation models supported the first three of these facets. Those factors evidenced convergent validity with decentering, defusion, experiential avoidance, and mindfulness, and criterion and incremental validity in predicting psychological well-being. Our findings support a CCBT-informed model of self-transcendence, introduce the first instrument to comprehensively measure the self-transcendence facets we identified, indicate links with well-being, and suggest future intervention targets.
Preprint
Full-text available
This is the introduction, contents, reference list, and summary of a monograph telling the stories of narrative identity development in the midst of transformative experiences through negotiation with broader cultural narratives. For an English summary, see the file.
Article
It has recently been demonstrated that metrics of structural validity are severely underreported in social and personality psychology. We comprehensively assessed structural validity in a uniquely large and varied data set (N = 144,496 experimental sessions) to investigate the psychometric properties of some of the most widely used self-report measures (k = 15 questionnaires, 26 scales) in social and personality psychology. When the scales were assessed using the modal practice of considering only internal consistency, 88% of them appeared to possess good validity. Yet when validity was assessed comprehensively (via internal consistency, immediate and delayed test-retest reliability, factor structure, and measurement invariance for age and gender groups), only 4% demonstrated good validity. Furthermore, the less commonly a test was reported in the literature, the more likely the scales were to fail that test (e.g., scales failed measurement invariance much more often than internal consistency). This suggests that the pattern of underreporting in the field may represent widespread hidden invalidity of the measures used and may therefore pose a threat to many research findings. We highlight the degrees of freedom afforded to researchers in the assessment and reporting of structural validity and introduce the concept of validity hacking (v-hacking), similar to the better-known concept of p-hacking. We argue that the practice of v-hacking should be acknowledged and addressed.
Article
Full-text available
This review organizes a variety of phenomena related to emotional self-report. In doing so, the authors offer an accessibility model that specifies the types of factors that contribute to emotional self-reports under different reporting conditions. One important distinction is between emotion, which is episodic, experiential, and contextual, and beliefs about emotion, which are semantic, conceptual, and decontextualized. This distinction is important in understanding the discrepancies that often occur when people are asked to report on feelings they are currently experiencing versus those that they are not currently experiencing. The accessibility model provides an organizing framework for understanding self-reports of emotion and suggests some new directions for research.
Article
Full-text available
Forgivingness is the disposition to forgive interpersonal transgressions over time and across situations. There is currently no acceptable measure of forgivingness for use in testing theoretical propositions. The authors describe a five-item scenario-based scale, the Transgression Narrative Test of Forgivingness (TNTF). In five studies examining 518 university students from three disparate universities, the authors assess the item and full-scale functioning of the TNTF and its concurrent and 8-week predictive validity relative to trait anger, rumination, neuroticism, agreeableness, and hostility. Test-retest reliability and stability of item locations were both good. Norms are presented by gender, ethnicity, and religious activity. The TNTF is a brief measure of forgivingness that is not theory dependent and is therefore useful in basic and intervention research from a variety of theoretical perspectives.
Article
Full-text available
Recent research has moved beyond the mere documentation of implicit stereotypes to consider how these measures relate to attitudes and predict behaviors. Little is known, however, about the basic psychometric properties of these measures. The present research includes three studies that provide evidence for test-retest reliability of implicit stereotypes when supraliminal priming of associated traits precedes a group categorization decision (Experiments 1 and 2) and when subliminal presentation of a group member precedes a decision about trait applicability (Experiment 3). Across the studies, significant evidence of implicit racial and gender stereotyping was obtained. These effects showed moderate test-retest reliability of comparable levels from 1 hour to 3 weeks. Implications of these findings for the use of implicit measures are considered.
Article
Full-text available
This series of studies describes the development of a measure of emotional intelligence based on the model of emotional intelligence developed by Salovey and Mayer [Salovey, P. & Mayer, J. D. (1990). Emotional intelligence. Imagination, Cognition and Personality, 9, 185–211.]. A pool of 62 items represented the different dimensions of the model. A factor analysis of the responses of 346 participants suggested the creation of a 33-item scale. Additional studies showed the 33-item measure to have good internal consistency and test-retest reliability. Validation studies showed that scores on the 33-item measure (a) correlated with eight of nine theoretically related constructs, including alexithymia, attention to feelings, clarity of feelings, mood repair, optimism, and impulse control; (b) predicted first-year college grades; (c) were significantly higher for therapists than for therapy clients or for prisoners; (d) were significantly higher for females than males, consistent with prior findings in studies of emotional skills; (e) were not related to cognitive ability; and (f) were associated with the openness to experience trait of the big five personality dimensions.
Article
Full-text available
This study examined the reliability and validity of a new elementary cognitive task (ECT) based on the Posner paradigm. This task measures the speed and efficiency of long-term memory retrieval using non-verbal stimuli. Results indicated that the non-verbal Posner task has acceptable test-retest reliability. Parameters of reaction time (RT) were found to correlate substantially (after correction for range restriction) with verbal IQ, but not with performance IQ.
Article
Full-text available
This article presents a basic conceptualization of ingroup identification as the degree to which the ingroup is included in the self and introduces the Inclusion of Ingroup in the Self (IIS) measure to reflect this conceptualization. Using responses from samples of women and ethnic minority groups, four studies demonstrate the utility of this conceptualization of ingroup identification and provide support for the IIS. Results from these studies establish construct validity, concurrent and discriminant validity, and high degrees of test-retest reliability for the IIS. Reaction time evidence also is provided, supporting the use of the IIS as a measure of ingroup identification. Particular strengths of this conceptualization of ingroup identification and potential uses for the IIS are discussed.
Article
Full-text available
Four studies demonstrate the psychometric adequacy and validity of scales designed to assess coping through emotional approach. In separate undergraduate samples, exploratory and confirmatory factor analyses of dispositional (Study 1) and situational (Study 3) coping item sets yielded 2 distinct emotional approach coping factors: emotional processing (i.e., active attempts to acknowledge and understand emotions) and emotional expression. The 2 scales yielded high internal consistency and test-retest reliability, as well as convergent and discriminant validity. A study (Study 2) of young adults and their parents established the scales' interjudge reliabilities. Longitudinal (Study 3) and experimental (Study 4) research supported the predictive validity of the emotional approach coping scales with regard to adjustment to stressful encounters. Findings highlight the utility of functionalist theories of emotion as applied to coping theory. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
Investigated, in 2 experiments, whether judgments of happiness and satisfaction with one's life are influenced by mood at the time of judgment. In Exp I, moods were induced by asking 61 undergraduates for vivid descriptions of a recent happy or sad event in their lives. In Exp II, moods were induced by interviewing 84 participants on sunny or rainy days. In both experiments, Ss reported more happiness and satisfaction with their life as a whole when in a good mood than when in a bad mood. However, the negative impact of bad moods was eliminated when Ss were induced to attribute their present feelings to transient external sources irrelevant to the evaluation of their lives; but Ss who were in a good mood were not affected by misattribution manipulations. The data suggest that (a) people use their momentary affective states in making judgments of how happy and satisfied they are with their lives in general and (b) people in unpleasant affective states are more likely to search for and use information to explain their state than are people in pleasant affective states. (18 ref) (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
Five studies tested the hypothesis that stable individual differences exist in the chronic tendency to engage in evaluative responding. In 2 studies, the 16-item Need to Evaluate Scale (NES) was developed and demonstrated to possess high internal consistency, a single factor structure, high test-retest reliability, and convergent and discriminant validity. Three additional studies supported the predictive validity of the NES. In one, high-NES participants were more likely to report having attitudes toward a variety of important social and political issues than low-NES participants. In another study, high-NES participants wrote more evaluative thoughts in a free thought listing about unfamiliar paintings than low-NES participants. In a final study, high-NES participants wrote more evaluative thoughts in a free thought listing about a typical day in their lives than low-NES participants. Implications for research in social and personality psychology are discussed. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
D. Watson and A. Tellegen (1985) proposed a "consensual" structure of affect based on J. A. Russell's (1980) circumplex. The authors' review of the literature indicates that this 2-factor model captures robust structural properties of self-rated mood. Nevertheless, the evidence also indicates that the circumplex does not fit the data closely and needs to be refined. Most notably, the model's dimensions are not entirely independent; moreover, with the exception of Pleasantness–Unpleasantness, they are not completely bipolar. More generally, the data suggest a model that falls somewhere between classic simple structure and a true circumplex. The authors then examine two of the dimensions imbedded in this structure, which they label Negative Activation (NA) and Positive Activation (PA). The authors argue that PA and NA represent the subjective components of broader biobehavioral systems of approach and withdrawal, respectively. The authors conclude by demonstrating how this framework helps to clarify various affect-related phenomena, including circadian rhythms, sleep, and the mood disorders. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
Generativity may be conceived in terms of 7 interrelated features: cultural demand, inner desire, generative concern, belief in the species, commitment, generative action, and personal narration. Two studies describe the development and use of 3 assessment strategies designed to tap into the generativity features of concern, action, and narration. A self-report scale of generative concern, the Loyola Generativity Scale (LGS), exhibited good internal consistency and retest reliability and showed strong positive associations with reports of actual generative acts (e.g., teaching a skill) and themes of generativity in narrative accounts of important autobiographical episodes. In 1 sample of adults between the ages of 19 and 68, LGS scores of fathers were higher than those of men who had never had children. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
The Obsessive-Compulsive Inventory (OCI) is a new self-report instrument developed to address the problems inherent in available instruments for determining the diagnosis and severity of obsessive-compulsive disorder (OCD). The OCI consists of 42 items composing 7 subscales: Washing, Checking, Doubting, Ordering, Obsessing (i.e., having obsessional thoughts), Hoarding, and Mental Neutralizing. Each item is rated on a 5-point (0-4) Likert scale of symptom frequency and associated distress. One hundred and forty-seven individuals diagnosed with OCD; 58 with generalized social phobia; 44 with posttraumatic stress disorder; and 194 nonpatients completed the OCI and other measures of OCD, anxiety, and depression. The present article describes the psychometrics of the OCI including (a) scale construction and content validity, (b) reliability (internal consistency and retest reliability), and (c) convergent and discriminant validity. The OCI exhibited satisfactory reliability and validity with all 4 samples. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
Time perspective (TP), a fundamental dimension in the construction of psychological time, emerges from cognitive processes partitioning human experience into past, present, and future temporal frames. The authors' research program proposes that TP is a pervasive and powerful yet largely unrecognized influence on much human behavior. Although TP variations are learned and modified by a variety of personal, social, and institutional influences, TP also functions as an individual-differences variable. Reported is a new measure assessing personal variations in TP profiles and specific TP "biases." The 5 factors of the Zimbardo Time Perspective Inventory were established through exploratory and confirmatory factor analyses and demonstrate acceptable internal and test-retest reliability. Convergent, divergent, discriminant, and predictive validity are shown by correlational and experimental research supplemented by case studies. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
This longitudinal study provides an analysis of the relationship between personality traits and work experiences with a special focus on the relationship between changes in personality and work experiences in young adulthood. Longitudinal analyses uncovered 3 findings. First, measures of personality taken at age 18 predicted both objective and subjective work experiences at age 26. Second, work experiences were related to changes in personality traits from age 18 to 26. Third, the predictive and change relations between personality traits and work experiences were corresponsive: Traits that "selected" people into specific work experiences were the same traits that changed in response to those same work experiences. The relevance of the findings to theories of personality development is discussed.
Book
Autobiographical Memory and the Validity of Retrospective Reports presents the collaborative efforts of cognitive psychologists and research methodologists in the area of autobiographical memory. The editors have included an esteemed group of researchers whose work covers a wide range of issues related to autobiographical memory and the validity of retrospective reports, reflecting the diverse traditions in cognitive psychology and survey research. The first part of the book provides different theoretical perspectives on retrospective reports, along with supporting experimental evidence. The second part of this volume focuses specifically on retrospective reports of behaviors, including recall of the frequency and intensity of physical pain, of the number of cigarettes smoked, of dietary habits, and of child support payments. The following sections address the cognitive processes involved in event dating and time estimation, and a discussion of the differences between self and proxy reports. The final part extends the discussion of autobiographical memories in different directions, including the impact of autobiographical memories on individuals' assessment of their current life, the assessment of social change on the basis of retrospective reports, and the issue of collective memories. This book, an indispensable and timely resource for researchers and students of cognitive psychology as well as to survey methodologists and statisticians, demonstrates the considerable progress made in understanding the cognitive dynamics of retrospective reports.
Article
Gray [In H. J. Eysenck, A model for personality (pp. 246–276). New York: Springer; 1981; The neuropsychology of anxiety: an enquiry into the functions of the septo-hippocampal system. Oxford: Oxford University Press; 1982] has described two motivational systems, the Behavioural Inhibition System (BIS) and the Behavioural Activation System (BAS), that control aversive and appetitive behaviour, respectively. Research on Gray's model of personality has been hindered by the lack of specific self-report measures of the reactivity and responsivity of these systems. We describe a set of studies that illustrate the main psychometrical characteristics of the Sensitivity to Punishment and Sensitivity to Reward Questionnaire (SPSRQ). The two scales of the questionnaire were developed by writing items to assess BIS and BAS functioning, respectively. Results showed that both scales were independent, and presented satisfactory internal consistency and test-retest reliability. Studies 2–5 reported data related to convergent and discriminant validity of the scales. The Sensitivity to Punishment scale was: (1) positively related to Eysenck's neuroticism dimension; (2) negatively related to extraversion; (3) not related to psychoticism; (4) significantly related to the STAI-Trait scale of Spielberger; and (5) related to the somatic, behavioral, and cognitive anxiety scales of Lehrer and Woolfolk [Behavioral Assessment, 4, (1982) 167–177.]. The Sensitivity to Reward scale was: (1) positively related to Eysenck's extraversion and neuroticism; (2) moderately related to psychoticism; (3) positively related to the Eysenck's Impulsiveness scale [Psychological Reports, 43, (1978) 1247–1255] and the Zuckerman's Sensation Seeking Scales [Journal of Consulting and Clinical Psychology, 46, (1978) 139–149]. Although future construct validity studies are needed, discussion is focused on the importance of using specific designed measures to evaluate and develop Gray's model.
Article
Confirmatory factor analysis was used to compare 6 models of posttraumatic stress disorder (PTSD) symptoms, ranging from 1 to 4 factors, in a sample of 3,695 deployed Gulf War veterans (N = 1,896) and nondeployed controls (N = 1,799). The 4 correlated factors (intrusions, avoidance, hyperarousal, and dysphoria) provided the best fit. The dysphoria factor combined traditional markers of numbing and hyperarousal. Model superiority was cross-validated in multiple subsamples, including a subset of deployed participants who were exposed to traumatic combat stressors. Moreover, convergent and discriminant validity correlations suggested that intrusions may be relatively specific to PTSD, whereas dysphoria may represent a nonspecific component of many disorders. Results are discussed in the context of hierarchical models of anxiety and depression.
Article
The Threat Index (TI), a measure of death concern grounded in personal construct theory, was submitted to psychometric refinement. The factorability of the TI using the traditional split-match scoring was compared with methods based on Manhattan, Euclidian, standardized Euclidian, and Mahalanobis distance formulas. Statistical and substantive interpretability were enhanced with the standardized Euclidian factor structure. The LISREL VI program was used to determine the best model for the scale in an exploratory factor analysis. A nonhierarchical, G + 3 model met the criterion of goodness of fit > 0.9 for the 1st subsample (n = 405). In a confirmatory factor analysis with a 2nd subsample (n = 405), the model was confirmed. Internal consistency and test-retest reliability were acceptable for Global Threat and 3 subfactors (Threat to Well-Being, Uncertainty, and Fatalism), and all subfactors were found to be independent of social desirability.
Article
There has been recent concern about the degree to which posttraumatic stress disorder (PTSD) symptomatology influences reports of prior exposure to highly stressful life events. In this longitudinal study of 2,942 male and female Gulf War veterans, the authors documented change in stressor reporting across 2 occasions and the association between change and PTSD symptom severity. A regression-based cross-lagged analysis was used to examine the relationship between PTSD symptom severity and later reported stressor exposure. Shifts in reporting over time were modestly associated with PTSD symptom severity. The cross-lagged analysis revealed a marginal association between Time 1 PTSD symptom severity and Time 2 reported stressor exposure for men and suggested that later reports of stressor exposure are primarily accounted for by earlier reports and less so by earlier PTSD symptomatology.
Article
Defining hope as a cognitive set that is composed of a reciprocally derived sense of successful (1) agency (goal-directed determination) and (2) pathways (planning of ways to meet goals), an individual-differences measure is developed. Studies with college students and patients demonstrate acceptable internal consistency and test–retest reliability, and the factor structure identifies the agency and pathways components of the Hope Scale. Convergent and discriminant validity are documented, along with evidence suggesting that Hope Scale scores augmented the prediction of goal-related activities and coping strategies beyond other self-report measures. Construct validational support is provided in regard to predicted goal-setting behaviors; moreover, the hypothesized goal appraisal processes that accompany the various levels of hope are corroborated.
Article
Instrument refinement refers to any set of procedures designed to improve an instrument's representation of a construct. Though often neglected, it is essential to the development of reliable and valid measures. Five objectives of instrument refinement are proposed: identification of measures' hierarchical or aggregational structure, establishment of internal consistency of unidimensional facets of measures, determination of content homogeneity of unidimensional facets, inclusion of items that discriminate at the desired level of attribute intensity, and replication of instrument properties on an independent sample. The use of abbreviated scales is not recommended. The refinement of behavioral observation procedures is discussed, and the role of measure refinement in theory development is emphasized.
Chapter
Publisher Summary The dominant paradigm in current personality psychology is a reinvigorated version of one of the oldest approaches, trait psychology. Personality traits are “dimensions of individual differences in tendencies to show consistent patterns of thoughts, feelings, and actions.” In this context, trait structure refers to the pattern of co-variation among individual traits, usually expressed as dimensions of personality identified in factor analyses. For decades, the field of personality psychology was characterized by competing systems of trait structure; more recently a consensus has developed that most traits can be understood in terms of the dimensions of the Five-Factor Model. The consensus on personality trait structure is not paralleled by consensus on the structure of affects. The chapter discusses a three-dimensional model, defined by pleasure, arousal, and dominance factors in which it is possible to classify such state-descriptive terms as mighty, fascinated, unperturbed, docile, insolent, aghast, uncaring, and bored. More common are two-dimensional systems with axes of pleasure and arousal or positive and negative affect. These two schemes are interpreted as rotational variants—positive affect is midway between pleasure and arousal, whereas negative affect lies between arousal and low pleasure.
Article
Taxometric and biometric analyses were conducted on 2 North American samples to investigate the prevalence and biometric structure of pathological dissociation. Results indicated that approximately 3.3% of the general population belongs to a pathological dissociative taxon. A brief 8-item self-report scale called the DES-T can be used to calculate taxon membership probabilities in clinical and nonclinical samples of adults (a SAS scoring program is provided for this purpose). The genetic and environmental architecture of pathological dissociative symptoms was explored by conducting a biometric analysis on DES-T ratings from 280 identical and 148 fraternal twins. The findings suggest that approximately 45% of the observed variance on the DES-T can be attributed to shared environmental influences. The remaining variance is due to nonshared environmental influences.
Article
Theory testing in the area of hypercompetitiveness has been impeded by the lack of an adequate psychometric instrument. Four studies were conducted as part of an initial research program designed to remedy this deficiency by constructing an individual difference measure of general hypercompetitive attitude with satisfactory psychometric properties. In Studies 1 and 2, a 26-item scale was derived primarily through item-total correlational analysis; it demonstrated adequate internal and test-retest reliabilities. The remaining two studies were concerned with determining the construct validity of the scale. In line with theoretical expectations based on Horney's theory of neurosis, subjects who perceived themselves as hypercompetitive were less psychologically healthy. The potential usefulness of the scale in therapeutic, athletic, school, and business settings is discussed.
Article
The present study assessed the psychometric properties of the Larson and Chastain (1990) Self-Concealment Scale [Larson, D. G. and Chastain, R. L. (1990). Self-concealment: conceptualization, measurement and health implications. Journal of Social and Clinical Psychology, 9, 439–455.]. Based on a university student population, internal consistency (α = 0.83 to 0.87) and retest reliability estimates (r = 0.74) suggested good stability both within the instrument and over time. Although exploratory methods suggested two subscales (keeping secrets and personal concealment), both the reliability and confirmatory factor analyses of an independent sample supported scale unidimensionality. Directions for further scale validation research are suggested.
Article
As part of a larger research programme concerned with the role of anger/hostility in heart disease in Singapore, three commonly used measures of anger/hostility (Cook & Medley Ho Scale, STAXI, Buss-Durkee Hostility Inventory) were examined for reliability and validity in an Asian population. A total of 968 Singaporean Chinese, Malay and Indian respondents completed one or more of these measures together with measures of symptom and illness experience. In addition, blood pressure and heart rate measures were taken for 201 respondents. Overall, the Ho and STAXI measures had reasonably high internal consistency and test-retest reliability. Internal consistency and test-retest reliability for the Buss Durkee measure were high for the total score but variable for the component scales. Correlation and regression showed that the Ho and STAXI appeared to be tapping a common core of variance, which can be characterized as trait anger. Correlations of the Ho and STAXI with health measures produced modest but statistically significant correlations for measures of symptom and illness experience and generally low and non-significant correlations for heart rate and blood pressure.
Article
The development of a 70-item measure of a comprehensive set of personality features derived from Reversal Theory (M. J. Apter, The Experience of Motivation: The Theory of Psychological Reversals, Academic Press, 1982) is reported. The Motivational Style Profile (MSP) measures the dominance of all five pairs of metamotivational states identified in the theory, together with tendencies towards arousability, effortfulness and optimism/pessimism. The MSP also measures the overall salience of each pair of states within the individual's conscious experience over time. The paper describes the development of the MSP subscales through several cycles of item analysis, involving both American and British samples. Data on test-retest reliability for the resulting instrument is also reported together with some concurrent validation data and the results of a factor analysis of the MSP items which suggests a five factor structure.
Article
The present study sought to develop the Hindi version of the Self-Report Altruism Scale (SRA-scale) devised by Rushton, Chrisjohn and Fekken (1981; Personality and Individual Differences, 2, 292–302). Statements of the original SRA-scale, apart from being adapted to Indian culture, were presented in general response format wherein the subjects could “imagine” themselves in different situations and report the amount of help they could render. The Hindi SRA-scale bore high equivalence to the original scale and was found to have high internal consistency, split-half and test-retest reliability, criterion-related and construct validity. The scale promises to be a useful tool for measuring altruism in the Indian milieu.