Content validity of patient-reported outcome measures: Perspectives from a PROMIS meeting

Department of Medical Social Sciences, Feinberg School of Medicine, Northwestern University, 625 Michigan Ave., Suite 2700, Chicago, IL 60611, USA.
Quality of Life Research (Impact Factor: 2.49). 08/2011; 21(5):739-46. DOI: 10.1007/s11136-011-9990-8
Source: PubMed


Content validity of patient-reported outcome measures (PROs) has been a focus of debate since the 2006 publication of the U.S. FDA Draft Guidance for Industry in Patient Reported Outcome Measurement. Under the auspices of the Patient Reported Outcomes Measurement Information System (PROMIS) initiative, a working meeting on content validity was convened with leading PRO measurement experts. Platform presentations and participant discussion highlighted key issues in the content validity debate, including inconsistency in the definition and evaluation of content validity, the need for empirical research to support methodological approaches to the evaluation of content validity, and concerns that continual re-evaluation of content validity slows the pace of science and leads to the proliferation of study-specific PROs. We advocate an approach to the evaluation of content validity that includes meticulously documented qualitative and advanced quantitative methods. To advance the science of content validity in PROs, we recommend (1) development of a consensus definition of content validity; (2) development of content validity guidelines that delineate the role of qualitative and quantitative methods and the integration of multiple perspectives; (3) empirical evaluation of generalizability of content validity across applications; and (4) use of generic measures as the foundation for PRO assessment.

    • "In the current study, the GH Single-item ('In general, would you say your health is: excellent, very good, good, fair, or poor?') had adequate convergent and discriminant validity, adequate reliability and sensitivity to detect a decline in perceived health, and adequate sensitivity to detect a conditional experimental effect. The GH Single-item performed nearly as well as the SF-36 General Health Scale in most post-baseline analyses, and so would appear to be particularly useful as a longitudinal measure when illness or aging might gradually erode respondents' ability to complete a multi-item scale (Bostan et al., 2014; Knäuper & Turner, 2003; Magasi et al., 2012; Neidhammer, Kerrad, Schutte, Chastang, & Kelleher, 2013), and whenever health change is expected to be sudden or rapid, e.g., in mobile phone applications designed to track the spread of disease or prevalence of injury during an epidemic or disaster."
    ABSTRACT: The research performance of the single-item self-rating 'In general, would you say your health is: excellent, very good, good, fair, or poor?' was evaluated relative to the SF-36 General Health Scale that contains this item, using data for a sample of psychiatric outpatients who had co-occurring chronic physical conditions (N = 177). The scale was more robust than the single-item in cross-sectional validity tests and for predicting 2-year outcomes, but the single-item had stronger discriminant validity as a measure of physical health, especially in post-baseline analyses. Single-item and scale were both sensitive enough to detect change in perceived health over 2 years and a conditional experimental effect on health self-perceptions in a randomized trial. These findings demonstrate that a global single-item can be as valid, reliable, and sensitive as a multi-item scale for longitudinal research purposes, even if the scale performs better in cross-sectional surveys or as a screening measure.
    Article · Oct 2015 · Journal of Clinical Psychology in Medical Settings
    • "Content validity is the extent to which a descriptive system of a measure “represents the most relevant and important aspects of a concept in the context of a given measurement application” [14] (p. 743). Assessment of the content coverage of a measure allows understanding about inferences that can be drawn from the results of a measure."
    ABSTRACT: The ICECAP-A and EQ-5D-5L are two index measures appropriate for use in health research. Assessment of content validity allows understanding of whether a measure captures the most relevant and important aspects of a concept. This paper reports a qualitative assessment of the content validity and appropriateness for use of the EQ-5D-5L and ICECAP-A measures, using novel methodology. In-depth semi-structured interviews were conducted with research professionals in the UK and Australia. Informants were purposively sampled based on their professional role. Data were analysed in an iterative, thematic and constant comparative manner. A two-stage investigation - the comparative direct approach - was developed to address the methodological challenges of the content validity research and allow rigorous assessment. Informants viewed the ICECAP-A as an assessment of the broader determinants of quality of life, but lacking in assessment of health-related determinants. The EQ-5D-5L was viewed as offering good coverage of health determinants, but as lacking in assessment of these broader determinants. Informants held some concerns about the content or wording of the Self-care, Pain/Discomfort and Anxiety/Depression items (EQ-5D-5L) and the Enjoyment, Achievement and Attachment items (ICECAP-A). Using rigorous qualitative methodology, the results suggest that the ICECAP-A and EQ-5D-5L hold acceptable levels of content validity and are appropriate for use in health research. This work adds expert opinion to the emerging body of research using patients and public to validate these measures.
    Article · Dec 2013 · PLoS ONE
    • "In all, forty-four patients across three NP populations were interviewed, which can be regarded as a sufficiently large sample size. The consistency of the findings supports content validity and concept saturation as defined by Leidy and Vernon [33] and Magasi et al. [34]. Specifically, they define saturation as the point at which no substantially new information/concepts continue to emerge beyond what has been previously mentioned when interviewing the last few patients."
    ABSTRACT: Background/objective: The Self Assessment of Treatment (SAT) questionnaire was developed to reflect key patient reported outcomes of Neuropathic Pain (NP) treatments. This study aimed to understand how patients perceived the relevance and ease of understanding of the questions in the SAT and to recommend modifications based on patient and clinician interviews. Methods: Semi-structured interviews were conducted with clinicians and NP patients to provide information regarding treatment attributes and the impact of pain. Patients were debriefed on the SAT, a 5-item scale evaluating pain, activity level, quality of life (QoL) and satisfaction with treatment (recommend treatment and undergo treatment again). The SAT has a recall period reflecting back to the start of treatment. The qualitative analysis software ATLAS.ti 5.0 was used to analyze patient transcripts. Changes to the SAT were integrated into the questionnaire for a second round of debriefing interviews. Results: Three NP clinicians and 44 patients (20 painful diabetic neuropathy, 16 HIV-associated neuropathy and 8 post herpetic neuralgia) with a mean age of 60.3 (12.3) years and an even gender distribution were interviewed. Patient treatment experience included anticonvulsants (73%), antidepressants (34%), opioids (25%), and topical medications (41%). Pain descriptors and treatment attributes were similar across the three NP groups. Pain relief was judged the most important treatment attribute, followed by ability to undertake activities. Sleep improvement was another important attribute. Activity limitations and QoL were perceived as too broad and non-specific, and were split into 3 concepts each (activity limitations was split into self-care, daily, and physical activities, and QoL was split into sleep, emotions, and social function). A 7-day recall period was introduced. The item stem and response options were made consistent, and baseline and follow-up questionnaires were developed (except for the satisfaction items) to enable monitoring onset of treatment benefit and change over time. Conclusions: The content validity of the revised SAT was improved by the qualitative research, and NP treatment benefits are reflected in a more consistent fashion by the changes. Baseline and follow-up versions make it possible to perform assessments of change over time.
    Article · Jan 2013 · Health and Quality of Life Outcomes