Article

Development of an item bank for a computerised adaptive test of upper-extremity function

Taylor & Francis
Disability and Rehabilitation

Abstract

The purpose of this study was to determine the psychometric characteristics of an upper-extremity item bank as a precursor to developing a computer adaptive patient-reported outcome instrument. The Activity dimension of the World Health Organization's International Classification of Functioning, Disability and Health (ICF) provided the conceptual framework for the items. Factor and Rasch analyses were used to evaluate the psychometric properties of the item bank, including monotonicity, local independence, dimensionality, item difficulty hierarchy and match between sample ability and item difficulty. Monotonicity of the rating scale was supported. Nine item pairs were locally dependent, and thus one item from each pair was removed from subsequent analyses. There was evidence for two unidimensional constructs: gross upper-extremity and fine hand. Both constructs showed good internal consistency and person separation. In general, the order of item difficulty within each construct replicated the hypothesised item difficulty order. The fine hand construct had a ceiling effect. This study of our newly developed upper-extremity item bank empirically verified the intended item difficulty order, identified separate constructs (i.e. gross upper-extremity and fine hand) and provided insights into eliminating the ceiling effect of one of the constructs. These findings are critical precursors to the development of upper-extremity components of the ICF Activity Measure, an ICF-based CAT located on the web at www.icfmeasure.phhp.ufl.edu.


... Clinicians need reliable and valid measures of hand function so they can monitor progress, set goals, determine effectiveness of intervention, and seek reimbursement for therapy services (Carmeli, Patish, & Coleman, 2003; Lemmens, Timmermans, Janssen-Potten, Smeets, & Seelen, 2012). In order for a measure of hand function to be most useful for these purposes, test items need to be well developed and the scores need to be reliable and valid for the stated purpose (Lehman et al., 2011). In the words of Elaine Ewing Fess, an expert in hand therapy and author of Functional Tests in the book Rehabilitation of Hand and Upper Extremity (Skirven, Osterman, Fedorczyk, & Amadio, 2011), on assessment of the hand, "A thorough and unbiased assessment procedure furnishes information that helps predict rehabilitation potential, provides data with which subsequent measurements may be compared, and allows medical specialists to plan and evaluate treatment programs and techniques. ...
... Cook et al., 2007). Lehman et al. (2011) conducted and documented a similar study in which they developed an upper extremity item bank for a patient-reported outcome measure in musculoskeletal conditions, with two unidimensional constructs: gross upper extremity and fine hand use. ...
... The objective of this study was to define and describe the construct of 'hand and arm function in daily activities' using a construct map and the relationships to other variables using a nomological network. This construct has two dimensions, gross and fine movements in daily activities based on current literature (Lehman et al., 2011). 'Gross movements in daily activities' dimension is defined as proximal arm movements such as transport, reaching, carrying, and pushing. ...
Article
Purpose. A new generic assessment, the Hand and Arm Function Measure, with both patient-reported and performance-based items was devised for people with neurological conditions using the evidence-centered design framework. The objective of this study was to gather experiences of stakeholders regarding upper extremity function in daily activities and seek opinions regarding a preliminary set of items to establish face and content validity. Methods. This descriptive qualitative study included focus groups, cognitive interviews, and an open-ended survey. Stakeholders (n=24) were selected by purposeful sampling of content experts in rehabilitation (n=4) and people who had stroke (n=7), traumatic brain injury (n=2), Parkinson disease (n=6), and multiple sclerosis (n=5). Responses were coded and thematically analyzed by two authors independently. Results. The construct was operationally defined and relevant items categorized based on International Classification of Functioning, Disability, and Health. The items were designed based on aspects of upper extremity function relevant to this population. A 145-item bank was generated and a preliminary set of 59 items (14 performance-based and 45 patient-reported) systematically identified and modified. Conclusions. Face and content validity developed through stakeholder engagement helped generate the evidence to develop a comprehensive outcome measure in rehabilitation. Further investigation of the psychometric properties is needed.
... The ICF model is a classification of health and health-related conditions and consists of six domains: body structure/function, activity, participation, health condition, environment factors, and personal factors (World Health Organization, 2001). The domains of the ICF can be linked to health-related outcome measures (Cieza et al., 2002) and can be used as a conceptual framework for developing and validating outcome measures (Lehman et al., 2011a, 2011b). Using the ICF for the purposes of this review, we categorized UE outcome measures into three domains: body structure/function, activity, and participation. ...
... Another method for assessing construct validity used in the articles reviewed was item difficulty hierarchy, providing information on how well instruments are constructed in terms of item difficulty structure. For example, the ICF-AM was developed on the basis of the conceptual framework of the ICF and a hypothetical difficulty hierarchy (Lehman et al., 2011a, 2011b). The test items were developed on the basis of these two difficulty assumptions (i.e. the weight of the object and the distance from the floor to object). ...
... Our review found that nine measures showed multiple measurement dimensions, including DASH (Franchignoni et al., 2010; Cano et al., 2011; Lehman et al., 2011b), ICF-AM (Lehman et al., 2011a), M-ASES (Cook et al., 2008), MESUPES (Van de Winckel et al., 2006), OPTIMAL (Elston et al., 2013), RMA (Van de Winckel et al., 2007; Kurtais et al., 2009), and STREAM (Hsueh et al., 2006). These outcome measures consist of multiple measurement constructs, such as gross upper function, fine dexterity, trunk movement, leg movement, and mobility. ...
Article
The aim of this study was to provide a systematic review of psychometric studies of upper extremity (UE) outcome measures validated by Rasch analysis and assess the extent to which their measurement areas cover the domains of the International Classification of Functioning, Disability and Health model. A literature search from 1966 to 2014 was performed using PubMed, CINAHL, Scopus, PsycINFO, Ovid/MEDLINE, ERIC, and Cochrane library. Fourteen keywords indicating 'upper extremity', 'psychometric properties', and 'outcome measures' were used. From a total of 1039 studies, 17 UE impairment outcome measures that fulfilled the inclusion criteria were selected and reviewed. The instruments targeted adults with various neurological or orthopedic conditions (i.e. stroke, upper and lower extremity impairments, and back pain). Twelve instruments targeted the body structure/function domain and 11 instruments targeted the activity domain of the International Classification of Functioning, Disability and Health model. Only two instruments targeted the participation domain. All outcome measures showed reasonably sound psychometric properties, including construct validity (good fit statistic), moderate to high reliability (r=0.86-0.99), and sound dimensionality (unidimensional). The reviewed psychometric properties of UE outcome measures are useful for clinicians in deciding which measures to use to assess patients' UE impairments.
... Item response theory (IRT)-based CAT has been proposed [2,18-21] for efficient, reliable, and valid assessments of health-related functions. Although many researchers have contributed to the dichotomous [2,7], polytomous [22,23], and combined item-bank formats used by CAT (called a Rasch partial credit model [PCM] [24] or a generalized partial credit model [GPCM] [25]), few were jointly available for a comparison of precision and efficiency differences in CAT estimation methods (e.g., maximum likelihood estimation [MLE] [26], expected a posteriori estimation [EAP] [27,28], and maximum a posteriori estimation [MAP] [29]). ...
... The first CAT item will be randomly selected from the item pool. The next item to be answered is the item with the maximal variance among the remaining items according to the provisional person ability [21,38]. For the detailed item selection rules, interested readers can see Additional file 3 on the Excel VBA codes. ...
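The selection rule described in this excerpt can be sketched as follows. This is a minimal illustration, not the authors' Excel VBA code: it assumes a dichotomous Rasch model (the cited study uses polytomous GPCM items), in which the variance of a response, p(1 - p), equals the item information at the provisional ability. The function names and the list of item difficulties are hypothetical.

```python
import math
import random


def rasch_prob(theta, b):
    """Probability of endorsing an item of difficulty b at ability theta (dichotomous Rasch)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_variance(theta, b):
    """Response variance p(1 - p); for the dichotomous Rasch model this is the item information."""
    p = rasch_prob(theta, b)
    return p * (1.0 - p)


def select_first_item(n_items, rng=random):
    """The first item is drawn at random from the pool."""
    return rng.randrange(n_items)


def select_next_item(theta, difficulties, administered):
    """Return the index of the unused item with maximal variance at the provisional ability.

    Returns None when every item has already been administered.
    """
    remaining = [i for i in range(len(difficulties)) if i not in administered]
    if not remaining:
        return None
    return max(remaining, key=lambda i: item_variance(theta, difficulties[i]))
```

Under this model the variance peaks where item difficulty equals the provisional ability, so the rule amounts to choosing the remaining item whose difficulty lies closest to the current estimate.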
... Our findings in Task 1 (to compare CAT precision and efficiency) are consistent with the literature [2,5,21,22,38,41], and they support the notion that CAT is more efficient than NAT. We confirmed that GPCM-type ADL CAT (i.e., in contrast to CADL-CAT [2,7], which uses dichotomous Rasch models) similarly requires significantly fewer items for person measures than does NAT, but does not compromise precision of measurement. ...
Article
Background: Computer adaptive testing (CAT) of the activities of daily living (ADL) functions is required (i) to reveal the advantages of using an efficient and accurate estimation method, (ii) to determine the cutpoint for classifying ADL strata in patients with stroke, and (iii) to evaluate the feasibility of online CAT used in clinical settings for smartphones. Methods: Normally standardized distributions of ADL measurements were simulated using item parameters from published papers. We retrieved item parameters of the combined Barthel Index and Frenchay Activities Index from the literature (the 23-item comprehensive ADL [CADL] and 34-item ADL scales) and simulated three 1000-person measures from a normal standard CAT distribution: [i] CADL (CADL-CAT), [ii] ADL (ADL-CAT), and [iii] NAT (Non-Adaptive Testing). The cutpoints of ADL person strata were determined using a norm-reference method. Maximum a posteriori estimation (MAP), expected a posteriori estimation (EAP), and maximum likelihood estimation (MLE) were used to compare the Pearson correlation coefficients and different number ratios of paired measures yielded by CAT and NAT. The number of items and the cutpoints for the scale were separately determined. Results: We found that (i) correlation coefficients for the three CAT-estimated measures were 0.77 (CADL), 0.93 (Male ADL), and 0.93 (Female ADL) compared with their NAT counterparts. Different number ratios of person-paired measures between CAT and NAT for the three scales were all less than 5%, indicating no difference exists between CAT and NAT. However, CAT might be 66% more efficient than NAT. (ii) The estimated cutpoints of T scores (i.e., with a mean of 50 and a standard deviation of 10) were 45, 55, and 65 (e.g., separating person ADL function into four strata: not active, fairly active, active, and very active). (iii) An available-for-download online ADL-CAT APP for clinical practice was demonstrated.
Conclusions: An online ADL-CAT APP using the MAP method was created and used on smartphones to classify ADL strata in patients with stroke.
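The T-score cutpoints reported in this abstract (45, 55, and 65, on a scale with mean 50 and standard deviation 10) can be illustrated with a short sketch. The function names are hypothetical, and the treatment of a score that falls exactly on a cutpoint (assigned here to the higher stratum) is an assumption, since the abstract does not specify it.

```python
from bisect import bisect_right

# Cutpoints and stratum labels as reported in the abstract.
CUTPOINTS = (45.0, 55.0, 65.0)
STRATA = ("not active", "fairly active", "active", "very active")


def t_score(z):
    """Convert a standardized score z to a T score (mean 50, SD 10)."""
    return 50.0 + 10.0 * z


def adl_stratum(t):
    """Classify a T score into one of the four ADL strata.

    Assumption: a score exactly on a cutpoint belongs to the higher stratum.
    """
    return STRATA[bisect_right(CUTPOINTS, t)]
```

For example, a person at the population mean (z = 0) has a T score of 50 and falls in the "fairly active" stratum under these cutpoints.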
... CAT has nearly four decades of research behind it but has only been applied more recently to health care. CAT has been used to shorten or develop questionnaires for assessment of fatigue [23], depression [24][25][26], suicide ideation [4], other mental health disorders [27,28], physical [29] and upper extremity functioning [30], health status in patients with knee osteoarthritis [31], activities of daily living in outpatients with stroke [32], and exposure of nurses to workplace bullying [33,34] and in patient-reported outcome measurement studies [24,29]. Overall, its application has proven to be successful in shortening questionnaires, while patient measurements remained valid and reliable. ...
Article
BACKGROUND: There is a need for shorter-length assessments that capture patient questionnaire data while attaining high data quality without an undue response burden on patients. Computerized adaptive testing (CAT) and classification and regression tree (CART) methods have the potential to meet these needs and can offer attractive options to shorten questionnaire lengths. OBJECTIVE: The objective of this study was to test whether CAT or CART was best suited to reduce the number of questionnaire items in multiple domains (eg, anxiety, depression, quality of life, and social support) used for a needs assessment procedure (NAP) within the field of cardiac rehabilitation (CR) without the loss of data quality. METHODS: NAP data of 2837 CR patients from a multicenter Cardiac Rehabilitation Decision Support System (CARDSS) Web-based program was used. Patients used a Web-based portal, MyCARDSS, to provide their data. CAT and CART were assessed based on their performances in shortening the NAP procedure and in terms of sensitivity and specificity. RESULTS: With CAT and CART, an overall reduction of 36% and 72% of NAP questionnaire length, respectively, was achieved, with a mean sensitivity and specificity of 0.765 and 0.817 for CAT, 0.777 and 0.877 for classification trees, and 0.743 and 0.40 for regression trees, respectively. CONCLUSIONS: Both CAT and CART can be used to shorten the questionnaires of the NAP used within the field of CR. CART, however, showed the best performance, with a twice as large overall decrease in the number of questionnaire items of the NAP compared to CAT and the highest sensitivity and specificity. To our knowledge, our study is the first to assess the differences in performance between CAT and CART for shortening questionnaire lengths. Future research should consider administering varied assessments of patients over time to monitor their progress in multiple domains. 
For CR professionals, CART integrated with MyCARDSS would provide a feedback loop that informs the rehabilitation progress of their patients by providing real-time patient measurements.
... CAT has nearly four decades of research behind it, but has only been applied more recently to healthcare. CAT has been used to shorten or develop questionnaires for assessment of fatigue [23], depression [24][25][26], suicide ideation [4], and other mental health disorders [27,28], physical [29] and upper extremity functioning [30], health status in patients with knee osteoarthritis [31], activities of daily living in outpatients with stroke [32], exposure of nurses to workplace bullying [33,34], and in patient reported outcomes measurement studies [24,29]. Overall, its application has proven to be successful in shortening questionnaires while patient measurements remained valid and reliable. ...
Preprint
BACKGROUND: There is a need for assessment procedures of shorter lengths that capture patient questionnaire data while attaining high data quality without undue response burden for patients. Computerized Adaptive Testing (CAT) and Classification and Regression Trees (CART) methods have the potential to meet these needs and can offer an attractive option to shorten questionnaires. OBJECTIVE: We aimed to test which method, CAT or CART, is best suited to reduce the number of questionnaire items in multiple domains (anxiety, depression, quality of life and social support) used for a needs assessment procedure (NAP) within the field of cardiac rehabilitation (CR), without loss of data quality. METHODS: NAP data of 2837 CR patients of a multicentre CARDDS Online program was used. Patients use an online portal, MYCARDDS, to provide their data. CAT and CART were assessed on their performances in shortening the NAP procedure and in terms of sensitivity and specificity. RESULTS: With CAT and CART, an overall reduction of 36% and 72% of NAP questionnaire length could be realized, respectively, with a mean sensitivity and specificity of 0.765 and 0.817 for CAT, 0.777 and 0.877 for Classification Trees, and 0.743 and 0.40 for Regression Trees, respectively. CONCLUSIONS: Both CAT and CART can be used to shorten the questionnaires of the NAP used within the field of CR. CART, however, showed the best performance, with an overall decrease in NAP questionnaire items about twice as large and the highest sensitivity and specificity. To our knowledge, our study is the first to assess differences in performance between CAT and CART for shortening questionnaire lengths. One example for future research is to administer varied assessments of patients over time to monitor their progress in multiple domains. For CR professionals, CART integrated with MYCARDDS would provide a feedback loop that informs the rehabilitation progress of their patients by providing real-time patient measurements.
... progress, set goals, determine effectiveness of intervention, and seek reimbursement for therapy services [4,5]. In order for a measure of upper extremity function to be useful for these purposes and psychometrically robust, it is critical for test items to be developed carefully with stakeholder engagement [6,7]. ...
... A useful scale using the Rasch model should be evaluated by 3 steps (prior tests, Rasch fit statistics, and post hoc tests) suggested by Smith [21] and Tennant and Pallant [22] (details shown in Methods) to verify a single domain. In many articles, authors used Rasch modeling to develop CAT on clinical samples, but none adopted the model testing steps recommended by Smith to verify scales before implementing CAT [9,10,23-26]. ...
Article
Workplace bullying is a prevalent problem in contemporary work places that has adverse effects on both the victims of bullying and organizations. With the rapid development of computer technology in recent years, there is an urgent need to prove whether item response theory-based computerized adaptive testing (CAT) can be applied to measure exposure to workplace bullying. The purpose of this study was to evaluate the relative efficiency and measurement precision of a CAT-based test for hospital nurses compared to traditional nonadaptive testing (NAT). Under the preliminary conditions of a single domain derived from the scale, a CAT module bullying scale model with polytomously scored items is provided as an example for evaluation purposes. A total of 300 nurses were recruited and responded to the 22-item Negative Acts Questionnaire-Revised (NAQ-R). All NAT (or CAT-selected) items were calibrated with the Rasch rating scale model and all respondents were randomly selected for a comparison of the advantages of CAT and NAT in efficiency and precision by paired t tests and the area under the receiver operating characteristic curve (AUROC). The NAQ-R is a unidimensional construct that can be applied to measure exposure to workplace bullying through CAT-based administration. Nursing measures derived from both tests (CAT and NAT) were highly correlated (r=.97) and their measurement precisions were not statistically different (P=.49) as expected. CAT required fewer items than NAT (an efficiency gain of 32%), suggesting a reduced burden for respondents. There were significant differences in work tenure between the 2 groups (bullied and nonbullied) at a cutoff point of 6 years at 1 worksite. An AUROC of 0.75 (95% CI 0.68-0.79) with logits greater than -4.2 (or >30 in summation) was defined as being highly likely bullied in a workplace. With CAT-based administration of the NAQ-R for nurses, their burden was substantially reduced without compromising measurement precision.
... Computerized adaptive testing (CAT) has been used to achieve efficient, reliable, and valid assessments of health-related functions [12-14]. It uses a computer to administer items to interviewees and can assess interviewees' levels of function as reliably as needed (ie, to reach a preset reliability level) [15]. The CAT assessment is tailored to the unique functional level of each interviewee. ...
Article
Background: An efficient, reliable, and valid measure for assessing activities of daily living (ADL) function is useful to improve the efficiency of patient management and outcome measurement. Objective: The purpose of this study was to construct a computerized adaptive testing (CAT) system for measuring ADL function in outpatients with stroke. Design: Two cohort studies were conducted at 6 hospitals in Taiwan. Methods: A candidate item bank (44 items) was developed, and 643 outpatients were interviewed. An item response theory model was fitted to the data and estimated the item parameters (eg, difficulty and discrimination) for developing the ADL CAT. Another sample of 51 outpatients was interviewed to examine the concurrent validity and efficiency of the CAT. The ADL CAT, as the outcome measure, and the Barthel index (BI) and Frenchay Activities index (FAI) were administered on the second group of participants. Results: Ten items did not satisfy the model's expectations and were deleted. Thirty-four items were included in the final item bank. Two stopping rules (ie, reliability coefficient >.9 and maximum test length of 7 items) were set for the CAT. The participants' ADL scores had an average reliability of .93. The CAT scores were highly associated with those of the full 34 items (Pearson r=.98). The scores of the CAT were closely correlated with those of the combined BI and FAI (r=.82). The time required to complete the CAT was about one fifth of the time used to administer both the BI and FAI. Limitations: The participants were outpatients living in the community. Further studies are needed to cross-validate the results. Conclusions: The results demonstrated that the ADL CAT is quick to administer, reliable, and valid in outpatients with stroke.
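The two stopping rules named in this abstract (reliability coefficient above .9, or a maximum test length of 7 items) can be sketched as below. The abstract does not say how reliability was computed during the test, so this sketch assumes a common convention: reliability = 1 - SE² / trait variance, with the trait variance taken as 1. The function names and that default are hypothetical.

```python
def reliability(se, trait_variance=1.0):
    """Reliability implied by the standard error of the ability estimate.

    Assumption: reliability = 1 - SE^2 / trait variance (trait variance 1 by default).
    """
    return 1.0 - (se ** 2) / trait_variance


def should_stop(se, n_administered, min_reliability=0.9, max_items=7, trait_variance=1.0):
    """Stop when reliability exceeds the threshold or the item limit is reached."""
    reached_precision = reliability(se, trait_variance) > min_reliability
    reached_length = n_administered >= max_items
    return reached_precision or reached_length
```

Under these assumptions the test ends once the standard error drops below about 0.32, or after the seventh item, whichever comes first.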
... These criteria are detailed in Smith et al [17] (http://www.hqlo.com/content/5//19). There are many published papers [1,18-21] of studies using the Rasch model to develop CAT in clinical settings, but none of them have incorporated the Internet-based polytomously scored CAT to gather feedback from patients in hospitals. ...
Article
Many hospitals have adopted mobile nursing carts that can be easily rolled up to a patient's bedside to access charts and help nurses perform their rounds. However, few papers have reported data regarding the use of wireless computers on wheels (COW) at patients' bedsides to collect questionnaire-based information on their perception of hospitalization at discharge from the hospital. The purpose of this study was to evaluate the relative efficiency of computerized adaptive testing (CAT) and the precision of CAT-based measures of perceptions of hospitalized patients, as compared with those of nonadaptive testing (NAT). An Excel module of our CAT multicategory assessment is provided as an example. A total of 200 patients who were discharged from the hospital responded to the CAT-based 18-item inpatient perception questionnaire on COW. The number of questions administered was recorded and the responses were calibrated using the Rasch model. They were compared with those from NAT to show the advantage of CAT over NAT. Patient measures derived from CAT and NAT were highly correlated (r = 0.98) and their measurement precisions were not statistically different (P = .14). CAT required fewer questions than NAT (an efficiency gain of 42%), suggesting a reduced burden for patients. There were no significant differences between groups in terms of gender and other demographic characteristics. CAT-based administration of surveys of patient perception substantially reduced patient burden without compromising the precision of measuring patients' perceptions of hospitalization. The Excel module of animation-CAT on the wireless COW that we developed is recommended for use in hospitals.
Article
The Interactive Nutrition Specific Physical Exam Competency Tool (INSPECT) is a tool designed specifically to observe and measure registered dietitian nutritionists’ (RDNs) nutrition-focused physical exam (NFPE) competence in authentic acute care settings. The initial INSPECT items were generated and tested for content and face validity using expert RDNs’ input. The INSPECT was further examined for inter-rater, intra-rater, and internal consistency using clinical supervisor observations of RDNs performing NFPE on patients in real-life acute care settings. These previous studies showed the INSPECT to have excellent content validity, acceptable face validity, good inter-rater reliability, moderate to strong intra-rater reliability, and excellent internal consistency. In the current study, the Rasch measurement model was applied to examine the item-level properties of the INSPECT. Results confirm that the INSPECT measured a single construct. All items fit the established criteria for clinical observations of >0.5 and <1.7, had positive point measure correlations, met the Wright Unidimensionality Index criteria of ≥0.9, exhibited one latent construct with >40% variance explained by the Rasch dimension as well as a sub-dimension based on item difficulty from the principal component analysis of the first contrast Rasch residuals. Rasch rating scale analysis revealed that the rating scale and majority of the items (39/41) fit the Rasch model. Rasch item hierarchy analysis matched the a priori hypothesized hierarchy for the top-most and bottom-most items. Ceiling effects were seen for three items (hand hygiene, personal protective equipment, and patient position) and one item (handgrip using hand dynamometer) reached the floor effect. Rasch reliability assessment demonstrated high person reliability (0.86), high item reliability (0.96), and person separation of 3.56 ability levels. 
The principal component analysis of residuals revealed two factors based on item difficulty, one for micronutrient exam and another for macronutrient exam, initial steps, and bedside manner. The resulting two factors may likely be due to a sub-dimension of the latent NFPE trait. Overall, the INSPECT items were found to have good item-level psychometrics. Continued testing of the INSPECT with RDNs at different ability levels will help to determine cut-off scores ranging from novice to expert. Establishing cut-off scores for the INSPECT will further enhance the utility of the tool.
Article
Aims: Outcome measures quantifying aspects of health in a precise, efficient, and user-friendly manner are in demand. Computer adaptive tests (CATs) may overcome the limitations of established fixed scales and be more adept at measuring outcomes in trauma. The primary objective of this review was to gain a comprehensive understanding of the psychometric properties of CATs compared with fixed-length scales in the assessment of outcome in patients who have suffered trauma of the upper limb. Study designs, outcome measures and methodological quality are defined, along with trends in investigation. Materials and methods: A search of multiple electronic databases was undertaken on 1 January 2017 with terms related to "CATs", "orthopaedics", "trauma", and "anatomical regions". Studies involving adults suffering trauma to the upper limb, and undergoing any intervention, were eligible. Those involving the measurement of outcome with any CATs were included. Identification, screening, and eligibility were undertaken, followed by the extraction of data and quality assessment using the Consensus-Based Standards for the Selection of Health Measurement Instruments (COSMIN) criteria. The review is reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) criteria and registered (PROSPERO: CRD42016053886). Results: A total of 31 studies reported trauma conditions alone, or in combination with non-traumatic conditions using CATs. Most were cross-sectional with varying level of evidence, number of patients, type of study, range of conditions and methodological quality. CATs correlated well with fixed scales and had minimal or no floor-ceiling effects. They required significantly fewer questions and/or less time for completion. Patient-Reported Outcomes Measurement Information System (PROMIS) CATs were the most frequently used, and the use of CATs is increasing.
Conclusion: Early studies show valid and reliable outcome measurement with CATs performing as well as, if not better than, established fixed scales. Superior properties such as floor-ceiling effects and ease of use support their use in the assessment of outcome after trauma. As CATs are being increasingly used in patient outcomes research, further psychometric evaluation, especially involving longitudinal studies and groups of patients with specific injuries are required to inform clinical practice using these contemporary measures. Cite this article: Bone Joint J 2018;100-B:693-702.
Article
Rating scales are employed as a means of extracting more information out of an item than would be obtained from a mere “yes/no”, “right/wrong” or other dichotomy. But does this additional information increase measurement accuracy and precision? Eight guidelines are suggested to aid the analyst in optimizing the manner in which rating scales categories cooperate in order to improve the utility of the resultant measures. Though these guidelines are presented within the context of Rasch analysis, they reflect aspects of rating scale functioning which impact all methods of analysis. The guidelines feature rating-scale-based data such as category frequency, ordering, rating-to-measure inferential coherence, and the quality of the scale from measurement and statistical perspectives. The manner in which the guidelines prompt recategorization or reconceptualization of the rating scale is indicated. Utilization of the guidelines is illustrated through their application to two published data sets. https://www.winsteps.com/a/Linacre-optimizing-category.pdf
Article
Osteoarthritis is one of the most common joint disorders in the elderly, yet few studies have targeted symptomatic osteoarthritis, especially symptomatic hand osteoarthritis. The authors conducted a survey in 1992–1993 among an elderly population to estimate the prevalence of symptomatic hand osteoarthritis and to assess its impact on grip strength and functional activities. Framingham Study subjects received hand radiographs and answered queries on joint symptoms. Functional activities were assessed using an interviewer-administered questionnaire. Grip strength and observed functional performance were evaluated using standard procedures. A hand joint was defined as having symptomatic osteoarthritis if both symptoms and radiographic evidence of osteoarthritis were present. Of 1,041 subjects aged 71–100 years (36% men), the prevalence of symptomatic hand osteoarthritis was higher in women (26.2%) than in men (13.4%). Compared with those without symptomatic hand osteoarthritis, subjects with the disease had 10% reduced maximal grip strength, reported more difficulty writing, handling, or fingering small objects (odds ratio = 3.4), and showed more self-reported and observed difficulty carrying a 10-pound (4.5-kg) bundle (odds ratio = 1.7 and 1.6, respectively). In conclusion, in the context of a remarkable paucity of data on the epidemiology of symptomatic hand osteoarthritis, this study suggests that symptomatic hand osteoarthritis is a common disease among elders and frequently impairs hand function. Keywords: activities of daily living; hand; hand strength; osteoarthritis; prevalence. Abbreviations: CI, confidence interval; OR, odds ratio.
Article
Despite the widespread use of exploratory factor analysis in psychological research, researchers often make questionable decisions when conducting these analyses. This article reviews the major design and analytical decisions that must be made when conducting a factor analysis and notes that each of these decisions has important consequences for the obtained results. Recommendations that have been made in the methodological literature are discussed. Analyses of 3 existing empirical data sets are used to illustrate how questionable decisions in conducting factor analyses can yield problematic results. The article presents a survey of 2 prominent journals that suggests that researchers routinely conduct analyses using such questionable methods. The implications of these practices for psychological research are discussed, and the reasons for current practices are reviewed. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Computer-based testing by credentialing agencies has become common; however, selecting a test design is difficult because several good ones are available: parallel forms, computer adaptive (CAT), and multistage (MST). In this study, three computer-based test designs under some common examination conditions were investigated. Item bank size and item quality had a practically significant impact on decision consistency and accuracy. Even in nearly ideal situations, the choice of test design was not a factor in the results. Two conclusions follow from the findings: (a) More time and resources should be committed to expanding the size and quality of item banks, and (b) designs that individualize an exam administration such as MST and CAT may not be helpful when the primary purpose of the examination is to make pass-fail decisions and conditions are present for using parallel forms with a target information function that can be centered on the passing score.
Article
Bariatric surgery patients are required to receive psychological clearance before they are eligible for surgery. In spite of this, there are no standard assessment practices or tests designed specifically for these evaluations. The objective of this study is to determine the reliability and construct validity of the PsyBari, a psychological test designed for bariatric surgery patients. The PsyBari was administered to 752 patients. Internal consistency reliability and exploratory factor analyses were conducted. Items with high percentages of missing data, low communalities, and low item loadings were identified and deleted. Cronbach's α = 0.930 (0.940 for males and 0.927 for females). Six factors were obtained for each gender: for females, awareness of eating habits, early life problems due to weight, dysphoric feelings about weight, weight-related impairment, surgical anxiety, and guilty feelings related to eating; for males, physical impairment with depression, awareness of eating habits, early life problems due to weight, interpersonal support with anxiety about weight, anger, and guilty feelings about eating habits. Results indicate that there are unique psychometric parameters when constructing tests for bariatric surgery patients. The PsyBari has good overall reliability, although two of the 11 subscales have poor reliability. Factor analyses revealed six factors for each gender. Some factors were common for both genders, some were unique for each gender, and some consisted of mixed constructs.
Article
Several methods have been devised to estimate shoulder function, none of which is entirely satisfactory. The method described in this article is applicable irrespective of the details of the diagnostic or radiologic abnormalities caused by disease or injury. The method records individual parameters and provides an overall clinical functional assessment. It is accurately reproducible by different observers and is sufficiently sensitive to reveal even small changes in function. The method is easy to perform and requires a minimal amount of time for evaluation of large population groups.
Article
To apply the Rasch measurement model to the development of a clinical tool for measuring manual (dis)ability (ABILHAND). Manual ability was evaluated in terms of the difficulty perceived by a hand-impaired patient on 57 representative unimanual or bimanual activities. A clinical laboratory. Eighteen rheumatoid arthritis patients (14 women, 4 men) were interviewed after wrist arthrodesis (10 right, 4 left, and 4 both wrists). Their ages ranged from 38 to 77 years, time since diagnosis ranged from 7 to 41 years, and time since surgery ranged from 0.5 to 17 years. ABILHAND, administered at a mean duration of 7 years after arthrodesis. Forty-six of the 57 items define a common, single manual ability continuum with widespread measurement range and regular item distribution. Items relating to feeding, grooming, and dressing upper body worked consistently with their counterparts in other disability scales. More difficult items extend the measurement range beyond that of most existing manual ability scales. Even in a small sample of patients, using the Rasch methodology enabled the investigators to produce a useful scale of manual (dis)ability and to define manual ability as a unique construct, at least in patients with rheumatoid arthritis.
Article
Responsiveness is an important property of an outcomes questionnaire. It can be defined as the ability of an instrument to capture important changes in a patient's health status over time. The authors previously designed the Michigan Hand Outcomes Questionnaire (MHQ), a hand-specific outcomes instrument that contains six distinct scales: (1) overall hand function, (2) activities of daily living, (3) pain, (4) work performance, (5) aesthetics, and (6) patient satisfaction with hand function. In the first study, the authors demonstrated that the MHQ is a reliable and valid instrument for the hand. The purpose of this second study is to assess the responsiveness, or sensitivity, of the MHQ to clinical change in patient status. A total of 187 consecutive patients with chronic hand disorders completed a baseline MHQ prior to receiving treatment at a university plastic surgery clinic. Approximately 6 to 18 months after completing the first questionnaire, patients were sent a follow-up MHQ by mail. The second questionnaire was identical to the first, with the exception of one additional question added to each of the six MHQ scales. This additional question asked patients to rate the change in their hands since completing the last questionnaire using a seven-point response scale. Spearman's correlation coefficient was used to correlate the responses from patients' self-assessment questions with the actual score change (after score minus before score). The response rate for the second administration was 49% (92 questionnaires returned), a fairly good rate of return for mail surveys. There were no significant differences in gender, race, education, and income between responders and nonresponders. When patients' self-assessment of change was correlated with the change in the six scale scores over time, all six correlations were statistically significant, with p < 0.05. The correlations ranged from 0.25 for the aesthetics scale to 0.43 for the pain scale.
The MHQ was responsive using patients' self-assessment of their clinical change. Future studies will evaluate the responsiveness of the MHQ compared with objective physiological measures such as grip strength, range of motion, and the Jebsen-Taylor test. Additionally, research is underway to assess the responsiveness of the MHQ for specific procedures, including metacarpophalangeal arthroplasties for rheumatoid arthritis and microvascular toe-to-hand reconstructions.
Article
The Disabilities of the Arm, Shoulder and Hand (DASH) outcome measure was developed to evaluate disability and symptoms in single or multiple disorders of the upper limb at one point or at many points in time. The purpose of this study was to evaluate the reliability, validity, and responsiveness of the DASH in a group of diverse patients and to compare the results with those obtained with joint-specific measures. Two hundred patients with either wrist/hand or shoulder problems were evaluated by use of questionnaires before treatment, and 172 (86%) were re-evaluated 12 weeks after treatment. Eighty-six patients also completed a test-retest questionnaire three to five days after the initial (baseline) evaluation. The questionnaire package included the DASH, the Brigham (carpal tunnel) questionnaire, the SPADI (Shoulder Pain and Disability Index), and other markers of pain and function. Correlations or t-tests between the DASH and the other measures were used to assess construct validity. Test-retest reliability was assessed using the intraclass correlation coefficient and other summary statistics. Responsiveness was described using standardized response means, receiver operating characteristics curves, and correlations between change in DASH score and change in scores of other measures. Standard response means were used to compare DASH responsiveness with that of the Brigham questionnaire and the SPADI in each region. The DASH was found to correlate with other measures (r > 0.69) and to discriminate well, for example, between patients who were working and those who were not (p<0.0001). Test-retest reliability (ICC = 0.96) exceeded guidelines. The responsiveness of the DASH (to self-rated or expected change) was comparable with or better than that of the joint-specific measures in the whole group and in each region. Evidence was provided of the validity, test-retest reliability, and responsiveness of the DASH. 
This study also demonstrated that the DASH had validity and responsiveness in both proximal and distal disorders, confirming its usefulness across the whole extremity.
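The standardized response mean mentioned in the abstract above is conventionally the mean change score divided by the standard deviation of the change scores. A minimal sketch follows, assuming paired before and after scores on a scale where lower scores mean less disability; the function name and the sample values are hypothetical and are not data from the study.

```python
from statistics import mean, stdev

def standardized_response_mean(before, after):
    """Mean change divided by the sample SD of the change scores.

    Change is computed as before - after, so a positive value means
    improvement on a scale where lower scores indicate less disability.
    """
    changes = [b - a for b, a in zip(before, after)]
    return mean(changes) / stdev(changes)

# Hypothetical scores for four patients (illustration only).
before = [50, 60, 70, 40]
after = [30, 45, 50, 35]
srm = standardized_response_mean(before, after)
print(round(srm, 2))
```

A larger value indicates that the average change is large relative to how much individual changes vary, which is why the statistic is used to compare the responsiveness of different instruments in the same sample.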
Article
To establish a questionnaire to quantify the extent of the function and activities of the hand in patients with degenerative or inflammatory disease of the hand and finger joints. One hundred and seventy-two patients with osteoarthritis (OA, n = 69) or rheumatoid arthritis (RA, n = 103) completed a new questionnaire, the SACRAH, that included 23 visual analogue scales covering the extent of hand function, stiffness and level of pain. SACRAH scores may range from 0 to 100. Comparing all studied patients, there was no significant difference in SACRAH scores between OA and RA patients (34 vs 32, not significant). Scores for both patient groups differed significantly from those for 30 healthy controls. Among patients taking NSAIDs only, individuals suffering from OA (n = 50) scored significantly lower than RA patients (n = 42) (36 vs 48, P < 0.004). Sixty-one RA patients taking DMARDs scored lower than the RA patient group treated with NSAIDs only (20 vs 48, P < 0.0001). Thirty-two RA patients were evaluated longitudinally at their first visit and 3 months after the initiation of DMARDs. Following therapy, SACRAH scores were significantly reduced from 50 to 11 (P < 0.0001). The questionnaire enables the quantification of compromised hand function, stiffness and pain in OA and RA patients, and is sensitive to therapy-related changes in RA patients.
Article
To identify all available shoulder disability questionnaires designed to measure physical functioning and to evaluate evidence for the clinimetric quality of these instruments. Systematic literature searches were performed to identify self administered shoulder disability questionnaires. A checklist was developed to evaluate and compare the clinimetric quality of the instruments. Two reviewers identified and evaluated 16 questionnaires by our checklist. Most studies were found for the Disability of the Arm, Shoulder, and Hand scale (DASH), the Shoulder Pain and Disability Index (SPADI), and the American Shoulder and Elbow Surgeons Standardised Shoulder Assessment Form (ASES). None of the questionnaires demonstrated satisfactory results for all properties. Most questionnaires claim to measure several domains (for example, pain, physical, emotional, and social functioning), yet dimensionality was studied in only three instruments. The internal consistency was calculated for seven questionnaires and only one received an adequate rating. Twelve questionnaires received positive ratings for construct validity, although depending on the population studied, four of these questionnaires received poor ratings too. Seven questionnaires were shown to have adequate test-retest reliability (ICC >0.70), but five questionnaires were tested inadequately. In most clinimetric studies only small sample sizes (n<43) were used. Nearly all publications lacked information on the interpretation of scores. The DASH, SPADI, and ASES have been studied most extensively, and yet even published validation studies of these instruments have limitations in study design, sample sizes, or evidence for dimensionality. Overall, the DASH received the best ratings for its clinimetric properties.
Conference Paper
Item response theory (IRT) has a number of potential advantages over classical test theory in assessing self-reported health outcomes. IRT models yield invariant item and latent trait estimates (within a linear transformation), standard errors conditional on trait level, and trait estimates anchored to item content. IRT also facilitates evaluation of differential item functioning, inclusion of items with different response formats in the same scale, and assessment of person fit and is ideally suited for implementing computer adaptive testing. Finally, IRT methods can be helpful in developing better health outcome measures and in assessing change over time. These issues are reviewed, along with a discussion of some of the methodological and practical challenges in applying IRT methods.
Article
This article is at www.rasch.org/rmt/rmt83b.htm
Article
Acknowledgements. 1. Introduction. Part One: Interactional Analysis. 2. Using Behavioral Coding to Identify Cognitive Problems with Survey Questions (Floyd Jackson Fowler Jr. and Charles F. Cannell). 3. Questionnaire Pretesting: Computer-Assisted Coding of Concurrent Protocols (Ruth N. Bolton and Tina M. Bronkhorst). 4. From Paradigm to Prototype and Back Again: Interactive Aspects of Cognitive Processing in Standardized Survey Interviews (Nora Cate Schaeffer and Douglas W. Maynard). Part Two: Verbal Protocols. 5. The Validity and Consequences of Verbal Reports About Attitudes (Timothy D. Wilson, Suzanne J. LaFleur, and D. Eric Anderson). 6. Expanding and Enhancing the Use of Verbal Protocols in Survey Research (Barbara Bickart and E. Marla Felcher). 7. Integrating Questionnaire Design with a Cognitive Computational Model of Human Question Answering (Arthur C. Graesser, Sailaja Bommareddy, Shane Swamer, and Jonathon M. Golding). Part Three: Other Methods for Determining Cognitive Processes. 8. Cognitive Interviewing Techniques: In the Lab and in the Field (Theresa J. DeMaio and Jennifer M. Rothgeb). 9. Cognitive Techniques in Interviewing Older People (Jared B. Jobe, Donald M. Keller, and Albert F. Smith). 10. An Individual Differences Perspective in Assessing Cognitive Processes (Richard E. Petty and W. Blair G. Jarvis). 11. A Coding System for Appraising Questionnaires (Judith T. Lessler and Barbara H. Forsyth). 12. Exemplar Generation: Assessing How Respondents Give Meaning to Rating Scales (Thomas M. Ostrom and Katherine M. Gannon). 13. The How and Why of Response Latency Measurement in Telephone Surveys (John N. Bassili). 14. Implicit Memory and Survey Measurement (Mahzarin R. Banaji, Irene V. Blair, and Norbert Schwarz). 15. Use of Sorting Tasks to Assess Cognitive Structures (Marilynn B. Brewer and Layton N. Lui). Part Four: Conclusion. 16. How Do We Know What We Think They Think Is Really What They Think? (Robert M. Groves).
Book
Ronald K. Hambleton; H. Swaminathan; H. Jane Rogers.
Book
A revision will be coming out in the next few months.
Article
Previous studies of expert physical therapists have sampled therapists based on years of clinical experience or reputation, not on their patients' clinical outcomes. The purposes of this study were to identify expert physical therapists by using patient self-reported outcomes and to describe the characteristics of clinicians whose patients with lumbar spine syndromes reported higher health-related quality of life (HRQL) following rehabilitation. Retrospective data were analyzed on 24,276 patients (mean age = 47.8 years, SD = 16, range = 14–97) with lumbar spine syndromes treated by 930 physical therapists participating in the Focus On Therapeutic Outcomes database in 1999–2000. Physical therapists and staff answered questions concerning years of experience and practice setting when starting their participation in the outcomes system. Patient self-report HRQL data were collected at intake and discharge from outpatient rehabilitation. Discharge HRQL data were risk adjusted using patient characteristics. Data were aggregated by physical therapist. Risk-adjusted discharge HRQL scores were used to classify physical therapists whose patients reported mean HRQL improvement above the 90th percentile as experts and physical therapists whose patients reported mean HRQL improvement between the 45th and 55th percentiles as average. Therapists classified as expert had fewer patients in the database than did therapists classified as average (mean ± SD: 19 ± 17 versus 29 ± 22). Mean treatment duration was different between groups (32 ± 11 days for the expert group versus 31 ± 8 days for the average group). The results challenge assumptions that extensive clinical experience is necessary to achieve superior patient outcomes, and they provide information about the relationship between therapist characteristics and patient outcomes.
Article
The American Shoulder and Elbow Surgeons have adopted a standardized form for assessment of the shoulder. The form has a patient self-evaluation section and a physician assessment section. The patient self-evaluation section of the form contains visual analog scales for pain and instability and an activities of daily living questionnaire. The activities of daily living questionnaire is marked on a four-point ordinal scale that can be converted to a cumulative activities of daily living index. The patient can complete the self-evaluation portion of the questionnaire in the absence of a physician. The physician assessment section includes an area to collect demographic information and assesses range of motion, specific physical signs, strength, and stability. A shoulder score can be derived from the visual analogue scale score for pain (50%) and the cumulative activities of daily living score (50%). It is hoped that adoption of this instrument to measure shoulder function will facilitate communication between investigators, stimulate multicenter studies, and encourage validity testing of this and other available instruments to measure shoulder function and outcome.
Article
The increasing use of computerized adaptive tests (CATs) to generate outcome measures during rehabilitation has prompted questions concerning score interpretation. The purpose of this study was to describe meaningful interpretations of functional status (FS) outcome measures estimated with a body part-specific CAT developed from the Lower-Extremity Functional Scale (LEFS). This investigation was a prospective cohort study of 8,714 people who had hip impairments and were receiving physical therapy in 257 outpatient clinics in 31 states (United States) between January 2005 and June 2007. Four approaches were used to clinically interpret outcome data. First, the standard error of the estimate was used to construct the 90% confidence interval for each CAT-generated score estimate. Second, percentile ranks were applied to FS scores. Third, 2 threshold approaches were used to define individual subject-level change: statistically reliable change and clinically important change. The fourth approach was a functional staging method. The precision of a single score was estimated from the FS score ± 4. On the basis of the score distribution, 25th, 50th, and 75th percentile ranks corresponded to intake FS scores of 40, 48, and 59 and discharge FS scores of 50, 61, and 75, respectively. The reliable change index supported the conclusion that changes in FS scores of 7 or more units represented statistically reliable change, and receiver operating characteristic analyses supported the conclusion that changes in FS scores of 6 or more units represented minimal clinically important improvement. Participants were classified into 5 hierarchical levels of FS using a functional staging method. Because this study was a secondary analysis of prospectively collected data via a proprietary database management company, generalizability of results may be limited to participating clinics.
The results demonstrated how outcome measures generated from the hip LEFS CAT can be interpreted to improve clinical meaning. This finding might facilitate the use of patient-reported outcomes by clinicians during rehabilitation services.
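The first interpretation approach in the abstract above, a confidence interval built from the standard error of a score estimate, can be sketched as score ± z × SE under a normal approximation. The function name and the standard error used below are illustrative assumptions; the abstract reports only that the resulting precision was about ± 4 FS units.

```python
from statistics import NormalDist

def confidence_interval(score, standard_error, level=0.90):
    """Two-sided normal-approximation interval around a score estimate."""
    # z is the standard normal quantile leaving (1 - level) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * standard_error
    return score - half_width, score + half_width

# 48 is the median intake FS score quoted in the abstract; the standard
# error of 2.4 is a hypothetical value chosen for illustration.
low, high = confidence_interval(48, 2.4)
print(round(low, 1), round(high, 1))
```

In a CAT the standard error differs from one score estimate to the next, so the interval width would be computed per estimate rather than fixed for the whole scale.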
Article
Many clinics and payers are beginning programs to collect and interpret outcomes related to quality of care and provider performance (ie, benchmarking). Outcomes assessment is commonly done using observational research designs, which makes it important for those involved in these endeavors to appreciate the underlying challenges and limitations of these designs. This perspective article discusses the advantages and limitations of using observational research to evaluate quality of care and provider performance in order to inform clinicians, researchers, administrators, and policy makers who want to use data to guide practice and policy or critically appraise observational studies and benchmarking efforts. Threats to internal validity, including potential confounding, patient selection bias, and missing data, are discussed along with statistical methods commonly used to address these limitations. An example is given from a recent study comparing physical therapy clinic performance in terms of patient outcomes and service utilization with and without the use of these methods. The authors demonstrate that crude differences in clinic outcomes and service utilization tend to be inflated compared with the differences that are statistically adjusted for selected threats to internal validity. The authors conclude that quality of care measurement and ranking procedures that do not use similar methods may produce misleading findings.
Article
The construction of an instrument including a number of tests requires an analysis of its structure and its unidimensionality (which allows calculation of a global score), and the determination of the difficulty level of the various tests. This study examined a tool including 67 tests designed to evaluate the functional ability of patients with an injured upper limb. The patients seen in a rehabilitation centre during 12 months (173 subjects) were evaluated by occupational therapists familiar with the tool. The statistical analyses were made using the principal component analysis method (PCAM), Cronbach's coefficient and the Rasch model. The PCAM showed 3 principal factors, which explained 44%, 10% and 4% of the total variance respectively in patients with an injured dominant limb. The predominance of the first axis and the high ratio of the first to the second eigenvalue suggested the unidimensionality of the tool. A Cronbach's coefficient of 0.97 attested to the good congruence of the items. The results obtained with the Rasch model seemed to be consistent with the hypothesis of the unidimensionality of the tool. This analysis also provided the difficulty scale of the various tests. Similar results were obtained in patients with an injured non-dominant limb and in the whole sample. The methods used provide complementary results.
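Cronbach's coefficient, cited in the abstract above as evidence of item congruence, has the standard form k/(k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items. The sketch below assumes complete data with one row per patient and one column per item; the function name and the toy ratings are hypothetical.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of rows, one row of item scores per person."""
    k = len(rows[0])
    # Sample variance of each item (column) across persons.
    item_variances = [variance(column) for column in zip(*rows)]
    # Sample variance of each person's total score.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical ratings: 4 patients by 3 items (illustration only).
ratings = [
    [1, 2, 2],
    [2, 3, 3],
    [3, 3, 4],
    [4, 5, 5],
]
print(round(cronbach_alpha(ratings), 3))
```

A high coefficient shows that items covary strongly, but it does not by itself establish unidimensionality, which is why the study pairs it with principal component analysis and the Rasch model.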
Article
The purpose of this research is twofold. First is to extend the work of Smith (1992, 1996) and Smith and Miao (1991, 1994) in comparing item fit statistics and principal component analysis as tools for assessing the unidimensionality requirement of Rasch models. Second is to demonstrate methods to explore how violations of the unidimensionality requirement influence person measurement. For the first study, rating scale data were simulated to represent varying degrees of multidimensionality and the proportion of items contributing to each component. The second study used responses to a 24 item Attention Deficit Hyperactivity Disorder scale obtained from 317 college undergraduates. The simulation study reveals both an iterative item fit approach and principal component analysis of standardized residuals are effective in detecting items simulated to contribute to multidimensionality. The methods presented in Study 2 demonstrate the potential impact of multidimensionality on norm and criterion-reference person measure interpretations. The results provide researchers with quantitative information to help assist with the qualitative judgment as to whether the impact of multidimensionality is severe enough to warrant removing items from the analysis.
Article
Short-form outcomes measures are becoming common in response to demands for increased efficiency in health care. This study examines Rasch measurement as an aid to selecting items for short form tests. The focus of this paper is on maintaining test quality while reducing items. The separation ratio (SR) aids item reduction by indicating how removing items impacts measurement precision. Results of the SR and coefficient alpha are compared. To demonstrate the use of Rasch measurement to shorten clinical outcomes measures and to compare the separation ratio and coefficient alpha in evaluating when item reduction improved efficiency without sacrificing measurement precision. Retrospective analysis of existing health outcomes data. A convenience sample of 58 patients receiving cataract surgery. The 14 items of the VF-14 (a measure of visual functioning), the published subset of items from this test (the VF-7), and 5 other 7-item combinations of the items. The largest coefficient alpha was obtained from the VF-14 (0.84), while the largest separation ratio (2.67) was obtained from the 7-item subtest with the reduced rating scale. This study demonstrated one way that Rasch measurement can be helpful in selecting items for minimum item sets while maintaining test precision. Both alpha and the separation ratio provide information about how a sample performed with a given test, although variations in measurement precision may not always be detected with alpha.
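In Rasch measurement the separation ratio G and the separation reliability R are conventionally linked by R = G² / (1 + G²), equivalently G = √(R / (1 − R)). The sketch below applies that conversion to the separation ratio quoted in the abstract above. The function names are illustrative, and the resulting reliability is a Rasch separation reliability, so it is not directly interchangeable with the coefficient alpha values reported in the study.

```python
import math

def reliability_from_separation(separation):
    """Separation reliability R = G^2 / (1 + G^2)."""
    return separation ** 2 / (1 + separation ** 2)

def separation_from_reliability(reliability):
    """Separation ratio G = sqrt(R / (1 - R)), defined for 0 <= R < 1."""
    return math.sqrt(reliability / (1 - reliability))

# 2.67 is the largest separation ratio reported in the abstract.
r = reliability_from_separation(2.67)
print(round(r, 3))
```

Because G is unbounded while R is capped at 1, gains in precision at the high end are easier to see in the separation ratio, which fits the abstract's remark that alpha may not detect some variations in measurement precision.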
Article
The purpose of this study was to develop an easy-to-use and psychometrically sound outcome instrument that is task-oriented and patient-centred. One hundred fifteen patients with a variety of hand impairments completed a rating scale of perceived manual ability (i.e., the Manual Ability Measure, MAM). The first 70 patients also completed two other questionnaires about physical health and psychological well-being. Rasch analyses were conducted to transform the ordinal ratings into linear measures; Rasch statistics were used to evaluate the scale's measurement properties at both scale and item levels. Eighty-three original items were reduced to 16 common tasks; Rasch reliabilities were good; the easy-to-difficult item hierarchy makes sense clinically. Moderate correlations were found between manual ability, physical function and general sense of well-being. The results of this preliminary study suggest that the MAM is a promising outcome measure that has adequate psychometric properties and can be used to complement other objective clinical measurements.
Article
To test unidimensionality and local independence of a set of shoulder functional status (SFS) items, develop a computerized adaptive test (CAT) of the items using a rating scale item response theory model (RSM), and compare discriminant validity of measures generated using all items (theta(IRT)) and measures generated using the simulated CAT (theta(CAT)). We performed a secondary analysis of data collected prospectively during rehabilitation of 400 patients with shoulder impairments who completed 60 SFS items. Factor analytic techniques supported that the 42 SFS items formed a unidimensional scale and were locally independent. Except for five items, which were deleted, the RSM fit the data well. The remaining 37 SFS items were used to generate the CAT. On average, 6 items were needed to estimate precise measures of function using the SFS CAT, compared with all 37 SFS items. The theta(IRT) and theta(CAT) measures were highly correlated (r = .96) and resulted in similar classifications of patients. The simulated SFS CAT was efficient and produced precise, clinically relevant measures of functional status with good discriminating ability.
Article
The construction and evaluation of item banks to measure unidimensional constructs of health-related quality of life (HRQOL) is a fundamental objective of the Patient-Reported Outcomes Measurement Information System (PROMIS) project. Item banks will be used as the foundation for developing short-form instruments and enabling computerized adaptive testing. The PROMIS Steering Committee selected 5 HRQOL domains for initial focus: physical functioning, fatigue, pain, emotional distress, and social role participation. This report provides an overview of the methods used in the PROMIS item analyses and proposed calibration of item banks. Analyses include evaluation of data quality (eg, logic and range checking, spread of response distribution within an item), descriptive statistics (eg, frequencies, means), item response theory model assumptions (unidimensionality, local independence, monotonicity), model fit, differential item functioning, and item calibration for banking. Summarized are key analytic issues; recommendations are provided for future evaluations of item banks in HRQOL assessment.
Article
The Patient-Reported Outcomes Measurement Information System (PROMIS) was initiated to improve precision, reduce respondent burden, and enhance the comparability of health outcomes measures. We used item response theory (IRT) to construct and evaluate a preliminary item bank for physical function assuming four subdomains. Data from seven samples (N=17,726) using 136 items from nine questionnaires were evaluated. A generalized partial credit model was used to estimate item parameters, which were normed to a mean of 50 (SD=10) in the US population. Item bank properties were evaluated through Computerized Adaptive Test (CAT) simulations. IRT requirements were fulfilled by 70 items covering activities of daily living, lower extremity, and central body functions. The original item context partly affected parameter stability. Items on upper body function, and need for aid or devices did not fit the IRT model. In simulations, a 10-item CAT eliminated floor and decreased ceiling effects, achieving a small standard error (< 2.2) across scores from 20 to 50 (reliability >0.95 for a representative US sample). This precision was not achieved over a similar range by any comparable fixed length item sets. The methods of the PROMIS project are likely to substantially improve measures of physical function and to increase the efficiency of their administration using CAT.
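The reported standard error and reliability figures can be checked against each other with the usual relation reliability = 1 - SE^2 / SD^2. The sketch below is a consistency check only: the abstract does not say which formula the authors used, and the function names are hypothetical. It assumes the T-score metric described above (SD = 10).

```python
import math


def reliability_from_se(se: float, sd: float = 10.0) -> float:
    """Reliability implied by a standard error on a scale with the given SD."""
    return 1.0 - (se / sd) ** 2


def se_for_reliability(reliability: float, sd: float = 10.0) -> float:
    """Largest standard error that still reaches the target reliability."""
    return sd * math.sqrt(1.0 - reliability)


# SE of 2.2 on a T-score metric (SD = 10) implies reliability of about 0.9516,
# which agrees with the "> 0.95" stated in the abstract.
print(round(reliability_from_se(2.2), 4))

# Conversely, reliability 0.95 corresponds to an SE of about 2.236.
print(round(se_for_reliability(0.95), 3))
```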
Article
The purpose of this paper is to show how the Rasch model can be used to develop a computer adaptive self-report of walking, climbing, and running. Our instrument development work on the walking/climbing/running construct of the ICF Activity Measure was used to show how to develop a computer adaptive test (CAT). Fit of the items to the Rasch model and validation of the item difficulty hierarchy were accomplished using Winsteps software. Standard error was used as a stopping rule for the CAT. Finally, person abilities were connected to item difficulties using Rasch analysis 'maps'. All but the 'walking one mile' item fit the Rasch measurement model. A CAT was developed which selectively presented items based on the last calibrated person ability measure and was designed to stop when standard error decreased to a pre-set criterion. Finally, person ability measures were connected to the ability to perform specific walking/climbing/running activities using Rasch maps. Rasch measurement models can be useful in developing CAT measures for rehabilitation and disability. In addition to CATs reducing respondent burden, the connection of person measures to item difficulties may be important for the clinical interpretation of measures.
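The CAT logic described in the abstract above (present the item best matched to the latest ability estimate, then stop once the standard error reaches a pre-set criterion) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a dichotomous Rasch model rather than a rating scale, a simple Newton-Raphson ability estimate, and hypothetical names (`run_cat`, `estimate_ability`, `answer`) and item difficulties.

```python
import math


def rasch_p(theta: float, b: float) -> float:
    """Probability of endorsing an item of difficulty b (dichotomous Rasch)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_ability(responses, difficulties, theta=0.0, iterations=20):
    """Newton-Raphson maximum-likelihood estimate; returns (theta, se).

    Steps are limited to 1 logit and theta is kept within [-6, 6] so that
    all-0 or all-1 response patterns do not diverge.
    """
    for _ in range(iterations):
        probs = [rasch_p(theta, b) for b in difficulties]
        info = sum(p * (1.0 - p) for p in probs)
        residual = sum(x - p for x, p in zip(responses, probs))
        step = max(-1.0, min(1.0, residual / info))
        theta = max(-6.0, min(6.0, theta + step))
    info = sum(p * (1.0 - p) for p in (rasch_p(theta, b) for b in difficulties))
    return theta, 1.0 / math.sqrt(info)


def run_cat(bank, answer, se_target=0.5):
    """Administer items adaptively until SE <= se_target or the bank is empty.

    bank: dict mapping item name -> difficulty in logits (hypothetical values).
    answer: callable taking an item name and returning 0 or 1.
    """
    remaining = dict(bank)
    theta, se = 0.0, float("inf")
    asked, responses, used = [], [], []
    while remaining and se > se_target:
        # For the Rasch model, the most informative item is the one whose
        # difficulty is closest to the current ability estimate.
        name = min(remaining, key=lambda n: abs(remaining[n] - theta))
        used.append(remaining.pop(name))
        asked.append(name)
        responses.append(answer(name))
        theta, se = estimate_ability(responses, used, theta)
    return theta, se, asked
```

A call might look like `run_cat(bank, answer, se_target=0.5)`, where `answer` collects the respondent's reply to each presented item. With dichotomous items each response contributes at most 0.25 units of information, so a target SE of 0.5 logits needs at least 16 items; polytomous rating scale items, as used in the instruments described here, typically carry more information per item.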
Digest of data on persons with disabilities
  • T Barr
The Michigan Hand Outcomes Questionnaire (MHQ): assessment of responsiveness to clinical change
  • Kc Chung
  • Jb Hamil
  • Mr Walters
  • Ra Hayward
A practical tool for evaluating function: the simple shoulder test. In: The shoulder: a balance of mobility and stability
  • S B Lippitt
  • D T Harryman
  • F A Matsen
Rating scale analysis. Chicago: Measurement, Evaluation, Statistics and Assessment Press
  • B D Wright
  • G N Masters
WINSTEPS Rasch measurement
  • J M Linacre
Best test design. Chicago: MESA Press
  • B D Wright
  • M H Stone
  • Wainer H
Mplus. Los Angeles, CA: Muthen & Muthen
  • L Muthen
  • B Muthen