Article

Da Costa and colleagues' criticism of PEDro scores is not supported by the data

... 18,19 However, unsurprisingly, there is debate about the pitfalls of using summary scores to assess study quality and risk of bias. 17,20 Although modern validation studies increasingly use item response theory (IRT) to examine the discrimination of different items included in a scale and their coverage of the latent (underlying) construct, 21,22 no such studies have been performed for the PEDro scale. We therefore examined the construct validity of the PEDro scale using IRT models in a large sample of physiotherapy trials. ...
Article
Background: There is agreement that the methodological quality of randomized trials should be assessed in systematic reviews, but debate on how this should be done. We conducted a construct validation study of the Physiotherapy Evidence Database (PEDro) scale, which is widely used to assess the quality of trials in physical therapy and rehabilitation. Methods: We analyzed 345 trials that were included in Cochrane reviews, and for which a PEDro summary score was available. We used one- and two-parameter logistic item response theory (IRT) models to study the psychometric properties of the PEDro scale, and assessed the items' difficulty and discrimination parameters. We ran goodness-of-fit post-estimation tests and examined the IRT unidimensionality assumption with a multidimensional IRT model (MIRT). Results: Out of a maximum of 10, the mean PEDro summary score was 5.46 (SD=1.51). The allocation concealment and intention-to-treat scale items contributed most of the information on the underlying construct (with discriminations of 1.79 and 2.05, respectively) at similar difficulties (0.63 and 0.65, respectively). The other items provided little additional information, and did not distinguish trials of different quality. There was substantial evidence of departure from the unidimensionality assumption, suggesting that the PEDro items relate to more than one latent trait. Conclusions: Our findings question the construct validity of the PEDro scale to assess the methodological quality of clinical trials. PEDro summary scores should not be used; rather the physiotherapy community should consider working with the individual items of the scale.
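To make the discrimination and difficulty parameters concrete, the sketch below evaluates the standard two-parameter logistic (2PL) item response function and its item information at the values reported above. It assumes the plain logistic parameterisation without the 1.7 scaling constant; the abstract does not say which parameterisation was used, and the function names are illustrative. In a one-parameter (Rasch-type) model the discrimination a would be held equal across items.

```python
import math

def p_satisfied(theta: float, a: float, b: float) -> float:
    """2PL item response function: probability that a trial at latent
    quality theta satisfies an item with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_satisfied(theta, a, b)
    return a * a * p * (1.0 - p)

# Discrimination (a) and difficulty (b) values reported in the abstract.
items = {
    "allocation concealment": (1.79, 0.63),
    "intention-to-treat": (2.05, 0.65),
}

for name, (a, b) in items.items():
    # Information peaks at theta == b, where P = 0.5, giving a^2 / 4.
    peak = item_information(b, a, b)
    print(f"{name}: peak information {peak:.2f} at theta = {b}")
```

Because information grows with a², an item with a small discrimination contributes little to measuring the latent construct even when its difficulty is well placed, which is the sense in which the other PEDro items "provided little additional information".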
... Two independent reviewers assessed study quality using the 11-item PEDro scale [27][28][29]. A PEDro score of 6 or greater was considered to indicate a study of adequate quality [28,30-32]. ...
Article
Objective: This systematic review aimed to evaluate the effects of orthopaedic manual therapy (OMT) on pain, function, and physical performance in patients with knee osteoarthritis (OA). Data sources: Four databases (PubMed, Web of Science, CENTRAL, and CINAHL) were searched. Study selection: Trials were required to compare OMT alone, or OMT in combination with exercise therapy, with exercise therapy alone or a control. Data extraction: Data extraction and risk assessment were done by two independent reviewers. Outcome measures were the visual analogue scale (VAS), Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) pain score, WOMAC function score, WOMAC global score, and stair ascending-descending time. Results: Eleven randomized controlled trials were included (494 subjects), four of which had a PEDro score of 6 or higher, indicating adequate quality. The meta-analysis indicated that the reduction in VAS score with OMT compared with control was not statistically significant (standardized mean difference [SMD]: -0.59; 95% CI: -1.54 to 0.36; P=0.224). The reduction in VAS score with OMT compared with exercise therapy was statistically significant (SMD: -0.78; 95% CI: -1.42 to -0.17; P=0.013). The reduction in WOMAC pain score with OMT compared with exercise therapy was statistically significant (SMD: -0.79; 95% CI: -1.14 to -0.43; P=0.001). Similarly, the reduction in WOMAC function score with OMT compared with exercise therapy was statistically significant (SMD: -0.85; 95% CI: -1.20 to -0.50; P=0.001). However, the reduction in WOMAC global score with OMT compared with exercise therapy was not statistically significant (SMD: -0.23; 95% CI: -0.54 to 0.09; P=0.164). The reduction in stair ascending-descending time with OMT compared with exercise therapy was statistically significant (SMD: -0.88; 95% CI: -1.48 to -0.29; P=0.004).
Conclusions: This review indicated that, compared with exercise therapy alone, OMT provides short-term benefits in reducing pain and improving function and physical performance in patients with knee OA. Review registration: PROSPERO 2016:CRD42016032799.
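Reported SMDs and confidence intervals can be cross-checked against their P values. The sketch below assumes a normal approximation with a symmetric 95% CI (z = 1.96), recovers the standard error from the interval width, and returns the implied two-sided P value; the function names are illustrative. For the first VAS comparison it uses an upper limit of +0.36, the value that centres the interval on -0.59; an interval of -1.54 to -0.36 would exclude zero and could not be reconciled with P=0.224.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ci_midpoint(lower: float, upper: float) -> float:
    """Midpoint of a symmetric CI; it should match the point estimate."""
    return (lower + upper) / 2.0

def p_from_ci(estimate: float, lower: float, upper: float, z_crit: float = 1.96) -> float:
    """Two-sided P value implied by an estimate and its 95% CI (normal approximation)."""
    se = (upper - lower) / (2.0 * z_crit)   # CI half-width divided by the critical z
    z = abs(estimate) / se
    return 2.0 * (1.0 - normal_cdf(z))

# VAS score, OMT vs control (reported P = 0.224).
print(round(ci_midpoint(-1.54, 0.36), 2))   # -0.59
print(round(p_from_ci(-0.59, -1.54, 0.36), 3))
```

The same two checks (midpoint equals the estimate, implied P agrees with the reported P) can be applied to each comparison in the abstract.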
... [22][23][24] Ten of the 11 items on the scale are scored for a summed score that can range from 0 to 10 with higher scores reflecting higher methodological quality. 22,25 A cutoff score of five or greater was used to indicate high methodological quality. 26 Two independent raters (R.A.C. and K.R.A.) scored each study. ...
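The scoring rule described above can be sketched as follows. Items are stored in order 1 to 11, and item 1 (eligibility criteria specified) is skipped when summing, since on the PEDro scale it relates to external validity and does not count toward the 0-10 total. The function names and the example trial are hypothetical.

```python
from typing import Sequence

N_ITEMS = 11  # PEDro items 1-11; item 1 is excluded from the total score

def pedro_total(items: Sequence[bool]) -> int:
    """Sum items 2-11 of a PEDro rating (index 0 holds item 1)."""
    if len(items) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item ratings, got {len(items)}")
    return sum(1 for satisfied in items[1:] if satisfied)

def meets_cutoff(items: Sequence[bool], cutoff: int = 5) -> bool:
    """Dichotomise the 0-10 total at a chosen cutoff (e.g. >= 5 or >= 6)."""
    return pedro_total(items) >= cutoff

# Hypothetical trial: item 1 and items 2-6 satisfied, items 7-11 not satisfied.
trial = [True, True, True, True, True, True, False, False, False, False, False]
print(pedro_total(trial))        # 5
print(meets_cutoff(trial, 5))    # True
print(meets_cutoff(trial, 6))    # False
```

The same trial counts as high quality under a cutoff of 5 and not under a cutoff of 6, which is one reason the choice of cut point matters when summary scores are dichotomised.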
Article
Objective: To examine the role of psychosocial interventions in improving patient-reported clinical outcomes, including return to sport/activity, and intermediary psychosocial factors after anterior cruciate ligament reconstruction. Methods: MEDLINE/PubMed, CINAHL, PsycINFO, and Web of Science were searched from each database's inception to March 2017 for published studies in patients after anterior cruciate ligament reconstruction. Studies were included if they reported on the effects of a postoperative psychosocial intervention on a patient-reported clinical measure of disability, function, pain, quality of life, return to sport/activity, or intermediary psychosocial factor. Data were extracted using a standardized form and summary effects from each article were compiled. The methodological quality of randomized trials was assessed using the Physiotherapy Evidence Database Scale and scores greater than 5/10 were considered high quality. Results: A total of 893 articles were identified from the literature search. Of these, four randomized trials (N = 210) met inclusion criteria. The four articles examined guided imagery and relaxation, coping modeling, and visual imagery as postoperative psychosocial interventions. Methodological quality scores of the studies ranged from 5 to 9. There were inconsistent findings for the additive benefit of psychosocial interventions for improving postoperative function, pain, or self-efficacy and limited evidence for improving postoperative quality of life, anxiety, or fear of reinjury. No study examined the effects of psychosocial interventions on return to sport/activity. Conclusion: Overall, there is limited evidence on the efficacy of postoperative psychosocial interventions for improving functional recovery after anterior cruciate ligament reconstruction.
... In both cases, arbitrary cut points were applied to the total PEDro score and these were compared to an abbreviated Cochrane risk of bias tool (three items only: sequence generation, allocation concealment, blinding of outcome assessment). Importantly, we have shown that when the full version of each instrument is used (and analyses are corrected for the imperfect reliability of the instruments), the PEDro scale and Cochrane risk of bias tool are highly correlated [31]. This concordance is to be expected as the two scales contain quite similar items. ...
Article
Introduction: The Physiotherapy Evidence Database (PEDro) scale has been widely used to investigate methodological quality in physiotherapy randomised controlled trials; however, its validity has not been tested for pharmaceutical trials. The aim of this study was to investigate the validity and inter-rater reliability of the PEDro scale for pharmaceutical trials. The reliability was also examined for the Cochrane Back and Neck (CBN) Group risk of bias tool. Methods: This is a secondary analysis of data from a previous study. We considered randomised placebo-controlled trials evaluating any pain medication for chronic spinal pain or osteoarthritis. Convergent validity was evaluated by correlating the PEDro score with the summary score of the CBN risk of bias tool. Construct validity was tested using linear regression analysis to determine the degree to which the total PEDro score is associated with treatment effect sizes, journal impact factor and the summary score for the CBN risk of bias tool. Inter-rater reliability was estimated using the Prevalence and Bias Adjusted Kappa (PABAK) coefficient and 95% CI for the PEDro scale and CBN risk of bias tool. Results: Fifty-three trials were included, with 91 treatment effect sizes included in the analyses. The correlation between the PEDro scale and CBN risk of bias tool was 0.83 (95% CI 0.76 to 0.88) after adjusting for reliability, indicating strong convergence. The PEDro score was inversely associated with effect sizes, significantly associated with the summary score for the CBN risk of bias tool, and not associated with the journal impact factor. The inter-rater reliability of the PEDro scale and CBN risk of bias tool was at least substantial (>0.60) for most items. The ICC for the PEDro score was 0.80 (95% CI 0.68 to 0.88), and for the CBN risk of bias tool was 0.81 (95% CI 0.69 to 0.88).
Conclusion: There was evidence for the convergent and construct validity for the PEDro scale when used to evaluate methodological quality of pharmacological trials. Both risk of bias tools have acceptably high inter-rater reliability.
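For a binary item, PABAK depends only on observed agreement (PABAK = 2p_o - 1), whereas Cohen's kappa also corrects for chance agreement estimated from the raters' marginal prevalences. The sketch below computes both from a 2x2 table; the function name is illustrative and the counts in the example are hypothetical, chosen so that almost every trial satisfies the item.

```python
def agreement_stats(a: int, b: int, c: int, d: int) -> tuple[float, float, float]:
    """Observed agreement, Cohen's kappa and PABAK for a 2x2 rating table.

    a: both raters say 'yes'; b: rater 1 yes, rater 2 no;
    c: rater 1 no, rater 2 yes; d: both say 'no'.
    """
    n = a + b + c + d
    p_obs = (a + d) / n
    # Chance agreement from each rater's marginal 'yes' and 'no' proportions.
    p_exp = ((a + b) / n) * ((a + c) / n) + ((c + d) / n) * ((b + d) / n)
    kappa = (p_obs - p_exp) / (1.0 - p_exp) if p_exp < 1.0 else float("nan")
    # PABAK fixes chance agreement at 0.5, so it reduces to 2 * p_obs - 1.
    pabak = 2.0 * p_obs - 1.0
    return p_obs, kappa, pabak

# Hypothetical item satisfied by almost every trial: 50 trials, two raters.
p_obs, kappa, pabak = agreement_stats(a=45, b=2, c=2, d=1)
print(f"observed {p_obs:.2f}, kappa {kappa:.2f}, PABAK {pabak:.2f}")
# observed 0.92, kappa 0.29, PABAK 0.84
```

With 92% observed agreement, kappa is only about 0.29 because the item is so prevalent, while PABAK is 0.84; this prevalence effect is the motivation for adjusting kappa in this way.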
... The debate on how best to assess the risk of bias of RCTs included in meta-analytic research has resurfaced recently in the field of physical therapy, where the Physiotherapy Evidence Database (PEDro) scale is widely used [12,15]. Ten items (see S1 Table) contribute to a summary score, where a score of 5 or 6 typically defines adequate trial quality [12,16-18]. ...
Article
There is debate on how the methodological quality of clinical trials should be assessed. We compared trials of physical therapy (PT) judged to be of adequate quality based on summary scores from the Physiotherapy Evidence Database (PEDro) scale with trials judged to be of adequate quality by Cochrane Risk of Bias criteria. In this meta-epidemiological study, meta-analyses of PT trials were identified in the Cochrane Database of Systematic Reviews, and for each trial the PEDro and Cochrane assessments were extracted from the PEDro and Cochrane databases. Adequate quality was defined either as adequate generation of the random sequence, concealment of allocation, and blinding of outcome assessors (Cochrane criteria), or as a PEDro summary score of ≥5 or ≥6 points. We combined trials of adequate quality using random-effects meta-analysis. Forty-one Cochrane reviews and 353 PT trials were included. All meta-analyses included trials with PEDro scores ≥5, 37 (90.2%) included trials with PEDro scores ≥6, and only 22 (53.7%) included trials of adequate quality according to the Cochrane criteria. Agreement between PEDro and Cochrane was poor for PEDro scores of ≥5 points (kappa = 0.12; 95% CI 0.07 to 0.16) and slight for ≥6 points (kappa = 0.24; 95% CI 0.16 to 0.32). When combining effect sizes of trials deemed to be of adequate quality according to PEDro or Cochrane criteria, we found a substantial difference in the combined effect size (≥0.15) in 9 (22%) of the 41 meta-analyses for the PEDro cutoff of ≥5 and in 10 (24%) for the cutoff of ≥6. The PEDro and Cochrane approaches lead to different sets of trials of adequate quality, and to different combined treatment estimates from meta-analyses of these trials. A consistent approach to assessing risk of bias (RoB) in trials of physical therapy should be adopted.
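The abstract does not name the random-effects estimator; the sketch below assumes the widely used DerSimonian-Laird method, pools a subset of trials, and reports Cochran's Q-based I² along the way. The SMDs, variances, and adequacy flags are hypothetical, and the 0.15 threshold mirrors the "substantial difference" definition above.

```python
import math
from typing import Sequence

def dersimonian_laird(effects: Sequence[float], variances: Sequence[float]) -> tuple[float, float, float, float]:
    """Return (pooled effect, SE, tau^2, I^2 in %) using the DerSimonian-Laird estimator."""
    k = len(effects)
    w = [1.0 / v for v in variances]                     # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0    # between-trial variance
    w_re = [1.0 / (v + tau2) for v in variances]         # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, se, tau2, i2

def pooled_subset(effects, variances, keep):
    """Pool only the trials flagged True in `keep` (needs at least one flagged trial)."""
    ys, vs = zip(*[(y, v) for y, v, k in zip(effects, variances, keep) if k])
    return dersimonian_laird(ys, vs)[0]

# Hypothetical review: five trials with SMDs and their variances.
smd = [-0.80, -0.55, -0.20, -0.65, -0.10]
var = [0.04, 0.05, 0.03, 0.06, 0.02]
adequate_pedro = [True, True, True, True, False]       # e.g. PEDro score >= 5
adequate_cochrane = [False, True, True, False, True]   # e.g. all three key domains adequate

diff = abs(pooled_subset(smd, var, adequate_pedro) - pooled_subset(smd, var, adequate_cochrane))
print(f"difference in pooled SMD: {diff:.2f}; substantial: {diff >= 0.15}")
```

Because the two approaches select different trials, the pooled estimates can diverge even within a single review, which is the comparison the study reports across 41 meta-analyses.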
Article
Introduction: Given the prevalence of motor deficits post-stroke, a large proportion of stroke rehabilitation interventions are directed toward motor recovery. Objectives: The purpose of this study is to present a detailed investigation of the methodological characteristics in the stroke rehabilitation literature with respect to randomized controlled trials (RCTs) designed to facilitate upper extremity motor recovery. Methods: This review was conducted following guidelines from the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. English articles of RCTs were eligible if they were published before April 1, 2021 and applied an intervention to the hemiparetic upper extremity of individuals post stroke as the primary objective of the study, or recorded at least one upper extremity related outcome measure. Results: The number of RCTs for upper extremity rehabilitation interventions post stroke has been increasing, with over three quarters of RCTs published in the last decade. In total, 1,307 RCTs met inclusion criteria, for which the mean sample size (start/finish) was 45.8 (SD 55.4)/41.8 (SD 49.7). The median sample size (start/finish) was 30 (IQR 20-48)/29 (IQR 19-44). The mean PEDro score was 6.12 (SD 1.55). Of these RCTs, 251 (19%) were multicentre trials. Key methodological measures of quality remain low, including the blinding of assessors (59%), intention-to-treat analyses (42%) and concealed allocation (37%). Conclusions: There is a large number of RCTs evaluating stroke rehabilitation upper extremity interventions. Research quality continues to be a challenge (low percentage of key quality indicators, small percentage of multicentre trials, small sample sizes) but is slowly improving.
Article
Low-quality clinical trials may contain errors in the process of deriving their results, which can distort their findings. Quality assessment of clinical trials is therefore important to prevent erroneous results from being applied in clinical practice. The randomized controlled trial (RCT) is a design for evaluating the effectiveness of medical procedures. This study extracted the RCTs from original articles published in the Journal of Korean Medical Science (JKMS) from 1986 to 2011 and assessed their quality using three tools: the Jadad scale, the van Tulder scale, and the Cochrane Collaboration risk of bias tool. For comparison with JKMS, the quality of RCTs published in the Yonsei Medical Journal (YMJ) and the Korean Journal of Internal Medicine was also assessed. In JKMS, YMJ and the Korean Journal of Internal Medicine, the number of published RCTs increased over time, but their quality did not improve. These results suggest that researchers need to plan and conduct higher-quality studies.
Article
The objective of this study was to test the inter-rater reproducibility of the Portuguese version of the PEDro Scale. Seven physiotherapists rated the methodological quality of 50 reports of randomized controlled trials written in Portuguese and indexed on the PEDro database. Each report was also rated using the English version of the PEDro Scale. Reproducibility was evaluated by comparing two separate ratings of reports written in Portuguese and by comparing the Portuguese PEDro score with the English version of the scale. Kappa coefficients ranged from 0.53 to 1.00 for individual items, and an intraclass correlation coefficient (ICC) of 0.82 was observed for the total PEDro score. The standard error of measurement of the scale was 0.58. The Portuguese version of the scale was comparable with the English version, with an ICC of 0.78. The inter-rater reproducibility of the Brazilian Portuguese PEDro Scale is adequate and similar to that of the original English version.
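The abstract does not say how the standard error of measurement was derived; one common formula is SEM = SD × √(1 − ICC), sketched below. The SD in the example is hypothetical, so its output is illustrative rather than a reconstruction of the reported 0.58, and the function name is illustrative.

```python
import math

def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement from the SD of scores and a reliability coefficient."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)

# Hypothetical SD of PEDro total scores, combined with the reported ICC of 0.82.
print(round(sem_from_icc(1.5, 0.82), 2))   # 0.64
```

For a given spread of scores, higher reliability shrinks the SEM toward zero; an ICC of 1 would imply no measurement error at all.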
Article
Objective: To evaluate the risk of bias tool, introduced by the Cochrane Collaboration for assessing the internal validity of randomised trials, for inter-rater agreement, concurrent validity compared with the Jadad scale and the Schulz approach to allocation concealment, and the relation between risk of bias and effect estimates. Design: Cross-sectional study. Study sample: 163 trials in children. Main outcome measures: Inter-rater agreement between reviewers assessing trials using the risk of bias tool (weighted kappa), time to apply the risk of bias tool compared with other approaches to quality assessment (paired t test), degree of correlation for overall risk compared with overall quality scores (Kendall's tau statistic), and magnitude of effect estimates for studies classified as being at high, unclear, or low risk of bias (metaregression). Results: Inter-rater agreement on individual domains of the risk of bias tool ranged from slight (kappa=0.13) to substantial (kappa=0.74). The mean time to complete the risk of bias tool was significantly longer than for the Jadad scale and Schulz approach, individually or combined (8.8 minutes (SD 2.2) per study v 2.0 (SD 0.8), P<0.001). There was low correlation between overall risk of bias and the Jadad scores (P=0.395) and the Schulz approach (P=0.064). Effect sizes differed between studies assessed as being at high or unclear risk of bias (0.52) and those at low risk (0.23). Conclusions: Inter-rater agreement varied across domains of the risk of bias tool. Generally, agreement was poorer for items that required more judgment. There was low correlation between assessments of overall risk of bias and two common approaches to quality assessment: the Jadad scale and the Schulz approach to allocation concealment. Overall risk of bias as assessed by the risk of bias tool differentiated effect estimates, with more conservative estimates for studies at low risk.
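The study summarises inter-rater agreement with weighted kappa, which penalises near-misses on ordered ratings (for example, "low" versus "unclear" risk) less than distant disagreements. The sketch below uses linear weights and codes low, unclear, and high risk as 0, 1, and 2; the abstract does not specify the weighting scheme, so linear weights are an assumption, and the function name and example ratings are hypothetical.

```python
from typing import Sequence

def weighted_kappa(r1: Sequence[int], r2: Sequence[int], n_cat: int) -> float:
    """Linear-weighted Cohen's kappa for ordinal ratings coded 0..n_cat-1."""
    n = len(r1)
    # Observed cross-tabulation, expressed as proportions.
    obs = [[0.0] * n_cat for _ in range(n_cat)]
    for x, y in zip(r1, r2):
        obs[x][y] += 1.0 / n
    row = [sum(obs[i]) for i in range(n_cat)]
    col = [sum(obs[i][j] for i in range(n_cat)) for j in range(n_cat)]

    def w(i: int, j: int) -> float:
        # Disagreement weight: 0 on the diagonal, 1 for the most distant categories.
        return abs(i - j) / (n_cat - 1)

    cells = [(i, j) for i in range(n_cat) for j in range(n_cat)]
    d_obs = sum(w(i, j) * obs[i][j] for i, j in cells)
    d_exp = sum(w(i, j) * row[i] * col[j] for i, j in cells)
    if d_exp == 0:
        raise ValueError("kappa undefined: no expected disagreement")
    return 1.0 - d_obs / d_exp

# Hypothetical ratings of six trials by two reviewers (0 = low, 1 = unclear, 2 = high).
rater_a = [0, 1, 2, 1, 0, 2]
rater_b = [0, 1, 1, 1, 0, 2]
print(round(weighted_kappa(rater_a, rater_b, 3), 2))
```

Here the only disagreement is one step apart (high versus unclear), so it costs half as much as a low-versus-high disagreement would.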
Article
Background and purpose: Assessment of the quality of randomized controlled trials (RCTs) is common practice in systematic reviews. However, the reliability of data obtained with most quality assessment scales has not been established. This report describes 2 studies designed to investigate the reliability of data obtained with the Physiotherapy Evidence Database (PEDro) scale developed to rate the quality of RCTs evaluating physical therapist interventions. Method: In the first study, 11 raters independently rated 25 RCTs randomly selected from the PEDro database. In the second study, 2 raters rated 120 RCTs randomly selected from the PEDro database, and disagreements were resolved by a third rater; this generated a set of individual rater and consensus ratings. The process was repeated by independent raters to create a second set of individual and consensus ratings. Reliability of ratings of PEDro scale items was calculated using multirater kappas, and reliability of the total (summed) score was calculated using intraclass correlation coefficients (ICC(1,1)). Results: The kappa value for each of the 11 items ranged from .36 to .80 for individual assessors and from .50 to .79 for consensus ratings generated by groups of 2 or 3 raters. The ICC for the total score was .56 (95% confidence interval = .47-.65) for ratings by individuals, and the ICC for consensus ratings was .68 (95% confidence interval = .57-.76). Discussion and conclusion: The reliability of ratings of PEDro scale items varied from "fair" to "substantial," and the reliability of the total PEDro score was "fair" to "good."
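ICC(1,1) in the Shrout and Fleiss scheme is the one-way random-effects, single-rater intraclass correlation. The sketch below computes it from the between-trial and within-trial mean squares of a one-way ANOVA; the function name and the PEDro totals in the example are hypothetical.

```python
from typing import Sequence

def icc_1_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(1,1): one-way random effects, single rater.

    ratings[i][j] is the score given to trial i by its j-th rater;
    every trial must have the same number of ratings k >= 2.
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 trials, each with the same k >= 2 ratings")
    grand = sum(sum(row) for row in ratings) / (n * k)
    means = [sum(row) / k for row in ratings]
    # One-way ANOVA: variation between trials vs. variation among a trial's ratings.
    ss_between = k * sum((m - grand) ** 2 for m in means)
    ss_within = sum((x - m) ** 2 for row, m in zip(ratings, means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical PEDro totals for four trials, each rated by two raters.
scores = [[5, 6], [7, 7], [3, 4], [6, 5]]
print(round(icc_1_1(scores), 2))   # 0.83
```

Because the one-way model treats rater differences as noise, consistent disagreement between raters lowers ICC(1,1) just as random disagreement does.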
Article
The Cochrane Collaboration is strongly encouraging the use of a newly developed tool, the Cochrane Collaboration Risk of Bias Tool (CCRBT), for all review groups. However, the psychometric properties of this tool to date have yet to be described. Thus, the objective of this study was to add information about psychometric properties of the CCRBT including inter-rater reliability and concurrent validity, in comparison with the Effective Public Health Practice Project Quality Assessment Tool (EPHPP). Both tools were used to assess the methodological quality of 20 randomized controlled trials included in our systematic review of the effectiveness of knowledge translation interventions to improve the management of cancer pain. Each study assessment was completed independently by two reviewers using each tool. We analysed the inter-rater reliability of each tool's individual domains, as well as final grade assigned to each study. The EPHPP had fair inter-rater agreement for individual domains and excellent agreement for the final grade. In contrast, the CCRBT had slight inter-rater agreement for individual domains and fair inter-rater agreement for final grade. Of interest, no agreement between the two tools was evident in their final grade assigned to each study. Although both tools were developed to assess 'quality of the evidence', they appear to measure different constructs. Both tools performed quite differently when evaluating the risk of bias or methodological quality of studies in knowledge translation interventions for cancer pain. The newly introduced CCRBT assigned these studies a higher risk of bias. Its psychometric properties need to be more thoroughly validated, in a range of research fields, to understand fully how to interpret results from its application.
Article
This study aimed to evaluate the convergent and construct validity of the Physiotherapy Evidence Database (PEDro) scale used to rate the methodological quality of randomized trials in physiotherapy. PEDro total scores and individual-item scores were extracted from 9,456 physiotherapy trials indexed on PEDro. Convergent validity was tested by comparing PEDro total scores with three other quality scales. Construct validity was tested by regressing the PEDro score and individual-item scores against the Institute for Scientific Information Web of Knowledge impact factors (IF) and SCImago journal rankings (SJR) for the journals in which the trials were published. Testing of convergent validity revealed correlations with the other quality scales ranging from 0.31 to 0.69. The PEDro total score was weakly but significantly associated with IF and SJR (P < 0.0001). Eight of the 10 individual scale items that contribute to the PEDro total score were significantly associated with IF. This study provides preliminary evidence of the convergent and construct validity of the PEDro total score and the construct validity of eight individual scale items.
Article
Osteoarthritis is the most common form of joint disease and the leading cause of pain and physical disability in the elderly. Transcutaneous electrical nerve stimulation (TENS), interferential current stimulation and pulsed electrostimulation are used widely to control both acute and chronic pain arising from several conditions, but some policy makers regard the evidence of efficacy as insufficient. The objective was to compare transcutaneous electrostimulation with sham or no specific intervention in terms of effects on pain and withdrawals due to adverse events in patients with knee osteoarthritis. We updated the search in CENTRAL, MEDLINE, EMBASE, CINAHL and PEDro up to 5 August 2008, checked conference proceedings and reference lists, and contacted authors. We included randomised or quasi-randomised controlled trials that compared transcutaneously applied electrostimulation with a sham intervention or no intervention in patients with osteoarthritis of the knee. We extracted data using standardised forms and contacted investigators to obtain missing outcome information. Main outcomes were pain and withdrawals or dropouts due to adverse events. We calculated standardised mean differences (SMDs) for pain and relative risks for safety outcomes and used inverse-variance random-effects meta-analysis. The analysis of pain was based on predicted estimates from meta-regression using the standard error as the explanatory variable. In this update we identified 14 additional trials, resulting in the inclusion of 18 small trials in 813 patients. Eleven trials used TENS, four interferential current stimulation, one both TENS and interferential current stimulation, and two pulsed electrostimulation. The methodological quality and the quality of reporting were poor, and a high degree of heterogeneity among the trials was revealed (I² = 80%). The funnel plot for pain was asymmetrical (P < 0.001).
The predicted SMD of pain intensity in trials as large as the largest trial was -0.07 (95% CI -0.46 to 0.32), corresponding to a difference in pain scores between electrostimulation and control of 0.2 cm on a 10 cm visual analogue scale. There was little evidence that SMDs differed by type of electrostimulation (P = 0.94). The relative risk of being withdrawn or dropping out due to adverse events was 0.97 (95% CI 0.2 to 6.0). In this update, we could not confirm that transcutaneous electrostimulation is effective for pain relief. The current systematic review is inconclusive, hampered by the inclusion of only small trials of questionable quality. Appropriately designed trials of adequate power are warranted.
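The abstract describes predicting the pain effect for a trial as large as the largest included trial from a meta-regression of SMD on its standard error. The sketch below illustrates the idea with inverse-variance weighted least squares, a fixed-effect simplification (the review's exact model, including any between-trial variance term, is not given here), and evaluates the fitted line at the smallest standard error. The function name and data are hypothetical.

```python
from typing import Sequence

def predict_at_smallest_se(smd: Sequence[float], se: Sequence[float]) -> float:
    """Weighted least-squares fit of SMD = b0 + b1 * SE (weights 1 / SE^2),
    evaluated at the smallest SE, i.e. the most precise (largest) trial.
    Requires at least two distinct SE values."""
    w = [1.0 / s ** 2 for s in se]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, se)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, smd)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, se))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, se, smd))
    b1 = sxy / sxx                  # small-study slope
    b0 = y_bar - b1 * x_bar         # extrapolated effect at SE = 0
    return b0 + b1 * min(se)

# Hypothetical trials: smaller trials (larger SE) report larger pain reductions.
se = [0.15, 0.25, 0.35, 0.45]
smd = [-0.10, -0.45, -0.70, -1.05]
print(round(predict_at_smallest_se(smd, se), 2))
```

A negative slope means that less precise trials report larger benefits, the same small-study pattern that appears as funnel-plot asymmetry; anchoring the prediction at the largest trial limits the influence of that pattern on the summary estimate.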
Article
Does the PEDro scale measure only one construct, i.e., the methodological quality of clinical trials? What is the hierarchy of items of the PEDro scale, from least to most adhered to? Does year of publication of trials have any effect on item adherence? Are PEDro scale ordinal scores equivalent to interval data? The study used Rasch analysis of two independent samples of 100 clinical trials from the PEDro database scored using the PEDro scale. Both samples of PEDro data showed fit to the Rasch model with no item misfit. The PEDro scale item hierarchy was the same in both samples, ranging from the most adhered-to item (random allocation) to the least adhered-to item (therapist blinding). There was no differential item functioning by year of publication. Original PEDro ordinal scores were highly correlated with transformed PEDro interval scores (r = 0.99). The PEDro scale is a valid measure of the methodological quality of clinical trials. It is valid to sum PEDro scale item scores to obtain a total score that can be treated as interval-level measurement and subjected to parametric statistical analysis.
Letter to the Editor, Journal of Clinical Epidemiology (2013). http://dx.doi.org/10.1016/j.jclinepi.2013.05.007

References
Rutjes AW, Nuesch E, Sterchi R, Kalichman L, Hendriks E, Osiri M, et al. Transcutaneous electrostimulation for osteoarthritis of the knee. Cochrane Database Syst Rev 2009;4:CD002823.
Hartling L, Ospina M, Liang Y, Dryden DM, Hooton N, Krebs Seida J, et al. Risk of bias versus quality assessment of randomised controlled trials: cross sectional study. BMJ 2009;339:b4012.
Shiwa SR, Costa LCM, Moseley AM, Lopes AD, Ruggero CR, Sato TO, et al. Reproducibility of the Portuguese version of the PEDro scale. Cad Saúde Pública 2011;27:2063-8.