Article

A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research


Abstract

Objective: The intraclass correlation coefficient (ICC) is a widely used reliability index in test-retest, intrarater, and interrater reliability analyses. This article introduces the basic concept of ICC in the context of reliability analysis. Discussion for researchers: There are 10 forms of ICCs. Because each form involves distinct assumptions in its calculation and will lead to different interpretations, researchers should explicitly specify the ICC form they used in their calculation. A thorough review of the research design is needed in selecting the appropriate form of ICC to evaluate reliability. The best practice of reporting ICC should include software information, "model," "type," and "definition" selections. Discussion for readers: When coming across an article that includes ICC, readers should first check whether information about the ICC form has been reported and whether an appropriate ICC form was used. Based on the 95% confidence interval of the ICC estimate, values less than 0.5, between 0.5 and 0.75, between 0.75 and 0.9, and greater than 0.90 are indicative of poor, moderate, good, and excellent reliability, respectively. Conclusion: This article provides a practical guideline for clinical researchers to choose the correct form of ICC and suggests the best practice of reporting ICC parameters in scientific publications. This article also gives readers an appreciation for what to look for when coming across ICC while reading an article.
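The ICC forms listed above can be computed in a few lines from two-way ANOVA mean squares. The Python below is a minimal illustration, not the authors' code. It assumes a complete table of n subjects (rows) by k raters (columns) with no missing values, and the function names are hypothetical. The two-way random and two-way mixed models share the same formula for a given type and definition, so the 10 forms reduce to six distinct computations. The model choice changes how far the result may be generalized, not the arithmetic. Confidence intervals, which need the F distribution, are left out.

```python
from statistics import mean

def anova_mean_squares(data):
    """Mean squares for an n-subjects x k-raters table (rows = subjects)."""
    n, k = len(data), len(data[0])
    grand = mean(x for row in data for x in row)
    row_means = [mean(row) for row in data]
    col_means = [mean(row[j] for row in data) for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_err = ss_total - ss_rows - ss_cols                     # residual
    return {
        "MSR": ss_rows / (n - 1),
        "MSC": ss_cols / (k - 1),
        "MSE": ss_err / ((n - 1) * (k - 1)),
        "MSW": (ss_cols + ss_err) / (n * (k - 1)),  # within-subject MS (one-way model)
        "n": n,
        "k": k,
    }

def icc_forms(data):
    """Point estimates of the six distinct ICC computations (no confidence intervals)."""
    m = anova_mean_squares(data)
    msr, msc, mse, msw, n, k = m["MSR"], m["MSC"], m["MSE"], m["MSW"], m["n"], m["k"]
    return {
        # one-way random effects (absolute agreement only)
        "one-way, single": (msr - msw) / (msr + (k - 1) * msw),
        "one-way, mean of k": (msr - msw) / msr,
        # two-way random or mixed effects, absolute agreement
        "two-way, agreement, single": (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n),
        "two-way, agreement, mean of k": (msr - mse) / (msr + (msc - mse) / n),
        # two-way random or mixed effects, consistency
        "two-way, consistency, single": (msr - mse) / (msr + (k - 1) * mse),
        "two-way, consistency, mean of k": (msr - mse) / msr,
    }

# Hypothetical data: rater 2 scores every subject exactly 1 point higher than rater 1.
scores = [[1, 2], [2, 3], [3, 4], [4, 5]]
for name, value in icc_forms(scores).items():
    print(f"{name}: {value:.3f}")
```

In this made-up table the rater offset is purely additive. Both consistency forms therefore equal 1.000, while the single-measure absolute-agreement ICC falls to about 0.769. That gap is exactly what the "definition" selection encodes.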


... Pearson correlational values of 0.30 or less indicate low correlations, values between 0.31 and 0.60 indicate moderate correlations, and values of 0.61 and higher indicate high correlations [31]. Concerning ICC, values less than 0.50 indicate poor reliability, values between 0.5 and 0.75 indicate moderate reliability, and values of 0.75 or higher indicate good reliability [32]. Estimates of ICCs and confidence intervals should be considered to determine whether a measurement tool is reliable [32]. ...
... Still, the retest was administered in summer, when children had more opportunities for outdoor physical activities. Further, although the ICC has been considered a valuable index of test-retest reliability, there are no standard values for acceptable test-retest reliability using ICC [32]. Research on absolute test-retest reliability is necessary to generate additional evidence on the reliability of the total scale and subscales. ...
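Absolute reliability is commonly summarized with the standard error of measurement (SEM) and the minimal detectable change (MDC). The sketch below assumes the widely used conventions SEM = SD·√(1 − ICC) and MDC95 = 1.96·√2·SEM. Neither formula appears in the excerpt, and the SD and ICC values in the example are hypothetical.

```python
import math

def sem_from_icc(sd, icc):
    """Standard error of measurement from the sample SD and a test-retest ICC."""
    return sd * math.sqrt(1.0 - icc)

def mdc95(sem):
    """Smallest change that exceeds measurement error with ~95% confidence (two occasions)."""
    return 1.96 * math.sqrt(2.0) * sem

# Hypothetical scale: SD = 10 points, test-retest ICC = 0.75.
sem = sem_from_icc(10.0, 0.75)   # 5.0 points
print(round(sem, 2), round(mdc95(sem), 2))
```

A moderate ICC can still leave a large band of measurement error on the original score scale. Reporting the SEM or MDC next to the ICC therefore makes "acceptable" reliability concrete.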
Article
This study examined the test-retest reliability and predictive validity of the East Asia-Pacific Early Child Development Scales (EAP-ECDS) Short Form. In China, preschools typically provide children with educational activities in age-segregated classrooms – Kindergarten Level 1 (K1) (3 to 4 years), Kindergarten Level 2 (K2) (4 to 5 years), and Kindergarten Level 3 (K3) (5 to 6 years). A total of 709 children in K2 (mean age = 57.85 months, SD = 4.77) were randomly selected from 29 kindergartens in Shanghai municipality and Guizhou province of China. Children were assessed using the EAP-ECDS in K2 and K3. School readiness was assessed in K3, and literacy and mathematics achievement were assessed in Grade 2. Pearson’s correlation coefficient and intraclass correlation coefficient (ICC = 0.73) indicated that the tool had good test-retest reliability across K2 and K3. Regarding predictive validity, K2 EAP-ECDS predicted K3 school readiness (β = 0.26), Grade 2 language and literacy (β = 0.18) and mathematics (β = 0.22) after adjusting for age, gender, socioeconomic status, and region. Findings support using the tool to measure the holistic development of preschool-aged children in China and the region.
... The intraclass correlation coefficient was calculated to assess the reliability and absolute agreement between thresholds obtained with the gold standard audiometer and with the web-based audiometer. The magnitude of the ICC was interpreted using the categories given by Koo and Li (2016) [19]. Bland–Altman plot analysis was used to assess whether there was any bias in the mean differences and to estimate the agreement interval [20]. ...
... Repeatability based on the correlation between recordings (consistency) and repeatability based on identical scores (absolute agreement) were both measured. The intraclass correlation coefficients for air-conduction (AC) and bone-conduction (BC) thresholds (Table 3) suggest 'excellent reliability' according to the classification given by Koo and Li (2016). ...
Article
Aim: The purpose of this study was to verify the accuracy of the web-based audiometer HEARZAP in determining hearing thresholds for both air and bone conduction. Method: Using a cross-sectional validation design, the web-based audiometer was compared to a gold standard audiometer. The study included 50 participants (100 ears), of whom 25 (50 ears) had normal hearing sensitivity and 25 (50 ears) had various types and degrees of hearing loss. All subjects underwent pure tone audiometry, including air and bone conduction thresholds, using the web-based and gold standard audiometers in a random order. A pause between the two tests was allowed if the patient felt comfortable. Testing with the web-based audiometer and with the gold standard audiometer was done by two different audiologists with similar qualifications in order to eliminate tester bias. Both procedures were performed in a sound-treated room. Results: For air conduction and bone conduction thresholds, respectively, the mean discrepancies between the web-based audiometer and the gold standard audiometer were 1.22 dB HL (SD = 4.61) and 0.8 dB HL (SD = 4.1). The ICC between the two techniques was 0.94 for air conduction thresholds and 0.91 for bone conduction thresholds. The Bland–Altman plots likewise indicated excellent reliability between the two measurements, with the mean difference between HEARZAP and gold standard audiometry falling within the upper and lower limits of agreement. Conclusion: The web-based audiometry version of HEARZAP produced precise findings for hearing thresholds that were comparable to those obtained from an established gold standard audiometer. HEARZAP has the potential to support multi-clinic functionality and enhance service access.
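The abstract reports mean differences and SDs but not the numeric limits of agreement. The sketch below assumes the conventional Bland–Altman limits (mean difference ± 1.96 × SD of the differences) and reconstructs approximate limits from the reported summary values. The function names are hypothetical. The published limits may differ if the authors computed them another way.

```python
from statistics import mean, stdev

def limits_of_agreement(bias, sd_diff, z=1.96):
    """Lower and upper 95% limits of agreement from the mean and SD of paired differences."""
    return bias - z * sd_diff, bias + z * sd_diff

def bland_altman(method_a, method_b):
    """Bias (mean of a - b) and limits of agreement from paired raw measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    return bias, limits_of_agreement(bias, stdev(diffs))

# Summary values reported in the abstract (dB HL).
for label, bias, sd in [("air conduction", 1.22, 4.61), ("bone conduction", 0.8, 4.1)]:
    lower, upper = limits_of_agreement(bias, sd)
    print(f"{label}: {lower:.1f} to {upper:.1f} dB HL")
```

Under that assumption the limits come out at roughly −7.8 to 10.3 dB HL for air conduction and −7.2 to 8.8 dB HL for bone conduction. Whether a range of that width is clinically acceptable is a judgment the ICC alone does not settle.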
... Intraclass correlation coefficients (ICCs) were calculated between the baseline and 4-year visit values to assess agreement and consistency between them. Agreement was assessed using the ICC from the two-way random-effects model, single measures, generally denoted ICC(2,1) [33,34]. The ICC(2,1) values were lower due to age-related changes in BP indices. Consistency was assessed using the ICC from the two-way mixed-effects model, single measures, or ICC(3,1) [33,34]. The ICC(3,1) indicates consistency but not agreement between measurements, since it treats the mean difference between measurements as a systematic error; as a result, the ICC(3,1) values do not reflect age-related BP changes [33,34]. ...
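A systematic change, such as an age-related shift in BP, lowers the agreement ICC but leaves the consistency ICC untouched. The sketch below shows this by computing single-measure absolute-agreement and consistency ICCs for two occasions from two-way ANOVA mean squares. It then adds a constant 10 mmHg to every follow-up value. The readings and the function name are hypothetical.

```python
from statistics import mean

def single_measure_iccs(baseline, follow_up):
    """Return (agreement, consistency) single-measure ICCs for two paired occasions."""
    rows = list(zip(baseline, follow_up))
    n, k = len(rows), 2
    grand = mean(baseline + follow_up)
    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ss_rows = k * sum((mean(row) - grand) ** 2 for row in rows)
    ss_cols = n * sum((mean(col) - grand) ** 2 for col in (baseline, follow_up))
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    agreement = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    consistency = (msr - mse) / (msr + (k - 1) * mse)
    return agreement, consistency

baseline = [120, 132, 141, 155, 160]      # hypothetical systolic BP, mmHg
follow_up = [122, 130, 144, 153, 163]
shifted = [v + 10 for v in follow_up]     # same readings, 10 mmHg higher

print(single_measure_iccs(baseline, follow_up))
print(single_measure_iccs(baseline, shifted))  # consistency unchanged, agreement lower
```

The shift only enlarges the between-occasion mean square (MSC). That term enters the agreement denominator and is absent from the consistency formula.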
Article
There is little information about the reproducibility of the white-coat effect treated as a continuous variable. This study investigated the long-term reproducibility of the white-coat effect as a continuous variable. We selected 153 participants without antihypertensive treatment (men, 22.9%; age, 64.4 years) from the general population of Ohasama, Japan, to assess the white-coat effect (the difference between blood pressures measured at the office and at home) measured repeatedly over a 4-year interval. Reproducibility was assessed using the intraclass correlation coefficient (two-way random-effects model, single measures). The white-coat effect for systolic/diastolic blood pressure decreased slightly, by 0.17/1.56 mmHg on average, at the 4-year visit. The Bland–Altman plots showed no significant systematic error for the white-coat effects (P ≥ 0.24). The intraclass correlation coefficients (95% confidence intervals) for the systolic white-coat effect, office systolic blood pressure, and home systolic blood pressure were 0.41 (0.27–0.53), 0.64 (0.52–0.74), and 0.74 (0.47–0.86), respectively. Change in the white-coat effect was mainly affected by a change in office blood pressure. Long-term reproducibility of the white-coat effect is limited in the general population without antihypertensive treatment. The change in the white-coat effect is mainly caused by office blood pressure variation.
... For example, studies in China, Puerto Rico, France, and Spain showed that the best goodness of fit for the translated questionnaire was a one-factor model [5,13,14,16], which is different from the original version. In addition, when assessing reliability, some studies used the Pearson correlation coefficients to analyze repeated measures of the same sample [14,16], which may not clearly reflect the correlation and consistency between the two levels of measurement [18]. In addition, the interval between tests was not clearly stated, which may affect the test-retest reliability results. ...
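A Pearson coefficient can be perfect even when repeated scores never match, which makes it a weak test-retest index. The toy example below uses hypothetical scores and shifts every retest score by 5 points. The helper `pearson_r` is written out only to keep the block dependency-free. r stays at 1 while the mean test-retest difference is 5 points.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

scores_t1 = [10, 14, 18, 22, 26]             # hypothetical first administration
scores_t2 = [s + 5 for s in scores_t1]       # every respondent scores 5 points higher
mean_change = sum(b - a for a, b in zip(scores_t1, scores_t2)) / len(scores_t1)

print(pearson_r(scores_t1, scores_t2), mean_change)  # 1.0 5.0
```

An absolute-agreement ICC computed on the same pairs would fall below 1, because it penalizes the 5-point offset that r ignores.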
... The coefficient of stability was estimated using a two-way mixed-effects model with a single-measurement type and an absolute-agreement definition. The ICC for the scale repeated one month after the first test was 0.83 (p < 0.01), exceeding the threshold of 0.75 [18] and indicating good stability and reliability (Table 2). ...
... The results of the test-retest reliability were quite good. The scores of the two repeated measurements, with a one-month interval, had a significant correlation, and the ICC exceeded the reference value of 0.8 [18]. The C-BES had good reliability and stability and can be employed in large-scale surveys cost-effectively. ...
Article
Background: The Binge Eating Scale (BES) is a widely used measuring tool to assess binge eating problems in Western countries. However, the psychometric properties of such scales among cross-cultural youth groups are insufficient, and the factor structure continues to be debated; therefore, further research is needed. The aim of this study was to examine the properties of BES among overweight college students in Taiwan. Methods: A cross-sectional design and convenience sampling were adopted to recruit 300 overweight students from five universities. A translated Traditional Chinese version of BES was used for the survey, and the validity of the scale was tested using the Confirmatory Factor Analysis (CFA) and Bulimic Investigatory Test, Edinburgh (BITE). The reliability was evaluated using internal consistency and test-retest reliability. Results: The CFA results showed a reasonable model fit. The first-order two-factor model was consistent with that of the original BES and significantly correlated with the criterion of BITE score. Cronbach's α value, representing internal consistency reliability, and the intraclass correlation coefficient of repeated measures made one month apart were both 0.83, indicating good reliability and stability. Significant correlations were observed between the BES score and sex and BMI; however, no correlation was observed between BES scores and age. Conclusion: The BES presents sound psychometric properties, has good cross-cultural applicability, and can be used as a first-line screening tool by mental health professionals to identify the severity of binge eating behavior among overweight college students in Taiwan. It is recommended that participant diversity and obesity indicators be incorporated into the scale in the future to establish a universal psychometric tool.
... We used the intraclass correlation coefficient (ICC) to assess interrater reliability. Following Koo and Li (2016), we based ICC values and their 95% confidence intervals (CIs) on a single measurement (type: the measurement from a single rater is used as the basis of the actual measurement), consistency (definition: the same group of subjects is correlated in an additive manner), and two-way random effects (model: results are generalized to any raters with the same characteristics). This type of ICC was selected to account for systematic and random variance between and within raters (Maeng et al., 2017 ... would be reported as poor-to-good reliability, since the lower bound is less than .50 ...
... Our findings may differ from those of previous studies because we used more demanding descriptors for our ICC values (Koo & Li, 2016). For example, we only considered ICC values above .75 ...
... Kim et al., 2014; Y. Kim et al., 2012; Maeng et al., 2017; Rintala et al., 2017; Valentini et al., 2017), we followed recommendations from Koo and Li (2016) and used both the lower and upper bounds of the 95% CI. This avoids incomplete or confusing information by providing the range in which each ICC lies. Had we not reported the CIs and treated ICC values over .50 as moderate reliability, our results would show moderate reliability across the three raters in six of the seven ball skills (rather than only four of seven). This, together with the fact that other investigators used different statistical tests for assessing reliability, such as kappa (Lopes et al., 2016) or Pearson's coefficient (Simons et al., 2008), might explain why interrater reliability was lower in the present study despite our raters' improved agreement. ...
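Applying Koo and Li's cut-offs to both ends of the 95% CI can be expressed as a small helper. This is a sketch under stated assumptions. The boundary handling is this sketch's own choice, because the guideline's wording leaves exact boundaries open: 0.50 and 0.75 fall in the higher band, and exactly 0.90 counts as good. The function names are hypothetical. The example intervals are taken from abstracts quoted on this page.

```python
def koo_li_label(value):
    """Reliability band for one ICC value (assumed half-open boundaries)."""
    if value < 0.5:
        return "poor"
    if value < 0.75:
        return "moderate"
    if value <= 0.9:
        return "good"
    return "excellent"

def ci_label(lower, upper):
    """Label an ICC by its 95% CI: one band if both bounds agree, else 'x-to-y'."""
    lo, hi = koo_li_label(lower), koo_li_label(upper)
    return lo if lo == hi else f"{lo}-to-{hi}"

print(ci_label(0.93, 0.98))  # excellent
print(ci_label(0.27, 0.53))  # poor-to-moderate
print(ci_label(0.74, 0.97))  # moderate-to-excellent
```

Labeling by the interval rather than the point estimate is what produces descriptors such as "poor-to-good." It also explains why a study using this convention can report lower reliability than one that labels point estimates alone.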
Article
We aimed to calculate the interrater reliability of the Test of Gross Motor Development-Third Edition (TGMD-3) after raters reached a consensus regarding measurement criteria. Three raters measured the fundamental movement skills of 25 children on the TGMD-3 at two different times: (a) once when simply following the measurement criteria in the TGMD-3 manual and (b) after a 9-month washout period, following the raters' consensus building for the measurement criteria for each skill. After calculating and comparing the interrater reliability of these three raters across these two rating times, we found improved interrater reliability after the raters' consensus-building discussions on ratings of both locomotor skills (moderate-to-good reliability on two of six skills initially and at least moderate-to-excellent on four of six skills following criteria consensus building) and ball skills (moderate-to-good reliability on one of seven skills initially and at least moderate-to-excellent reliability on four of seven skills following criteria consensus building). For subtest scores and overall test scores, raters achieved at least moderate-to-good reliability on their second, post-consensus-building ratings. Based on this improved reliability following consensus building, we recommend that researchers include rater consensus building before assessing children’s fundamental movement skills or guiding curriculum interventions in physical education from TGMD-3 data.
... The ICC represents reproducibility in the rank order of athletes over repeated measures (i.e., relative reliability) [43]. In most of our eligible studies, two types of ICC were identified: those describing absolute agreement in a single measure from a two-way random-effects model (ICC(2,1)), and those describing consistency in a single measure from a two-way mixed-effects model (ICC(3,1)) [43], whereas in other studies the ICC type was not clearly specified or attainable (Table 2). Based on the comparisons used in the TE data synthesis, the differences between ICC types were negligible (Additional file 3), and we therefore treated all data the same. We acknowledge that there are conceptual differences between ICC types [43]. However, it was not possible to examine this effect due to the low number of estimates in some levels (e.g., ICC(2,1), n = 3 studies). ...
Article
Background Submaximal fitness tests (SMFT) are a pragmatic approach for evaluating athletes' physiological state, due to their time-efficient nature, low physiological burden and relative ease of administration in team sports settings. While a variety of outcome measures can be collected during SMFT, exercise heart rate (HRex) is the most popular. Understanding the measurement properties of HRex can support the interpretation of data and assist in decision making regarding athletes' current physiological state and training effects. Objectives The aims of our systematic review and meta-analysis were to: (1) establish meta-analytic estimates of SMFT HRex reliability and convergent validity and (2) examine the moderating influence of athlete and protocol characteristics on the magnitude of these measurement properties. Methods We conducted a systematic literature search with MEDLINE, Scopus and Web of Science databases for studies published from database inception up to January 2022. Studies were considered for inclusion when they included team sports athletes and investigated the reliability and/or convergent validity of SMFT HRex. Reliability statistics included the group mean difference (MD), typical error of measurement (TE) and intraclass correlation coefficient (ICC) derived from test–retest designs. Pearson’s correlation coefficient (r) describing the relationship between SMFT HRex and a criterion measure of endurance performance was used as the statistic for convergent validity. Qualitative assessment was conducted using a risk of bias assessment tool for non-randomised studies. Mixed-effects, multilevel hierarchical models combined with robust variance estimate tests were performed to obtain pooled measurement property estimates, effect heterogeneity, and meta-regression of modifying effects. Results The electronic search yielded 21 reliability (29 samples) and 20 convergent validity (29 samples) studies that met the inclusion criteria.
Reliability meta-analysis indicated good absolute (MD = 0.5 [95% CI 0.1 to 0.9] and TE = 1.6 [95% CI 1.4 to 1.9] percentage points) and high relative (ICC = 0.88 [95% CI 0.84 to 0.91]) reliability. Convergent validity meta-analysis indicated an inverse, large relationship (r = −0.58 [95% CI −0.62 to −0.54]) between SMFT HRex and endurance test performance. Meta-regression analyses suggested no meaningful influence of SMFT protocol or athlete characteristics on reliability or convergent validity estimates. Conclusions Submaximal fitness test HRex is a reliable and valid proxy indicator of endurance performance in team sport athletes. Athlete and SMFT protocol characteristics do not appear to have a meaningful effect on these measurement properties. Practitioners may implement SMFT HRex for monitoring athletes' physiological state by using our applied implications to guide the interpretation of data in practice. Future research should examine the utility of SMFT HRex to track within-athlete changes in aerobic capacity, as well as any further possible effects of SMFT protocol design elements or HRex analytical methods on measurement properties. Registration Protocol registration can be found in Open Science Framework and is available through https://doi.org/10.17605/OSF.IO/9C2JV.
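The absolute-reliability statistics named in this abstract, the group mean difference (MD) and the typical error (TE), can be computed from paired test-retest values. The sketch assumes TE is the SD of the test-retest differences divided by √2. That is a common sport-science convention the abstract itself does not spell out. The HRex values and the function name are hypothetical.

```python
import math
from statistics import mean, stdev

def mean_difference_and_te(trial_1, trial_2):
    """Group mean difference (trial 2 minus trial 1) and typical error of measurement."""
    diffs = [b - a for a, b in zip(trial_1, trial_2)]
    return mean(diffs), stdev(diffs) / math.sqrt(2)

hr_trial_1 = [80, 82, 85, 88]   # hypothetical HRex, % of maximal heart rate
hr_trial_2 = [81, 82, 87, 87]
md, te = mean_difference_and_te(hr_trial_1, hr_trial_2)
print(f"MD = {md:.2f}, TE = {te:.2f} percentage points")
```

MD and TE stay in the units of the measurement, which is why the meta-analysis can report them in percentage points alongside the unitless ICC.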
... The Cronbach's alpha model of internal consistency was used, based on the average inter-item correlation. The interpretation of the ICC was based on the guidelines reported by Koo and Li (2016) [48]; namely, ICC values of <0.5, 0.5-0.75, 0.75-0.9, ...
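Cronbach's alpha can be computed in two ways. The raw form uses item and total-score variances. The standardized form works directly from the average inter-item correlation r̄ as α = k·r̄ / (1 + (k − 1)·r̄). Both variants are sketched below with hypothetical inputs and hypothetical function names. The excerpt does not state which variant the cited study reported.

```python
from statistics import variance

def raw_alpha(items):
    """Raw Cronbach's alpha; `items` is a list of k item-score lists over the same respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

def standardized_alpha(k, mean_inter_item_r):
    """Standardized alpha from the number of items and their average inter-item correlation."""
    return k * mean_inter_item_r / (1 + (k - 1) * mean_inter_item_r)

print(standardized_alpha(10, 0.3))              # about 0.81
print(raw_alpha([[1, 2, 3, 4], [2, 2, 4, 5]]))
```

Alpha describes internal consistency across items on one occasion. It is a different question from the ICC-based test-retest or interrater reliability discussed elsewhere on this page.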
Article
In order to provide early selection indicators for the breeding of plants used for producing tea seed oil or harvesting tea, we investigated the relationships between flower morphology and fruit yields in tea plants. We analyzed 106 tea varieties to determine the relationships between flower morphological traits and fruit yields. Notably, the homogeneity of flower traits within the same tea plant variety was found to be very high. The average length and width measurements of certain phenotypic traits of tea plants, including pistil length, stamen length, stamen bundle inner width, stamen bundle outer width, and stigma width, were 11.8, 10.9, 2.5, 15.0, 3.7 mm, respectively. In this study, the flower traits that affect fruit yield appear to be related to the difficulty of pollination by insects (e.g., bees), in terms of their contacting the stigma. In 2013, three phenotypic trait variables showed significant effects on yield; namely, the stamen bundle outer width (negative), stigma width (positive), and stigma width minus the stamen bundle inner width (positive). In 2015, only the stamen bundle outer width had a significant negative effect on yield. Regarding pollen viability, in the TTC (2,3,5-triphenyl tetrazolium chloride) staining test, about 84% of the considered tea varieties presented pollen viability exceeding 70%. This indicates that most tea pollen has the ability to germinate normally after contact with the cross-pollinated stigma. The yields of all of the tea varieties exhibited a positively skewed distribution in 2013 and 2015. Although our results indicate that flowers in the anther superior group tend to produce fewer fruits than flowers in the stigma superior group in 2013, in the analysis of the effect of traits on yield, there were no significant differences in the relative positions of stigmas and anthers. 
In conclusion, we determined that the main trait affecting fruit yield is stamen bundle outer width, while the secondary trait affecting fruit yield is stigma width. However, the efficacy of the stigma width may also be affected by the position of the stigma relative to the anther and the stamen bundle inner width. These two traits have the potential to be used as reference indicators for early selection in future breeding programs.
... In contrast, the contour in SW-B deviates from the vessel wall as it was not possible to adjust the contour freely. ICC estimates and their 95% confidence intervals (CI) for intrareader comparison were calculated based on a single-rating, absolute-agreement, 2-way mixed-effects model. For interreader, inter-software, and inter-scanner comparisons, ICC estimates and their 95% CI were calculated based on a single-rating, absolute-agreement, 2-way random-effects model [14]. ICC values less than 0.5, between 0.5 and 0.75, between 0.75 and 0.9, and greater than 0.9 are indicative of poor, moderate, good, and excellent agreement, respectively [14]. ...
Article
Background Different software programs are available for the evaluation of 4D Flow cardiovascular magnetic resonance (CMR). A good agreement of the results between programs is a prerequisite for the acceptance of the method. Therefore, the goal was to compare quantitative results from a cross-over comparison in individuals examined on two scanners of different vendors analyzed with four postprocessing software packages. Methods Eight healthy subjects (27 ± 3 years, 3 women) were each examined on two 3T CMR systems (Ingenia, Philips Healthcare; MAGNETOM Skyra, Siemens Healthineers) with a standardized 4D Flow CMR sequence. Six manually placed aortic contours were evaluated with Caas (Pie Medical Imaging, SW-A), cvi42 (Circle Cardiovascular Imaging, SW-B), GTFlow (GyroTools, SW-C), and MevisFlow (Fraunhofer Institute MEVIS, SW-D) to analyze seven clinically used parameters including stroke volume, peak flow, peak velocity, and area as well as typically scientifically used wall shear stress values. Statistical analysis of inter- and intrareader variability, inter-software and inter-scanner comparison included calculation of absolute and relative error (E_R), intraclass correlation coefficient (ICC), Bland–Altman analysis, and equivalence testing based on the assumption that inter-software differences needed to be within 80% of the range of intrareader differences. Results SW-A and SW-C were the only software programs showing agreement for stroke volume (ICC = 0.96; E_R = 3 ± 8%), peak flow (ICC = 0.97; E_R = −1 ± 7%), and area (ICC = 0.81; E_R = 2 ± 22%). Results from SW-A/D and SW-C/D were equivalent only for area and peak flow. Other software pairs did not yield equivalent results for routinely used clinical parameters. Peak maximum velocity in particular yielded poor agreement (ICC ≤ 0.4) between all software packages except SW-A/D, which showed good agreement (ICC = 0.80).
Inter- and intrareader consistency for clinically used parameters was best for SW-A and SW-D (ICC = 0.56–0.97) and worst for SW-B (ICC = −0.01–0.71). Of note, inter-scanner differences per individual tended to be smaller than inter-software differences. Conclusions Of all tested software programs, only SW-A and SW-C can be used equivalently for determination of stroke volume, peak flow, and vessel area. Irrespective of the applied software and scanner, high intra- and interreader variability for all parameters has to be taken into account before introducing 4D Flow CMR in clinical routine. Especially in multicenter clinical trials, a single image evaluation software should be applied.
... The test-retest reliability reflects the variation in results taken by an instrument on the same subject under the same conditions. Values above 0.75 indicate good to excellent reliability [58]. Statistical analysis was performed using IBM SPSS Statistics for Windows, Version 26.0 (Armonk, NY, USA: IBM Corp). ...
... Concerning concurrent validity, patients with LV obtained significantly (p < 0.001) lower scores than participants with mild VI and significantly higher scores (p < 0.001) than those with legal blindness. Finally, the questionnaire was shown to be highly repeatable, which is a fundamental property for evaluating changes in functionality when considering a specific intervention [58]. In general, the LIFE4LVQ adequately fits the Rasch model, and it can be used as a valid and repeatable measure that can accurately detect restrictions on the ability and independence of LV patients due to various causes of irreversible vision loss. ...
Article
Low vision (LV) has a substantial impact on an individual’s daily functionality and patient-reported outcome measures (PROMs) are increasingly incorporated into the evaluation of this problem. The objective of this study was to describe the design of the new “Life for Low Vision Questionnaire (LIFE4LVQ)”, as a measure of daily functionality in LV and to explore its psychometric properties. A total of 294 participants completed the LIFE4LVQ and the data were subjected to Rasch analysis to determine the psychometric properties of the questionnaire, including response category ordering, item fit statistics, principal component analysis, precision, differential item functioning, and targeting. Test–retest reliability was evaluated with an interval of three weeks and intraclass correlation coefficients (ICC) were used. The correlation between the questionnaire score and Best Corrected Visual Acuity (BCVA) was examined using Spearman’s correlation coefficient. Rasch analysis revealed that for most items the infit and outfit mean square fit values were close to 1, both for the whole scale and its subscales (ability and independence). The separation index for person measures was 5.18 with a reliability of 0.96, indicating good discriminant ability and adequate model fit. Five response categories were found for all items. The ICC was 0.96 (p < 0.001; 95% CI, 0.93–0.98), suggesting excellent repeatability of the measure. Poorer BCVA was significantly associated with worse scores (rho = 0.559, p < 0.001), indicating excellent convergent validity. The functional, 40-item LIFE4LVQ proved to be a reliable and valid tool that effectively measures the impact of LV on ability and independence.
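In Rasch analysis, the person separation index G and the person separation reliability R are two views of the same quantity, related by R = G² / (1 + G²). The sketch below checks the reported pair: G = 5.18 gives R ≈ 0.964, consistent with the reported 0.96. The function names are hypothetical.

```python
import math

def reliability_from_separation(g):
    """Rasch separation reliability R from the separation index G."""
    return g ** 2 / (1 + g ** 2)

def separation_from_reliability(r):
    """Inverse relation: G = sqrt(R / (1 - R))."""
    return math.sqrt(r / (1 - r))

print(round(reliability_from_separation(5.18), 3))   # 0.964
print(round(separation_from_reliability(0.96), 2))   # 4.9
```

The inverse direction shows how sensitive G is near the top of the scale. Feeding back the rounded reliability of 0.96 gives a separation near 4.9, not 5.18, so both values are worth reporting.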
... good reliability, and values greater than 0.90 indicate excellent reliability (Koo & Li, 2016). This procedure was carried out using the SPSS software (v. ...
... Three independent judges rated the data on the contents made explicit by the authors. Interrater reliability was good, with an intraclass correlation coefficient ICC(3,3) = 0.80, as described in the literature (Koo & Li, 2016). ...
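An average-measures ICC such as the three-judge value of 0.80 above describes the reliability of the judges' mean, not of a single judge. For the consistency (and one-way) forms, the Spearman–Brown formula links single- and average-measure ICCs exactly, so the single-judge value can be recovered. The sketch assumes the reported coefficient is a consistency ICC averaged over k = 3 judges. The function names are hypothetical.

```python
def average_from_single(icc_single, k):
    """Spearman-Brown step-up: reliability of the mean of k raters."""
    return k * icc_single / (1 + (k - 1) * icc_single)

def single_from_average(icc_average, k):
    """Inverse Spearman-Brown: reliability of one rater given the k-rater ICC."""
    return icc_average / (k - (k - 1) * icc_average)

single = single_from_average(0.80, 3)
print(round(single, 3))                          # 0.571
print(round(average_from_single(single, 3), 2))  # 0.8
```

Under that assumption a single judge would have an ICC of about 0.57, in the moderate band. This is why the "type" selection (single versus mean of k) must be reported alongside the coefficient.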
Article
Cyberbullying is a phenomenon investigated worldwide because of its effects on those involved. This study aims to describe the current state of research on cyberbullying in Chile. We used the PRISMA statement for systematic reviews, and the search was conducted in the Web of Science, Scopus, SciELO and EBSCO databases, initially yielding 27 articles. Ten articles were included in the final analysis. The characteristics of the research were analyzed, and a content analysis was performed based on suggestions for future studies. The findings indicate that most of the articles used a quantitative methodology, addressed the Metropolitan Region and incorporated individual and contextual variables. The content analysis identified two major categories related to the continuity of the studies in Chile. The importance of the findings and the need to continue studying the phenomenon are highlighted.
... To test intrarater reliability, a single observer (MW) analyzed data of 15 patients in a blinded manner twice separated by a 1-month interval. Agreement between the different raters as well as intrarater agreement was calculated with intraclass correlation coefficients (ICC) [mean and 95% confidence interval (CI)] using two-way mixed effects as model and absolute agreement as type [17]. ...
... and 0.90 (95% CI 0.74-0.97) indicating an overall excellent intra- and interrater reliability [17]. Classification of IRVF patterns was consistent between intra- and interrater assessments. ...
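Choosing absolute agreement rather than consistency matters whenever one rater, or one session, scores systematically higher than another. The sketch below assumes a complete subjects × raters matrix and the McGraw and Wong single-measure formulas. It contrasts ICC(C,1) with ICC(A,1) for two hypothetical raters whose scores differ by a constant 2 points. The absolute-agreement formula is computed the same way under a two-way mixed model as under a two-way random model; only the interpretation differs. The function names are illustrative.

```python
def mean_squares(x):
    """Two-way ANOVA mean squares (rows = subjects, columns = raters)."""
    n, k = len(x), len(x[0])
    grand = sum(map(sum, x)) / (n * k)
    ss_rows = k * sum((sum(r) / k - grand) ** 2 for r in x)
    ss_cols = n * sum((sum(r[j] for r in x) / n - grand) ** 2 for j in range(k))
    ss_total = sum((v - grand) ** 2 for r in x for v in r)
    ss_err = ss_total - ss_rows - ss_cols
    return ss_rows / (n - 1), ss_cols / (k - 1), ss_err / ((n - 1) * (k - 1)), n, k


def icc_consistency_single(x):
    """ICC(C,1): ignores constant differences between raters."""
    msr, _, mse, _, k = mean_squares(x)
    return (msr - mse) / (msr + (k - 1) * mse)


def icc_agreement_single(x):
    """ICC(A,1): the extra k*(MSC - MSE)/n term penalises systematic rater differences."""
    msr, msc, mse, n, k = mean_squares(x)
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


rater_a = [10, 12, 14, 16, 18]
rater_b = [a + 2 for a in rater_a]  # hypothetical rater who always scores 2 points higher
pairs = [list(p) for p in zip(rater_a, rater_b)]
print(round(icc_consistency_single(pairs), 3))  # 1.0
print(round(icc_agreement_single(pairs), 3))    # 0.833
```

Pearson's r for these pairs would also be 1.0. A correlation coefficient alone therefore cannot reveal a systematic shift between sessions or raters.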
Article
Full-text available
Objectives Renal venous congestion due to backward heart failure leads to disturbance of renal function in acute decompensated heart failure (ADHF). Whether decongestion strategies have an impact on renal venous congestion is unknown. The objective was to evaluate changes in intrarenal hemodynamics using intrarenal Doppler ultrasonography (IRD) in patients with heart failure with reduced ejection fraction (HFrEF) and ADHF undergoing recompensation. Methods Prospective observational study in patients with left ventricular ejection fraction (LV-EF) ≤ 35% hospitalized due to ADHF. IRD measurement was performed within the first 48 h of hospitalisation and before discharge. Decongestion strategies were based on clinical judgement according to heart failure guidelines. IRD was used to assess intrarenal venous flow (IRVF) pattern, venous impedance index (VII) and resistance index (RI). Laboratory analyses included plasma creatinine, eGFR and albuminuria. Results Thirty-five patients with ADHF and LV-EF ≤ 35% were included in the study. IRD could be performed in 30 patients at inclusion and discharge. At discharge, there was a significant reduction of VII from a median of 1.0 (0.86–1.0) to 0.59 (0.26–1.0) (p < 0.01) as well as improvement of IRVF pattern categories (p < 0.05) compared to inclusion. Albuminuria was significantly reduced from a median of 78 mg/g creatinine (39–238) to 29 mg/g creatinine (16–127) (p = 0.02), and the proportion of patients with normoalbuminuria increased (p = 0.01). Plasma creatinine and RI remained unchanged (p = 0.73; p = 0.43). Discussion This is the first study showing an effect of standard ADHF therapy on parameters of renal venous congestion in patients with HFrEF and ADHF. Doppler sonographic evaluation of renal venous congestion might provide additional information to guide decongestion strategies in patients with ADHF.
... The ICC ranges from 0 to 1, with higher values indicating greater agreement or consistency between the measurements. ICC values can be interpreted using commonly accepted guidelines, such as those proposed by several authors [20,21], where an ICC value of less than 0.40 is considered poor agreement, 0.40-0.59 fair agreement, 0.60-0.74 good agreement, and 0.75 or higher excellent agreement. ...
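The cutoffs quoted here differ from the 0.5/0.75/0.9 bands used elsewhere on this page, so it helps to make the scheme explicit. The sketch below encodes the four bands stated in this snippet. It assumes half-open intervals, so a value exactly at a cutoff falls into the higher band. The function and constant names are hypothetical.

```python
from bisect import bisect_right

# Cutoffs and labels as stated in the snippet: <0.40, 0.40-0.59, 0.60-0.74, >=0.75.
CUTOFFS = [0.40, 0.60, 0.75]
LABELS = ["poor", "fair", "good", "excellent"]


def icc_band(value, cutoffs=CUTOFFS, labels=LABELS):
    """Return the label of the half-open band [cutoff_i, cutoff_i+1) containing value."""
    return labels[bisect_right(cutoffs, value)]


print(icc_band(0.80))  # excellent
print(icc_band(0.59))  # fair
```

Under the 0.5/0.75/0.9 bands, the same 0.80 would be labelled "good" rather than "excellent". Reports should therefore name the scheme they use.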
... Prior to conducting planned t-test analyses, sample size calculation was performed using the Soper calculator [22] to determine the necessary sample size to achieve adequate statistical power. The anticipated effect size was estimated to be 0.5 based on previous research in the field [20,21]. With a desired alpha level of 0.05 and power of 0.95, the calculated minimum sample size was 64. ...
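The stated inputs can be checked against the standard normal-approximation formula n = 2(z_(1−α/2) + z_(1−β))² / d² per group. The sketch below assumes a two-sided, two-sample design with equal group sizes; the snippet does not state the design. It uses the standard-library `statistics.NormalDist`. Exact noncentral-t calculators return slightly larger numbers. The function name is illustrative.

```python
import math
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Normal-approximation sample size per group for a two-sided, two-sample t-test."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


print(n_per_group(0.5, power=0.80))  # 63
print(n_per_group(0.5, power=0.95))  # 104
```

Under these assumptions, d = 0.5 and α = 0.05 give roughly 63 to 64 per group at 80% power but more than 100 per group at 95% power. The design behind the reported minimum of 64 (one- vs two-sample, one- vs two-sided, per group vs total) would need to be stated for the figure to be reproducible.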
Article
Full-text available
Assessment of dynamic balance is typically completed through functional tests, such as the Timed Up and Go (TUG) test, which measures the time it takes for an individual to stand up from a chair, walk a set distance, turn around, and sit back down. This test has been validated in several countries. However, in the Portuguese population there is a gap in testing the reliability of this functional test in elderly people living either in the community or in nursing homes. Thus, this study aimed at examining the reliability of the TUG in a sample of Portuguese elderly people. An Intraclass Correlation Coefficient (ICC) analysis was performed between the TUG times at baseline (T1) and after 16 weeks (T2) in 38 males and 79 females aged between 60 and 92 years. The results showed acceptable ICC scores in both community-dwelling and nursing home resident elderly people at both time points. In addition, significant differences were found between these groups of older adults, showing that community-dwelling elderly people have greater agility and balance capacity than those living in nursing homes. Thus, the TUG test can be applied to Portuguese elderly people, both community-dwelling and nursing home residents.
... Intraclass correlation coefficient (ICC) estimates and their 95% confidence intervals (CIs) for volumetric measurements were calculated using SPSS version 26 (SPSS Inc, Chicago, IL) based on a mean-rating, absolute-agreement, 2-way mixed-effects model. ICC values were interpreted according to Koo et al. [14]. The nonparametric Kruskal-Wallis test was used to compare volumetric differences among the nine groups. ...
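The Kruskal-Wallis test ranks all observations jointly and compares the mean ranks across groups. The sketch below computes the H statistic in pure Python. It assumes independent groups and applies the usual correction for tied values. The resulting H is referred to a chi-square distribution with (number of groups − 1) degrees of freedom; that step is left to a statistics library here. The function name and the example values are made up for illustration.

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = sorted(
        ((value, g) for g, group in enumerate(groups) for value in group),
        key=lambda item: item[0],
    )
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0  # sum of (t^3 - t) over runs of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j + 2) / 2  # 1-based ranks i+1 .. j+1 share their mean
        for _, g in pooled[i:j + 1]:
            rank_sums[g] += avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    h = 12 / (n * (n + 1)) * sum(
        r * r / len(group) for r, group in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are tied")
    return h / correction


print(round(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]), 3))  # 7.2
```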
Article
Full-text available
Purpose The Response Assessment in Neuro-Oncology Brain Metastases (RANO-BM) working group proposed a guide for assessing treatment response in brain metastases (BMs) using the longest diameter; however, despite recognizing that many patients with BMs have sub-centimeter lesions, the group classified these lesions as unmeasurable because of issues with repeatability and interpretation. In light of the RANO-BM recommendations, we aimed to correlate linear and volumetric measurements of sub-centimeter BMs on contrast-enhanced MRI using intelligent automation software. Methods In this retrospective study, patients with BMs scanned with MRI between January 1, 2018, and December 31, 2021, were screened. Inclusion criteria were: (1) at least one sub-centimeter BM with an integer-millimeter longest diameter noted in the MRI report; (2) age of at least 18 years; (3) an available pre-treatment three-dimensional T1-weighted spoiled gradient-echo MRI scan. Screening was terminated when there were 20 lesions in each group. Lesion volumes were measured by two readers with the help of the intelligent automation software Jazz (AI Medical, Zollikon, Switzerland). The Kruskal-Wallis test was used to compare volumetric differences. Results Our study included 180 patients. The agreement between the two readers for volumetric measurements was excellent. The volumes of the following groups were not significantly different: 1–2 mm, 1–3 mm, 1–4 mm, 2–3 mm, 2–4 mm, 3–4 mm, 3–5 mm, 4–5 mm, 5–6 mm, 5–7 mm, 6–7 mm, 6–8 mm, 6–9 mm, 7–8 mm, 7–9 mm, 8–9 mm. Conclusion Our findings indicate that the largest diameter of a lesion may not accurately represent its volume. Additional research is required to determine which method is superior for measuring radiologic response to therapy and which parameter correlates best with clinical improvement or deterioration.
... The total-scale Cronbach's alpha was 0.976, very similar to the result of the original study testing the SESPI's reliability [2]. The ICC for the total scale was 0.975, indicating very good interobserver reliability [26]; in fact, some authors consider values above 0.81 almost perfect [27]. ...
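Cronbach's alpha compares the sum of the item variances with the variance of the total score: α = k/(k − 1) · (1 − Σσᵢ²/σₜ²). The sketch below assumes complete responses arranged as respondents × items and uses sample variances throughout. The ratio is the same with population variances, as long as one convention is used consistently. The function name and the data are illustrative.

```python
from statistics import variance


def cronbach_alpha(responses):
    """responses: list of rows, one per respondent, each holding k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Three respondents answering two items (illustrative values).
print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 3))  # 0.667
```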
Article
Full-text available
Purpose To adapt the Scale for the Evaluation of Staff Patient Interactions in Progress Notes to Spanish and to test its psychometric properties. Design and methods The study was conducted in two phases: (1) adaptation of the instrument to Spanish following the Standards for Educational and Psychological Testing; (2) a psychometric study in a sample of mental health nurses. Findings The Cronbach’s alpha values were 0.97 for the total scale and 0.81 to 0.83 for each dimension. The inter-rater reliability values were between 0.94 and 0.97. Practice implications The scale is a reliable tool for assessing nurses’ clinical notes in relation to the quality of nurse-patient interactions.
... An intraclass correlation coefficient (ICC) of 0.90 was considered acceptable for evaluating the ABCC scale as sufficiently reproducible. 16,38 Statistical significance was set at P ≤ .05. All statistical analyses were performed using IBM SPSS version 25.0 (IBM Corp). ...
Article
Purpose: The Assessment of Burden of Chronic Conditions (ABCC) tool was developed to improve care by facilitating shared decision making and self-management. It assesses and visualizes the experienced burden of 1 or multiple chronic conditions and integrates it in daily care. The aim of this study is to evaluate whether the ABCC scale is valid and reliable in people with chronic obstructive pulmonary disease (COPD), asthma, or type 2 diabetes (T2D). Methods: The Saint George Respiratory Questionnaire (SGRQ), the Standardized Asthma Quality of Life Questionnaire (AQLQ-S), and the Audit of Diabetes Dependent Quality of Life Questionnaire (ADDQoL19) were compared with the ABCC scale to assess convergent validity. The internal consistency was evaluated using Cronbach's α. Test-retest reliability was evaluated at a 2-week interval. Results: A total of 65 people with COPD, 62 with asthma, and 60 with T2D were included. The ABCC scale correlated, in accordance with hypotheses, with the SGRQ (75% of correlations ≥0.7), AQLQ-S (100%), and ADDQoL19 (75%). The ABCC scale was internally consistent with a Cronbach's α of 0.90, 0.92, and 0.91 for the total score for people with COPD, asthma, and T2D, respectively. The ABCC scale had a good test-retest reliability with an intraclass correlation coefficient of 0.95, 0.93, and 0.95 for people with COPD, asthma, and T2D, respectively. Conclusions: The ABCC scale is a valid and reliable questionnaire that can be used within the ABCC tool for people with COPD, asthma, or T2D. Future research should indicate whether this applies to people with multimorbidity, and what the effects and experiences are upon clinical use.
... Test-retest reliability was determined with r and the intraclass correlation coefficient (ICC; two-way mixed effects, single measure, absolute agreement). An ICC of 0.75 indicates good test-retest reliability (Koo & Li, 2016). To examine the ability of the scale to discriminate between 'normal to mild' and 'moderate to extremely severe' levels of anxiety/depression, t-tests with SABAS score as the outcome measure were used. ...
Article
Full-text available
The present study evaluated the psychometric properties of the Serbian Smartphone Application-Based Addiction Scale (SABAS) and the original English version of the same scale administered to a Serbian-speaking sample. In Study 1, 599 participants completed the Serbian SABAS, with 189 having both test and retest data. Results suggested good internal consistency (α = .81) and test–retest reliability (ICC = .795, p < .001, 95% CI [.731, .844], test–retest r = .803) of the scale. Convergent validity of the SABAS was evaluated through correlations with the Smartphone Addiction Scale–Short Version (SAS-SV), as well as with anxiety, depression, worry, duration, and purpose of smartphone use. Divergent validity of the SABAS was evaluated by comparing the correlations with entertainment and productive smartphone use. The modified CFA model showed an acceptable fit (χ²(8) = 25.53, p = .001, CFI = .961, TLI = .926, RMSEA = .096, SRMR = .042), confirming the unidimensionality of the SABAS. In the second study, the English SABAS, completed by 335 non-native speakers from Serbia, also showed a good fit of the single-factor model (χ²(9) = 12.56, p = .184, CFI = .990, TLI = .984, RMSEA = .036, SRMR = 0.026), and good psychometric features. Based on the study’s findings, the Serbian version of SABAS is a reliable and valid measure for screening the risk of smartphone addiction. Moreover, the English version can be used among Serbian non-native speakers of English.
... A Bland-Altman plot was also created to assess the validity of the Kinect by estimating a 95% confidence interval for the difference in means [30]. The ICC estimates were based on a one-way, consistent, single random effects model, where values below 0.5 indicated poor reliability, values between 0.5 and 0.75 indicated moderate reliability, values between 0.75 and 0.9 indicated good reliability, and values above 0.90 indicated excellent reliability [31]. The significance level for all tests was set at p < 0.05, and all reported coefficients of determination (r²) were statistically significant. ...
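In the usual Bland-Altman formulation, the limits of agreement are the mean paired difference (the bias) ± 1.96 standard deviations of the differences. These limits describe where most individual differences fall, which is not the same thing as a confidence interval for the mean difference. The sketch below assumes paired readings from two methods and approximately normal differences. The function name and the sample values are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b, z=1.96):
    """Return (bias, lower_limit, upper_limit) for paired measurements."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    return bias, bias - z * sd, bias + z * sd


bias, lower, upper = bland_altman([10.0, 12.0, 14.0], [9.0, 11.0, 12.0])
print(f"bias = {bias:.2f}, LoA = {lower:.2f} to {upper:.2f}")  # bias = 1.33, LoA = 0.20 to 2.46
```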
Article
Full-text available
Unilateral spatial neglect is a common sensorimotor disorder following the occurrence of a stroke, for which prismatic adaptation is a promising rehabilitation method. However, the use of prisms for rehabilitation often requires the use of specific equipment that may not be available in clinics. To address this limitation, we developed a new software package that allows for the quantification and rehabilitation of unilateral spatial neglect using immersive virtual reality. In this study, we compared the effects of virtual and real prisms in healthy subjects and evaluated the performance of our virtual reality tool (HTC Vive) against a validated motion capture tool. Ten healthy subjects were randomly exposed to virtual and real prisms, and measurements were taken before and after exposure. Our findings indicate that virtual prisms are at least as effective as real prisms in inducing aftereffects (4.39° ± 2.91° with the virtual prisms compared to 4.30° ± 3.49° with the real prisms), but that these effects were not sustained beyond 2 h regardless of exposure modality. The virtual measurements obtained with our software showed excellent metrological qualities (ICC = 0.95, error = 0.52° ± 1.18°), demonstrating its validity and reliability for quantifying deviation during pointing movements. Overall, our results suggest that our virtual reality software (Virtualis, Montpellier, France) could provide an easy and reliable means of quantifying and rehabilitating spatial neglect. Further validation of these results is required in individuals with unilateral spatial neglect.
... The calculation of the ICC was based on a two-way mixed model with a consistency definition, reporting single measures. An ICC value below 0.5 is considered to indicate poor reliability, between 0.5 and 0.75 moderate reliability, between 0.75 and 0.9 good reliability, and any value above 0.9 excellent reliability (28). ...
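Whether single or average measures are reported can change the number substantially. With the ANOVA-based estimators, a single-measure ICC and its average-of-k counterpart computed from the same data are linked by the Spearman-Brown formula: ICC(k) = k·ICC(1) / (1 + (k − 1)·ICC(1)). The sketch below converts in both directions. The function names are hypothetical.

```python
def step_up(icc_single: float, k: int) -> float:
    """Average-of-k-raters reliability implied by a single-rater ICC (Spearman-Brown)."""
    return k * icc_single / (1 + (k - 1) * icc_single)


def step_down(icc_average: float, k: int) -> float:
    """Single-rater ICC implied by an average-of-k-raters ICC (inverse Spearman-Brown)."""
    return icc_average / (k - (k - 1) * icc_average)


print(round(step_up(0.5, 2), 3))    # 0.667
print(round(step_down(0.9, 3), 3))  # 0.75
```

For example, a single-measure ICC of 0.5 becomes about 0.67 when two raters are averaged. An average-measure value should therefore not be read as if it described a single rater.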
Article
Full-text available
Introduction In mechanically ventilated adults, the diaphragm thickening fraction (dTF) measured by ultrasound is used to predict extubation success. Whether dTF can also predict extubation success in children is unclear. Aim To investigate the association between dTF and extubation success in children. Second, to assess diaphragm thickness during ventilation and the correlation between dTF, diaphragm thickness (Tdi), age, and body surface area (BSA). Method Prospective observational cohort study in children aged 0–18 years with expected invasive ventilation for >48 h. Ultrasound was performed on day 1 after intubation (baseline), day 4, day 7, day 10, at pre-extubation, and within 24 h after extubation. The primary outcome was the association between pre-extubation dTF and extubation success. Secondary outcome measures were end-inspiratory Tdi, end-expiratory Tdi, and atrophy, defined as <10% decrease of end-expiratory Tdi versus baseline at pre-extubation. Correlations were calculated with Spearman correlation coefficients. Inter-rater reliability was calculated with the intraclass correlation coefficient (ICC). Results Fifty-three patients, with median age 3.0 months (IQR 0.1–66.0) and median duration of invasive ventilation of 114.0 h (IQR 55.5–193.5), were enrolled. Median dTF before extubation with Pressure Support 10 above 5 cmH₂O was 15.2% (IQR 9.7–19.3). Extubation failure occurred in six children, three of whom were re-intubated and three of whom then received non-invasive ventilation. There was no significant association between dTF and extubation success; OR 0.33 (95% CI; 0.06–1.86). Diaphragmatic atrophy was observed in 17/53 cases, in three of which extubation failure occurred. Children in the extubation failure group were younger: 2.0 months (IQR 0.81–183.0) vs. 3.0 months (IQR 0.10–48.0); p = 0.045. At baseline, pre-extubation and post-extubation, there was no significant correlation between age and BSA on the one hand and dTF, Tdi-insp and Tdi-exp on the other hand. 
The ICC representing the level of inter-rater reliability between the two examiners performing the ultrasounds was 0.994 (95% CI 0.970–0.999). The ICC of the inter-rater reliability between the raters in 36 paired assessments was 0.983 (95% CI 0.974–0.990). Conclusion There was no significant association between thickening fraction of the diaphragm and extubation success in ventilated children.
... The intraclass correlation coefficient (ICC) was used to evaluate the agreement between the experts (two-way random effects, absolute agreement, multiple raters average, ICC (2,k)) 23 . An ICC<0.5 was considered as poor, ≥ 0.5 and <0.75 as moderate, ≥ 0.75 and <0.9 as good, and ≥ 0.9 as excellent agreement 23 . Hypothesis testing was considered significant at p-value <0.05 (two-sided). ...
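In Shrout and Fleiss notation, ICC(2,k) is the two-way random, absolute-agreement, average-of-k-raters form. The sketch below computes it alongside the single-rater ICC(2,1) from ANOVA mean squares. It assumes a complete subjects × raters matrix. The helper names and the 6 × 4 ratings matrix are illustrative and are not the preprint's data.

```python
def mean_squares(x):
    """Two-way ANOVA mean squares (rows = subjects, columns = raters)."""
    n, k = len(x), len(x[0])
    grand = sum(map(sum, x)) / (n * k)
    ss_rows = k * sum((sum(r) / k - grand) ** 2 for r in x)
    ss_cols = n * sum((sum(r[j] for r in x) / n - grand) ** 2 for j in range(k))
    ss_total = sum((v - grand) ** 2 for r in x for v in r)
    ss_err = ss_total - ss_rows - ss_cols
    return ss_rows / (n - 1), ss_cols / (k - 1), ss_err / ((n - 1) * (k - 1)), n, k


def icc2(x):
    """Two-way random, absolute agreement: returns (ICC(2,1), ICC(2,k))."""
    msr, msc, mse, n, k = mean_squares(x)
    single = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    average = (msr - mse) / (msr + (msc - mse) / n)
    return single, average


ratings = [  # rows = subjects, columns = raters (illustrative values)
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
single, average = icc2(ratings)
print(f"ICC(2,1) = {single:.2f}, ICC(2,k) = {average:.2f}")  # ICC(2,1) = 0.29, ICC(2,k) = 0.62
```

Large systematic differences between raters (the column mean square) pull both absolute-agreement values down. In this matrix, that effect alone separates ICC(2,1) ≈ 0.29 from the much higher consistency value.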
Preprint
Full-text available
Introduction ChatGPT, a novel AI-based chatbot, has sparked considerable interest in the scientific community. Complex central nervous system (CNS) tumour cases require multidisciplinary expert recommendations that incorporate multimodal disease information. Thus, the potential of ChatGPT to integrate comprehensive treatment information may be of tremendous benefit for CNS tumour decision-making. We had a panel of CNS tumour experts evaluate ChatGPT's recommendations for glioma management. Methods We randomly selected 10 patients with primary CNS gliomas discussed at our institution's Tumour Board. Patients' clinical status, surgical, imaging, and immuno-pathology-related information was provided to ChatGPT and seven CNS tumour experts. The chatbot was asked to give the most likely diagnosis, the adjuvant treatment choice, and the regimen while considering the patient's functional status. The experts rated the AI-based recommendations from 0 (complete disagreement) to 10 (complete agreement). The intraclass correlation coefficient (ICC) was used to measure inter-rater agreement. Results Eight patients (80%) met the criteria for glioblastoma and two (20%) had low-grade gliomas. The experts rated the quality of ChatGPT recommendations as poor for diagnosis (median 3, IQR 1-7.8, ICC 0.9, 95% CI 0.7-1.0), good for treatment recommendation (7, IQR 6-8, ICC 0.8, 95% CI 0.4-0.9), good for therapy regimen (7, IQR 4-8, ICC 0.8, 95% CI 0.5-0.9), moderate for functional status consideration (6, IQR 1-7, ICC 0.7, 95% CI 0.3-0.9), and moderate for overall agreement with the recommendations (5, IQR 3-7, ICC 0.7, 95% CI 0.3-0.9). No differences were observed between the glioblastoma and low-grade glioma ratings. Conclusions ChatGPT performed poorly in classifying glioma types but was good for adjuvant treatment recommendations as evaluated by CNS Tumour Board experts. 
Even though ChatGPT lacks the precision to replace expert opinion, it may become a promising tool to supplement experts, especially in low-resource settings.
... ICC values less than 0.5 are indicative of poor reliability, values between 0.5 and 0.75 indicate moderate reliability, values between 0.75 and 0.9 indicate good reliability and values greater than 0.90 indicate excellent reliability. 22 ...
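The cited guideline (Koo & Li, 2016) bases these labels on the 95% confidence interval of the ICC rather than on the point estimate alone. The sketch below maps both CI bounds to bands and reports the range they span. It assumes half-open bands (≥ 0.5 moderate, ≥ 0.75 good, ≥ 0.9 excellent), which is one reading of "between". The function names are hypothetical.

```python
def koo_li_band(icc: float) -> str:
    """Band for a single ICC value, using half-open intervals (assumed reading)."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.9:
        return "good"
    return "excellent"


def classify_ci(lower: float, upper: float) -> str:
    """Describe a 95% CI by the bands its two bounds fall into."""
    if lower > upper:
        raise ValueError("lower bound exceeds upper bound")
    low_band, high_band = koo_li_band(lower), koo_li_band(upper)
    return low_band if low_band == high_band else f"{low_band} to {high_band}"


print(classify_ci(0.93, 0.98))  # excellent
print(classify_ci(0.74, 0.97))  # moderate to excellent
```

An interval of 0.93 to 0.98 stays within one band. An interval of 0.74 to 0.97 spans three bands, even though both point estimates may sit in the "excellent" range.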
Article
Background: Repetitive transcranial magnetic stimulation (rTMS) is recommended in Canadian guidelines as a first-line treatment for major depressive disorder. With the shift towards competency-based medical education, it remains unclear how to determine when a resident is considered competent in applying knowledge of rTMS to patient care. Given inconsistencies between postgraduate training programmes with regard to training requirements, defining competencies will improve the standard of care in rTMS delivery. Objective: The goal of this study was to develop competencies for rTMS that can be implemented into a competency-based training curriculum in postgraduate training programmes. Methods: A working group drafted competencies for postgraduate psychiatry trainees. Fourteen rTMS experts from across Canada were invited to participate in the modified Delphi process. Results: Ten experts participated in all three rounds of the modified Delphi process. A total of 20 items reached a consensus. There was improvement in the Cronbach's alpha over the rounds of modified Delphi process (Cronbach's alpha increased from 0.554 to 0.824) suggesting improvement in internal consistency. The intraclass correlation coefficient (ICC) increased from 0.543 to 0.805 suggesting improved interrater agreement. Conclusions: This modified Delphi process resulted in expert consensus on competencies to be acquired during postgraduate medical education programmes where a learner is training to become competent as a consultant and/or practitioner in rTMS treatment. This is a field that still requires development, and it is expected that as more evidence emerges the competencies will be further refined. These results will help the development of other curricula in interventional psychiatry.
... 26 Spearman's rho correlations required a sample size of 26 with f = 0.5 (large), 26 and the reliability study required a sample size of at least 30. 27 Finally, a multivariate regression analysis required a sample size of 31 with f = 0.35 (large). 26 In this study, a sample size of 30 participants was deemed appropriate because the estimated sample sizes varied, and a smaller sample may lead to false negatives whereas, conversely, a larger sample may lead to false positives. ...
... We used the following categories to assess reliability: values less than 0.5 were indicative of poor reliability; values between 0.5 and 0.75 indicated moderate reliability; values between 0.75 and 0.9 indicated good reliability and values greater than 0.90 indicated excellent reliability. 23 Device readings were further compared with the SpotOn temperature using Bland-Altman analysis, assessing the magnitude and direction of mean differences as well as the width of the limits of agreement (LOA). Agreement within the same instrument was assessed using ICCs for IR gun by location (eyes, nose and lips) and target area (forehead or temple at different distances) and for tympanic measurement (left and right). ...
Article
During the coronavirus 2019 (COVID-19) pandemic, the implementation of non-contact infrared thermometry (NCIT) became an increasingly popular method of screening body temperature. However, data on the accuracy of these devices and the standardisation of their use are limited. In the current study, the body temperature of non-febrile volunteers was measured using infrared (IR) thermography, IR tympanic thermometry and IR gun thermometry at different facial feature locations and distances and compared with SpotOn core-body temperature. Poor agreement was found between all IR devices and SpotOn measurements (intra-class correlation coefficient <0.8). Bland-Altman analysis showed the narrowest limits of agreement with the IR gun at 3 cm from the forehead (bias = 0.19°C, limits of agreement (LOA): -0.58°C to 0.97°C) and the widest with the IR gun at the nose (bias = 1.40°C, LOA: -1.15°C to 3.94°C). Thus, our findings challenge the established use of IR thermometry devices within hospital settings without adequate standard operating procedures to reduce operator error.
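Reported limits of agreement can be converted back into the implied bias and standard deviation of the differences. This makes results from different devices or sites easier to compare. The sketch below assumes the limits were computed as bias ± 1.96 SD and applies the conversion to the two intervals quoted in the abstract. The function name is hypothetical.

```python
def from_limits(lower: float, upper: float, z: float = 1.96):
    """Recover (bias, SD of differences) from Bland-Altman limits of agreement."""
    if lower > upper:
        raise ValueError("lower limit exceeds upper limit")
    bias = (lower + upper) / 2
    sd = (upper - lower) / (2 * z)
    return bias, sd


quoted_limits = {
    "IR gun, forehead at 3 cm": (-0.58, 0.97),
    "IR gun, nose": (-1.15, 3.94),
}
for site, (lower, upper) in quoted_limits.items():
    bias, sd = from_limits(lower, upper)
    print(f"{site}: bias ~ {bias:.2f} degC, SD ~ {sd:.2f} degC")
```

The recovered biases (about 0.19 to 0.20 °C and 1.40 °C) match the reported values up to rounding. The implied SDs of the differences are roughly 0.40 °C and 1.30 °C.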
... Regarding procedures performed by an independent researcher (inter-observer reliability), reliability values were: degree centrality (ICC = 0.86); closeness centrality (ICC = 0.82); degree prestige (ICC = 0.96); and proximity prestige (ICC = 0.87). Both assessments revealed good to excellent reliability (Koo & Li, 2016). ...
Article
Prior research has suggested that anthropometric variation among youth athletes at various stages of maturation is relevant, and prior studies of youth players' soccer skills have failed to consider their interdependent interactions during play. Accordingly, to address both of these separate research omissions, we aimed in this study to analyze the relationships between young (U-13 and U-15 groups) soccer players' bone age and body size indicators and centrality measures of their pass interactions during small-sided games. We included 81 young athletes (M age = 14.4, SD = 1.1 years) from whom we took anthropometric measurements of body mass, height, and trunk-cephalic height and obtained their bone age using the Tanner-Whitehouse 3 classification method. We also filmed small-sided games in the goalkeeper/three-player (GK3-3GK) format to analyze the centrality of their passing actions on the following measures: degree centrality, closeness centrality, degree prestige, and proximity prestige. There were no group differences in the prominence of passing actions across these three measures (mean t = −3.13; p > .05). Canonical correlations of these relationships were significant only in the U-13 group, in which centrality in passing actions was related to body size (r = 0.71; R² = 0.21; Λ = 0.28; p = .03). U-13 players who were physically larger and who presented higher bone age showed centralized main passing actions.
... Intraclass correlations (ICCs) and paired-samples t-tests were performed to compare the scores between the VTC and FTF conditions. We regarded ICC values as follows based on Koo et al [24]: < 0.50, poor; between 0.50 and 0.75, fair; between 0.75 and 0.90, good; above 0.90, excellent. Kappa coefficients and the Wilcoxon signed-rank test were used for digit span and tapping span because the range of scores was narrow. ...
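For measures with a narrow score range such as digit span, the snippet uses kappa instead of ICC. The sketch below implements unweighted Cohen's kappa, κ = (pₒ − pₑ)/(1 − pₑ), for two paired lists of categorical scores. A weighted kappa may suit ordinal spans better. The function name and the example scores are made up for illustration.

```python
from collections import Counter


def cohens_kappa(rater1, rater2):
    """Unweighted Cohen's kappa for two equally long lists of category labels."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("need two non-empty lists of equal length")
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    # Chance agreement from each rater's marginal category proportions.
    expected = sum(c1[c] * c2[c] for c in c1.keys() | c2.keys()) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical digit-span scores from the same four people under two conditions.
print(cohens_kappa([5, 5, 6, 6], [5, 6, 6, 6]))  # 0.5
```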
Preprint
Full-text available
Background To determine the feasibility and reliability of previously unvalidated remote cognitive function tests in Japan using common information and communication technology (ICT) devices, software, and video teleconference (VTC) system compared with face-to-face (FTF) assessment. Methods The sample consisted of 26 participants from senior citizens clubs and an employment service center in Sapporo Japan, including 11 females and 15 males (age averaged 78.6 ± 6.8 years). Tests included the RCPM, Story recall, 10/36 spatial recall, selective reminding test, SDMT, PASAT, FAB, TMT-A, TMT-B, visual cancellation task, digit span, tapping span. The experimental design was a counter-balanced cross-over randomized controlled trial. Intraclass correlations (ICCs), paired-samples t-tests, Cohen's Kappa (κ) coefficients, and Wilcoxon signed-rank test were calculated to compare the scores between VTC and FTF assessments. Results All ICCs were significant and ranged from 0.47 (RCPM time) to 0.92 (RCPM score and PASAT), with a mean ICC of 0.75. Digit span using Cohen's Kappa (κ) coefficient was significant, but the tapping span was not. Paired samples t-test showed statistically significant differences in SDMT, RCPM time, and cancellation time. Conclusions The results suggest that remote video conference-based neuropsychological tests even using familiar devices and software may be able to assess a wide range of cognitive functions in the Japanese older population. As for the processing speed tasks, we need to create our own standards for the remote condition. For the tapping span, we should consider increasing the number of trials.
... The CV, SEM and smallest detectable difference (SDD) were calculated in line with similar research (Dos'santos, . ICCs were interpreted as follows (Koo & Li, 2016): poor (<0.50), moderate (0.50-0.75), good (0.75-0.90), and excellent (>0.90). Minimum acceptable reliability was defined as an ICC >0.7 and CV < 15% (Baumgartner & Chung, 2014). ...
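SEM and SDD are commonly derived from the ICC as SEM = SD·√(1 − ICC) and SDD = 1.96·√2·SEM. Some studies instead compute SEM from the ANOVA error term, so the convention matters. The sketch below follows the ICC-based convention. CV is omitted because its definition varies between studies. The function names and input values are hypothetical.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement from the between-subject SD and the ICC."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must be between 0 and 1 for this formula")
    return sd * math.sqrt(1.0 - icc)


def sdd_from_sem(sem: float, z: float = 1.96) -> float:
    """Smallest detectable difference; sqrt(2) because a change involves two measurements."""
    return z * math.sqrt(2.0) * sem


sem = sem_from_icc(sd=10.0, icc=0.91)
print(f"SEM = {sem:.2f}, SDD = {sdd_from_sem(sem):.2f}")  # SEM = 3.00, SDD = 8.32
```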
Article
Full-text available
The objective of the study was to evaluate the effectiveness of the Safe Landing (SL), a 6-week technique-modification (TM) programme, on cutting and jump-landing movement quality in football players. In a non-randomized design, 32 male semi-professional football players from two Spanish clubs participated in the study: one club served as the control group (CG, n = 11), while the other performed the SL (n = 15). Performance and movement quality of drop vertical jump and 70º change of direction (COD70) were evaluated through 2D video footage pre- and post-intervention. In these tasks, the Landing Error Scoring System for first (LESS1) and second (LESS2) landings and the Cutting Movement Assessment Score (CMAS) were used to assess movement quality. Pre-to-post changes and baseline-adjusted ANCOVA were used. Medium-to-large differences between groups at post-test were shown in CMAS, LESS1 and LESS2 (p < 0.082, η² = 0.137–0.272), with small-to-large improvements in SL (p < 0.046, ES = 0.546–1.307) and CG remaining unchanged (p > 0.05) pre-to-post. In COD70 performance, large differences were found between groups (p < 0.047, η² = 0.160–0.253), with SL maintaining performance (p > 0.05, ES = 0.039–0.420), while CG moderately decreased performance (p = 0.024, ES = 0.753) pre-to-post. The SL is a feasible and effective TM program to improve movement quality, and thus potentially lower injury risk, in cutting and landing, while not negatively affecting performance.
... The interpretation was made based on Colton rules [19]. To assess inter-rater agreement, we computed the intraclass correlation coefficient using a two-way mixed model, absolute-agreement type, for average measures, with its 95% confidence interval (CI) [20]. The significance level (alpha) was set at 0.05. ...
Article
Full-text available
Introduction: This research aims to describe a progressive pattern of ultrasound placental remodeling in patients with a history of SARS-CoV-2 infection during pregnancy. Materials and Methods: This was a longitudinal cohort study that enrolled 23 pregnant women with a history of mild SARS-CoV-2 infection during the current pregnancy. Four obstetricians analyzed placental ultrasound images from different gestational ages following COVID infection and identified the presence and degree of remodeling. We assessed the inter-rater agreement and the intraclass correlation coefficients. Pathology workup included placental biometry, macroscopic and microscopic examination. Results: Serial ultrasound evaluation of the placental morphology revealed a progressive pattern of placental remodeling starting from 30–32 weeks of gestation towards term, occurring approximately 8–10 weeks after the SARS-CoV-2 infection. Placental changes (the “starry sky” appearance and the “white line” along the basal plate) were identified in all cases. Most placentas presented normal subchorionic perivillous fibrin depositions and focal stem villi perivillous fibrin deposits. Focal calcifications were described in only 13% of the cases. Conclusions: We identified two ultrasound signs of placental remodeling as potential markers of placental viral shedding following mild SARS-CoV-2. The most likely pathology correspondence for the imaging aspect is perivillous and, respectively, massive subchorionic fibrin deposits identified in most cases.
Preprint
Full-text available
Background Tissue Doppler-derived left ventricular systolic velocity (mitral S’) has shown excellent correlation with left ventricular ejection fraction (LVEF) in non-critically ill patients. However, their correlation in septic patients remains poorly understood, and its impact on mortality is undetermined. We investigated the relationship between mitral S’ and LVEF in a large cohort of critically ill septic patients. Methods We conducted a retrospective cohort study between 01/2011 and 12/2020. All adult patients (≥ 18 years) who were admitted to the medical intensive care unit (MICU) with sepsis and septic shock and who underwent a transthoracic echocardiogram (TTE) within 72 hours were included. Pearson correlation was used to assess the correlation between average mitral S’ (mitral annular systolic velocity, MASV) and LVEF. We also assessed the association between mitral S’, LVEF, and 28-day mortality. Results 2,519 patients met the inclusion criteria. The study population included 1,216 (48.3%) males with a median age of 64 (IQR: 53–73), and a median APACHE III score of 85 (IQR: 67, 108). The median septal, lateral, and average MASV were 8 cm/sec (IQR: 6.0, 10.0), 9 cm/sec (IQR: 6.0, 10.0), and 8.5 cm/sec (IQR: 6.5, 10.5), respectively. MASV was noted to have a moderate correlation with LVEF (r = 0.46). In multivariable logistic regression analysis, average MASV was associated with an increase in both 28-day ICU and in-hospital mortality, with odds ratio (OR) 1.04 (95% CI: 1.01–1.08, p = 0.02) and OR 1.04 (95% CI: 1.01–1.07, p = 0.02), respectively. Conclusion Even though MASV and LVEF may be related, they are not interchangeable and were only moderately correlated in this study. LVEF has a U-shaped relationship with 28-day ICU mortality, while MASV has a linear one. An increase in average mitral S’ was associated with higher 28-day mortality.
Article
PURPOSE: This study aims to evaluate the value of X-ray and magnetic resonance imaging (MRI) models based on radiomics features for predicting the response of extremity high-grade osteosarcoma to neoadjuvant chemotherapy (NAC). MATERIALS AND METHODS: A retrospective dataset was assembled involving 102 consecutive patients (training dataset, n = 72; validation dataset, n = 30) diagnosed with extremity high-grade osteosarcoma. The clinical features of age, gender, pathological type, lesion location, bone destruction type, size, alkaline phosphatase (ALP), and lactate dehydrogenase (LDH) were evaluated. Imaging features were extracted from X-ray and multi-parametric MRI (T1-weighted, T2-weighted, and contrast-enhanced T1-weighted) data. Features were selected using a two-stage process comprising minimal-redundancy-maximum-relevance (mRMR) and least absolute shrinkage and selection operator (LASSO) regression. Logistic regression (LR) modelling was then applied to establish models based on clinical, X-ray, and multi-parametric MRI data, as well as combinations of these datasets. Each model was evaluated using sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) with a 95% confidence interval (CI). RESULTS: The AUCs of the 5 models using clinical data, X-ray radiomics, MRI radiomics, X-ray plus MRI radiomics, and the combination of all were 0.760 (95% CI: 0.583–0.937), 0.706 (95% CI: 0.506–0.905), 0.751 (95% CI: 0.572–0.930), 0.796 (95% CI: 0.629–0.963), and 0.828 (95% CI: 0.676–0.980), respectively. The DeLong test showed no significant difference between any pair of models (p > 0.05). The combined model yielded higher performance than the clinical and radiomics models, as demonstrated by net reclassification improvement (NRI) and integrated discrimination improvement (IDI) values. This combined model was also found to be clinically useful in decision curve analysis (DCA). 
CONCLUSION: Modelling based on combination of clinical and radiomics data improves the ability to predict pathological responses to NAC in extremity high-grade osteosarcoma compared to the models based on either clinical or radiomics data.
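The IDI reported above (integrated discrimination improvement) can be illustrated with a minimal sketch. It assumes the commonly used definition: the gain in mean predicted probability among patients with the event minus the gain among patients without it. The function name `idi` and the probabilities are hypothetical and not taken from the study.

```python
from statistics import mean


def idi(p_old, p_new, outcome):
    """Integrated discrimination improvement of a new risk model over an old one.

    p_old, p_new: predicted probabilities from the two models for the same patients.
    outcome: 1 for an event (e.g. good response to NAC), 0 otherwise.
    """
    events = [i for i, y in enumerate(outcome) if y == 1]
    non_events = [i for i, y in enumerate(outcome) if y == 0]
    # Change in mean predicted risk among patients who had the event ...
    gain_events = mean(p_new[i] for i in events) - mean(p_old[i] for i in events)
    # ... minus the change in mean predicted risk among those who did not.
    gain_non_events = mean(p_new[i] for i in non_events) - mean(p_old[i] for i in non_events)
    return gain_events - gain_non_events


# Hypothetical predictions for four patients (two events, two non-events).
print(round(idi([0.6, 0.5, 0.4, 0.3], [0.8, 0.7, 0.3, 0.2], [1, 1, 0, 0]), 2))  # 0.3
```

A positive IDI means the new model pushes predicted risk up for events and down for non-events on average; identical models give an IDI of zero.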
Article
Objectives: To evaluate image quality, diagnostic acceptability, and lesion conspicuity in abdominal dual-energy CT (DECT) using deep learning image reconstruction (DLIR) compared with adaptive statistical iterative reconstruction-V (ASiR-V) at 50% blending (AV-50), and to identify potential factors affecting lesion conspicuity. Methods: Portal-venous phase abdominal DECT scans of 47 participants with 84 lesions were prospectively included. The raw data were reconstructed into virtual monoenergetic images (VMI) at 50 keV using filtered back-projection (FBP), AV-50, and DLIR at low (DLIR-L), medium (DLIR-M), and high strength (DLIR-H). A noise power spectrum (NPS) was generated. CT number and standard deviation values at eight anatomical sites were measured. Signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR) values were calculated. Five radiologists assessed image quality in terms of image contrast, image noise, image sharpness, artificial sensation, and diagnostic acceptability, and evaluated lesion conspicuity. Results: DLIR reduced image noise further than AV-50 (p < 0.001) while better preserving the average NPS frequency (p < 0.001). DLIR maintained CT number values (p > 0.99) and improved SNR and CNR values compared with AV-50 (p < 0.001). DLIR-H and DLIR-M received higher ratings than AV-50 in all image quality analyses (p < 0.001). DLIR-H provided significantly better lesion conspicuity than AV-50 and DLIR-M regardless of lesion size, relative CT attenuation to surrounding tissue, or clinical purpose (p < 0.05). Conclusions: DLIR-H could be safely recommended for routine low-keV VMI reconstruction in daily contrast-enhanced abdominal DECT to improve image quality, diagnostic acceptability, and lesion conspicuity. Key points: • DLIR is superior to AV-50 in noise reduction, with smaller shifts of the average spatial frequency of NPS towards low frequencies, and larger improvements in NPS noise, noise peak, SNR, and CNR values. 
• DLIR-M and DLIR-H generate better image quality in terms of image contrast, noise, sharpness, artificial sensation, and diagnostic acceptability than AV-50, while DLIR-H provides better lesion conspicuity than AV-50 and DLIR-M. • DLIR-H could be safely recommended as a new standard for routine low-keV VMI reconstruction in contrast-enhanced abdominal DECT to provide better lesion conspicuity and better image quality than the standard AV-50.
Article
The use of marker-less methods to automatically obtain movement kinematics is expanding, but their validity for high-velocity tasks such as cycling, with the bicycle in the field of view, still needs to be established when standard video footage is used. The purpose of this study was to assess whether pre-trained neural networks are valid for calculating lower limb joint kinematics during cycling. The motion of twenty-six cyclists pedalling on a cycle trainer was captured in the sagittal plane by a video camera while reflective markers were attached to their lower limbs. The marker-tracking method was compared with two established deep learning-based approaches (Microsoft Research Asia-MSRA and OpenPose) for estimating hip, knee and ankle joint angles. Poor to moderate agreement was found for both methods, with OpenPose differing from the criterion by 4–8° for the hip and knee joints. Larger errors were observed for the ankle joint (15–22°), but Statistical Parametric Mapping revealed no significant differences between methods throughout the crank cycle for any of the joints. OpenPose showed stronger agreement with marker tracking (criterion) than MSRA for the hip and knee joints but poor agreement for the ankle joint.
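Both the marker-based criterion and the pose-estimation networks ultimately yield 2-D keypoints from which sagittal joint angles are computed. The final step can be sketched minimally, assuming the angle is the included angle between the two segments meeting at the joint. The function name and coordinates are hypothetical, and clinical flexion conventions often report 180° minus this value.

```python
import math


def included_angle_deg(proximal, joint, distal):
    """Included angle (degrees) at `joint` between the segments joint->proximal
    and joint->distal, from 2-D sagittal-plane (x, y) coordinates."""
    ax, ay = proximal[0] - joint[0], proximal[1] - joint[1]
    bx, by = distal[0] - joint[0], distal[1] - joint[1]
    # atan2(|cross|, dot) stays numerically stable even for a nearly straight limb.
    return math.degrees(math.atan2(abs(ax * by - ay * bx), ax * bx + ay * by))


# Hypothetical hip, knee and ankle keypoints (pixels) from one video frame.
hip, knee, ankle = (0.0, 100.0), (0.0, 0.0), (100.0, 0.0)
knee_angle = included_angle_deg(hip, knee, ankle)
print(round(knee_angle, 1))  # 90.0
```

Because the same formula is applied to marker and network keypoints, the angle errors reported above come from keypoint placement rather than from the trigonometry.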
Article
Background There are limited data on new ischemic brain lesions after endovascular treatment for symptomatic intracranial atherosclerotic stenosis (ICAS). Purpose To investigate the (a) characteristics of new ischemic brain lesions at diffusion-weighted MRI (new diffusion abnormalities) after endovascular treatment, (b) characteristics between those treated with balloon angioplasty and stent placement procedures, and (c) predictors of new ischemic brain lesions. Materials and Methods Patients with symptomatic ICAS in whom maximum medical therapy failed were prospectively enrolled between April 2020 and July 2021 from a national stroke center and underwent endovascular treatment. All study participants underwent thin-section diffusion-weighted MRI (voxel size, 1.4 × 1.4 × 2 mm3 with no section gap) before and after treatment. The characteristics of new ischemic brain lesions were recorded. Multivariable logistic regression analysis was performed to determine potential predictors of new ischemic brain lesions. Results A total of 119 study participants (mean age, 59 years ± 11 [SD]; 81 men; 70 treated with balloon angioplasty and 49 with stent placement) were enrolled. Of the 119 participants, 77 (65%) had new ischemic brain lesions. Five of the 119 participants (4%) had symptomatic ischemic stroke. New ischemic brain lesions were located within (61%, 72 of 119) and/or beyond (35%, 41 of 119) the territory of the treated artery. Of the 77 participants with new ischemic brain lesions, 58 (75%) had lesions located in peripheral brain areas. There was no evidence of a difference in the frequency of new ischemic brain lesions between the balloon angioplasty and stent groups (60% vs 71%, P = .20). In adjusted models, cigarette smoking (odds ratio [OR], 3.6; 95% CI: 1.3, 9.7) and more than one operative attempt (OR, 2.9; 95% CI: 1.2, 7.0) were independent predictors of new ischemic brain lesions. 
Conclusion New ischemic brain lesions on diffusion-weighted MRI scans were common after endovascular treatment for symptomatic intracranial atherosclerotic stenosis, and occurrence may be associated with cigarette smoking and the number of operative attempts. Clinical trial registration no. ChiCTR2100052925 © RSNA, 2023 Supplemental material is available for this article. See also the editorial by Russell in this issue.
Article
Background To develop a machine learning model based on tumor-to-bone distance and radiomic features derived from preoperative MRI images to distinguish intramuscular (IM) lipomas from atypical lipomatous tumors/well-differentiated liposarcomas (ALTs/WDLSs), and to compare its performance with that of radiologists. Methods The study included patients with IM lipomas and ALTs/WDLSs diagnosed between 2010 and 2022 who had MRI scans (sequence/field strength: T1-weighted (T1W) imaging at 1.5 or 3.0 Tesla). Manual segmentation of tumors on the three-dimensional T1W images was performed by two observers to appraise intra- and interobserver variability. After radiomic features and tumor-to-bone distance were extracted, they were used to train a machine learning model to distinguish IM lipomas from ALTs/WDLSs. Both feature selection and classification were performed using Least Absolute Shrinkage and Selection Operator logistic regression. The performance of the classification model was assessed using a tenfold cross-validation strategy and subsequently evaluated using receiver operating characteristic (ROC) curve analysis. The classification agreement of two experienced musculoskeletal (MSK) radiologists was assessed using kappa statistics. The diagnostic accuracy of each radiologist was evaluated using the final pathological results as the gold standard. Additionally, we compared the performance of the model and the two radiologists in terms of the area under the receiver operating characteristic curve (AUC) using DeLong’s test. Results There were 68 tumors (38 IM lipomas and 30 ALTs/WDLSs). The AUC of the machine learning model was 0.88 [95% CI 0.72–1] (sensitivity, 91.6%; specificity, 85.7%; and accuracy, 89.0%). For Radiologist 1, the AUC was 0.94 [95% CI 0.87–1] (sensitivity, 97.4%; specificity, 90.9%; and accuracy, 95.0%), and for Radiologist 2, the AUC was 0.91 [95% CI 0.83–0.99] (sensitivity, 100%; specificity, 81.8%; and accuracy, 93.3%). 
The classification agreement between the two radiologists was κ = 0.89 (95% CI 0.76–1). Although the AUC of the model was lower than those of the two experienced MSK radiologists, there was no statistically significant difference between the model and either radiologist (all P > 0.05). Conclusions The novel machine learning model based on tumor-to-bone distance and radiomic features is a noninvasive procedure that has the potential to distinguish IM lipomas from ALTs/WDLSs. The predictive features that suggested malignancy were size, shape, depth, texture, histogram, and tumor-to-bone distance.
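The radiologists' kappa corrects their raw percentage agreement for the agreement expected by chance alone. A minimal sketch of unweighted Cohen's kappa for two raters follows; the labels are hypothetical, and the abstract does not state which kappa variant was used.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same cases."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions, summed over categories.
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / n**2
    return (observed - expected) / (1 - expected)  # undefined if expected == 1


# Hypothetical reads of 10 tumours: the raters disagree on one case.
r1 = ["lipoma"] * 5 + ["ALT"] * 5
r2 = ["lipoma"] * 4 + ["ALT"] * 6
print(round(cohens_kappa(r1, r2), 2))  # 0.8
```

Here raw agreement is 90%, but half of that would be expected by chance, which is why kappa (0.8) is lower than the percentage agreement.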
Article
The potential sex-specific differences in animal personality traits (i.e., consistent inter-individual variation in observed behavior) are an active field of inquiry in behavioral ecology. Sexual horn dimorphism, a special type of trait divergence in which males develop large and elaborate horns, presents an opportunity to test whether sex-specific morphologies covary with changes in personality expression. We compared the activity personality trait between sexes in two dung beetle species: the hornless Onthophagus ruficapillus and the sexually horn-dimorphic Onthophagus furcatus. We measured speed and distance moved in artificially constructed circular tracks to simulate physical activity in brood tunnels. Both measures were positively correlated and showed moderate levels of repeatability in both species, hence representing a personality axis. Sex-specific differences in locomotory performance emerged only in the horn-dimorphic O. furcatus: males exhibited a more active personality than females. Season, body size, and the interaction of body size with sex did not alter the observed activity levels. Finally, O. furcatus not only showed a stronger relationship between the activity measures but also presented lower within-individual variation for both metrics. Our results contribute to the growing body of literature on how consistent individual differences can either interact with or be a result of sex-based biological processes.
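Repeatability in this sense is an intraclass correlation: the share of total variance attributable to consistent differences between individuals. The following is a minimal sketch of the ANOVA-based estimator for a balanced design (k trials per beetle). The speeds are hypothetical, and the study may instead have used a mixed-model estimator.

```python
from statistics import mean


def repeatability(groups):
    """ANOVA-based repeatability (one-way ICC) for a balanced design.

    groups: one inner list per individual, each holding k repeated trials.
    """
    n, k = len(groups), len(groups[0])
    grand = mean(x for g in groups for x in g)
    ss_between = k * sum((mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    # Among-individual variance component divided by total variance.
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical walking speeds (cm/s) of three beetles, two trials each.
speeds = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(round(repeatability(speeds), 3))  # 0.882
```

Lower within-individual variation, as reported for O. furcatus, shrinks `ms_within` and therefore raises repeatability for the same between-individual spread.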
Article
Recording transcranial magnetic stimulation (TMS)-derived measures during a closed kinetic chain task can serve as a functional technique to assess corticomotor function, which may have implications for activities of daily living or lower extremity injury in physically active individuals. Given the novelty of using TMS in this way, our purpose was first to determine the intersession reliability of quadriceps corticospinal excitability during a single-leg squat. We used a descriptive laboratory study to assess 20 physically active females (22.1 ± 2.5 years, 1.7 ± 0.7 m, 66.3 ± 13.6 kg, Tegner Activity Scale: 5.90 ± 1.12) over a 14-day period. Two-way mixed-effects intraclass correlation coefficients (ICC(3,1)) for absolute agreement were used to assess intersession reliability. The active motor threshold (AMT) and normalized motor evoked potential (MEP) amplitudes were assessed in the vastus medialis of each limb. The dominant limb AMTs demonstrated moderate-to-good reliability (ICC = 0.771, 95% CI = 0.51-0.90; p < 0.001). The non-dominant limb AMTs (ICC = 0.364, 95% CI = 0.00-0.68, p = 0.047), dominant limb MEPs (ICC = 0.192, 95% CI = 0.00-0.71; p = 0.340), and non-dominant limb MEPs (ICC = 0.272, 95% CI = 0.00-0.71; p = 0.235) demonstrated poor-to-moderate reliability. These findings may provide insight into corticomotor function during activities requiring weight-bearing, single-leg movement. However, the variability in agreement suggests that further work is warranted to improve the standardization of this technique before it is incorporated into clinical outcomes research.
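Because the head article stresses reporting the ICC model, type and definition, it helps to see how the "type" choice alone changes the number. Below is a minimal sketch of McGraw and Wong's two-way single-measure coefficients, ICC(C,1) for consistency and ICC(A,1) for absolute agreement, computed from ANOVA mean squares. The function name and the two-session data are hypothetical. For reference, the Shrout and Fleiss ICC(3,1) formula coincides with the consistency form, so stating the type explicitly removes ambiguity.

```python
from statistics import mean


def two_way_iccs(data):
    """Single-measure two-way ICCs from an n-subjects x k-sessions table.

    Returns (ICC(C,1), ICC(A,1)): consistency and absolute agreement.
    """
    n, k = len(data), len(data[0])
    grand = mean(x for row in data for x in row)
    row_means = [mean(row) for row in data]
    col_means = [mean(row[j] for row in data) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between sessions
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    consistency = (msr - mse) / (msr + (k - 1) * mse)
    agreement = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    return consistency, agreement


# Hypothetical AMT values: every subject scores exactly 1 unit higher in session 2.
sessions = [[1, 2], [2, 3], [3, 4], [4, 5]]
c, a = two_way_iccs(sessions)
print(round(c, 3), round(a, 3))  # 1.0 0.769
```

With a constant offset between sessions the consistency form is perfect, while absolute agreement is penalised for the systematic shift.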
Article
Background: The BHOHB system (Bhohb S.r.l., Italy) is a portable, non-invasive, photographic marker-based device for postural examination. Objective: To assess the test-retest reliability of the BHOHB system and compare it with that of an optoelectronic system (SMART-DX 700, BTS, Italy). Methods: Thirty volunteers were instructed to stand upright with five markers on the spinous processes of the C7, T6, T12, L3 and S1 vertebrae to define the dorsal kyphosis and lumbar lordosis angles (sagittal plane). Three markers were placed on the greater trochanter, the apex of the iliac crest and the lateral condyle of the femur to detect pelvic tilt. Finally, to define the angles between the acromion and the spinous processes (frontal plane), two markers were placed on the right and left acromion. Postural angles were recorded simultaneously with the BHOHB and optoelectronic systems during two consecutive recording sessions. Results: The BHOHB system showed excellent reliability for all angles (ICCs: 0.92–0.99, SEM: 0.78°–3.33°) as well as a shorter processing time than the optoelectronic system. Excellent reliability was also found for all angles detected with the optoelectronic system (ICCs: 0.91–0.99, SEM: 0.84°–2.80°). Conclusion: The BHOHB system proved to be a reliable, non-invasive and user-friendly device for monitoring spinal posture, especially in subjects requiring repeated examinations.
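SEM values reported alongside ICCs are commonly derived from the ICC and the between-subject standard deviation as SEM = SD·√(1 − ICC). The abstract does not say which formula was used, so the following is only a minimal sketch of that common variant, with hypothetical angle data and a hypothetical function name.

```python
import math
from statistics import stdev


def sem_from_icc(scores, icc):
    """Standard error of measurement: sample SD of the scores times sqrt(1 - ICC)."""
    return stdev(scores) * math.sqrt(1 - icc)


# Hypothetical lumbar lordosis angles (degrees) for five volunteers.
angles = [40, 44, 48, 52, 56]
print(round(sem_from_icc(angles, 0.96), 2))  # 1.26
```

The SEM is expressed in degrees, so it complements the unitless ICC by showing how large a typical measurement error is on the original scale.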
Article
Saccharomyces cerevisiae is the yeast of choice for most inoculated wine fermentations worldwide. However, many other yeast species and genera display phenotypes of interest that may help address the environmental and commercial challenges the wine industry has been facing in recent years. This work aimed to provide, for the first time, a systematic phenotyping of all Saccharomyces species under winemaking conditions. For this purpose, we characterized the fermentative and metabolic properties of 92 Saccharomyces strains in synthetic grape must at two different temperatures. The fermentative potential of alternative yeasts was higher than expected, as nearly all strains were able to complete fermentation, in some cases more efficiently than commercial S. cerevisiae strains. Various species showed interesting metabolic traits, such as high glycerol, succinate and odour-active compound production, or low acetic acid production, compared to S. cerevisiae. Altogether, these results reveal that non-cerevisiae Saccharomyces yeasts are especially interesting for wine fermentation, as they may offer advantages over both S. cerevisiae and non-Saccharomyces strains. This study highlights the potential of alternative Saccharomyces species for winemaking, paving the way for further research and, potentially, for their industrial exploitation.
Article
Rationale and objectives: Although low muscle mass is associated with decreased lung function, studies exploring the relationship between muscle fat content and lung function impairment are scarce. This study aimed to evaluate the association of muscle mass and fatty infiltration with lung function in young adults with obesity. Materials and methods: We performed a retrospective cross-sectional study of patients aged 18-45 years with obesity who had impaired pulmonary function (case group, n = 66) and those with normal pulmonary function (control group, n = 198), matched for age, sex, body mass index (BMI), and height, to assess whether muscle characteristics differed. Muscle mass and muscle fat content were assessed by MRI using a chemical shift-encoded sequence (IDEAL-IQ). Results: A total of 264 patients were enrolled (124 females; mean age 32.0 years). The case group had lower muscle mass than the control group (p = 0.012), and low muscle mass was associated with lung function impairment (odds ratio (OR), 3.74; 95% confidence interval (CI), 1.57-8.93). Furthermore, muscle fat content was significantly higher in cases than in controls (7.4 (2.7) % vs. 6.2 (2.5) %, p = 0.001). Multiple logistic regression analysis showed that muscle fat content was associated with a higher risk of impaired lung function (OR, 2.10; 95% CI, 1.65-2.66), regardless of adiposity and muscle mass. Conclusion: Both muscle fat content and muscle mass are associated with impaired lung function in young adults with obesity.
Article
Structured Abstract Background: Competence in working with models is central to science teaching. A careless approach can lead learners to build up and automate conceptions that run counter to an adequate understanding of models. The most common learner conceptions arise from transferring everyday properties of substances to the world of atoms. It is also important to foster awareness that a model is always an interpretation of observations and measurements, and is therefore a thinking model. Although much research has already been done in this area, gaps remain regarding the effectiveness of individual tasks for building an adequate understanding of models. Aim: To help close this research gap, this master's thesis, conducted as a pilot study, examined and compared two specific tasks for introducing as adequate an understanding of models as possible, in the form of an intervention study (N=270). Sample: Eleven secondary-school teachers from Central Switzerland, with a total of 16 classes and 270 students, took part in the exploratory intervention study. Research design and method: To compare the two tasks, two intervention groups (a black-box intervention group and a traces intervention group) and a control group were formed. The intervention study followed a pre-test, post-test, follow-up design. Results: The results indicate that students' learning gains were higher with the black-box (BB) intervention than with the traces (SP, from German "Spuren") intervention (p<.001, d>0.81). Conclusion: The initial results of this exploratory study suggest that the BB approach is promising for classroom teaching. However, the SP approach also leads to learning gains.
Future research could examine this question in more depth through a large-scale multilevel study and address further questions, such as gender aspects.
Article
Purpose: This study aimed to develop a radiographic measurement to evaluate the femoroacetabular space using 3-dimensional models of asymptomatic hips, and to evaluate the reliability and validity of the femoroacetabular excursion angle (FAEA) in symptomatic patients. Methods: From January to December 2020, we recruited participants with healthy hips to build 3-dimensional models. Through simulation of fourteen activities of daily living (ADLs), anterior and lateral impingement-free FAEAs were measured. A separate cross-sectional cohort was formed from consecutive symptomatic subjects with impingement signs during the same period. In this validation cohort, anterior and lateral FAEAs were assessed on modified Dunn's and anteroposterior views of the hip, respectively. We evaluated the reliability and clinical implications of the FAEAs. Results: In the discovery cohort (n=33), hips with collisions tended to have smaller computed tomography-based FAEAs than collision-free hips, although alpha and lateral center-edge (CE) angles were comparable. Additionally, hips in the lowest quartile of FAEA had a significantly higher number of ADLs with collisions. In the validation cohort (n=411), the FAEA measurement was highly reliable (kappa statistics >0.95 for both inter- and intra-observer reliability). The femoroacetabular impingement syndrome (FAIS) group (n=165) showed significantly smaller anterior and lateral FAEAs than the non-FAIS group (all P<0.001, Cramer V=0.420). The optimal cut-off values for anterior and lateral FAEAs were 32.6° and 48.9°, respectively. In univariate regression, anterior (odds ratio (OR)=0.91 [95% confidence interval (CI), 0.89-0.94]) and lateral (OR=0.91 [95% CI, 0.89-0.93]) FAEAs were significantly associated with FAIS. 
Moreover, in multivariate regression adjusted for alpha and lateral CE angles, anterior FAEA remained a significant predictor (OR=0.96 [95% CI, 0.93-0.99]), and small FAEA was an independent risk factor for FAIS (OR=1.99 [95% CI, 1.06-3.71] for any small FAEA; OR=2.88 [95% CI, 1.32-6.31] for both small FAEAs). Conclusion: The FAEA is a valid measurement for FAIS with high reliability.
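Cramér's V, quoted with the group comparison above, rescales Pearson's chi-square to an effect size between 0 and 1. Below is a minimal sketch for an r × c table of counts, using no continuity correction. The 2 × 2 counts are hypothetical, and the abstract does not specify the exact table used.

```python
import math


def cramers_v(table):
    """Cramér's V for an r x c contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    r, c = len(table), len(table[0])
    return math.sqrt(chi2 / (n * (min(r, c) - 1)))


# Hypothetical counts: rows = FAIS / non-FAIS, columns = small / normal FAEA.
print(round(cramers_v([[30, 10], [10, 30]]), 3))  # 0.5
```

For a 2 × 2 table, V equals the absolute value of the phi coefficient.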
Article
New findings: What is the central question of this study? We sought to establish between-day reproducibility in estimates of middle cerebral artery blood velocity (MCAv) and cerebrovascular reactivity (CVR) in young, healthy male and female adults in tightly controlled experimental conditions. What is the main finding and its importance? Measures of MCAv assessed during morning, afternoon and evening hours are reproducible between days. There is diurnal variation in CVR, with values being highest during the evening compared with the morning. Greater diurnal variation in CVR is associated with more efficient sleep and greater nocturnal blood pressure dipping. These data enhance our understanding of modulators of MCAv and CVR. Abstract: Transcranial Doppler (TCD) is used to assess cerebral blood velocity (CBV) and cerebrovascular reactivity (CVR). Assessments of TCD reproducibility are limited, and few include multiple within-day measurements. We sought to establish reproducibility of CBV and CVR in healthy adults during three time periods (morning, afternoon and evening). We hypothesized that CBV and CVR measured at the same time of day are reproducible between days. We also hypothesized that CBV and CVR exhibit diurnal variation, with measurements being higher in the evening compared with morning/afternoon hours. Twelve adults [six male and six female, 27 years (95% CI, 22-31 years)] completed three measurements (morning, afternoon and evening) on two separate days in controlled conditions (e.g., meals, activity and sleep). Middle cerebral artery blood velocity (MCAv, TCD) was measured continuously at rest and during two CVR tests (end-expiratory apnoea and carbogen inhalation). Intraclass correlation coefficients for resting MCAv showed moderate to good reproducibility, which did not differ between morning, afternoon and evening (0.87, 0.56 and 0.67, respectively; P > 0.05). 
Intraclass correlation coefficients for peak MCAv during apnoea (0.80, 0.46 and 0.65, respectively; P > 0.05) and at minute 2 of carbogen inhalation (0.81, 0.74 and 0.73, respectively; P > 0.05) also did not differ between morning and afternoon/evening. Time of day had no effect on resting MCAv (F = 0.69, P = 0.51, ηp² = 0.06) or on the peak response to apnoea (F = 1.00, P = 0.39, ηp² = 0.08); however, peak MCAv during carbogen breathing exhibited diurnal variation, with the highest values in the evening (F = 3.41, P = 0.05, ηp² = 0.24). Measures of CBV and CVR assessed via TCD during morning, afternoon and evening hours are reproducible between days. There is diurnal variation in the MCAv response to carbogen exposure, with CVR being highest in the evening compared with the morning.
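The partial eta squared values can be recovered from each F statistic as ηp² = F·df_effect / (F·df_effect + df_error). The following minimal sketch rests on an assumption not stated in the abstract: a one-way repeated-measures ANOVA over three time points in 12 participants with no sphericity correction, i.e. df = (2, 22). Under that assumption the reported effect sizes are reproduced.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)


# ASSUMED df = (2, 22): 3 time-of-day levels, 12 participants (not stated in the abstract).
for label, f in [("resting MCAv", 0.69), ("apnoea peak", 1.00), ("carbogen peak", 3.41)]:
    print(label, round(partial_eta_squared(f, 2, 22), 2))
# resting MCAv 0.06
# apnoea peak 0.08
# carbogen peak 0.24
```

Agreement with the reported values is consistent with, but does not prove, the assumed degrees of freedom.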
Article
Previous studies have suggested that parents may support the development of theory of mind (ToM) in their child by talking about mental states (mental state talk; MST). However, MST has not been sufficiently explored in deaf children with cochlear implants (CIs). This study investigated ToM and the availability of parental MST in deaf children with CIs (n = 39, mean age = 62.92, SD = 15.23) in comparison with their peers with typical hearing (TH; n = 52, mean age = 52.48, SD = 1.07). MST was measured during shared storybook reading. Parents’ narratives were coded for cognitive, emotional, literal, and non-mental references. ToM was measured with a parental questionnaire. Children with CIs had lower ToM scores than their peers with TH, and their parents used more literal references during shared storybook reading. There were no significant differences in the frequencies of cognitive and emotional references between groups. Parental emotional references contributed positively to children’s ToM scores, when controlling for the child’s age and receptive grammar, only in the CI group. These results indicate some distinctive features of MST in parents of deaf children with CIs and highlight the role of MST in the development of ToM abilities in this group.
Article
Background: Two-dimensional synthetic MRI of the breast has limited spatial coverage. Three-dimensional (3D) synthetic MRI could provide volumetric quantitative parameters that may reflect the immunohistochemical (IHC) status of invasive ductal carcinoma (IDC) of the breast. Purpose: To evaluate the feasibility of 3D synthetic MRI using an interleaved Look-Locker acquisition sequence with a T2 preparation pulse (QALAS) for discriminating IHC status, including hormone receptor (HR), human epidermal growth factor receptor 2 (HER2), and Ki-67 expression, in IDC. Study type: Prospective observational study. Population: A total of 33 females with IDC of the breast (mean age, 52.3 years). Field strength/sequence: A 3-T, 3D-QALAS gradient-echo and fat-suppressed T1-weighted 3D fast spoiled gradient-echo sequences. Assessment: Two radiologists semiautomatically delineated 3D regions of interest (ROIs) of the whole tumors on the dynamic MRI that was registered to the synthetic T1-weighted images acquired from 3D-QALAS. The mean T1 and T2 were measured for each IDC. Statistical tests: Intraclass correlation coefficient for assessing interobserver agreement. Mann-Whitney U test to determine the relationship between the mean T1 or T2 and the IHC status. Multivariate logistic regression analysis followed by receiver operating characteristic (ROC) analysis for discriminating IHC status. A P value <0.05 was considered statistically significant. Results: Interobserver agreement was good to excellent. There was a significant difference in mean T1 between HR-positive and HR-negative lesions, while mean T2 differed between HR-positive and HR-negative lesions, between triple-negative and HR-positive or HER2-positive lesions, and between Ki-67 levels > 14% and ≤ 14%. Multivariate analysis showed that mean T2 was higher in HR-negative IDC than in HR-positive IDC. 
ROC analysis revealed that the mean T2 was predictive for discriminating HR status, triple-negative status, and Ki-67 level. Data conclusion: 3D synthetic MRI using QALAS may be useful for discriminating IHC status in IDC of the breast. Evidence level: 1. Technical efficacy: Stage 2.
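The ROC analysis summarizes how well mean T2 separates the groups. The AUC equals the probability that a randomly chosen case has a higher value than a randomly chosen control, with ties counted as one half; this is the Mann–Whitney statistic rescaled. A minimal sketch with hypothetical T2 values follows (the study's own ROC models were multivariable).

```python
def auc_mann_whitney(case_scores, control_scores):
    """AUC as the proportion of case-control pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for c in case_scores:
        for d in control_scores:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical mean T2 (ms): HR-negative tumours treated as "cases" (higher T2 expected).
hr_negative = [95, 100, 110]
hr_positive = [80, 90, 100]
print(round(auc_mann_whitney(hr_negative, hr_positive), 3))  # 0.833
```

The pairwise loop is quadratic in sample size, which is fine for cohorts of this size; rank-based formulas give the same value more efficiently.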
Article
Background The use of patient-reported questionnaires to collect information on costs associated with routine healthcare services, such as chiropractic, is a less labour-intensive alternative to retrieving these data from patient files. The aim of this paper was to compare patient report with patient files for the collection of data describing healthcare usage in chiropractic clinics. Methods As part of a prospective single-cohort multi-centre study, the numbers of visits to chiropractic clinics determined by patient-reported questionnaires and as recorded in patient files were compared three months after the start of treatment. These data were analysed for agreement using the Intraclass Correlation Coefficient (ICC) and the 95% Limits of Agreement. Results Eighty-nine patients who had undergone chiropractic care were included in the present study. The two methods yielded an ICC of 0.83 (95% CI = 0.75 to 0.88). However, there was a significant difference between the data collection methods, with an average of 0.6 (95% CI = 0.25 to 1.01) additional visits reported in patient files. The 95% Limits of Agreement ranged from 3 fewer visits to 4 additional visits in patient files relative to the number of visits recalled by patients. Conclusion There was some discrepancy between the number of clinic visits recalled by patients and the number recorded in patient files. This should be taken into account in future evaluations of treatment costs.
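The 95% limits of agreement are the mean paired difference (the bias) ± 1.96 standard deviations of the differences. A minimal sketch follows, using hypothetical visit counts and an illustrative function name.

```python
from statistics import mean, stdev


def limits_of_agreement(method_a, method_b):
    """Bias and 95% limits of agreement for paired measurements (a - b)."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical visit counts for five patients: patient files vs. patient recall.
files = [6, 8, 6, 10, 7]
recall = [4, 8, 5, 9, 7]
bias, lower, upper = limits_of_agreement(files, recall)
print(round(bias, 2), round(lower, 2), round(upper, 2))  # 0.8 -0.84 2.44
```

Unlike the ICC, the limits are in the original units (visits), which is why the abstract reports both.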
Article
Although intraclass correlation coefficients (ICCs) are commonly used in behavioral measurement, psychometrics, and behavioral genetics, procedures available for forming inferences about ICCs are not widely known. Following a review of the distinction between various forms of the ICC, this article presents procedures available for calculating confidence intervals and conducting tests on ICCs developed using data from one-way and two-way random- and mixed-effects analysis of variance models. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Intraclass correlation coefficients (ICCs) provide a statistical means of testing reliability. However, their interpretation is not well documented in the orthopedic field. The purpose of this study was to investigate the use of ICCs in the orthopedic literature and to demonstrate pitfalls in their use. First, orthopedic articles that used ICCs were retrieved from the PubMed database, and journal demographics, ICC models, and concurrent statistics were evaluated. Second, a reliability test was performed on three common physical examinations in cerebral palsy, namely the Thomas test, the Staheli test, and popliteal angle measurement. Thirty patients were assessed by three orthopedic surgeons to explore the statistical methods for testing reliability. Third, the factors affecting ICC values were examined by simulating data sets based on the physical examination data, in which the ranges, slopes, and interobserver variability were modified. Of the 92 orthopedic articles identified, 58 (63%) did not clarify the ICC model used, and only 5 (5%) described all models, types, and measures. In reliability testing, although the popliteal angle showed a larger mean absolute difference than the Thomas test and the Staheli test, its ICC was higher, which was believed to be contrary to the context of measurement. In addition, ICC values were affected by the model, type, and measure used. In simulated data sets, the ICC was higher when the range of the data set was larger, the slopes of the data sets were parallel, and interobserver variability was smaller. Care should be taken when interpreting absolute ICC values: a higher ICC does not necessarily mean less variability, because ICC values can also be affected by various factors. The authors recommend that researchers clarify the ICC models used and interpret ICC values in the context of measurement.
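The simulation finding, that the ICC rises with the spread of the data even when rater disagreement is unchanged, is easy to reproduce. The sketch below applies McGraw and Wong's ICC(A,1) to two hypothetical two-rater data sets that share identical ±1° disagreements but differ in between-patient range; the function name and data are illustrative.

```python
from statistics import mean


def icc_a1(data):
    """Two-way, absolute-agreement, single-measure ICC(A,1); data[i][j] = patient i, rater j."""
    n, k = len(data), len(data[0])
    grand = mean(x for row in data for x in row)
    msr = k * sum((mean(row) - grand) ** 2 for row in data) / (n - 1)
    msc = n * sum((mean(row[j] for row in data) - grand) ** 2 for j in range(k)) / (k - 1)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    mse = (ss_total - msr * (n - 1) - msc * (k - 1)) / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


narrow = [[10, 11], [11, 10], [12, 13], [13, 12]]  # patients span about 3 degrees
wide = [[10, 11], [20, 19], [30, 31], [40, 39]]    # patients span about 30 degrees
for name, d in [("narrow", narrow), ("wide", wide)]:
    mad = mean(abs(a - b) for a, b in d)  # mean absolute difference between raters
    print(name, mad, round(icc_a1(d), 3))
# narrow 1 0.667
# wide 1 0.997
```

The mean absolute disagreement is 1° in both sets, yet ICC(A,1) rises from about 0.67 to about 0.997, which is the interpretive pitfall the abstract describes.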
Article
Objective: To provide an entry-level, new technology reliability assessment of the PulStar computer-assisted, differential compliance spinal instrument. Subjects: Eighteen college students (9 male and 9 female) were recruited by announcements and personal contacts. Methods: Following approval of the consent process by the Institutional Review Board of Mississippi State University, a PulStar Function Recording and Analysis System (PulStarFRAS) device was evaluated for clinical reliability. Two examiners, blinded from data collection, used the instrument on individual subjects in random order (lying prone with their backs exposed) to administer light impulses (approximately 0.9 J, which produced a 3- to 4-lb force) at each segmental level throughout the cervical, dorsal, and lumbar spine using probe tips spaced 3 cm apart, straddling the spinous processes, while a computer recorded the findings (resistance on a scale of 0 to 25.5 lb force). Data were analyzed by Exploratory Data Analysis (EDA) with analysis of variance (ANOVA) testing and by use of the intraclass correlation coefficient (ICC). In addition, a mean test (ANOVA) was conducted to determine whether a trend in variation occurred as a result of repeated light thrusts to the spine, independent of variance explained by different examiners. Results: Using EDA and ANOVA, intraexaminer reliability for the 2 practitioners was very high but not perfect. This was confirmed by ICC statistics demonstrating good to excellent reliability for both practitioners (0.89 for the experienced practitioner, 0.78 for the newly trained practitioner). Interexaminer reliability of PulStar was similarly very high but not perfect based on EDA/ANOVA analysis, and good to excellent by ICC (ICC = 0.87). Conclusion: The PulStar mechanical adjusting device set to analysis mode appears to have good to excellent reliability when used by either an experienced or a novice (but trained) examiner. 
In addition, as a measure for resistance to a light thrust or spinal compliance, reliability was similarly good to excellent between the 2 doctors using the PulStar instrument.
Article
Objective: The purpose of this study was to investigate the reliability of the Goutallier classification system (GCS) for grading muscle fatty degeneration in the lumbar multifidus (LM) using magnetic resonance imaging (MRI) examinations. Methods: Lumbar spine MRI scans were obtained retrospectively from the radiology department imaging system. Two examiners (a chiropractic diagnostic imaging resident and a board-certified chiropractic radiologist with 30 years of experience) independently graded each LM at the L4/5 and L5/S1 intervertebral levels. ImageJ pixel analysis software (version 1.47; National Institutes of Health, Bethesda, MD) was used independently by 2 observers to quantify the percent fat of the LM and to allow correlation between LM percent fat and GCS grade. MRI scans from 25 subjects were randomly selected. Scans were included if they were obtained using a 1.5 T imaging system and were excluded if there was evidence of spinal infection, tumor, fracture, or postoperative changes. For all tests, P < .05 was defined as significant. Results: Intraobserver reliability for grading LM fat ranged from a weighted κ (κw) of 0.71 to 0.93. Mean interobserver reliability for grading LM fat ranged from κw = 0.76 to κw = 0.85. There was a significant (P < .001) correlation between LM percent fat and GCS grade. Furthermore, interobserver reliability in determining percent fat ranged from an intraclass correlation coefficient (ICC) of 0.73 to 0.90. Conclusions: In this study, the GCS was reliable in grading LM fatty degeneration and correlated positively with a quantified percent fat value. In addition, ImageJ software (National Institutes of Health) was reliable between raters when quantifying LM percent fat.
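Because Goutallier grades are ordinal, this study reports weighted κ rather than an ICC for the grading task. A minimal sketch of Cohen's weighted kappa for two raters follows. The abstract does not say whether linear or quadratic weights were used, so both are offered here as an assumption, and the grades in the usage lines are hypothetical.

```python
from collections import Counter


def weighted_kappa(r1, r2, categories, weights="linear"):
    """Cohen's weighted kappa for two raters scoring the same items on an ordinal scale."""
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    m = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(r1)

    # Observed joint proportions of (rater 1 grade, rater 2 grade).
    observed = [[0.0] * m for _ in range(m)]
    for a, b in zip(r1, r2):
        observed[index[a]][index[b]] += 1 / n
    # Marginal counts for each rater (missing categories count as 0).
    p1 = Counter(index[a] for a in r1)
    p2 = Counter(index[b] for b in r2)

    def disagreement(i, j):
        d = abs(i - j) / (m - 1)   # 0 on the diagonal, 1 at the extremes
        return d if weights == "linear" else d * d

    cells = [(i, j) for i in range(m) for j in range(m)]
    obs = sum(disagreement(i, j) * observed[i][j] for i, j in cells)
    exp = sum(disagreement(i, j) * (p1[i] / n) * (p2[j] / n) for i, j in cells)
    return 1 - obs / exp


grades = [0, 1, 2, 3, 4]                   # Goutallier grades 0 to 4
rater_a = [0, 1, 1, 2, 2, 3, 3, 4, 1, 2]   # hypothetical grades, examiner A
rater_b = [0, 1, 2, 2, 2, 3, 4, 4, 1, 1]   # hypothetical grades, examiner B
print(f"linear-weighted kappa = {weighted_kappa(rater_a, rater_b, grades):.2f}")
```

With only two categories, the weighted statistic reduces to ordinary (unweighted) Cohen's kappa, which makes a convenient sanity check.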
Article
Reports 3 errors in the original article by K. O. McGraw and S. P. Wong (Psychological Methods, 1996, 1[1], 30–46). On page 39, the intraclass correlation coefficient (ICC) and r values given in Table 6 should be changed to r = .714 for each data set, ICC(C,1) = .714 for each data set, and ICC(A,1) = .720, .620, and .485 for the data in Columns 1, 2, and 3 of the table, respectively. In Table 7 (p. 41), which is used to determine confidence intervals on population values of the ICC, the procedures for obtaining the confidence intervals on ICC(A,k) need to be amended slightly. Corrected formulas are given. On pages 44–46, references to Equations A3, A4, and so forth in the Appendix should be to Sections A3, A4, and so forth. (The following abstract of this article originally appeared in record 1996-03170-003.) Although intraclass correlation coefficients (ICCs) are commonly used in behavioral measurement, psychometrics, and behavioral genetics, procedures available for forming inferences about ICC are not widely known. Following a review of the distinction between various forms of the ICC, this article presents procedures available for calculating confidence intervals and conducting tests on ICCs developed using data from one-way and two-way random and mixed-effect analysis of variance models. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Some doctors and therapists believe that wearing high-heeled shoes causes increased lumbar lordosis and that this may be a cause of low back pain. The purpose of this study was to evaluate whether high-heeled shoes increase lumbar lordosis and to do so with more reliable methods and a larger sample size than used in previous studies. Fifty participants from a chiropractic university were included in a test group (32 female and 18 male) and 9 in a control group (3 female and 6 male). A Spinal Mouse was used to measure lumbar lordosis in test participants barefoot and then again with 3- or 4-in high-heeled shoes after a 10-minute adaptation period of walking, sitting, and standing while wearing the shoes. Reliability of the testing conditions was evaluated with the 9 barefoot control participants before and after an identical adaptation period, and intra- and interexaminer reliability of Spinal Mouse measurements was tested by use of a wooden model built to mimic the proportions of a human spine. Both groups showed nonsignificant decreases in lordosis between the first and second scans (high heels: 23.4° to 22.8°, P = .17; control: 18.8° to 17.6°, P = .16). Scans of the wooden spine model were highly reliable (intra- and interexaminer intraclass correlation coefficients > .999). Consistent with most previous studies, high-heeled shoes did not affect lumbar lordosis in most people while standing. Future research could investigate the effect of shoes during dynamic conditions or identify affected subgroups.
Article
The poor reliability of lateral shift detection has been attributed to lack of rater training, biological variation, and test reactivity. This study aimed to remove the potential confounding arising from biological variation and test reactivity and to control the level of rater experience and training in making judgments of lateral shift. One hundred forty-eight raters with 3 levels of clinical physical therapy experience and training in the McKenzie method participated. The raters viewed photographic slides of 45 patients with low back pain. Slides were judged on a numerical scale for the presence and direction of a shift. Intrarater reliability was evaluated using the intraclass correlation coefficient (ICC), and interrater reliability was evaluated using both the ICC and the kappa statistic. Reliability of shift judgments was only moderate for all groups (e.g., ICC(2,1) values ranged from 0.48 to 0.64). Lateral shift judgments have only moderate reliability, even when trained raters judge stable stimuli. We propose that the photo model employed can be used to explore the source of error in this process.
Article
Summary: Therapists regularly perform various measurements. How reliable these measurements are in themselves, and how reliable therapists are in using them, is clearly essential knowledge to help clinicians decide whether or not a particular measurement is of any value. The aim of this paper is to explain the nature of reliability, and to describe some of the commonly used estimates that attempt to quantify it. An understanding of reliability, and how it is estimated, will help therapists to make sense of their own clinical findings, and to interpret published studies. Although reliability is generally perceived as desirable, there is no firm definition as to the level of reliability required to reach clinical acceptability. As with hypothesis testing, statistically significant levels of reliability may not translate into clinically acceptable levels, so that some authors' claims about reliability may need to be interpreted with caution. Reliability is generally population specific, so that caution is also advised in making comparisons between studies. The current consensus is that no single estimate is sufficient to provide the full picture about reliability, and that different types of estimate should be used together.
Article
Soft tissue exhibits nonlinear stress-strain behavior under compression. Characterizing its nonlinear elasticity may aid detection, diagnosis, and treatment of soft tissue abnormality. The purposes of this study were to develop a rate-controlled Mechano-Acoustic Indentor System and a corresponding finite element optimization method to extract nonlinear elastic parameters of soft tissue, and to evaluate its test-retest reliability. An indentor system using a linear actuator to drive a force-sensitive probe with a tip-mounted ultrasound transducer was developed. Twenty independent sites at the upper lateral quadrant of the buttock from 11 asymptomatic subjects (7 men and 4 women from a chiropractic college) were indented at 6% per second for 3 sessions, each consisting of 5 trials. Tissue thickness, force at 25% deformation, and area under the load-deformation curve from 0% to 25% deformation were calculated. Optimized hyperelastic parameters of the soft tissue were calculated with a finite element model using a first-order Ogden material model. Load-deformation response on a standardized block was then simulated, and the corresponding area and force parameters were calculated. Between-trials repeatability and test-retest reliability of each parameter were evaluated using coefficients of variation and intraclass correlation coefficients, respectively. Load-deformation responses were highly reproducible under repeated measurements. Coefficients of variation of tissue thickness, area under the load-deformation curve from 0% to 25% deformation, and force at 25% deformation averaged 0.51%, 2.31%, and 2.23%, respectively. Intraclass correlation coefficients ranged between 0.959 and 0.999, indicating excellent test-retest reliability. The automated Mechano-Acoustic Indentor System and its corresponding optimization technique offer a viable technology to make in vivo measurements of the nonlinear elastic properties of soft tissue. 
This technology showed excellent between-trials repeatability and test-retest reliability with potential to quantify the effects of a wide variety of manual therapy techniques on the soft tissue elastic properties.
Article
In clinical measurement, comparison of a new measurement technique with an established one is often needed to see whether they agree sufficiently for the new to replace the old. Such investigations are often analysed inappropriately, notably by using correlation coefficients. The use of correlation is misleading. An alternative approach, based on graphical techniques and simple calculations, is described, together with the relation between this analysis and the assessment of repeatability.
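The simple calculations behind this approach are usually summarized as the mean difference between methods (the bias) and the 95% limits of agreement, bias ± 1.96 SD of the paired differences. A minimal sketch, assuming the differences are roughly normally distributed and that each subject is measured once by each method; the readings are hypothetical.

```python
import statistics


def limits_of_agreement(method_a, method_b):
    """Bland-Altman bias and 95% limits of agreement for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)        # systematic difference between methods
    sd = statistics.stdev(diffs)         # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired readings: established method vs. new method.
old = [10.2, 11.5, 9.8, 12.1, 10.9, 11.3]
new = [10.0, 11.9, 9.5, 12.4, 11.2, 11.0]
bias, lower, upper = limits_of_agreement(old, new)
print(f"bias = {bias:.2f}, 95% limits of agreement = {lower:.2f} to {upper:.2f}")
```

Whether the limits are narrow enough is a clinical judgment, not a statistical one; plotting each difference against the mean of its pair is the usual graphical companion for checking that the spread does not change with the size of the measurement.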
Article
The purpose of this study was to assess the reliability of measurements of the zygapophysial (Z) joint space made from the magnetic resonance imaging scans of subjects with acute low back pain, using new equipment and 2 different methods of statistical analysis. If found to be reliable, the methods of Z joint measurement can be applied to scans taken before and after spinal manipulation in a larger study of acute low back pain subjects. Three observers measured the central anterior-to-posterior distance of the left and right L4/L5 and L5/S1 Z joint spaces from 5 subject scans (20 digitizer measurements, rounded to 0.1 mm) on 2 occasions 4 weeks apart. Observers were blinded to each other and to their previous work. Intra- and interobserver reliability was calculated by means of intraclass correlation coefficients and also by mean differences using the methods of Bland and Altman (1986). A mean difference of less than ±0.4 mm was considered clinically acceptable. Intraclass correlation coefficients showed intraobserver reliabilities of 0.95 (95% confidence interval, 0.87-0.98), 0.83 (0.62-0.92), and 0.92 (0.83-0.96) for each of the 3 observers, and interobserver reliabilities of 0.90 (0.82-0.95), 0.79 (0.61-0.90), and 0.84 (0.75-0.90) for the first and second measurements and overall reliability, respectively. The mean difference between the first and second measurements was -0.04 mm (±1.96 SD: -0.37 to 0.29), 0.23 mm (-0.48 to 0.94), 0.25 mm (-0.24 to 0.75), and 0.15 mm (-0.44 to 0.74) for each of the 3 observers and the overall agreement, respectively. Both statistical methods were found to be useful and complementary and showed the measurements to be highly reliable.
Article
In 1969 the first edition of this book introduced the concepts of statistics and their medical application to readers with no formal training in this area. While retaining this basic aim, the authors have expanded the coverage in each subsequent edition to keep pace with the increasing use and sophistication of statistics in medical research. This fifth edition has undergone major restructuring, with some sections completely rewritten; it is now more logically organized and more user-friendly (with the addition of 'summary boxes' throughout the text). It incorporates new statistical techniques and approaches that have appeared since the last edition. In addition, some chapters or chapter headings are specifically marked to signify material that is more difficult than the material in which it is embedded; such sections or chapters can be omitted at first reading. Several new chapters have been added. "Associations: Chance, Confounded and Causal?" explains without any formulae the concepts underlying confounding, confidence intervals and p values, and the interpretation of associations observed in research investigations. Another new chapter considers sample size calculations in some detail and provides, in addition to the relevant formulae, useful tables that should give the researcher an indication of the order of magnitude of the number of subjects he or she might require in different situations.
Article
A procedure for estimating the reliability of sets of ratings, test scores, or other measures is described and illustrated. This procedure, based upon analysis of variance, may be applied both in the special case where a complete set of ratings from each of $k$ sources is available for each of $n$ subjects, and in the general case where $k_1, k_2, \ldots, k_n$ ratings are available for each of the $n$ subjects. It may be used to obtain either a unique estimate or a confidence interval for the reliability of either the component ratings or their averages. The relations of this procedure to others intended to serve the same purpose are considered algebraically and illustrated numerically.
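For the special (complete, balanced) case, the analysis-of-variance idea can be sketched with one-way mean squares: the reliability of a single rating is (MS_between − MS_within) / (MS_between + (k − 1)·MS_within), and the reliability of the average of k ratings is (MS_between − MS_within) / MS_between. This is a minimal sketch of the balanced case only, written in today's one-way ICC notation rather than the article's own; the unequal-k general case and confidence intervals are not shown, and the ratings are hypothetical.

```python
def one_way_reliability(data):
    """Reliability of single ratings and of k-rating averages from a one-way ANOVA.

    data: n rows (subjects), each holding the same number k of ratings.
    """
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    row_means = [sum(r) / k for r in data]
    # Between-subject and within-subject (pooled) mean squares.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum((x - m) ** 2 for r, m in zip(data, row_means) for x in r) / (n * (k - 1))
    single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    average = (ms_between - ms_within) / ms_between
    return single, average


ratings = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [1, 2, 2], [3, 3, 4]]  # hypothetical: 5 subjects x 3 ratings
single, average = one_way_reliability(ratings)
print(f"single rating: {single:.2f}, average of 3 ratings: {average:.2f}")
```

The two estimates are linked by the Spearman-Brown formula, average = k·single / (1 + (k − 1)·single), which makes a convenient check on the arithmetic.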
Article
Reliability coefficients often take the form of intraclass correlation coefficients. In this article, guidelines are given for choosing among 6 different forms of the intraclass correlation for reliability studies in which n targets are rated by k judges. Relevant to the choice of the coefficient are the appropriate statistical model for the reliability study and the applications to be made of the reliability results. Confidence intervals for each of the forms are reviewed. (23 ref) (PsycINFO Database Record (c) 2006 APA, all rights reserved).
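In Shrout and Fleiss's notation, ICC(1,·) comes from a one-way random model, ICC(2,·) from a two-way random model, and ICC(3,·) from a two-way mixed model; the second index is 1 for a single judge's rating and k for the mean of k judges. A minimal sketch computing all six forms from ANOVA mean squares, assuming every target is rated by the same k judges with no missing data; confidence intervals are not computed. The ratings are the 6-target, 4-judge example commonly used to illustrate these forms.

```python
def shrout_fleiss_iccs(data):
    """The six Shrout-Fleiss ICC forms for n targets each rated by the same k judges."""
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    row_means = [sum(r) / k for r in data]
    col_means = [sum(r[j] for r in data) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for r in data for x in r)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)

    bms = ss_rows / (n - 1)                                      # between-targets
    wms = (ss_total - ss_rows) / (n * (k - 1))                   # within-target (one-way)
    jms = ss_cols / (k - 1)                                      # between-judges
    ems = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))   # residual

    return {
        "ICC(1,1)": (bms - wms) / (bms + (k - 1) * wms),
        "ICC(2,1)": (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n),
        "ICC(3,1)": (bms - ems) / (bms + (k - 1) * ems),
        "ICC(1,k)": (bms - wms) / bms,
        "ICC(2,k)": (bms - ems) / (bms + (jms - ems) / n),
        "ICC(3,k)": (bms - ems) / bms,
    }


ratings = [       # rows: targets; columns: judges
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
for form, value in shrout_fleiss_iccs(ratings).items():
    print(f"{form}: {value:.2f}")
```

On these ratings the sketch gives about 0.17, 0.29, 0.71, 0.44, 0.62, and 0.91 for ICC(1,1), ICC(2,1), ICC(3,1), ICC(1,k), ICC(2,k), and ICC(3,k), which shows how far the choice of form alone can move the estimate for the same data.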
Article
A procedure for estimating the reliability of sets of ratings in terms of the intraclass correlation coefficient is discussed. The procedure is based upon the analysis of variance and the estimation of variance components. For the one-way classification the intraclass correlation coefficient defined as the ratio of variances can be interpreted as a correlation coefficient. Caution, however, is urged in the application of the definition to a two-way model, i.e., one in which between-rater variance is removed. It is maintained that the frequent use of the standard definition of the one-way intraclass correlation coefficient applied to the two-way classification is not a proper procedure if in fact the coefficient is to be interpreted as a correlation coefficient. Definitions for reliability obtained from the two-way models are given which can legitimately be considered correlation coefficients.
Article
Reliability refers to the reproducibility of values of a test, assay or other measurement in repeated trials on the same individuals. Better reliability implies better precision of single measurements and better tracking of changes in measurements in research or practical settings. The main measures of reliability are within-subject random variation, systematic change in the mean, and retest correlation. A simple, adaptable form of within-subject variation is the typical (standard) error of measurement: the standard deviation of an individual's repeated measurements. For many measurements in sports medicine and science, the typical error is best expressed as a coefficient of variation (percentage of the mean). A biased, more limited form of within-subject variation is the limits of agreement: the 95% likely range of change of an individual's measurements between 2 trials. Systematic changes in the mean of a measure between consecutive trials represent such effects as learning, motivation or fatigue; these changes need to be eliminated from estimates of within-subject variation. Retest correlation is difficult to interpret, mainly because its value is sensitive to the heterogeneity of the sample of participants. Uses of reliability include decision-making when monitoring individuals, comparison of tests or equipment, estimation of sample size in experiments and estimation of the magnitude of individual differences in the response to a treatment. Reasonable precision for estimates of reliability requires approximately 50 study participants and at least 3 trials. Studies aimed at assessing variation in reliability between tests or equipment require complex designs and analyses that researchers seldom perform correctly. A wider understanding of reliability and adoption of the typical error as the standard measure of reliability would improve the assessment of tests and equipment in our disciplines.
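A minimal sketch of the typical error for the simplest design, two trials on the same participants: the standard deviation of the trial-to-trial differences divided by √2 (the variance of a difference is twice the within-subject variance), with the change in the mean reported separately so that learning or fatigue effects are not counted as random error. Expressing it as a percentage of the grand mean is a simplifying assumption here (a log-transformed coefficient of variation is a common refinement, not shown). The values are hypothetical, and this two-trial toy example does not reflect the abstract's advice of about 50 participants and at least 3 trials.

```python
import math
import statistics


def typical_error(trial1, trial2):
    """Change in the mean, typical error, and typical error as % of the grand mean.

    The SD of the differences already excludes their mean, so a systematic
    shift between trials shows up only in the change in the mean.
    """
    diffs = [b - a for a, b in zip(trial1, trial2)]
    change_in_mean = statistics.mean(diffs)
    te = statistics.stdev(diffs) / math.sqrt(2)
    grand_mean = statistics.mean(trial1 + trial2)
    return change_in_mean, te, 100 * te / grand_mean


trial1 = [41.0, 38.5, 45.2, 50.1, 36.7, 43.3]   # hypothetical scores, trial 1
trial2 = [42.1, 39.0, 46.5, 50.8, 37.9, 44.0]   # hypothetical scores, trial 2
change, te, cv = typical_error(trial1, trial2)
print(f"change in mean = {change:.2f}, typical error = {te:.2f} ({cv:.1f}% of mean)")
```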
Article
Paraspinal thermography is used by chiropractors as an aid in assessing the presence of vertebral subluxation. Few reliability studies have been carried out, with mixed results. Digital infrared scanning equipment is now available with location tracking that may enhance reproducibility, and digitized scans enable computer-aided interpretation of thermographic patterns. The objective was to assess the ability of examiners to reproduce thermal patterns, using a repeated-measures design in which 2 examiners assessed the same patient on 2 occasions. Thirty asymptomatic students served as subjects. A TyTron C-3000 handheld thermographic scanner interfaced to a Microsoft Windows-compatible personal computer was used for all recordings. Each examiner recorded 2 scans on each patient; completing all 4 scans took an average of 3 minutes. Data were exported to a spreadsheet for initial analysis, then SPSS was used to calculate intraclass correlation coefficients (ICCs). Because the starting and stopping points of scans were not always the same, care was taken to align scans visually, using well-distinguished peaks on the charts as guides. Scans were cropped to remove artifacts that might have occurred at the beginning and end of the scans. Intraexaminer and interexaminer ICCs were calculated. Skin temperatures ranged from 30.0 °C to 35.4 °C over all scans. The average temperatures changed little from the first to the last scans, indicating that subjects' overall skin temperatures were stable during the scanning procedure. Intraexaminer ICCs ranged from 0.953 to 0.984, with the left and right channel data showing slightly higher congruence than the Delta channel. Interexaminer ICCs ranged from 0.918 to 0.975; again, the Delta channel showed slightly lower reliability, although the ICCs were high for all channels. Intraexaminer and interexaminer reliability of paraspinal thermal scans using the TyTron C-3000 was found to be very high, with ICC values between 0.91 and 0.98. 
Changes seen in properly performed thermal scans are most likely due to actual physiological changes rather than equipment error.
Estimation of the reliability of ratings
  • R. L. Ebel
Ebel RL. Estimation of the reliability of ratings. Psychometrika. 1951;16:407-24.