Article

Statistical Methods for Assessing Agreement Between Two Methods of Clinical Measurement


Abstract

In clinical measurement, comparison of a new measurement technique with an established one is often needed to see whether they agree sufficiently for the new to replace the old. Such investigations are often analysed inappropriately, notably by using correlation coefficients. The use of correlation is misleading. An alternative approach, based on graphical techniques and simple calculations, is described, together with the relation between this analysis and the assessment of repeatability.
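The "simple calculations" referred to in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' own code: the function name `bland_altman` and the sample readings are hypothetical, and the 1.96 multiplier is the usual normal-theory choice for 95% limits of agreement (often rounded to 2).

```python
import numpy as np

def bland_altman(method_a, method_b, z=1.96):
    """Return per-subject means and differences, the bias, and the limits of agreement.

    z=1.96 is an assumed multiplier for approximate 95% limits.
    """
    a = np.asarray(method_a, dtype=float)
    b = np.asarray(method_b, dtype=float)
    diff = a - b                 # per-subject difference between the two methods
    mean = (a + b) / 2.0         # per-subject mean, used as the x-axis of the plot
    bias = diff.mean()           # mean difference (systematic bias)
    sd = diff.std(ddof=1)        # sample standard deviation of the differences
    return mean, diff, bias, bias - z * sd, bias + z * sd

# Hypothetical paired readings from an established and a new method
established = [10.0, 12.0, 14.0, 16.0]
new_method = [9.0, 12.0, 13.0, 17.0]
means, diffs, bias, lower, upper = bland_altman(established, new_method)
print(f"bias = {bias:.2f}, limits of agreement = ({lower:.2f}, {upper:.2f})")
```

Plotting `diffs` against `means`, with horizontal lines at `bias`, `lower`, and `upper`, gives the familiar difference-against-mean plot. Whether the resulting limits are narrow enough is a clinical judgment, not something the calculation decides.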

... The Bland-Altman plot reveals that the mean difference falls below zero (Figure 7). The data points for the three studies included all lie within the calculated limits of agreement, and thus this observation must be interpreted with caution due to the limited number of included studies [25]. On average, OMT may lead to lower VAS scores compared to non-OMT, as suggested by the negative bias [25]. ...
... The data points for the three studies included all lie within the calculated limits of agreement, and thus this observation must be interpreted with caution due to the limited number of included studies [25]. On average, OMT may lead to lower VAS scores compared to non-OMT, as suggested by the negative bias [25]. The low sample size limits the ability to make a definitive conclusion regarding the agreement between the two measurement methods [25]. ...
... On average, OMT may lead to lower VAS scores compared to non-OMT, as suggested by the negative bias [25]. The low sample size limits the ability to make a definitive conclusion regarding the agreement between the two measurement methods [25]. OMT, osteopathic manipulative treatment ...
... In addition, the results of each parameter were presented in a Bland-Altman plot to visualize variation in the measurements on radiographs and MRI. 29 We considered a priori differences of 5 degrees or greater between the imaging modalities to be clinically significant. ...
... A reduction in LL and an increase in SS on supine MRI are expected because of changes in the positioning of the spine. 18,20,21,23,24,[29][30][31] Our finding that the mean sLL was similar between supine MRI and standing lateral radiographs indicates that the changes in LL in the supine position might be attributed to flexion of the upper segments of the lumbar spine above the L4-level. Another possible explanation might be a slight anterior truncal inclination as a compensatory mechanism for symptom relief, leading to smaller observed LL in standing radiographs. ...
Article
Full-text available
Study Design Radiologic cross-sectional study based on a prospective cohort study (level III). Objective Investigate whether lumbar lordosis (LL) and sacral slope (SS) differ significantly on supine magnetic resonance imaging (MRI) versus standing radiographs in nondeformity lumbar spinal stenosis (LSS). Secondly, to quantify the amount of magnification on standing lumbar radiographs. Summary of Background Data Supine MRI is routinely performed when diagnosing LSS. Standing radiographs are often supplemented to measure spinopelvic angles. Little research has been done on whether LL and SS translate from standing radiographs to supine MRI. Previous studies have trended to significant changes in LL and SS; however, none have been performed exclusively in nondeformity LSS. Materials and Methods Review of preoperative standing lateral lumbar radiographs and midsagittal T2-weighted supine lumbar MRI in 211 patients with LSS without concomitant degenerative spondylolisthesis, measuring LL (L1–S1), segmental lumbar lordosis (sLL) (L4–S1) and SS, in addition to the anteroposterior diameter and height of the L3 vertebral body. We conducted a reliability study and performed a Pearson’s correlation analysis. Data was presented in Bland-Altman plots. Results Interobserver reliability was good to excellent, with ICC ranging from 0.77 to 0.94 for all parameters. Statistically significant differences were observed in LL and SS between image modalities. The mean radiographic measurements were as follows: LL 48.9 (SD: 12.8), sLL 32.3 (SD: 8.1), and SS 37.3 (SD: 8.7) degrees. The mean MRI measurements were as follows: LL 46.0 (SD: 10.5), sLL 32.3 (SD: 7.1), and SS 38.1 (SD: 7.1) degrees. Mean vertebral body magnification was between 21% and 23% for L3 anteroposterior diameter and height. 
Conclusions Our results suggest that supine lumbar MRI might be a viable alternative to standing lateral lumbar radiographs for measuring LL and SS in routine follow-up for patients with LSS without concomitant spinal deformity. Standing radiographs are recommended as part of the initial investigation for LSS. Standing lumbar radiographs may yield high grades of magnification.
... (RStudio Inc.) and SPSS 22.0 (SPSS Inc.). Continuous data are presented as mean ± SD, and comparisons were performed with Student's unpaired t-tests or paired t-tests and the non-parametric Wilcoxon-Mann-Whitney test or [26,27]. To compare differences in axis measurements between the two modalities and readers, respectively, the method of Bland and Altman was used [27]. ...
... Continuous data are presented as mean ± SD, and comparisons were performed with Student's unpaired t-tests or paired t-tests and the non-parametric Wilcoxon-Mann-Whitney test or [26,27]. To compare differences in axis measurements between the two modalities and readers, respectively, the method of Bland and Altman was used [27]. p < 0.05 was considered statistically significant. ...
Article
Full-text available
Purpose This multicenter trial was conducted to evaluate MRI for the longitudinal management of incidental pulmonary nodules in heavy smokers. Materials and methods 239 participants (63.9 ± 8.4 years, 43–82 years) at risk of or with COPD GOLD I-IV from 16 centers prospectively underwent two rounds of same-day low-dose computed tomography (LDCT1&2) and MRI1&2 at an interval of three years in the nationwide COSYCONET trial. All exams were independently assessed for incidental pulmonary nodules in a standardized fashion by two blinded readers, incl. axis measurements and Lung-RADS categorization, with consensual LDCT results serving as the standard of reference. A change in diameter ≥ 2 mm was rated as progress. 11 patients underwent surgery for suspicious nodules after the first round. Results Two hundred twenty-four of two hundred forty nodules (93.3%) persisted from LDCT1 to LDCT2, with a sensitivity of MRI2 of 82.8% and 81.5% for readers 1 and 2, respectively. Agreement in Lung-RADS categories between LDCT2 and MRI2 was substantial in per-nodule (κ = 0.62–0.70) and excellent in a per-patient (κ = 0.86–0.88) approach for both readers, respectively. Concordance between LDCT2 and MRI2 for growth was excellent to almost perfect (κ = 0.88–1.0). The accuracy of LDCT1 and MRI1 for lung cancer was 87.5%. Lung-RADS ≥ 3 category on MRI1 had higher accuracy for predicting progress (23.1% and 21.4%, respectively) than LDCT1 (15.8%). Conclusion Compared to LDCT, MRI shows similar capabilities for the longitudinal evaluation of incidental nodules in heavy smokers. Decision-making for nodule management guided by Lung-RADS seems feasible based on longitudinal MRI. Key Points Question Can MRI serve as an alternative to low-dose CT (LDCT) for the longitudinal management of pulmonary nodules in heavy smokers, addressing concerns over radiation exposure?
Findings MRI demonstrated substantial agreement with LDCT in detecting nodule growth, accurately categorizing Lung-RADS, and comparable accuracy in identifying malignancy over a three-year follow-up. Clinical relevance Longitudinal MRI demonstrates high consistency with LDCT in assessing the growth of incidental pulmonary nodules and categorizing per-patient Lung-RADS, offering a reliable, radiation-free alternative for monitoring and early malignancy detection in high-risk populations.
... To improve the validity of reliability analyses in neurocognitive research and psychological measurements assessing executive and lower order functions, we supplemented common relative reliability statistics with advanced measurement error analyses. Specifically, in addition to the ICC, standard error of measurement (SEM), and minimal detectable change (MDC), an agreement analysis based on Bland and Altman (1986), Hopkins (2000), and Atkinson and Nevill (1998) was conducted to evaluate random scattering (secondary variance) using the limits of agreement as a measure of precision (Barnhart et al. 2007). Furthermore, learning effects were evaluated using the sampled t test, as proposed by Atkinson and Nevill (1998) and Warneke et al. (2025), assuming that no systematic performance improvements should occur from test to retest if no learning and habituation effects were present (which is critical for internal data validity). ...
... The magnitude of mean differences is further visualized in Bland-Altman plots, as recommended for reliability assessments (Grgic et al. 2020). These plots display the variability around a systematic shift (Atkinson and Nevill 1998; Bland and Altman 1986; Hopkins 2000). To enhance the analysis of the limits of agreement using the mean difference as a reference, random error was also quantified through the mean absolute error (MAE) (Willmott and Matsuura 2005) and mean absolute percentage error (MAPE) (Kim and Kim 2016), calculated as follows: ...
Article
Full-text available
Testing neurocognitive function is receiving growing attention in psychological and physical health research. To counteract the costs, reduced accessibility, and complexity of brain imaging (e.g., CT scans, fMRI) or function tests, neurocognitive performance tests, such as the Stroop test, the Trail Making Test or the Choice Reaction Task, are commonly implemented in a variety of neurocognitive evaluations. Although reliability is considered paramount when interpreting intervention effects, a detailed quantification of systematic and random errors is scarce. By recruiting 68 healthy participants from different age groups (7 to 64 years), we quantified population-specific measurement errors in the aforementioned neurocognitive tasks to raise awareness of learning effects on reliability and how they may bias current effect interpretations. By performing five testing sessions with 2 trials per day, we observed significant learning effects from repeated testing, expressed in trial-to-trial improvements of up to 50%, which were accompanied by a random measurement error reduction from day-to-day. These learning effects were task- and population-specific, highlighting the need for caution when transferring reliability coefficients from other studies. The quantification of both systematic and random measurement errors underscores the importance of conducting sufficient habituation sessions in neurocognitive tasks, as test protocols lack validity if they do not ensure reliability. Therefore, sufficient habituation sessions - until no meaningful learning effects can be observed - may be warranted when testing is repeated within short timeframes.
... Bland-Altman plots were used to assess the agreement between the complete pairs of GSP and IOM samplers in measuring inhalable aerosol and only for samples > LOD for SP concentrations (Bland and Altman 1986). The mean ratio between complete pairs of GSP and IOM samplers and Pearson's correlation analysis were performed to explore the relationship between inhalable aerosol and SP concentrations. ...
... Bland-Altman analyses were performed to compare the differences in thickness measurements, illustrating the mean differences (agreement at the cohort level) and the limits of agreement with a 95% confidence interval (CI) [reflecting variability and agreement at the individual level]. 28 The consistency of IEDs was evaluated by analyzing the difference-in-differences between eyes across separate tests, determined as the absolute value of the difference between IEDs from two repeated measurements. ...
Article
Full-text available
Background Optical coherence tomography (OCT) allows evaluation of inter-eye differences (IEDs) in peri-papillary retinal nerve fiber layer (pRNFL) and macular ganglion cell-inner plexiform layer (GCIPL) thicknesses to identify unilateral optic nerve involvement (UONI), which is included in the 2024 revised McDonald diagnostic criteria for multiple sclerosis (MS). Objectives To evaluate the test–retest reliability of pRNFL and GCIPL thicknesses/IEDs in people with MS, other neurological disorders, and healthy controls using Cirrus HD-OCT. Methods 509 participants underwent Cirrus HD-OCT, acquiring two macular and optic disc scans per eye within each session. Scans meeting OSCAR-IB quality control criteria were included in final analyses (959 eyes), with no clinical/demographic exclusions (reflecting a real-world clinical setting). Reliability was assessed using coefficients of variation (COVs), intraclass correlation coefficients (ICCs), and Bland–Altman limits of agreement (LOA). IED consistency was evaluated using difference-in-differences (DiDs) of test–retest measurements. Results GCIPL demonstrated superior reliability (ICC: 0.998, COV: 0.40%, LOA: −1.29 to 1.35 μm) to pRNFL (ICC: 0.989, COV: 1.18%, LOA: −3.59 to 3.70 μm) thickness. Inter-eye absolute DiDs [pRNFL: 2.00 μm (standard deviation (SD) 1.73); GCIPL: 0.64 μm (SD 0.67)] were lower than IED thresholds proposed for identifying UONI. Conclusions The excellent reliability of GCIPL and pRNFL thicknesses/IEDs support OCT for identifying UONI to diagnose MS.
... As correlation coefficients give no information on the actual difference between two methods, it is argued that Bland-Altman plots can provide more relevant insight for clinical decision-making. 29 The LoA of ±89.2 µmol/L found in this study are wider than reported in previous testing of the Picterus system as well as by other means of transcutaneous bilirubin estimations and are wider than recommended safety margins for the use of TcB in neonatal jaundice. 1 17 30 This could have clinical implications, with unwanted effects on clinical decision-making. ...
Article
Full-text available
Background Neonatal hyperbilirubinemia (NHB) is a significant cause of morbidity and mortality, particularly in low and middle-income countries (LMICs). Transcutaneous bilirubinometers offer a non-invasive method for assessing NHB but have limited availability due to cost and maintenance requirements. Visual assessment of jaundice is shown to be inaccurate. Smartphone-based technologies have the potential to provide innovative and accessible healthcare solutions. This study aimed to evaluate the Picterus system, a smartphone-based tool for screening of NHB, in three non-Caucasian populations in LMICs. Methods Between 2018 and 2022, cross-sectional studies were conducted in three countries: Mexico, Nepal and the Philippines. Newborns meeting the inclusion criteria were recruited, and data on demographic characteristics, skin type and visual assessment of jaundice were collected. Bilirubin levels were measured using both the Picterus system and total serum bilirubin (TSB) analysis. Correlation analyses, Bland-Altman plots and receiver operating characteristic (ROC) curves were used to evaluate the Picterus system. Results A total of 416 infants were included in the analysis. The Picterus smartphone system demonstrated a significant positive correlation with TSB levels across all sites (r=0.76). The correlation coefficient was significantly higher in Mexico compared with Nepal and the Philippines. Bland-Altman plots showed limits of agreement ±89.2 µmol/L. Picterus values were underestimated in Mexico, whereas they were overestimated in Nepal and the Philippines. ROC analysis for detection of infants with TSB >225 µmol/L indicated that the Picterus system had higher sensitivity and specificity compared with visual assessment using the Kramer scale. Discussion This study shows that the Picterus system can potentially be used in screening for neonatal jaundice in populations with moderate dark skin types. 
Further studies are needed before the system can be used in clinical practice.
... Statistical significance was set at p < 0.05. In addition, the intra-subject Bland-Altman method [18] was adapted to compare the ratio of RMS angular velocity between the left and right sides for both groups. To illustrate bias and limits of agreement between both limbs, the graphs were created using the mean difference and ± 2 standard deviations (SD) across individuals. ...
Article
Full-text available
This study compared the 3-axis gait kinematics of people with stroke with those of healthy individuals. The specific focus was stroke effects on shank movements in the sagittal, frontal and transverse planes. Sixteen stroke patients and sixteen healthy participants walked along a 10-meter walkway at self-comfortable walking speed, with shank angular velocity measured using two gyroscopes integrated into an Inertial Measurement Unit (IMU). Gait events in all motion planes, temporal gait parameters, kinematic data, asymmetry indexes (ASI), and Bland-Altman correlations were also computed. Greater diversity in gait patterns was observed in stroke patients. Compared with healthy controls, stroke patients showed reduced stance time on their affected side and lower non-affected swing time, causing gait asymmetry, as reflected in higher ASIs. The stroke patients’ shank exhibited limited motion in all motion planes, resulting in lower angular velocity and displacement. Sagittal plane angular velocity results were validated using intra-subject comparisons that showed good agreement between limbs in healthy subjects. The empirical findings of this study provide evidence that shank-based IMUs are effective in revealing temporal-spatial gait alterations in people with stroke compared to healthy individuals, which can be exploited to develop a targeted rehabilitation plan to reduce abnormalities in the stroke patient group.
... Mean differences (actual five-year MTPM -estimated MTPMemax), standard deviations of differences, intraclass correlation coefficients (ICCs), and Bland and Altman limits of agreement were calculated. 11 The MTPMemax and K-values were determined for TKA designs and fixations from a previous systematic review based on their pooled migration patterns from multiple study groups. 3 These MTPMemax and K-values may serve as future benchmarks for specific TKA designs and fixation (i.e. ...
Article
Full-text available
Aims Thresholds for acceptable amounts of migration of tibial components measured with radiostereometric analysis (RSA) are limited to specific follow-up moments and do not use the full migration pattern. The Michaelis-Menten (MM) model, a non-linear model from biochemistry, could potentially be used to model the entire migration pattern. The purpose of this study was therefore to determine if MM models can be fitted to RSA migration data of tibial components, and if these fitted model parameters can be used for early detection of tibial components at high risk for aseptic loosening. Methods Migration patterns of tibial component maximum total point motion (MTPM) over six months, one year, and two years, as well as revision rates for aseptic loosening from previously published systematic reviews, were used. Fitted MM models gave the estimated maximal MTPM (MTPMemax) and a constant (K), which is the time in months at which half the MTPMemax is reached for tibial component designs. To assess model fit, intraclass correlation coefficients (ICCs) were calculated for the modelled MTPMemax and reported five-year MTPM values. The estimated MTPMemax and K-values were plotted against their corresponding five-year revision rate for aseptic loosening. Results For six-month, one-year, and two-year migration patterns, the ICC was 0.81, 0.83, and 0.91, respectively, suggesting excellent agreement between calculated MTPMemax values and the known five-year MTPM values. MTPMemax up to 1.3 mm was considered to be safe based on association with aseptic loosening revision rate, while MTPMemax of more than 1.3 mm was unsafe. The K-value could not be used as a predictor for safe versus unsafe implants. Conclusion MTPMemax values may be used for early detection of tibial components at high risk for aseptic loosening, possibly offering improvements over the older threshold system. Cite this article: Bone Joint Res 2025;14(5):398–406.
... Furthermore, the agreement between criterion 4C and the alternative estimations was evaluated using the Bland-Altman method [39], including the calculation of the 95% limits of agreement (LOA). This analysis also included the analysis of the correlation between the mean values and the differences to detect potential proportional bias. ...
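The proportional-bias check mentioned in this excerpt (correlating the per-subject means with the differences) might look roughly like the sketch below. The function name `proportional_bias_r` and the sample values are hypothetical, and Pearson's r is an assumption, since the excerpt does not say which correlation coefficient was used.

```python
import numpy as np

def proportional_bias_r(criterion, alternative):
    """Pearson correlation between per-subject means and differences.

    A correlation far from zero hints that the disagreement grows or shrinks
    with the magnitude of the measurement (proportional bias).
    """
    c = np.asarray(criterion, dtype=float)
    a = np.asarray(alternative, dtype=float)
    diff = c - a
    mean = (c + a) / 2.0
    return float(np.corrcoef(mean, diff)[0, 1])

# Hypothetical data in which the difference grows with the measured value
r = proportional_bias_r([1.0, 2.0, 3.0, 4.0], [1.0, 1.5, 2.0, 2.5])
print(f"r = {r:.3f}")
```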
Article
Full-text available
The criterion four‐compartment (4C) model combines dual‐energy x‐ray absorptiometry (DXA), deuterium dilution, and air displacement plethysmography (ADP), but its complexity limits practicality. This study aimed to externally validate a DXA‐derived body volume (BV) equation (DXA‐BVSilva) and assess the accuracy of a rapid 4C model replacing deuterium dilution total body water (TBW) with bioelectrical impedance spectroscopy (BIS) alongside DXA‐derived BV in athletes. A total of 143 athletes (27.3% females) validated DXA‐BV, with 115 (33.9% females) assessed for FM. Criterion 4C used DXA for bone mineral content, ADP for BV, and deuterium dilution for TBW. Several rapid 4C models were tested. DXA‐BVSilva and DXA‐BVHeymsfield reliably estimated BV, showing minimal mean differences and narrow 95% limits of agreement (LOA) compared to ADP. Among rapid 4C models, 4CTBWBIS, 4C BVSilvaTBWBIS, and 4C BVHeymsfieldTBWBIS provided the most accurate FM estimates with small, nonsignificant differences to the criterion (MD [SD]: −0.18 [1.25] kg; −0.23 [1.82] kg; 0.18 [1.87] kg, respectively) and narrow 95% LOA (−2.62 to 2.27, −3.79 to 3.33 and − 3.48 to 3.83 kg, respectively) with no proportional bias. This research supports the implementation of rapid 4C models in settings where the criterion 4C model is impractical. By improving accuracy in body composition assessment, our findings have implications for sports nutrition, sports science, and academic research, offering a viable alternative to traditional 2C and 3C models (DXA).
... The rationale for applying the Bland-Altman analysis in this study lies in evaluating the consistency of two different measurement methods [21,22]. Bland-Altman analysis is a reliable method developed to determine the systematic bias and the limits of agreement (LOA) between two measurement methods for continuous data. ...
Article
Full-text available
Background Objective Structured Clinical Examinations (OSCEs) are widely used in medical education to assess students’ clinical and professional skills. Recent advancements in artificial intelligence (AI) offer opportunities to complement human evaluations. This study aims to explore the consistency between human and AI evaluators in assessing medical students’ clinical skills during OSCE. Methods This cross-sectional study was conducted at a state university in Turkey, focusing on pre-clinical medical students (Years 1, 2, and 3). Four clinical skills—intramuscular injection, square knot tying, basic life support, and urinary catheterization—were evaluated during OSCE at the end of the 2023–2024 academic year. Video recordings of the students’ performances were assessed by five evaluators: a real-time human assessor, two video-based expert human assessors, and two AI-based systems (ChatGPT-4o and Gemini Flash 1.5). The evaluations were based on standardized checklists validated by the university. Data were collected from 196 students, with sample sizes ranging from 43 to 58 for each skill. Consistency among evaluators was analyzed using statistical methods. Results AI models consistently assigned higher scores than human evaluators across all skills. For intramuscular injection, the mean total score given by AI was 28.23, while human evaluators averaged 25.25. For knot tying, AI scores averaged 16.07 versus 10.44 for humans. In basic life support, AI scores were 17.05 versus 16.48 for humans. For urinary catheterization, mean scores were similar (AI: 26.68; humans: 27.02), but showed considerable variance in individual criteria. Inter-rater consistency was higher for visually observable steps, while auditory tasks led to greater discrepancies between AI and human evaluators. Conclusions AI shows promise as a supplemental tool for OSCE evaluation, especially for visually based clinical skills. 
However, its reliability varies depending on the perceptual demands of the skill being assessed. The higher and more uniform scores given by AI suggest potential for standardization, yet refinement is needed for accurate assessment of skills requiring verbal communication or auditory cues.
... To assess the agreement between different sleep measurement modalities, Bland-Altman plots 42 were created by plotting the differences between paired measurements against their mean. These plots help visualize the differences between two modalities, revealing any consistent bias and the range where the measurements align. ...
Preprint
Full-text available
Study Objectives: Sleep plays a crucial role for mental health. This study examines sleep tracking in naturalistic settings for patients with major depressive episodes (MDE) using actigraphy, smartphone data, bed sensors, and the ecological momentary assessment (EMA) and assesses discrepancies between these modalities. Methods: We measured sleep onset, offset, and total sleep time (TST) over two weeks for 172 participants, including healthy controls and three MDE subgroups (borderline personality disorder, major depressive disorder, and bipolar disorder). Agreement between measurement modalities was assessed using Bland-Altman plots and Pearson correlation. Predictors of sleep alignment were analyzed using mixed-effects models, accounting for demographics, daylight length, and participant subgroup. Results: Patients showed greater sleep variability than healthy controls. Actigraphy overestimated TST compared to bed sensors (0.48 min) and smartphones (0.99 min), while the smartphone underestimated TST compared to other modalities. Older age improved alignment between actigraphy and bed sensors, as well as smartphone and bed sensor sleep offset. TST alignment (smartphone vs. bed sensor) was worse in females and bipolar/borderline patients. Longer daylight duration improved TST and sleep offset alignment across modalities. Conclusions: Our study highlights measurement biases, seasonal effects, and demographic factors associated with discrepancies in objective sleep measures. While these modalities show potential and offer several advantages in assessing sleep over longer periods, the discrepancies and factors associated with misalignment should be considered in future studies or clinical settings.
... Since the data were not normally distributed, they were log-transformed before calculating the ICCs. In addition, Bland-Altman plots [2] of the difference in results between trial 1 and trial 2 against the mean of trial 1 and trial 2 for each participant were generated to calculate the 95 % limits of agreement (LOA). The presence of heteroscedasticity in the datasets was objectively assessed by plotting the absolute differences against the means of trials 1 and 2 and calculating the correlation coefficient [1]. ...
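The heteroscedasticity check described in this excerpt, relating the absolute differences to the means of the two trials, can be sketched as follows. The function name `heteroscedasticity_r` and the trial values are hypothetical, and Pearson's r is an assumption, since the excerpt does not name the coefficient.

```python
import numpy as np

def heteroscedasticity_r(trial1, trial2):
    """Correlation between absolute trial-to-trial differences and trial means.

    A clearly positive value suggests that the scatter of the differences
    increases with the size of the measurement.
    """
    t1 = np.asarray(trial1, dtype=float)
    t2 = np.asarray(trial2, dtype=float)
    abs_diff = np.abs(t1 - t2)
    mean = (t1 + t2) / 2.0
    return float(np.corrcoef(mean, abs_diff)[0, 1])

# Hypothetical test-retest data where the spread widens at larger values
r = heteroscedasticity_r([10.0, 20.0, 30.0, 40.0], [11.0, 18.0, 33.0, 36.0])
print(f"r = {r:.3f}")
```

When this correlation is clearly positive, a common response is to log-transform the data and report ratio limits of agreement, which is consistent with the log transformation this excerpt mentions.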
Article
Full-text available
Postural stability provides important data on sporting and health outcomes. A new, affordable, portable balance mat (BM) must be properly evaluated against the force plate (gold standard) for postural stability assessment. We aimed to assess the reliability and comparability of a new BM for assessing postural sway compared with a force plate. Seventeen participants (age range: 18-67 years; 8 males, 9 females) performed nine balance tests, with two trials performed for each test. Balance mat and centre of pressure (COP) data from a force platform were collected simultaneously during the tests. Correlation analyses were performed to test the strength of the relationships between the BM and COP data and the reliability of the balance mat. Spearman's rank-order correlation coefficients between the BM and COP data ranged from 0.63 to 0.79. The intraclass correlation coefficients of the BM data between the two trials ranged from 0.78 to 0.84. The 95 % ratio limits of agreement of the BM data ranged from 4.28 to 13.23 times the difference between the two trials. The results suggest that the balance mat has good relative reliability for assessing postural sway but poor absolute reliability, and the measurements are comparable to the postural sway data from a force plate. Clinicians can use the BM to screen people at risk of poor health and sporting outcomes inexpensively and easily and refer them for further evaluation.
... Bland-Altman analysis was employed to evaluate measurement consistency for intra-and inter-rater agreement and to identify the presence of systematic error. 21 The Standard Error of Measurement (SEM) and the Smallest Detectable Change (SDC) were calculated to assess measurement reliability and the minimal change beyond measurement error. SEM was computed using the formula: SEM = SD × √(1 − r), where SD represents the standard deviation, and r denotes the reliability coefficient. ...
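The SEM formula quoted in this excerpt translates directly into code. In the sketch below, the function names and input values are hypothetical, and the SDC expression (1.96 × √2 × SEM) is a commonly used convention assumed here, since the excerpt does not state how the SDC was computed.

```python
import math

def standard_error_of_measurement(sd, r):
    """SEM = SD * sqrt(1 - r), with r the reliability coefficient (e.g. an ICC)."""
    return sd * math.sqrt(1.0 - r)

def smallest_detectable_change(sem, z=1.96):
    """Assumed convention: SDC = z * sqrt(2) * SEM (95% level for z = 1.96)."""
    return z * math.sqrt(2.0) * sem

# Hypothetical inputs: SD of 10 score points and a reliability of 0.91
sem = standard_error_of_measurement(sd=10.0, r=0.91)
sdc = smallest_detectable_change(sem)
print(f"SEM = {sem:.2f}, SDC = {sdc:.2f}")
```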
Article
Introduction The aim of the study was to evaluate the intra- and inter-rater reliability, and validity of the Trunk Control Measurement Scale (TCMS) for tele-assessment in children with cerebral palsy (CP). Method A cross-sectional study was conducted with 36 children aged 4–18 years, diagnosed with hemiplegic CP. Participants underwent four TCMS assessments: in-person assessment, tele-assessment via videoconferencing, and two video-based tele-assessments scored by the same rater and by a second rater. Reliability was analyzed using intraclass correlation coefficients (ICC). Discriminant validity was assessed by comparing TCMS tele-assessment scores between children with Gross Motor Function Classification System (GMFCS) levels I and II, while criterion validity was evaluated by examining the correlation between face-to-face and tele-assessment TCMS scores. Results Excellent reliability was observed between face-to-face and tele-assessment (ICC: 0.91; 95%CI: 0.83–0.95). TCMS tele-assessment also demonstrated excellent intra-rater reliability (ICC: 0.90, 95%CI: 0.80–0.94) and high inter-rater reliability (ICC: 0.82, 95%CI: 0.66–0.90). Criterion validity was confirmed by strong correlations between face-to-face and tele-assessment scores (r = 0.925, and r = 0.892, p < 0.001 for rater-1 and rater-2, respectively). The TCMS successfully discriminated children by functional levels, demonstrating discriminative validity (p = 0.002). Bland-Altman analysis revealed minimal systematic error, with internal consistency remaining high across all assessments (>0.88). Discussion TCMS is a valid and reliable tool for teleassessing trunk control in children with hemiplegic CP. These results may pave the way for developing child-specific, targeted telerehabilitation programs, bringing telerehabilitation closer to its primary aim of ensuring equal opportunities. This study was registered as a clinical trial (NCT06707831). https://clinicaltrials.gov/study/NCT06707831
... All statistical analyses were conducted using Python 3.11.11. A Bland-Altman plot analysis was performed to assess agreement between the three CKD-EPI equations further, evaluating mean differences and potential biases in eGFR estimation [48]. This method provided insight into the interchangeability and reliability of the equations across different CKD stages. ...
Preprint
Full-text available
Abstract: Chronic Kidney Disease (CKD) is a progressive condition that requires accurate diagnosis and staging for effective clinical management. Conventional CKD diagnosis relies on estimated Glomerular Filtration Rate (eGFR), a measure of kidney function derived from serum biomarkers such as serum creatinine (SCr) and cystatin C (SCysC). However, eGFR calculations may be inaccurate when applied to diverse patient populations. This study proposes a machine learning (ML) system that integrates regression-based eGFR estimation, metaheuristic optimization using the Grey Wolf Optimizer (GWO), and multi-class classification with various ML models to enhance CKD staging and classification. The model estimates eGFR using three established CKD Epidemiology Collaboration (CKD-EPI) equations incorporating SCr, SCysC, and their combined values. Regression models assess predictive performance, specifically Linear Regression (LR) and Support Vector Regression (SVR). SVR demonstrates superior performance compared to LR; for the combined CKD-EPI SCr-SCysC equation, it achieved a root mean squared error (RMSE) of 3.03, a mean absolute percentage error (MAPE) of 2.97%, and a coefficient of determination (R²) score of 0.97. The application of GWO for hyperparameter tuning resulted in a 37.3% reduction in RMSE, a 37.4% drop in MAPE, and a 2.06% improvement in R², improving the precision of prediction. Once the model fine-tunes the eGFR estimations, it feeds them into various algorithms for CKD stage classification, including Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and Extreme Gradient Boosting (XGBoost). Among these, XGBoost achieves the highest classification accuracy of 97.76%, along with an F1-score of 97.45%, demonstrating its effectiveness in CKD staging. 
Shapley Additive Explanations (SHAP) provide global and local feature importance insights, enhancing clinical decision-making and model transparency. Future research will validate the model using more extensive and more diverse datasets. Additionally, it will incorporate extra clinical parameters, including biomarkers and genetic data, to enhance the precision of CKD risk prediction. This research enhances AI-driven nephrology by providing a scalable, interpretable, and highly accurate solution for diagnosing and managing CKD.
... The agreement between the results of HMC and OT-II was examined using intraclass correlation coefficients (ICC) and Bland-Altman analysis [16][17][18][19][20]. The maximum, median, and minimum red-green mixture and yellow monochromatic equal values obtained from HMC and OT-II were examined. ...
Article
Full-text available
Purpose To determine the degree of agreement between the results of the Heidelberg Multi-Color Anomaloskop (HMC) and NEITZ anomaloscope OT-II (OT-II). Study design Retrospective. Methods The study included 53 patients who underwent color-vision testing at Kariya Toyota General Hospital between March 2019 and April 2022. The participants included 2 patients with normal color vision, 10 with protanopia, 3 with protanomaly, 22 with deuteranopia, and 16 with deuteranomaly. Color-vision testing was performed using the Ishihara Test for Colour Deficiency, Standard Pseudoisochromatic Plates Part 1, Farnsworth Dichotomous Test for Color Blindness Panel-15, HMC, and OT-II. Agreement was determined using intraclass correlation coefficients (ICC) and Bland–Altman analysis. The minimum, median, and maximum red–green mixture and yellow monochromatic values of the equal values obtained from HMC and OT-II were examined. Results The ICCs between the results of HMC and OT-II were 0.979, 0.979, and 0.985 for the minimum, median, and maximum red–green mixture and 0.943, 0.755, and 0.919 for the yellow monochromatic values, respectively (p < 0.0001 in all). In the Bland–Altman analysis, the differences were mostly within the limits of agreement. Fixed errors were observed for the maximum red–green mixture and minimum yellow monochromatic values. Proportional errors were observed for the maximum red–green mixture and yellow monochromatic values. Conclusions HMC and OT-II showed high agreement for all values in the ICC and Bland–Altman analyses. In the Bland–Altman analysis, systematic errors were observed in the maximum red–green mixture value and the minimum and maximum yellow monochromatic values.
... A reliable variable on an individual participant level needs to be repeatable and stable within participants. The performed reliability and repeatability analysis in this study includes both the intraclass correlation coefficient (3,1) (ICC) (21) and the Bland-Altman model (22,23). The ICC was calculated with the two-way mixed-effect absolute agreement model, mainly focusing on repeatability between patients (24). ...
Article
Full-text available
Introduction Eye movements have been proposed as biomarkers to track disease progression and treatment effects in neurological diseases. Before such variables are used in the clinic or in drug trials, properties such as measurement error must be documented. In this study, we assessed repeatability, reliability, and stability of fixation, smooth pursuit, and saccade measurements in patients with Parkinson’s disease, cerebellar ataxia, and healthy adults. Methods Fixation, smooth pursuit, and saccade metrics were measured in 16 patients with Parkinson’s disease, 16 patients with ataxia, and 25 healthy adults with an eye tracker (BulbiCam). The same operator repeated the measurements six times over 2 days in the patient group and two times the same day in the healthy adults. Reliability, repeatability, and stability were assessed with the intraclass correlation coefficient (ICC), Bland–Altman plots with the Agreement Index, and the Stability Index, respectively. Results Mean pupil size in the fixation test and latency, accuracy, and peak velocity in the pro-saccade test were found reliable, repeatable, and stable. Mean and max fixation in the fixation test were found reliable and stable. Smooth pursuit measurements were found repeatable within patients and stable, but not reliable. Conclusion The saccade and pupil variables may be used both on a population level and for individual patient follow-up. Mean and max fixation duration may be used on the population level, but for clinical evaluation of individual patients the measurements need to be repeated.
... Bias and agreement were assessed using Bland-Altman analysis [25] for repeated measures. To account for repeated measures within patients, linear mixed-effects models were used to estimate within-patient limits of agreement and to compute the mean and between-patient standard deviation of the bias [26]. ...
Preprint
Full-text available
Background Mechanical ventilation is essential for treating respiratory failure. However, ventilator over-assistance can lead to ventilator-induced diaphragm dysfunction (VIDD), and inadequate assistance can necessitate excessive effort, which can be detrimental to lung mechanics and damage diaphragm function. Current monitoring methods face clinical implementation challenges due to invasiveness and complexity. This study introduces and validates a novel non-invasive real-time respiratory muscle pressure (N-Pmus) monitoring method. Methods 1) The bench study involved developing a non-invasive, real-time respiratory muscle pressure monitoring algorithm (N-Pmus) based on respiratory mechanics equations and validated against an ASL5000 lung simulator across 270 clinical scenarios. 2) A clinical validation was conducted as a self-randomized controlled study(n = 23) comparing N-Pmus with the Pmus derived from simultaneously monitored esophageal pressure (Pes-Pmus) to assess correlation and agreement. The association between N-Pmus and the established Pmus benchmarks was analyzed using linear mixed-effects models. Bias and agreement were evaluated through Bland–Altman analysis for repeated measures. Results 1) The bench study demonstrated that N-Pmus correlated well with ASL5000-Pmus, with marginal R²=0.993 and conditional R²=0.997. The bias was − 0.23 cmH₂O, with limits of agreement ranging from − 1.51 to 1.04 cmH₂O. 2) The clinical validation revealed strong N-Pmus/Pes-Pmus agreement with marginal R²=0.97 and conditional R²=0.971. The bias was − 0.2 cmH₂O, with limits of agreement ranging from − 2.22 to 1.83 cmH₂O. Conclusions N-Pmus, a novel, non-invasive real-time monitoring method, demonstrates a strong correlation and agreement with the established Pmus benchmarks (ASL5000-Pmus or Pes-Pmus), offering an effective means of assessing respiratory effort in mechanically ventilated patients. Clinical trial retrospectively registered with www.chictr.org.cn . 
Registration number: ChiCTR2300076940, registered 24 October 2023.
... The theoretical difference in log copies/mL for a pool of three as compared to an individual result is log10(1/3), or -0.48 log copies/mL. Two method-comparison approaches, the Bland-Altman method [45,46] and Passing-Bablok regression [47], were conducted to compare the difference in viral load measurements for specimens tested both in pools of three and individually to this theoretical difference. These analyses were conducted for specimens with results over the range 500 copies/mL to 5,000,000 copies/mL, which is the range over which the manufacturer reports that the assay was designed to achieve an inter-assay standard deviation of less than or equal to 0.25 copies/mL [48]. Viral load values were log-transformed prior to these analyses due to the non-normal distribution. ...
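The arithmetic behind the theoretical offset in the excerpt above can be checked with a short Python sketch. The function names and the sample values are hypothetical and only illustrate the idea; the cited study's actual analysis pipeline is not described in the excerpt.

```python
import math


def expected_pool_offset(pool_size: int) -> float:
    """Expected log10 shift when one specimen is diluted in a pool of equal volumes."""
    return math.log10(1.0 / pool_size)


def mean_deviation_from_expected(pooled_log, individual_log, pool_size=3):
    """Mean (pooled - individual) difference in log10 copies/mL, relative to the
    theoretical dilution offset. Values near zero suggest that pooled results
    behave as simple dilution predicts."""
    diffs = [p - i for p, i in zip(pooled_log, individual_log)]
    return sum(diffs) / len(diffs) - expected_pool_offset(pool_size)


# Illustrative values only (not data from the study).
offset = expected_pool_offset(3)
print(round(offset, 2))
```

For a pool of three, the offset rounds to the -0.48 log copies/mL quoted in the excerpt.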
Article
Full-text available
For the 30 million people living with HIV on antiretroviral therapy, routine viral load testing is recommended to monitor treatment effectiveness. However, only an estimated 77% of eligible people accessed viral load testing in 2022, due to barriers including the high costs of tests. Here we assessed implementation of pooled testing to increase viral load testing efficiency at a reference laboratory in Cameroon. Plasma specimens were tested in pools of three using the Abbott RealTime HIV-1 Viral Load assay. For pools with HIV-1 detected, each specimen was then tested and reported individually; for the negative pools, the pooled result was reported with no further testing. From July to December 2023, results for 12,396 specimens tested in pools were produced using 6,797 assays, or 0.55 assays per result, with 3.6% (449) reported as unsuppressed (> 1,000 copies/mL), enabling an additional 5,409 people (+ 80%) to have test results. When testing pools of three, the limit of detection per specimen increases from < 40 copies/mL to < 120 copies/mL, with only an estimated 0.01% of specimens with results of ≥ 1,001 copies/mL (unsuppressed) having results misclassified as suppressed. These results demonstrate that pooled testing can be an efficient and accurate approach to increase access to viral load monitoring.
... The lobar ventilation for all lobes across the patient cohort was compared to Galligas PET and between the CTVI methods using a Bland-Altman analysis [32]. ...
Article
Full-text available
Background Computed Tomography (CT) ventilation imaging (CTVI) is an emerging ventilation imaging technique. CTVI implementations have been widely validated against alternative ventilation imaging techniques but have been limited to clinical research only. The first CTVI commercial product, CT LVAS (4DMedical, Melbourne, Australia), was recently released enabling its use in clinical practice. This study quantitatively compares ventilation images from CT LVAS and previously validated research CTVI algorithms to Galligas PET ventilation. Methods 16 patients with Galligas PET and paired inhale/exhale breath-hold CT images were taken from a publicly available dataset on The Cancer Imaging Archive. Ventilation images were produced using CT LVAS and two previously published algorithms: (1) utilising the Hounsfield Unit difference (CTVI_HU); and (2) utilising the Jacobian determinant (CTVI_Jac). CTVI images were compared to the reference standard Galligas PET using Bland-Altman analysis of lobar ventilation, voxel-wise Spearman correlation, and Dice similarity coefficient (DSC) of regions of interest representing the top 85% and 15% of ventilation function. Results Bland-Altman analysis showed overall bias of < 0.01% for all CTVI methods (95% confidence interval: ±7.4% for CT LVAS, ± 9.1% for CTVI_HU, ± 7.9% for CTVI_Jac). The mean Spearman correlation between CTVI and Galligas PET was 0.61 ± 0.14 (p < 0.01) for CT LVAS, 0.68 ± 0.10 (p < 0.01) for CTVI_HU, and 0.57 ± 0.15 (p < 0.01) for CTVI_Jac. The mean DSC for the top 85% was 0.91 ± 0.03 for CT LVAS, 0.92 ± 0.02 for CTVI_HU, and 0.91 ± 0.03 for CTVI_Jac, with the DSC for CTVI_HU significantly higher than the other two CTVI methods. The DSC for the top 15% was 0.47 ± 0.17 for CT LVAS, 0.53 ± 0.16 for CTVI_HU, and 0.47 ± 0.18 for CTVI_Jac. Conclusions In a comparison to Galligas PET ventilation imaging, CT LVAS performs similarly to previous CTVI methods. 
Bland-Altman analysis for quantification of lobar ventilation demonstrates negligible bias. Mean voxel-wise Spearman correlations are moderate to good. DSC of functionally thresholded lung regions are similar for all CTVI methods. These results warrant further investigation of CT LVAS as a readily available ventilation imaging tool in disease characterisation, lung health assessment, and surgical and targeted treatment planning. Trial registration Australian New Zealand Clinical Trials Registry (ANZCTR) registration number ACTRN12612000775819, registered on 23/07/2012.
... Measures of within-subject variation are advocated to be the most important type of reliability measure in sports medicine and science, such as standard error of measurement, coefficient of variation, and limits of agreement (Bland & Altman, 2010; Hopkins, 2000; Koo & Li, 2016; Weir, 2005). The test reliability in our study was assessed through a test-retest design using two standard stimuli (at 70% VO2max and 80% VO2max) separated by one week. ...
Article
Endurance runners need to self-regulate their pace continuously in a race so that the ideal performance can be sustained without fatigue. Hence, we are interested in validating an approach to measure individual perceptual acuity ability using just noticeable differences (JND) in a physical stimulus, and its related psychophysiological demands. Fifteen male runners (mean age = 34.27, SD = 6.91 years) first performed a maximal treadmill test to determine the speed of a standard exercise bout for the JND trials. The JND trials consisted of four 5-min running bouts on a treadmill with 5-min rest between bouts. For bouts 1 and 3, participants ran at the standard stimulus (SS) pace, but for bouts 2 and 4, they adjusted their speeds to achieve a level of exertion at a JND above/below the SS. They achieved differences in the final 30 seconds of the VO2 between each JND bout and the previous SS at just above (JND-A) and just below (JND-B) the JND perceived exertions. We assessed the JND approach validity by intraclass correlation coefficient (ICC), coefficient of variation (CoV), concordance correlation coefficient (CCC), Bland-Altman and Cohen’s d for VO2 of two standard stimuli within each JND trial. All validity statistical tests indicated a high level of concordance and agreement within both JND at 70% VO2max and 80% VO2max (ICC = .896 and .940; CoV 2.77 and 2.05; CCC = .889 and .936; respectively), with low standard error of measurement and of the estimate (1.261 and 1.0105; 1.6932 and 1.3868; respectively) (all p = .05). The data also showed a high level of agreement since the measures are within 95% limits in each JND trial. Our findings established the validity and reproducibility of the JND approach to identify the perceptual acuity ability of male endurance runners.
... To answer the research question, a sample of sixteen elite male volleyball players (mean [range] = 18 [16–20] years, mean height = 196.59 ± 9.44 cm & mean weight = 86.92 ...
Article
Full-text available
Training load monitoring and performance testing are crucial components within high-performance sports. In volleyball, jump height both contributes to the training load and is an important performance measure to monitor. In practice, various measurement systems, each with their own methods of estimation, are used. Therefore, the aim of this study is to compare the accuracy of different approaches in order to provide a “best practice” suggestion. To answer this research question, sixteen elite male volleyball players (16-18 years) completed several jumps for seven different (sport-specific) jump types. The jumps were measured with the following systems: three direct flight time (FT) based systems (force plate, high-speed camera, and Optojump), two wearable inertial measurement units (IMUs; Vert and Kinexon), two take-off velocity (TOV) based systems (force plate and high-speed camera), and two direct displacement systems (high-speed camera and Yardstick). Validity was examined with the (standardized) typical error of the estimate (sTEE) and Bland-Altman statistics, with Optojump as the benchmark system. The results show that FP FT, Kinexon, Vert and Yardstick can be used almost interchangeably, showing trivial or small differences in sTEE (0.10–0.35). In contrast, Video TOV, FP TOV and Video disp were less accurate, with medium to large sTEE differences (0.66–1.31). In practice, care should be taken when different measurement systems are used alongside one another, given the different degrees of agreement between them.
... For the DunedinPACE clock, the mean difference was interpreted as the mean difference in the pace of ageing. To standardise the difference across various DNAm clocks, the mean absolute percentage error (MAPE) was calculated to compare across DNAm clocks [44]. Box plots were used to visualize the difference in DNAm ages between EPICv1 and EPICv2, and significant differences were tested with the Wilcoxon Signed-Rank test. ...
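As a rough illustration of the MAPE mentioned in this excerpt, the sketch below computes it in Python. Treating one array as the reference is an assumption made here for illustration (the excerpt does not say which array served as the denominator), and the function name is hypothetical.

```python
def mape(reference, estimate):
    """Mean absolute percentage error of `estimate` relative to `reference`, in percent.

    Assumes paired, equal-length inputs with non-zero reference values.
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((e - r) / r) for r, e in zip(reference, estimate))
    return 100.0 * total / len(reference)


# Illustrative values only: two paired age estimates from two arrays.
print(mape([50.0, 40.0], [55.0, 38.0]))  # relative errors of 10% and 5%
```

Because each error is divided by the reference value, MAPE is unit-free, which is what makes it usable for comparing clocks that report on different scales.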
Article
Full-text available
This study aims to evaluate differences between Infinium MethylationEPIC (EPICv1) and Infinium MethylationEPICv2 (EPICv2) arrays in estimating DNAm age with eleven DNAm clocks using buffy coat, peripheral blood mononuclear cell (PBMC), and saliva from 16 healthy middle-aged individuals. DNAm ages were estimated using six principal component-based (PC) clocks (PCHorvath1, PCHorvath2, PCHannum, PCPhenoAge, PCGrimAge, and PCDNAmTL) and five non-PC clocks (DunedinPACE, DNAmFit, YingCausAge, YingAdaptAge, and YingDamAge) across all biological samples. Agreement between arrays was assessed using Spearman correlation, Bland-Altman plots, and Wilcoxon Signed-Rank test. The 16 individuals with median age of 48 [43.5;53.8] years, were predominantly female, Chinese and non-smokers. High correlations (ρ > 0.8) were observed between EPICv1 and EPICv2 except for DunedinPACE, YingDamAge and YingAdaptAge. PC-based clocks showed lower systematic bias (MAPE:0.118-8.98%) compared to non-PC-based clocks (MAPE:5.31-21.2%). Saliva samples demonstrated greatest variability between arrays. EPICv2 introduces systematic biases especially in non-PC-based clocks and between different biological samples.
... Passing-Bablok regressions and Bland-Altman plots for method comparison are reported in Figure 3. Briefly, the two datasets show good agreement, according to the slope values in the Passing-Bablok regressions that are close to 1, and to the 95% limits of agreement (LoA) of the Bland-Altman plots that include the null bias for testosterone and androstenedione [68]. However, ... Comparison of absolute concentrations measured for endogenous testosterone, androstenedione, and cortisol in 94 male serum samples, using a one-point internal calibration strategy versus a conventional external calibration approach. ...
Article
Steroids are a major set of endogenous bioactive compounds. Although increasingly popular, their analysis in biofluids by LC‐MS is associated with enduring challenges, such as their low endogenous concentrations or the coexistence of numerous isobaric compounds. Their natural presence in biological matrices complicates their absolute quantification in blood, as obtaining a blank matrix to establish an external calibration curve is impossible. This protocol describes a strategy for developing an LC‐MS/MS method for the extended profiling of steroids in serum and plasma, including as many as 171 target compounds, with the additional absolute quantification of four main steroids (cortisol, testosterone, progesterone, and androstenedione). The proposed sample preparation involves protein precipitation in organic solvents and subsequent filtration of the sample on an HLB cartridge. The LC method is developed to resolve most isobaric species thanks to a biphenyl stationary phase. MS detection is performed in multiple reaction monitoring mode with post‐column addition of ammonium fluoride to enhance sensitivity. A one‐point internal calibration strategy is presented for the absolute quantification of endogenous steroids. The application of this method to the NIST Plasma Reference Material (SRM 1950) led to the identification of 69 distinct endogenous steroids, making it the most comprehensive profiling of these compounds in this reference matrix to date. The quantitative performance of the method is assessed with two certified materials and shows satisfactory precision and trueness.
... 40 Bland-Altman analysis provides a more appropriate measure by assessing both bias and variability, offering a clearer and more comprehensive picture of test agreement. 41 Table 4 summarizes the results of this analysis and Picterus JP's accuracy for detecting TSB ≥ 250 μmol/L across different settings. ...
Article
Full-text available
Objective To evaluate the performance of Picterus Jaundice Pro (Picterus JP), a mobile health device for neonatal jaundice (NNJ) screening, in Mexican newborns. Methodology A cross-sectional study was conducted from January 2023 to June 2024 at a hospital in a resource-limited setting in Mexico. The main outcomes were Picterus JP measurements and total serum bilirubin (TSB) levels. Results A total of 177 term newborns, aged 1 to 14 days, were enrolled. Picterus JP showed a significant positive Spearman’s rho correlation with TSB (ρ = .68), sensitivity of 85.7% and specificity of 80.1% to detect TSB ≥ 250 using Picterus cut off value of 202 µmol/L. However, it tended to underestimate higher bilirubin levels. Conclusion Picterus JP shows potential to be a useful tool for NNJ screening, particularly in resource-limited areas. Further validation across more diverse populations and clinical environments, alongside accuracy improvements, is necessary to enhance its utility and support wider implementation. Registered in ClinicalTrials.gov (https://clinicaltrials.gov/study/NCT06276582)
... While a strong correlation may suggest a relationship, it does not guarantee high reliability. To visually assess inter-assessment agreement, Bland-Altman plots were used, showing the difference between two measurements against their mean [17]. The limits of agreement, defined as ± 2 standard deviations from the mean difference, were plotted to demonstrate the range within which 95% of differences would fall, assuming a normal distribution. ...
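The calculation described in this excerpt (differences plotted against means, with limits at the mean difference ± 2 standard deviations) can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the cited authors' code; the function name and the multiplier argument `k` are assumptions, and many analyses use 1.96 instead of 2.

```python
import statistics


def bland_altman(a, b, k=2.0):
    """Return the quantities needed for a Bland-Altman plot of paired measurements.

    a, b : paired measurements from the two methods or sessions
    k    : multiplier for the standard deviation (2 here; 1.96 is also common)
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [x - y for x, y in zip(a, b)]        # y-axis of the plot
    means = [(x + y) / 2 for x, y in zip(a, b)]  # x-axis of the plot
    bias = statistics.mean(diffs)                # mean difference
    sd = statistics.stdev(diffs)                 # sample SD of the differences
    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        "lower": bias - k * sd,
        "upper": bias + k * sd,
    }


# Illustrative values only.
result = bland_altman([10, 12, 14, 16], [9, 13, 13, 17])
print(result["bias"], result["lower"], result["upper"])
```

The limits cover about 95% of differences only if the differences are roughly normally distributed, which is the assumption stated in the excerpt.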
Article
Full-text available
Objective Speech matrix tests offer information about a person's capacity to comprehend speech in noisy environments, which is an essential component of everyday communication, in contrast to pure tone audiometry, which primarily assesses hearing sensitivity. This study aimed to assess the test–retest reliability of the Arabic Speech Matrix test. Methods This is a prospective cohort study that included three groups: normal hearing individuals, cochlear implant users, and those using hearing aids. Seventy‐five participants were included in the study. The test was administered in two different settings with noise presented from various angles. The test was re‐administered to participants after a 7–14 days interval, and Intra‐class Correlation Coefficient (ICC) and Bland–Altman plots were used to evaluate reliability. Results Moderate to excellent reliability was demonstrated, with higher consistency observed among hearing‐impaired groups using cochlear implants and other devices. Minor learning effects were noted in the normal hearing group, with better reliability observed in the left setting. Conclusion The Arabic Speech Matrix test demonstrated strong test–retest reliability overall, indicating that it can be successfully incorporated into regular clinical audiological evaluations. Level of Evidence 4
... As there is no consensus on CCC thresholds, especially when considering repeated measurements, the value of 0.68 observed by Parker et al. (2020) was considered as an acceptable agreement. Second, as originally proposed by Bland and Altman (1986), the limits of agreement were derived from a linear mixed model based on the paired differences between devices with random intercept for participants and condition. All confidence intervals were obtained through a bootstrap procedure (1000 replications). ...
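A percentile bootstrap for the limits of agreement, as mentioned in this excerpt, might look like the Python sketch below. It is deliberately simplified: it treats the paired differences as independent and uses the plain mean ± 1.96 SD, whereas the excerpt describes a linear mixed model with random intercepts. With repeated measurements one would resample whole participants instead. All function and parameter names are hypothetical.

```python
import random
import statistics


def limits_of_agreement(diffs):
    """Plain 95% limits of agreement: mean difference +/- 1.96 SD."""
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias - 1.96 * sd, bias + 1.96 * sd


def bootstrap_loa_ci(diffs, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence intervals for both limits of agreement."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    lowers, uppers = [], []
    for _ in range(n_boot):
        sample = rng.choices(diffs, k=len(diffs))  # resample with replacement
        lo, hi = limits_of_agreement(sample)
        lowers.append(lo)
        uppers.append(hi)

    def percentile_interval(values):
        ordered = sorted(values)
        lo_idx = int((alpha / 2) * (len(ordered) - 1))
        hi_idx = int((1 - alpha / 2) * (len(ordered) - 1))
        return ordered[lo_idx], ordered[hi_idx]

    return {"lower": percentile_interval(lowers), "upper": percentile_interval(uppers)}


# Illustrative differences only.
diffs = [0.5, -0.2, 0.1, 0.3, -0.4, 0.0, 0.2, -0.1]
print(bootstrap_loa_ci(diffs))
```

The 1000 replications used as the default here match the number quoted in the excerpt; everything else is an illustrative choice.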
... Each item was examined to detect differential item functioning (DIF) according to 2 variables (Aryadoust et al., 2019): gender (female/male) and age (≥70 and <70 years). Comparisons of 2 groups were carried out using the Bland and Altman limits of agreement (Bland & Altman, 1986). ...
Article
Full-text available
Objective. To evaluate by Rasch analysis the validity of the FRAIL scale for measuring frailty in older people in Colombia. Methods. Cross-sectional study including 2506 people aged ≥60 years living in Bucaramanga, Medellín, Pereira, Popayán, and Santa Marta in 2021. Guidelines for analysis of the FRAIL scale were followed and the Rasch model was used with adjustment of response categories, items and people, differential functioning of the items, dimensionality, local independence of the items, Wright reliability, and adjustments of the infit and outfit mean squares. Results. Overfitting of the weight loss item was identified although it did not compromise the unidimensionality or the total score. Wright reliability was 0.80; the measure explained 45.2% of the variance in raw scores. Conclusions. The FRAIL scale is a valid tool for assessing frailty in elderly people. It is unidimensional, reliable and unbiased by age for the frail state but not for the prefrail condition. Inclusion of the gender variable and categorization of the age variable with 70 years as cut-off point are suggested.
... This comparison aimed not only to evaluate the performance of the suggested methodology against standard laboratory methods but also to assess how it compares to the traditional, human interpretation of test strips. Statistical analyses, including Pearson correlation coefficients (Pearson, 1895) and Bland-Altman plots (Bland & Altman, 2010), were used to quantify the level of agreement between the datasets. These analyses provided insights into the validity of the suggested colorimetric method for water quality assessment, aligning with established practices for evaluating agreement between measurement techniques (Giavarina, 2015). ...
Article
Full-text available
Effective water quality monitoring is important for environmental protection and public health, yet conventional field and laboratory methods each present significant limitations. Field tools such as colorimetric test strips offer affordability and accessibility but are prone to subjective interpretation and environmental variability. In contrast, laboratory-based techniques provide high precision but are costly, resource-intensive, and less feasible in decentralized contexts. This study presents a hybrid human–machine methodology that improves the accuracy and reproducibility of colorimetric test strip analysis while maintaining field-level accessibility. A total of 34 water samples collected along a 7-km stretch of Seunggi Stream in Incheon, South Korea, were analyzed using a web-based platform that extracts RGB values from images of test strips and reference charts. To translate color into concentration, the system calculates Euclidean distances between test strip colors and known reference values, then applies inverse distance weighting (IDW) to interpolate continuous estimates from the closest matches. This approach overcomes the limitations of discrete reference charts, enabling more precise and reproducible readings without the need for complex machine learning models. Validation against standard laboratory methods revealed strong correlations (r > 0.85 for pH, lead, and total hardness), supporting the reliability of the approach. Spatial trends in pollutants were successfully mapped, demonstrating the method’s utility for environmental monitoring. This cost-effective, scalable solution bridges the gap between subjective field testing and laboratory precision, offering a practical tool for resource-limited settings, citizen science, and preliminary assessments. Future research will refine analyte-specific accuracy and expand applicability to more diverse conditions.
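The color-to-concentration step described in this abstract (Euclidean distance in RGB space followed by inverse distance weighting over the closest reference colors) can be sketched in Python as follows. The number of neighbors `k`, the distance exponent `power`, the function name, and the example chart are assumptions for illustration; the abstract does not state the values used by the actual platform.

```python
import math


def estimate_concentration(sample_rgb, reference_chart, k=2, power=2.0):
    """Interpolate a concentration from the k nearest reference colors.

    sample_rgb      : (R, G, B) tuple read from the test strip image
    reference_chart : list of ((R, G, B), concentration) pairs
    k, power        : assumed IDW settings, not taken from the source
    """
    # Euclidean distance in RGB space to every reference color, nearest first.
    ranked = sorted(
        (math.dist(sample_rgb, rgb), conc) for rgb, conc in reference_chart
    )
    nearest = ranked[:k]
    # An exact color match has zero distance; return its value directly
    # to avoid dividing by zero.
    if nearest[0][0] == 0:
        return nearest[0][1]
    weights = [1.0 / (d ** power) for d, _ in nearest]
    weighted_sum = sum(w * conc for w, (_, conc) in zip(weights, nearest))
    return weighted_sum / sum(weights)


# Hypothetical two-color chart: pure red = 0 units, pure blue = 10 units.
chart = [((255, 0, 0), 0.0), ((0, 0, 255), 10.0)]
print(estimate_concentration((127.5, 0, 127.5), chart))  # equidistant from both
```

A sample color that sits between two chart entries receives an intermediate value, which is how this kind of interpolation produces continuous estimates from a discrete reference chart.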
... Spearman's correlation was used to assess correlations between estimated and measured VO2 max within and between conditions. Finally, Bland-Altman plots showing mean differences and limits of agreement (LoA) were used to assess random and systematic errors between estimated and measured VO2 max [23,24]. A power analysis indicated that a sample size of 28 participants was required to achieve sufficient statistical power to detect a meaningful difference within the group, combining both male and female participants (i.e., a power of 0.80 and a significance level set to 0.95). ...
Article
Full-text available
Purpose Usage of beta-blockers may influence heart rate response during exercise testing. The study aimed to investigate the impact of Propranolol, a non-selective beta-blocker, on the predictive validity of a commonly used submaximal cycle test, the Ekblom-Bak test (EB-test), for the estimation of maximal oxygen uptake (VO2 max). Methods The study was a double-blinded crossover study including 28 participants (14 women), aged 21–59 years. VO2 max was estimated by the EB-test and measured during a maximal incremental exercise test under placebo and 10 mg Propranolol beta-blockade conditions. The EB-test estimates VO2 max based on the difference in heart rate, ΔHR, between two work rates (4 min cycling on each), a factor corresponding to the higher rate, heart rate at the lower rate, and age. Results Maximal heart rate (mean ± SD; 165.5 ± 16.5 vs. 181.4 ± 9.8 bpm), ΔHR (36.4 ± 13.2 vs. 43.0 ± 11.4 bpm), and measured VO2 max (48.2 ± 6.2 vs. 50.4 ± 7.0 ml/kg/min, 3.55 ± 0.74 vs. 3.67 ± 0.71 L/min) were significantly lower in the beta-blockade condition compared to placebo (P < .001). This led to an overestimation of VO2 max by the EB-test during beta-blockade, +0.377 L/min (95% CI 0.281–0.473 L/min, +11.2%), with no overestimation seen in the placebo condition, +0.030 L/min (95% CI −0.112 to 0.172 L/min, +0.8%). The coefficient of variance, indicating variance of estimated VO2 max on an individual level, was lower during the beta-blockade condition, 6.6%, compared to placebo, 9.9%. Conclusion The EB-test systematically overestimated VO2 max compared to measured VO2 max during the beta-blockade condition. Future research is needed to refine the test equation for usage in populations using beta-blockers.
... Continuous and categorical descriptive statistics are reported as mean (SD) and n (%), respectively. The absolute agreement between STS power methods was assessed by calculating an intraclass correlation coefficient (ICC) using the icc R package (v2.4.0) and constructing a Bland-Altman plot [42] using the blandr (v0.6.0) and ggplot2 (v3.5.2) R packages. The ICC and Bland-Altman methods are recommended to evaluate the agreement between continuous variables [43]. ...
Article
Background/Objectives: Muscle power, estimated from the sit-to-stand (STS) test, is an important indicator of physical function (PF) in aging adults. Therefore, its assessment may be implemented into future clinical practice. The agreement between different STS power assessments is unknown, and the associations between methods and PF outcomes have not been compared. Methods: A total of 49 aging adults (mean age = 60.9 ± 10.9; 67% female) participated in this cross-sectional study. STS power from a validated equation (EQ) and a linear position transducer (LPT) were estimated. Handgrip strength (HGS), timed up-and-go (TUG), usual gait speed (UGS), fast gait speed (FGS), the 400-m walk test (400MWT), and self-reported total, basic lower-body, and advanced lower-body PF were assessed. The agreement of STS power methods was assessed with an intraclass correlation coefficient (ICC) and a Bland–Altman plot. Multiple linear regression evaluated the associations between STS power and PF outcomes. Results: EQ and LPT STS power demonstrated only moderate agreement (ICC = 0.69). EQ STS power was independently associated with TUG (β = −0.45), UGS (β = 0.37), FGS (β = 0.48), 400MWT (β = −0.55), self-reported total (β = 0.30), basic lower-body (β = 0.30), and advanced lower-body PF (β = 0.30), but not HGS (β = 0.14). LPT STS power was independently associated with HGS (β = 0.44), FGS (β = 0.40), 400MWT (β = −0.51), self-reported total (β = 0.31), basic lower-body (β = 0.29), and advanced lower-body PF (β = 0.32), but neither TUG (β = −0.26) nor UGS (β = 0.28). Conclusions: EQ and LPT STS power demonstrate limited agreement, and EQ STS power may be a superior indicator of PF in aging adults. Future research should examine the feasibility of implementing STS power tests in clinical settings to screen and refer patients with low muscle power to effective therapeutic interventions.
... For this reason, torque profiles were calculated for the static and dynamic conditions from 0 to 154°. The repeatability of the trials without exoskeleton under static and dynamic conditions, respectively, was checked using Bland-Altman diagrams (Martin Bland and Altman, 1986) to average these trials of each condition. These averages were subtracted from the total torque measured in the corresponding condition with exoskeleton. ...
Article
Adjusting the assistive torque of upper limb occupational exoskeletons is essential to optimize their effectiveness and user acceptance in companies. This adjustment enables a balance to be struck between the expected benefits and potential undesirable effects associated with their use, particularly for the shoulder joint, which is sensitive to the balance of forces. Despite this, no study has yet evaluated these assistive torques in static and dynamic conditions representative of work situations. The aim of this article is therefore to evaluate these assistive torques under these two conditions, using an isokinetic dynamometer. Angular velocities ranging from 0 to 240°/s and four levels of assistance were investigated. The results showed that the maximum assistive torques in flexion (energy restitution phase) were lower than those in extension (tensioning phase) by 20 to 36% and were median in static conditions. It was also observed that the level of assistance and the exoskeleton opening angles had a strong impact on the assistive torques, unlike the angular velocity in dynamic conditions, which had a minimal effect. Quantifying these assistive torques is crucial for assessing their biomechanical impact and adjusting the exoskeleton’s assistance to the operator and the task performed.
... The Bland-Altman graphical analysis revealed that these methods agreed well with radiography, as the data were well dispersed within the limits of agreement, with only one measurement falling outside these limits. In addition, the mean difference between the methods and radiography was approximately zero [22,23]. However, these two methods presented proportional errors, despite presenting lower RMSE values (Table 2) and higher coefficients of determination (Table 4). ...
Article
Background: Evidence supporting the validity of photogrammetry for assessing body segment alignment remains limited, with most studies focusing on spinal evaluation. Thus, there is a lack of robust research examining its use for other body segments such as the lower limbs. Objective: This study aimed to evaluate the concurrent validity of three photogrammetric methods for measuring knee alignment in the sagittal plane with and without corrections for potential rotational deviations in the participant’s thigh and leg. Methods: A total of 21 adults underwent sequential evaluations involving panoramic radiography of the lower limbs and photogrammetry at a private radiology clinic. Photogrammetric analysis involved identifying the following anatomical landmarks: the greater trochanter of the femur (GTF), the lateral condyle of the femur (LCF), the head of the fibula (HF), and the lateral malleolus (LM). Three photogrammetric methods were employed: (1) the condylar angle (CA) defined by the GTF, LCF, and LM points; (2) the fibula head angle (FHA) defined by the GTF, HF, and LM points; and (3) the four-point angle (4PA) incorporating the GTF, LCF, HF, and LM. Concurrent validity was assessed using correlation analysis, agreement with radiographic measurements, and the root mean square error (RMSE). Each photogrammetric method was tested using raw (CA, FHA, and 4PA) and corrected (CAcorr, FHAcorr, and 4PAcorr) values, accounting for thigh and/or leg rotational deviations. Results: Correcting for thigh and leg rotations significantly improved the validity metrics for all methods. The best performance was observed with the corrected condylar angle (CAcorr: r = 0.746; adjusted r² = 0.533; RMSE = 2.9°) and the corrected four-point angle (4PAcorr: r = 0.733; adjusted r² = 0.513; RMSE = 3.0°); however, the measurements presented proportional errors, possibly due to the method used to assess rotations.
Conclusions: The findings validate the evaluated photogrammetric methods for assessing sagittal knee alignment. Accounting for thigh and leg rotational deviations is critical for achieving accurate measurements, underscoring the need for accurate tools for measuring rotational changes in the lower limbs to avoid errors.
... There were influential outliers for weight and derived BMI; we therefore winsorized extreme values by replacing them with the 1st and 99th percentile values. Bland-Altman plots that included means of differences between self-reported and objective measurements as well as upper and lower limits of agreement (LoA) were used to visualize agreement [33][34][35][36][37][38]. As described in prior literature [33,34,36,38], Bland-Altman plots allow for visualization of bias and the corresponding LoA, which show where 95% of the differences between measures fall. ...
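The winsorizing step described in this excerpt can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name `winsorize` and the sample weights are hypothetical, and the percentile convention (`statistics.quantiles` with `method="inclusive"`) is an assumption, since the excerpt does not say which percentile definition was used.

```python
from statistics import quantiles

def winsorize(values, lower_pct=1, upper_pct=99):
    """Clamp values outside the given percentiles to those percentiles."""
    # quantiles(..., n=100) returns the 99 cut points P1..P99.
    cuts = quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, low), high) for v in values]

# Hypothetical weights (kg) with one extreme value at each end.
weights = [30.0] + [60.0 + 0.5 * i for i in range(99)] + [250.0]
clean = winsorize(weights)
```

Only the two extreme entries change here; every value already between the 1st and 99th percentiles is left as it was, which is what distinguishes winsorizing from trimming.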
Article
Objective Underreporting of weight and overreporting of height is consistently shown among women, thereby reducing accuracy of estimation of body mass index—and thus obesity—in epidemiologic studies that rely on self-reported data. Additionally, misreporting has been shown to differ by socioeconomic status and race and ethnicity, which can result in differential misclassification and bias that can obfuscate associations with obesity across groups in multiethnic and socioeconomically varying populations. Therefore, we sought to assess agreement between self-reported and objectively measured weight, height, and derived body mass index (BMI) across levels of educational attainment within racial and ethnic groups in a population-based cohort of US women. Methods Among self-identified White, Black, and Latina women enrolled in the Sister Study (2003–2009), we assessed mean differences in self-reported vs. objectively measured weight, height, and derived BMI. Using adjusted linear and multinomial logistic regression, we compared measurement error among participants reporting some college/vocational school or ≥college vs. ≤high school. We assessed BMI agreement using Bland-Altman plots and weighted kappa (k) statistics. Results Among 18,638 participants (age: mean ± standard deviation = 56 ± 9.0 years), 84% identified as White, 10% Black, and 5% Latina. Approximately half (49%) attained a college education. Weight and height were generally underreported. Higher underreporting of weight among participants with ≥college vs. ≤high school was of larger magnitude among Black and Latina vs. White participants. Bland-Altman results revealed that agreement in continuous BMI was good among White participants but generally fair among Black and Latina participants. Categorical BMI agreement was consistently high with minor variation by race and ethnicity and educational attainment (weighted k range = 0.92–0.93). 
Conclusions Despite higher measurement error in weight among Black and Latina participants with ≥college education, self-reported and objectively measured BMI categories showed high agreement across groups. Results support the utility of self-reported data that reliably estimate BMI category across socioeconomic, racial, and ethnic groups in this cohort.
... Based on the study by Koyachi et al. (21,22), this study used 3D surface analysis and the Bland-Altman method to determine the accuracy of genioplasty using CAD/CAM in combination with MR technology (50). Traditionally, positional differences of <2 mm and orientational differences of <4° were considered clinically insignificant in the interpretation of accuracy in orthognathic surgery. ...
... For the volunteer cohort 1, Bland-Altman analysis [24] was used to determine the limits of agreement and repeatability coefficient for each sequence. Fat fraction estimates from different T2w-and PDw-TSE Dixon sequences (volunteer cohort 2) were compared using correlation plots to assess agreement. ...
Article
Objective This study aimed to assess the accuracy of fat fraction estimation with clinically available Dixon sequences in normal-appearing marrow and bone metastases in the pelvis of metastatic prostate cancer patients. Methods A prospective single-centre study was conducted with metastatic prostate cancer patients and healthy volunteers. Linearity and bias of fat fraction estimates from clinically available Dixon sequences were assessed against a 6-point PDw gradient echo (q-Dixon) sequence measuring the reference standard proton density fat fraction. Lesion fat fraction estimates were cross-compared using the Friedman test. Repeatability in volunteers was evaluated with Bland-Altman plots. Sensitivity of fat fraction estimates using TSE-Dixon sequences to specific absorption rate (SAR) related modifications was evaluated with correlation plots. Results Thirty-three patients were recruited for this study. Significant (p < 0.05) absolute bias (12.4%) was demonstrated in the T1-weighted (T1w) Dixon measurements against the q-Dixon. Significant differences (p < 0.05) between fat fraction estimates provided by the T1w Dixon and PDw Dixon sequences were observed in 13 active and 6 treated lesions. Repeatability coefficients for fat fraction estimates ranged from 5.9 to 9.0% in the pelvic tissues of healthy volunteers. Reduction of slice number with repetition time for SAR had the greatest effect, reaching a maximum difference in fat fraction of 14.7% from the q-Dixon for the T2w-TSE Dixon in bone marrow. Conclusions T1w Dixon methods can detect post-treatment changes but remain confounded by relaxation time biases. While all Dixon methods showed good repeatability, careful choice of SAR-related modifications is critical to maintaining accuracy for PD- and T2-weighted TSE sequences.
Key Points Question The clinical validity of signal-weighted fat fraction estimates versus proton density fat fraction for characterising metastatic bone lesions has not been fully assessed. Findings T1-weighted Dixon sequences in line with whole-body MRI international guidelines demonstrate significant fat fraction bias, particularly in lesions and muscle. Clinical relevance Fat fraction estimation using T1-weighted Dixon sequences recommended in international guidelines is highly sensitive to relaxation time biases, making underlying physiological changes potentially ambiguous.
... Additionally, manuals should specify the reference methods used to enable proper comparison and interpretation of measurement results. A further shortcoming of this study is the lack of a "sufficiently" large sample for the Bland-Altman analysis, for which the recommended minimum number of respondents ranges from 100 to 200 (Bland and Altman, 1986), or even from 100 to 300 (according to Brtková et al., 2014). Furthermore, this study focused exclusively on female student athletes, a physically active population engaged in both academic and extracurricular sports activities. ...
Article
Nowadays, due to their widespread application, various models of bioelectrical impedance analysis (BIA) devices have become available. However, differences in multiple parameters among these BIA devices pose a significant challenge despite their common underlying principle. The primary aim of this research was to assess the correlation, agreement, and differences in specific body composition parameters-body fat percentage (BF%) and muscle mass percentage (Muscle%)-measured by three distinct BIA devices (Omron BF300, Omron BF511, and InBody 770). Measurements were conducted by the same examiner, on the same day, in a sample of 35 women aged between 21 and 26 years. Participants’ baseline characteristics (age, body height, body mass, body mass index) were recorded, alongside their body composition values (BF% and Muscle%) obtained using the three, i.e. two BIA devices, respectively. Data analysis employed SPSS 26.0 software, applying descriptive statistics, Pearson’s correlation coefficient, concordance coefficient, repeated measures ANOVA, paired sample t-test, and Bland-Altman plots. Results demonstrated statistically significant correlations (p=0.000) for BF% across all three BIA devices, and for Muscle% between Omron BF511 and InBody 770. Additionally, a significant correlation (p=0.000) with high concordance was noted (W=0.939 for BF%, W=0.926 for Muscle%), although significant differences among devices were also evident (p<0.001). Given the absence of a universally accepted reference method and considering the prior validation of InBody devices against DXA, the InBody 770 is recommended for accuracy when possible. However, for field assessments requiring portable solutions, the Omron BF511 is preferred due to its practicality in terms of size, weight, and portability.
... The group level MDC (MDC group ) can be calculated by dividing the individual level MDC by the square root of the sample size [22]. We constructed Bland-Altman plots with 95% Limits of Agreement to assess the agreements [23]. The mean (95% CI) difference between two digital self-assessments as well as between digital self-assessment and in-person physiotherapist assessment were computed using paired t-tests. ...
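The relation between the individual-level and group-level minimal detectable change (MDC) stated in this excerpt can be written as a short Python sketch. The function names are hypothetical. The excerpt gives only the division by the square root of the sample size; the `mdc_individual` helper uses the conventional MDC95 formula (1.96 × √2 × SEM), which is an assumption here and is not confirmed by the excerpt.

```python
import math

def mdc_individual(sem, z=1.96):
    """Assumed conventional formula: MDC95 = z * sqrt(2) * SEM."""
    return z * math.sqrt(2) * sem

def mdc_group(mdc_ind, n):
    """From the excerpt: individual-level MDC divided by sqrt(sample size)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return mdc_ind / math.sqrt(n)

# Hypothetical example: an individual-level MDC of 3.0 repetitions, n = 9.
example_group_mdc = mdc_group(3.0, 9)
```

Because the group-level value shrinks with √n, a change too small to detect in one person can still be detectable as a mean change across a sample.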
Article
Objective: The 30-second Chair Stand Test (30s CST), a valid and reliable test evaluating lower extremity physical function, has been integrated into a digital eHealth program. We aimed to evaluate the agreement (inter-rater reliability) between digital self-assessment and in-person physiotherapist assessment as well as intra-rater test-retest reliability of digital self-assessment among persons with hip or knee osteoarthritis. Design: Eligible participants with hip or knee osteoarthritis were identified from the digital treatment database. The 30s CST was performed through a digital self-assessment and in-person physiotherapist assessment. The inter-rater reliability study was conducted at a physiotherapy clinic, and for the intra-rater test-retest reliability study, the participants performed the digital self-assessment test twice in their home. Results: The one-day inter-rater reliability study included 18 participants (mean age 67 years and 89% females) and one physiotherapist. The intra-rater test-retest study, with sessions separated by 10-14 days, included 54 participants (mean age 69 years, 78% females). There were, on average, 1.5 (95% CI 0.6 to 2.4) more self-reported sit-to-stand repetitions for the digital self-assessment compared with in-person physiotherapist assessment. The digital self-assessment of 30s CST showed low to excellent inter-rater reliability with an intraclass correlation coefficient (ICC) of 0.87 (95% CI 0.47 to 0.96) and good to excellent intra-rater test-retest reliability, ICC 0.88 (95% CI 0.79 to 0.93). Bland-Altman plots suggested good levels of inter- and intra-rater reliability. Conclusion: Results suggest that the 30s CST can be measured digitally as a self-administered and self-reported measurement of lower extremity physical function in older adults with hip and/or knee osteoarthritis.
... A Bland-Altman plot is a statistical method used to evaluate the agreement between two measurement techniques [38]. In this case, the Bland-Altman plot was applied to compare the expert manual counts and the cell counting algorithm's performance, as shown in Figure 8a. ...
Article
Chikungunya virus, a member of the Alphavirus genus, continues to present a global health challenge due to its widespread occurrence and the absence of specific antiviral therapies. Accurate detection of viral infections, such as chikungunya, is critical for antiviral research, yet traditional methods are time-consuming and prone to error. This study presents the development and validation of an automated image processing algorithm designed to improve the accuracy and speed of high-throughput screening for potential anti-chikungunya virus compounds. Using MvTec Halcon software (Version 22.11), the algorithm was developed to detect and classify infected and uninfected cells in viral assays, and its performance was validated against manual counts conducted by virology experts, showing a strong correlation with Pearson correlation coefficients of 0.9807 for cell detection and 0.9886 for virus detection. These values indicate a high correlation between the algorithm and manual counts performed by three virology experts, demonstrating that the algorithm’s accuracy closely matches expert manual evaluations. Following statistical validation, the algorithm was applied to screen antiviral compounds, demonstrating its effectiveness in enhancing the throughput and accuracy of drug discovery workflows. This technology can be seamlessly integrated into existing virological research pipelines, offering a scalable and efficient tool to accelerate drug discovery and improve diagnostic workflows for vector-borne and emerging viral diseases. By addressing critical bottlenecks in speed and accuracy, it holds promise for tackling global virology challenges and advancing research into other viral infections.
... For other outcomes, Spearman correlation coefficient was used to estimate the relationship between two variables; a correlation greater than 0.6 was considered as strong. The Bland-Altman plot was also used to compare the 2 methods [16]. Briefly a Bland-Altman plot consists of a plot of the difference between paired readings of two variables (i.e. ...
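The quantities behind a Bland-Altman plot can be computed with a few lines of Python. The sketch below is illustrative only: the function name and the sample readings are hypothetical, and it follows the standard construction from the Bland and Altman paper (difference of each pair plotted against the pair's mean, with limits of agreement at the mean difference ± 1.96 SD of the differences).

```python
from statistics import mean, stdev

def bland_altman(method_a, method_b):
    """Return plot coordinates, bias, and 95% limits of agreement."""
    if len(method_a) != len(method_b):
        raise ValueError("paired readings must have equal length")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    bias = mean(diffs)   # mean difference between the two methods
    sd = stdev(diffs)    # sample SD of the differences
    return {
        "points": list(zip(means, diffs)),  # (x, y) pairs for the scatter
        "bias": bias,
        "lower_loa": bias - 1.96 * sd,
        "upper_loa": bias + 1.96 * sd,
    }

# Hypothetical paired readings from two methods.
result = bland_altman([10, 12, 14, 16], [9, 12, 15, 14])
```

The `points` list can be passed to any plotting library, with horizontal lines drawn at `bias`, `lower_loa`, and `upper_loa`.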
Article
Question: The reference test for the functional evaluation of pulmonary fibrosis (PF) during exercise is the 6-minute walk test (6MWT). However, the 6MWT involves temporal and spatial constraints that the 1-minute sit-to-stand test (1-MSTST) does not have. Previous studies have not validated 1-MSTST use in this context, mainly because of far less oxygen desaturation. We hypothesize that the modified 1-MSTST (m1-MSTST), taking into account the recovery phase, could compensate for this shortcoming. Patients and Methods: Randomized, cross-over, single-center trial conducted in 36 patients with PF. 6MWT and 1-MSTST were performed 30 min apart for each patient in a randomized order. An equivalence test was performed on the saturation (SpO2) nadir. Results: The 36 patients included 8 idiopathic pulmonary fibrosis, 5 nonspecific idiopathic pneumonia, 8 collagen tissue disease-associated PF, 4 hypersensitivity pneumonitis, 2 sarcoidosis and 9 other PF. Mean ± SD nadir desaturation was 84.9 ± 4.3% at 6MWT and 88 ± 3.5% at m1-MSTST, with a strong correlation between both tests. 33 patients (91.7%) had concordant results in the two tests regarding significant desaturation (SpO2 delta > 4% or nadir < 88%), a known prognostic factor. Conclusion: The m1-MSTST, taking into account the recovery phase, is a sensible compromise to the 6MWT in measuring exercise performance in people with PF. As many clinical endpoints are transferring from hospital to outpatient care, m1-MSTST is technically easier and more practical for patients. Further studies are warranted for determination of minimal clinically important difference and norms in healthy subjects.
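The concordance criterion in this abstract (significant desaturation defined as an SpO2 delta above 4% or a nadir below 88%) can be illustrated with a small Python sketch. The function names are hypothetical, and reading "SpO2 delta" as baseline minus nadir in percentage points is an assumption, since the abstract does not define it further.

```python
def significant_desaturation(baseline_spo2, nadir_spo2):
    """True if the drop exceeds 4 points or the nadir falls below 88%."""
    # Assumption: "delta" means baseline SpO2 minus nadir SpO2.
    return (baseline_spo2 - nadir_spo2) > 4 or nadir_spo2 < 88

def concordance_percent(flags_a, flags_b):
    """Percentage of patients classified the same way by both tests."""
    if len(flags_a) != len(flags_b):
        raise ValueError("both tests need one flag per patient")
    agree = sum(a == b for a, b in zip(flags_a, flags_b))
    return 100 * agree / len(flags_a)

# Synthetic flags reproducing the reported count: 33 of 36 concordant.
test_a = [True] * 36
test_b = [True] * 33 + [False] * 3
rate = concordance_percent(test_a, test_b)
```

With 33 concordant patients out of 36, the rate rounds to the 91.7% reported in the abstract.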
... Correlation coefficients of 0-0.19 are interpreted as very weak, 0.2-0.39 as weak, 0.4-0.59 as moderate, 0.6-0.79 as strong, and 0.8-1 as very strong [40]. For graphical description of the agreement between the 2 systems ("up&go app" and reference measure), Bland-Altman plots were used [41], including the lower and upper limits of agreement (LLoA and ULoA). From the data collected during the first measurement ("up&go app" and reference measure), ICC 3,1 were calculated to analyze the agreement between the 5 repetitions. ...
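The interpretation bands quoted in this excerpt map directly onto a lookup function. The sketch below is a hypothetical helper, not code from the cited study; it assumes the bands apply to the absolute value of the coefficient and that each band's lower bound (0.2, 0.4, 0.6, 0.8) is the cut-off, which closes the small gaps between the ranges as written.

```python
def correlation_strength(r):
    """Label a correlation coefficient using the bands quoted above."""
    if not -1 <= r <= 1:
        raise ValueError("correlation coefficient must lie in [-1, 1]")
    magnitude = abs(r)  # assumption: sign is ignored when grading strength
    if magnitude < 0.2:
        return "very weak"
    if magnitude < 0.4:
        return "weak"
    if magnitude < 0.6:
        return "moderate"
    if magnitude < 0.8:
        return "strong"
    return "very strong"

labels = [correlation_strength(r) for r in (0.1, 0.45, 0.65, -0.85)]
```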
... To assess significant enlargement, changes in the AC between the two time points were compared. The Bland-Altman method was used to determine the cut-off value for AC enlargement [18,19]. A significant change was defined by an increase exceeding 1.96 times the intersession standard deviation (SD), which corresponds to the 95% confidence interval (CI) for the true measurement value. ...
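The cut-off rule in this excerpt (a change counts as significant when the increase exceeds 1.96 times the intersession SD) can be sketched as follows. The function name and the numeric values are hypothetical, and the sketch assumes no systematic bias between sessions, which the excerpt does not address.

```python
def is_significant_enlargement(baseline, follow_up, intersession_sd):
    """True if the increase exceeds 1.96 x the intersession SD."""
    if intersession_sd < 0:
        raise ValueError("standard deviation cannot be negative")
    threshold = 1.96 * intersession_sd
    return (follow_up - baseline) > threshold

# Hypothetical angular circumferences (degrees) and intersession SD.
small_change = is_significant_enlargement(10.0, 13.0, 1.8)  # +3.0 vs 3.528
large_change = is_significant_enlargement(10.0, 14.0, 1.8)  # +4.0 vs 3.528
```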
Article
Background To compare the longitudinal change of choroidal microvascular dropout (CMvD) and its relationship with glaucoma progression between primary open angle glaucoma (POAG) and pseudoexfoliation glaucoma (PXG). Methods The analysis included 114 eyes of 114 patients, with 57 POAG and 57 PXG eyes matched by age and visual field (VF) mean deviation (MD). The angular circumference (AC) of CMvD was measured using the en-face choroidal layer images of optical coherence tomography angiography at baseline and the final follow-up. Progression of CMvD was defined as an increase in AC beyond measurement variability (−3.85° to +3.28°) or the appearance of new CMvD during follow-up. Glaucoma progression was determined by MD change rate. Results The prevalence of CMvD was significantly higher in POAG than in PXG eyes (68.4% vs. 43.9%, p = 0.008) at baseline. However, by the final visit, the difference in prevalence between the groups was not significant (68.4% vs. 56.1%, p = 0.178). During the study period, seven PXG eyes developed new CMvD. There was no significant difference in MD progression rate between the stable and progressed CMvD subgroups in POAG (−0.7 ± 0.8 dB/year vs. −0.8 ± 0.5 dB/year, p = 0.715). In contrast, PXG eyes showed a significantly faster MD progression rate in the progressed CMvD subgroup than in the stable CMvD subgroup (−0.4 ± 0.7 dB/year vs. −1.2 ± 0.8 dB/year, p = 0.010). Conclusions The progression of CMvD was more frequently observed in PXG eyes than in POAG eyes and was associated with faster VF progression in PXG eyes.
Article
Background: The accuracy of measurement of cardiometabolic outcomes in terms of gaseous exchange and energy expenditure of individuals is crucial. The objective of this study was to compare the validity and reliability of the PNOĒ in measuring cardiometabolic outcomes from the respiratory gaseous exchange of healthy individuals during treadmill walking exercise. Methods: A total of 21 healthy subjects (15 male and 6 female) aged 22.76 ± 3.85 years took part in this study. Oxygen uptake (VO2), carbon dioxide production (VCO2), respiratory exchange ratio (RER), metabolic equivalents (METs), tidal volume (VT), and energy expenditure (EE) were measured using the PNOĒ and COSMED K5 portable systems during a twenty-eight-minute, four-stage incremental protocol, where speed increased from 1.7 mph to 4.2 mph with a 2% incline on a treadmill. Test–retest reliability was tested on separate days with trial repetition. Validity was evaluated by Bland–Altman plots, intraclass correlation coefficients (ICCs) and mean percentage difference. Results: ICCs showed that VCO2 was in the good range (0.75–0.90). The ICC of the RER from stages 1 to 3 of the incremental protocol and the VT from stages 2 to 4 of the incremental protocol showed good to excellent reliability. No clear trend was seen for VO2, VCO2, and EE datapoints with variations in speed. Pearson’s correlation coefficients were moderately high (r = 0.60–0.79) between VO2, VCO2, RER, METs, VT, and EE measured by the PNOĒ and K5 systems. All subjects, except for a few cases in VT, were within the upper and lower 95% confidence intervals of the acceptable range of the Bland–Altman plots. Conclusions: The PNOĒ system is a valid and reliable measure of cardiometabolic outcomes and is comparable to the COSMED K5 system.
Article
Background and Objectives: The benefit of carotid artery stenting (CAS) for stroke prevention has been established, but less is known about CAS’s effect on cognition. Here, we investigate (1) changes in the blood flow in both treated and non-treated carotid arteries, (2) associations between the severity of artery occlusion and CAS-induced flow change, and (3) whether the flow changes relate to cognitive improvement. Materials and Methods: We used quantitative flow magnetic resonance imaging to assess blood flow and computerized neurocognitive assessment to evaluate cognitive performance. Fourteen patients identified for CAS as part of their standard care participated in this study; ten completed the CAS procedure and the pre- and post-CAS assessments (age = 77.0 ± 5.6; 70% males). Results: An increased ipsilateral flow following CAS was seen in 70% of the participants, while 50% also showed an increase in the total flow. The participants with ≥90% stenosis showed the greatest flow changes, such that the post-CAS flow was 60% higher relative to pre-CAS (p < 0.050). Cognitive responses to the flow increase were variable: attention showed a positive association; in comparison, higher cognitive flexibility and memory were only seen when treated stenosis was below 80%. Conclusions: Our preliminary findings highlight the impact of CAS and the complex relationship between blood flow and cognitive changes post-CAS, warranting larger-scale studies with extended follow-up periods.
Article
The study aimed to evaluate the accuracy of the Hadlock formula for estimating fetal weight by ultrasound in pregnant women treated at the Clínica Ibarra during 2022 and 2023. A quantitative observational design was used, with linear regression analysis to compare the ultrasound estimates with actual birth weights. The sample included pregnant women at different gestational ages, allowing the effectiveness of the method to be evaluated in different clinical contexts. The results indicated a significant correlation between estimated fetal weight and neonatal weight, validating the reliability of the method. In addition, the Bland-Altman analysis showed adequate agreement between the estimates and birth weights, with a clinically acceptable margin of error. Factors such as gestational age and individual variation were found to influence the accuracy of the estimates. Despite slight discrepancies in some cases, the Hadlock formula proved to be a reliable tool for monitoring fetal growth. Its continued use in clinical practice is recommended because of its effectiveness and ease of application. It is also suggested that assessments be complemented with other ultrasound and clinical parameters to minimize errors in complex situations and improve obstetric care.
Article
The primary aim of this study was to determine the reliability and divergent validity of several weighted physical assessments for the Army, including the counter movement jump (CMJ), plyometric push-ups (PPU), an incremental fire and movement assessment (IFMA), and a repeated sprint ability (RSA) test. Male infantry soldiers (n = 30) completed the CMJ, PPU, IFMA, and RSA during both unweighted and weighted conditions with a 48-hour interval between sessions, and then repeated the tests during a weighted condition after a 7-day washout period. Intraclass correlation coefficient (ICC) and coefficient of variation (CV) assessed between-session reliability. Divergent validity between weighted and unweighted conditions was determined using Pearson’s correlation coefficient (r), with correlation effect size (ES) calculated between the r-values using a Fisher Z-transformation. Good test-retest reliability and divergent validity were demonstrated for most CMJ (ICC 0.50–0.99, CV% 1.18–7.73, ES 0.50–0.69), PPU (ICC 0.61–0.99, CV% 1.03–12.33, ES 0.31–0.68), RSA (ICC 0.50–0.94, CV% 1.34–8.41, ES 0.37–0.75), and IFMA (ICC 0.65–0.94, CV% 2.80–10.99, ES 0.32–0.39) measures. It was concluded that the weighted CMJ, PPU, IFMA, and RSA were reliable tests for Army-specific fitness to determine combat task readiness. Good divergent validity between weighted and unweighted conditions for most test measures supported practitioners’ use of weighted assessments for Army-specific capability, while unweighted assessments were recommended for fitness optimization and monitoring training for Army personnel.
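The Fisher Z-transformation mentioned in this abstract can be sketched in Python. The abstract does not name the effect size it derived from the transformed values; the sketch below assumes Cohen's q (the difference between two Fisher-transformed correlations), and the function names and example correlations are hypothetical.

```python
import math

def fisher_z(r):
    """Fisher Z-transformation: z = atanh(r) = 0.5 * ln((1 + r) / (1 - r))."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return math.atanh(r)

def cohens_q(r1, r2):
    """Assumed effect size: difference between Fisher-transformed r-values."""
    return fisher_z(r1) - fisher_z(r2)

# Hypothetical correlations from two conditions.
q = cohens_q(0.8, 0.5)
```

The transformation stretches correlations near ±1, so equal differences in r correspond to larger effect sizes when the coefficients are high.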
Article
Continuous monitoring of physiological parameters in non-human primates (NHPs) necessitates a precise, non-invasive, and convenient method. This study aimed to validate the use of smartwatches with integrated pulse oximetry and heart rate (HR) monitoring capabilities for use in NHPs. Currently, the clinical standard for non-invasive continuous monitoring of peripheral oxygen saturation (SpO2) in NHPs has been the use of a transmittance pulse oximeter (TPO) affixed to a location of highly vascularized tissue. In a clinical setting, HR is monitored through electrocardiogram (ECG) or associated with SpO2 measurement from a TPO probe utilizing photoplethysmography technology. Challenges in obtaining precise readings with TPOs stem from technological limitations and probe placement restrictions. To address these limitations, simultaneous HR and SpO2 measurements were obtained from 15 cynomolgus macaques (Macaca fascicularis) using the Apple Watch 7 (AW 7), Apple Watch 9 (AW 9), and a clinical-grade TPO probe with integrated optical HR measurement technology (iM70, ELAN). Arterial blood gas (ABG) analysis was used as a reference method for SpO2. We found that a TPO device significantly underestimated SpO2 compared to the AW 7 and AW 9 when referenced against ABG values. Smartwatch-derived HR and SpO2 measurements demonstrated good agreement and minimal bias compared to the gold standard method. Overall, the AW 7 and AW 9 exhibited good agreement with clinical reference standards for HR and good agreement with the gold standard for SaO2 in sedated cynomolgus macaques.
Article
Aim This study evaluates the accuracy of a newly derived blood volume estimation formula based on the Boer equation for lean body mass, comparing its performance against the Nadler, Allen and Lemmens-Bernstein-Brodsky formulas. Methods Blood volume estimation was evaluated using two datasets: the Retzlaff dataset, based on 78 healthy individuals, and the Allen dataset, derived from 81 subjects, two of European descent, the remainder Chinese ‘medical, nursing and pedagogic students, technicians, clerks and family members’ and one young Chinese physician. The formulas were compared using robust statistical methods, including the Wilcoxon Signed-Rank Test, permutation tests, Bland-Altman analysis, and Proportion Within Range. Results Across all methods, the formula derived from the Boer equation showed the narrowest limits of agreement and smallest variability in most metrics, highlighting its potential as the most accurate and clinically useful tool for blood volume estimation. The Nadler formula also performed well but with slightly larger errors and bias. Conclusion This study highlights the limitations of the Allen formula and demonstrates the superior performance of the Boer formula, which is derived from lean body mass. While the Allen formula performed well on its original dataset, it showed higher variability and less accuracy on more modern data. Both the Nadler and Boer formulas exhibited greater precision, with the Boer formula showing slightly lower variability. The study emphasizes the importance of using independent data sets for validation and addresses a critical gap in blood volume assessment by using robust techniques for analysis.
Article
Ciprofloxacin, a fluoroquinolone antibacterial agent, is not recommended in the pediatric population on account of its possible adverse effect on growing cartilage. It is commonly used for treatment of a variety of infections in children in our country, and very little information is available on the risks involved in its use. A questionnaire was sent to 750 pediatricians in the last week of November 1990, to retrospectively judge over the previous 2 month period the extent of its use and identify the adverse drug reactions (ADRs). One hundred and fifty-four pediatricians replied, of which 147 had prescribed ciprofloxacin in a total of 3341 patients under 18 years of age, enteric fever being the commonest indication for its use. One hundred and fifty-nine ADRs were reported in 104 (3.1%) patients. They were: gastrointestinal in 50% of these 104 patients, CNS in 23%, skin and allergic in 19.1%, musculoskeletal in 8.6%, hematological in 3.8%, CVS in 2.9% and nephrological in 0.9% of cases. Of 159 ADRs, 8 (5%) were severe, 76 (47.8%) were moderate and 75 (47.2%) were mild. Therapy needed discontinuation in only 9 (0.3%) patients. Two new ADRs were identified, viz., sudden death after intravenous ciprofloxacin and sinus nodal arrest causing bradycardia.
Article
Preproduction and current models of the miniature Wright peak flow meter have been compared with the standard Wright peak flow meter on normal and abnormal subjects. Early problems in production appear to have been overcome, and the current model agrees to within 3% with the standard peak flow meter, which is as close as the agreement between two standard instruments. The new mini-meter may be enclosed in a case, making direct comparisons with other instruments possible.
Article
Methods of analysis used in the comparison of two methods of measurement are reviewed. The use of correlation, regression and the difference between means is criticized. A simple parametric approach is proposed based on analysis of variance and simple graphical methods.
Article
The accuracy of the Nellcor N-101 pulse oximeter has been evaluated in adult patients receiving general anaesthesia or intensive care. Readings obtained noninvasively with this instrument were compared with measurements made on arterial blood using a Radiometer OSM2 oximeter. The pulse oximeter was easy to use and within the range tested (70–100 percent saturation of haemoglobin with oxygen) the readings were within 1 digit of the values obtained by in vitro measurement.
Article
Seventy-three low birthweight babies were independently assessed for gestational age using the scoring system of Dubowitz et al. (1970) and 5 neurological reflexes described by Robinson (1966). The results obtained by the 5 reflexes were compared with those obtained by the scoring system and were found to be accurate estimations of gestational age. The 5 reflexes may be used for babies of gestational ages 29 to 37 weeks, but above 37 weeks the scoring system must be used.
Article
The relation between pre-treatment blood-pressure and the fall in pressure after treatment was examined for most classes of antihypertensive drugs. Positive correlations were demonstrated for all drugs, for placebo, and for bed rest. This suggests that for all manoeuvres response is related to the height of the pretreatment pressure. Substitution of the pre-treatment and achieved pressures by random numbers reveals that positive correlations are mathematically inevitable and do not indicate any action on a basic mechanism of essential hypertension. After statistical correction for mathematical associations between the variables the apparent effects were generally lost. A correlation between the pre-treatment value of any variable and its change after a therapeutic intervention thus may not be valid.
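The random-number argument in the abstract above can be illustrated with a short simulation. This is only a sketch under stated assumptions: the "pre-treatment" and "achieved" values are independent standard normal draws (no real blood-pressure data), and the function names are hypothetical. Under these assumptions the correlation between the pre-treatment value and the fall is expected to be close to 1/√2 (about 0.71), even though the two raw series are unrelated.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient computed from first principles."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=10000, seed=0):
    """Return (r(pre, post), r(pre, fall)) for independent random numbers.

    Assumption: pre and post are unrelated N(0, 1) draws standing in for
    pre-treatment and achieved pressures; fall = pre - post.
    """
    rng = random.Random(seed)
    pre = [rng.gauss(0, 1) for _ in range(n)]
    post = [rng.gauss(0, 1) for _ in range(n)]
    fall = [a - b for a, b in zip(pre, post)]
    return pearson_r(pre, post), pearson_r(pre, fall)


if __name__ == "__main__":
    r_pre_post, r_pre_fall = simulate()
    print(f"r(pre, post) = {r_pre_post:.3f}")
    print(f"r(pre, fall) = {r_pre_fall:.3f}")
```

The positive correlation arises because the pre-treatment value appears on both sides of the comparison, not because of any treatment effect.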
Article
M mode echocardiographic anteroposterior indexes of left ventricular function derived from long and short axis parasternal planes were compared in one hundred cases. In all the disease groups studied the paired values were within acceptable statistical limits of comparability and interchangeability; that is they were within two standard deviations of the mean difference in both directions. Values from either plane can usually be considered as being representative of the expected values for the individual.
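The criterion mentioned above, paired values lying within two standard deviations of the mean difference, can be sketched as follows. The function names and the example numbers are hypothetical and are not data from the echocardiographic study; the multiplier of 2 follows the wording of the abstract.

```python
import statistics


def limits_of_agreement(method_a, method_b, k=2.0):
    """Return (bias, lower, upper) for paired measurements.

    bias is the mean of the differences a - b; the limits are
    bias -/+ k sample standard deviations of those differences.
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - k * sd, bias + k * sd


def fraction_within(method_a, method_b, lower, upper):
    """Fraction of paired differences that fall inside [lower, upper]."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    return sum(lower <= d <= upper for d in diffs) / len(diffs)


if __name__ == "__main__":
    # Hypothetical paired readings from two measurement planes.
    long_axis = [10, 12, 14, 16]
    short_axis = [9, 13, 13, 17]
    bias, lower, upper = limits_of_agreement(long_axis, short_axis)
    print(f"bias = {bias:.2f}, limits = ({lower:.2f}, {upper:.2f})")
    print("fraction within limits:",
          fraction_within(long_axis, short_axis, lower, upper))
```

Whether limits of a given width are acceptable is a clinical judgment that the calculation itself cannot settle.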
Article
Normal pregnant women are resistant to the pressor effect of intravenously administered angiotensin II (AII), but women who are destined to develop hypertensive complications in pregnancy show an increased sensitivity to AII several weeks before the onset of the first clinical symptoms. In 231 normotensive nulliparous women (age 25 ± 5 years), an angiotensin sensitivity test (AST) was performed between weeks 28 and 32 of gestation. If an effective angiotensin pressor dose (APD) of less than 10 ng·kg⁻¹·min⁻¹ is considered to be a positive test result, 58 subjects had a positive AST and 173 had a negative AST. Twenty-six of 34 women who ultimately developed pregnancy-induced hypertension (PIH) or preeclampsia had a positive test, and the diagnosis was made early. Each of the eight pregnant subjects with a false negative test developed only a mild form of the hypertensive disorder. In this series, 11 women had a premature onset of labor; eight of them also had an APD of less than 10 ng·kg⁻¹·min⁻¹. The study confirms the high predictive value of negative test results. Therefore, the AST can be used as an appropriate method for identifying women who are destined to develop hypertensive complications in pregnancy. However, because of the low practicability of the test, it may not be recommended as a screening method in routine prenatal care.
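The counts quoted in the abstract above are enough to reconstruct a 2×2 table and check the predictive values by hand. The sketch below assumes that the false-positive and true-negative counts can be obtained by subtraction (58 − 26 and 173 − 8); these derived counts are not stated in the abstract itself, and the function name is hypothetical.

```python
def diagnostic_summary(tp, fn, fp, tn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Counts from the abstract: 231 women, 58 positive and 173 negative tests,
# 34 women developed PIH or preeclampsia, 26 of them with a positive test.
TP = 26
FN = 34 - 26    # 8 false negatives, as stated in the abstract
FP = 58 - TP    # 32, derived by subtraction (assumption)
TN = 173 - FN   # 165, derived by subtraction (assumption)

if __name__ == "__main__":
    assert TP + FN + FP + TN == 231
    for name, value in diagnostic_summary(TP, FN, FP, TN).items():
        print(f"{name}: {value:.3f}")
```

Under these derived counts, the negative predictive value is 165/173 (about 0.95), which is consistent with the abstract's remark about negative test results, while the positive predictive value is 26/58 (about 0.45).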
Article
We assessed the effect of self-administration of a disease-specific health-related quality of life instrument, the Inflammatory Bowel Disease Questionnaire (IBDQ), on score results. Patients were assessed at two visits in two tertiary centers. "Experienced" patients (N = 31) with Crohn's disease had previously completed the IBDQ several times while "novices" (N = 37) with Crohn's disease or ulcerative colitis had no prior exposure to the IBDQ. At each visit a self-administered IBDQ followed by a nurse-administered IBDQ (score range, 1-7; absolute score range, 32-224) and disease activity were assessed. At visit 1, the mean rates of discrepant responses between nurse and self-administered scores were 24 ± 15% in experienced patients and 34 ± 17% in novice patients (p = 0.018), which fell to 21 ± 16 and 23 ± 10%, respectively, by visit 2 (p = NS). However, discrepancy rates were not significantly different between novice and experienced patients when adjusted by center. Discrepancies occurred randomly in all 32 IBDQ items. Eighty percent of all discrepant responses differed by only one grade of a seven-point Likert scale. Baseline self-administered scores for all patients were 4.80 ± 1.24 (absolute score, 153.0 ± 39.9). Mean score differences at each visit (nurse minus self) were very small, ranging from 0.029 to 0.136, and would not be considered clinically important. Intraclass correlation coefficients between the nurse and self-administered IBDQ and the four dimensional scores were ≥ 0.97 by visit 2, indicating excellent concordance and minimal observer error. Mean changes in score over time were of comparable magnitude for both self (0.320 ± 0.819) and nurse (0.260 ± 0.831) assessments. We conclude that the IBDQ may be reliably used as a self-administered instrument in clinical trials.
Article
Data have emerged that provide the scientific basis for therapeutic drug monitoring of mycophenolic acid (MPA) in transplant patients receiving mycophenolate mofetil (MMF), the parent drug, in combination with other immunosuppressive agents. There is a significant relationship between the dose-interval MPA AUC and risk for acute rejection based on retrospective investigations in renal and heart transplant patients and on prospective investigations in renal transplant patients. The MPA dose-interval AUC varies naturally by more than 10-fold in renal and heart transplant patients. Other significant sources of pharmacokinetic variability for MPA include the effects of concomitant medications, and the effects of disease states such as renal dysfunction and liver disease on the steady state MPA AUC. Individualized MMF dose evaluation, guided by MPA plasma concentrations, is becoming the standard of practice at a growing number of transplant centers worldwide because of these factors and because of the need to closely evaluate the immunosuppression afforded by MPA when a change in the immunosuppression regimen in stable transplant patients is planned. Investigations of therapeutic drug monitoring strategies with an emphasis on identifying an optimal abbreviated sampling strategy for MPA AUC estimation are ongoing. Based on the concentration-outcome studies and experience at the authors' institutions and other centers, the authors propose a set of therapeutic drug monitoring guidelines for MPA in stable renal and heart transplant patients for the immediate (first 3 months posttransplant) and maintenance (>3 months) periods. When MPA binding to human serum albumin is altered, as occurs in patients with significant renal dysfunction, liver disease, or a substantial reduction in human serum albumin concentration, the possibility of increased MPA free fraction and free concentration will need to be taken into account in the interpretation of MPA total concentrations.
Article
Cyclosporin was introduced into clinical practice in the early 1980s and has since been shown to prolong survival for transplant recipients. Because cyclosporin is a narrow therapeutic index drug and there are significant consequences associated with ‘subtherapeutic’ and ‘supratherapeutic’ concentrations, cyclosporin therapy is monitored as part of routine patient follow-up. However, the optimal method for the therapeutic drug monitoring of cyclosporin has yet to be defined. Currently, the most common method involves monitoring pre-dose trough concentrations, but this method is less than ideal. Other methods of monitoring cyclosporin therapy include monitoring the area under the concentration-time curve, limited sampling strategies, monitoring of single concentrations other than troughs and pharmacodynamic monitoring. Bayesian forecasting has been used successfully in clinical practice with other drugs with narrow therapeutic indices. However, few studies are available regarding Bayesian forecasting and cyclosporin. Existing studies are preliminary in nature and involve the old Sandimmun® formulation rather than the Neoral® formulation. Although these methods show promise, they have not gained widespread acceptance. This is because of their impracticality and the lack of prospective studies comparing other monitoring methods with trough concentration monitoring. Further comparative studies evaluating the impact of the specific monitoring method on definite patient outcomes are warranted.
Article
The purpose of this study was to characterize the pharmacokinetic parameters of mycophenolic acid (MPA) in Korean kidney transplant recipients. Plasma MPA concentrations of 10 Korean kidney transplant recipients administered a lower dose of mycophenolate mofetil (MMF; 750 mg twice a day) were measured at 2 weeks of MMF therapy by high-performance liquid chromatography (HPLC). The plasma MPA concentration-time curve pattern of patients taking lower doses of MPA was consistent with previously reported profiles of patients taking the fully recommended doses. The plasma MPA concentration-time curve was characterized by an early sharp peak within 1 hour and a small second peak in some patients at 4 to 12 hours postdose. The mean Cmax and AUC were 8.73 ± 4.65 µg/mL and 18.45 ± 4.25 µg·h/mL, respectively. The mean fraction of free MPA was 1.60% ± 0.23%. Patients' age, weight, body surface area, and renal function did not influence the AUC. The free fraction of MPA appeared not to be affected by serum albumin and renal function when creatinine clearance was above 40 mL/min. Regression analysis between each plasma concentration and AUC for the limited sampling strategy of MMF therapeutic drug monitoring demonstrated that the concentrations of predose and 1- and 8-hour postdose were positively correlated with AUC (r = 0.74545, p = 0.0133; r = 0.68485, p = 0.0289; and r = 0.63636, p = 0.0479, respectively). The pattern of the concentration-time profile of MPA in Korean kidney recipients was similar to the results of other studies performed in Caucasians, although there was interindividual variability of AUC, Cmax, and tmax. MPA concentrations of predose and 1- and 8-hour postdose were positively correlated with AUC.
Article
To investigate the pharmacokinetics of mycophenolic acid (MPA) in Chinese adult renal allograft recipients, and to generate the validated model equations for estimation of the MPA area under the plasma concentration-time curve from 0 to 12 hours (AUC12) with a limited sampling strategy. The pharmacokinetics in 75 Chinese renal allograft recipients treated with mycophenolate mofetil 2 g/day in combination with cyclosporin and corticosteroids were determined. The MPA concentration was assayed by high-performance liquid chromatography at pre-dose (C0) and at 0.5 (C0.5), 1 (C1), 1.5 (C1.5), 2 (C2), 4 (C4), 6 (C6), 8 (C8), 10 (C10) and 12 (C12) hours after dosing on day 14 post-transplant. Patients were randomly divided into: (i) a model group (n = 50) to generate the model equations by multiple stepwise regression analysis for estimation of the MPA AUC by a limited sampling strategy; and (ii) a validation group (n = 25) to evaluate the predictive performance of the model equations. The mean MPA AUC12 was 52.97 ± 15.09 mg·h/L, ranging from 24.0 to 102.3 mg·h/L. The patient's age and serum albumin level had a significant impact on the MPA AUC12. The correlation between the pre-dose MPA trough level (C0) and the MPA AUC12 was poor (r² = 0.02, p = 0.33). Model equations 7 (MPA AUC12 = 14.81 + 0.80 × C0.5 + 1.56 × C2 + 4.80 × C4, r² = 0.70) and 11 (MPA AUC12 = 11.29 + 0.51 × C0.5 + 2.13 × C2 + 8.15 × C8, r² = 0.88) were selected for MPA AUC calculation in Chinese patients, resulting in good agreements between the estimated MPA AUC and the full MPA AUC12, with a mean prediction error of ±10.1 and ±6.9 mg·h/L, respectively. In Chinese renal allograft recipients, MPA pharmacokinetics manifest substantial interindividual variability, and the MPA AUC12 tends to be higher than that in Caucasian patients receiving the same dose of mycophenolate mofetil. Two validated model equations with three sampling timepoints are recommended for MPA AUC estimation in Chinese patients.
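The two limited-sampling equations quoted in the abstract above are simple linear combinations of three timed concentrations, so they can be written down directly. This is a sketch only: the coefficients are copied from the abstract, while the function names, the assumption that concentrations are in mg/L, and the example concentrations are hypothetical and not patient data.

```python
def mpa_auc_eq7(c0_5, c2, c4):
    """Model equation 7 from the abstract: uses the 0.5, 2 and 4 h samples.

    Assumes concentrations in mg/L; returns estimated AUC in mg*h/L.
    """
    return 14.81 + 0.80 * c0_5 + 1.56 * c2 + 4.80 * c4


def mpa_auc_eq11(c0_5, c2, c8):
    """Model equation 11 from the abstract: uses the 0.5, 2 and 8 h samples.

    Assumes concentrations in mg/L; returns estimated AUC in mg*h/L.
    """
    return 11.29 + 0.51 * c0_5 + 2.13 * c2 + 8.15 * c8


if __name__ == "__main__":
    # Hypothetical concentrations chosen only to exercise the arithmetic.
    print(f"equation 7:  {mpa_auc_eq7(10.0, 5.0, 3.0):.2f} mg*h/L")
    print(f"equation 11: {mpa_auc_eq11(10.0, 5.0, 2.0):.2f} mg*h/L")
```

Because the equations were fitted in one patient population on one dosing regimen, applying them elsewhere would need separate validation.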
Armitage, P., 1971. Statistical methods in medical research. Blackwell Scientific Publications, Oxford (Chapter 7). Reproduced by kind permission of the Lancet.
British Standards Institution, 1979. Precision of test methods I. Guide for the determination of repeatability and reproducibility for a standard test method (BS 5497, part 1). BSI, London.
D'Arbela, P.G., Silayan, Z.M., Bland, J.M. Comparability of M-mode echocardiography long axis and short axis left ventricular function derivatives, unpublished.
Fig. 6. Repeated measures of PEFR using mini Wright peak flow meter.