xAID Chest CT: retrospective clinical utility assessment.
Introduction
xAID is a radiological AI tool that detects and reports multiple types of findings on tomographic images. While several studies of its Chest CT module are available as preprints submitted to peer-reviewed journals, they all retrospectively assess formal sensitivity and specificity against a per-case consensus of three radiologists. Meanwhile, many studies and opinion pieces point out that sensitivity and specificity can poorly reflect the real-world clinical value of radiological software [1, 2].
This is especially true for modern multi-feature AI solutions, which serve purposes beyond the detection of a single pathology. Such a tool works more like a “junior intern” for the radiologist: presenting the case, highlighting key areas of concern, and performing routine measurements [3]. For such tools, the overall contribution of the AI towards a correct radiological diagnosis, rather than the individual performance of separate program components, appears a more viable evaluation target.
This study analyzed the potential clinical benefit of the AI software on a non-balanced selection of cases, as assessed by board-certified radiologists from four European countries.
Goal
To assess the potential impact of the xAID Chest CT product on the clinical performance of board-certified radiologists using real-world data.
Design
The study analyzed the performance of a research-only version of the xAID Chest CT product that included 12 findings. All findings, with their default thresholds and limitations, are listed in Table 1. The software takes a non-contrast chest CT series and sends it to the cloud, where it is processed and returned as three additional series, as shown in Image 1:
1. A DICOM-SC summary series containing information on the most prominent findings (or their absence), with visual examples in anatomical order.
2. A DICOM-SC series containing all axial slices with each pathology annotated, a scrollbar showing the most critical pathology at a given slice, and a summary side panel.
3. A DICOM-SR series in English containing textual information on the study findings.
The images could then be reviewed with any DICOM viewer.
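As an illustration only, the sketch below (Python with the pydicom library) shows how the returned objects could be grouped by series for review. The folder name ./xaid_output and the reliance on the SeriesDescription tag are assumptions made for the example, not part of the product specification.

# Minimal sketch: group AI-returned DICOM objects by series for review.
# Assumes a hypothetical folder "./xaid_output" with the returned files;
# pydicom is a general-purpose DICOM library, unrelated to the product.
from collections import defaultdict
from pathlib import Path

import pydicom

series = defaultdict(list)
for path in Path("./xaid_output").rglob("*.dcm"):
    ds = pydicom.dcmread(path, stop_before_pixels=True)  # read header only
    key = (ds.get("SeriesInstanceUID"), ds.get("SeriesDescription", ""))
    series[key].append(path)

for (uid, description), files in series.items():
    # DICOM-SC series carry images; the DICOM-SR series carries structured text.
    print(f"{description or uid}: {len(files)} object(s)")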
Four board-certified radiologists from four European countries (France, Greece, Slovakia and the United Kingdom) were enrolled in the study on a pay-for-service basis. All of them read both elective and emergency studies in different hospitals and have experience with at least one AI product.
They were encouraged to upload between 20 and 25 anonymized non-contrast chest CT studies via a specialized demo account. Sources were to be selected by the radiologists themselves (e.g. open datasets, university datasets, personal case selections), with no limitations on findings or image type, except that known patient age was suggested to be over 18 years and the maximum slice thickness 3 mm.
Prior to uploading studies, the research participants received instructions for the use of the software and were then asked to adjust the program’s detection/reporting thresholds to fit their standards. All radiologists took part in a 30-minute training session and were encouraged to analyze no more than two studies per day to make allowance for the learning curve.
The results for each individual study were assessed by the radiologists themselves using a formalized table (see Table 2). The instructions suggested entering definite answers (such as “yes” or “no”) into each column. Any ambiguous answers were clarified and reduced to binary options during a follow-up online session. The radiologists were also able to report specific details of why they agreed or disagreed with the program’s decisions. The radiologist’s decision on a case was the sole final evaluation point; no independent analysis of the radiologists’ performance was performed.
Statistical analysis included estimation of binary proportions with 95% confidence intervals computed as Wilson score intervals, chosen because the number of cases was below 100 and several values approached the upper and lower 20% bounds [4, 5].
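For transparency, the Wilson interval can be reproduced directly from the standard formula [5]. The Python sketch below is a minimal implementation; the 66-of-81 count is an assumption chosen only because it matches the reported 81.5% point estimate for the primary outcome, not a figure taken from the study data.

# Minimal sketch of the 95% Wilson score interval for a binomial proportion [5].
# The 66/81 count below is an assumed illustration, not study source data.
from math import sqrt

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for `successes` out of `n` trials."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half_width, center + half_width

low, high = wilson_interval(66, 81)
print(f"66/81 = {66/81:.1%}, 95% CI [{low:.1%}, {high:.1%}]")  # ~[71.7%, 88.4%]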
Primary outcome:
The AI’s potential contribution to establishing a clinical diagnosis (the share of cases in which, based on the radiologists’ judgment, the AI would have contributed to clinically significant findings).
Secondary outcomes:
1. Miss rate for clinically significant findings
2. Overall satisfaction
3. Detection rate by pathology
4. Segmentation quality (by pathology)
5. Measurement quality (by pathology)
Results
The study included 81 cases assessed by four board-certified radiologists from different European countries. The primary outcome demonstrated that AI segmentation contributed to establishing a clinical diagnosis in 81.5% [71.7–88.5%] of cases. Radiologists reported a 47.4% [36.9–58.1%] miss rate for clinically significant findings, i.e., the share of cases in which the AI missed at least one clinically significant finding.
Regarding usability, the image layout was approved in 89.7% [81.2–94.6%] of cases, and in 94.9% [87.8–98.0%] of cases the radiologists indicated that DICOM-SR components could be integrated into their reports with minor modifications. The AI detected findings outside routine clinical practice in 28.2% [19.6–38.8%] of cases.
For specific pathologies, the AI demonstrated high accuracy for pleural effusion (89.7% [81.2–94.6%]) and thoracic aorta measurements (89.7% [81.2–94.6%]), whereas its performance was lower for pulmonary nodules (66.7% [55.9–76.0%]) and pulmonary opacification (73.1% [62.6–81.5%]). Measurement accuracy and visualization consistency averaged 81.3% and 81.8%, respectively, while correct detection of normal/pathological features was lower at 74.1%.
These results highlight the AI’s potential as a clinical support tool while identifying areas for improvement, particularly the reduction of false positives for lung nodules and the precision of pathology differentiation.
Discussion
The findings of this study support the utility of the xAID Chest CT Module as a clinical
decision-support tool for radiologists. AI segmentation contributed to establishing a clinical
diagnosis in over 80% of cases, underscoring its role in enhancing radiological workflows
beyond simple pathology detection. However, limitations were observed in specific areas,
particularly in the precision of pulmonary nodule detection and differentiation of certain
pathologies.
False positives in nodule detection were frequently reported, with vessels being misclassified as
nodules, and subpleural nodules proving particularly challenging. Similarly, discrepancies in the
interpretation of age-adjusted findings, such as coronary artery calcium and vertebral
compression fractures, suggest a need for improved standardization in measurement and
classification algorithms. The lower detection accuracy for pulmonary opacification further
highlights areas for refinement, particularly in terms of feature differentiation and measurement
formalization.
Despite these limitations, radiologists found the AI-generated DICOM-SR reports useful, with
nearly 95% indicating that parts of the output could be incorporated into their clinical reports
with minor modifications. The high approval rate for the image layout (nearly 90%) suggests that
the AI’s visualization approach aligns well with radiologists’ expectations. Additionally, the AI
system identified findings that might not typically be considered in routine practice in over a
quarter of cases, suggesting its potential to improve comprehensiveness in reporting.
These results highlight the evolving role of AI in radiology: not as a standalone diagnostic tool
but as an augmentative system that enhances efficiency, consistency, and accuracy. Future
improvements should focus on optimizing nodule detection precision, refining measurement
algorithms, and addressing pathology interpretation discrepancies to further enhance clinical
applicability.
Conclusions
This study demonstrates that the xAID Chest CT Module provides meaningful clinical support
for radiologists, with AI segmentation contributing to diagnosis in over 80% of cases. The tool
effectively enhances workflow efficiency by automating routine measurements and highlighting
key findings, making it a valuable adjunct in radiological practice.
However, challenges remain in certain areas, particularly in reducing false-positive nodule
detections and improving the precision of pathology measurements. Addressing these limitations
through algorithm refinement and improved interpretability will be essential for maximizing AI’s
clinical impact.
Overall, the findings reinforce the potential of multifeatured AI systems as integral components
of modern radiology, assisting clinicians in decision-making rather than replacing their expertise.
Future research should focus on refining AI-driven pathology differentiation and expanding
real-world validation studies to further enhance its diagnostic utility.
Supplement
Table 1. Product basic functionality

Pathology Name | Measurements | Limitations
Lung nodules | Solid nodules only; detects all nodules including inflammatory, ≥ 4–6 mm | For patients with known malignancy or infectious disease, size criteria for malignancy cannot be used
Pleural effusion | Detects crescent-shaped liquid accumulations in gravity-dependent areas | Limited detection in pulmonary-only series or noisy images
Pulmonary trunk dilatation | Measures at the widest part; normal upper limit: 29–33 mm | Some anatomical features may obstruct the identification
Pulmonary opacification | Detects consolidated lung tissue areas; <0.5% volume may be reported as 0% | Motion artifacts limit detection; boundaries of large consolidations are difficult to determine
Pneumothorax | Detects gas accumulation in the pleural cavity | –
Pulmonary emphysema | Detects voxels with CT density ≤ -950 HU; reported at ≥ 6% of lung volume | Cannot differentiate between cysts, bronchiectasis, and cavities
Coronary artery calcification | Detects calcifications, calculates Agatston index, classifies severity (CAC-DRS) | Stents may cause false positives
Pericardial and epicardial fat | Identifies and segments pericardial/epicardial fat, ≥ 200 ml | Vessels within the pericardium may be included in segmentation
Hydropericardium | Detects and measures ≥ 50 ml | Heartbeat artifacts limit detection
Dilatation or aneurysm of thoracic aorta | Measures ascending/descending aorta | Limited accuracy in asthenic patients, esophageal pathology, aortic dissection, and post-surgery cases
Adrenal gland lesions | Detects lesions ≥ 10 mm in the adrenal gland | False positives due to tumors from nearby organs
Spinal compression fractures | Classifies fractures by height reduction: <25% (Genant 0–I), 25–40% (Genant 2), >40% (Genant 3) | Errors are possible in patients with extensive scoliosis
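Two of the thresholds in Table 1 can be illustrated compactly. The Python sketch below is a simplified rendering of the emphysema criterion (share of lung voxels at or below -950 HU, reported from 6% of lung volume) and of the Genant height-reduction grading; it is an assumption-based illustration, not the vendor’s implementation.

# Simplified sketches of two Table 1 criteria; not the vendor's algorithm.
import numpy as np

def emphysema_fraction(lung_hu: np.ndarray, threshold: float = -950.0) -> float:
    """Share of lung voxels at or below the HU threshold (LAA-950).

    `lung_hu` is assumed to hold HU values of lung voxels only,
    i.e. lung segmentation has already been applied.
    """
    return float(np.mean(lung_hu <= threshold))

def genant_grade(height_reduction: float) -> str:
    """Map vertebral height reduction (as a fraction) to the Table 1 grades."""
    if height_reduction < 0.25:
        return "Genant 0-I"
    if height_reduction <= 0.40:
        return "Genant 2"
    return "Genant 3"

voxels = np.array([-980.0, -960.0, -940.0, -900.0, -860.0])  # toy HU values
print(f"LAA-950 fraction: {emphysema_fraction(voxels):.0%}")  # 40%
print(genant_grade(0.30))  # Genant 2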
Table 2. Questions for research participants.

Study #: ___

Per-pathology questions (answer columns: normal/pathological features detected correctly; visualized correctly; measured correctly; comments):
Pulmonary nodules
Pulmonary trunk measurement
Pulmonary opacification
Pleural effusion
Spinal compression fractures
Coronary artery calcification
Pericardial and epicardial fat
Ascending and descending thoracic aorta measurement
Adrenal gland lesions
Pericardial effusion

General reporting questions:
Could AI segmentation contribute to establishing diagnosis in this clinical case?
What clinically significant findings were missed by AI on this image?
Is there clinical significance for this case?
Did you like the layout of the images?
In real practice, would you be able to use parts of the DICOM-SR for your own report (assuming they were in your working language)? What would need to be changed for you to be able to do so?
Has the AI tool found anything you wouldn’t look for or report in everyday clinical practice?
For the first 3 cases, please provide the report for this case as you would submit it in real-world practice (possibly in your working language).
Table 3. Primary outcome and general data

Question | % of positive answers
Could AI segmentation contribute to establishing diagnosis in this clinical case? | 81.5 [71.7–88.5]%
What clinically significant findings were missed by AI on this image? | 47.4 [36.9–58.1]%
Did you like the layout of the images? | 89.7 [81.2–94.6]%
In real practice, would you be able to use parts of the DICOM-SR for your own report (assuming they were in your working language)? | 94.9 [87.8–98.0]%
Has the AI tool found anything you wouldn’t look for or report in everyday clinical practice? | 28.2 [19.6–38.8]%
Table 4. Secondary outcomes (specific metrics; values are % of results answered positively, with 95% CI)

Question | Measured correctly | Visualized correctly | Normal/pathological features detected correctly
Pulmonary trunk measurement | 85.9 [76.7–91.9]% | 87.2 [78.2–92.8]% | 87.2 [78.2–92.8]%
Pleural effusion | 89.7 [81.2–94.6]% | 89.7 [81.2–94.6]% | 75.6 [65.2–83.7]%
Spinal compression fractures | 88.5 [79.7–93.8]% | 88.5 [79.7–93.8]% | 71.8 [61.2–80.4]%
Coronary artery calcification | 83.3 [73.7–89.9]% | 84.6 [75.2–90.9]% | 65.4 [54.6–74.8]%
Pericardial and epicardial fat | 85.9 [76.7–91.9]% | 85.9 [76.7–91.9]% | 82.1 [72.4–88.9]%
Ascending and descending thoracic aorta measurement | 89.7 [81.2–94.6]% | 89.7 [81.2–94.6]% | 88.5 [79.7–93.8]%
Adrenal gland lesions | 75.6 [65.2–83.7]% | 75.6 [65.2–83.7]% | 70.5 [59.8–79.3]%
Pericardial effusion | 74.4 [63.9–82.6]% | 75.6 [65.2–83.7]% | 67.9 [57.1–77.1]%
Pulmonary nodules | 66.7 [55.9–76.0]% | 67.9 [57.1–77.1]% | 66.7 [55.9–76.0]%
Pulmonary opacification | 73.1 [62.6–81.5]% | 73.1 [62.6–81.5]% | 65.4 [54.6–74.8]%
Mean | 81.28 [71.4–88.3]% | 81.78 [72.0–88.7]% | 74.11 [63.6–82.4]%
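For clarity, the “Mean” row is the simple arithmetic average of the ten per-pathology point estimates; a short check for the “Measured correctly” column:

# Check: the Table 4 "Mean" for "Measured correctly" is the plain average
# of the ten per-pathology point estimates.
measured = [85.9, 89.7, 88.5, 83.3, 85.9, 89.7, 75.6, 74.4, 66.7, 73.1]
print(sum(measured) / len(measured))  # 81.28, as reported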
Images
Image 1. Visual interface of the program. A – summary series (DICOM-SC separate series). B – vertebral compression and bone density (first slice of the general DICOM-SC series). C – main DICOM-SC axial series. D – summary reports (DICOM-SR series). [Figure panels A–D not reproduced here.]
Literature
1. Vasilev, Y., et al., AI-Based CXR First Reading: Current Limitations to Ensure Practical Value. Diagnostics, 2023. 13(8): p. 1430.
2. Bernstein, M.H., et al., Can incorrect artificial intelligence (AI) results impact radiologists, and if so, what can we do about it? A multi-reader pilot study of lung cancer detection with chest radiography. European Radiology, 2023. 33(11): p. 8263-8269.
3. Siepmann, R., et al., The virtual reference radiologist: comprehensive AI assistance for clinical image reading and interpretation. European Radiology, 2024. 34(10): p. 6652-6666.
4. Hazra, A., Using the confidence interval confidently. Journal of Thoracic Disease, 2017. 9(10): p. 4125-4130.
5. Agresti, A. and B. Coull, Approximate is Better than “Exact” for Interval Estimation of Binomial Proportions. The American Statistician, 1998. 52: p. 119-126.