WCAY object detection of fractures for X-ray images of multiple sites
Peng Chen1, Songyan Liu1, Wenbin Lu1,2, Fangpeng Lu1,2 & Boyang Ding1
The WCAY (weighted channel attention YOLO) model, which is meticulously crafted to identify fracture features across diverse X-ray image sites, is presented herein. This model integrates novel core operators and an innovative attention mechanism to enhance its efficacy. Initially, leveraging the benefits of dynamic snake convolution (DSConv), which is adept at capturing elongated tubular structural features, we introduce the DSC-C2f module to augment the model's fracture detection performance by replacing a portion of C2f. Subsequently, we integrate the newly proposed weighted channel attention (WCA) mechanism into the architecture to bolster feature fusion and improve fracture detection across various sites. Comparative experiments were conducted to evaluate the performance of several attention mechanisms. These enhancement strategies were validated through experimentation on public X-ray image datasets (FracAtlas and GRAZPEDWRI-DX). Multiple experimental comparisons substantiated the model's efficacy, demonstrating its superior accuracy and real-time detection capabilities. According to the experimental findings, on the FracAtlas dataset, our WCAY model exhibits a notable 8.8% improvement in mean average precision (mAP) over the original model. On the GRAZPEDWRI-DX dataset, the mAP reaches 64.4%, with a detection accuracy of 93.9% for the "fracture" category alone. Compared with other state-of-the-art object detection models, the proposed model represents a substantial improvement over the original algorithm. The code is publicly available at https://github.com/cccp421/Fracture-Detection-WCAY.
Keywords Fracture detection, Deep learning, Attention mechanism, YOLO
Bone trauma, arising from incidents such as jostling, falls, and car accidents, is a prevalent occurrence in modern life. It encompasses a range of injuries, including fractures, cracks, tears, and compression injuries. Symptoms typically manifest as pain, swelling, and restricted movement, potentially leading to complications such as nonunion and infection1. Timely diagnosis and appropriate treatment are crucial in managing bone trauma, given the unpredictable nature of injury occurrence and variations in medical expertise among treating physicians2. The advent of artificial intelligence offers promising solutions to the clinical complexities associated with orthopedic trauma3.
Deep learning, a pivotal subset of artificial intelligence, has garnered significant attention for its applications in fracture detection and as a supplementary tool for clinician diagnostics4. Fracture detection primarily utilizes X-ray and computed tomography (CT) images, with X-ray image research being particularly prevalent5. Consequently, fracture detection within deep learning frameworks can be conceptualized as an object detection task6.
Object detection algorithms serve the purpose of identifying both the location and class of targets within an image7. These algorithms predominantly rely on convolutional neural networks and are categorized into two main types: two-stage models and single-stage models8. Two-stage models typically involve the generation of candidate regions from the input image, followed by classification and regression9. Examples include R-CNN10, Fast R-CNN11, and Faster R-CNN12, which are known for their higher detection accuracy. In contrast, single-stage models simplify the problem by treating object detection as a regression task and performing global regression-based classification13. Models such as the You Only Look Once (YOLO)14 series and RetinaNet15 directly extract class and location information without the need for candidate region generation.
Furthermore, improving detection performance remains a prominent research focus for object detection networks. Enhancement strategies primarily revolve around data augmentation and network architecture modifications16. Of particular interest in network structure enhancement is the integration of attention mechanisms, a current area of active exploration and research16. The attention mechanism is a unique structure embedded in machine learning models that automatically captures the contribution of input data to output data17. The basic principle of attention mechanisms in computer vision is to find the correlation between the raw data and then emphasize key features18, such as the squeeze-and-excitation (SE) attention method19, the convolutional block attention module (CBAM)20, the global attention mechanism (GAM)21, and coordinate attention (CA)22.

1Heilongjiang University, Harbin 150080, China. 2These authors contributed equally: Wenbin Lu and Fangpeng Lu.
email: liusongyan@hlju.edu.cn
Therefore, we use the FracAtlas X-ray dataset23, a collection of X-ray scan images from multiple body parts, including the hand, shoulder, leg, and foot, to design a generalized X-ray fracture detection model. We introduce the dynamic snake convolution C2f (DSC-C2f) operator, which is designed to efficiently extract slender fracture features. In addition, we introduce a novel WCA attention mechanism to improve the detection accuracy. Leveraging insights from the YOLO family of single-stage detection algorithms, we develop the WCAY fracture detection model. To assess the overall efficacy of the proposed model, we built it at different YOLO model sizes, including Nano, Small, and Medium. To validate the feasibility of the model, we also trained it on the GRAZPEDWRI-DX public dataset24. The contributions of this paper are summarized as follows:
• Leveraging dynamic snake convolution (DSConv)25, we introduce a learning residual module, DSC-C2f, capable of capturing tubular structures.
• We propose a weighted channel attention mechanism (WCA).
• We propose a new object detection network called weighted channel attention YOLO (WCAY) that incorporates some of the above attention mechanisms as well as the WCA and DSC-C2f proposed in this paper.
• The feasibility of DSC-C2f, WCA, and WCAY was verified with several datasets.
Related work
Fracture detection, a critical aspect of medical imaging, has seen widespread application. Guan et al.26 utilized the R-CNN model on the MURA dataset27, achieving an average accuracy of 62.04%. Yahalomi et al.28 demonstrated the effectiveness of a Faster R-CNN model in localizing distal radius fractures, surpassing radiologists' performance and offering promise in rare disease identification. Wang et al.29 introduced ParallelNet, an R-CNN network with a TripleNet backbone, for thigh fracture detection in a dataset comprising 3842 X-ray images. Similarly, Krogue et al.30 employed a RetinaNet model utilizing DenseNet169 for automatic detection, localization, and classification of hip fractures.
While these two-stage algorithms boast high accuracy, their speed remains a concern. Achieving a balance between accuracy and speed is imperative. Single-stage object detection algorithms, exemplified by the YOLO family, have emerged as significant contributors in this realm. Li et al.31 applied the YOLOv3 model to vertebral fracture detection, demonstrating its effectiveness. Yuan et al.32 innovatively integrated external attention and 3D feature fusion into YOLOv5 to detect skull fractures in CT images. Warin et al.33 leveraged YOLOv5 to detect maxillofacial fractures in a substantial dataset, classifying fracture conditions into frontal, midfacial, jaw, and no fractures. Mushtaq et al.34 demonstrated the proficiency of the YOLOv5 model in lumbar vertebrae localization, achieving an impressive average accuracy of 0.975. Furthermore, in pediatric wrist fracture detection, Dibo et al.35 enhanced YOLOv7 with the CBAM attention mechanism, achieving improved performance on the GRAZPEDWRI-DX dataset. Moreover, Ju et al.36 utilized the YOLOv8 model for wrist fracture detection, presenting an application tailored for this purpose.
However, due to the difficulties in establishing a high-quality fracture image dataset and the subjective nature of doctors' image annotations, a completely uniform standard does not exist, and deep learning-based fracture diagnosis studies are usually conducted for specific fracture types37. Therefore, it is particularly important to develop a deep learning model for fracture detection that is applicable to various types of images and different fracture sites.
Proposed method
YOLOv8 Architecture
Redmon et al.14 introduced the YOLO architecture in 2015 for real-time detection, aiming to address target detection as a regression challenge. This approach involves directly mapping coordinates and class probabilities from image pixels to bounding boxes using a single neural network model. YOLOv838, the latest iteration proposed by Glenn Jocher, represents a significant improvement over YOLOv539. Notably, YOLOv8 replaces the C3 module with the more efficient C2f module, which features a CSP bottleneck with two convolutions instead of three, along with adjustments to the number of channels. Moreover, the head section is modified to employ the decoupled head technique, separating classification and detection tasks.
Weighted-channel-attention-YOLO fracture detection network
To address issues such as inaccurate fracture detection, excessive model parameters, large model sizes, and limited detection sites in traditional networks, this study introduces a novel X-ray fracture detection model named WCAY (shown in Fig. 1). Leveraging YOLOv8s as the baseline network, we incorporate the DSC-C2f core operator into the network backbone to enhance the model's sensitivity to the elongated and curved tubular structures typical of fractures. Additionally, we integrate a self-developed attention module (WCA) into the neck network to enable the model to prioritize abnormal regions while suppressing non-anomalous areas, thereby enhancing overall performance.
DSC-C2f Module
The YOLOv8s network architecture incorporates numerous C2f modules, which are primarily tasked with learning residual features. Therefore, the network's performance is heavily reliant on the effectiveness of these C2f module features. Given the significant variations in fracture morphology, location, and size (particularly crack-like fractures, which exhibit diverse shapes and sizes), the original C2f module may struggle to adequately extract such small, localized features. To address this limitation and further bolster the network's ability to learn
Scientic Reports | (2024) 14:26702 2
| https://doi.org/10.1038/s41598-024-77878-6
www.nature.com/scientificreports/
Content courtesy of Springer Nature, terms of use apply. Rights reserved
fracture features, this paper introduces DSConv from the dynamic snake convolution network (DSCNet). Subsequently, a new module, termed the DSC-C2f module, is meticulously designed.
In 2023, Yaolei Qi et al.25 developed the DSCNet network, which is specifically tailored for tubular structure segmentation tasks. Within DSCNet, DSConv emerged as a convolutional module offering a novel approach to traditional convolution. As illustrated in Fig. 2, DSConv demonstrates distinctive operational characteristics. To effectively extract local features of tubular structures and enable the convolutional kernel to focus on intricate geometric features, DSConv introduces deformation offsets. By sequentially examining each target for processing, DSConv ensures consistent attention. Additionally, the handling of significant deformation offsets prevents the receptive field from spreading too extensively, resulting in an output feature map resembling a "snake" shape.
Figure 3 illustrates the structure of DSC-C2f. The DySnakeConv module is formed by linking two initial DSConv layers with a convolution module (ConvM) layer. Initially, the first ConvM layer increases the number of channels in the expansion layer. Subsequently, the DySnakeConv module is applied to the feature map, followed by a second ConvM layer that reduces the number of channels in the output feature map to align with the input channels. Finally, the feature obtained in the preceding stage is merged with the residual edge for feature fusion, thus constituting the dynamic snake convolution bottleneck (DSC-Bottleneck) module. The newly designed DSC-C2f module replaces all the bottleneck components of the original C2f module in the network model with DSC-Bottleneck modules. This DSC-C2f module combines the multiscale feature extraction capabilities of the original C2f module with DSConv's ability to pay adaptive attention to slender and curvilinear features.
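To make the composition concrete, the following PyTorch sketch mirrors the structure just described under stated assumptions: ConvM is the standard YOLOv8 Conv+BN+SiLU block, and the two DSConv layers of DySnakeConv are stood in for by plain 3 × 3 convolutions, since the real dynamic snake convolution (with its deformation offsets) lives in the DSCNet codebase. Class and variable names here are ours, not the authors'.

```python
import torch
import torch.nn as nn

class ConvM(nn.Module):
    """Conv + BatchNorm + SiLU, the standard YOLOv8 convolution block."""
    def __init__(self, c1, c2, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class DSCBottleneck(nn.Module):
    """Sketch of the DSC-Bottleneck: expand channels, apply DySnakeConv,
    shrink back, and fuse with the residual edge."""
    def __init__(self, c, e=2.0):
        super().__init__()
        c_mid = int(c * e)                 # expansion layer
        self.cv1 = ConvM(c, c_mid, 1)      # widen channels
        # Placeholders: the real x- and y-oriented DSConv layers go here.
        self.dsconv_x = nn.Conv2d(c_mid, c_mid, 3, 1, 1)
        self.dsconv_y = nn.Conv2d(c_mid, c_mid, 3, 1, 1)
        self.cv2 = ConvM(2 * c_mid, c, 1)  # shrink back to the input width

    def forward(self, x):
        y = self.cv1(x)
        y = torch.cat([self.dsconv_x(y), self.dsconv_y(y)], dim=1)  # DySnakeConv
        return x + self.cv2(y)             # residual edge for feature fusion
```

A DSC-C2f layer would then be the stock C2f module with each of its bottlenecks swapped for such a DSC-Bottleneck.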
Weighted Channel Attention mechanism
The attention mechanism plays a crucial role in capturing the focal regions of the whole image, further enhancing the model's focus on the image features of the abnormal bone region and improving the model's generalizability. However, it is important to note that utilizing an attention mechanism also carries the disadvantage of increased computational effort and, hence, increased computational cost. We design a new channel attention mechanism, weighted channel attention (WCA), inspired by the CA (coordinate attention)22 module, as shown in Fig. 4.

Fig. 1. Model structure of WCAY.

The WCA module can be viewed as a computational unit designed to improve the representation of features learned by the network. It takes as input any intermediate feature tensor $X \in \mathbb{R}^{C \times H \times W}$, where $C$ denotes the number of input channels and $H$ and $W$ denote the spatial dimensions of the input features. To clearly describe the proposed WCA, we first revisit how CA embeds location information into channel attention, as shown in (a) of Fig. 5.
The CA decomposes the original input tensor $X$ into two parallel one-dimensional feature encoding vectors to model cross-channel dependencies with spatial location information. Each vector results from one-dimensional global average pooling along one spatial dimension, so it can be viewed as a collection of positional information along the other dimension. The one-dimensional global average pooling that encodes global information along the horizontal dimension of channel $c$ at height $H$ can be expressed as Eq. (1); similarly, the pooled output of channel $c$ at width $W$ can be expressed as Eq. (2):

$$Z_c^H(H) = \frac{1}{W} \sum_{0 \le i < W} x_c(H, i) \tag{1}$$

$$Z_c^W(W) = \frac{1}{H} \sum_{0 \le j < H} x_c(j, W) \tag{2}$$
Here, $x_c$ denotes the input feature in channel $c$. Through such an encoding process, CA captures long-distance dependencies along the horizontal dimension and preserves exact position information along the vertical dimension. The model uses input feature encoding to synthesize global information, helping it capture spatial global features. It then generates two parallel 1D vectors for feature coding and permutes the shape of one of the vectors before merging the two. Immediately after, these parallel encoded vectors are shared with a downscaling 1 × 1 convolution. Coordinate attention (CA) then decomposes the 1 × 1 convolution output into two parallel 1D feature encoding vectors. Each path contains a 1 × 1 convolution and a nonlinear sigmoid function. Finally, the attention weights of the two paths are applied to the original feature map to produce the final output. This approach preserves accurate spatial details while efficiently exploiting long-range dependencies through interchannel and spatial information coding.
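As a small illustration, the two directional poolings of Eq. (1) and Eq. (2) map directly onto adaptive average pooling in PyTorch; the tensor shapes are the only point of this sketch.

```python
import torch
import torch.nn as nn

x = torch.randn(2, 64, 32, 48)             # (N, C, H, W) toy input

pool_h = nn.AdaptiveAvgPool2d((None, 1))   # average over width,  Eq. (1)
pool_w = nn.AdaptiveAvgPool2d((1, None))   # average over height, Eq. (2)

z_h = pool_h(x)    # (2, 64, 32, 1): one value per (channel, row)
z_w = pool_w(x)    # (2, 64, 1, 48): one value per (channel, column)
```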
Fig. 3. Structure of DSC-C2f.
Fig. 2. Schematic of how DSConv works. Dynamic snake convolution (DSConv) learns deformations based on input feature maps and adaptively focuses on elongated and tortuous local features based on an understanding of the morphology of tubular structures25.
Scientic Reports | (2024) 14:26702 4
| https://doi.org/10.1038/s41598-024-77878-6
www.nature.com/scientificreports/
Content courtesy of Springer Nature, terms of use apply. Rights reserved
Although CA embeds precise positional information into channels, and this spatial capture of long-distance interactions improves the model's concentration on fracture features40, an excess of long-range dependency information causes the model to miss crucial feature details during multiscale fusion, leading to overfitting. As a result, the fracture feature localization becomes diffuse and unconstrained, with the model capturing a wide range of focal points beyond the pre-labeled bounding box in the image. To solve this problem of concentration diffusion, we designed the WCA module, whose overall structure is shown in (b) of Fig. 5.
Fig. 5. Comparisons with different attention modules: (a) CA module; (b) WCA module.
Fig. 4. Principle of the WCA. Here, "X avg pool" represents 1D horizontal global pooling, and "Y avg pool" indicates 1D vertical global pooling22.
Scientic Reports | (2024) 14:26702 5
| https://doi.org/10.1038/s41598-024-77878-6
www.nature.com/scientificreports/
Content courtesy of Springer Nature, terms of use apply. Rights reserved
Specically, given the aggregated feature maps produced by Eq.(1) and Eq.(2). We rst concatenate them
and send them to a 3 × 3 convolutional transform function
F3×3
to obtain the following formula:
f=F
3×3
Z
H
,Z
W
(3)
where
[,]
denotes the join operation along the spatial dimension and
fRC×1×(W+H)
is the intermediate
feature map encoding spatial information in the horizontal and vertical directions. We then split
f
into two
separate tensors
fHRC×H×1
and
fWRC×1×W
along the spatial dimension. en to obtain the feature
weights for each of the two tensors in the vertical and horizontal dimensions, we feed
f
into a 1 × 1 convolutional
transform to obtain the following
w=σ(F1×1(f))
(4)
where
σ
is a sigmoid function, and similarly, we split
w
along the spatial dimensions into two separate feature
weights
wHRC×H×1
and
wWRC×1×W
. We then aggregate the dimension tensors and weights via
simple multiplication to obtain Eq.(5) and Eq.(6)
aH=fH×wH
(5)
bW=fW×wW
(6)
Finally, by multiplying the output of the two parallel routes with the original input feature map, the output Y of
our WCA module can be written as
y
c
(i, j)=x
c
(i, j)
×
σ
a
H
c
(i)
×
σ
b
W
c
(j)
(7)
In contrast to channel attention, which solely recalibrates the significance of various channels, our WCA block not only incorporates spatial information encoding but also amplifies constraints, prioritizing spatial details. As elucidated earlier, weighted attention is concurrently applied along both the horizontal and vertical directions of the input tensor. Each element within these attention maps signifies the presence of the object of interest in the corresponding row and column. This encoding mechanism enables our WCA to precisely pinpoint the exact position of an object, thereby facilitating improved recognition by the overall model.
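Putting Eq. (1) through Eq. (7) together, the following PyTorch module is a minimal sketch of WCA as we read it from the equations; it is not the authors' reference implementation (their code is in the linked repository), and details such as channel reduction inside $F_{3\times3}$ and $F_{1\times1}$ are simplified away here.

```python
import torch
import torch.nn as nn

class WCA(nn.Module):
    """Minimal weighted channel attention sketch following Eq. (1)-(7).
    Layer names and the absence of channel reduction are our assumptions."""
    def __init__(self, channels: int):
        super().__init__()
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # Eq. (1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # Eq. (2)
        self.f3x3 = nn.Conv2d(channels, channels, 3, padding=1)  # F_3x3, Eq. (3)
        self.f1x1 = nn.Conv2d(channels, channels, 1)             # F_1x1, Eq. (4)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        z_h = self.pool_h(x)                          # (N, C, H, 1)
        z_w = self.pool_w(x).permute(0, 1, 3, 2)      # (N, C, W, 1)
        f = self.f3x3(torch.cat([z_h, z_w], dim=2))   # Eq. (3): (N, C, H+W, 1)
        f_h, f_w = torch.split(f, [h, w], dim=2)      # split back per direction
        wgt = self.sigmoid(self.f1x1(f))              # Eq. (4)
        w_h, w_w = torch.split(wgt, [h, w], dim=2)
        a_h = f_h * w_h                               # Eq. (5): (N, C, H, 1)
        b_w = (f_w * w_w).permute(0, 1, 3, 2)         # Eq. (6): (N, C, 1, W)
        # Eq. (7): weight rows and columns of the original feature map.
        return x * self.sigmoid(a_h) * self.sigmoid(b_w)
```

Broadcasting in the last line reproduces Eq. (7): the (N, C, H, 1) map weights rows, the (N, C, 1, W) map weights columns, and their product modulates every spatial position of the input.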
Experiments and discussion
Dataset and image preprocessing
The FracAtlas and GRAZPEDWRI-DX datasets were used in this study. The FracAtlas dataset comprises 4083 X-ray images of bone fractures from all major parts of the human body, collected from three major hospitals in Bangladesh, as shown in Fig. 6. This dataset was manually annotated with the help of two radiologists and an orthopedic surgeon and contains 717 images with 922 fracture instances23. The GRAZPEDWRI-DX dataset, shown in Fig. 7, was collected by a number of pediatric radiologists at the Department of Pediatric Surgery at the University Hospital Graz. It covers 10,643 wrist studies comprising 20,327 image samples from 6,091 unique pediatric patients24. The dataset was annotated by a group of pediatric radiologists. There are nine different types of annotation objects, and each image can be associated with multiple objects35,36.
In addition, the restricted image diversity observed in low-feature X-ray images poses a challenge, as models trained solely on such data may exhibit suboptimal performance when applied to other X-ray images. To enhance the robustness of these models, we employ data augmentation techniques aimed at improving image quality. Specifically, we implement online data augmentation on the training dataset, leveraging methods such as mosaic and mixup. Additionally, we fine-tune image brightness and contrast to further enhance model quality utilizing Albumentations41, an open-source Python library renowned for its image enhancement capabilities.
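As a hedged sketch of the brightness/contrast step, an Albumentations pipeline along the following lines could be used; the probability and limits are illustrative assumptions, since the paper does not report them (mosaic and mixup are typically handled by the YOLO training loop itself rather than by Albumentations).

```python
import albumentations as A

# Illustrative brightness/contrast augmentation; limits and probability
# are assumed values, not the authors' reported settings.
transform = A.Compose(
    [
        A.RandomBrightnessContrast(
            brightness_limit=0.2,
            contrast_limit=0.2,
            p=0.5,
        ),
    ],
    # Keep YOLO-format boxes in sync with any geometric transforms added later.
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)

# Usage: augmented = transform(image=img, bboxes=boxes, class_labels=labels)
```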
Ethics approval
This research does not involve human participants and/or animals. All methods complied with the guidelines and relevant regulations.
Fig. 6. FracAtlas dataset, showing scans containing various parts of the arm, leg, waist, and shoulder. Each fracture instance has its own mask and bounding box, and the scans also have a global label for the classification task, which is set to "fractured".
Scientic Reports | (2024) 14:26702 6
| https://doi.org/10.1038/s41598-024-77878-6
www.nature.com/scientificreports/
Content courtesy of Springer Nature, terms of use apply. Rights reserved
Experimental environment
This experiment was conducted on an Ubuntu 18.04 system equipped with an Intel(R) Xeon(R) Platinum 8255C CPU and an NVIDIA GeForce RTX 3090 GPU, utilizing torch version 1.11. During training, the input image resolution was set to 640 × 640 pixels. The model was trained for 300 epochs with a patience of 50, a batch size of 32, and a learning rate of 0.01 utilizing the SGD optimizer. Each dataset was randomly divided into three subsets (training, validation, and test sets) comprising approximately 70%, 20%, and 10% of the original dataset, respectively.
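For reference, these hyperparameters map directly onto the Ultralytics training API; the sketch below assumes hypothetical wcay-s.yaml model and fracatlas.yaml dataset configs, since the authors' exact entry point lives in their repository.

```python
from ultralytics import YOLO

# Hypothetical configs standing in for the authors' modified architecture
# and the 70/20/10 FracAtlas split described above.
model = YOLO("wcay-s.yaml")
model.train(
    data="fracatlas.yaml",
    imgsz=640,        # input resolution 640 x 640
    epochs=300,
    patience=50,      # early-stopping patience
    batch=32,
    optimizer="SGD",
    lr0=0.01,         # initial learning rate
)
```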
Evaluation indicators
The key evaluation metrics of object detection algorithms include detection accuracy, model complexity, and detection speed. We introduce the key metrics of precision, recall, and mAP to evaluate detection accuracy. The precision and recall are calculated via Eq. (8) and Eq. (9):

$$P_{\text{precision}} = \frac{TP}{TP + FP} \tag{8}$$

$$R_{\text{recall}} = \frac{TP}{TP + FN} \tag{9}$$
In the evaluation of target detection algorithms, true positives (TP) are correctly detected positive samples, false positives (FP) are negative samples incorrectly identified as positive, and false negatives (FN) are positive samples erroneously identified as negative. A precision-recall (P-R) curve is generated for each category during the performance assessment, plotting precision against recall42. The area under this curve, spanning between the curve and the horizontal axis, is the average precision (AP) of the category. The mAP value of the model is computed as the average of the AP values across all categories. Typically, mAP is assessed using two metrics: mAP50, which counts predictions with at least 50% overlap with ground-truth boxes as correct, and mAP50:95, which averages over IoU thresholds ranging from 0.5 to 0.95.
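A tiny worked example of Eq. (8) and Eq. (9), with made-up counts purely for illustration:

```python
# Hypothetical detection counts for one class on one test set.
tp, fp, fn = 42, 8, 14

precision = tp / (tp + fp)   # Eq. (8): 42 / 50 = 0.84
recall = tp / (tp + fn)      # Eq. (9): 42 / 56 = 0.75
print(f"precision={precision:.2f}, recall={recall:.2f}")

# AP is the area under the P-R curve built by sweeping the confidence
# threshold; mAP50 averages AP over classes at IoU >= 0.5, and mAP50:95
# repeats this at IoU thresholds 0.50, 0.55, ..., 0.95 and averages.
```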
The complexity of an object detection algorithm is gauged by factors such as model size, parameter count, and computational demands; elevated values in these aspects correlate with increased model complexity. This study assesses model complexity through evaluation metrics encompassing computational load and model size. The computational load, which is indicative of time complexity, is quantified in floating-point operations (FLOPs), where one GFLOP corresponds to one billion floating-point operations. Higher computational demands signify greater computational resource requirements.
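Parameter and GFLOPs figures like those reported in the tables below can be reproduced with a profiler; this sketch uses the thop library and a torchvision model as a stand-in, which is our choice of tooling, not necessarily the authors'.

```python
import torch
from thop import profile            # pip install thop
from torchvision.models import resnet18

model = resnet18()                   # stand-in for any detection backbone
dummy = torch.randn(1, 3, 640, 640)  # the paper's input resolution
macs, params = profile(model, inputs=(dummy,))
# FLOPs are conventionally reported as 2 x multiply-accumulates (MACs).
print(f"params: {params / 1e6:.2f} M, GFLOPs: {2 * macs / 1e9:.1f}")
```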
Ablation study
To demonstrate the effectiveness of WCAY, we chose YOLOv8s as the baseline network (Baseline), added the DSC-C2f module to the backbone network, and added our WCA attention mechanism to the neck network. We performed ablation studies mainly on the FracAtlas dataset, testing different combinations of the improved modules.
Comparative experiments of DSC-C2f modules
To demonstrate the effectiveness of DSC-C2f in the detection task and the effect of DSC-C2f at different positions in the network on the detection performance, we conducted a series of positional substitution comparison experiments on DSC-C2f on the FracAtlas dataset.
As seen in Fig. 1 (the model structure of WCAY), a C2f layer is set up in the P2, P3, P4, and P5 layers of the original network backbone to extract features from the input image, and we replace the C2f of each layer with the DSC-C2f module in turn. As shown in Table 2, the accuracy of the model improves to different degrees after replacing a C2f layer in the original model with DSC-C2f, which reflects the excellent ability of the DSC-C2f module to extract tubular fracture features. In addition, different placements of the same number of modules produce different results. When we replace the C2f module in the P5 layer with DSC-C2f, the model detection accuracy improves the most, by 5.7% (from 47.9% mAP50 in the baseline model to 53.6%), compared with replacements at positions P2, P3, and P4. Although the number of parameters increases, the corresponding gain in accuracy is the most effective.
Fig. 7. e GRAZPEDWRI-DX dataset, which shows the wrist fracture conditions in children from this
dataset, is shown in the gure. Because there are fewer images in the metal category, we included the metal
category in the foreign body category to guarantee the convergence of the dataset. e dataset categories are
classied as “fracture, “text”, “periosteal reaction, “pronatorsign, “pronatorsign, “sotissue, “foreignbody”, "
boneanomaly”, and “bonelesion.
Scientic Reports | (2024) 14:26702 7
| https://doi.org/10.1038/s41598-024-77878-6
www.nature.com/scientificreports/
Content courtesy of Springer Nature, terms of use apply. Rights reserved
Figure8 illustrates the impact of DSC-C2f on model accuracy across various locations. Over time, the
precision and recall curves consistently surpass the baseline curve. Notably, the most eective strategy, yielding
the maximum mAP, involves replacing DSC-C2f at layer P5. is approach ensures that the model maintains its
initial precision and recall levels while enhancing accuracy, thereby inuencing the mAP positively.
Figure9 shows a comparison plot of the eective receptive eld visualization for each C2f module in the
network backbone, where we introduce the eective receptive eld (erf) visualization method4344. As shown in
the gure, we compare the eective erf sizes of the original C2f modules in each layer of the network backbone
with our DSC-C2f modules, and for the replaced DSC-C2f modules, the erf size is smaller than that of the
baseline network. Generally, the smaller the receptive eld is, the more local and detailed the features tend to
be. Consequently, our DSC-C2f module excels in capturing local features of the input image, enhancing the
network’s ability to discern local patterns and structures.
Fig. 8. Comparison of the precision and recall when the DSC-C2f module is at different positions in the network structure.
Model         Parameters/M  GFLOPs  Precision  Recall  mAP50 (%)
Baseline      11.13         28.4    79.8       41.4    47.9
+DSC-C2f*P2   11.15         29.1    70.0       46.6    50.7
+DSC-C2f*P3   11.26         29.6    72.4       45.2    49.8
+DSC-C2f*P4   11.65         29.5    65.0       44.8    49.0
+DSC-C2f*P5   12.14         28.9    62.9       48.3    53.6
Table 2. Comparative experiments of DSC-C2f modules at different locations in the network structure.
Methods   DSC-C2f  WCA  Parameters/M  GFLOPs  mAP50 (%)  mAP50:95 (%)
Baseline                11.13         28.4    47.9       17.8
a         ✓             12.14         28.9    53.6       23.0
b                  ✓    12.44         28.5    53.3       23.5
c         ✓        ✓    13.45         29.0    56.7       23.1
Table 1. Ablation experiment.
Table 1 shows the experimental results on the FracAtlas dataset. After the C2f module in the P5 layer of the baseline network backbone is replaced with the DSC-C2f module, the mAP50 improves from 47.9% to 53.6%, and the mAP50:95 improves from 17.8% to 23.0%. Thus, the proposed DSC-C2f method is effective for extracting fracture features from images. The WCA mechanism, when added to the model alone, improved the mAP50 by 5.4% and increased the mAP50:95 from 17.8% to 23.5%. This demonstrates that the proposed WCA can capture a wider range of global information, allowing the network to focus more on features of the skeletal disease region in the image. When both the DSC-C2f and WCA modules are added, the WCAY model achieves an improvement of 8.8% in mAP50 and an increase from 17.8% to 23.1% in mAP50:95. The experimental results demonstrate that the WCAY model containing these improvements made significant progress on all the evaluation metrics compared with the original YOLOv8s model, validating the efficacy of the improved modules.
Scientic Reports | (2024) 14:26702 8
| https://doi.org/10.1038/s41598-024-77878-6
www.nature.com/scientificreports/
Content courtesy of Springer Nature, terms of use apply. Rights reserved
Comparative experiments of WCA modules
In this section, we conduct comparative experiments on different attention mechanisms embedded in network models to further validate the effectiveness of the proposed WCA module.
We chose YOLOv8s as the benchmark model to compare the performance improvements of the WCA module with and without the X/Y weights added. The experimental results are shown in Table 3. Since our WCA module was designed with inspiration from the CA module, the two perform almost identically when no weights are added. Model performance improves significantly when weights in the horizontal (X) and vertical (Y) directions are added separately20, and when both are added. The visualization results are shown in Fig. 10: with the addition of the X weights alone, the model is more sensitive to the horizontal direction, and the activation values of the heat map are significantly higher, indicating that the region receives more attention in the x-direction. Similarly, with the addition of the Y weights, the model shows higher activation values in the vertical regions of the heat map.
Fig. 10. Comparison of heat map results for WCA with the addition of different directional weights. The heatmaps were created by Grad-CAM45. It is clear that with the addition of horizontal (X) and vertical (Y) direction weights, WCA makes the model pay more attention to fracture features.
Method     Parameters/M  GFLOPs  Precision  Recall  mAP50 (%)
Baseline   11.13         28.4    79.8       41.4    47.9
+CA        11.15         28.4    72.4       46.6    50.6
+WCA*      12.76         28.5    74.1       46.1    50.6
+X weight  12.44         28.5    77.4       45.4    52.8
+Y weight  12.44         28.5    69.0       47.1    51.6
+WCA       12.44         28.5    72.3       47.7    53.3
Table 3. Comparison results of WCA modules with different directional weights added. Here, WCA* denotes the WCA module without the horizontal (X) and vertical (Y) weights.
Fig. 9. Comparison of eective receptive elds (erf). Visual comparison of the eective receptive eld of the
DSC-C2f module and the C2f module.
Scientic Reports | (2024) 14:26702 9
| https://doi.org/10.1038/s41598-024-77878-6
www.nature.com/scientificreports/
Content courtesy of Springer Nature, terms of use apply. Rights reserved
is signicantly higher, indicating that the region receives more attention in the x-direction. Similarly, with the
addition of Y weights, the model has higher activation values for the vertical region heat map.
Meanwhile, we select dierent attention mechanisms to compare with WCA, and further verify the
eectiveness by adding SE19, CBAM20, GAM21, and CA22. e experimental results are presented in Table4. It
can be seen that the parameter proliferation of the model aer integrating GAM and CBAM fails to satisfactorily
improve the detection accuracy. On the other hand, SE and CA achieve signicant accuracy improvement with
minimal parameter increment. However, their ecacy in capturing fracture features seems to be somewhat
limited, as shown in the heat map in Fig.11. SE occasionally fails to capture certain features, while CA, due to its
intrinsic properties, sometimes exceeds the specied concentration range. On the contrary, despite the increase
in parameters and the negligible increase in computational cost, the accuracy of WCA is signicantly improved
by 5.4% compared to the baseline network without the attention mechanism.
In addition, we conducted comparative experiments with different attention mechanisms in the benchmark network after adding the DSC-C2f module, as shown in Table 5. The experiments show that after feature extraction with the DSC-C2f module, only the model with the WCA module gains in mAP (a 3.1% improvement); the mAP values of the models with the other modules decreased. In terms of precision and recall, the metrics of all models improved except for the model with the GAM module. Overall, the WCA attention mechanism, which outperforms the other attention mechanisms on all metrics, is the most likely to perform well in fracture detection tasks.
Comparative experiments of the WCAY algorithm
To demonstrate the effectiveness of the proposed WCAY algorithm for fracture detection in X-ray images, we conducted a series of comparative experiments. We selected several state-of-the-art object detection methods, including the YOLO series, the DETR46 series, and other single-stage detection models47,48; for the YOLO series, we set up different sizes (nano, small, and medium).
Fig. 11. Results of our heatmap visualization of different attention mechanisms on the FracAtlas and GRAZPEDWRI-DX datasets. It is clear that our WCA can localize objects of interest more accurately than other attention methods.
Method    Parameters/M  GFLOPs  Precision  Recall  mAP50 (%)
Baseline  11.13         28.4    79.8       41.4    47.9
+GAM      17.68         33.7    56.7       44.3    43.5
+CBAM     11.39         28.4    70.4       42.3    47.2
+SE       11.16         28.4    76.0       42.0    50.2
+CA       11.15         28.4    72.4       46.6    50.6
+WCA      12.44         28.5    72.3       47.7    53.3
Table 4. Comparison of different attention methods on the FracAtlas dataset.
Scientic Reports | (2024) 14:26702 10
| https://doi.org/10.1038/s41598-024-77878-6
www.nature.com/scientificreports/
Content courtesy of Springer Nature, terms of use apply. Rights reserved
It is worth noting that the DETR-series models were trained under a different configuration from the YOLO series: we used the official default parameters and pretrained weight files, with the batch size set to 8 and an input size of (974, 800).
As seen from the results in Table 6, our algorithm has a positive effect on detection performance, attaining the highest mAP at each model size. At the nano size, the mAP of WCAY-n reaches 47.2%, which is 4.9% higher than the 42.3% of YOLOv8-n, the highest mAP among the other nano models. Our model also achieves the best results among models with parameter counts of 30M or more, exceeding the mAP of the DETR series despite the latter's more than 33% higher parameter counts and computational effort, and it performs best among the single-stage detectors. Notably, at the small size, our WCAY-s achieves the highest mAP of all models, 56.7%, which is 5.9%, 5.7%, and 7.2% higher than YOLOv8, RT-DETR49, and FreeAnchor48, respectively, the best performers among the other algorithm series.
It is essential to highlight that transitioning from the small to the medium size leads to a decrease in mAP. This decline can be attributed to the larger model size necessitating higher-resolution input images and larger datasets. Given the standardized input image size of 640, medium-sized and larger models are susceptible to overfitting on our dataset. This is particularly relevant for the DETR family of models, which is why pretrained weights need to be added during their training. Consequently, it becomes evident that the small model size is the most suitable for our detection task.
To validate the versatility of our model, we conducted comparative experiments across multiple categories using the GRAZPEDWRI-DX dataset. The results, depicted in Figs. 12 and 13, reveal WCAY's superior mAP across various real-time detection algorithms.
Method                   Parameters/M  GFLOPs  Precision  Recall  mAP50 (%)
YOLO Series
YOLOv5-n                 2.50          7.1     66.8       33.3    38.9
YOLOv5-s                 9.11          23.8    70.6       40.8    48.4
YOLOv5-m                 25.05         64.0    64.0       48.3    50.5
YOLOv6-n                 4.23          11.8    68.7       35.6    39.1
YOLOv6-s                 16.30         44.0    61.9       36.8    40.5
YOLOv6-m                 51.98         161.1   72.4       37.4    39.7
YOLOv8-n                 3.01          8.1     60.7       41.4    42.3
YOLOv8-s                 11.13         28.4    79.8       41.4    47.9
YOLOv8-m                 25.84         78.7    67.5       45.4    50.8
DETR Series
DETR-R5046               41.56         74.5    42.2       43.0    42.2
DAB-DETR-R5050           43.70         79.7    44.2       46.4    44.2
Conditional-DETR-R5051   43.45         78.4    50.1       36.3    50.1
RT-DETR-R5049,52         41.94         125.6   51.1       48.3    51.1
Other single-stage models
FreeAnchor-R5048         36.33         159.0   49.5       34.5    49.5
TOOD-R5047               32.02         153.0   39.6       33.7    39.6
Our models
WCAY-n                   3.60          8.2     70.9       43.1    47.2
WCAY-s                   13.45         29.0    76.2       51.7    56.7
WCAY-m                   30.06         80.0    65.9       49.4    51.9
Table 6. Comparison of WCAY with different detection algorithms on the FracAtlas dataset. The detection methods compared include the YOLO Series, the DETR Series, and other single-stage detection algorithms.
Method              Parameters/M  GFLOPs  Precision  Recall  mAP50 (%)
Baseline + DSC-C2f  12.14         28.9    73.3       43.7    53.6
+CBAM               12.40         28.9    67.4       47.6    52.3
+GAM                18.70         34.2    76.4       39.7    48.0
+SE                 12.17         28.9    63.7       49.4    51.5
+CA                 12.16         28.9    72.6       48.3    52.6
+WCA                13.45         29.0    76.2       51.7    56.7
Table 5. Comparison of different attention methods on the FracAtlas dataset with the addition of DSC-C2f.
Scientic Reports | (2024) 14:26702 11
| https://doi.org/10.1038/s41598-024-77878-6
www.nature.com/scientificreports/
Content courtesy of Springer Nature, terms of use apply. Rights reserved
However, our algorithm exhibits slightly lower accuracy in detecting the "boneanomaly" and "softtissue" categories. Nonetheless, for categories such as "fracture", "text", "foreignbody", "periostealreaction", and "pronatorsign", our algorithm demonstrates the highest mAP. Notably, the "bonelesion" category consistently maintains a high AP value across different models, particularly the nano and small models, providing remarkable detection results.
In conclusion, our algorithm consistently outperforms other models in terms of detection accuracy across both datasets, despite the increased number of parameters and computational load required to maintain this accuracy. Our experiments demonstrate the robust performance of WCAY compared with other object detection networks, showing its strong generalizability and effectiveness in tackling the task of X-ray image fracture detection.
Fig. 13. Comparison of the mAP results of different real-time detection algorithms on the GRAZPEDWRI-DX dataset.
Fig. 12. Comparison of the detection results of different real-time detection algorithms, in different categories, on the GRAZPEDWRI-DX dataset.
Scientic Reports | (2024) 14:26702 12
| https://doi.org/10.1038/s41598-024-77878-6
www.nature.com/scientificreports/
Content courtesy of Springer Nature, terms of use apply. Rights reserved
Qualitative results
To clearly demonstrate the efficacy of the WCAY model, in addition to performing inference on the two X-ray fracture detection datasets, FracAtlas and GRAZPEDWRI-DX, we also performed inference on public datasets with similarities to X-ray images, NEU-DET52 and SSDD53. The WCAY model detects objects well in images from different domains and angles, including objects with random orientations and at different scales. The detection results are visualized in Fig. 14.
As seen from the figure, on the FracAtlas dataset, our model clearly detects and localizes the fracture region in the X-ray image, and the detection results show a high confidence level; on the GRAZPEDWRI-DX dataset, our model also detects features of skeletal disorders in addition to fracture features; and on the NEU-DET and SSDD datasets, our model likewise detects the corresponding targets well. The accurate localization and identification in the displayed images prove the effectiveness of the WCAY algorithm across various types of challenging image detection.
Conclusion
In this paper, we propose a new algorithm, WCAY, for fracture detection at different sites in X-ray images. To improve the accuracy of the model in detecting fracture features, we introduce the DSConv module to improve the C2f module and propose a new core operator, DSC-C2f. We also introduce attention mechanisms to improve the model's performance. In addition, we design a new channel attention mechanism (WCA), which is more effective at capturing long-range dependency information. The experimental results of the proposed WCAY model on X-ray fracture detection datasets show that it has advantages over mainstream real-time object detection methods, performing well on evaluation metrics such as precision, recall, and mAP and reaching the state-of-the-art (SOTA) level. Specifically, at the small model size, the WCAY model improves the mAP on the FracAtlas dataset by 8.8% compared with the baseline model; on the GRAZPEDWRI-DX dataset, the mAP across all categories improves by 1.1%, and for the fracture category the mAP reaches 93.9%, demonstrating its excellent capability for fracture detection in X-ray imaging.
Fig. 14. e gure shows some qualitative results of the WCAY algorithm proposed in this paper on four
datasets.
Scientic Reports | (2024) 14:26702 13
| https://doi.org/10.1038/s41598-024-77878-6
www.nature.com/scientificreports/
Content courtesy of Springer Nature, terms of use apply. Rights reserved
Data availability
The datasets analyzed during the current study are available at Figshare under https://figshare.com/articles/dataset/The_dataset/22363012 (FracAtlas) and https://figshare.com/articles/dataset/GRAZPEDWRI-DX/14825193 (GRAZPEDWRI-DX). Both datasets are licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license (https://creativecommons.org/licenses/by/4.0/). The implementation code and trained models for this study, including the datasets used in this experiment and their provenance, can be found on GitHub at https://github.com/cccp421/Fracture-Detection-WCAY.
Received: 17 April 2024; Accepted: 25 October 2024
References
1. Forriol, F. & Mazzola, A. Bone fractures: Generalities. Textbook of Musculoskeletal Disorders. https://doi.org/10.1007/978-3-031-20987-1_28 (2023).
2. Venneri, F. et al. Safe surgery saves lives. Textbook of Patient Safety and Clinical Risk Management. https://doi.org/10.1007/978-3-030-59403-9_14 (2021).
3. Lisacek-Kiosoglous, A. B. et al. Artificial intelligence in orthopedic surgery: exploring its applications, limitations, and future direction. Bone Joint Res. 12, 447–454. https://doi.org/10.1302/2046-3758.127.BJR-2023-0111.R1 (2023).
4. Xu, F. et al. Deep learning-based artificial intelligence model for classification of vertebral compression fractures: A multicenter diagnostic study. Front. Endocrinol. https://doi.org/10.3389/fendo.2023.1025749 (2023).
5. Ju, R. Y. & Cai, W. Fracture detection in pediatric wrist trauma X-ray images using YOLOv8 algorithm. Sci. Rep. https://doi.org/10.1038/s41598-023-47460-7 (2023).
6. Thian, Y. L. et al. Convolutional neural networks for automated fracture detection and localization on wrist radiographs. Radiology: Artificial Intelligence. https://doi.org/10.1148/ryai.2019180001 (2019).
7. Zhao, Z. Q., Zheng, P., Xu, S. T. & Wu, X. D. Object detection with deep learning: A review. IEEE Transactions on Neural Networks and Learning Systems. 30, 3212–3232. https://doi.org/10.1109/TNNLS.2018.2876865 (2019).
8. Jiao, L. et al. A survey of deep learning-based object detection. IEEE Access. 7, 128837–128868. https://doi.org/10.1109/ACCESS.2019.2939201 (2019).
9. Arkin, E., Yadikar, N., Muhtar, Y. & Ubul, K. A survey of object detection based on CNN and transformer. in IEEE International Conference on Pattern Recognition and Machine Learning (PRML) 99–108. https://doi.org/10.1109/PRML52754.2021.9520732 (2021).
10. Girshick, R., Donahue, J., Darrell, T. & Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. in IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Preprint at https://arxiv.org/abs/1311.2524 (2014).
11. Girshick, R. Fast R-CNN. in IEEE International Conference on Computer Vision (ICCV) 1440–1448. Preprint at https://arxiv.org/abs/1504.08083 (2015).
12. Ren, S., He, K., Girshick, R. & Sun, J. Faster R-CNN: Toward real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. Preprint at https://arxiv.org/abs/1506.01497 (2015).
13. Hou, L., Lu, K. & Xue, J. Refined one-stage oriented object detection method for remote sensing images. IEEE Transactions on Image Processing. 31, 1545–1558. https://doi.org/10.1609/aaai.v33i01.33018577 (2022).
14. Redmon, J. et al. You Only Look Once: Unified, real-time object detection. Preprint at https://arxiv.org/abs/1506.02640 (2016).
15. Lin, T. Y., Goyal, P., Girshick, R., He, K. & Dollár, P. Focal loss for dense object detection. Preprint at https://arxiv.org/abs/1708.02002 (2018).
16. Niu, Z., Zhong, G. & Yu, H. A review on the attention mechanism of deep learning. Neurocomputing. 452, 48–62. https://doi.org/10.1016/j.neucom.2021.03.091 (2021).
17. Galassi, A., Lippi, M. & Torroni, P. Attention in natural language processing. IEEE Transactions on Neural Networks and Learning Systems. 32, 4291–4308. https://doi.org/10.1109/TNNLS.2020.3019893 (2021).
18. Wan, D. H. et al. Mixed local channel attention for object detection. Eng. Appl. Artif. Intell. 123. https://doi.org/10.1016/j.engappai.2023.106442 (2023).
19. Hu, J., Shen, L. & Sun, G. Squeeze-and-excitation networks. in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 7132–7141. Preprint at https://arxiv.org/abs/1709.01507v4 (2019).
20. Woo, S. et al. CBAM: Convolutional block attention module. in European Conference on Computer Vision (ECCV) 3–19. Preprint at http://arxiv.org/abs/1807.06521 (2018).
21. Liu, Y., Shao, Z. & Hoffmann, N. Global attention mechanism: Retain information to enhance channel-spatial interactions. Preprint at https://arxiv.org/abs/2112.05561v1 (2021).
22. Hou, Q., Zhou, D. & Feng, J. Coordinate attention for efficient mobile network design. in IEEE/CVF Conference on Computer Vision and Pattern Recognition 13713–13722. Preprint at https://arxiv.org/abs/2103.02907v1 (2021).
23. Abedeen, I. et al. FracAtlas: A dataset for fracture classification, localization and segmentation of musculoskeletal radiographs. Sci. Data. 10, 521. https://doi.org/10.1038/s41597-023-02432-4 (2023).
24. Nagy, E. et al. A pediatric wrist trauma X-ray dataset (GRAZPEDWRI-DX) for machine learning. Sci. Data. 9, 222. https://doi.org/10.1038/s41597-022-01328-z (2022).
25. Qi, Y., He, Y., Qi, X., Zhang, Y. & Yang, G. Dynamic snake convolution based on topological geometric constraints for tubular structure segmentation. in IEEE/CVF International Conference on Computer Vision (ICCV) 6047–6056. https://doi.org/10.1109/ICCV51070.2023.00558 (2023).
26. Guan, B., Zhang, G., Yao, J., Wang, X. & Wang, M. Arm fracture detection in X-rays based on improved deep convolutional neural network. Comput. Electr. Eng. 81. https://doi.org/10.1016/j.compeleceng.2019.106530 (2020).
27. Rajpurkar, P. et al. MURA dataset: Toward radiologist-level abnormality detection in musculoskeletal radiographs. Preprint at https://arxiv.org/abs/1712.06957v4 (2017).
28. Yahalomi, E., Chernofsky, M. & Werman, M. Detection of distal radius fractures trained by a small set of X-ray images and Faster R-CNN. Intell. Syst. Comput. 997. https://doi.org/10.1007/978-3-030-22871-2_69 (2019).
29. Wang, M. et al. ParallelNet: Multiple backbone network for detection tasks on thigh bone fracture. Multimedia Systems. 27, 1091–1100. https://doi.org/10.1007/s00530-021-00783-9 (2021).
30. Krogue, J. D. et al. Automatic hip fracture identification and functional subclassification with deep learning. Radiol. Artif. Intell. 2. https://doi.org/10.1148/ryai.2020190023 (2020).
31. Li, Y.-C. et al. Can a deep-learning model for the automated detection of vertebral fractures approach the performance level of human subspecialists? Clinical Orthopaedics and Related Research. 479, 1598–1612. https://doi.org/10.1097/CORR.0000000000001685 (2021).
32. Yuan, G., Liu, G., Wu, X. & Jiang, R. An improved YOLOv5 for skull fracture detection. Exploration of novel intelligent optimization algorithms. Communications in Computer and Information Science 1590. https://doi.org/10.1007/978-981-19-4109-2_17 (2022).
33. Warin, K. et al. Maxillofacial fracture detection and classification in computed tomography images using convolutional neural network-based models. Sci. Rep. 13, 3434. https://doi.org/10.1038/s41598-023-30640-w (2023).
34. Fatima, J. et al. Vertebrae localization and spine segmentation on radiographic images for feature-based curvature classification for scoliosis. Concurrency and Computation: Practice and Experience. 34. https://doi.org/10.1002/cpe.7300 (2022).
35. Dibo, R. et al. DeepLOC: Deep learning-based bone pathology localization and classification in wrist X-ray images. Analysis of Images, Social Networks and Texts. 14486. https://doi.org/10.1007/978-3-031-54534-4_14 (2024).
36. Ju, R. Y. & Cai, W. Fracture detection in pediatric wrist trauma X-ray images using YOLOv8 algorithm. Sci. Rep. 13, 20077. https://doi.org/10.1038/s41598-023-47460-7 (2023).
37. Tanzi, L., Vezzetti, E., Moreno, R. & Moos, S. X-ray bone fracture classification using deep learning: A baseline for designing a reliable approach. Applied Sciences. 10, 1507. https://doi.org/10.3390/app10041507 (2020).
38. Jocher, G. et al. Ultralytics YOLO. GitHub https://github.com/ultralytics/ultralytics (2023).
39. Jocher, G. et al. YOLOv5 by Ultralytics. GitHub. https://doi.org/10.5281/zenodo.3908559 (2020).
40. Ouyang, D. et al. Efficient multi-scale attention module with cross-spatial learning. in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 1–5. https://doi.org/10.1109/ICASSP49357.2023.10096516 (2023).
41. Buslaev, A. et al. Albumentations: Fast and flexible image augmentations. Information. 11, 125. https://doi.org/10.3390/info11020125 (2020).
42. Boyd, K., Eng, K. H. & Page, C. D. Area under the precision-recall curve: Point estimates and confidence intervals. Machine Learning and Knowledge Discovery in Databases. Lecture Notes in Computer Science. 8190. https://doi.org/10.1007/978-3-642-40994-3_29 (2013).
43. Luo, W. et al. Understanding the effective receptive field in deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 29. https://proceedings.neurips.cc/paper/2016/hash/c8067ad1937f728f51288b3eb986afaa-Abstract.html (2016).
44. Shi, D. TransNeXt: Robust foveal visual perception for vision transformers. Preprint at https://arxiv.org/abs/2311.17132 (2023).
45. Selvaraju, R. R. et al. Grad-CAM: Visual explanations from deep networks via gradient-based localization. in Proceedings of the IEEE International Conference on Computer Vision 618–626. Preprint at https://arxiv.org/abs/1610.02391v4 (2017).
46. Carion, N., Massa, F., Synnaeve, G. et al. End-to-end object detection with transformers. Computer Vision – ECCV 2020 (ECCV 2020). vol 12346. https://doi.org/10.1007/978-3-030-58452-8_13 (2020).
47. Feng, C., Zhong, Y., Gao, Y. et al. TOOD: Task-aligned one-stage object detection. in International Conference on Computer Vision (ICCV) 3490–3499. https://doi.org/10.1109/ICCV48922.2021.00349 (2021).
48. Zhang, X., Wan, F., Liu, C. et al. FreeAnchor: Learning to match anchors for visual object detection. Advances in Neural Information Processing Systems. 32. https://doi.org/10.48550/arXiv.1909.02466 (2019).
49. Zhao, Y., Lv, W., Xu, S. et al. DETRs beat YOLOs on real-time object detection. in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024) 16965–16974. https://doi.org/10.48550/arXiv.2304.08069 (2024).
50. Liu, S., Li, F., Zhang, H. et al. DAB-DETR: Dynamic anchor boxes are better queries for DETR. Preprint at https://doi.org/10.48550/arXiv.2201.12329 (2022).
51. Meng, D., Chen, X., Fan, Z. et al. Conditional DETR for fast training convergence. in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2021) 3651–3660. https://doi.org/10.48550/arXiv.2108.06152 (2021).
52. Zhao, W. D. et al. A new steel defect detection algorithm based on deep learning. Computational Intelligence and Neuroscience 1–13. https://doi.org/10.1155/2021/5592878 (2021).
53. Wang, Y. Y. et al. A SAR dataset of ship detection for deep learning under complex backgrounds. Remote Sensing. 11, 765. https://doi.org/10.3390/rs11070765 (2019).
54. Li, C. Y. et al. YOLOv6 by Meituan. GitHub https://github.com/meituan/YOLOv6 (2022).
Author contributions
P.C. is mainly responsible for writing the manuscript and conducting experiments throughout the entire research. S.L. is responsible for the overall direction and supervision of the paper. W.L. and F.L. are responsible for the overall layout of the paper and embellishment. B.D. is responsible for project management and coordination to ensure that the project schedule meets expectations. All authors reviewed the manuscript.
Declarations
Competing interests
The authors declare no competing interests.
Additional information
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1038/s41598-024-77878-6.
Correspondence and requests for materials should be addressed to S.L.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access is article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives
4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in
any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide
a link to the Creative Commons licence, and indicate if you modied the licensed material. You do not have
permission under this licence to share adapted material derived from this article or parts of it. e images or
other third party material in this article are included in the article’s Creative Commons licence, unless indicated
otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence
and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to
obtain permission directly from the copyright holder. To view a copy of this licence, visit h t t p : / / c r e a t i v e c o m m o
n s . o r g / l i c e n s e s / b y - n c - n d / 4 . 0 / .
© e Author(s) 2024
Scientic Reports | (2024) 14:26702 15
| https://doi.org/10.1038/s41598-024-77878-6
www.nature.com/scientificreports/
Content courtesy of Springer Nature, terms of use apply. Rights reserved
1.
2.
3.
4.
5.
6.
Terms and Conditions
Springer Nature journal content, brought to you courtesy of Springer Nature Customer Service Center GmbH (“Springer Nature”).
Springer Nature supports a reasonable amount of sharing of research papers by authors, subscribers and authorised users (“Users”), for small-
scale personal, non-commercial use provided that all copyright, trade and service marks and other proprietary notices are maintained. By
accessing, sharing, receiving or otherwise using the Springer Nature journal content you agree to these terms of use (“Terms”). For these
purposes, Springer Nature considers academic use (by researchers and students) to be non-commercial.
These Terms are supplementary and will apply in addition to any applicable website terms and conditions, a relevant site licence or a personal
subscription. These Terms will prevail over any conflict or ambiguity with regards to the relevant terms, a site licence or a personal subscription
(to the extent of the conflict or ambiguity only). For Creative Commons-licensed articles, the terms of the Creative Commons license used will
apply.
We collect and use personal data to provide access to the Springer Nature journal content. We may also use these personal data internally within
ResearchGate and Springer Nature and as agreed share it, in an anonymised way, for purposes of tracking, analysis and reporting. We will not
otherwise disclose your personal data outside the ResearchGate or the Springer Nature group of companies unless we have your permission as
detailed in the Privacy Policy.
While Users may use the Springer Nature journal content for small scale, personal non-commercial use, it is important to note that Users may
not:
use such content for the purpose of providing other users with access on a regular or large scale basis or as a means to circumvent access
control;
use such content where to do so would be considered a criminal or statutory offence in any jurisdiction, or gives rise to civil liability, or is
otherwise unlawful;
falsely or misleadingly imply or suggest endorsement, approval , sponsorship, or association unless explicitly agreed to by Springer Nature in
writing;
use bots or other automated methods to access the content or redirect messages
override any security feature or exclusionary protocol; or
share the content in order to create substitute for Springer Nature products or services or a systematic database of Springer Nature journal
content.
In line with the restriction against commercial use, Springer Nature does not permit the creation of a product or service that creates revenue,
royalties, rent or income from our content or its inclusion as part of a paid for service or for other commercial gain. Springer Nature journal
content cannot be used for inter-library loans and librarians may not upload Springer Nature journal content on a large scale into their, or any
other, institutional repository.
These terms of use are reviewed regularly and may be amended at any time. Springer Nature is not obligated to publish any information or
content on this website and may remove it or features or functionality at our sole discretion, at any time with or without notice. Springer Nature
may revoke this licence to you at any time and remove access to any copies of the Springer Nature journal content which have been saved.
To the fullest extent permitted by law, Springer Nature makes no warranties, representations or guarantees to Users, either express or implied
with respect to the Springer nature journal content and all parties disclaim and waive any implied warranties or warranties imposed by law,
including merchantability or fitness for any particular purpose.
Please note that these rights do not automatically extend to content, data or other material published by Springer Nature that may be licensed
from third parties.
If you would like to use or distribute our Springer Nature journal content to a wider audience or on a regular basis or in any other manner not
expressly permitted by these Terms, please contact Springer Nature at
onlineservice@springernature.com
ResearchGate has not been able to resolve any citations for this publication.
Article
Full-text available
Legumes are primarily grown agriculturally for human consumption, livestock forage, silage, and as green manure. However, production has declined primarily due to fungal pathogens. Among them, this study focused on Fusarium spp. that cause Fusarium wilt in minor legumes in Korea. Diseased legume plants were collected from 2020 to 2021, and diverse fungal genera were isolated from the internal tissues of the plant roots and stems. Fusarium spp. were the most dominant, accounting for 71% of the isolates. They were identified via morphological characteristics and molecular identification. In the pathogenicity test, Fusarium oxysporum and Fusarium fujikuroi generally exhibited high virulence. The host range investigation revealed that the NC20-738, NC20-739, and NC21-950 isolates infected all nine crops, demonstrating the widest host range. In previous studies, the focus was solely on Fusarium wilt disease in soybeans. Therefore, in this study, we aimed to investigate Fusarium wilt occurred in minor legumes, which are consumed as extensively as soybeans, due to the scarcity of data on the diversity and characteristics of Fusarium spp. existing in Korea. The diverse information obtained in this study will serve as a foundation for implementing effective management strategies against Fusarium-induced plant diseases.
Article
Full-text available
Hospital emergency departments frequently receive lots of bone fracture cases, with pediatric wrist trauma fracture accounting for the majority of them. Before pediatric surgeons perform surgery, they need to ask patients how the fracture occurred and analyze the fracture situation by interpreting X-ray images. The interpretation of X-ray images often requires a combination of techniques from radiologists and surgeons, which requires time-consuming specialized training. With the rise of deep learning in the field of computer vision, network models applying for fracture detection has become an important research topic. In this paper, we use data augmentation to improve the model performance of YOLOv8 algorithm (the latest version of You Only Look Once) on a pediatric wrist trauma X-ray dataset (GRAZPEDWRI-DX), which is a public dataset. The experimental results show that our model has reached the state-of-the-art (SOTA) mean average precision (mAP 50). Specifically, mAP 50 of our model is 0.638, which is significantly higher than the 0.634 and 0.636 of the improved YOLOv7 and original YOLOv8 models. To enable surgeons to use our model for fracture detection on pediatric wrist trauma X-ray images, we have designed the application “Fracture Detection Using YOLOv8 App” to assist surgeons in diagnosing fractures, reducing the probability of error analysis, and providing more useful information for surgery.
Article
Full-text available
Digital radiography is one of the most common and cost-effective standards for the diagnosis of bone fractures. For such diagnoses expert intervention is required which is time-consuming and demands rigorous training. With the recent growth of computer vision algorithms, there is a surge of interest in computer-aided diagnosis. The development of algorithms demands large datasets with proper annotations. Existing X-Ray datasets are either small or lack proper annotation, which hinders the development of machine-learning algorithms and evaluation of the relative performance of algorithms for classification, localization, and segmentation. We present FracAtlas, a new dataset of X-Ray scans curated from the images collected from 3 major hospitals in Bangladesh. Our dataset includes 4,083 images that have been manually annotated for bone fracture classification, localization, and segmentation with the help of 2 expert radiologists and an orthopedist using the open-source labeling platform, makesense.ai. There are 717 images with 922 instances of fractures. Each of the fracture instances has its own mask and bounding box, whereas the scans also have global labels for classification tasks. We believe the dataset will be a valuable resource for researchers interested in developing and evaluating machine learning algorithms for bone fracture diagnosis.
Article
Full-text available
The use of artificial intelligence (AI) is rapidly growing across many domains, of which the medical field is no exception. AI is an umbrella term defining the practical application of algorithms to generate useful output, without the need of human cognition. Owing to the expanding volume of patient information collected, known as ‘big data’, AI is showing promise as a useful tool in healthcare research and across all aspects of patient care pathways. Practical applications in orthopaedic surgery include: diagnostics, such as fracture recognition and tumour detection; predictive models of clinical and patient-reported outcome measures, such as calculating mortality rates and length of hospital stay; and real-time rehabilitation monitoring and surgical training. However, clinicians should remain cognizant of AI’s limitations, as the development of robust reporting and validation frameworks is of paramount importance to prevent avoidable errors and biases. The aim of this review article is to provide a comprehensive understanding of AI and its subfields, as well as to delineate its existing clinical applications in trauma and orthopaedic surgery. Furthermore, this narrative review expands upon the limitations of AI and future direction. Cite this article: Bone Joint Res 2023;12(7):447–454.
Chapter
In recent years, computer-aided diagnosis systems have shown great potential in assisting radiologists with accurate and efficient medical image analysis. This paper presents a novel approach for bone pathology localization and classification in wrist X-ray images using a combination of YOLO (You Only Look Once) and the Shifted Window Transformer (Swin) with a newly proposed block. The proposed methodology addresses two critical challenges in wrist X-ray analysis: accurate localization of bone pathologies and precise classification of abnormalities. The YOLO framework is employed to detect and localize bone pathologies, leveraging its real-time object detection capabilities. Additionally, the Swin, a transformer-based module, is utilized to extract contextual information from the localized regions of interest (ROIs) for accurate classification.
Chapter
A bone fracture is defined as a medical condition in which a partial or complete interruption of the bone integrity occurs due to mechanical trauma. The bone tissue repairs without scarring. The repair starts with the formation of an elastic soft tissue called bone callus, which is then replaced by a new hard bone. The fracture callus aims to prevent mobility in the fracture site. Fractures occurring during growth have specific features since the bone tissue is elastic and the growth plate is open. The standard classifications of fractures are based on the anatomical features of the lesion and depend on each bone. However, comprehensive classification systems have been proposed in order to improve communication between surgeons. In some cases, fractures may determine specific complications such as consolidation delay (or pseudoarthrosis) and infections.