MamT4: Multi-view Attention Networks for
Mammography Cancer Classification
Alisher Ibragimov1, Sofya Senotrusova1, Arsenii Litvinov1, Egor Ushakov1, Evgeny Karpulevich1, and Yury Markin1
1Information Systems Department, ISP RAS, Russia
{ibragimov,senotrusova,filashkov,ushakov,karpulevich,ustas}@ispras.ru
Abstract—In this study, we introduce a novel method, called
MamT4, which is used for simultaneous analysis of four mam-
mography images. A decision is made based on one image of a
breast, with attention also devoted to three additional images:
another view of the same breast and two images of the other
breast. This approach enables the algorithm to closely replicate
the practice of a radiologist who reviews the entire set of
mammograms for a patient. Furthermore, this paper emphasizes
the preprocessing of images, specifically proposing a cropping
model (U-Net based on ResNet-34) to help the method remove
image artifacts and focus on the breast region. To the best of
our knowledge, this study is the first to achieve a ROC-AUC
of 84.0 ± 1.7 and an F1 score of 56.0 ± 1.3 on an independent
test dataset of Vietnam digital mammography (VinDr-Mammo),
which is preprocessed with the cropping model.
Index Terms—Breast cancer, Computer-aided diagnosis, Deep learning, Multi-view mammogram
I. INTRODUCTION
Breast cancer is a leading cause of cancer-related deaths
among women [1]. Regular screening is essential for early
detection, with mammography being the primary screening
tool [2]. Mammography utilizes low-dose X-rays to detect
tissue changes in the breast, making it effective in detecting
malignancies like microcalcifications and clusters of calci-
fications [3]. Radiologists interpret mammograms based on
standard terminology and the BI-RADS classification system,
facilitating standardized reporting and risk assessment [4].
Although mammography is effective, it can result in false
positives or negatives, requiring additional tests like biop-
sies [5]. To improve the efficiency of early screening, au-
tomated approaches in mammography, such as computer-
aided diagnosis (CAD) systems, as well as solutions using
machine learning and deep learning (DL) technologies, are
being actively developed, assisting radiologists in interpreting
mammograms [6].
Deep Learning has emerged as a highly effective method
for image classification [7]. Furthermore, DL has become one of the most popular methods for the detection of cancer pathologies, particularly on mammograms [8].
A key aspect of mammographic examinations is the acquisition of images in different projections for each breast, requiring four images in total: two for each breast (MLO and CC). Radiologists analyze the symmetry of lesions [9]. This unique feature affects the training of DL models and motivates the use of multi-view learning, a novel approach based on learning four projections at once, presented in this study. Given the widespread significance of breast cancer diagnosis, this research on utilizing deep learning methods offers a valuable opportunity to enhance the automation of breast pathology detection and streamline the tasks of radiologists.

Fig. 1. An overview of the training process of the CNN and classification layer applied to a binary classification problem using a single view. Subsequently, the trained CNN block is employed to derive a feature vector (z_i) from the mammography image (x_i).
To sum up, the contributions of our paper are:
1) We propose MamT4: a novel classification framework
based on Transformer Encoder that utilizes feature vec-
tor representations from four views of mammography
studies and outperforms single-view methods in classi-
fying cancer status.
2) To the best of our knowledge, this paper is the first to achieve a ROC-AUC of 84.0 ± 1.7 and an F1 score of 56.0 ± 1.3 on the VinDr-Mammo dataset (test subset).
3) As a preprocessing step, we propose cropping the breast along its border using U-Net to increase the quality of classification.
II. BACKGROUND AND PRELIMINARIES
A. U-Net
U-Net [10], introduced in 2015, is an encoder-decoder
network tailored for semantic segmentation, which excels
in medical image segmentation. The architecture efficiently
maps low-resolution encoder features to high-resolution inputs
through a decoder that uses pooling indices from the encoder
for precise pixelwise classification. This setup enables U-Net
to accurately delineate detailed features in medical images,
crucial for identifying and segmenting various anatomical
structures and abnormalities [11]. Its ability to handle small
datasets effectively and its adaptability to various medical
imaging modalities have made U-Net a standard choice in
medical image analysis, enhancing diagnostic accuracy and
aiding in clinical decision-making.
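As a concrete illustration, the kind of encoder-decoder network described here (a U-Net with a ResNet-34 encoder, as used for the cropping model in Section IV) could be instantiated as in the following minimal sketch; the segmentation_models_pytorch library is our assumption, since the paper does not name an implementation.

```python
# A minimal sketch of instantiating the encoder-decoder segmentation network
# described above; the segmentation_models_pytorch library is an assumption,
# while the ResNet-34 encoder pretrained on ImageNet and the single-class
# breast mask follow Section IV.
import segmentation_models_pytorch as smp

cropping_model = smp.Unet(
    encoder_name="resnet34",      # ResNet-34 encoder
    encoder_weights="imagenet",   # pretrained on ImageNet
    in_channels=3,
    classes=1,                    # binary breast mask
)
```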
B. Transformer Encoder
Our approach draws inspiration from the ViT (Vision Transformer) framework [12]. The Transformer Encoder (TE) block consists of layer normalization, multi-head self-attention (MSA), and a multi-layer perceptron (MLP). As shown in Figure 2, the first TE block accepts the combined embeddings as input; for all subsequent blocks, the inputs are the outputs of the previous TE block. There is a total of L such TE blocks. Inside the TE, the inputs are first passed through a layer norm and then fed to the MSA layer with N heads. The outputs of the MSA layer are added to the inputs (with a skip connection), and the result is passed through another layer norm before being fed to the MLP block. The MLP consists of two linear layers and a GELU non-linearity. The outputs from the MLP block are again added to its inputs to produce the final output of one TE block.
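To make the data flow concrete, the TE block described above can be sketched in PyTorch as follows; this is a minimal sketch under our assumptions (pre-norm ordering as in ViT, an MLP expansion factor of 4, and batch-first tensors), with the token dimension and number of heads left as parameters.

```python
# A minimal sketch of one pre-norm TE block as described above:
# layer norm -> multi-head self-attention -> residual, then
# layer norm -> MLP (two linear layers with GELU) -> residual.
import torch
import torch.nn as nn

class TEBlock(nn.Module):
    def __init__(self, dim=192, num_heads=12, mlp_ratio=4.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, int(dim * mlp_ratio)),
            nn.GELU(),
            nn.Linear(int(dim * mlp_ratio), dim),
        )

    def forward(self, x):                 # x: (batch, tokens, dim)
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)  # multi-head self-attention
        x = x + attn_out                  # skip connection
        x = x + self.mlp(self.norm2(x))   # skip connection around the MLP
        return x
```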
C. Loss Function
The focal loss (FL) function was selected to achieve greater stability when training on both frequent (normal cases) and rare (cancer cases) images [13]. For the case of binary classification, the focal loss can be written in the following form [14]:

$$\mathrm{FL}(p_t) = -\alpha_t \,(1 - p_t)^{\gamma} \log(p_t), \qquad (1)$$

where γ ≥ 0 is a tunable focusing parameter and α_t is a weighting factor for the different classes that balances the importance of positive and negative examples. Namely, α_1 = 1 − N_c/N for the class {cancer} and α_0 = 1 − α_1 for the class {normal}, where N_c is the number of images marked as {cancer} and N is the total number of images in the dataset. For notational convenience, p_t is defined as:

$$p_t = \begin{cases} p & \text{if } y = \text{cancer}, \\ 1 - p & \text{otherwise.} \end{cases} \qquad (2)$$

Here, y specifies the ground-truth class and p ∈ [0, 1] is the model's estimated probability for the class with label y = cancer. This approach improved the performance on the small number of cancer images due to the modulating factor (1 − p_t)^γ.
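A minimal sketch of this loss, assuming the classifier emits a single raw logit per image (with the α_1 = 0.95 and γ = 2.0 values used later in Section IV as defaults), could look like this:

```python
# A minimal sketch of the binary focal loss in Eqs. (1)-(2); assumes the model
# outputs one raw logit for the "cancer" class per image.
import torch

def focal_loss(logits, targets, alpha1=0.95, gamma=2.0):
    """logits: (batch,) raw scores; targets: (batch,) with 1 = cancer, 0 = normal."""
    p = torch.sigmoid(logits)                    # estimated probability of cancer
    p_t = torch.where(targets == 1, p, 1.0 - p)  # Eq. (2)
    alpha_t = torch.where(targets == 1,
                          torch.full_like(p, alpha1),
                          torch.full_like(p, 1.0 - alpha1))
    loss = -alpha_t * (1.0 - p_t) ** gamma * torch.log(p_t.clamp(min=1e-8))  # Eq. (1)
    return loss.mean()
```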
III. METHODOLOGY
In this paper, we propose a novel method to improve the quality of mammography image classification. The two main components of our approach are presented in Section II: image preprocessing using U-Net, and the Transformer Encoder used to implement MamT4. An overview of our proposed method is shown in Figure 2.
A. Crop mammogram
The first step in image preprocessing is selecting the region
of interest (ROI) through cropping. This process involves
segmentation of large images to focus on specific areas that
are of interest for further analysis, thereby making the task
easier for subsequent neural network predictions. Cropping
is useful as it helps concentrate the model’s attention on
relevant features without the distraction of background ar-
tifacts, which can be especially beneficial in datasets with
limited examples [15]. By selecting these regions and resizing
them uniformly, models can more consistently learn from and
recognize similar patterns in new data.
B. Multi-view Analysis
Inspired by previous works on the utilization of several images (Section V) to predict labels in classification tasks, we present a method that includes two training stages to construct a classification model that considers feature vectors from all images in a mammography exam.
First, we train a CNN whose last layer is replaced by a classification layer with a single neuron to solve the binary classification problem ({cancer, normal}), as shown in Figure 1. The CNN weights are initialized from a model pretrained on ImageNet [16]. The trained model serves as a mapping of the image x_i into the feature vector f(x_i) = z_i. Second, the CNN block from the previous stage is taken to build a four-view mammogram classifier based on the Transformer Encoder (MamT4, Figure 2). The CNN extracts the feature vectors (z_i^0, z_i^1, z_i^2, z_i^3) from both breasts (left and right) and both projections (MLO and CC). During training at this step, the weights of the CNN block are not trainable. MamT4 predicts the label for the x_i^0 image, while the x_i^1, x_i^2, x_i^3 images of the exam serve as additional information (or as metadata).
Each feature vector z_i from EfficientNet-B3 (the motivation to utilize EfficientNet-B3 is given in Section IV) has length 1536. We divide each vector into 8 tokens (the number can be considered a hyperparameter) with a size of 192, so four vectors yield 32 tokens. Learnable position embeddings are added to the tokens to retain positional information. Similar to BERT [17] and ViT, we add a learnable [class] token, whose state at the output of the TE is fed to the MLP head (which is nothing but a linear layer 192 × 1) to get class predictions.
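The tokenization and classification head described above could be sketched as follows; the module and parameter names and the feed-forward width inside the encoder are our assumptions, PyTorch's built-in TransformerEncoder stands in for the custom TE block of Section II-B, while the token size of 192, the 32 tokens, the [class] token, and the 192 × 1 head follow the text.

```python
# A minimal sketch of the MamT4 head: four 1536-d feature vectors are split into
# 8 tokens of 192 each (32 tokens total), a learnable [class] token and positional
# embeddings are added, and the encoder state of the [class] token is fed to a
# linear head that outputs the cancer logit.
import torch
import torch.nn as nn

class MamT4Head(nn.Module):
    def __init__(self, feat_dim=1536, tokens_per_view=8, n_views=4,
                 depth=12, num_heads=12):
        super().__init__()
        self.token_dim = feat_dim // tokens_per_view              # 192
        n_tokens = n_views * tokens_per_view                       # 32
        self.cls_token = nn.Parameter(torch.zeros(1, 1, self.token_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, n_tokens + 1, self.token_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=self.token_dim, nhead=num_heads,
            dim_feedforward=4 * self.token_dim, activation="gelu",
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(self.token_dim, 1)                   # MLP head (192 x 1)

    def forward(self, z):                  # z: (batch, 4, 1536) CNN feature vectors
        b = z.shape[0]
        tokens = z.reshape(b, -1, self.token_dim)                  # (batch, 32, 192)
        cls = self.cls_token.expand(b, -1, -1)
        x = torch.cat([cls, tokens], dim=1) + self.pos_embed       # add positions
        x = self.encoder(x)
        return self.head(x[:, 0])          # cancer logit from the [class] token
```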
IV. EXPERIMENTS
We evaluate our proposed MamT4 framework on cancer classification tasks. Experiments are conducted on the VinDr-Mammo dataset [18], which was released quite recently and contains 5,000 mammography exams (four images per patient, 20,000 digital mammograms in total). The annotated exams are split into a training set of 4,000 exams and a test set of 1,000 exams. The dataset provides BI-RADS assessments, and the images are labeled similarly to the solution proposed in [19], specifically: categories 1 and 2 → "normal", categories 4 and 5 → "cancer"; category 3 is not included. Thus, each image has one of two labels {cancer, normal}.
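A minimal sketch of this label mapping is shown below; the column name and the exact string format of the BI-RADS field in the VinDr-Mammo annotation file are our assumptions.

```python
# A minimal sketch of the label mapping described above:
# BI-RADS 1-2 -> normal (0), BI-RADS 4-5 -> cancer (1), BI-RADS 3 dropped.
import pandas as pd

def map_birads_to_labels(df: pd.DataFrame) -> pd.DataFrame:
    # "breast_birads" holding strings like "BI-RADS 4" is an assumed format.
    birads = df["breast_birads"].str.extract(r"(\d)")[0].astype(int)
    df = df.assign(birads=birads)
    df = df[df["birads"] != 3].copy()              # category 3 is not included
    df["label"] = (df["birads"] >= 4).astype(int)  # 1 = {cancer}, 0 = {normal}
    return df
```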
Fig. 2. A summary of the MamT4 framework that is used for cancer classification on x_i^0 images. Here, x_i^0 represents the primary view, while x_i^1 is the corresponding ipsilateral view to the primary one. Similarly, x_i^2 depicts the corresponding bilateral view to the main view, whereas x_i^3 illustrates the ipsilateral view of x_i^2. The CNN block, which produces the feature vectors (z_i^0, z_i^1, z_i^2, z_i^3) with a fixed length of 1536 each, is untrainable during this stage. Each vector is divided into fixed-size patches, each of which is linearly embedded. After adding position embeddings, the resulting sequence of vectors is fed to a Transformer Encoder. In order to perform classification, we use the standard approach of adding an extra learnable [class] token to the sequence. The illustration of the Transformer Encoder was inspired by Dosovitskiy et al. [12].
Evaluation. We use ROC-AUC (the “gold standard” for
binary classification with neural networks [20]) and F1 score
(the harmonic mean of the Precision and Recall) to measure
classification performance. Although we use 5-fold cross-validation to choose the EfficientNet-B3 encoder, cross-validation is not very convenient for the proposed two-stage training method: for the second stage of training MamT4, we would have to keep the dataset split information from the first stage of training the CNN block. Thus, the results in Tables III and IV are obtained by training with five different random seeds.
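For reference, the reported metrics could be computed as in the following sketch; the use of scikit-learn and a 0.5 decision threshold for the F1 scores are our assumptions, since the paper does not specify a metrics implementation.

```python
# A minimal sketch of computing ROC-AUC, F1, and macro F1 from model outputs.
import numpy as np
from sklearn.metrics import roc_auc_score, f1_score

y_true = np.array([0, 1, 0, 1])           # illustrative ground-truth labels
y_prob = np.array([0.1, 0.8, 0.4, 0.6])   # illustrative predicted cancer probabilities
y_pred = (y_prob >= 0.5).astype(int)      # assumed 0.5 threshold

roc_auc = roc_auc_score(y_true, y_prob)
f1 = f1_score(y_true, y_pred)                        # F1 for the cancer class
f1_macro = f1_score(y_true, y_pred, average="macro")
```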
A. Experimental Setup
Implementation Details. Our models are trained with the PyTorch framework. In the training process, we set the initial learning rate to 10^-5 and start to attenuate the learning rate when the F1 score on the test set stops improving within 5 epochs. If not specified otherwise, we use FL with the following parameters: α_0 = 0.05, α_1 = 0.95, and γ = 2.0. In the proposed approach, we first randomly apply the cropping method to each mammogram. Then the image is resized to 512 × 512 × 3. We train the model for 200 epochs with the option to stop early if the F1 score does not improve within 10 epochs on the test set. We train the MamT4 model with L = N = 12 TE blocks and MSA heads. The optimal values of L and N could be considered as hyperparameters in future studies.
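The optimization setup described above might be wired up as in this sketch; the choice of Adam and the attenuation factor are our assumptions, while the initial learning rate of 10^-5 and the 5-epoch patience follow the text.

```python
# A minimal sketch of the training setup: initial learning rate 1e-5 and a
# scheduler that attenuates the learning rate when the monitored F1 score on
# the test set stops improving for 5 epochs.
import torch
import torch.nn as nn

model = nn.Linear(1536, 1)  # stand-in for the CNN classifier or the MamT4 head
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.1, patience=5)  # "max" because F1 is tracked

# After each epoch's evaluation on the test set:
# scheduler.step(test_f1)   # pass the current F1 score
```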
Baseline. Our framework is based on the EfficientNet-B3 [21] model, pretrained on ImageNet [16], which is used as the encoder. That model is chosen due to its superior performance on ImageNet, publicly available pre-trained weights, and its optimal balance between high performance and a manageable number of parameters. We performed comparisons with other models on the VinDr-Mammo dataset.
TABLE I
ENCODER PERFORMANCE COMPARISON BASED ON ROC-AUC METRICS ON VINDR-MAMMO DATASET.

Encoder                  ROC-AUC
EfficientNet-B3          79.2 ± 0.8
Swin (Tiny)              58.5 ± 3.2
Swin-V2 (Tiny)           69.5 ± 2.3
SegFormer-B1             61.6 ± 2.5
ResNet-18                78.1 ± 1.6
MobileNet-V3 (Large)     76.6 ± 1.8

TABLE II
DISTRIBUTION OF DATASETS SPLIT INTO TRAINING AND TESTING SUBSETS, INDICATING PERCENTAGE CONTRIBUTIONS TO THE CROPPING DATASET.

Dataset              Train (80%)   Test (20%)   Total
CBIS-DDSM [27]       46            16           62 (7.8%)
INbreast [28]        49            14           63 (7.9%)
KAU-BCMD [29]        51            6            57 (7.1%)
MIAS [30]            39            13           52 (6.5%)
CMMD [31]            55            11           66 (8.2%)
VinDr-Mammo [18]     400           100          500 (62.5%)
Prior to the main experiment, EfficientNet-B3 demonstrates the highest performance, achieving a ROC-AUC score of 79.2 ± 0.8. Models including Swin (Tiny) [22], Swin-V2 (Tiny) [23], SegFormer-B1 [24], ResNet-18 [25], and MobileNet-V3 (Large) [26] were evaluated. Detailed results for these models, averaged over five independent runs across different data splits, are provided in Table I.
Preprocessing. We tried different methods for obtaining
a breast mask: a classic method based on color selection
and a neural network method. The color thresholding method
involves two stages. In the first stage, we select a color
threshold for the images, setting it at one-quarter of the mean,
and then apply the threshold to create a binary mask. After
this, we select the largest region by area. This method is
computationally effective and simple; however, it has lim-
itations because color values do not usually represent the
breast accurately, especially due to the presence of extraneous
artifacts such as labels on mammography images. It is also
challenging to separate the breast, which is the ROI, from other
body parts that may accidentally be present in the image.
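The two-stage thresholding procedure described above might look like the following sketch; the use of NumPy and scikit-image for connected-component analysis is our assumption.

```python
# A minimal sketch of the color-thresholding baseline: threshold at one quarter
# of the mean intensity, binarize, and keep the largest connected region.
import numpy as np
from skimage import measure

def threshold_breast_mask(image: np.ndarray) -> np.ndarray:
    """image: grayscale mammogram as a 2D array; returns a boolean breast mask."""
    mask = image > image.mean() / 4            # threshold at 1/4 of the mean
    labels = measure.label(mask)               # connected components
    if labels.max() == 0:                      # nothing above the threshold
        return mask
    largest = np.argmax(np.bincount(labels.ravel())[1:]) + 1  # skip background (0)
    return labels == largest                   # keep the largest region by area
```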
The second method is based on neural network predictions using a U-Net with a ResNet-34 encoder, pretrained on ImageNet. We use our own dataset, in which non-professional annotators, consulting with mammologists, labeled the borders of the breast without considering projection and laterality. This dataset is compiled from images of six public datasets, including VinDr-Mammo. The proportions of the cropping dataset are shown in Table II. We developed a universal model that works for every type of projection. We tested this method and observed a quality improvement compared to the color thresholding method. Our fine-tuned U-Net model achieves a mean IoU of 98.6% on the test subset.
After obtaining the mask prediction, we performed a centered crop of the breast and then fed the cropped image into a neural network to predict the label.
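A minimal sketch of this cropping step is given below, assuming a bounding-box crop around the predicted mask followed by resizing to 512 × 512 × 3 as described in Section IV; the use of OpenCV and the channel replication are our assumptions.

```python
# A minimal sketch of mask-based cropping: reduce the U-Net breast mask to a
# bounding box, crop the image to that box, resize, and replicate to 3 channels.
import cv2
import numpy as np

def crop_breast(image: np.ndarray, mask: np.ndarray, size: int = 512) -> np.ndarray:
    """image: grayscale mammogram; mask: boolean breast mask from U-Net."""
    ys, xs = np.nonzero(mask)                  # pixels predicted as breast
    if len(ys) == 0:                           # fall back to the full image
        crop = image
    else:
        crop = image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    crop = cv2.resize(crop, (size, size), interpolation=cv2.INTER_LINEAR)
    return np.stack([crop] * 3, axis=-1)       # replicate to 3 channels
```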
Augmentations. We apply augmentations such as random shuffling, blurring, Gaussian noise, horizontal flips, hue-saturation-value shifts, sharpening, grid dropouts, grid distortions, coarse dropouts, and pixel dropouts. Cropping is applied as follows: each image in the training set is cropped with a probability of 0.5 (with crop aug.) or with a probability of 1 (with crop all), and all images in the test set are cropped. In the four-view model, we randomly replace x_i^1, x_i^2, x_i^3 with black images (we call it the EmptyImage augmentation) to indicate to the model that the dataset may not always contain four images for one patient; the probability of passing a black image, p, is set to 0.2 during training and to 0 during testing.
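The EmptyImage augmentation described above could be sketched as follows; treating each auxiliary view independently is our reading of the text.

```python
# A minimal sketch of the EmptyImage augmentation: during training, each auxiliary
# view (x_i^1, x_i^2, x_i^3) is replaced by a black image with probability p = 0.2;
# the primary view x_i^0 is never replaced, and p = 0 at test time.
import random
import torch

def empty_image_aug(views: torch.Tensor, p: float = 0.2, training: bool = True):
    """views: (4, C, H, W) tensor with the primary view at index 0."""
    if not training:
        return views                        # p = 0 during testing
    views = views.clone()
    for k in range(1, views.shape[0]):      # only the three auxiliary views
        if random.random() < p:
            views[k].zero_()                # replace with a black image
    return views
```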
B. Results and Discussion
For testing, we compare a model trained on the original
dataset without cropping, which tests ordinary non-cropped
images, against a model trained with crop augmentation and
evaluated on the same test set but preprocessed with U-Net
for cropping. The results, presented in Table III, indicate
that cropping improves the quality of predictions. We also
investigate Grad-CAM++ [32] visualizations of the neural
networks’ predictions (Figure 3). Cropping creates a simpler
dataset for the deep learning model, containing only relevant
information. We also believe that cropping enhances prediction
quality because it adjusts the entire breast size to fit within the
picture area, thereby achieving scale unification. Furthermore,
with the entire image area covered by the breast, the model
can better focus on the breast, as the crucial elements, breast
tissues, appear larger than in original images.
Table III shows the performance of MamT4 on the VinDr-Mammo dataset compared with the single-view method (with crop and with crop aug.) and the method without cropping (w/o crop). Comparing the two cropping methods, all three metrics coincide within the standard deviation, so for consistency in the rest of the experiments "with crop" means "with crop aug." The improvement of the four-view MamT4 (with crop) over the single-view EfficientNet-B3 (w/o crop) is 4.4 percentage points in ROC-AUC, 11.3 points in F1 score, and 5.7 points in F1 score (macro). Note that the VinDr-Mammo dataset includes the complete set of four images for each study, with the sole exception of one patient out of 4,000 in the training set, who was manually removed from the dataset (4,000 → 3,999). Our approach with the EmptyImage augmentation can be used for datasets like CBIS-DDSM [27], which does not contain all four mammogram images for every patient.

Fig. 3. The visualization contrasts the predictions of two models: the model without preprocessing and the model utilizing cropping. In the first two columns, we demonstrate instances where the cropping model made correct predictions on the cropped images, while the model without cropping failed to do so on the original images. The third column presents a less common scenario (34 instances as opposed to 84) where the model without cropping correctly identifies the images as normal and the cropping model errs. Following that, there is an example where both models correctly identify cancer. Finally, the last column shows a case where both models make incorrect predictions.
Recently, a ROC-AUC of 75.3% and an F1 score (macro) of 76.0% have been achieved on the VinDr-Mammo dataset [33]. They divide BI-RADS into two classes differently than we do: BI-RADS 2 → "normal", 4 and 5 → "cancer"; BI-RADS 1 and 3 are not included. Table IV shows the performance of our methods under their way of dividing the data into two classes. In that case, FL has α_1 = 0.87.
We adopt a two-stage approach in our new framework:
MamT4, a cancer classification model, is based on the analysis
of four views. It is important to note that our framework is
versatile and can be adapted to predict different scenarios, such
as two projections for a single breast or any required number
of images in a specific area.
TABLE III
PERFORMANCE COMPARISON OF THREE METHODS ON VINDR-MAMMO DATASET.

Method              ROC-AUC      F1           F1 (macro)
w/o crop            79.6 ± 2.0   44.7 ± 0.4   71.1 ± 0.2
with crop aug.      83.8 ± 0.4   53.2 ± 1.1   75.5 ± 0.5
with crop all       82.4 ± 1.0   52.8 ± 0.8   75.3 ± 0.4
MamT4 with crop     84.0 ± 1.7   56.0 ± 1.3   76.8 ± 0.8
TABLE IV
PERFORMANCE COMPARISON OF TWO METHODS ON VINDR-MAMMO DATASET, WHEN WE ASSUME BI-RADS 2 → "NORMAL" AND 4, 5 → "CANCER"; BI-RADS 1, 3 ARE NOT INCLUDED.

Method              ROC-AUC      F1           F1 (macro)
with crop           79.9 ± 0.9   57.8 ± 1.1   75.0 ± 0.7
MamT4 with crop     80.3 ± 1.1   61.0 ± 1.4   77.2 ± 0.7
V. RELATED WORK
Multi-view Analysis. The rationale for using four images
simultaneously for prediction is that, in radiologist practices,
the symmetry information from other images is utilized to
improve the accuracy of decisions. For example, a lesion in
one breast rarely appears in the corresponding area in the other
breast [34]. Other multi-view approaches, using two to four
images as inputs, have also been proposed [34]–[39]. Recent
studies indicate that multiple-view approaches improve breast
cancer diagnosis [33], [40], [41].
Applications. The proposed method of using a multi-view
model to determine the diagnosis of breast cancer from a mam-
mogram can also be applied to other areas of medicine where
multiple projections or types of images need to be analyzed
for more accurate diagnosis. For example, this technique can
be effectively applied to identify the diagnosis of other types
of cancer, such as lung, stomach, skin, from different types of
scans like computed tomography (CT), magnetic resonance
imaging (MRI), and ultrasound, where deep learning tech-
niques are already being actively applied [42], [43].
One example of medical imaging that requires the analysis
of multiple projections or types of images for a more accurate
diagnosis may be CT in the examination of patients with
head injuries. If there is a suspicion of skull fracture or other
injuries, doctors may need to review images from different
projections to get a complete picture of the injuries and
choose the best treatment method. Studies on multi-class
semantic segmentation and on detection of abnormalities in
traumatic brain injury have already shown their effectiveness
and the positive impact of artificial intelligence techniques in
optimizing workflow in radiology [44], [45]. Therefore, the
proposed method on analyzing multiple projections of CT
scans can significantly increase the performance in detecting
suspicious areas and thus help clinicians to make a more
accurate informed decision on further treatment of the patient.
VI. CONCLUSION
Our study achieved strong metrics on the independent VinDr-Mammo dataset, including a ROC-AUC of 84.0 ± 1.7 and an F1 score of 56.0 ± 1.3. The preprocessing method involved a cropping model that focused on the breast region and removed extraneous artifacts, while also enlarging the breast to the image's size, allowing the classification model to better distinguish details.
Our framework MamT4 utilized multi-view analysis based on the Transformer Encoder to improve cancer classification accuracy. Depending on the task domain, the number of input images can be increased or decreased. This approach can also be
can be increased or decreased. This approach can also be
beneficial in various medical imaging applications beyond
mammography where different projections of the same object
are used, improving accuracy and helping physicians to make
informed patient treatment decisions.
REFERENCES
[1] P. Hopewood and M. Milroy, Quality Cancer Care: Survivorship
Before, During and After Treatment. Springer International
Publishing, 2018. [Online]. Available: https://books.google.ru/books?
id=W9ldDwAAQBAJ
[2] M. Mainiero, L. Moy, P. Baron, A. Didwania, R. diFlorio, E. Green,
S. Heller, A. Holbrook, S.-J. Lee, A. Lewin, A. Lourenco, K. Nance,
B. Niell, P. Slanetz, A. Stuckey, N. Vincoff, S. Weinstein, M. Yepes,
and M. Newell, Acr appropriateness criteria ® breast cancer screening,”
Journal of the American College of Radiology, vol. 14, pp. S383–S390,
11 2017.
[3] J. Tang, R. M. Rangayyan, J. Xu, I. E. Naqa, and Y. Yang, “Computer-
aided detection and diagnosis of breast cancer with mammography:
Recent advances, IEEE Transactions on Information Technology in
Biomedicine, vol. 13, no. 2, pp. 236–251, 2009.
[4] A. Y. Lee, D. J. Wisner, S. Aminololama-Shakeri, V. A. Arasu,
S. A. Feig, J. Hargreaves, H. Ojeda-Fournier, L. W. Bassett, C. J.
Wells, J. De Guzman, C. I. Flowers, J. E. Campbell, S. L. Elson,
H. Retallack, and B. N. Joe, “Inter-reader variability in the use of bi-
rads descriptors for suspicious findings on diagnostic mammography,
Academic Radiology, vol. 24, no. 1, p. 60–66, Jan. 2017. [Online].
Available: http://dx.doi.org/10.1016/j.acra.2016.09.010
[5] C. D. Lehman, R. F. Arao, B. L. Sprague, J. M. Lee, D. S. M. Buist,
K. Kerlikowske, L. M. Henderson, T. L. Onega, A. N. Tosteson, G. H.
Rauscher, and D. L. Miglioretti, “National performance benchmarks for
modern screening digital mammography: Update from the breast cancer
surveillance consortium.” Radiology, vol. 283 1, pp. 49–58, 2017.
[Online]. Available: https://api.semanticscholar.org/CorpusID:4786906
[6] Y. Almalki, T. Soomro, M. Irfan, S. Alduraibi, and A. Ali, “Comput-
erized analysis of mammogram images for early detection of breast
cancer, Healthcare, vol. 10, p. 801, 04 2022.
[7] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521,
pp. 436–44, 05 2015.
[8] M. Prodan, E. Paraschiv, and A. Stanciu, “Applying deep learning meth-
ods for mammography analysis and breast cancer detection,” Applied
Sciences, vol. 13, p. 4272, 03 2023.
[9] R. Warren, S. Duffy, and S. Alija, “The value of the second view in
screening mammography, The British journal of radiology, vol. 69, pp.
105–8, 02 1996.
[10] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks
for biomedical image segmentation,” CoRR, vol. abs/1505.04597, 2015.
[Online]. Available: http://arxiv.org/abs/1505.04597
[11] A. Ibragimov, S. Senotrusova, K. Markova, E. Karpulevich, A. Ivanov,
E. Tyshchuk, P. Grebenkina, O. Stepanova, A. Sirotskaya, A. Kovaleva
et al., “Deep semantic segmentation of angiogenesis images,” Interna-
tional Journal of Molecular Sciences, vol. 24, no. 2, p. 1102, 2023.
[12] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai,
T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al.,
“An image is worth 16x16 words: Transformers for image recognition
at scale,” arXiv preprint arXiv:2010.11929, 2020.
[13] S. Asgari Taghanaki, K. Abhishek, J. P. Cohen, J. Cohen-Adad, and
G. Hamarneh, “Deep semantic segmentation of natural and medical
images: a review,” Artificial Intelligence Review, vol. 54, no. 1, pp. 137–
178, 2021.
[14] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2980–2988.
[15] D. Abdelhafiz, C. Yang, R. Ammar, and S. Nabavi, “Deep
convolutional neural networks for mammography: advances, challenges
and applications,” BMC Bioinformatics, vol. 20, no. 11, p. 281, Jun
2019. [Online]. Available: https://doi.org/10.1186/s12859-019-2823-4
[16] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma,
Z. Huang, A. Karpathy, A. Khosla, M. Bernstein et al., “Imagenet large
scale visual recognition challenge,” International journal of computer
vision, vol. 115, pp. 211–252, 2015.
[17] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training
of deep bidirectional transformers for language understanding,” arXiv
preprint arXiv:1810.04805, 2018.
[18] H. T. Nguyen, H. Q. Nguyen, H. H. Pham, K. Lam, L. T. Le, M. Dao, and
V. Vu, “VinDr-Mammo: A large-scale benchmark dataset for computer-
aided diagnosis in full-field digital mammography,” Sci. Data, vol. 10,
no. 1, p. 277, May 2023.
[19] L. Shen, L. R. Margolies, J. H. Rothstein, E. Fluder, R. McBride, and
W. Sieh, “Deep learning to improve breast cancer detection on screening
mammography,” Sci. Rep., vol. 9, no. 1, p. 12495, Aug. 2019.
[20] E. Kegeles, A. Naumov, E. A. Karpulevich, P. Volchkov, and P. Baranov,
“Convolutional neural networks can predict retinal differentiation in
retinal organoids,” Front. Cell. Neurosci., vol. 14, p. 171, Jul. 2020.
[21] M. Tan and Q. V. Le, “Efficientnet: Rethinking model scaling for
convolutional neural networks, CoRR, vol. abs/1905.11946, 2019.
[Online]. Available: http://arxiv.org/abs/1905.11946
[22] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo,
“Swin transformer: Hierarchical vision transformer using shifted
windows, CoRR, vol. abs/2103.14030, 2021. [Online]. Available:
https://arxiv.org/abs/2103.14030
[23] Z. Liu, H. Hu, Y. Lin, Z. Yao, Z. Xie, Y. Wei, J. Ning, Y. Cao,
Z. Zhang, L. Dong, F. Wei, and B. Guo, “Swin transformer V2:
scaling up capacity and resolution,” CoRR, vol. abs/2111.09883, 2021.
[Online]. Available: https://arxiv.org/abs/2111.09883
[24] E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo,
“Segformer: Simple and efficient design for semantic segmentation with
transformers,” CoRR, vol. abs/2105.15203, 2021. [Online]. Available:
https://arxiv.org/abs/2105.15203
[25] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
recognition,” in Proceedings of the IEEE conference on computer vision
and pattern recognition, 2016, pp. 770–778.
[26] A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang,
Y. Zhu, R. Pang, V. Vasudevan et al., “Searching for mobilenetv3,”
in Proceedings of the IEEE/CVF international conference on computer
vision, 2019, pp. 1314–1324.
[27] R. S. Lee, F. Gimenez, A. Hoogi, K. K. Miyake, M. Gorovoy, and D. L.
Rubin, “A curated mammography data set for use in computer-aided
detection and diagnosis research,” Sci. Data, vol. 4, no. 1, p. 170177,
Dec. 2017.
[28] I. C. Moreira, I. Amaral, I. Domingues, A. Cardoso, M. J. Cardoso,
and J. S. Cardoso, “INbreast: toward a full-field digital mammographic
database,” Acad. Radiol., vol. 19, no. 2, pp. 236–248, Feb. 2012.
[29] A. S. Alsolami, W. Shalash, W. Alsaggaf, S. Ashoor, H. Refaat, and
M. Elmogy, “King abdulaziz university breast cancer mammogram
dataset (KAU-BCMD), Data (Basel), vol. 6, no. 11, p. 111, Oct. 2021.
[30] J. Suckling, J. Parker, S. Astley, I. W. Hutt, C. R. M. Boggis,
I. W. Ricketts, E. A. Stamatakis, N. Cerneaz, S. Kok, P. Taylor,
D. Betal, and J. Savage, “The mammographic image analysis
society digital mammogram database,” 1994. [Online]. Available:
https://api.semanticscholar.org/CorpusID:56649461
[31] C. Cui, L. Li, H. Cai, Z. Fan, L. Zhang, T. Dan, J. Li, and J. Wang, “The
chinese mammography database (CMMD): An online mammography
database with biopsy confirmed types for machine diagnosis of breast,”
2021.
[32] A. Chattopadhay, A. Sarkar, P. Howlader, and V. N. Balasubramanian,
“Grad-cam++: Generalized gradient-based visual explanations for
deep convolutional networks, in 2018 IEEE Winter Conference on
Applications of Computer Vision (WACV). IEEE, Mar. 2018. [Online].
Available: http://dx.doi.org/10.1109/WACV.2018.00097
[33] T. T. Truong, H. T. Nguyen, T. B. Lam, D. V. Nguyen, and P. H.
Nguyen, “Delving into ipsilateral mammogram assessment under multi-
view network, in International Workshop on Machine Learning in
Medical Imaging. Springer, 2023, pp. 367–376.
[34] Z. Yang, Z. Cao, Y. Zhang, Y. Tang, X. Lin, R. Ouyang, M. Wu,
M. Han, J. Xiao, L. Huang et al., “Momminet-v2: Mammographic multi-
view mass identification networks, Medical Image Analysis, vol. 73, p.
102204, 2021.
[35] Y. Chen, H. Wang, C. Wang, Y. Tian, F. Liu, Y. Liu, M. Elliott, D. J.
McCarthy, H. Frazer, and G. Carneiro, “Multi-view local co-occurrence
and global consistency learning improve mammogram classification gen-
eralisation,” in International Conference on Medical Image Computing
and Computer-Assisted Intervention. Springer, 2022, pp. 3–13.
[36] H. Wang, J. Feng, Z. Zhang, H. Su, L. Cui, H. He, and L. Liu, “Breast
mass classification via deeply integrating the contextual information
from multi-view data, Pattern Recognition, vol. 80, pp. 42–52, 2018.
[37] G. Carneiro, J. Nascimento, and A. P. Bradley, Automated analysis
of unregistered multi-view mammograms with deep learning, IEEE
transactions on medical imaging, vol. 36, no. 11, pp. 2355–2365, 2017.
[38] Y. Li, H. Chen, L. Cao, and J. Ma, A survey of computer-aided detection
of breast cancer with mammography, J Health Med Inf, vol. 4, no. 7,
pp. 1–6, 2016.
[39] H. T. Nguyen, S. B. Tran, D. B. Nguyen, H. H. Pham, and H. Q.
Nguyen, “A novel multi-view deep learning approach for bi-rads and
density assessment of mammograms,” in 2022 44th Annual International
Conference of the IEEE Engineering in Medicine & Biology Society
(EMBC). IEEE, 2022, pp. 2144–2148.
[40] H. N. Khan, A. R. Shahid, B. Raza, A. H. Dar, and H. Alquhayz,
“Multi-view feature fusion based four views model for mammogram
classification using convolutional neural network, IEEE Access, vol. 7,
pp. 165 724–165 733, 2019.
[41] K. J. Geras, S. Wolfson, Y. Shen, N. Wu, S. Kim, E. Kim, L. Heacock,
U. Parikh, L. Moy, and K. Cho, “High-resolution breast cancer screening
with multi-view deep convolutional neural networks,” arXiv preprint
arXiv:1703.07047, 2017.
[42] V. Kumar, C. Prabha, P. Sharma, N. Mittal, S. Askar, and
M. Abouhawwash, “Unified deep learning models for enhanced lung
cancer prediction with resnet-50–101 and efficientnet-b3 using dicom
images,” BMC Medical Imaging, vol. 24, 03 2024.
[43] H. Ueyama, Y. Kato, Y. Akazawa, N. Yatagai, H. Komori, T. Takeda,
K. Matsumoto, K. Ueda, K. Matsumoto, M. Hojo, T. Yao, A. Nagahara,
and T. Tada, “Application of artificial intelligence using a convolutional
neural network for diagnosis of early gastric cancer based on magnifying
endoscopy with narrow-band imaging, Journal of Gastroenterology and
Hepatology, vol. 36, 07 2020.
[44] M. Monteiro, V. Newcombe, F. Mathieu, K. Adatia, K. Kamnitsas,
E. Ferrante, T. Das, D. Whitehouse, D. Rueckert, D. Menon, and
B. Glocker, “Multiclass semantic segmentation and quantification of
traumatic brain injury lesions on head ct using deep learning: an
algorithm development and multicentre validation study,” The Lancet
Digital Health, vol. 2, 05 2020.
[45] L. Poonamallee and S. Joshi, “Automated detection of intracranial
hemorrhage from head ct scans applying deep learning techniques
in traumatic brain injuries: A comparative review,” Indian Journal of
Neurotrauma, vol. 20, 07 2023.