
Comparative analysis of deep learning methods of detection of diabetic retinopathy


Abstract

Diabetic retinopathy is a common complication of diabetes that affects blood vessels in the light-sensitive tissue called the retina. It is the most common cause of vision loss among people with diabetes and the leading cause of vision impairment and blindness among working-age adults. Recent progress in the use of automated systems for diabetic retinopathy diagnostics has posed new challenges for the industry, namely the search for less resource-intensive architectures, e.g., for the development of low-cost embedded software. This paper proposes a comparison of two widely used conventional architectures (DenseNet, ResNet) with a new optimized one (EfficientNet). The proposed methods classify a retinal image into one of five classes, based on the dataset obtained from the 4th Asia Pacific Tele-Ophthalmology Society (APTOS) Symposium.
Alexandr Pak, Atabay Ziyaden, Kuanysh Tukeshev, Assel Jaxylykova & Dana Abdullina

To cite this article: Alexandr Pak, Atabay Ziyaden, Kuanysh Tukeshev, Assel Jaxylykova & Dana Abdullina (2020) Comparative analysis of deep learning methods of detection of diabetic retinopathy, Cogent Engineering, 7:1, 1805144, DOI: 10.1080/23311916.2020.1805144
© 2020 The Author(s). This open access article is distributed under a Creative Commons Attribution (CC-BY) 4.0 license. Published online: 13 Aug 2020.
Subjects: Machine Learning - Design; Neural Networks; Computer Science; General
Keywords: deep learning; diabetic retinopathy; EfficientNet; convolutional neural networks
ABOUT THE AUTHORS
Our team comprises five people. We specialize in artificial neural networks, natural language processing, and digital signal and image processing. The goals and objectives of the research are the development and testing of methods, algorithms, and architectures of applied neural networks on medical data, in particular for diabetic retinopathy diagnostics.
Pak Alexandr Alexandrovich: Candidate of Technical Sciences, Assistant Professor at FIT KBTU, scientific leader of the doctoral candidate, and manager of project No. AP05132760, "The development of methods for deep learning of semantic probability inference", at the Institute of Information and Computational Technologies (IICT).
Atabay Ziyaden: software engineer at the IICT, Big Data Mining Laboratory.
Tukeshev Kuanysh: Medical Doctor at the Institute of Eye Diseases.
Jaxylykova Assel: PhD student at KazNU, IICT, specialty "Information Systems".
Abdullina Dana: Medical Doctor at the Institute of Eye Diseases.
PUBLIC INTEREST STATEMENT
Imagine that we could stop blindness before it becomes irreversible, with the help of simple hardware and a common cell phone. The problem of blindness is closely related to diabetic retinopathy (DR), a vascular complication of diabetes mellitus (DM). DR is among the leading causes of blindness and visual impairment in the age group from 20 to 70 years. In this study, we offer testing results of various state-of-the-art neural network architectures applied to DR diagnosis, in terms of accuracy and performance. The goal of this work is a hardware-software complex in the form of a service that can be used by non-specialized medical workers (paramedics and nurses). This is especially important for the rural and regional population, where there are not enough doctors (ophthalmologists and endocrinologists) and, moreover, not everywhere is there access to the Internet.
Pak et al., Cogent Engineering (2020), 7: 1805144
Page 1 of 9
Received: 29 July 2019
Accepted: 12 June 2020
*Corresponding author: Alexandr Pak,
Institute of Information and
Computational Technologies, Almaty
050010, Kazakhstan
Reviewing editor:
Duc Pham, School of Mechanical
Engineering, University of
Birmingham, Birmingham, UK
Additional information is available at
the end of the article
1. Introduction
Diabetic retinopathy (DR) is one of the significant causes of blindness. Since DR is a progressive process, medical experts suggest that patients with diabetes be screened at least twice a year to detect signs of the disease promptly. In current clinical practice, detection is mainly based on an ophthalmologist examining a color image of the fundus. This detection is laborious and time-consuming, which leads to a higher error rate. Also, due to the large number of patients with diabetes and the lack of medical resources in some areas, many patients with DR cannot be diagnosed and treated promptly; they thus lose the best treatment options, ultimately leading to irreversible loss of vision. Especially for patients at an early stage, if DR can be detected and treated immediately, the process can be well controlled and delayed. At the same time, the quality of manual interpretation depends heavily on the experience of the clinician, and misdiagnosis often occurs due to doctors' lack of experience (Aljawadi & Shaya, 2007; Kaleeva & Libman, 2010; Lisochkina et al., 2004).
In connection with the problem of erroneous diagnosis, the development of intelligent systems to support decision-making by ophthalmologists has aroused the interest of the scientific community in several works (Gadekallu et al., 2020; Gulshan et al., 2016; Mateen et al., 2018; Reddy et al., 2020).
We should note the pioneering work (Gulshan et al., 2016), whose main idea is dedicated to telemedicine issues, namely the ability to diagnose retinopathy remotely from images obtained using a cell phone. The study presents a methodological approach for obtaining fundus images with subsequent diagnosis by neurocomputing algorithms and compares the results with the opinions of ophthalmologists. The above-mentioned neurocomputing algorithms are convolutional neural networks (CNNs). The base hardware was the iPhone 4 smartphone. The patient sample was small: only 55 people, 110 eye images. In general, remote diagnostics showed high sensitivity and specificity.
CNNs have made remarkable achievements in a large number of computer vision and image classification tasks, significantly exceeding all previous image analysis methodologies. (Krizhevsky et al., 2012) described a new CNN architecture, AlexNet. It showed significant performance improvements at the 2012 ILSVRC competition, achieving a top-5 error of 15.31%. For comparison, a method that did not use convolutional neural networks received an error of 26.1%. The network contains 62.3 million parameters and performs 1.1 billion computations in a forward pass. Convolutional layers, which account for 6% of all parameters, perform 95% of the computations.
VGG Net is a convolutional neural network model proposed in (Simonyan & Zisserman, 2014). At the 2014 ILSVRC competition, an ensemble of two VGG Nets received a top-5 error of 7.3%. Due to its depth and the number of fully connected nodes, VGG16 weighs over 533 MB and contains 138 million parameters. The enormous size of VGG makes deploying the model a tedious task.
The next step in the development of CNN models was the winner of ILSVRC 2015, with a top-5 error of 3.57%: an ensemble of six ResNet (Residual Network) models developed by Microsoft Research (K. He et al., 2016). ResNet-50 has over 23 million trainable parameters.
The next architecture, which managed to significantly reduce the number of parameters without a significant loss of quality, was published in (Huang et al., 2017) and called DenseNet (Dense Convolutional Network). The main idea of the architecture is to shorten the connections in the CNN, which allows training deeper and more accurate models. With dense connections, fewer parameters and higher accuracy are achieved compared with ResNet and Pre-Activation ResNet. DenseNet121 has 7 million model parameters.
Recent work (Real et al., 2019) presented a new automatically discovered architecture, AmoebaNet-A. The architecture set a new state of the art of 83.9% top-1 / 96.6% top-5 ImageNet accuracy. The results are comparable to the current state-of-the-art ImageNet models.
There is also the modern EfficientNet architecture, which achieves much better accuracy and efficiency than AmoebaNet-C. EfficientNet-B7 achieves state-of-the-art 84.4% top-1 / 97.1% top-5 accuracy on ImageNet while being 8.4x smaller and 6.1x faster at inference than the best existing ConvNet (Tan & Le, 2019). Thus, one important area of research is reducing the computational complexity of neural network architectures while maintaining an acceptable level of accuracy. Indeed, modern requirements for diagnostic software imply working offline, since not all settlements have high-quality, high-speed Internet. This paper explores the use of low-resource CNN architectures for the diagnosis of diabetic retinopathy.
2. Related work
The early detection of DR is a hard problem in the field of computer vision. The goal of detection is to find clinical features of retinopathy, owing to diagnostic transparency requirements. In retinal color fundus images there are various clinical features of DR, such as microaneurysms, hemorrhages, exudates, and others. The extraction of these signs is essential for precise diagnostics; they help to determine the actual status of DR.
(Hann et al., 2009) describes a conventional approach based on the morphology of digital images that extracts a number of such features from fundus images. The advantages of the approach are transparency and simplicity. The presence of exudates near the macula on a fundus image is an important diagnostic sign of diabetic macular edema.
The work of (Walter et al., 2002) presented efficient algorithms for the extraction of retinal exudates and the optic disc. The main idea of the paper is that exudates can be extracted with the help of high grey-level variations, and their contours can be detected by morphological reconstruction.
Another work, (Agurto et al., 2010), proposed an approach based on multiscale amplitude-modulation frequency-modulation (AM-FM) methods to discriminate between normal and pathological retinal images. The modulations were applied to sets of small regions of fundus images with different types of lesions. After that, the feature vector of a small region was derived from the amplitude-frequency response. The authors claimed that there is a statistical differentiation of normal retinal structures and pathological lesions based on AM-FM features. (Kazakh-British et al., 2018) conducted numerical experiments with the following processing pipeline: first, Frangi and Sato filters were applied to fundus images for blood-vessel extraction; after that, a CNN classifier was trained to detect lesions.
(Abràmoff et al., 2010) tested wavelet detectors and k-Nearest Neighbors for clinical feature extraction from fundus images. The extraction achieved an AUC of 0.86 with a standard error of 0.0084. It is worth noting that the dataset was produced with the help of non-mydriatic digital retinal cameras. The size of a fundus image varied from 0.15 to 0.5 MB.
(Gulshan et al., 2016) tested a deep learning algorithm for diabetic retinopathy on a dataset produced with the help of a smartphone. This work shows the opportunity to provide access to diagnostic methods for diabetic retinopathy to a broad audience.
(Pratt et al., 2016) developed and tested a CNN with 13 layers to detect the stage of DR. Their CNN was trained and tested on an NVIDIA K40c. The authors also stated that an image of size 512 × 512 could be processed in 0.04 seconds, which makes real-time feedback possible.
(Tan et al., 2017) proposed another CNN, with 10 layers, to simultaneously segment and discriminate exudates, hemorrhages, and microaneurysms. The reported sensitivities for exudate detection are 0.87 and 0.71; for hemorrhages and microaneurysms, 0.62 and 0.46, respectively.
(Choi et al., 2017) tested the VGG-19 architecture on the STructured Analysis of the REtina (STARE) database. With 10 categories as the target variable, the results achieved an accuracy of 30.5%, relative classifier information (RCI) of 0.052, and Cohen's kappa of 0.224. In the case of 3 categories, the results showed an accuracy of 72.8%, an RCI of 0.283, and a kappa of 0.577.
(Mateen et al., 2018) proposed a symmetrically optimized solution through the combination
of a Gaussian mixture model (GMM), visual geometry group network (VGGNet), singular value
decomposition (SVD) and principal component analysis (PCA), and softmax, for region seg-
mentation, high dimensional feature extraction, feature selection, and fundus image classi-
fication, respectively. The authors claimed that the VGG-19 model outperformed the AlexNet
and spatial invariant feature transform (SIFT) in terms of classification accuracy and compu-
tational time.
(Khalifa et al., 2019) investigated deep transfer learning models for medical DR detection. The numerical experiments were conducted on the Asia Pacific Tele-Ophthalmology Society (APTOS) 2019 dataset. The models in this research were AlexNet, ResNet18, SqueezeNet, GoogleNet, VGG16, and VGG19. These models were selected because they consist of a small number of layers compared to larger models such as DenseNet and InceptionResNet. Data augmentation techniques were used to make the models more robust and to overcome overfitting.
We are working on the classification of fundus images according to the severity of DR, so that we can perform end-to-end classification in real time, from the fundus image to the state of the patient. For this task, we use pixel normalization techniques to highlight various clinical features (blood vessels, exudates, microaneurysms, and others) and then classify the retinal image into the appropriate stage of the disease. We adopt CNN architectures for DR detection on the "APTOS 2019 Blindness Detection" dataset. The contribution of this article is summarized as follows:
(1) We test the latest CNN models (DenseNet121, ResNet50, ResNet101, EfficientNet-b4) on recognizing small differences between classes of images for DR detection (F. He et al., 2019; Hu et al., 2017; Tan & Le, 2019).
(2) Manual training and tuning of hyper-parameters are adopted, and the experimental results demonstrate better accuracy than training without transfer learning for classifying DR images.
3. Dataset
The dataset of high-quality fundus images was collected by Aravind Eye Hospital in India; information about the dataset can be found online. It consists of approximately 10 GB of data across 5,590 RGB images of the fundus. The data owners divided the dataset into two parts: the training set consists of 3662 images with target labels, and the test set of 1928 images without labels. Like any real-world dataset, there is noise in both the images and the labels. Images may contain artifacts, or be out of focus, underexposed, or overexposed. The images were gathered from multiple clinics using a variety of cameras over an extended period, which introduces further variation.
There are five classes in the data; from Table 1, one can see that the classes are not uniformly represented.
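Given the imbalance visible in Table 1, some form of class balancing is commonly applied during training. The paper does not state its balancing strategy, so the inverse-frequency weighting below is only an illustrative sketch using the Table 1 counts:

```python
import numpy as np

# Class counts from Table 1 (No DR, Mild, Moderate, Severe, Proliferative)
counts = np.array([1805, 999, 370, 295, 193], dtype=np.float64)

# Inverse-frequency weights: each class then contributes equally to a
# weighted cross-entropy loss (w_i * n_i is the same constant for all i).
weights = counts.sum() / (len(counts) * counts)
```

With PyTorch, for instance, such weights could be passed to torch.nn.CrossEntropyLoss via its weight argument.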
4. Preprocessing
The first step is the preprocessing of all images before augmentation and training. All images were normalized to preserve the efficiency of models pre-trained on ImageNet. Preprocessing involved several steps:
1. Balancing image sizes: images were rescaled to have the same radius and cropped to remove uninformative black pixels around the edges of the fundus.
2. Reducing lighting-condition effects: the images come with many different lighting conditions, and some are very dark. This was fixed by subtracting the local average color from each pixel.
3. CLAHE: contrast adjustment was performed using the contrast-limited adaptive histogram equalization (CLAHE) filtering algorithm (Reza, 2004).
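Steps 1 and 2 can be sketched with NumPy/SciPy as follows. The threshold and blur parameters are not given in the paper and are illustrative; step 3 (CLAHE) would typically be applied afterwards with a dedicated implementation such as OpenCV's cv2.createCLAHE.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def crop_fundus(img, tol=7):
    """Crop the uninformative black border around the circular fundus."""
    mask = img.max(axis=2) > tol              # any channel brighter than tol
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

def normalize_lighting(img, sigma=10, alpha=4.0):
    """Subtract the local average colour (Gaussian blur) from each pixel,
    a common trick for evening out fundus lighting conditions."""
    img = img.astype(np.float32)
    local_avg = gaussian_filter(img, sigma=(sigma, sigma, 0))  # spatial blur only
    out = alpha * (img - local_avg) + 128.0   # re-centre around mid-grey
    return np.clip(out, 0, 255).astype(np.uint8)

# Toy example: a dark frame with a bright circular "fundus" in the middle
img = np.zeros((64, 64, 3), dtype=np.uint8)
yy, xx = np.mgrid[:64, :64]
img[(yy - 32) ** 2 + (xx - 32) ** 2 < 20 ** 2] = 120
cropped = crop_fundus(img)
evened = normalize_lighting(cropped)
```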
We applied augmentation to images in real time to reduce overfitting. During each epoch, a random augmentation preserving collinearity and distance ratios was performed. Training started with no augmentation applied; the models were then retrained with light, mid, and strong augmentation:
1. Light: randomly rotate an image by 90 degrees; randomly flip an image horizontally.
2. Mid: randomly rotate an image by 90 degrees, randomly flip an image horizontally, and transpose an image by swapping rows and columns. Apply median blur with randomly picked parameters. Randomly apply CLAHE, sharpening, or randomized contrast and brightness.
3. Strong: randomly rotate an image by 90 degrees, randomly flip the image horizontally, and transpose the image by swapping rows and columns. Apply median blur with randomly picked parameters. Randomly apply CLAHE, a sharpening filter or randomized contrast and brightness adjustments, distortion, and hue shift.
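The geometric part of these levels can be sketched with NumPy; the blur, CLAHE, and contrast steps would typically come from an augmentation library such as albumentations, and the probabilities below are illustrative rather than the paper's:

```python
import numpy as np

def augment(img, rng, level="mid"):
    """Geometric augmentations that preserve collinearity and distance
    ratios: random 90-degree rotation, horizontal flip, and (for "mid"
    and "strong") transposition of rows and columns."""
    img = np.rot90(img, k=int(rng.integers(0, 4)))  # 0/90/180/270 degrees
    if rng.random() < 0.5:
        img = img[:, ::-1]                          # horizontal flip
    if level != "light" and rng.random() < 0.5:
        img = img.swapaxes(0, 1)                    # transpose rows/columns
    return np.ascontiguousarray(img)

rng = np.random.default_rng(42)
img = np.arange(4 * 4 * 3, dtype=np.uint8).reshape(4, 4, 3)
out = augment(img, rng)
```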
All steps of preprocessing are presented in Table 2. A visual result of preprocessing is shown in Figure 1.
Table 1. Distribution of images across classes
Class Index The degree of DR Number of data
0 No DR 1805
1 Mild 999
2 Moderate 370
3 Severe 295
4 Proliferative 193
Table 2. Preprocessing flowchart
Step 1: Balancing image sizes → Step 2: Correcting lighting → Step 3: Applying CLAHE → Step 4: Applying image augmentation → Step 5: Training the model
5. EfficientNet model
It is common practice to develop convolutional neural networks (CNNs) at a lower resource cost and then, if more resources are available, scale them up to achieve better performance. There are several options for scaling a model: arbitrarily increasing the CNN depth or width, or using higher-resolution input images during the training phase to capture finer patterns in the data. While such approaches are good at improving model accuracy, they usually require manual tuning and still often yield suboptimal performance.
For the comparative analysis, we chose the EfficientNet-B4 model, which is ahead in results while having significantly fewer model parameters. Its architecture is presented in Figure 2.
Figure 1. Examples of fundus images before (left) and after (right) preprocessing.

Figure 2. Architecture of the EfficientNet-B4 model.

In general, one can define a convolutional layer as a tensor function, Y_i = F_i(X_i), where F_i is a tensor operator, Y_i is the output tensor, and X_i is the input tensor with shape ⟨H_i, W_i, C_i⟩; here H_i and W_i are the spatial dimensions and C_i is the channel dimension. A convolutional network can then be defined as a composition of such functions, as in equation (1) (Tan & Le, 2019), where F_i^{L_i} denotes layer F_i repeated L_i times at stage i, and ⟨H_i, W_i, C_i⟩ is the shape of the input tensor of that stage. The complexity and performance of a CNN model depend on its depth, width, and input resolution. The compound scaling method (Tan & Le, 2019) scales all three uniformly with a single coefficient φ, where α, β, γ are constants that can be defined by grid search. Briefly, φ is a user-defined parameter that controls the amount of free resources available for model scaling, while α, β, γ specify how the scaling is apportioned across the model's dimensions (Tan & Le, 2019). EfficientNet-b4 was generated on these principles and is used in the present work.
N¼ FL
ð Þ
 �i¼1::n(1)
Depth :d¼αϕ
Width :w¼βϕ
Resolution :r¼γϕ
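The compound-scaling rule can be sketched in a few lines. The constants below are the ones reported for the EfficientNet-B0 baseline in Tan & Le (2019), found by grid search subject to α·β²·γ² ≈ 2 so that each unit increase of φ roughly doubles the FLOPS; the φ value is illustrative:

```python
# Compound scaling (Tan & Le, 2019): depth d = alpha**phi, width w = beta**phi,
# resolution r = gamma**phi, with alpha * beta**2 * gamma**2 ~= 2 so each unit
# increase of phi approximately doubles the FLOPS.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # values reported for EfficientNet-B0

def compound_scale(phi):
    """Return the multipliers for depth, width, and input resolution."""
    return {
        "depth": ALPHA ** phi,
        "width": BETA ** phi,
        "resolution": GAMMA ** phi,
    }

flops_factor = ALPHA * BETA ** 2 * GAMMA ** 2  # ~= 2 per unit of phi
scales = compound_scale(4)  # roughly the regime of the larger B-variants
```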
6. Numerical experiments
To compare the quality and performance of the various models, we conducted experiments with preprocessed and non-preprocessed images. The dataset was divided into training and validation data, and the variants with 5 classes were compared. The assessment value for the classification is the Κ-statistic, which controls for instances that may have been classified correctly by chance. Κ can be calculated from the observed (total) accuracy and the random accuracy, as in equation (2), where A_T is the total accuracy and A_R is the random accuracy. Non-preprocessed images obtained a Κ of up to 0.656 with the various models. The error was calculated using cross-entropy for discrete values. Accuracy was calculated using the conventional formula. Table 3 presents the results of the numerical experiments with the various models. The preprocessed photos reach a Κ of up to 0.690, and, as one can see in Table 3, the accuracy increased significantly when strong augmentation was added to the model pipeline. The best model was obtained by encoding the network output with the help of ordinal regression, with a Κ coefficient of up to 0.790. Moreover, the best results were obtained with the contemporary EfficientNet-b4 model relative to familiar models such as DenseNet and ResNet, which demonstrates the good convergence of EfficientNet-b4 and its full applicability to the task of DR diagnosis. The estimated computational complexities are presented in Table 4.
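The ordinal-regression encoding of the network output is not spelled out in the paper; a common cumulative binary-target scheme, shown here purely as an assumption, looks like this:

```python
import numpy as np

def ordinal_encode(label, n_classes=5):
    """Encode DR grade k as k leading ones across n_classes - 1 binary
    targets, e.g. grade 3 -> [1, 1, 1, 0]; the network then ends in
    n_classes - 1 sigmoid outputs instead of a single softmax."""
    return (np.arange(n_classes - 1) < label).astype(np.float32)

def ordinal_decode(probs, threshold=0.5):
    """Predicted grade = number of outputs above the threshold."""
    return int((np.asarray(probs) > threshold).sum())
```

For example, ordinal_decode([0.9, 0.8, 0.3, 0.1]) yields grade 2. This encoding exploits the natural ordering of the DR grades, which a plain softmax ignores.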
Κ = (A_T − A_R) / (1 − A_R)  (2)
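Equation (2) can be checked with a small sketch, assuming the standard Cohen's-kappa form computed from a confusion matrix; the matrix below is illustrative, not from the paper:

```python
import numpy as np

def kappa_from_confusion(cm):
    """Cohen's kappa per equation (2): K = (A_T - A_R) / (1 - A_R), where
    A_T is the observed (total) accuracy and A_R is the accuracy expected
    by chance from the row and column marginals."""
    cm = np.asarray(cm, dtype=np.float64)
    n = cm.sum()
    a_t = np.trace(cm) / n
    a_r = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n ** 2
    return (a_t - a_r) / (1 - a_r)

# Illustrative 2-class example: A_T = 0.7, A_R = 0.5 -> kappa = 0.4
cm = [[20, 5], [10, 15]]
```

Note that the APTOS 2019 competition itself scores with the quadratically weighted variant of this statistic.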
7. Conclusion
The task of early detection of diabetic retinopathy is a pressing problem of predictive medicine. Diabetic retinopathy is one of the most common causes of blindness among working-age adults. Due to the
Table 4. Numbers of parameters of classification models
Method DenseNet121 ResNet50 ResNet101 EfficientNet-b4
Params 7 M 26 M 42 M 19 M
Table 3. Comparative analysis of classification models
DenseNet121 ResNet50 ResNet101 EfficientNet-b4
no preprocessing 0.586 0.607 0.656
+ preprocessing 0.680 0.635 0.650 0.690
+ augmentation 0.680 0.668 0.680 0.744
+ ordinal regression 0.690 0.708 0.734 0.790
development of technologies, diagnostic methods are becoming available to all segments of the population. The most advanced techniques for detecting the stage of diabetic retinopathy reside in the field of neurocomputing. The open problem of the area is computational complexity, because there is demand for smartphone-embedded software with low computational requirements. In this paper, we show the capability of various pixel normalization techniques to highlight clinical features (blood vessels, exudates, microaneurysms, and others), after which the retinal image is classified into the appropriate stage of the disease. We also show the possibility of developing such a diagnostic method with the help of EfficientNet in comparison with other neural network architectures: EfficientNet is the optimal model in terms of accuracy and number of parameters. The numerical experiments were conducted on the Asia Pacific Tele-Ophthalmology Society (APTOS) 2019 dataset. Further work involves augmenting the algorithm with preprocessing that reveals clinico-pathological features, and with performance upgrades.
We gratefully acknowledge the financial support of the
Ministry of Education and Sciences, Republic of
Kazakhstan (Grant AP05132760).
This work was supported by the Ministry of Education and
Science of the Republic of Kazakhstan [AP05132760].
Author details
Alexandr Pak
Atabay Ziyaden
Kuanysh Tukeshev
Assel Jaxylykova
Dana Abdullina
Institute of Information and Computational
Technologies, Almaty 050010, Kazakhstan.
Kazakh Scientific Research Institute of Eye Diseases,
Almaty 050010, Kazakhstan.
Citation information
Cite this article as: Comparative analysis of deep learning
methods of detection of diabetic retinopathy, Alexandr
Pak, Atabay Ziyaden, Kuanysh Tukeshev, Assel Jaxylykova
& Dana Abdullina, Cogent Engineering (2020), 7: 1805144.
Abramoff, M. D., Garvin, M. K., & Sonka, M. (2010). Retinal
Imaging and Image Analysis. IEEE Reviews in
Biomedical Engineering, 3, 169–208. doi:10.1109/
Agurto, C., Murray, V., Barriga, E., Murillo, S., Pattichis, M.,
Davis, H., … Soliz, P. (2010). Multiscale AM-FM meth-
ods for diabetic retinopathy lesion detection. IEEE
Transactions on Medical Imaging, 29(2), 502–512.
Aljawadi, M., & Shaya, F. T. (2007). Diabetic retinopathy.
Clinical Ophthalmology, 3(1), 259–265. https://
Choi, J. Y., Yoo, T. K., Seo, J. G., Kwak, J., Um, T. T., & Rim, T. H.
(2017). Multi-categorical deep learning neural network
to classify retinal images: A pilot study employing small
database. PloS One, 12(11), e0187336.
Gadekallu, T. R., Khare, N., Bhattacharya, S., Singh, S.,
Reddy Maddikunta, P. K., Ra, I. H., & Alazab, M.
(2020). Early detection of diabetic retinopathy
using PCA-firefly based deep learning model.
Electronics, 9(2), 274.
Gulshan, V., Peng, L., Coram, M., Stumpe, M. C., Wu, D.,
Narayanaswamy, A., … Kim, R. (2016). Development
and validation of a deep learning algorithm for
detection of diabetic retinopathy in retinal fundus
photographs. Jama, 316(22), 2402–2410. https://doi.
Hann, C. E., Revie, J. A., Hewett, D., Chase, J. G., & Shaw, G.
M. (2009). Screening for Diabetic Retinopathy Using
Computer Vision and Physiological Markers. Journal
of Diabetes Science and Technology, 3(4),819–834.
He, F., Liu, T., & Tao, D. (2019). Why resnet works? resi-
duals generalize. arXiv Preprint arXiv:1904.01367.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual
learning for image recognition. In Proceedings of the IEEE
conference on computer vision and pattern recognition
(pp. 770–778).
Hu, H., Dey, D., Del Giorno, A., Hebert, M., & Bagnell, J. A.
(2017). Log-DenseNet: How to sparsify a DenseNet.
arXiv Preprint arXiv:1711.00002. https://ui.adsabs.har
Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q.
(2017). Densely connected convolutional networks.
In Proceedings of the IEEE conference on computer
vision and pattern recognition (pp. 4700–4708), Los
Alamitos, CA, USA.
Kaleeva, E. V., & Libman, E. S. (2010). The state and dynamics
of disability due to visual impairment in Russia. In
Materials of the IX Congress of Ophthalmologists.
Kazakh-British, N. S., Pak, A. A., & Abdullina, D. (2018,
October). Automatic detection of blood vessels and
classification in retinal images for diabetic retinopathy
diagnosis with application of convolution neural
network. In Proceedings of the 2018 international con-
ference on sensors, signal and image processing (pp.
60–63), International Conference on Sensors, Signal and
Image Processing, Prague. ACM.
Khalifa, N. E. M., Loey, M., Taha, M. H. N., &
Mohamed, H. N. E. T. (2019). Deep transfer learning
models for medical diabetic retinopathy detection.
Acta Informatica Medica, 27(5), 327.
Krizhevsky, A., Ilya Sutskever, & Hinton, G. E. (2012).
ImageNet Classification with Deep Convolutional
Neural Networks. Neural Information Processing
Systems Conference, 25, 1097–1105. https://papers.
Lisochkina, S. B., Astakhov, Y. S., & Shadrichev, F. E. (2004).
Diabetic retinopathy (tactics of patient management).
Clinical Ophthalmology, 5(2), 85–92.
Mateen, M., Wen, J., Song, S., & Huang, Z. (2018). Fundus
image classification using VGG-19 architecture with
PCA and SVD. Symmetry, 11(1), 1.
Pratt, H., Coenen, F., Broadbent, D. M., Harding, S. P., &
Zheng, Y. (2016). Convolutional neural networks for
diabetic retinopathy. Procedia Computer Science, 90,
Real, E., Aggarwal, A., Huang, Y., & Le, Q. V. (2019, July).
Regularized evolution for image classifier architec-
ture search. In Proceedings of the aaai conference on
artificial intelligence (Vol. 33, pp. 4780–4789),
Honolulu, Hawaii, USA.
Reddy, G. T., Reddy, M. P. K., Lakshmanna, K., Kaluri, R.,
Rajput, D. S., Srivastava, G., & Baker, T. (2020).
Analysis of dimensionality reduction techniques on
big data. IEEE Access, 8, 54776–54788. https://doi.
Reza, A. M. (2004). Realization of the contrast limited
adaptive histogram equalization (CLAHE) for
real-time image enhancement. Journal of VLSI signal
processing systems for signal. Image and Video
Technology, 38(1), 35–44.
Simonyan, K., & Zisserman, A. (2014). Very deep convo-
lutional networks for large-scale image recognition.
arXiv Preprint arXiv:1409.1556.
Tan, J. H., Fujita, H., Sivaprasad, S., Bhandary, S. V.,
Rao, A. K., Chua, K. C., & Acharya, U. R. (2017).
Automated segmentation of exudates, haemor-
rhages, microaneurysms using single convolutional
neural network. Information Sciences, 420, 66–76.
Tan, M., & Le, Q. V. (2019). EfficientNet: rethinking model
scaling for convolutional neural networks. arXiv
Preprint, arXiv:1905.11946. http://proceedings.mlr.
Walter, T., Klein, J.-C., Massin, P., & Erginay, A. (2002). A
contribution of image processing to the diagnosis of
diabetic retinopathy-detection of exudates in color
fundus images of the human retina. IEEE
Transactions on Medical Imaging, 21(10),1236–1243.
... In our study, fundus images used in this study are publicly available from Kaggle 1 dataset. Images were provided by the Asia Pacific Tele-Ophthalmology Society (APTOS) as part of the 2019 Blindness Detection Competition [30,40]. The Kaggle dataset is one of the widely used and wellreported datasets for diabetic retinopathy. ...
In recent years, diabetic retinopathy (DR) has been one of the most threatening complications of diabetes, leading to permanent blindness. DR mutilates the retinal blood vessels of patients with diabetes. Accordingly, various artificial intelligence and deep learning techniques have been proposed to automatically detect abnormalities of DR and its different stages from retina images. In this paper, we propose a hybrid deep learning approach using a deep convolutional neural network (CNN) and two VGG network models (VGG16 and VGG19) for diabetic retinopathy detection and classification according to the visual risk linked to the severity of retinal ischemia. Indeed, the classification of DR deals with understanding the images and their context with respect to the categories. The experimental results, obtained on 5584 images drawn from an ensemble of online datasets, yielded an accuracy of 90.60%, a recall of 95% and an F1 score of 94%. The main aim of this work is to develop a robust system for detecting and classifying DR automatically.
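The accuracy, recall and F1 figures quoted in abstracts like the one above come from standard confusion-matrix arithmetic; a minimal sketch with illustrative counts (not the paper's data):

```python
def prf_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

# Illustrative counts only (hypothetical, not from the paper)
acc, prec, rec, f1 = prf_metrics(tp=90, fp=5, fn=10, tn=95)
```

Note that F1 is the harmonic mean of precision and recall, so it is always pulled toward the smaller of the two.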
... They achieved a validation precision of 86%, a recall of 87%, an F1-score of 86%, and a kappa score of 91.96%. Besides, Pak et al. [28] drew a comparison between two widely used conventional architectures, DenseNet and ResNet, and the newly optimized EfficientNet. The proposed methods classify the retinal images of the APTOS dataset into 5 classes. ...
Diabetic retinopathy is one of the most dangerous complications of diabetes. It affects the eyes, damaging the blood vessels of the retina; eventually, as the disease develops, sight can be lost. The main cure for this pathology rests on early detection, which plays a crucial role in slowing the progress of the underlying disease and protecting many patients from losing their sight. However, detecting diabetic retinopathy at its early stages remains an arduous task that requires human expert interpretation of fundus images in order to vigilantly follow up the patient. In this paper, we propose a new automatic diabetic retinopathy detection method based on deep learning. The approach is composed of two main steps: an initial pre-processing step, in which deformable registration is applied so that the retina occupies the entire image, eliminating the effect of the background on the classification process; and a classification phase, in which four convolutional neural network (CNN) models (DenseNet-121, Xception, Inception-v3, ResNet-50) are trained to detect the stage of diabetic retinopathy. The performance of the proposed architecture was tested on the APTOS 2019 dataset. As the latter is relatively small, transfer learning was adopted: the CNNs were pre-trained on the ImageNet dataset and fine-tuned on the APTOS dataset. In the testing phase, the final prediction is obtained by a voting system over the outputs of the four convolutional neural networks. The model achieved an accuracy of 85.28% in the testing phase.
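The voting step described above can be sketched as follows; the tie-breaking rule (lowest grade wins) is an assumption, since the abstract does not specify one:

```python
from collections import Counter

def majority_vote(predictions):
    """Class predicted by the most models. Ties are broken toward the lowest
    grade -- an assumption; the abstract does not specify a tie-breaking rule."""
    counts = Counter(predictions)
    top = max(counts.values())
    return min(cls for cls, n in counts.items() if n == top)

# Hypothetical outputs of the four CNNs for one fundus image (DR grades 0-4)
final_grade = majority_vote([2, 2, 3, 2])  # -> 2
```

A weighted variant (e.g. weighting each network by its validation accuracy) is a common alternative when the models differ markedly in quality.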
... The reasons for this would require further study, but it may be conceptually important that the performance of the model depends on factors other than image resolution [15] or set size alone, with the network architecture possibly also an important factor contributing to model performance. The EfficientNet [6] family of models has, among other convolutional neural networks, shown efficacy in terms of performance and speed on commercially available GPUs in the classification of skin lesions [16], CT lung scans [17] and diabetic retinopathy [18], but this is probably one of the first papers employing this model on paediatric elbow radiographs. In this study, a lower-powered B1 version of the model was employed as compared to higher (i.e. ...
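For reference, the EfficientNet family mentioned above is generated by the compound-scaling rule of Tan and Le's original paper, which ties network depth, width and input resolution to a single coefficient φ; a quick check of the arithmetic (coefficient values are those reported in that paper):

```python
# Compound scaling from the EfficientNet paper (Tan & Le, 2019):
#   depth d = alpha**phi, width w = beta**phi, resolution r = gamma**phi,
# under the constraint alpha * beta**2 * gamma**2 ~= 2, so that FLOPs grow
# roughly as 2**phi as phi increases.
alpha, beta, gamma = 1.2, 1.1, 1.15  # grid-searched coefficients from the paper

def scale_factors(phi):
    """Depth, width and resolution multipliers for a compound coefficient phi."""
    return alpha ** phi, beta ** phi, gamma ** phi

d, w, r = scale_factors(2)                              # e.g. phi = 2
flops_growth_per_phi = alpha * beta ** 2 * gamma ** 2   # ~1.92, close to 2
```

FLOPs scale with depth and quadratically with both width and resolution, which is why β and γ enter the constraint squared.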
Background: To compare the performance of an AI model, based on strategies designed to overcome small development sets, against pediatric ER physicians on a classification triage task for pediatric elbow radiographs. Methods: 1,314 pediatric elbow lateral radiographs (mean age: 8.2 years) were retrospectively retrieved, binomially classified based on their annotation as normal or abnormal (with pathology), and randomly partitioned into a development set (993 images), tuning set (109 images), second tuning set (100 images) and test set (112 images). The AI model was trained on the development set and utilized the EfficientNet B1 compound-scaling network architecture and online augmentations. Its performance on the test set was compared to a group of five physicians (inter-rater agreement: fair). Statistical analysis: AUC of the AI model by the DeLong method; performance of the AI model versus physician groups by the McNemar test. Results: Accuracy of the model on the test set was 0.804 (95% CI, 0.718 to 0.873), AUROC 0.872 (95% CI, 0.831 to 0.947). AI model performance compared to the physician group on the test set: sensitivity 0.790 (95% CI 0.684 to 0.895) vs 0.649 (95% CI 0.525 to 0.773), p value 0.088; specificity 0.818 (95% CI 0.716 to 0.920) vs 0.873 (95% CI 0.785 to 0.961), p value 0.439. Conclusions: The AI model for elbow radiograph triage, designed with strategies to optimize performance for a small development set, showed performance comparable to physicians.
... Pak et al. [54] preprocessed the dataset, applying techniques such as data normalization and contrast adjustment. They then applied an augmentation technique to the training data and used multiple CNN models in their study. ...
Background and Objective: Diabetes-related cases can cause glaucoma, cataracts, optic neuritis, paralysis of the eye muscles, or various retinal damages over time. Diabetic retinopathy is the most common cause of blindness that occurs with diabetes. It is a disease that arises when the blood vessels in the retina of the eye become damaged, leading to loss of vision in advanced stages; it can occur in any diabetic patient, and the most important factor in treating it is early diagnosis. Nowadays, deep learning models and machine learning methods, which are open to technological developments, are already used in early diagnosis systems. In this study, two publicly available datasets were used, each divided into five types according to the severity of diabetic retinopathy. The objectives of the proposed approach are to contribute positively to the performance of CNN models by processing fundus images through preprocessing steps (morphological gradient and segmentation approaches), and to select efficient sets from the type-based activation sets obtained from the CNN models using the Atom Search Optimization method, increasing classification success. Methods: The proposed approach consists of three steps. In the first step, the morphological gradient method is used to suppress parasitic noise in each image, and the ocular vessels in the fundus images are extracted using a segmentation method. In the second step, the datasets are trained with transfer learning models and the activations for each class type in the last fully connected layers of these models are extracted. In the last step, the Atom Search Optimization method is used to select the most dominant activation class from the extracted activations on a class basis. Results: When classified by the severity of diabetic retinopathy, an overall accuracy of 99.59% was achieved for dataset #1 and 99.81% for dataset #2.
Conclusions: The overall accuracy achieved with the proposed approach increased, through the application of the preprocessing steps and the selection of the dominant activation sets from the deep learning models by the Atom Search Optimization method.
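The morphological gradient used in the preprocessing step above is simply dilation minus erosion (local maximum minus local minimum); a naive pure-Python sketch of the operation (the paper's actual structuring element and implementation are not specified here):

```python
def morphological_gradient(img, k=1):
    """Morphological gradient = dilation - erosion: local max minus local min
    over a (2k+1) x (2k+1) square window, computed naively on a 2-D list of
    grayscale values. Borders are handled by clipping the window."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - k), min(h, y + k + 1))
                      for i in range(max(0, x - k), min(w, x + k + 1))]
            out[y][x] = max(window) - min(window)  # edge strength at (y, x)
    return out

# Tiny test image: a single bright pixel; the gradient lights up its neighbourhood
img = [[0] * 5 for _ in range(5)]
img[2][2] = 9
grad = morphological_gradient(img)
```

In practice this would be done with a library morphology routine on the full fundus image; the point here is only that the output is large exactly where intensity changes, i.e. at vessel and lesion boundaries.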
... Pak et al. performed multi-class classification on the dataset samples with several deep learning architectures, such as DenseNet121, ResNet50, ResNet101, and EfficientNet-b4, achieving the highest accuracy with EfficientNet-b4 [35]. Gangwar and Ravi presented a transfer-learning-based deep learning algorithm in which they classified the images of this dataset with a pre-trained Inception-ResNet-v2 [36]. ...
Diabetic Retinopathy (DR) refers to the damage endured by the retina as an effect of diabetes. DR has become a severe health concern worldwide, as the number of diabetes patients is soaring. Periodic eye examination allows doctors to detect DR in patients at an early stage and initiate proper treatments. Advancements in artificial intelligence and camera technology have made it possible to automate the diagnosis of DR, which can benefit millions of patients. This paper describes a novel method for DR diagnosis based on gray-level intensity and texture features extracted from fundus images using a decision-tree-based ensemble learning technique. This study primarily works with the Asia Pacific Tele-Ophthalmology Society 2019 Blindness Detection (APTOS 2019 BD) dataset. We undertook several steps to curate its contents and make them more suitable for machine learning applications. Our approach incorporates several image processing techniques, two feature extraction techniques, and one feature selection technique, which results in a classification accuracy of 94.20% (margin of error: 0.32%) and an F-measure of 93.51% (margin of error: 0.5%). Several other parameters regarding the proposed method's performance have been presented to manifest its robustness and reliability. Details on each employed technique have been included to make the provided results reproducible. This method can be a valuable tool for mass retinal screening to detect DR, thus drastically reducing the rate of vision loss attributed to it.
Diabetic retinopathy is one of the most threatening complications of diabetes and leads to permanent blindness if left untreated. The severity of the disease is assessed from the presence of microaneurysms, exudates, neovascularisation and haemorrhages. Convolutional neural networks have been successfully applied in many adjacent subjects, and for the diagnosis of diabetic retinopathy itself. In this paper, an automatic deep-learning-based method for stage detection of diabetic retinopathy from a single photograph of the human fundus is proposed. Additionally, a multistage approach to transfer learning, which makes use of similar datasets with different labelling, is evaluated. The proposed architecture gives high classification accuracy through spatial analysis. Among the supervised algorithms involved, the proposed solution seeks a better and more optimized way of classifying the fundus image with few pre-processing techniques. Deployed with dropout layers, the proposed architecture yields 78 percent accuracy.
Final publication: Severity Classification of Diabetic Retinopathy Using an Ensemble Learning Algorithm through Analyzing Retinal Images
Due to digitization, a huge volume of data is being generated across several sectors, such as healthcare, production, sales, IoT devices, the Web, and organizations. Machine learning algorithms are used to uncover patterns among the attributes of this data, so they can be used to make predictions on which medical practitioners and people at the managerial level can base executive decisions. Not all the attributes in the generated datasets are important for training machine learning algorithms: some attributes might be irrelevant, and some might not affect the outcome of the prediction. Ignoring or removing these irrelevant or less important attributes reduces the burden on machine learning algorithms. In this work, two prominent dimensionality reduction techniques, Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA), are investigated on four popular Machine Learning (ML) algorithms, Decision Tree Induction, Support Vector Machine (SVM), Naive Bayes Classifier and Random Forest Classifier, using the publicly available Cardiotocography (CTG) dataset from the University of California, Irvine Machine Learning Repository. The experimental results show that PCA outperforms LDA in all the measures, and that the performance of the Decision Tree and Random Forest classifiers is not affected much by using either PCA or LDA. To further analyze the performance of PCA and LDA, the experimentation was also carried out on the Diabetic Retinopathy (DR) and Intrusion Detection System (IDS) datasets. The results show that ML algorithms with PCA produce better results when the dimensionality of the datasets is high; when the dimensionality is low, the ML algorithms without dimensionality reduction yield better results.
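The PCA step compared above can be sketched with a plain SVD of the centered data matrix; this is a generic illustration on toy data, not the study's implementation:

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the top n_components principal directions."""
    Xc = X - X.mean(axis=0)                          # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T                  # scores in the reduced space

# Toy data: the second feature is (roughly) twice the first, so a single
# principal component captures almost all of the variance
X = np.array([[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]])
Z = pca_reduce(X, 1)
```

Note that the sign of each principal direction returned by the SVD is arbitrary, so only the magnitudes and relative positions of the scores are meaningful.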
Diabetic Retinopathy is a major cause of vision loss and blindness affecting millions of people across the globe. Although there are established screening methods - fluorescein angiography and optical coherence tomography - for detection of the disease, in the majority of cases patients remain ignorant and fail to undertake such tests at an appropriate time. Early detection of the disease plays an extremely important role in preventing the vision loss that is the consequence of diabetes mellitus remaining untreated among patients for a prolonged time period. Various machine learning and deep learning approaches have been applied to diabetic retinopathy datasets for classification and prediction of the disease, but the majority have neglected the aspects of data pre-processing and dimensionality reduction, leading to biased results. The dataset used in the present study is a diabetic retinopathy dataset collected from the UCI machine learning repository. At its inception, the raw dataset is normalized using the StandardScaler technique, and then Principal Component Analysis (PCA) is used to extract the most significant features in the dataset. Further, the Firefly algorithm is implemented for dimensionality reduction. This reduced dataset is fed into a Deep Neural Network model for classification. The results generated from the model are evaluated against the prevalent machine learning models, and the results justify the superiority of the proposed model in terms of accuracy, precision, recall, sensitivity and specificity.
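The StandardScaler normalization mentioned above is plain z-score scaling; a minimal sketch of what it computes per feature:

```python
def standardize(values):
    """Z-score normalization, (x - mean) / std, using the population standard
    deviation (ddof=0), matching scikit-learn's StandardScaler per feature."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

scaled = standardize([2.0, 4.0, 6.0, 8.0])  # zero mean, unit variance
```

After scaling, every feature has mean 0 and variance 1, which keeps PCA from being dominated by features that merely have larger numeric ranges.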
Automated medical image analysis is an emerging field of research that identifies disease with the help of imaging technology. Diabetic retinopathy (DR) is a retinal disease diagnosed in diabetic patients. Deep neural networks (DNN) are widely used to classify diabetic retinopathy from fundus images collected from suspected persons. The proposed DR classification system achieves a symmetrically optimized solution through the combination of a Gaussian mixture model (GMM), visual geometry group network (VGGNet), singular value decomposition (SVD) and principal component analysis (PCA), and softmax, for region segmentation, high-dimensional feature extraction, feature selection and fundus image classification, respectively. The experiments were performed using a standard KAGGLE dataset containing 35,126 images. The proposed VGG-19 DNN based DR model outperformed AlexNet and the spatial invariant feature transform (SIFT) in terms of classification accuracy and computational time. Utilization of PCA and SVD feature selection with fully connected (FC) layers demonstrated classification accuracies of 92.21%, 98.34%, 97.96%, and 98.13% for FC7-PCA, FC7-SVD, FC8-PCA, and FC8-SVD, respectively.
Deep learning emerges as a powerful tool for analyzing medical images, and retinal disease detection by computer-aided diagnosis from fundus images has emerged as a new method. We applied a deep learning convolutional neural network, using MatConvNet, for automated detection of multiple retinal diseases from the fundus photographs in the STructured Analysis of the REtina (STARE) database. The dataset was built by expanding data on 10 categories, including the normal retina and nine retinal diseases. The optimal outcomes were acquired by using random-forest transfer learning based on the VGG-19 architecture. The classification results depended greatly on the number of categories: as the number of categories increased, the performance of the deep learning models diminished. When all 10 categories were included, we obtained an accuracy of 30.5%, relative classifier information (RCI) of 0.052, and Cohen's kappa of 0.224. Considering three integrated categories - normal, background diabetic retinopathy, and dry age-related macular degeneration - the multi-categorical classifier showed an accuracy of 72.8%, 0.283 RCI, and 0.577 kappa. In addition, several ensemble classifiers enhanced the multi-categorical classification performance; transfer learning incorporated with an ensemble classifier using a clustering-and-voting approach presented the best performance, with an accuracy of 36.7%, 0.053 RCI, and 0.225 kappa on the 10-retinal-disease classification problem. First, due to the small size of the datasets, the deep learning techniques in this study were not effective enough to be applied in clinics, where numerous patients suffering from various types of retinal disorders visit for diagnosis and treatment. Second, we found that transfer learning incorporated with ensemble classifiers can improve classification performance for detecting multi-categorical retinal diseases. Further studies should confirm the effectiveness of the algorithms with large datasets obtained from hospitals.
Screening for vision threatening diabetic retinopathy by grading digital retinal images reduces the risk of blindness in people with diabetes. Computer-aided diagnosis can aid human graders to cope with this mounting problem. We propose to use a 10-layer convolutional neural network to automatically, simultaneously segment and discriminate exudates, haemorrhages and micro-aneurysms. Input image is normalized before segmentation. The net is trained in two stages to improve performance. On average, our net on 30,275,903 effective points achieved a sensitivity of 0.8758 and 0.7158 for exudates and dark lesions on the CLEOPATRA database. It also achieved a sensitivity of 0.6257 and 0.4606 for haemorrhages and micro-aneurysms. This study shows that it is possible to get a single convolutional neural network to segment these pathological features on a wide range of fundus images with reasonable accuracy.
Residual connections significantly boost the performance of deep neural networks. However, few theoretical results address the influence of residuals on the hypothesis complexity and the generalization ability of deep neural networks. This article studies the influence of residual connections on the hypothesis complexity of the neural network in terms of the covering number of its hypothesis space. We first present an upper bound of the covering number of networks with residual connections. This bound shares a similar structure with that of neural networks without residual connections. This result suggests that moving a weight matrix or nonlinear activation from the bone to a vine would not increase the hypothesis space. Afterward, an O(1/√N) margin-based multiclass generalization bound is obtained for ResNet, as an exemplary case of any deep neural network with residual connections. Generalization guarantees for similar state-of-the-art neural network architectures, such as DenseNet and ResNeXt, are straightforward. According to the obtained generalization bound, we should introduce regularization terms to control the magnitude of the norms of weight matrices not to increase too much, in practice, to ensure a good generalization ability, which justifies the technique of weight decay.
Introduction: Diabetic retinopathy (DR) is the most common diabetic eye disease worldwide and a leading cause of blindness. The number of diabetic patients will increase to 552 million by 2034, according to the International Diabetes Federation (IDF). Aim: With advances in computer science techniques such as artificial intelligence (AI) and deep learning (DL), opportunities for the detection of DR at the early stages have increased; this means that the chances of recovery will increase and the possibility of vision loss in patients will be reduced in the future. Methods: In this paper, deep transfer learning models for medical DR detection were investigated. The DL models were trained and tested on the Asia Pacific Tele-Ophthalmology Society (APTOS) 2019 dataset. According to literature surveys, this research is considered one of the first studies to use the APTOS 2019 dataset, as it was freshly published in the second quarter of 2019. The selected deep transfer models were AlexNet, ResNet18, SqueezeNet, GoogleNet, VGG16, and VGG19; these were selected as they consist of a small number of layers compared with larger models such as DenseNet and InceptionResNet. Data augmentation techniques were used to render the models more robust and to overcome the overfitting problem. Results: The testing accuracy and performance metrics, such as precision, recall, and F1 score, were calculated to prove the robustness of the selected models. The AlexNet model achieved the highest testing accuracy, at 97.9%, and the achieved performance metrics strengthened this result. Moreover, AlexNet has a minimal number of layers, which decreases the training time and the computational complexity.
First described by MacKenzie (1879), diabetic retinopathy (DR) is today the most common cause of blindness among persons of working age in most countries of the world. Prevention of DR depends on the early detection of the changes in morphology and the deterioration in the light sensitivity of the retina associated with this disease. To do this, highly informative methods of non-invasive retinal examination with predictive capabilities are needed. In this article, we propose an autonomous algorithm for such diagnostics, based on training an Artificial Neural Network (ANN) and preprocessing the image with an anisotropic diffusion filter. It not only detects pathologies but also provides a probabilistic evaluation of the possible variant of the disease.
The effort devoted to hand-crafting image classifiers has motivated the use of architecture search to discover them automatically. Reinforcement learning and evolution have both shown promise for this purpose. This study introduces a regularized version of a popular asynchronous evolutionary algorithm. We rigorously compare it to the non-regularized form and to a highly-successful reinforcement learning baseline. Using the same hardware, compute effort and neural network training code, we conduct repeated experiments side-by-side, exploring different datasets, search spaces and scales. We show regularized evolution consistently produces models with similar or higher accuracy, across a variety of contexts without need for re-tuning parameters. In addition, regularized evolution exhibits considerably better performance than reinforcement learning at early search stages, suggesting it may be the better choice when fewer compute resources are available. This constitutes the first controlled comparison of the two search algorithms in this context. Finally, we present new architectures discovered with regularized evolution that we nickname AmoebaNets. These models set a new state of the art for CIFAR-10 (mean test error = 2.13%) and mobile-size ImageNet (top-5 accuracy = 92.1% with 5.06M parameters), and reach the current state of the art for ImageNet (top-5 accuracy = 96.2%).
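The "regularized" (aging) evolution loop described in this abstract can be sketched in a few lines; the toy bit-string search space below is purely illustrative, not the paper's NASNet-style search space:

```python
import random
from collections import deque

def mutate(arch):
    """Flip one random bit of a bit-string 'architecture'."""
    i = random.randrange(len(arch))
    return arch[:i] + [1 - arch[i]] + arch[i + 1:]

def regularized_evolution(cycles=200, pop_size=20, sample_size=5, n_bits=8):
    """Aging evolution: each cycle, sample a few individuals, mutate the
    fittest of the sample, and always remove the OLDEST member of the
    population rather than the worst -- that age-based removal is the
    'regularization' the abstract refers to."""
    fitness = sum  # toy objective: number of ones in the bit string
    population = deque([random.randint(0, 1) for _ in range(n_bits)]
                       for _ in range(pop_size))
    best = max(population, key=fitness)
    for _ in range(cycles):
        parent = max(random.sample(list(population), sample_size), key=fitness)
        child = mutate(parent)
        population.append(child)
        population.popleft()  # age, not fitness, decides who dies
        if fitness(child) > fitness(best):
            best = child
    return best

random.seed(0)
best_arch = regularized_evolution()
```

Removing by age rather than by fitness prevents a single lucky early individual from dominating the population indefinitely, which is the intuition behind the method's robustness at early search stages.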