5th International Conference on Engineering and Applied
Natural Sciences
August 25-26, 2024 : Konya, Turkey
https://www.iceans.org/
© 2024 Published by All Sciences Academy
A Novel SE-Xception Model for Robust Classification of Garbage
Ishak Pacal 1*
1 Department of Computer Engineering, Faculty of Engineering, Igdir University, TURKEY
*(ishak.pacal@igdir.edu.tr) Email of the corresponding author
Abstract: In this study, we propose a novel deep learning model, SE-Xception, which integrates
Squeeze-and-Excitation (SE) blocks into the Xception architecture for solid waste classification. The
model is designed to address the growing global challenge of waste accumulation by improving the
classification accuracy of waste materials. On a publicly available dataset, known as the Garbage
dataset, SE-Xception demonstrates a substantial improvement in performance by amplifying the
importance of key features, such as the correlation of feature maps and channel-wise attention. Our results
validate the effectiveness of the SE-Xception model, which achieves an accuracy of 98.90%, grounded
in experimental validation rather than theoretical assumptions. The model also achieves a precision of
0.9867, a recall of 0.9879, and an F1-score of 0.9873. These outcomes highlight the SE-Xception model's
potential as a robust solution for waste classification problems and suggest its capacity to contribute
to mitigating future climate challenges.
Keywords: CNN, Deep Learning, Classification, Xception
I. INTRODUCTION
In recent years, waste generation has increased greatly worldwide due to population growth, rapid
urbanization, and changes in lifestyle. Approximately 2.01 billion tons of solid waste was produced
worldwide in 2016, and this amount is expected to reach 2.59 billion and 3.40 billion tons by 2030 and
2050, respectively [1]. This increase brings environmental problems and prevents sustainable growth [2].
Waste recognition methods offer solutions using automatic sorting devices that promote recycling by
classifying waste made of plastic, glass, or aluminum [3], [4], [5]. More than 33% of waste is not
managed in an environmentally sound manner and is dumped illegally at roadsides or on
abandoned land [6]. Illegal dumping of this kind causes water and air pollution and soil degradation
and poses a serious threat to public health [7], [8].
Waste classification is a crucial part of waste management. The difficulty stems from the fact that
standards differ between regions. In addition, manual classification is too costly for
companies [9], [10], [11], [12]. Machine learning and deep learning techniques have been used to
overcome these disadvantages, and the resulting technologies have brought drastic changes to the waste
management sector. Advanced technologies such as the Internet of Things (IoT), information sciences,
and machine learning (ML) have increased the productivity of waste estimation, collection,
transportation, separation, and recycling processes [13], [14]. Deep learning also supports the
efficient treatment of environmental concerns, for example by reducing air pollution and enabling
proper waste management procedures [15]. The application and further development of deep learning
techniques over the last few years have contributed to significant progress in waste processing and
recycling. The arrival of novel network architectures such as ResNet and EfficientNet, which are
essential components of algorithms that perform classification through segmentation, indicates a real
turn toward automating the processing of not only recyclable but also hazardous substances [16], [17].
These deep learning advances serve environmental sustainability by allowing the accurate
classification of waste with different shapes, textures, and recycling properties [3], [18]. Deep
learning methods have proved effective in recognizing and categorizing household trash. Wang showed
that the VGG16 model can identify and classify household garbage [19]. Additionally, Sousa et al.
highlighted that hierarchical deep learning approaches outperform traditional methods for waste
detection and classification in food trays [20]. Robots (for example, the Lego EV3) can be used to
automate many kinds of tasks [21]. Othman et al. proposed an application that automatically classifies
and collects garbage with a Lego EV3 robot kit [3].
Contactless methods and technologies are gaining significance in response to health crises such as
COVID-19, which interrupted most traditional waste collection. The deep learning approaches used in
these technologies help to reduce the need for human labor and support the environment by making
waste classification and detection more efficient [22], [23]. In this area, YOLO and R-CNN algorithms
provide higher accuracy and faster processing of camera data for identifying and categorizing waste
materials [24]. In addressing environmental issues, which are often intertwined with social and
economic dimensions, technology plays a dual role: it can both contribute to and mitigate these
problems. Nevertheless, technological innovations and the application of deep learning techniques in
waste management have the potential to address environmental challenges effectively while also
improving economic outcomes [1], [25], [26].
II. MATERIALS AND METHOD
A. Dataset
Datasets are the cornerstone of training deep learning models because these models need large amounts
of data to learn complex patterns and relationships [27]. The quality of a dataset directly affects the
accuracy, generalizability, and performance of the model. Diverse and comprehensive datasets enable
deep learning models to succeed in different scenarios. Additionally, large datasets allow the model to
learn deeper and more complex representations, while accurately labeled datasets help reduce error
rates. The selection of appropriate and comprehensive datasets is therefore of great importance for
researchers and developers working in deep learning, because such datasets increase the model's
chances of success and strengthen artificial intelligence applications [28].
Fig. 1 Example images from Garbage Dataset
This study focuses on the classification of solid waste. The Garbage dataset [29] consists of 10
categories of solid waste [30]. These categories include materials such as metal, glass, and
biological waste, as seen in Fig. 1.
Table 1. Image counts of garbage classes by split
Class       Training (70%)  Validation (15%)  Test (15%)
Battery     661             142               142
Biological  689             148               148
Cardboard   1639            351               351
Clothes     3727            799               799
Glass       2867            615               615
Metal       1309            280               280
Paper       1909            409               409
Plastic     1780            381               381
Shoes       1383            297               297
Trash       584             125               125
As presented in Table 1, each category is divided into training (70%), validation (15%), and test (15%)
subsets to obtain realistic and reliable results. The trash category has the fewest images (834 in
total), and the clothes category has the most (5325 in total).
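The per-class counts in Table 1 can be checked with a few lines of arithmetic. The sketch below assumes one particular rounding rule: validation and test each receive 15% of a class, rounded to the nearest integer, and training receives the remainder. The paper does not state how the split was computed, so `split_counts` is a hypothetical helper that merely happens to reproduce the table.

```python
# Class totals obtained by summing the three columns of Table 1.
CLASS_TOTALS = {
    "Battery": 945, "Biological": 985, "Cardboard": 2341, "Clothes": 5325,
    "Glass": 4097, "Metal": 1869, "Paper": 2727, "Plastic": 2542,
    "Shoes": 1977, "Trash": 834,
}


def split_counts(total, holdout_percent=15):
    """Return (train, validation, test) sizes for one class.

    Assumed rule (not stated in the paper): validation and test each take
    holdout_percent of the class, rounded half up; training takes the rest.
    """
    holdout = (holdout_percent * total + 50) // 100  # integer round-half-up
    return total - 2 * holdout, holdout, holdout


for name, total in CLASS_TOTALS.items():
    train, val, test = split_counts(total)
    print(f"{name:<11}{train:>6}{val:>6}{test:>6}")
```

Using integer arithmetic for the rounding avoids floating-point edge cases when a class size lands exactly on a half.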
B. Deep Learning
Deep learning is a subdiscipline that has attracted great interest in the field of artificial intelligence in
recent years [31]. This technique is capable of learning from complex data sets using artificial neural
networks that mimic the way the human brain works. Deep learning techniques have shown great success,
especially in areas such as image recognition, natural language processing, and game strategy
development. Deep learning models are often fed large amounts of data and discover patterns and
relationships through this data on their own [32]. In this process, multilayer artificial neural networks
carry out the learning process by processing input data. During the training phase, training data is
presented to the network with thousands or millions of samples, and the network adjusts its parameters
through this data. The model learns to predict correct answers and can produce results with accuracy that
often exceeds human performance. Deep learning gives effective results, especially in large data sets and
complex problems. However, the use of these techniques requires high computational power and large
data sets [33]. Today, deep learning is accepted as an important tool in many fields, from medical
diagnostics to the automotive industry. Therefore, it is of great importance for those working in the field
of artificial intelligence to learn and apply deep learning techniques, as this technology has the potential
to revolutionize many fields in the future.
C. Xception Architecture
The Xception architecture [34] handles image data efficiently through depthwise separable
convolutions, which are well known for their efficiency. The model consists of 36 convolutional layers
in total, grouped into 14 blocks. Every block employs depthwise and pointwise convolutions to process
spatial and channel-wise information separately, which significantly reduces the model's computational
load while preserving quality. A key strength of the architecture is its ability to handle difficult
image classification tasks on large and varied datasets.
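To make the saving concrete, the following sketch counts the weights of a standard convolution and of a depthwise separable one. The kernel size and channel counts are illustrative assumptions rather than values taken from the paper, and biases are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a regular convolution: one kernel x kernel x c_in filter per output channel."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise separable convolution with a depth multiplier of 1."""
    depthwise = kernel * kernel * c_in  # one spatial filter per input channel
    pointwise = c_in * c_out            # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative sizes (assumed): 3x3 kernel, 128 input channels, 256 output channels.
std = standard_conv_params(3, 128, 256)
sep = separable_conv_params(3, 128, 256)
print(std, sep, round(std / sep, 2))
```

For these sizes the separable layer needs 33,920 weights instead of 294,912, which is the kind of reduction the paragraph above refers to.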
Fig. 2 depicts the Xception architecture. Initially, the data passes through the entry flow, followed by
the middle flow, which is repeated eight times and concludes with the exit flow. It is important to note
that each Convolution and Separable Convolution layer is accompanied by batch normalization.
Additionally, all Separable Convolution layers employ a depth multiplier of 1, meaning there is no depth
expansion.
Fig. 2 Xception architecture [34]
D. SE-Net
The Squeeze-and-Excitation Network (SE-Net) was proposed by Hu et al. [35] to enhance the performance
of convolutional neural networks (CNNs). Its primary goal is to dynamically rescale the importance of
feature maps and thereby improve the representational power of the network. SE-Net consists of three
main steps: squeeze, excitation, and scaling. In the squeeze step, each feature map undergoes global
average pooling to obtain channel-wise summary information. Next, these summary statistics pass
through two fully connected layers, ending in a sigmoid activation, to compute scaling coefficients.
Finally, these coefficients are applied to the feature maps to emphasize important features and
suppress others. The key idea of SE-Net is that it learns the feature weights adaptively, which lets
the network focus on the most informative features.
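The three steps can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used in the paper: the function name `se_block`, the nested-list tensor layout, and the toy weights are all assumptions, and the ReLU on the first fully connected layer follows Hu et al. [35].

```python
import math


def se_block(feature_maps, w1, b1, w2, b2):
    """Apply squeeze, excitation, and scaling to a list of C feature maps (each H x W)."""
    # Squeeze: global average pooling gives one summary value per channel.
    z = [sum(map(sum, fmap)) / (len(fmap) * len(fmap[0])) for fmap in feature_maps]

    # Excitation, layer 1: fully connected layer with ReLU (reduces C to C/r).
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z)) + b)
              for row, b in zip(w1, b1)]

    # Excitation, layer 2: fully connected layer with sigmoid (restores C outputs).
    scales = [1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(row, hidden)) + b)))
              for row, b in zip(w2, b2)]

    # Scaling: multiply every value of a channel by that channel's coefficient.
    scaled = [[[s * v for v in row] for row in fmap]
              for fmap, s in zip(feature_maps, scales)]
    return scaled, scales


# Toy input (assumed): 2 channels of size 2 x 2 and a bottleneck of 1 unit.
maps = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 8.0]]]
w1, b1 = [[0.5, 0.5]], [0.0]            # shape (C/r, C) = (1, 2)
w2, b2 = [[1.0], [-1.0]], [0.0, 0.0]    # shape (C, C/r) = (2, 1)
scaled, scales = se_block(maps, w1, b1, w2, b2)
print(scales)
```

With these toy weights the first channel is kept almost unchanged while the second is strongly suppressed, which is the emphasize-and-suppress behavior described above.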
E. Proposed Model (SE-Xception)
Garbage classification is an important topic for managing waste efficiently and protecting the
environment. Image recognition plays an important role in the waste sorting task, so the demand for
highly accurate and efficient image classification systems is high. We therefore present a new variant
of the Xception design, called SE-Xception, that adds Squeeze-and-Excitation (SE) blocks to give the
network dynamic feature recalibration. In SE-Xception, the SE blocks are embedded as units in the
convolutional structure of the base Xception model. These changes are intended as a step toward
longer-term advances that could benefit other tasks in the future [36].
Hu et al. [35] introduced the Squeeze-and-Excitation (SE) block in 2017 as a way for a CNN to adapt by
recalibrating its feature responses channel by channel. A convolution layer typically treats all
channels equally when it applies its 2D kernels across them. The SE block, in contrast, computes a
separate weight for each channel based on its context. It first averages each channel over its
spatial dimensions, which yields a vector of size n, where n is the number of channels of the input
tensor. This vector is then fed through a two-layer neural network that captures the interactions
among the channels and produces another vector of size n containing the per-channel weights. Finally,
the SE block uses these weights to scale the channels, which highlights the most important features
while de-emphasizing the less important ones; the effects are shown in the image, whose generated
features are estimated using the Euler equations.
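In the notation of Hu et al. [35], the three steps can be written compactly. Here $u_c$ is the $c$-th input feature map of size $H \times W$, $\delta$ is the ReLU function, $\sigma$ is the sigmoid, and $W_1 \in \mathbb{R}^{\frac{n}{r} \times n}$ and $W_2 \in \mathbb{R}^{n \times \frac{n}{r}}$ are the weights of the two fully connected layers. The reduction ratio $r$ is a hyperparameter whose value is not stated in this paper.

```latex
\begin{align}
z_c &= \frac{1}{H W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j), \\
s &= \sigma\bigl(W_2 \, \delta(W_1 z)\bigr), \\
\tilde{x}_c &= s_c \, u_c .
\end{align}
```

The first line is the squeeze step, the second is the excitation step, and the third is the channel-wise scaling.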
We therefore propose the SE-Xception variant, in which SE blocks appear after both depthwise and
pointwise convolution layers, as Algorithm 1 describes. This placement is important because it
recalibrates the feature maps immediately after the convolutions produce them, so that the network can
better capture subtle inter-channel relationships and interpret the features more effectively.
Algorithm 1 details the implementation of the SE-Xception model, which integrates Squeeze-and-
Excitation (SE) blocks into the Xception framework. These SE blocks operate by applying global average
pooling to the output of convolutional layers, generating a compressed channel descriptor. This descriptor
is processed through two fully connected layers: the first reduces dimensionality to focus on the most
important features, while the second restores the dimensions and uses a sigmoid activation to produce
adaptive weights. These weights adjust the original convolutional features, enhancing the most
informative ones and suppressing less relevant features. The SE-Xception model leverages the Xception
architecture’s depthwise and pointwise convolutions, paired with SE blocks, to optimize feature
recalibration. Each SE block begins with global average pooling, which condenses spatial information
into a scalar representing key global features for each channel. The model employs fully connected layers
to refine feature focus and apply sigmoid activations for recalibration, adjusting convolutional outputs to
prioritize relevant data. Batch normalization and ReLU activations are used to normalize inputs and
introduce nonlinearity, enhancing pattern recognition in complex datasets. Residual connections facilitate
deeper learning without degrading performance. This integration ensures SE-Xception is not only
effective in classifying diverse waste types but also computationally efficient, making it suitable for real-
world waste management applications.
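The placement of the SE blocks can be illustrated with a small layer plan. The sketch below is only a schematic under stated assumptions: the layer descriptors, the helper `add_se_blocks`, and the position of the SE blocks relative to batch normalization are hypothetical, since Algorithm 1 is not reproduced here. The width of 728 channels is the middle-flow width of the original Xception [34].

```python
# Hypothetical layer descriptors: (kind, channels). The names are illustrative only.
def add_se_blocks(layers):
    """Return a new layer plan with an SE block after each depthwise or pointwise conv."""
    plan = []
    for kind, channels in layers:
        plan.append((kind, channels))
        if kind in ("depthwise_conv", "pointwise_conv"):
            plan.append(("se_block", channels))
    return plan


# One separable convolution of the Xception middle flow, split into its two parts.
separable_unit = [
    ("relu", 728),
    ("depthwise_conv", 728),
    ("pointwise_conv", 728),
    ("batch_norm", 728),
]

se_unit = add_se_blocks(separable_unit)
for kind, channels in se_unit:
    print(f"{kind:<15} {channels}")
```

Each separable convolution therefore gains two recalibration points, one after the spatial filtering and one after the channel mixing.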
III. RESULTS
This section addresses a key concern: how several deep learning models compare on the waste
separation task. We compare our SE-Xception model with other state-of-the-art models, namely
Res2NeXt50, Xception, EfficientNet-Medium, ViT-Base, Swin-Base, and DeiT-Base. The evaluation metrics
are accuracy, precision, recall, and F1-score. Together, these measures give a comprehensive
assessment of each model's ability to classify waste and reveal both strengths and weaknesses. The
outcomes underline the success of the proposed SE-Xception model, which surpassed the other models on
nearly all metrics.
Table 2 reports the precision, recall, and F1-score of the models on the waste classification task.
The proposed SE-Xception model achieves the highest precision and F1-score, and its recall (0.9879) is
within 0.0001 of the best value (0.9880, EfficientNet-Medium), demonstrating that it is very well
suited to trash categorization.
Table 2. Performance of the models
Model                     Precision  Recall  F1-score
Res2NeXt50 [37]           0.9772     0.9737  0.9753
Xception [34]             0.9774     0.9724  0.9746
EfficientNet-Medium [17]  0.9841     0.9880  0.9860
ViT-Base [38]             0.9684     0.9687  0.9684
Swin-Base [39]            0.9820     0.9825  0.9822
DeiT-Base [40]            0.9802     0.9750  0.9774
Proposed Model            0.9867     0.9879  0.9873
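The ranking implied by Table 2 can be read off programmatically. The values below are copied from the table; the helper name `best_per_metric` is an assumption made for this illustration.

```python
# Values copied from Table 2: (precision, recall, F1-score).
TABLE_2 = {
    "Res2NeXt50": (0.9772, 0.9737, 0.9753),
    "Xception": (0.9774, 0.9724, 0.9746),
    "EfficientNet-Medium": (0.9841, 0.9880, 0.9860),
    "ViT-Base": (0.9684, 0.9687, 0.9684),
    "Swin-Base": (0.9820, 0.9825, 0.9822),
    "DeiT-Base": (0.9802, 0.9750, 0.9774),
    "Proposed Model": (0.9867, 0.9879, 0.9873),
}
METRICS = ("precision", "recall", "f1")


def best_per_metric(table):
    """Map each metric name to the (model, value) pair with the highest value."""
    best = {}
    for index, metric in enumerate(METRICS):
        model = max(table, key=lambda name: table[name][index])
        best[metric] = (model, table[model][index])
    return best


for metric, (model, value) in best_per_metric(TABLE_2).items():
    print(f"{metric:<9} {model:<20} {value:.4f}")
```

On these numbers the proposed model leads on precision and F1-score, while EfficientNet-Medium has the highest recall by a margin of 0.0001.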
Res2NeXt50 achieves an accuracy of 0.9780, with precision, recall, and F1-score of 0.9772, 0.9737, and
0.9753, respectively. These are solid values across all three metrics. Nonetheless, its results are
not quite on par with those of the stronger architectures; improvement is possible, particularly in
how it extracts and classifies features. The Xception model, which is the basis of our model, has a
slightly better accuracy of 0.9803. However, its precision and recall of 0.9774 and 0.9724 show that
it also has some classification problems. This suggests that Xception is a good model that may be
enhanced by methods such as adding SE blocks. EfficientNet-Medium stands out as the best baseline,
with an accuracy of 0.9870, a precision of 0.9841, and a recall of 0.9880, resulting in an F1-score of
0.9860. This model combines compactness with accurate computation and shows very strong feature
extraction and reliable classification. Its very high recall is especially notable: it indicates that
the model misses few true positives, which makes it a strong option for garbage classification tasks.
ViT-Base, one of the Vision Transformer models, yields an accuracy of 0.9735. Although its accuracy is
not far below that of EfficientNet-Medium, its precision and recall of around 0.968 suggest that it
does not recognize all classes consistently. The Vision Transformer mechanism is robust, but it may
require further enhancement to deal effectively with specific garbage classification problems.
Swin-Base, another transformer-based model, shows robust performance with an accuracy of 0.9850. Its
precision, recall, and F1-score are all at or above 0.982, demonstrating high consistency and
reliability in its classifications. The architecture of Swin-Base, designed to manage hierarchical
features, proves to be very effective in this context. The DeiT-Base model achieves a strong accuracy
of 0.9817, with precision and recall values of 0.9802 and 0.9750, respectively. Its F1-score of 0.9774
indicates that it performs well in the garbage classification task but is surpassed by more advanced
models. While DeiT-Base performs commendably, there is room for improvement, particularly with more
sophisticated feature recalibration techniques.
Fig. 3 Accuracy and F1-score of models
Our proposed SE-Xception model outperforms every other model with the highest accuracy, 0.9890.
It achieves precision, recall, and F1-score values of 0.9867, 0.9879, and 0.9873, respectively.
The model's accuracy is therefore remarkable, and it can reliably identify and classify the different
categories of trash (Fig. 3). Integrating SE blocks into the Xception architecture allows the model to
adjust the importance of individual features dynamically, which improves overall performance
considerably. The findings clearly show that the proposed SE-Xception model is the best-performing one
for garbage classification, as it obtains the highest accuracy, precision, and F1-score among the
evaluated models. This result is strong evidence that employing SE blocks in the Xception framework is
an important step toward increasing its effectiveness in recognizing and sorting the varied images of
the Garbage dataset. The interplay of Xception and SE blocks is the main reason for the model's high
success: their combination contributes most to capturing details and to sorting images of objects
such as a cell phone charger and a Coke can.
Fig. 4 Confusion matrix of proposed model
The confusion matrix in Fig. 4 shows a high proportion of correct predictions, indicating that the
model predicts each class accurately.
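The four reported metrics can all be derived from a confusion matrix. The sketch below assumes macro averaging over classes, which the paper does not state explicitly, and it uses a small made-up matrix rather than the values of Fig. 4.

```python
def classification_metrics(confusion):
    """Compute accuracy and macro-averaged precision, recall, and F1-score.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(map(sum, confusion))
    correct = sum(confusion[i][i] for i in range(n))
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        predicted = sum(confusion[i][k] for i in range(n))  # column sum
        actual = sum(confusion[k])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical 3-class confusion matrix, not taken from Fig. 4.
toy = [
    [48, 1, 1],
    [2, 45, 3],
    [0, 4, 46],
]
print(classification_metrics(toy))
```

Because the F1-score is averaged per class, it need not equal the harmonic mean of the averaged precision and recall.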
IV. DISCUSSION
The aim of this research was to propose a novel model, SE-Xception, designed to classify waste
effectively by integrating the Squeeze-and-Excitation (SE) blocks into the Xception architecture. Our
primary research question focused on comparing SE-Xception with several state-of-the-art deep learning
models, including Res2NeXt50, Xception, EfficientNet-Medium, ViT-Base, Swin-Base, and DeiT-Base.
We evaluated these models using four traditional performance indicators: accuracy, precision, recall, and
F1-score, which collectively provide a comprehensive assessment of the waste sorting models'
effectiveness. Our findings indicate that SE-Xception outperformed the other models across nearly all test
metrics. Notably, it achieved an accuracy of 98.90%, with precision, recall, and F1-score values of
98.67%, 98.79%, and 98.73%, respectively. These results underscore the effectiveness of integrating SE
blocks within the Xception framework, enabling the model to learn features more accurately and improve
classification performance.
The comparison highlighted several strengths of SE-Xception. It demonstrated high reliability in
correctly identifying and sorting various types of solid waste, largely due to the dynamic feature
recalibration capabilities introduced by the SE blocks. This enhancement allowed SE-Xception to
outperform not only the Xception model alone but also newer architectures such as Swin-Base and
DeiT-Base. Furthermore, the integration of SE blocks with the Xception model has proven to be an
effective solution to the complex and variable challenges inherent in waste classification, enhancing
the network's recognition capability. These findings suggest that SE-Xception has significant potential
to contribute to environmental sustainability by improving waste sorting accuracy, thereby reducing the
burden of waste and promoting more effective waste management practices.
V. CONCLUSION
In this study, we introduced the SE-Xception model, a deep learning model designed for the detection
and classification of waste. Among the models evaluated, SE-Xception proved the most effective,
achieving the highest accuracy, precision, and F1-score and a recall within 0.0001 of the best value.
This performance improvement is largely attributed to SE-Net's adaptive feature recalibration, which
is integrated into the Xception architecture. The application of SE-Xception represents a significant
advance in waste classification, which in turn offers substantial environmental benefits. Future
research could explore several directions to enhance the SE-Xception model further. These include
refining the model to reduce the remaining errors and applying hyperparameter optimization and data
augmentation strategies. Additionally, SE-Xception has potential applications in waste management and
environmental monitoring, where it could play a crucial role in preventive management. With its
strengths derived from the combination of SE-Net and Xception, our proposed model contributes to the
ongoing development of deep learning methodologies aimed at tackling real-world environmental
challenges.
REFERENCES
[1] S. Shahab, M. Anjum, and M. S. Umar, ‘Deep Learning Applications in Solid Waste Management: A Deep Literature
Review’, International Journal of Advanced Computer Science and Applications, vol. 13, no. 3, pp. 381–395, 2022, doi:
10.14569/IJACSA.2022.0130347.
[2] A. S. Nagpure, ‘Assessment of quantity and composition of illegal dumped municipal solid waste (MSW) in Delhi’,
Resour Conserv Recycl, vol. 141, pp. 54–60, Feb. 2019, doi: 10.1016/j.resconrec.2018.10.012.
[3] Z. Othman et al., ‘Comparison on Cloud Image Classification for Thrash Collecting LEGO Mindstorms EV3 Robot’,
International Journal of Human and Technology Interaction (IJHaTI), vol. 2, no. 1, pp. 29–33, 2018.
[4] G. Mittal, K. B. Yagnik, M. Garg, and N. C. Krishnan, ‘SpotGarbage: Smartphone app to detect garbage using deep
learning’, in UbiComp 2016 - Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous
Computing, Association for Computing Machinery, Inc, Sep. 2016, pp. 940–945. doi: 10.1145/2971648.2971731.
[5] T. Gupta et al., ‘A deep learning approach based hardware solution to categorise garbage in environment’, Complex and
Intelligent Systems, vol. 8, no. 2, pp. 1129–1152, Apr. 2022, doi: 10.1007/s40747-021-00529-0.
[6] C. Magazzino, M. Mele, and N. Schneider, ‘The relationship between municipal solid waste and greenhouse gas
emissions: Evidence from Switzerland’, Waste Management, vol. 113, pp. 508–520, Jul. 2020, doi:
10.1016/j.wasman.2020.05.033.
[7] M. Triassi, R. Alfano, M. Illario, A. Nardone, O. Caporale, and P. Montuori, ‘Environmental Pollution from Illegal Waste
Disposal and Health Effects: A Review on the “Triangle of Death”’, Int J Environ Res Public Health, vol. 12, no. 2, pp.
1216–1236, Jan. 2015, doi: 10.3390/ijerph120201216.
[8] D. H. F. da Paz, K. P. V. Lafayette, M. J. de O. Holanda, M. do C. M. Sobral, and L. A. R. de C. Costa, ‘Assessment of
environmental impact risks arising from the illegal dumping of construction waste in Brazil’, Environ Dev Sustain, vol.
22, no. 3, pp. 2289–2304, Mar. 2020, doi: 10.1007/s10668-018-0289-6.
[9] X. Shen, Y. Wu, S. Chen, and X. Luo, ‘An Intelligent Garbage Sorting System Based on Edge Computing and Visual
Understanding of Social Internet of Vehicles’, Mobile Information Systems, vol. 2021, pp. 1–12, Aug. 2021, doi:
10.1155/2021/5231092.
[10] G. Yang et al., ‘Garbage Classification System with YOLOV5 Based on Image Recognition’, in 2021 IEEE 6th
International Conference on Signal and Image Processing (ICSIP), IEEE, Oct. 2021, pp. 11–18. doi:
10.1109/ICSIP52628.2021.9688725.
[11] F. Liu, H. Xu, M. Qi, D. Liu, J. Wang, and J. Kong, ‘Depth-Wise Separable Convolution Attention Module for Garbage
Image Classification’, Sustainability, vol. 14, no. 5, p. 3099, Mar. 2022, doi: 10.3390/su14053099.
[12] Z. Lv, H. Li, and Y. Liu, ‘Garbage detection and classification method based on YoloV5 algorithm’, in Fourteenth
International Conference on Machine Vision (ICMV 2021), W. Osten, D. Nikolaev, and J. Zhou, Eds., SPIE, Mar. 2022,
p. 2. doi: 10.1117/12.2622439.
[13] P. Nowakowski and T. Pamuła, ‘Application of deep learning object classifier to improve e-waste collection planning’,
Waste Management, vol. 109, pp. 1–9, May 2020, doi: 10.1016/j.wasman.2020.04.041.
[14] Md. W. Rahman, R. Islam, A. Hasan, N. I. Bithi, Md. M. Hasan, and M. M. Rahman, ‘Intelligent waste management
system using deep learning with IoT’, Journal of King Saud University - Computer and Information Sciences, vol. 34, no.
5, pp. 2072–2087, May 2022, doi: 10.1016/j.jksuci.2020.08.016.
[15] Z. Ye, J. Yang, N. Zhong, X. Tu, J. Jia, and J. Wang, ‘Tackling environmental challenges in pollution controls using
artificial intelligence: A review’, Science of The Total Environment, vol. 699, p. 134279, Jan. 2020, doi:
10.1016/j.scitotenv.2019.134279.
[16] S. Ioffe and C. Szegedy, ‘Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate
Shift’, Feb. 2015.
[17] M. Tan and Q. V Le, ‘EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks’, in International
conference on machine learning, 2019, pp. 6105–6114.
[18] M. Zeng, X. Lu, W. Xu, T. Zhou, and Y. Liu, ‘PublicGarbageNet : A Deep Learning Framework for Public Garbage
Classification’, in 2020 39th Chinese Control Conference (CCC), IEEE, Jul. 2020, pp. 7200–7205. doi:
10.23919/CCC50068.2020.9189561.
[19] H. Wang, ‘Garbage Recognition and Classification System Based on Convolutional Neural Network VGG16’, in 2020
3rd International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE),
IEEE, Apr. 2020, pp. 252–255. doi: 10.1109/AEMCSE50948.2020.00061.
[20] J. Sousa, A. Rebelo, and J. S. Cardoso, ‘Automation of Waste Sorting with Deep Learning’, in 2019 XV Workshop de
Visão Computacional (WVC), IEEE, Sep. 2019, pp. 43–48. doi: 10.1109/WVC.2019.8876924.
[21] İ. Kunduracioğlu, ‘Examining the Interface of Lego Mindstorms Ev3 Robot Programming’, 2018. doi:
10.31681/jetol.372826.
[22] S. Kumar, D. Yadav, H. Gupta, O. P. Verma, I. A. Ansari, and C. W. Ahn, ‘A Novel YOLOv3 Algorithm-Based Deep
Learning Approach for Waste Segregation: Towards Smart Waste Management’, Electronics (Basel), vol. 10, no. 1, p. 14,
Dec. 2020, doi: 10.3390/electronics10010014.
[23] H. Deng, D. Ergu, F. Liu, B. Ma, and Y. Cai, ‘An Embeddable Algorithm for Automatic Garbage Detection Based on
Complex Marine Environment’, Sensors, vol. 21, no. 19, p. 6391, Sep. 2021, doi: 10.3390/s21196391.
[24] J. Peng et al., ‘TPM: Multiple object tracking with tracklet-plane matching’, Pattern Recognit, vol. 107, p. 107480, Nov.
2020, doi: 10.1016/j.patcog.2020.107480.
[25] P. Aela, L. Zong, M. Esmaeili, M. Siahkouhi, and G. Jing, ‘Angle of repose in the numerical modeling of ballast particles
focusing on particle-dependent specifications: Parametric study’, Particuology, vol. 65, pp. 39–50, Jun. 2022, doi:
10.1016/j.partic.2021.06.006.
[26] Y. Tong, J. Liu, and S. Liu, ‘China is implementing “Garbage Classification” action’, Environmental Pollution, vol. 259,
p. 113707, Apr. 2020, doi: 10.1016/j.envpol.2019.113707.
[27] I. Kunduracioglu and I. Pacal, ‘Advancements in deep learning for accurate classification of grape leaves and diagnosis of
grape diseases’, Journal of Plant Diseases and Protection, Jun. 2024, doi: 10.1007/s41348-024-00896-z.
[28] I. Pacal, ‘MaxCerVixT: A novel lightweight vision transformer-based Approach for precise cervical cancer detection’,
Knowl Based Syst, vol. 289, Apr. 2024, doi: 10.1016/j.knosys.2024.111482.
[29] Suman Kunwar, ‘Garbage Dataset’, https://www.kaggle.com/datasets/sumn2u/garbage-classification-v2.
[30] S. Kunwar, ‘Managing Household Waste through Transfer Learning’, Jan. 2024.
[31] I. Pacal, O. Celik, B. Bayram, and A. Cunha, ‘Enhancing EfficientNetv2 with global and efficient channel attention
mechanisms for accurate MRI-Based brain tumor classification’, Cluster Comput, May 2024,
doi: 10.1007/s10586-024-04532-1.
[32] I. Pacal, M. Alaftekin, and F. D. Zengul, ‘Enhancing Skin Cancer Diagnosis Using Swin Transformer with Hybrid
Shifted Window-Based Multi-head Self-attention and SwiGLU-Based MLP’, Journal of Imaging Informatics in
Medicine, Jun. 2024, doi: 10.1007/s10278-024-01140-8.
[33] I. Pacal, ‘A novel Swin transformer approach utilizing residual multi-layer perceptron for diagnosing brain tumors in
MRI images’, International Journal of Machine Learning and Cybernetics, Mar. 2024,
doi: 10.1007/s13042-024-02110-w.
[34] F. Chollet, ‘Xception: Deep Learning with Depthwise Separable Convolutions’, in Proceedings of the IEEE conference
on computer vision and pattern recognition, Oct. 2017, pp. 1251–1258.
[35] J. Hu, L. Shen, S. Albanie, G. Sun, and E. Wu, ‘Squeeze-and-Excitation Networks’, in Proceedings of the IEEE
conference on computer vision and pattern recognition, Sep. 2018, pp. 7132–7141.
[36] İ. Paçal and İ. Kunduracıoğlu, ‘Data-Efficient Vision Transformer Models for Robust Classification of Sugarcane’,
Journal of Soft Computing and Decision Analytics, vol. 2, no. 1, pp. 258–271, Jun. 2024, doi: 10.31181/jscda21202446.
[37] S.-H. Gao, M.-M. Cheng, K. Zhao, X.-Y. Zhang, M.-H. Yang, and P. Torr, ‘Res2Net: A New Multi-scale Backbone
Architecture’, Apr. 2019, doi: 10.1109/TPAMI.2019.2938758.
[38] A. Dosovitskiy et al., ‘An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale’, Oct. 2020.
[39] Z. Liu et al., ‘Swin Transformer: Hierarchical Vision Transformer using Shifted Windows’, Mar. 2021.
[40] H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, and H. Jégou, ‘Training data-efficient image transformers &
distillation through attention’, in International conference on machine learning, 2021, pp. 10347–10357.