Fig 2 - uploaded by Abdelaziz Abohamama
Source publication
Recently, deep neural networks (DNNs) have been used successfully in many fields, particularly in medical diagnosis. However, deep learning (DL) models are expensive in terms of memory and computing resources, which hinders their deployment on resource-limited devices or in delay-sensitive systems. Therefore, these deep models need to be acc...
Contexts in source publication
Context 1
... L_CE is the cross-entropy loss, α is a balancing hyper-parameter, y is the one-hot vector of ground truths, σ is the softmax function, Z_S and Z_T are the student and teacher models' output logits, respectively, while T is the temperature hyper-parameter. Fig. 2 shows the steps of the knowledge distillation process. The student model is employed to mimic the generalization ability of a teacher model. The teacher network's class prediction probabilities are used as "soft targets" for training the student model. Generally, the same training set is used for transferring knowledge, although a ...
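To make the loss described above concrete, the following is a minimal sketch of a Hinton-style distillation loss in TensorFlow/Keras. It assumes the common formulation with a balancing weight alpha and temperature T; the function name and default values are illustrative and not taken from the paper.

```python
import tensorflow as tf

def distillation_loss(y_true, student_logits, teacher_logits, alpha=0.1, T=5.0):
    """Weighted sum of the hard-label cross-entropy and the soft-target term.

    alpha is the balancing hyper-parameter and T the temperature, matching the
    symbol definitions quoted above; the default values are placeholders.
    """
    # Hard-label term: cross-entropy between one-hot ground truths and student logits.
    hard = tf.keras.losses.categorical_crossentropy(
        y_true, student_logits, from_logits=True)
    # Soft-target term: KL divergence between softened teacher and student
    # distributions; the T**2 factor keeps gradient magnitudes comparable
    # across temperatures.
    soft = tf.keras.losses.kl_divergence(
        tf.nn.softmax(teacher_logits / T),
        tf.nn.softmax(student_logits / T)) * (T ** 2)
    return alpha * hard + (1.0 - alpha) * soft
```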
Context 2
... general structure of the proposed framework is presented in Fig. 11. This framework is adapted from the Aneka framework (Hassan et al., 2022), which uses FogBus as a software platform for developing integrated Fog-Cloud environments. It connects different IoT devices with gateway devices to send tasks and data to fog worker nodes and facilitates the development of distributed applications over clouds. ...
Context 3
... connects different IoT devices with gateway devices to send tasks and data to fog worker nodes and facilitates the development of distributed applications over clouds. It offers APIs to developers so they can use virtual resources in the cloud (Hassan et al., 2022). ...
Context 4
... the proposed framework has a preprocessing module, as shown in Fig. 11, which presents the general structure of the proposed architecture (Hassan et al., 2022). ...
Context 5
... deep learning model is developed for identifying the visual features of malaria lesions. As shown in Fig. 20, the microscopic blood images are used as the DL model input. Several operations are performed in the data preprocessing stage, including image size normalization, data augmentation, and dataset partitioning. The proposed model is compressed using the knowledge distillation technique so that it is deployable on resource-limited edge devices. ...
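As an illustration of such a preprocessing stage, the sketch below shows one possible TensorFlow/Keras pipeline. The directory path, image size, split ratio, and augmentation operations are assumptions for illustration only, not the paper's exact settings.

```python
import tensorflow as tf

IMG_SIZE = (128, 128)   # assumed input size; the paper's value is given in its parameter table

# Dataset partitioning: split a directory of blood-smear images into train/validation subsets.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "cell_images/",          # hypothetical path to the microscopic blood images
    validation_split=0.2,
    subset="training",
    seed=42,
    image_size=IMG_SIZE,     # image size normalization: every image is resized to IMG_SIZE
    batch_size=32,
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "cell_images/",
    validation_split=0.2,
    subset="validation",
    seed=42,
    image_size=IMG_SIZE,
    batch_size=32,
)

# Data augmentation: random flips/rotations applied on the fly, plus pixel rescaling to [0, 1].
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.Rescaling(1.0 / 255),
])
train_ds = train_ds.map(lambda x, y: (augment(x, training=True), y))
```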
Context 6
... framework is presented in Fig. 22. The objective of the proposed model is to correctly classify the microscopic blood images as belonging to parasitized or normal patients. The values of the model's parameters are shown in Table ...
Context 7
... main parameter values of the proposed DL model based on the COVID-19 dataset. Fig. 23 shows the student model architecture of the DL model based on the Malaria dataset. ...
Context 8
... steps of developing the architectures of both the teacher and student models are mentioned in Section 5.1.1. Based on the COVID-19 dataset, several values have been tried for the temperature T; as shown in Fig. 24, the best student model is obtained with T = 5. In the following subsections, the training process, teacher model, student model, and knowledge distillation process are evaluated with respect to the suitable ...
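The temperature selection can be pictured as a simple sweep over candidate values. In the sketch below, the callables `train_fn` and `eval_fn` are hypothetical stand-ins for the paper's training and evaluation routines; only the idea of trying several T values and keeping the best is taken from the text.

```python
def select_temperature(train_fn, eval_fn, candidate_temperatures=(1, 3, 5, 7, 10)):
    """Return the temperature that gives the highest validation accuracy.

    train_fn(T) should train a distilled student at temperature T and return it;
    eval_fn(model) should return its validation accuracy. Both are hypothetical
    callables supplied by the caller.
    """
    best_T, best_acc = None, 0.0
    for T in candidate_temperatures:
        student = train_fn(T)
        acc = eval_fn(student)
        print(f"T={T}: validation accuracy = {acc:.4f}")
        if acc > best_acc:
            best_T, best_acc = T, acc
    return best_T, best_acc
```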
Context 9
... optimization algorithms have been used in training the proposed models, namely Adam, SGD, and RMSprop. The settings of these optimizers are shown in Table 4. Figs. 25-27 show the average accuracy and loss values of these optimizers before and after applying the knowledge distillation technique to the proposed model. As shown in the models' accuracy plots, the training accuracy curve increases steadily. Also, it can be observed from the models' loss plots that ...
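A sketch of how the three optimizers might be configured and compared in Keras is shown below. Only the RMSprop learning rate (0.001) and rho (0.9) are quoted later in the text; the remaining settings and the `build_student_model`, `train_ds`, and `val_ds` names are illustrative placeholders for Table 4 and the paper's setup.

```python
import tensorflow as tf

# Candidate optimizers. Only RMSprop's learning rate (0.001) and rho (0.9) are
# stated in the text; the other settings are placeholders for Table 4.
optimizers = {
    "Adam": tf.keras.optimizers.Adam(learning_rate=0.001),
    "SGD": tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.9),
    "RMSprop": tf.keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9),
}

results = {}
for name, optimizer in optimizers.items():
    model = build_student_model()   # hypothetical helper returning the student network
    model.compile(optimizer=optimizer,
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(train_ds, validation_data=val_ds, epochs=100, verbose=0)
    results[name] = max(history.history["val_accuracy"])

print(results)   # best validation accuracy reached by each optimizer
```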
Context 10
... can be observed from the model's loss plots that both the training loss and testing loss curves decrease gradually. The loss and accuracy of the distilled model are also presented. For the Adam optimizer, the best performance of the proposed model is obtained after 100 epochs, with 99.63% accuracy and 2.01% loss, as presented in Fig. 25-a and Fig. 25-b. After applying the KD algorithm to the model, based on Fig. 25-c and Fig. 25-d, the accuracy and loss become 98.1% and 1.8%, ...
Context 11
... SGD optimizer, the best performance of the proposed model is also obtained after 100 epochs, with 99.2% accuracy and 4.02% loss, as presented in Fig. 26-a and Fig. 26-b. After applying the KD algorithm to the model, based on Fig. 26-c and Fig. 26-d, the values of accuracy and loss are 95.6% and 5.08%, ...
Context 12
... for the RMSprop optimizer, the best performance of the model is also obtained after 100 epochs, with 99.8% accuracy and 5.06% loss, as presented in Fig. 27-a and Fig. 27-b. After applying the KD algorithm to the model, based on Fig. 27-c and Fig. 27-d, the values of accuracy and loss are 98.9% and 6.08%, ...
Context 13
... is clear from these results that the RMSprop optimizer achieves the best accuracy and loss results with learning rate = 0.001 and Rho factor = 0.9. Fig. 28 presents the teacher-student model accuracy with the different optimization algorithms (Adam, SGD, and RMSprop, in order). The obtained experimental results reveal that the knowledge distillation approach can be efficiently used to accelerate and compress the model without significantly decreasing the model's ...
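The compression effect reported here can be quantified by comparing parameter counts and test accuracy before and after distillation, as in the hypothetical sketch below, where `teacher_model`, `student_model`, and `test_ds` are assumed to be already-trained Keras models and the evaluation split.

```python
# Compare model size and accuracy of the teacher and the distilled student.
# teacher_model, student_model, and test_ds are hypothetical placeholders.
for name, model in [("teacher", teacher_model), ("student", student_model)]:
    n_params = model.count_params()
    loss, acc = model.evaluate(test_ds, verbose=0)
    print(f"{name}: {n_params / 1e6:.2f}M parameters, "
          f"accuracy = {acc:.4f}, loss = {loss:.4f}")

# Ratio of teacher to student parameters as a simple compression measure.
compression_ratio = teacher_model.count_params() / student_model.count_params()
print(f"Compression ratio: {compression_ratio:.1f}x")
```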
Context 14
... on the Malaria dataset, several values have been tried for the temperature T; as shown in Fig. 29, the best student model is obtained with T = 7. In the following subsections, the training process, teacher model, student model, and knowledge distillation process are evaluated with respect to the suitable ...
Context 15
... optimization algorithms have been used in training the proposed models, namely Adam, SGD, and RMSprop. The settings of these optimizers are shown in Table 4. Figs. 30-32 show the average accuracy and loss values of these optimizers before and after applying the knowledge distillation technique to the proposed model. As shown in the models' accuracy plots, the training accuracy curve increases steadily. Also, it can be observed from the models' loss plots that both ...
Context 16
... 3.2% loss, as presented in Fig. 30-a. For the SGD optimizer, the best performance of the proposed model is also obtained after 100 epochs, with 99.45% accuracy and 4.86% loss, as presented in Fig. 31-a. Finally, for the RMSprop optimizer, the best performance of the model is also obtained after 100 epochs, with 96.82% accuracy and 15.7% loss, as presented in Fig. 32-a ...
Citations
... This process of KD not only facilitates model compression but also enhances the generalization capabilities of the student model [10]. The success of KD is inherently tied to the quality and diversity of the datasets used during the training step, as well as the wide range of applications of KD-based learning processes [1,12,14-19]. ...
... For example, Li et al. proposed a transferred attention method to improve the performance of convolutional neural networks [27], while Yazdanbakhsh et al. studied the application of knowledge distillation in specific domains such as healthcare [19]. However, despite these significant advances, little attention has been paid to the impact of data on this knowledge transfer process. ...
As the demand for efficient and lightweight models in image classification grows, knowledge distillation has emerged as a promising technique to transfer expertise from complex teacher models to simpler student models. However, the efficacy of knowledge distillation is intricately linked to the choice of datasets used during training. Datasets are pivotal in shaping a model’s learning process, influencing its ability to generalize and discriminate between diverse patterns. While considerable research has independently explored knowledge distillation and image classification, a comprehensive understanding of how different datasets impact knowledge distillation remains a critical gap. This study systematically investigates the impact of diverse datasets on knowledge distillation in image classification. By varying dataset characteristics such as size, domain specificity, and inherent biases, we aim to unravel the nuanced relationship between datasets and the efficacy of knowledge transfer. Our experiments employ a range of datasets to comprehensively explore their impact on the performance gains achieved through knowledge distillation. This study contributes valuable guidance for researchers and practitioners seeking to optimize image classification models through knowledge distillation. By elucidating the intricate interplay between dataset characteristics and knowledge distillation outcomes, our findings empower the community to make informed decisions when selecting datasets, ultimately advancing the field toward more robust and efficient model development.
... Given the comparatively lighter and shallower structure of the student model, its performance may occasionally lag behind that of its more complex counterparts. To address this challenge, we introduced the concept of Knowledge Distillation (KD) [59,60], which facilitates the transfer of valuable knowledge between models. KD operates on the principle of compressing heavyweight models into lightweight versions, often with a tradeoff in accuracy. ...
Accurate and timely diagnosis of pulmonary diseases is critical in the field of medical imaging. While deep learning models have shown promise in this regard, the current methods for developing such models often require extensive computing resources and complex procedures, rendering them impractical. This study focuses on the development of a lightweight deep-learning model for the detection of pulmonary diseases. Leveraging the benefits of knowledge distillation (KD) and the integration of the ConvMixer block, we propose a novel lightweight student model based on the MobileNet architecture. The methodology begins with training multiple teacher model candidates to identify the most suitable teacher model. Subsequently, KD is employed, utilizing the insights of this robust teacher model to enhance the performance of the student model. The objective is to reduce the student model's parameter size and computational complexity while preserving its diagnostic accuracy. We perform an in-depth analysis of our proposed model's performance compared to various well-established pre-trained student models, including MobileNetV2, ResNet50, InceptionV3, Xception, and NasNetMobile. Through extensive experimentation and evaluation across diverse datasets, including chest X-rays of different pulmonary diseases such as pneumonia, COVID-19, tuberculosis, and pneumothorax, we demonstrate the robustness and effectiveness of our proposed model in diagnosing various chest infections. Our model showcases superior performance, achieving an impressive classification accuracy of 97.92%. We emphasize the significant reduction in model complexity, with 0.63 million parameters, allowing for efficient inference and rapid prediction times, rendering it ideal for resource-constrained environments. Outperforming various pre-trained student models in terms of overall performance and computation cost, our findings underscore the effectiveness of the proposed KD strategy and the integration of the ConvMixer block. This highlights the importance of incorporating advanced techniques and innovative architectural elements in the development of highly effective models for medical image analysis.
... Similarly, the DeepEdgeSoc framework Al Koutayni et al. (2023) accelerates DL network design for energy-efficient FPGA implementations, aligning with our resource efficiency goal. Moreover, approaches like resource-frugal quantized CNNs Nalepa et al. (2020) and knowledge distillation methods Alabbasy et al. (2023) resonate with our efforts to compress model size while maintaining performance. These studies highlight the importance of balancing computational demands with resource limitations, a core aspect of our research. ...
In this paper, we address the question of achieving high accuracy in deep learning models for agricultural applications through edge computing devices while considering the associated resource constraints. Traditional and state-of-the-art models have demonstrated good accuracy, but their practicality as end-user available solutions remains uncertain due to current resource limitations. One agricultural application for deep learning models is the detection and classification of plant diseases through image-based crop monitoring. We used the publicly available PlantVillage dataset containing images of healthy and diseased leaves for 14 crop species and 6 groups of diseases as example data. The MobileNetV3-small model succeeds in classifying the leaves with a test accuracy of around 99.50%. Post-training optimization using quantization reduced the number of model parameters from approximately 1.5 million to 0.93 million while maintaining the accuracy of 99.50%. The final model is in ONNX format, enabling deployment across various platforms, including mobile devices. These findings offer a cost-effective solution for deploying accurate deep-learning models in agricultural applications.
This study conducts a bibliometric analysis and systematic review to examine research trends in the application of knowledge distillation for medical image segmentation. A total of 806 studies from 343 distinct sources, published between 2019 and 2023, were analyzed using Publish or Perish and VOSviewer, with data retrieved from Scopus and Google Scholar. The findings indicate a rising trend in publications indexed in Scopus, whereas a decline was observed in Google Scholar. Citation analysis revealed that the United States and China emerged as the leading contributors in terms of both publication volume and citation impact. Previous research predominantly focused on optimizing knowledge distillation techniques and their implementation in resource-constrained devices. Keyword analysis demonstrated that medical image segmentation appeared most frequently with 144 occurrences, followed by medical imaging with 110 occurrences. This study highlights emerging research opportunities, particularly in leveraging knowledge distillation for U-Net architectures with large-scale datasets and integrating transformer models to enhance medical image segmentation performance.
Accurately segmenting and staging tumor lesions in cancer patients presents a significant challenge for radiologists, but it is essential for devising effective treatment plans including radiation therapy, personalized medicine, and surgical options. The integration of artificial intelligence (AI), particularly deep learning (DL), has become a useful tool for radiologists, enhancing their ability to understand tumor biology and deliver personalized care to patients with H&N tumors. Segmenting H&N tumor lesions using Positron Emission Tomography/Computed Tomography (PET/CT) images has gained significant attention. However, the diverse shapes and sizes of tumors, along with indistinct boundaries between malignant and normal tissues, present significant challenges in effectively fusing PET and CT images. To overcome these challenges, various DL-based models have been developed for segmenting tumor lesions in PET/CT images. This article reviews multimodality (PET/CT) based H&N tumor lesions segmentation methods. We firstly discuss the strengths and limitations of PET/CT imaging and the importance of DL-based models in H&N tumor lesion segmentation. Second, we examine the current state-of-the-art DL models for H&N tumor segmentation, categorizing them into UNet, VNet, Vision Transformer, and miscellaneous models based on their architectures. Third, we explore the annotation and evaluation processes, addressing challenges in segmentation annotation and discussing the metrics used to assess model performance. Finally, we discuss several open challenges and provide some avenues for future research in H&N tumor lesion segmentation.