Fig. 1. The teacher-student architecture in the KD technique (Gou et al., 2021).

Source publication
Article
Recently, deep neural networks (DNNs) have been used successfully in many fields, particularly in medical diagnosis. However, deep learning (DL) models are expensive in terms of memory and computing resources, which hinders their deployment on resource-limited devices or in delay-sensitive systems. Therefore, these deep models need to be acc...

Contexts in source publication

Context 1
... distillation technique. Knowledge distillation refers to the process of learning a smaller model from a larger one. In knowledge distillation, a teacher model (the larger one) typically supervises the student model (the smaller one). The main principle is that the students should emulate the teachers to achieve superior performance. As seen in Fig. 1, a knowledge distillation system is made up of three components: the knowledge, the teacher-student architecture, and the knowledge transfer method (Gou et al., ...
Context 2
... some cases, data is available in multiple modalities. However, sometimes the labels or data from various modalities could be incorrect, damaged, or useless. Thus, knowledge transfer between modalities is essential. Applications like image captioning and visual question answering benefit from cross-modal distillation. Fig. 10 shows the cross-modal distillation training scheme (Gou et al., ...
Context 3
... knowledge distillation technique to reduce the computation and storage requirements of the used deep learning models. Hence, complex deep learning networks could be embedded on the network edge to provide fast decisions, which is crucial for delay-sensitive healthcare applications. The general structure of the proposed framework is presented in Fig. 11. This framework is adapted from the Aneka framework (Hassan et al., 2022), which uses FogBus as a software platform for developing integrated Fog-Cloud environments. It connects different IoT devices with gateway devices to send tasks and data to fog worker nodes and facilitates the development of distributed applications over ...
Context 4
... dataset, which is divided into infected and normal patients, is provided as the input to the proposed framework as shown in Fig. 12. The proposed framework has a preprocessing module ... (caption of Fig. 11: The general structure of the proposed architecture (Hassan et al., 2022).) ...
Context 6
... steps of the knowledge distillation technique are shown in Fig. 13. First, the original (teacher) model is built and trained on the training set in the usual way, and its performance is evaluated. Then, the student model is built, and the distiller is initialized to distill the teacher's knowledge to the student. Finally, the distilled student model is trained, and its performance is evaluated. If the ...
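To make these steps concrete, below is a minimal sketch in PyTorch; the toy data, layer sizes, epoch count, temperature T, and loss weight are illustrative assumptions, not the configuration used in the paper. The teacher is first trained and evaluated in the usual way, then a smaller student is trained with a distiller-style loop that mixes the teacher's softened outputs with the hard labels.

```python
# Minimal distillation workflow sketch (illustrative sizes and hyperparameters).
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
X = torch.randn(256, 64)            # stand-in for the (flattened) training images
y = torch.randint(0, 2, (256,))     # two classes, e.g. infected / normal

teacher = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 2))  # larger model
student = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))    # smaller model

def train_plain(model, epochs=50):
    """Step 1: train a model on the hard labels in the usual way."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        F.cross_entropy(model(X), y).backward()
        opt.step()

def accuracy(model):
    """Evaluate a model on the toy data."""
    return (model(X).argmax(dim=1) == y).float().mean().item()

train_plain(teacher)
print("teacher accuracy:", accuracy(teacher))

# Steps 2-3: distill the teacher into the student, then evaluate the student.
T, alpha = 4.0, 0.1                  # illustrative temperature and loss weight
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
for _ in range(50):
    opt.zero_grad()
    with torch.no_grad():
        t_logits = teacher(X)        # teacher knowledge (logits -> soft targets)
    s_logits = student(X)
    distill_loss = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                            F.softmax(t_logits / T, dim=1),
                            reduction="batchmean") * (T * T)
    student_loss = F.cross_entropy(s_logits, y)          # hard labels, T = 1
    (alpha * student_loss + (1.0 - alpha) * distill_loss).backward()
    opt.step()

print("distilled student accuracy:", accuracy(student))
```

If the evaluated student is not acceptable, the same loop would simply be rerun with a modified student or a different T, as described in the later contexts.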
Context 7
... proposed models are classification models with mutually exclusive classes, where each input belongs to exactly one class. The last layer of the model produces a logit value for each class; logits are the raw predictions of the model. Logits are converted to class probabilities using the softmax activation function, as shown in Fig. ...
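For reference, a minimal sketch of this logit-to-probability conversion; the logit values are invented for the example.

```python
# Convert raw logits into class probabilities with softmax (illustrative values).
import numpy as np

def softmax(z):
    z = z - z.max()        # subtract the max logit for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([3.1, 0.2])     # e.g. raw scores for [COVID-19, normal]
print(softmax(logits))            # -> approx. [0.948 0.052], sums to 1
```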
Context 8
... student model has to generalize in the same way as the pretrained teacher model using soft targets. The detailed steps of distilling the original teacher model into the student model are presented in Fig. 15. The same training set is used for training the pretrained teacher and the student models. The teacher and student produce logits from the dataset inputs, and the logits from both models are fed to softmax functions with the same high temperature T > 1. Raising the temperature scales down the logits and yields a smoother probability distribution (the soft targets). The teacher's knowledge ...
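A small numerical sketch of the temperature effect (the teacher logits below are invented for illustration): dividing the logits by T > 1 before the softmax yields a smoother distribution, i.e. the soft targets the student learns from.

```python
# Temperature-scaled softmax: higher T gives smoother "soft targets" (toy logits).
import numpy as np

def softmax_T(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

teacher_logits = [6.0, 2.0, -1.0]
print(softmax_T(teacher_logits, T=1))   # sharp, almost one-hot: approx. [0.981 0.018 0.001]
print(softmax_T(teacher_logits, T=5))   # smoother soft targets:  approx. [0.590 0.265 0.145]
```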
Context 9
... is obvious that optimizing Eq. (3) or Eq. (7) can make the logits z_s of the student match the logits z_t of the teacher. Fig. 15 shows the knowledge distillation process, which describes both the distillation and student losses. Note that the cross-entropy loss L_CE(y, p(z_s; T = 1)) ...
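The equation numbers refer to the source article and are not reproduced on this page. For orientation, the combined objective that such a figure typically depicts has the standard Hinton-style form below; α and T are hyperparameters, and this is the textbook formulation rather than the paper's exact notation.

```latex
% Standard combined KD objective (Hinton et al. style), not copied from the paper:
% a weighted sum of the student loss (hard labels, T = 1) and the distillation loss
% (teacher vs. student soft targets at temperature T, rescaled by T^2).
L_{\mathrm{total}} =
      \alpha \, L_{CE}\bigl(y,\; p(z_s;\, T{=}1)\bigr)
    + (1-\alpha)\, T^{2}\, L_{CE}\bigl(p(z_t;\, T),\; p(z_s;\, T)\bigr)
```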
Context 10
... the second term, T is set to 1 for the softmax computation of the student model. (Caption of Fig. 15: The knowledge distillation ...
Context 11
... the model for detecting COVID-19 is shown in Fig. 16 and explained ...
Context 12
... used dataset is obtained from the Kaggle website. It includes CXR images classified into two categories: COVID-19 patients and normal patients. It consists of 2623 images, which are divided into training, validation, and testing sets with a ratio of 60:20:20, respectively. CXR samples for infected and normal patients are shown in Fig. ...
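One possible way to realize this 60:20:20 split is sketched below; the file names and class counts are placeholders, not the authors' preprocessing code.

```python
# Hypothetical 60:20:20 train/validation/test split of the 2623 CXR images.
from sklearn.model_selection import train_test_split

image_paths = [f"cxr_{i:04d}.png" for i in range(2623)]   # placeholder file names
labels = [0] * 1300 + [1] * 1323                          # placeholder normal / COVID-19 labels

# First hold out 40%, then halve it into validation (20%) and test (20%).
train_x, temp_x, train_y, temp_y = train_test_split(
    image_paths, labels, test_size=0.4, stratify=labels, random_state=42)
val_x, test_x, val_y, test_y = train_test_split(
    temp_x, temp_y, test_size=0.5, stratify=temp_y, random_state=42)

print(len(train_x), len(val_x), len(test_x))   # roughly 1573 / 525 / 525
```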
Context 13
... If the evaluation shows good performance of the student model, the model is deployed on the worker node; otherwise, the student model architecture is modified by changing the T value, the distillation process is performed on the modified student model, and the whole process is repeated until an acceptable student model is obtained. Fig. 19 shows the student model architecture of the DL model based on a COVID-19 ...
Context 14
... used dataset is obtained from the Kaggle website. It consists of 27,558 microscopic blood images, which are divided into parasitized and uninfected patients. The dataset is divided into training and testing sets with percentages of 75% and 25%, respectively. Fig. 21 shows random samples of Malaria ...
Context 15
... initialized to distill the teacher's knowledge to the student model. Finally, the student is trained and evaluated. If the student model has good performance, the model is deployed on the worker node; otherwise, the student model architecture is modified, and the process of distillation is repeated until an acceptable student model is obtained. (Caption of Fig. 18: The architecture of the deep learning model based on a COVID-19 dataset (the teacher ...
Context 16
... measures how efficiently the model identifies the non-infected cases (TN). It is defined by Eq. ...
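The referenced equation is in the source article; for reference, the standard definition of this metric (specificity, the true-negative rate) can be written as a one-line helper, with example counts chosen only for illustration.

```python
# Specificity = TN / (TN + FP): fraction of non-infected cases correctly identified.
def specificity(tn: int, fp: int) -> float:
    return tn / (tn + fp)

print(specificity(tn=480, fp=20))   # example counts -> 0.96
```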
Context 17
... For the Adam optimizer, the best performance of the proposed model is obtained after 100 epochs, with 99.88% accuracy and 3.2% loss, as presented in Fig. 30-a. For the SGD optimizer, the best performance is also obtained after 100 epochs, with 99.45% accuracy and 4.86% loss, as presented in Fig. 31-a. Finally, for the RMSprop optimizer, the best performance is also obtained after 100 epochs, with 96.82% accuracy and 15.7% loss, as presented in Fig. 32-a ...
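For readers who want to reproduce this kind of optimizer comparison, a minimal sketch follows; the toy model, data, and learning rates are stand-ins, and the accuracies quoted above come from the paper's own experiments, not from this snippet.

```python
# Compare Adam, SGD, and RMSprop on a small toy classifier (illustrative setup only).
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
X = torch.randn(512, 32)
y = torch.randint(0, 2, (512,))

def run(optimizer_name, epochs=100):
    model = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 2))
    opt = {
        "Adam":    torch.optim.Adam(model.parameters(), lr=1e-3),
        "SGD":     torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9),
        "RMSprop": torch.optim.RMSprop(model.parameters(), lr=1e-3),
    }[optimizer_name]
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.cross_entropy(model(X), y)
        loss.backward()
        opt.step()
    acc = (model(X).argmax(dim=1) == y).float().mean().item()
    return acc, loss.item()

for name in ("Adam", "SGD", "RMSprop"):
    acc, loss = run(name)
    print(f"{name}: accuracy={acc:.4f}, final loss={loss:.4f}")
```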

Citations

... This process of KD not only facilitates model compression but also enhances the generalization capabilities of the student model [10]. The success of KD is inherently tied to the quality and diversity of the datasets used during the training step, as well as to the wide range of applications of KD-based learning processes [1,12,14-19]. ...
... For example, Li et al. proposed a transferred attention method to improve the performance of convolutional neural networks [27], while Yazdanbakhsh et al. studied the application of knowledge distillation in specific domains such as healthcare [19]. However, despite these significant advances, little attention has been paid to the impact of data on this knowledge transfer process. ...
Article
As the demand for efficient and lightweight models in image classification grows, knowledge distillation has emerged as a promising technique to transfer expertise from complex teacher models to simpler student models. However, the efficacy of knowledge distillation is intricately linked to the choice of datasets used during training. Datasets are pivotal in shaping a model’s learning process, influencing its ability to generalize and discriminate between diverse patterns. While considerable research has independently explored knowledge distillation and image classification, a comprehensive understanding of how different datasets impact knowledge distillation remains a critical gap. This study systematically investigates the impact of diverse datasets on knowledge distillation in image classification. By varying dataset characteristics such as size, domain specificity, and inherent biases, we aim to unravel the nuanced relationship between datasets and the efficacy of knowledge transfer. Our experiments employ a range of datasets to comprehensively explore their impact on the performance gains achieved through knowledge distillation. This study contributes valuable guidance for researchers and practitioners seeking to optimize image classification models through knowledge distillation. By elucidating the intricate interplay between dataset characteristics and knowledge distillation outcomes, our findings empower the community to make informed decisions when selecting datasets, ultimately advancing the field toward more robust and efficient model development.
... Given the comparatively lighter and shallower structure of the student model, its performance may occasionally lag behind that of its more complex counterparts. To address this challenge, we introduced the concept of Knowledge Distillation (KD) [59,60], which facilitates the transfer of valuable knowledge between models. KD operates on the principle of compressing heavyweight models into lightweight versions, often with a tradeoff in accuracy. ...
Article
Accurate and timely diagnosis of pulmonary diseases is critical in the field of medical imaging. While deep learning models have shown promise in this regard, the current methods for developing such models often require extensive computing resources and complex procedures, rendering them impractical. This study focuses on the development of a lightweight deep-learning model for the detection of pulmonary diseases. Leveraging the benefits of knowledge distillation (KD) and the integration of the ConvMixer block, we propose a novel lightweight student model based on the MobileNet architecture. The methodology begins with training multiple teacher model candidates to identify the most suitable teacher model. Subsequently, KD is employed, utilizing the insights of this robust teacher model to enhance the performance of the student model. The objective is to reduce the student model's parameter size and computational complexity while preserving its diagnostic accuracy. We perform an in-depth analysis of our proposed model's performance compared to various well-established pre-trained student models, including MobileNetV2, ResNet50, InceptionV3, Xception, and NasNetMobile. Through extensive experimentation and evaluation across diverse datasets, including chest X-rays of different pulmonary diseases such as pneumonia, COVID-19, tuberculosis, and pneumothorax, we demonstrate the robustness and effectiveness of our proposed model in diagnosing various chest infections. Our model showcases superior performance, achieving an impressive classification accuracy of 97.92%. We emphasize the significant reduction in model complexity, with 0.63 million parameters, allowing for efficient inference and rapid prediction times, rendering it ideal for resource-constrained environments. Outperforming various pre-trained student models in terms of overall performance and computation cost, our findings underscore the effectiveness of the proposed KD strategy and the integration of the ConvMixer block. This highlights the importance of incorporating advanced techniques and innovative architectural elements in the development of highly effective models for medical image analysis.
... Similarly, the DeepEdgeSoc framework Al Koutayni et al. (2023) accelerates DL network design for energy-efficient FPGA implementations, aligning with our resource efficiency goal. Moreover, approaches like resource-frugal quantized CNNs Nalepa et al. (2020) and knowledge distillation methods Alabbasy et al. (2023) resonate with our efforts to compress model size while maintaining performance. These studies highlight the importance of balancing computational demands with resource limitations, a core aspect of our research. ...
Article
In this paper, we address the question of achieving high accuracy in deep learning models for agricultural applications through edge computing devices while considering the associated resource constraints. Traditional and state-of-the-art models have demonstrated good accuracy, but their practicality as end-user available solutions remains uncertain due to current resource limitations. One agricultural application for deep learning models is the detection and classification of plant diseases through image-based crop monitoring. We used the publicly available PlantVillage dataset containing images of healthy and diseased leaves for 14 crop species and 6 groups of diseases as example data. The MobileNetV3-small model succeeds in classifying the leaves with a test accuracy of around 99.50%. Post-training optimization using quantization reduced the number of model parameters from approximately 1.5 million to 0.93 million while maintaining the accuracy of 99.50%. The final model is in ONNX format, enabling deployment across various platforms, including mobile devices. These findings offer a cost-effective solution for deploying accurate deep-learning models in agricultural applications.
Article
This study conducts a bibliometric analysis and systematic review to examine research trends in the application of knowledge distillation for medical image segmentation. A total of 806 studies from 343 distinct sources, published between 2019 and 2023, were analyzed using Publish or Perish and VOSviewer, with data retrieved from Scopus and Google Scholar. The findings indicate a rising trend in publications indexed in Scopus, whereas a decline was observed in Google Scholar. Citation analysis revealed that the United States and China emerged as the leading contributors in terms of both publication volume and citation impact. Previous research predominantly focused on optimizing knowledge distillation techniques and their implementation in resource-constrained devices. Keyword analysis demonstrated that medical image segmentation appeared most frequently with 144 occurrences, followed by medical imaging with 110 occurrences. This study highlights emerging research opportunities, particularly in leveraging knowledge distillation for U-Net architectures with large-scale datasets and integrating transformer models to enhance medical image segmentation performance.
Article
Accurately segmenting and staging tumor lesions in cancer patients presents a significant challenge for radiologists, but it is essential for devising effective treatment plans including radiation therapy, personalized medicine, and surgical options. The integration of artificial intelligence (AI), particularly deep learning (DL), has become a useful tool for radiologists, enhancing their ability to understand tumor biology and deliver personalized care to patients with head and neck (H&N) tumors. Segmenting H&N tumor lesions using Positron Emission Tomography/Computed Tomography (PET/CT) images has gained significant attention. However, the diverse shapes and sizes of tumors, along with indistinct boundaries between malignant and normal tissues, present significant challenges in effectively fusing PET and CT images. To overcome these challenges, various DL-based models have been developed for segmenting tumor lesions in PET/CT images. This article reviews multimodality (PET/CT)-based H&N tumor lesion segmentation methods. We first discuss the strengths and limitations of PET/CT imaging and the importance of DL-based models in H&N tumor lesion segmentation. Second, we examine the current state-of-the-art DL models for H&N tumor segmentation, categorizing them into UNet, VNet, Vision Transformer, and miscellaneous models based on their architectures. Third, we explore the annotation and evaluation processes, addressing challenges in segmentation annotation and discussing the metrics used to assess model performance. Finally, we discuss several open challenges and provide some avenues for future research in H&N tumor lesion segmentation.