Thesis
PDF Available

Deep Learning Feature Extraction for Image Processing

Authors: Baptiste Wicht

Abstract

In this thesis, we propose to use methodologies that automatically learn how to extract relevant features from images. We are especially interested in evaluating how these features compare against handcrafted features. More precisely, we are interested in the unsupervised training used for the Restricted Boltzmann Machine (RBM) and Convolutional RBM (CRBM) models, which rekindled interest in Deep Learning over the last decade. During the course of this thesis, auto-encoder approaches, especially Convolutional Auto-Encoders (CAEs), have been used more and more; therefore, one objective of this thesis is also to compare the CRBM approach with the CAE approach. The scope of this work is defined by several machine learning tasks. The first, handwritten digit recognition, is analysed to see how much the unsupervised pretraining technique introduced with the Deep Belief Network (DBN) model improves the training of neural networks. The second, detection and recognition of Sudoku puzzles in images, evaluates the efficiency of DBN and Convolutional DBN (CDBN) models for the classification of images of poor quality. Finally, features are learned fully unsupervised from images for a keyword spotting task and are compared against well-known handcrafted features. Moreover, the thesis also has a software engineering axis: a complete machine learning framework was developed during this thesis to explore possible optimizations and algorithms in order to train the tested models as fast as possible.
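To make the unsupervised training at the heart of the thesis concrete, here is a minimal sketch of one Contrastive Divergence (CD-1) update for a binary RBM in Python/NumPy; the layer sizes, learning rate, and function names are illustrative assumptions, not the thesis framework's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b_v, b_h, lr=0.01):
    """One Contrastive Divergence (CD-1) step for a binary RBM.
    v0: batch of binary visible vectors, shape (batch, n_visible)."""
    # Positive phase: hidden probabilities and samples given the data.
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: one Gibbs step down to the visible layer and back up.
    p_v1 = sigmoid(h0 @ W.T + b_v)
    p_h1 = sigmoid(p_v1 @ W + b_h)
    # Update: difference between data and model correlations (in place).
    batch = v0.shape[0]
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / batch
    b_v += lr * (v0 - p_v1).mean(axis=0)
    b_h += lr * (p_h0 - p_h1).mean(axis=0)

# Illustrative sizes: 784 visible units (28x28 digits), 256 hidden units.
W = rng.normal(0.0, 0.01, (784, 256))
b_v, b_h = np.zeros(784), np.zeros(256)
```

Stacking several RBMs trained this way, each on the hidden activations of the previous one, is the greedy layer-wise pretraining introduced with the DBN model.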
... Previous research studies [9,10] have utilized CNN architectures to classify brain tumours. These CNN models employ convolution and pooling operations to extract features from scans. ...
... Evaluate the trained model on the test dataset to obtain the final performance metrics. Make predictions on new, unseen brain tumour images. ...
Article
Full-text available
This study presents an analysis of two deep learning models deployed for brain tumour detection: the lightweight pretrained MobileNetV2 and a novel hybrid model combining the lightweight MobileNetV2 with VGG16. The aim is to investigate the performance and efficiency of these models in terms of accuracy and training time. The hybrid model integrates the strengths of both architectures, leveraging the depth-wise separable convolutions of MobileNetV2 and the deeper feature extraction capabilities of VGG16. Through experimentation and evaluation on a publicly available benchmark brain tumour dataset, the results demonstrate that the hybrid model achieves superior training and testing accuracies of 99% and 98%, respectively, compared to the standalone MobileNetV2 model, even at lower epochs. This novel fusion model presents a promising approach for enhancing brain tumour detection systems, offering improved accuracy with reduced training time and computational resources.
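As a rough illustration of such a hybrid (the paper's exact fusion strategy is not detailed here), one could pool the features of both pretrained backbones and concatenate them before a shared classification head; the input size, head width, and two-class output below are assumptions.

```python
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import MobileNetV2, VGG16

# Hypothetical fusion: run both backbones on the same input, pool, concatenate.
# (A real pipeline would apply each backbone's own preprocess_input first.)
inputs = layers.Input(shape=(224, 224, 3))
mnet = MobileNetV2(include_top=False, weights="imagenet")
vgg = VGG16(include_top=False, weights="imagenet")
f1 = layers.GlobalAveragePooling2D()(mnet(inputs))   # 1280-d MobileNetV2 features
f2 = layers.GlobalAveragePooling2D()(vgg(inputs))    # 512-d VGG16 features
x = layers.Dense(256, activation="relu")(layers.Concatenate()([f1, f2]))
outputs = layers.Dense(2, activation="softmax")(x)   # tumour / no tumour
model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```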
... Convolutional layers: A convolutional layer is made up of several convolutional filters, also known as kernels, which are convolved with the input image [96]. Convolution of an input with a kernel can be thought of as shifting the filter from one corner (e.g., top left) to the other corner (e.g., bottom right) in steps/strides. ...
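The sliding-window description above translates directly into code. Below is a minimal NumPy sketch of a "valid" convolution (strictly, a cross-correlation, as implemented in most deep learning libraries) with a configurable stride; the example filter is an arbitrary choice.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Valid cross-correlation of a 2D image with a 2D kernel.
    The kernel slides from the top-left to the bottom-right in `stride` steps."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh = (ih - kh) // stride + 1
    ow = (iw - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh,
                          j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# Example: a 3x3 vertical edge filter over a 5x5 image, stride 1 -> 3x3 output.
img = np.arange(25, dtype=float).reshape(5, 5)
edge = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], dtype=float)
print(conv2d(img, edge, stride=1).shape)  # (3, 3)
```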
Article
Full-text available
Mental stress is a common problem that affects individuals all over the world. Stress reduces human functionality during routine work and may lead to severe health defects. Early detection of stress is important for preventing diseases and other negative health-related consequences of stress. Several neuroimaging techniques have been utilized to assess mental stress; however, due to its ease of use, robustness, and non-invasiveness, electroencephalography (EEG) is commonly used. This paper aims to fill a knowledge gap by reviewing the different EEG-related deep learning algorithms, with a focus on Convolutional Neural Networks (CNNs) and Long Short-Term Memory networks (LSTMs), for the evaluation of mental stress. The review focuses on data representation, individual deep neural network model architectures, hybrid models, and results, among other aspects. The contributions of the paper address important issues such as data representation and model architectures. Out of all reviewed papers, 67% used CNNs, 9% LSTMs, and 24% hybrid models. Based on the reviewed literature, we found that dataset size and different representations contributed to the performance of the proposed networks. Raw EEG data produced classification accuracy of around 62%, while spectral and topographical representations produced up to 88%. Nevertheless, generalizability across different deep learning models and individual differences remain key areas of inquiry. The review encourages the exploration of innovative avenues, such as EEG data image representations combined with graph convolutional neural networks (GCNs), to mitigate the impact of inter-subject variability. This approach not only allows structural nuances within the data to be harmonized but also facilitates the integration of temporal dynamics, enabling a more comprehensive assessment of mental stress levels.
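For readers unfamiliar with the hybrid models the review tallies, here is a schematic Keras sketch of one common CNN-LSTM arrangement; the window count, spectrogram size, and layer widths are invented for illustration and do not come from any reviewed paper.

```python
from tensorflow.keras import layers, models

# Hypothetical input: 10 consecutive EEG windows, each a 64x64 spectrogram.
model = models.Sequential([
    layers.Input(shape=(10, 64, 64, 1)),
    # The CNN extracts spatial features from each window independently ...
    layers.TimeDistributed(layers.Conv2D(16, 3, activation="relu")),
    layers.TimeDistributed(layers.MaxPooling2D()),
    layers.TimeDistributed(layers.Flatten()),
    # ... then the LSTM models the temporal dynamics across windows.
    layers.LSTM(64),
    layers.Dense(2, activation="softmax"),  # stressed / not stressed
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```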
... Nowadays, DL enables novel feature extraction that has not been investigated in previous research [28]. Wicht [29] used DL networks to derive significant features autonomously in an unsupervised manner and compared them against handcrafted features. According to the author, the DL method can provide better learned features than handcrafted ones. ...
... RBMs, on the other hand, are MRFs with symmetrical, bipartite, bidirectional structures (Salakhutdinov, 2008). DBNs, although successful, have difficulty scaling to large inputs such as practically sized natural images (Wicht, 2017). This is because natural images comprise numerous pixels, which makes learning very slow and computationally intractable for RBMs because of their fully connected nature. ...
Article
Full-text available
Urban scene-level 3D point cloud labeling is a very laborious and expensive task compared to image labeling. Conversely, image processing techniques, deep learning or otherwise, are more established and mature. Thus, in a multi-source data environment, labeling a point cloud scene via an automated image process as an initial step, followed by manual human verification, is an effective way to save man-hours and cost. With the above as the goal, this study presents a simple but robust spatio-spectral feature representation approach. In this approach, a class-aware band selection and reduction (CBSR) technique is developed for optimal hyperspectral feature representation. A double-branched convolutional Gaussian-Bernoulli deep belief network (CGBDBN) is then used for hierarchical spatial feature extraction from LiDAR-derived data and the CBSR data. Using stacked ensemble learning, spatio-spectral features are generated from the two feature streams via a fusion rule and then classified; the results are used to label a raw 3D LiDAR point cloud through projection. To evaluate this study, extensive experiments were conducted on the IEEE 2018 Houston dataset, the only publicly available dataset with both hyperspectral image (HSI) and 3D point cloud coverage of the same area, for urban scene classification. The results indicate that the developed CBSR attained competitive results compared with state-of-the-art approaches, making it a robust spectral feature representation technique. Also, the weight-sharing property, probabilistic modeling, and hierarchical nature of the CGBDBN give our approach the ability to capture high-level contextual features. Furthermore, compared to spatial-only or spectral-only features, the generated spatio-spectral features are more discriminative and significantly aided in improving the proposed model's efficacy. Overall, the proposed approach, based on the evaluation metrics, is a robust and effective approach for both coarse- and fine-grained raw LiDAR point cloud labeling tasks.
... The proposed model is a stacked autoencoder (SAE), built by stacking multiple autoencoders that extract features layer by layer to obtain deeper and more abstract features, transforming sensitive information into non-sensitive abstract data [35,36]. SAEs have also been proven to produce better features than traditional deep auto-encoders [37]. The distinct splits of the clients, aggregators and server can be seen in Fig. 4. ...
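The layer-by-layer construction described in this excerpt can be sketched as greedy layer-wise training: each autoencoder learns to reconstruct the codes of the previous one. The layer sizes, epoch count, and random stand-in data below are assumptions for illustration.

```python
import numpy as np
from tensorflow.keras import layers, models

def train_sae(x, layer_sizes=(64, 32, 16), epochs=10):
    """Greedy layer-wise training of a stacked autoencoder (illustrative)."""
    encoders, data = [], x
    for size in layer_sizes:
        inp = layers.Input(shape=(data.shape[1],))
        code = layers.Dense(size, activation="relu")(inp)
        recon = layers.Dense(data.shape[1], activation="linear")(code)
        ae = models.Model(inp, recon)
        ae.compile(optimizer="adam", loss="mse")
        ae.fit(data, data, epochs=epochs, verbose=0)   # reconstruct the input
        encoder = models.Model(inp, code)
        encoders.append(encoder)
        data = encoder.predict(data, verbose=0)        # codes feed the next layer
    return encoders

# Random data standing in for, e.g., daily smart-meter profiles (96 readings).
encoders = train_sae(np.random.rand(256, 96).astype("float32"))
```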
Preprint
Full-text available
The detection of energy theft is vital for the safety of the whole smart grid system. However, detection alone is not enough, since energy theft can crucially affect the electricity supply, leading to blackouts. Moreover, privacy is one of the major challenges that must be preserved when dealing with clients' energy data. This is often overlooked in energy theft detection research, as most current detection techniques rely on raw, unencrypted data, which may expose sensitive and personal information. To solve this issue, we present a privacy-preserving energy theft detection technique with effective demand management that employs two layers of privacy protection. We explore a split learning mechanism that trains a detection model in a decentralised fashion without the need to exchange raw data. We also employ a second layer of privacy through a masking scheme that masks clients' outputs in order to prevent inference attacks. A privacy-enhanced version of this mechanism employs an additional layer of privacy protection by training a randomisation layer at the end of the client-side model, making the output as random as possible without compromising detection performance. For the energy theft detection part, we design a multi-output machine learning model to identify energy theft, estimate its volume, and effectively predict future demand. Finally, we use a comprehensive set of experiments to test our proposed scheme. The experimental results show that our scheme achieves high detection accuracy and greatly improves the degree of privacy preservation.
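A bare-bones sketch of the split learning idea with additive output masking follows; the layer split, sizes, and masking details are hypothetical simplifications of the scheme described above (only the forward pass is shown, and a real deployment would coordinate masks so that they cancel during aggregation).

```python
import numpy as np
from tensorflow.keras import layers, models

# Hypothetical split: the client computes the first layers on its own data ...
client = models.Sequential([layers.Input(shape=(96,)),
                            layers.Dense(64, activation="relu")])
# ... and the server holds the remainder of the detection model.
server = models.Sequential([layers.Input(shape=(64,)),
                            layers.Dense(32, activation="relu"),
                            layers.Dense(1, activation="sigmoid")])

readings = np.random.rand(8, 96).astype("float32")   # stand-in for meter data
smashed = client(readings).numpy()   # raw readings never leave the client
mask = np.random.normal(size=smashed.shape).astype("float32")
masked = smashed + mask              # additive mask hides the activations
# The server only ever sees `masked`; with pairwise masks that cancel across
# clients, aggregates remain computable while individual outputs stay hidden.
score = server(masked)
```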
... In the past, several computational methods based on CNN architectures have been introduced for the classification of medical images [10,11]. In these methods, features are extracted with the help of convolution and pooling operations. ...
Article
Magnetic Resonance Imaging (MRI) is a technology mainly used for disease prediction and treatment. In practice, poor-quality MRI images sometimes force a scan to be repeated, which increases costs. Improving the quality of MRI images can avoid these unnecessary repetitions, so an automated supervised machine learning algorithm is needed to generate high-resolution data with little extra effort. In this paper, computational techniques such as convolutional networks, a K-Nearest Neighbor classifier, and a Generative Adversarial Network (GAN) are applied to MRI images to obtain high-resolution MRI images. The methodology covers medical image localization, detection, segmentation, and classification. Validation results on real MRI data demonstrate its usefulness and its effectiveness compared to state-of-the-art super-resolution techniques.
... Convolution Layer [21] ...
Thesis
Full-text available
In this thesis, self-supervised learning is used to enhance process data monitoring with the help of ML. Industrial process datasets are not easy to acquire, and often a dataset is not large enough to train the system as desired. Therefore, a Self-Supervised Learning (SSL) method is used to build a model that overcomes this small-dataset problem. In the first phase, the proposed model learns the given process dataset and classifies it based on normal and fault-case operation properties. In the next phase, the model generates data points that mimic the original data points in the dataset. This generated dataset is later combined with the original dataset, enabling the system to learn about all fault cases and non-fault operation. The extended dataset, a combination of the original and generated data, can be used for predictive maintenance, process monitoring, and optimization.
... The ResNet-50 model obtained the highest accuracy rate of 95%. In studies [33,34], CNN architectures have been introduced to classify brain tumors. In these architectures, the convolutional neural network extracts features from brain MRI using convolution and pooling operations. ...
Article
Full-text available
Brain tumor classification plays an important role in clinical diagnosis and effective treatment. In this work, we propose a method for brain tumor classification using an ensemble of deep features and machine learning classifiers. In our proposed framework, we adopt the concept of transfer learning and use several pre-trained deep convolutional neural networks to extract deep features from brain magnetic resonance (MR) images. The extracted deep features are then evaluated by several machine learning classifiers. The top three deep features which perform well on several machine learning classifiers are selected and concatenated as an ensemble of deep features, which is then fed into several machine learning classifiers to predict the final output. To evaluate the different kinds of pre-trained models as deep feature extractors, the machine learning classifiers, and the effectiveness of an ensemble of deep features for brain tumor classification, we use three different brain magnetic resonance imaging (MRI) datasets that are openly accessible from the web. Experimental results demonstrate that an ensemble of deep features can help improve performance significantly, and in most cases, a support vector machine (SVM) with radial basis function (RBF) kernel outperforms other machine learning classifiers, especially for large datasets.
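A compact sketch of the deep-feature-ensemble idea follows; the three backbones named below are assumptions (the paper selects its top three empirically), as are the input size and the use of ImageNet weights.

```python
import numpy as np
from sklearn.svm import SVC
from tensorflow.keras.applications import DenseNet121, ResNet50, VGG16

# Hypothetical trio of backbones; pooling="avg" turns each backbone into a
# per-image feature-vector extractor.
extractors = [b(include_top=False, weights="imagenet", pooling="avg")
              for b in (DenseNet121, ResNet50, VGG16)]

def ensemble_features(images):
    """Concatenate the pooled deep features of all backbones for each image."""
    return np.concatenate([e.predict(images, verbose=0) for e in extractors],
                          axis=1)

# With MR images x of shape (n, 224, 224, 3) and tumour labels y (assumed given):
# clf = SVC(kernel="rbf").fit(ensemble_features(x), y)
```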
Article
Full-text available
Human activity recognition (HAR) is an important research area in the fields of human perception and computer vision due to its wide range of applications, including intelligent video surveillance, ambient assisted living, human-computer interaction, human-robot interaction, entertainment, and intelligent driving. Recently, with the emergence and successful deployment of deep learning techniques for image classification, researchers have migrated from traditional handcrafted representations to deep learning techniques for HAR. Handcrafted representation-based approaches are still widely used because of bottlenecks such as the computational complexity of deep learning techniques for activity recognition; however, they are unable to handle complex scenarios, so resorting to deep learning-based techniques is a natural option. This review paper presents a comprehensive survey of both handcrafted and learning-based action representations, offering comparison, analysis, and discussion of these approaches. In addition, the well-known public datasets available for experimentation and important applications of HAR are presented to provide further insight into the field. This is the first review paper of its kind to present all these aspects of HAR in a single article with comprehensive coverage of each part. Finally, the paper concludes with important discussions and research directions in the domain of HAR.
Chapter
Researchers will find Neurocomputing an essential guide to the concepts employed in this field that have been taken from disciplines as varied as neuroscience, psychology, cognitive science, engineering, and physics. A number of these important historical papers contain ideas that have not yet been fully exploited, while the more recent articles define the current direction of neurocomputing and point to future research. Each article has an introduction that places it in historical and intellectual perspective. Included among the 43 articles are the pioneering contributions of McCulloch and Pitts, Hebb, and Lashley; innovative work by Von Neumann, Minsky and Papert, Cooper, Grossberg, and Kohonen; and exciting new developments in parallel distributed processing. (Bradford Books imprint)
Conference Paper
Traditional methods of computer vision and machine learning cannot match human performance on tasks such as the recognition of handwritten digits or traffic signs. Our biologically plausible, wide and deep artificial neural network architectures can. Small (often minimal) receptive fields of convolutional winner-take-all neurons yield large network depth, resulting in roughly as many sparsely connected neural layers as found in mammals between retina and visual cortex. Only winner neurons are trained. Several deep neural columns become experts on inputs preprocessed in different ways; their predictions are averaged. Graphics cards allow for fast training. On the very competitive MNIST handwriting benchmark, our method is the first to achieve near-human performance. On a traffic sign recognition benchmark it outperforms humans by a factor of two. We also improve the state-of-the-art on a plethora of common image classification benchmarks.
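The core of the multi-column approach is prediction averaging over independently trained "columns", each fed a differently preprocessed copy of the input. Here is a minimal sketch assuming each column object exposes a predict method returning class probabilities; the names are hypothetical.

```python
import numpy as np

def mcdnn_predict(columns, preprocessors, image):
    """Average the class-probability outputs of several expert columns, each an
    independently trained deep net fed its own preprocessing of the image."""
    outputs = [column.predict(prep(image))
               for column, prep in zip(columns, preprocessors)]
    return np.mean(outputs, axis=0)   # averaged probabilities = ensemble vote
```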
Conference Paper
Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on our PReLU networks (PReLU-nets), we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66%). To our knowledge, our result is the first to surpass human-level performance (5.1%, Russakovsky et al.) on this visual recognition challenge.
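Both contributions of the paper fit in a few lines. Below is a NumPy sketch of the PReLU activation and of the generalized He initialization, whose variance 2 / ((1 + a^2) * fan_in) accounts for the rectifier's negative slope; a = 0.25 is the paper's initial slope, while the layer sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def prelu(x, a):
    """Parametric ReLU: identity for positive inputs, learned slope `a` for
    negative ones; a = 0 recovers ReLU, a small fixed value recovers Leaky ReLU."""
    return np.where(x > 0, x, a * x)

def he_init(fan_in, fan_out, a=0.25):
    """He et al. initialization: zero-mean Gaussian with variance
    2 / ((1 + a^2) * fan_in), adapted to the rectifier's negative slope."""
    std = np.sqrt(2.0 / ((1.0 + a * a) * fan_in))
    return rng.normal(0.0, std, (fan_in, fan_out))

W = he_init(4096, 1000)                  # e.g. a fully connected layer
h = prelu(rng.normal(size=8), a=0.25)    # activations for a random pre-activation
```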
Conference Paper
Multipliers are the most space- and power-hungry arithmetic operators in digital implementations of deep neural networks. We train a set of state-of-the-art neural networks (Maxout networks) on three benchmark datasets: MNIST, CIFAR-10 and SVHN. They are trained with three distinct formats: floating point, fixed point and dynamic fixed point. For each of those datasets and for each of those formats, we assess the impact of the precision of the multiplications on the final error after training. We find that very low precision is sufficient not just for running trained networks but also for training them. For example, it is possible to train Maxout networks with 10-bit multiplications.
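As a sketch of what low-precision multiplication means here, the helper below rounds values to a signed fixed-point grid before multiplying. The total width matches the paper's headline 10 bits, but the split between integer and fractional bits is an assumption; dynamic fixed point would adapt that split per layer during training.

```python
import numpy as np

def to_fixed(x, total_bits=10, frac_bits=8):
    """Round to a signed fixed-point grid with `frac_bits` fractional bits and
    saturate to the `total_bits` range, mimicking a low-precision multiplier."""
    scale = 2.0 ** frac_bits
    q = np.round(x * scale)
    limit = 2.0 ** (total_bits - 1) - 1
    return np.clip(q, -limit - 1, limit) / scale

a, b = 0.8191, -1.337
print(to_fixed(a) * to_fixed(b))   # product of 10-bit fixed-point operands
print(a * b)                       # full-precision float reference
```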