Neural Computing and Applications

Published by Springer Nature
Online ISSN: 1433-3058
Recent publications
Article
  • Ze-yu Liu
  • Jian-wei Liu
Few-shot semantic segmentation tackles the problem of recognizing novel class objects from images with only a few annotated exemplars. The key problem in few-shot semantic segmentation is how to effectively model the correspondences between support and query features. Previous works propose to tackle the problem by prototype matching or distance-based metric learning. In this work, we introduce a kernel-based similarity matching model, enforcing robust guidance from both foreground and background semantics. In addition, guidance sorting and allocation modules are presented to better explore the guidance from the support set. Specifically, the guidance sorting module calibrates the most similar semantic patterns on query maps for each support pixel and produces index vectors, while the allocation module selects the most representative correspondences on the similarity maps based on these index vectors. To integrate the insights of kernel-based similarity features, we define a pyramidal paradigm, which progressively integrates the guidance signal, query features and mask priors. In this way, the relationships between support and query features are dynamically explored in both foreground and background semantics. Extensive qualitative and quantitative evaluations on PASCAL-5ⁱ, COCO-20ⁱ and FSS-1000 are conducted to demonstrate the efficiency and advantage of our proposed method. Experimental results demonstrate that our method performs favorably against state-of-the-art methods with reasonable computational cost.
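For intuition, the following is a minimal NumPy sketch of one way a dense kernel similarity map between support and query features could be computed; the Gaussian kernel, bandwidth and tensor shapes are illustrative assumptions rather than the paper's exact formulation.

import numpy as np

def kernel_similarity_map(support_feats, query_feats, gamma=0.5):
    # support_feats: (C, Ns) feature vectors from support foreground/background pixels
    # query_feats:   (C, Nq) feature vectors from query pixels
    # Returns an (Ns, Nq) similarity map based on a Gaussian (RBF) kernel.
    s2 = (support_feats ** 2).sum(axis=0)[:, None]        # (Ns, 1)
    q2 = (query_feats ** 2).sum(axis=0)[None, :]          # (1, Nq)
    cross = support_feats.T @ query_feats                 # (Ns, Nq)
    dist2 = s2 + q2 - 2.0 * cross                         # pairwise squared distances
    return np.exp(-gamma * dist2)

# Toy usage: 64-dim features on an 8x8 support map and a 16x16 query map
sim = kernel_similarity_map(np.random.rand(64, 8 * 8), np.random.rand(64, 16 * 16))
print(sim.shape)  # (64, 256)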
 
Article
  • Wenchuan Zang
  • Peng Yao
  • Kunling Lv
  • Dalei Song
Underwater gliders lack the necessary navigation equipment and have low control performance, which deteriorates the autonomy and efficiency of sampling. In this work, standoff tracking of underwater gliders based on Lyapunov guidance vector fields is introduced to enhance the autonomy of gliders in observing potential static targets. To avoid designing complex control processes, we convert the standoff tracking task into a Markov decision process and introduce reinforcement learning methods to solve it. Also, to trade off fast training against achieving acceptable results, we design a control framework that integrates a classical controller and reinforcement learning. The simulations show that the proposed framework outperforms the comparison method. This work can provide a new pattern for the sampling control of gliders. The proposed method, which combines reinforcement learning with a classical controller, can provide a reference for other applications of reinforcement learning.
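For reference, a commonly used Lyapunov guidance vector field for standoff tracking of a static target at radius $r_d$ (the paper's exact field may differ) prescribes the desired velocity at relative position $(x, y)$, with $r = \sqrt{x^2 + y^2}$ and vehicle speed $v_0$:

$$\dot{x}_d = -\frac{v_0}{r\,(r^2 + r_d^2)}\left[x\,(r^2 - r_d^2) + 2\,y\,r\,r_d\right], \qquad \dot{y}_d = -\frac{v_0}{r\,(r^2 + r_d^2)}\left[y\,(r^2 - r_d^2) - 2\,x\,r\,r_d\right]$$

Following this field drives the vehicle onto a circular loiter of radius $r_d$ around the target.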
 
Article
  • Rim Magdich
  • Hanen Jemal
  • Mounir Ben Ayed
The social Internet of Things (SIoT) is the next generation of the Internet of Things network. It entails the evolution of intelligent devices into social ones, aiming at building interactions with people in order to link groups and develop their own social context. Because a high volume of data is shared throughout the network’s diverse nodes, security measures are essential to ensure that users may interact safely. Trust management (TM) models have been presented in the literature to avoid detrimental interactions and preserve a system’s optimal functioning. In reality, given that the SIoT context of nodes varies over time, a TM mechanism must contain methods for evaluating the level of trustworthiness. Existing methods, on the other hand, still lack effective solutions for addressing the contextual SIoT attributes that define a network node while assessing trust. The utmost objective of this paper is to perform an in-depth analysis of contextual trust-awareness based on the defined TM model “CTM-SIoT” in order to more precisely detect malicious SIoT nodes and maintain safe network connections. As part of our trust evaluation process, machine learning techniques are employed to study the behavior of nodes. Our objective is to limit contacts with aggressive and unskilled service providers. Experimentation was carried out using the Cooja simulator on a simulated SIoT dataset based on real social data. With an F-measure value of up to 1, we validated the Artificial Neural Network’s suitability as a classifier for our problem statement. When compared to other conventional trust classification methods, the findings demonstrated that handling contextual SIoT characteristics inside our TM model enhanced the performance of the TM mechanism, with a 0.037% rise in F-measure and a 0.13% drop in FPR, in identifying malicious nodes even for a system with 50% malicious transactions.
 
Article
Human activity recognition (HAR) is a very active yet challenging and demanding area of computer science. Due to the articulated nature of human motion, it is not trivial to detect human activity with high accuracy for all applications. Generally, activities are recognized from a series of actions performed by a human through vision-based or non-vision-based sensors. HAR’s application areas span health, sports, smart homes, and other diverse domains. Moreover, detecting human activity is also needed to automate systems that monitor the ambient environment and detect suspicious activity during surveillance. Besides, providing appropriate information about individuals is a necessary task in pervasive computing. However, identifying human activities and actions is challenging due to the complexity of activities, speed of action, dynamic recording, and diverse application areas. Besides that, all the actions and activities are performed in distinct situations and backgrounds. Although much work has been done in HAR, finding a suitable algorithm and sensors for a certain application area is still challenging. While some surveys have already been conducted in HAR, a comprehensive survey investigating algorithms and sensors across diverse applications has not yet been done. This survey investigates the best and optimal machine learning algorithms and techniques to recognize human activities in the field of HAR. It provides an in-depth analysis of which algorithms might be suitable for a certain application area. It also investigates which vision-based or non-vision-based acquisition devices are most commonly employed in the literature and are suitable for a specific HAR application.
 
Article
Volatility plays a crucial role in financial markets, and accurate prediction of stock price indices is of high interest. In multivariate time series, the Dynamic Conditional Correlation (DCC)-Generalized Autoregressive Conditional Heteroscedastic (GARCH) model is used to model and forecast the volatility (risk) and co-movement between stock price data. We propose multivariate artificial neural networks (MANNs) hybridized with the DCC-GARCH model to forecast the volatility of stock prices and to examine the time-varying correlation. The daily share price data of five stock markets: S&P 500 (USA), FTSE-100 (UK), KSE-100 (Pakistan), KLSE (Malaysia) and BSESN (India), covering the period from 1 January 2013 to 17 March 2020, are considered for empirical analysis. Moreover, the hybrid models of MANNs and DCC-GARCH are developed in two ways: (i) MANNs is provided as an input to DCC-GARCH(1,1), producing the hybrid model DCC-GARCH(1,1)-MANNs, and (ii) the DCC-GARCH(1,1) model is set as an input to MANNs, resulting in the hybrid model MANNs-DCC-GARCH(1,1). Furthermore, the performances of the proposed models are compared with single models via the root mean square error (RMSE), mean absolute error (MAE) and relative mean absolute error (RMAE). The empirical results show that DCC-GARCH(1,1)-MANNs, a parametric model, outperforms in both in-sample and out-of-sample forecasts, helps to examine the time-varying correlation and also provides volatility forecasts, whereas the hybrid model MANNs-DCC-GARCH(1,1) provides forecasts only. Therefore, the hybrid model DCC-GARCH(1,1)-MANNs is found more suitable than MANNs-DCC-GARCH(1,1) to model and forecast the stock price indices under consideration.
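For reference, the first two error measures have the standard definitions below for actual values $y_t$ and forecasts $\hat{y}_t$ over $n$ observations; RMAE is typically the MAE expressed relative to a benchmark model's MAE.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|y_t - \hat{y}_t\right|$$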
 
Article
The reliability-based design optimization (RBDO) problem considers the necessary uncertainty of measurements within the scope of planning to minimize the design objective while satisfying probabilistic constraints. Metaheuristic algorithms offer effective tools to address challenges that scientists and practitioners face in RBDO problems, including multimodal objective functions, mixed design variables, and non-differentiable mathematical models. However, metaheuristic reliability-based design optimization (MRBDO) algorithms require reliability analysis to obtain accurate solutions, which leads to different convergence behaviors than those observed for gradient-based RBDO algorithms. One of the main drawbacks of such schemes is the high computational cost. In this work, we derive an error propagation rule from the inner reliability analysis to the outer optimization. Then, based on a two-stage water cycle algorithm (TSWCA), an improved MRBDO algorithm called TSWCA-MRBDO is developed to ensure universality and performance. In the proposed algorithm, the water cycle algorithm, with its global search capacity, is used to find the best solution. A single-loop strategy is first adopted, in which the MRBDO problem is converted into a deterministic optimization problem to remarkably reduce the computational time of the global search. Then, a two-stage algorithm is utilized to perform the local search. Numerical examples demonstrate that the proposed two-stage MRBDO algorithm can converge more quickly and efficiently in the global and local domains than other MRBDO algorithms.
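For context, a generic RBDO formulation (not necessarily the exact form solved in the paper) minimizes a cost over design variables $d$ while keeping each failure probability below an allowable target:

$$\min_{d}\; f(d) \quad \text{s.t.} \quad \Pr\!\left[g_j(d, X) \le 0\right] \le P_{f_j}^{t}, \quad j = 1, \dots, m$$

where $X$ collects the random variables, $g_j$ are limit-state functions and $P_{f_j}^{t}$ are target failure probabilities; the inner reliability analysis estimates these probabilities for each candidate design.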
 
Figures: sample images of the COVID-19 CT dataset; U-Net models created by using pre-trained models; block diagram of the image segmentation model; U-Net architecture; segmentation results of random samples on the dataset
Article
The coronavirus disease (COVID-19) is an important public health problem that has spread rapidly around the world and has caused the death of millions of people. Therefore, studies to determine the factors affecting the disease, to perform preventive actions and to find an effective treatment are at the forefront. In this study, a deep learning and segmentation-based approach is proposed for the detection of COVID-19 disease from computed tomography images. The proposed model was created by modifying the encoder part of the U-Net segmentation model. In the encoder part, VGG16, ResNet101, DenseNet121, InceptionV3 and EfficientNetB5 deep learning models were used, respectively. Then, the results obtained with each modified U-Net model were combined with the majority vote principle and a final result was reached. As a result of the experimental tests, the proposed model obtained 85.03% Dice score, 89.13% sensitivity and 99.38% specificity on the COVID-19 segmentation test dataset. The results obtained in the study show that the proposed model will especially benefit clinicians in terms of time and cost.
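The ensemble step can be illustrated with a minimal NumPy sketch of a pixel-wise majority vote over the binary masks predicted by the five modified U-Net models; the array shapes are illustrative assumptions.

import numpy as np

def majority_vote(masks):
    # masks: (n_models, H, W) binary segmentation masks
    votes = masks.sum(axis=0)                         # how many models mark each pixel as lesion
    return (votes > masks.shape[0] / 2).astype(np.uint8)

# Toy usage with five random 256x256 binary masks
final_mask = majority_vote(np.random.randint(0, 2, size=(5, 256, 256)))
print(final_mask.shape)  # (256, 256)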
 
Figures: LSTM architecture [24]; most common hashtags in the dataset; most common bigrams in the dataset; BiLSTM model architecture; confusion matrix of the BERT tokenizer LSTM model
Article
COVID-19 is an infectious disease with its first recorded cases identified in late 2019; in March 2020 it was declared a pandemic. The outbreak of the disease has led to a sharp increase in posts and comments from social media users, with a plethora of sentiments being found therein. This paper addresses the subject of sentiment analysis, focusing on the classification of users’ sentiment from posts related to COVID-19 that originate from Twitter. The period examined is from March until mid-April of 2020, by which time the pandemic had affected the whole world. The data is processed and linguistically analyzed with the use of several natural language processing techniques. Sentiment analysis is implemented by utilizing seven different deep learning models based on LSTM neural networks, and a comparison with traditional machine learning classifiers is made. The models are trained to classify the tweets into three classes, namely negative, neutral and positive.
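A minimal Keras sketch of one BiLSTM-style classifier of the kind described; the vocabulary size, sequence length and other hyperparameters are illustrative assumptions, not the paper's settings.

from tensorflow.keras import layers, models

vocab_size, max_len = 20000, 60                      # assumed tokenizer settings
model = models.Sequential([
    layers.Input(shape=(max_len,)),
    layers.Embedding(vocab_size, 128),               # learned word embeddings
    layers.Bidirectional(layers.LSTM(64)),           # BiLSTM encoder
    layers.Dropout(0.3),
    layers.Dense(3, activation="softmax"),           # negative / neutral / positive
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()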
 
Article
The development of numerous frameworks and pedagogical practices has significantly improved the performance of deep learning-based speech recognition systems in recent years. The task of developing automatic speech recognition (ASR) in indigenous languages becomes enormously complex due to the wide range of auditory and linguistic components and the lack of speech and text data, which has a significant impact on the ASR system's performance. The main purpose of the research is to effectively use in-domain data augmentation methods and techniques to resolve the challenge of data scarcity, resulting in increased neural network consistency. This research further goes into more detail about how to create synthetic datasets via pooled augmentation methodologies in conjunction with transfer learning techniques, primarily spectrogram augmentation. Initially, the richness of the signal has been improved through deformation of the time and/or frequency axis. Time-warping aims to deform the signal's envelope, whereas frequency-warping alters spectral content. Second, the raw signal is examined using audio-level speech perturbation methods such as speed and vocal tract length perturbation. These methods are shown to be effective in addressing the issue of data scarcity while having a low implementation cost, making them simple to implement. Nevertheless, these methods effectively increase the dataset size because multiple versions of a single input are fed into the network during training, which is likely to result in overfitting. Consequently, an effort has been made to solve the problem of data overfitting by integrating two-level augmentation procedures via pooling of prosody/spectrogram-modified and original speech signals using transfer learning techniques. Finally, the adult ASR system was evaluated using a deep neural network (DNN) with concatenated feature analysis employing Mel-frequency cepstral coefficients (MFCC), pitch features, and the Vocal Tract Length Normalization (VTLN) technique on pooled Punjabi datasets, yielding a relative improvement of 41.16 percent over the baseline system.
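The spectrogram augmentation idea can be sketched in a few lines of NumPy (SpecAugment-style frequency and time masking); the mask widths are illustrative assumptions.

import numpy as np

def spec_augment(spec, max_freq_mask=8, max_time_mask=20, rng=None):
    # spec: (freq_bins, time_frames) spectrogram; returns a copy with one frequency
    # band and one time band zeroed out.
    rng = rng or np.random.default_rng()
    spec = spec.copy()
    n_freq, n_time = spec.shape
    f = rng.integers(0, max_freq_mask + 1)
    f0 = rng.integers(0, n_freq - f + 1)
    spec[f0:f0 + f, :] = 0.0                          # frequency mask
    t = rng.integers(0, max_time_mask + 1)
    t0 = rng.integers(0, n_time - t + 1)
    spec[:, t0:t0 + t] = 0.0                          # time mask
    return spec

augmented = spec_augment(np.random.rand(80, 300))     # e.g., 80 mel bins x 300 frames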
 
Article
Dry eye disease (DED) is a chronic eye disease and a common complication among the world's population. Evaporation of moisture from the tear film or a decrease in tear production leads to an unstable tear film, which causes DED. The tear film breakup time (TBUT) test is a common clinical test used to diagnose DED. In this test, DED is diagnosed by measuring the time at which the first breakup pattern appears on the tear film. The TBUT test is subjective, labour-intensive and time-consuming. These weaknesses make a computer-aided diagnosis of DED highly desirable. The existing computer-aided DED detection techniques use expensive instruments for image acquisition, which may not be available in all eye clinics. Moreover, among these techniques, TBUT-based DED detection techniques are limited to finding only the tear film breakup area/time and do not identify the severity of DED, which can be helpful to ophthalmologists in prescribing the right treatment. Additionally, a few challenges in developing a DED detection approach are poorly illuminated video, constant blinking of eyes in the videos, blurred video, and lack of public datasets. This paper presents a novel TBUT-based DED detection approach that detects the presence/absence of DED from TBUT video. In addition, the proposed approach accurately identifies the severity level of DED and further categorizes it as normal, moderate or severe based on the TBUT. The proposed approach exhibits high performance in classifying TBUT frames, detecting DED, and severity grading of TBUT video with an accuracy of 83%. Also, the correlation computed between the proposed approach and the ophthalmologist's opinion is 90%, which reflects the noteworthy contribution of our proposed approach.
 
Article
Optical fiber links are customarily monitored by an Optical Time Domain Reflectometer (OTDR), an optoelectronic instrument that measures the scattered or reflected light along the fiber and returns a signal, namely the OTDR trace. OTDR traces are typically analyzed by experts in laboratories or by hand-crafted algorithms running in embedded systems to localize critical events occurring along the fiber. In this work, we address the problem of automatically detecting optical events in OTDR traces through a deep learning model that can be deployed in embedded systems. In particular, we take inspiration from Faster R-CNN and present the first 1D object-detection neural network for OTDR traces. Thanks to an ad-hoc preprocessing pipeline for OTDR traces, we can also identify unknown events, namely events that are not represented in training data but that might indicate rare and unforeseen situations that need to be reported. The resulting network brings several advantages with respect to existing solutions, as these typically classify fixed-size windows of OTDR traces and are thus less accurate in localization. Moreover, existing solutions do not report events that cannot be safely associated with any label in the training set. Our experiments, performed on real OTDR traces, show very promising performance, and the network can be directly executed on embedded OTDR devices.
 
Article
The cash in transit (CIT) problem is a version of the vehicle routing problem (VRP), which deals with the planning of money distribution from the depot(s) to the automated teller machines (ATMs) safely and quickly. This study investigates a novel CIT problem, which is a variant of the time-dependent VRP with time windows. To establish a more realistic approach to the time-dependent CIT problem, vehicle speed varying according to traffic density is considered. The problem is formulated as a mixed-integer mathematical model. Artificial neural networks (ANNs) are used to forecast the money demand for each ATM. For this purpose, key factors are defined, and a formulation is proposed to determine the money deposited into and withdrawn from ATMs. The mathematical model is run for different scenarios, and optimum routes are obtained.
 
Article
Transformers have achieved impressive performance in natural language processing and computer vision, including text translation, semantic segmentation, etc. However, due to excessive self-attention computation and memory occupation, the stereo matching task does not share this success. To promote this technology in stereo matching, especially with limited hardware resources, we propose a sliding space-disparity transformer named SSD-former. According to matching modeling, we simplify the transformer to achieve faster speed, a smaller memory footprint, and competitive performance. First, we employ the sliding window scheme to limit the self-attention operations in the cost volume for adapting to different resolutions, bringing efficiency and flexibility. Second, our space-disparity transformer remarkably reduces memory occupation and computation, only computing the current patch’s self-attention with two parts: (1) all patches of the current disparity level at the whole spatial location and (2) the patches of different disparity levels at the same spatial location. The experiments demonstrate that: (1) different from the standard transformer, SSD-former is faster and memory-friendly; (2) compared with 3D convolution methods, SSD-former has a larger receptive field and provides an impressive speed, showing great potential in stereo matching; and (3) our model obtains state-of-the-art performance and a faster speed on multiple popular datasets, achieving the best speed–accuracy trade-off.
 
Figures: algorithm flowchart — (c) shows the homogeneity test, (g–i) the steps of the separability test, and (d) guarantees that easy-to-classify regions are kept intact during training; FSP space segmentation example — from frame (e) to frame (f) the bottom homogeneous region is removed, the numbers in the rectangles are the proportion values for class 1 (blue markers) and the squares are the region centroids; stopping criterion based on the Cauchy–Schwarz divergence tolerance — the numbers inside the rectangles are the divergence of each region, and in frame (d) all divergences are below the tolerance, so the algorithm stops by the s-separable criterion; mean and standard deviation of test (out-of-sample) accuracy for all datasets (see Tables 5 and 6) — continuous lines are mean accuracy and dotted lines are 2 x STD
Article
We propose a local–global classification scheme in which the feature space is, in a first phase, segmented by an unsupervised algorithm, allowing, in a second phase, the application of distinct classification methods in each of the generated sub-regions. The proposed segmentation process intentionally produces difficult-to-classify and easy-to-classify sub-regions. Consequently, it is possible to output, besides the classification labels, a measure of confidence for these labels. In almost homogeneous regions, one may be well-nigh sure of the classification result. The algorithm has a built-in stopping criterion to avoid over-dividing the space, which would lead to overfitting. The Cauchy–Schwarz divergence is used as a measure of homogeneity in each partition. The proposed algorithm has shown very good results when compared with 52 prototype selection algorithms. It also brings the advantage of unveiling a priori the areas of the feature space where one should expect more (or less) difficulty in classifying.
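The Cauchy–Schwarz divergence used as the homogeneity measure has the standard form below for two densities p and q; in practice it is usually estimated with Parzen-window density estimates over the samples in each region.

$$D_{CS}(p, q) = -\log \frac{\int p(x)\,q(x)\,dx}{\sqrt{\int p(x)^2\,dx \,\int q(x)^2\,dx}}$$

It is non-negative and equals zero only when p = q.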
 
Article
Hand gestures are becoming an important part of the communication method between humans and machines in the era of fast-paced urbanization. This paper introduces a new standard dataset for hand gesture recognition, Static HAnd PosturE (SHAPE), with adequate size, variation, and practicality. Compared with previous datasets, our dataset has more classes, subjects, or scenes. In addition, the SHAPE dataset is also one of the first datasets to focus on Asian subjects with Asian hand gestures. The SHAPE dataset contains more than 34,000 images collected from 20 distinct subjects with different clothes and backgrounds. A recognition architecture is also presented to investigate the proposed dataset. The architecture consists of two phases: a hand detection phase for preprocessing and a classification phase using customized state-of-the-art deep neural network models. This paper investigates not only high-accuracy but also lightweight hand gesture recognition models that are suitable for resource-constrained devices such as portable edge devices. A promising application of this study is to create a human–machine interface that solves the problem of insufficient space for a keyboard or a mouse in small devices. Our experiments showed that the proposed architecture could obtain high accuracy with the self-built dataset. Details of our dataset can be seen online at https://users.soict.hust.edu.vn/linhdt/dataset/
 
Article
Estimating depth from a monocular camera is a must for many applications, including scene understanding and reconstruction, robot vision, and self-driving cars. However, generating depth maps from single RGB images is still a challenge as object shapes are to be inferred from intensity images strongly affected by viewpoint changes, texture content and light conditions. Therefore, most current solutions produce blurry approximations of low-resolution depth maps. We propose a novel depth map estimation technique based on an autoencoder network. This network is endowed with a multi-scale architecture and a multi-level depth estimator that preserve high-level information extracted from coarse feature maps as well as detailed local information present in fine feature maps. Curvilinear saliency, which is related to curvature estimation, is exploited as a loss function to boost the depth accuracy at object boundaries and raise the performance of the estimated high-resolution depth maps. We evaluate our model on the public NYU Depth v2 and Make3D datasets. The proposed model yields superior performance on both datasets compared to the state-of-the-art, achieving an accuracy of 86% and showing exceptional performance at the preservation of object boundaries and small 3D structures. The code of the proposed model is publicly available at https://github.com/SaddamAbdulrhman/MDACSFB.
 
Article
This work proposes an adaptive nonsingular fixed-time controller to boost trajectory tracking precision and velocity for a manipulator system with lumped disturbance. First, while the system state is in the sliding phase, a fixed-time sliding mode (SM) surface is designed to improve tracking speed and accuracy. Secondly, an enhanced reaching law is designed by combining inverse trigonometric functions, which can reduce chattering while increasing the convergence velocity of the SM variables. Then, an adaptive law is developed to handle the upper bound of the unknown disturbance, overcoming the difficulty of establishing the upper bound of the uncertain disturbance. It is demonstrated by the Lyapunov function theorem that the SM variables and tracking errors reach a region near the zero point in a fixed time. Finally, comparisons with other controllers make clear that the proposed fixed-time controller performs better.
 
Article
Short-term heavy rainfall, thunderstorm gales, hail, squall lines, tornadoes, thunderstorms and other disastrous weather caused by deep convective clouds greatly threaten social and economic activities and the safety of people’s lives and property. Therefore, it is of great value to study recognition methods for deep convective clouds in the field of weather forecasting. Since deep convective clouds are characterized by a short life cycle, small spatial scale and complex structure, it is difficult to accurately monitor and identify them by traditional ground monitoring methods. In this paper, the semantic segmentation network SCNET, based on an attention mechanism, is proposed and a deep learning network for the recognition of deep convective clouds is established with infrared and brightness temperature channels of the FY4A stationary meteorological satellite as input features. The results show that SCNET has a better recognition effect than meteorological and machine learning methods, such as the single-band threshold method, SVM, NN, UNET and RESNET, and can effectively improve the recognition accuracy of deep convective clouds.
 
Article
Minimum vertex covering has been widely used and studied as a general optimization problem. We focus on one of its variations: minimum vertex cover of hypergraphs. Most existing algorithms are designed for general graphs, where each edge contains at most two vertices. Moreover, among these algorithms, rough set-based algorithms have been proposed recently and have attracted many researchers' attention. However, they are not efficient enough when the number of nodes and hyperedges scales up. To address these limitations, we propose a novel rough set-based approach by combining rough set theory with a stochastic local search algorithm. In this approach, three improvements are introduced: (1) a fast relative reduct construction method, which can quickly achieve a relative reduct and is based on low-complexity heuristics; (2) a (p, q)-reverse incremental verification mechanism, which uses incremental positive region update technology to quickly verify whether a required attribute pair can be found; and (3) rules for adjusting the iterative process, whose main purposes are avoiding repeated computation and jumping out of local optima. Finally, by comparing groups of benchmark graphs and hypergraphs with existing rough set-based algorithms, the experimental results present the advantages and limitations of our proposed approach.
 
Article
The application of clustering in generating random subspaces has improved the accuracy and diversity of ensemble classification methods. If clusters are not balanced (unequal cluster sizes) and not strong (unequal numbers of data points from each class in each cluster), the results will be biased toward classes with more samples in each cluster. The current paper presents a novel strong balance-constrained clustering, or hard-strong clustering. This method creates diverse, strong, balanced data clusters to train different base classifiers and an artificial neural network with more than one hidden layer, in which the final decision on the data class is made through majority voting. By implementing the proposed method on 16 datasets, two objectives are pursued: enhancing the performance of the ensemble classifier and deep learning-based method (data mining objective), and adopting appropriate policies for budget, time, and energy assignment to various business domains by decision-makers (business objective). Based on the evaluation and comparison of the results, the proposed method is faster than other balancing methods. Furthermore, the accuracy of the proposed ensemble method shows an acceptable improvement over other ensemble classification methods.
 
Article
Image Quality Assessment (IQA) is one of the essential problems in image processing. Research on natural image quality assessment has produced a large range of methods that are separated into three categories: Full Reference Image Quality Assessment (FR-IQA), Reduced Reference Image Quality Assessment (RR-IQA), and No Reference Image Quality Assessment (NR-IQA). With the rapid growth of digital vision technology, the image quality estimation process quantifies the quality of images that are transmitted and acquired. In this paper, we present a novel Lee Sigma Filterized Mathieu Feature Transformation-based Radial Kernel Deep Belief Network (LSFMFT-RKDBN) model developed for estimating image quality with and without a reference image. First, the proposed technique performs quality estimation with a full reference (the LSFMFT-RKDBN-FR model), which works layer by layer. The visible layer of the Deep Belief Network (DBN) receives the test and reference images. Next, the input test and reference images are de-noised by applying the weighted Lee sigma filter. The de-noised images are partitioned into several patches for accurate feature extraction. Then, the Mathieu transformation is applied to obtain the test feature vector and reference feature vector. At the output layer, the radial basis kernel activation function is applied to analyze the feature vectors and display the estimated results. On the other hand, the proposed model is also applied with no reference (LSFMFT-RKDBN-NR) to estimate the test image quality. The LSFMFT-RKDBN-NR model carries out image de-noising, patch extraction, and feature extraction to create a test feature vector. Finally, the estimated results are obtained at the output layer. We evaluate the proposed LSFMFT-RKDBN model on the CSIQ Image Quality dataset with qualitative and quantitative results analysis. The proposed LSFMFT-RKDBN model estimates image quality with higher accuracy and less time and memory consumption compared to other related methods. The observed results show the superior performance of the proposed LSFMFT-RKDBN model compared with two state-of-the-art methods.
 
Article
This paper proposes a methodology for detection and classification of fatigue damage in mechanical structures in the framework of neural networks (NN). The proposed methodology has been tested and validated with polycrystalline-alloy (AL7075-T6) specimens on a laboratory-scale experimental apparatus. Signal processing tools (e.g., discrete wavelet transform and Hilbert transform) have been applied on time series of ultrasonic test signals to extract features that are derived from: (i) Signal envelope, (ii) Low-frequency and high-frequency signal spectra, and (iii) Signal energy. The performance of the neural network, combined with each one of these features, is compared with the ground truth, generated from the original ultrasonic test signals and microscope images. The results show that the NN model, combined with the signal-energy feature, yields the best performance and that it is capable of detecting and classifying the fatigue damage with (up to) 98.5% accuracy.
 
Article
Rapid advances in deep learning models have made it easier for the public and crackers to generate hyper-realistic deepfake videos in which faces are swapped. Such deepfake videos may constitute a significant threat to the world if they are misused to blackmail public figures and to deceive face recognition systems. As a result, distinguishing these fake videos from real ones has become fundamental. This paper introduces a new deepfake video detection method. The You Only Look Once (YOLO) face detector is used to detect faces from video frames. A proposed hybrid method based on two different feature extraction methods is applied to these faces. The first feature extraction method, a proposed Convolutional Neural Network (CNN), is based on the Histogram of Oriented Gradients (HOG) method. The second one is an ameliorated XceptionNet CNN. The two extracted sets of features are merged together and fed as input to a sequence of Gated Recurrent Units (GRUs) to extract the spatial and temporal features and then determine the authenticity of videos. The proposed method is trained on the CelebDF-FaceForencics++ (c23) dataset and evaluated on the CelebDF test set. The experimental results and analysis confirm the superiority of the suggested method over the state-of-the-art methods.
 
Article
Infrared and visible image fusion aims to highlight the prominent infrared target while retaining as much valuable texture detail as possible. However, visible images are susceptible to the environment, especially low-illumination environments, which seriously affect the quality of the fused image. To solve this problem, an adaptive enhanced infrared and visible image fusion algorithm based on a hybrid ℓ1-ℓ0 layer decomposition model and a coupled dictionary is proposed (we term the proposed method AEFusion). First, the visible image is adaptively enhanced according to the actual situation. Then a novel fusion scheme based on a coupled dictionary and an ℓ1-ℓ0 pyramid is proposed to obtain the pre-fusion image; to further highlight the significant information, we set the pre-fusion image as the benchmark to obtain the weight map used to fuse the final detail layer. Qualitative and quantitative experimental results demonstrate that the proposed method is superior to 11 state-of-the-art image fusion methods, as more valuable texture information and prominent infrared targets are preserved by AEFusion, which is beneficial to target detection and tracking tasks. Our code is publicly available at: https://github.com/VCMHE/IRfusion.
 
Article
Deep learning (DL) models are computationally expensive in space and time, which makes it difficult to deploy DL models in edge computing devices, such as Raspberry Pi or Jetson Nano. The current strategy uses a genetic algorithm (GA), which compresses deep convolutional neural network models without compromising performance. GA was applied by converting the CNN layers into binary vectors. Further, the fitness function in GA was computed based on (i) the minimization of hidden units and (ii) test accuracy. The GA-based strategy was applied on different pre-trained architectures, namely AlexNet, VGG16, SqueezeNet, and ResNet50, by using three datasets, namely MNIST, CIFAR-10, and CIFAR-100. The proposed approach demonstrated a reduction in the storage space of AlexNet by 87.62%, 80.97%, and 86.20% for the datasets MNIST, CIFAR-10, and CIFAR-100, respectively. Further, for the same three datasets, the average compression for VGG16, ResNet50, and SqueezeNet was 91.15%, 78.42%, and 38.40%, respectively. In addition, the inference time of the models using the proposed strategy was significantly improved, with averages across the datasets of ~ 35.61%, 9.23%, 73.76%, and 79.93% for the AlexNet, SqueezeNet, ResNet50, and VGG16 models, respectively. Further, our method, when applied to the proposed CNN using the LIDC-IDRI dataset, showed a 90.3% reduction in storage space and inference time. The DL system, when optimized using GA, shows improved performance in both storage and execution time.
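A minimal sketch of the kind of encoding and fitness described above: each chromosome is a binary mask over a layer's units/filters, and fitness rewards test accuracy while penalizing the number of retained units. The weighting and the accuracy-evaluation callable are illustrative assumptions, not the paper's exact objective.

import numpy as np

def fitness(chromosome, evaluate_accuracy, alpha=0.5):
    # chromosome: binary vector, 1 = keep the corresponding unit/filter
    kept_fraction = chromosome.mean()
    return alpha * evaluate_accuracy(chromosome) + (1 - alpha) * (1 - kept_fraction)

# Toy usage with a dummy accuracy estimator standing in for a real masked-network evaluation
rng = np.random.default_rng(0)
population = rng.integers(0, 2, size=(20, 512))        # 20 candidate masks over 512 filters
dummy_acc = lambda mask: 0.9 - 0.05 * (1 - mask.mean())
best = max(population, key=lambda c: fitness(c, dummy_acc))
print(round(fitness(best, dummy_acc), 4))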
 
Article
The main objective of this paper is to present an improved neural network algorithm (INNA) for solving the reliability-redundancy allocation problem (RRAP) with nonlinear resource constraints. In this RRAP, both the component reliability and the redundancy allocation are considered simultaneously. The neural network algorithm (NNA) is one of the newest and most efficient swarm optimization algorithms, having a strong global search ability that is very adequate for solving different kinds of complex optimization problems. Despite its efficiency, NNA suffers from poor exploitation, which causes slow convergence and also restricts its practical application to solving optimization problems. Considering this deficiency, and to obtain a better balance between exploration and exploitation, the searching procedure of NNA is reconstructed by implementing a new logarithmic spiral search operator and the searching strategy of the learner phase of teaching–learning-based optimization (TLBO), and an improved NNA is developed in this paper. To demonstrate the performance of INNA, it is evaluated against seven well-known reliability optimization problems and finally compared with other existing meta-heuristic algorithms. Additionally, the INNA results are statistically investigated with the Wilcoxon signed-rank test and a multiple comparison test to show the significance of the results. Experimental results reveal that the proposed algorithm is highly competitive and performs better than previously developed algorithms in the literature.
 
Article
This paper investigates the problem of forecasting multivariate aggregated human mobility while preserving the privacy of the individuals concerned. Differential privacy, a state-of-the-art formal notion, has been used as the privacy guarantee in two different and independent steps when training deep learning models. On one hand, we considered gradient perturbation, which uses the differentially private stochastic gradient descent algorithm to guarantee the privacy of each time series sample in the learning stage. On the other hand, we considered input perturbation, which adds differential privacy guarantees to each sample of the series before applying any learning. We compared four state-of-the-art recurrent neural networks: Long Short-Term Memory, Gated Recurrent Unit, and their bidirectional architectures, i.e., Bidirectional-LSTM and Bidirectional-GRU. Extensive experiments were conducted with a real-world multivariate mobility dataset, which we published openly along with this paper. As shown in the results, differentially private deep learning models trained under gradient or input perturbation achieve nearly the same performance as non-private deep learning models, with the loss in performance varying between 0.57% and 2.8%. The contribution of this paper is significant for those involved in urban planning and decision-making, providing a solution to the human mobility multivariate forecast problem through differentially private deep learning models.
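For intuition, a minimal NumPy sketch of the gradient-perturbation idea (per-example gradient clipping plus Gaussian noise, as in DP-SGD); the clip norm, noise multiplier and learning rate are illustrative assumptions.

import numpy as np

def dp_sgd_step(weights, per_example_grads, clip_norm=1.0, noise_multiplier=1.1,
                lr=0.01, rng=None):
    # per_example_grads: (batch_size, n_params) gradients, one row per sample
    rng = rng or np.random.default_rng()
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / (norms + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=weights.shape)
    return weights - lr * (clipped.sum(axis=0) + noise) / per_example_grads.shape[0]

w = dp_sgd_step(np.zeros(8), np.random.randn(32, 8))   # one private update on toy gradients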
 
Article
One of the major bottlenecks in refining supervised algorithms is data scarcity. This might be caused by a number of reasons often rooted in extremely expensive and lengthy data collection processes. In natural domains such as Heliophysics, it may take decades to gather sufficiently large samples for machine learning purposes. Inspired by the massive success of generative adversarial networks (GANs) in generating synthetic images, in this study we employed the conditional GAN (CGAN) on a recently released benchmark dataset tailored for solar flare forecasting. Our goal is to generate synthetic multivariate time-series data that (1) are statistically similar to the real data and (2) improve the performance of flare prediction when used to remedy the scarcity of strong flares. To evaluate the generated samples, first, we used the Kullback–Leibler divergence and adversarial accuracy measures to quantify the similarity between the real and synthetic data in terms of their descriptive statistics. Second, we evaluated the impact of the generated samples by training a predictive model on their descriptive statistics, which resulted in a significant improvement (over 1100% in TSS and 350% in HSS). Third, we used the generated time series to examine their high-dimensional contribution to mitigating the scarcity of strong flares, for which we also observed a significant improvement in terms of TSS (4%, 7%, and 31%) and HSS (75%, 35%, and 72%) compared to oversampling, undersampling, and synthetic oversampling methods, respectively. We believe our findings can open new doors toward more robust and accurate flare forecasting models.
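For readers outside the flare-forecasting community, the two skill scores have the standard contingency-table definitions (TP, FP, TN, FN denoting true/false positives/negatives):

$$\mathrm{TSS} = \frac{TP}{TP + FN} - \frac{FP}{FP + TN}, \qquad \mathrm{HSS} = \frac{2\,(TP \cdot TN - FN \cdot FP)}{(TP + FN)(FN + TN) + (TP + FP)(FP + TN)}$$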
 
Article
Graph Convolutional Network (GCN), which models the potential relationships in non-Euclidean spatial data, has attracted researchers’ attention in deep learning in recent years. It has been widely used in different computer vision tasks by modeling the latent space, topology, semantics, and other information in Euclidean spatial data and has achieved significant success. To better understand the working principles and future GCN applications in the computer vision field, this study reviewed the basic principles of GCN, summarized the difficulties and solutions of using GCN in different visual tasks, and introduced in detail the methods for constructing graphs from Euclidean spatial data in different visual tasks. At the same time, the review divided the application of GCN in basic visual tasks into image recognition, object detection, semantic segmentation, instance segmentation and object tracking. The role and performance of GCN in basic visual tasks were summarized and compared in detail for different tasks. This review emphasizes that the application of GCN in computer vision faces three challenges: computational complexity, the paradigm of constructing graphs from Euclidean spatial data, and the interpretability of the model. Finally, this review proposes two future trends of GCN in the vision field, namely model lightweighting and fusing GCN with other models to improve the performance of the visual model and meet the higher requirements of vision tasks.
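For readers unfamiliar with GCNs, the layer-wise propagation rule of the basic model reviewed here (the Kipf–Welling formulation) is

$$H^{(l+1)} = \sigma\!\left(\tilde{D}^{-\frac{1}{2}}\,\tilde{A}\,\tilde{D}^{-\frac{1}{2}}\,H^{(l)}\,W^{(l)}\right), \qquad \tilde{A} = A + I,$$

where A is the adjacency matrix of the constructed graph, $\tilde{D}$ is the degree matrix of $\tilde{A}$, $H^{(l)}$ holds the node features at layer l, $W^{(l)}$ is a learnable weight matrix and $\sigma$ is a nonlinearity.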
 
Figures: the extension to conventional NMT models proposed by [44], which generates the t-th target word y_t given a source sentence (x_1, x_2, ..., x_T); examples of the attention mechanism in vision — (top) attending to the correct object in neural image caption generation [48], (bottom) visualization of original image and question pairs with word-level, phrase-level and question-level co-attention maps [52]; the Transformer architecture and the attention mechanisms it uses in detail [19] — (left) the Transformer with one encoder-decoder stack, (center) multi-head attention, (right) scaled dot-product attention
Article
Long ago in the machine learning literature, the idea of incorporating a mechanism inspired by the human visual system into neural networks was introduced. This idea is named the attention mechanism, and it has gone through a long development period. Today, many works have been devoted to this idea in a variety of tasks, and remarkable performance has recently been demonstrated. The goal of this paper is to provide an overview from the early work on searching for ways to implement the attention idea with neural networks until the recent trends. This review emphasizes the important milestones of this progress regarding different tasks. In this way, this study aims to provide a road map for researchers to explore the current development and get inspired for novel approaches beyond attention.
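As a concrete anchor for the milestones discussed, the scaled dot-product attention at the core of the Transformer (shown in the figures above) is

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V,$$

where Q, K and V are the query, key and value matrices and $d_k$ is the key dimension.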
 
Article
In this article, a novel real-time object detector called Transformers Only Look Once (TOLO) is proposed to resolve two problems. The first problem is the inefficiency of building long-distance dependencies among local features for amounts of modern real-time object detectors. The second one is the lack of inductive biases for vision Transformer networks with heavily computational cost. TOLO is composed of Convolutional Neural Network (CNN) backbone, Feature Fusion Neck (FFN), and different Lite Transformer Heads (LTHs), which are used to transfer the inductive biases, supply the extracted features with high-resolution and high-semantic properties, and efficiently mine multiple long-distance dependencies with less memory overhead for detection, respectively. Moreover, to find the massive potential correct boxes during prediction, we propose a simple and efficient nonlinear combination method between the object confidence and the classification score. Experiments on the PASCAL VOC 2007, 2012, and the MS COCO 2017 datasets demonstrate that TOLO significantly outperforms other state-of-the-art methods with a small input size. Besides, the proposed nonlinear combination method can further elevate the detection performance of TOLO by boosting the results of potential correct predicted boxes without increasing the training process and model parameters.
 
Article
In streaming time series classification problems, the goal is to predict the label associated to the most recently received observations over the stream according to a set of categorized reference patterns. In on-line scenarios, data arise from non-stationary processes, which results in a succession of different patterns or events. This work presents an active adaptation strategy that allows time series classifiers to accommodate to the dynamics of streamed time series data. Specifically, our approach consists of a classifier that detects changes between events over streaming time series. For this purpose, the classifier uses features of the dynamic time warping measure computed between the streamed data and a set of reference patterns. When classifying a streaming series, the proposed pattern end detector analyzes such features to predict changes and adapt off-line time series classifiers to newly arriving events. To evaluate the performance of the proposed scheme, we employ the pattern end detection model along with dynamic time warping-based nearest neighbor classifiers over a benchmark of ten time series classification problems. The obtained results present exciting insights into the detection accuracy and latency performance of the proposed strategy.
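A minimal Python sketch of the dynamic time warping distance on which the detector's features are based; this is the plain O(nm) recurrence, whereas the paper may use a constrained or normalized variant.

import numpy as np

def dtw_distance(x, y):
    # Classic dynamic time warping between two 1-D series
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

print(dtw_distance([1, 2, 3, 4], [1, 1, 2, 3, 4, 4]))   # 0.0: the series warp onto each other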
 
Figures: the framework of MSSVC — a mixed sampling strategy generates a variety of subgraph views for each node (batch size 6 per node); different subgraph views of a small node set are randomly selected and fed into the encoder to obtain different subgraph representations of the same node, and the multi-scale contrastive loss maximizes node agreement between them; the multi-scale contrastive framework (the two subgraph views are taken from Fig. 1); classification accuracy of MSSVC on Cora with different sample batch sizes and β; the impact of a single data augmentation strategy on classification accuracy; t-SNE visualization of the representations on the Cora dataset
Article
Graph representation learning has received widespread attention in recent years. Most of the existing graph representation learning methods are based on supervised learning and require the complete graph as input, which incurs a large computation and memory cost. Besides, real-world graph data often lacks labels, and the cost of manually labeling data is expensive. Self-supervised learning provides a potential solution for graph representation learning to address these issues. Recently, multi-scale and multi-level self-supervised contrastive methods have been successfully applied, but most of these methods operate on complete graph data. Although the subgraph contrastive strategy improves on the shortcomings of previous self-supervised contrastive learning methods, these subgraph contrastive methods only use a single contrastive strategy, which cannot fully extract the information in the graph. To approach these problems, in this paper, we introduce a novel self-supervised contrastive framework for graph representation learning. We generate multi-subgraph views for all nodes by a mixed sampling method. Our method learns node representations with a multi-scale contrastive loss. Specifically, we employ two objectives called the bootstrapping contrastive loss and the node-level agreement contrastive loss to maximize the node agreement between different subgraph views of the same node. Extensive experiments prove that, compared with state-of-the-art graph representation learning methods, our method is superior to a range of existing models in the node classification task and in computation and memory costs.
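One common form of the node-level agreement objective described above is an InfoNCE-style loss between two subgraph views of the same node; this is a generic formulation, and the paper combines it with a bootstrapping contrastive term.

$$\mathcal{L}_i = -\log \frac{\exp\!\left(\mathrm{sim}(z_i, z_i')/\tau\right)}{\sum_{k=1}^{N} \exp\!\left(\mathrm{sim}(z_i, z_k')/\tau\right)}$$

where $z_i$ and $z_i'$ are representations of node i obtained from two different subgraph views, sim is cosine similarity and $\tau$ is a temperature.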
 
Article
Affective computing, a subcategory of artificial intelligence, detects, processes, interprets, and mimics human emotions. Thanks to the continued advancement of portable non-invasive human sensor technologies, like brain–computer interfaces (BCI), emotion recognition has piqued the interest of academics from a variety of domains. Facial expressions, speech, behavior (gesture/posture), and physiological signals can all be used to identify human emotions. However, the first three may be ineffectual because people may hide their true emotions consciously or unconsciously (so-called social masking). Physiological signals can provide more accurate and objective emotion recognition. Electroencephalogram (EEG) signals respond in real time and are more sensitive to changes in affective states than peripheral neurophysiological signals. Thus, EEG signals can reveal important features of emotional states. Recently, several EEG-based BCI emotion recognition techniques have been developed. In addition, rapid advances in machine and deep learning have enabled machines or computers to understand, recognize, and analyze emotions. This study reviews emotion recognition methods that rely on multi-channel EEG signal-based BCIs and provides an overview of what has been accomplished in this area. It also provides an overview of the datasets and methods used to elicit emotional states. Following the usual emotional recognition pathway, we review various EEG feature extraction and feature selection/reduction techniques, machine learning methods (e.g., k-nearest neighbor, support vector machine, decision tree, artificial neural network, random forest, and naive Bayes) and deep learning methods (e.g., convolutional and recurrent neural networks with long short-term memory). In addition, EEG rhythms that are strongly linked to emotions as well as the relationship between distinct brain areas and emotions are discussed. We also discuss several human emotion recognition studies, published between 2015 and 2021, that use EEG data and compare different machine and deep learning algorithms. Finally, this review suggests several challenges and future research directions in the recognition and classification of human emotional states using EEG.
 
Article
Images can convey intense affective experiences and affect people on an affective level. With the prevalence of online pictures and videos, evaluating emotions from visual content has attracted considerable attention. Affective image recognition aims to classify the emotions conveyed by digital images automatically. The existing studies using manual features or deep networks mainly focus on low-level visual features or high-level semantic representation without considering all factors. To better understand how deep networks work for affective recognition tasks, we investigate the convolutional features by visualizing them in this work. Our research shows that the hierarchical CNN model mainly relies on deep semantic information while ignoring shallow visual details, which are essential to evoke emotions. To form a more general and discriminative representation, we propose a multi-level hybrid model that learns and integrates deep semantic and shallow visual representations for sentiment classification. In addition, this study shows that class imbalance would affect performance, as the main category of an affective dataset will overwhelm training and degenerate the deep networks. Therefore, a new loss function is introduced to optimize the deep affective model. Experimental results on several affective image recognition datasets show that our model outperforms various existing studies. The source code is publicly available.
 
Article
Achieving sustainable profit advantage, cost reduction and resource utilization is always a bottleneck for resource providers, especially when trying to meet the computing needs of resource-hungry applications in the mobile edge-cloud (MEC) continuum. Recent research uses metaheuristic techniques to allocate resources to large-scale applications in MECs. However, challenges attributed to metaheuristic techniques include entrapment at local optima, caused by premature convergence, and imbalance between the local and global searches. These may affect resource allocation in MECs if continually implemented. To address these concerns and ensure efficient resource allocation in MECs, we propose a fruit fly-based simulated annealing optimization scheme (FSAOS) to serve as a potential solution. In the proposed scheme, simulated annealing is incorporated to balance between the global and local search and to overcome premature convergence. We also introduce a trade-off factor to allow application owners to select the best service quality that will minimize their execution cost. Implementation of the FSAOS is carried out on the EdgeCloudSim simulator tool. Simulation results show that FSAOS can schedule resources effectively based on task requirements by returning minimum makespan and execution costs, and achieves better resource utilization compared to the conventional fruit fly optimization algorithm and particle swarm optimization. To further demonstrate the efficiency of FSAOS, a statistical analysis based on a 95% confidence interval is carried out. Numerical results show that FSAOS outperforms the benchmark schemes by achieving a higher confidence level. This is an indication that the proposed FSAOS can provide efficient resource allocation in MECs while meeting customers’ aspirations as well as those of the resource providers.
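A minimal Python sketch of the simulated-annealing acceptance rule that is folded into the fruit fly search; the cooling schedule, temperature and cost function are illustrative assumptions.

import math, random

def anneal_accept(current_cost, candidate_cost, temperature):
    # Always accept improvements; accept worse candidates with Boltzmann probability
    if candidate_cost < current_cost:
        return True
    return random.random() < math.exp(-(candidate_cost - current_cost) / temperature)

# Toy cooling loop around an arbitrary cost function
cost = lambda x: (x - 3) ** 2
x, T, alpha = 0.0, 10.0, 0.95
for _ in range(200):
    candidate = x + random.uniform(-1, 1)             # e.g., one fruit-fly "smell" move
    if anneal_accept(cost(x), cost(candidate), T):
        x = candidate
    T *= alpha                                        # geometric cooling
print(round(x, 2))                                    # converges near the optimum at 3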
 
Article
High-impedance faults (HIFs) exhibit low current amplitude and highly diverse characteristics, which make them difficult to detect with conventional overcurrent relays. Various machine learning (ML) techniques have been proposed to detect and classify HIFs; however, these approaches are not reliable in the presence of diverse HIF and non-HIF conditions and, moreover, rely on resource-intensive signal processing techniques. Consequently, this paper proposes a novel HIF detection and classification approach based on a state-of-the-art deep learning model, the transformer network, stacked with a convolutional neural network (CNN). While the transformer network learns the complex HIF patterns in the data, the CNN enhances generalization to provide robustness against noise. A kurtosis analysis is employed to prevent false detection of non-fault disturbances (e.g., capacitor and load switching) and nonlinear loads as HIFs. The performance of the proposed HIF detection and classification approach is evaluated using the IEEE 13-node test feeder. The results demonstrate that the proposed protection method reliably detects and classifies HIFs, is robust against noise, and outperforms state-of-the-art techniques.
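A minimal sketch of such a kurtosis pre-screen follows; the threshold and waveforms are assumptions chosen only to illustrate how a spiky arcing window is separated from a smooth switching event before the deep classifier is invoked.

```python
# Sketch of a kurtosis pre-screen for HIF detection: a current window is
# forwarded to the classifier only when its excess kurtosis exceeds a threshold,
# filtering out smooth non-fault events such as capacitor or load switching.
import numpy as np
from scipy.stats import kurtosis

KURTOSIS_THRESHOLD = 1.0   # assumed decision boundary, not the paper's value

def is_candidate_hif(current_window: np.ndarray) -> bool:
    return kurtosis(current_window, fisher=True) > KURTOSIS_THRESHOLD

t = np.linspace(0, 0.2, 2000)
load_switching = np.sin(2 * np.pi * 60 * t)                 # smooth, near-sinusoidal event
spikes = np.zeros_like(t)
spikes[np.random.default_rng(0).choice(t.size, 20, replace=False)] = 5.0  # intermittent arcing bursts
hif_like = np.sin(2 * np.pi * 60 * t) + spikes

print(is_candidate_hif(load_switching))   # False: a pure sinusoid has negative excess kurtosis
print(is_candidate_hif(hif_like))         # True: sparse arcing spikes give a heavy-tailed window
```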
 
Article
Traffic accidents caused by driver fatigue and drowsiness have resulted in many injuries and deaths. Therefore, driver fatigue detection and prediction systems have been recognized as important research areas for preventing accidents caused by fatigue and drowsiness while driving. In this study, driver fatigue is determined using behavior-based measurement information. Recent studies show that deep neural networks are among the state-of-the-art machine learning approaches. Hence, we propose a deep belief network (DBN) model, a type of deep learning, for classifying the symptoms of fatigue. The DBN is a kind of neural network, and the number of hidden layers and the number of units in each hidden layer play important roles in the design of any neural network. Therefore, the number of hidden layers and units in the DBN model designed in this paper were selected through various experiments, and a greedy method was adopted to adjust the structure of the deep belief network. Subsequently, the proposed DBN architecture was tested on the KOU-DFD, YawDD and NTHU-DDD datasets. Comparative and experimental results show that the proposed DBN architecture is as robust as other approaches found in the literature and achieves an accuracy rate of approximately 86%.
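The greedy structure-selection idea can be sketched as follows: grow the network one hidden layer at a time, trying several candidate widths and keeping the best by validation accuracy before moving deeper. A scikit-learn MLP is used here as a stand-in for the DBN, and the data, candidate widths, and depth are assumptions.

```python
# Greedy layer-sizing sketch with an MLP stand-in for the DBN: freeze the best
# width found for each layer, then search the next layer's width.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((600, 40))            # stand-in behavioural features
y = (X[:, :5].sum(axis=1) > 0).astype(int)    # stand-in fatigue labels
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)

def greedy_layer_search(max_layers=3, candidate_widths=(32, 64, 128)):
    layers = []
    for _ in range(max_layers):
        scores = {}
        for width in candidate_widths:
            trial = tuple(layers + [width])
            clf = MLPClassifier(hidden_layer_sizes=trial, max_iter=300, random_state=0)
            scores[width] = clf.fit(X_tr, y_tr).score(X_va, y_va)
        best = max(scores, key=scores.get)
        layers.append(best)                    # freeze this layer's width, go deeper
        print(f"layers={tuple(layers)} val_acc={scores[best]:.3f}")
    return tuple(layers)

print("selected architecture:", greedy_layer_search())
```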
 
Article
A new multi-attention-based method for solving the multiple instance learning (MIL) problem (MAMIL), which takes into account the neighboring patches or instances of each analyzed patch in a bag, is proposed. In the method, one attention module takes into account adjacent patches or instances, several attention modules are used to obtain a diverse feature representation of patches, and one attention module is used to unite the different feature representations to provide an accurate classification of each patch (instance) and the whole bag. MAMIL thus produces a combined representation of patches and their neighbors in the form of low-dimensional embeddings that allow simple classification. Moreover, different types of patches are processed efficiently, and a diverse feature representation of patches in a bag is obtained by using several attention modules. A simple approach for explaining the classification predictions of patches is also proposed. Numerical experiments on various datasets illustrate the proposed method.
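For orientation, the sketch below shows the core attention-based MIL pooling step (learned attention weights aggregating patch embeddings into a bag embedding); the neighbour aggregation and the multiple attention modules of MAMIL are not reproduced, and all dimensions are assumptions.

```python
# Minimal attention-based MIL pooling sketch: instance (patch) embeddings are
# weighted by learned attention scores and aggregated into a bag embedding
# that is classified at the bag level.
import torch
import torch.nn as nn

class AttentionMILPool(nn.Module):
    def __init__(self, dim=128, hidden=64):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        self.classifier = nn.Linear(dim, 1)

    def forward(self, instances):                             # instances: (n_instances, dim)
        attn = torch.softmax(self.score(instances), dim=0)    # attention weight per instance
        bag = (attn * instances).sum(dim=0)                   # weighted bag embedding
        return self.classifier(bag), attn.squeeze(-1)

bag = torch.randn(37, 128)                         # one bag of 37 patch embeddings (assumed)
logit, weights = AttentionMILPool()(bag)
print(logit.shape, weights.shape)                  # torch.Size([1]) torch.Size([37])
```

The attention weights also give a simple per-patch importance signal, which is one common way such predictions are explained.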
 
Article
Single-image rain streak removal is a challenging task in computer vision due to the uncertainty of the shape and size of rain streaks. Current methods adopt complex optimization processes or progressive refinement schemes, but these significantly reduce the efficiency of many real-time applications. To address this problem, we propose a multi-level transformer deraining network, an efficient single-image rain removal model. Specifically, an efficient deraining network is constructed to extract rain streaks. We then employ cascaded networks to extract feature information from deep high-level to shallow low-level layers. In addition, a multi-head self-attention mechanism is applied to extract global information from the feature map at each level, which greatly improves the representational ability for rain streaks. Experimental results on both synthetic and real-world datasets demonstrate the efficacy of our method, which incurs lower time cost and obtains results comparable to state-of-the-art methods.
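A minimal sketch of applying multi-head self-attention to a convolutional feature map is shown below: the spatial grid is flattened into tokens, attention gathers global context, and the map is reassembled. The channel count and number of heads are assumptions, and the multi-level cascade of the deraining network is omitted.

```python
# Sketch: multi-head self-attention over a CNN feature map for global context.
import torch
import torch.nn as nn

class GlobalAttentionBlock(nn.Module):
    def __init__(self, channels=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=channels, num_heads=heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, feat):                       # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)   # (B, H*W, C) token sequence
        out, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + out)           # residual connection + layer norm
        return tokens.transpose(1, 2).reshape(b, c, h, w)

rainy_features = torch.randn(2, 64, 32, 32)
print(GlobalAttentionBlock()(rainy_features).shape)   # torch.Size([2, 64, 32, 32])
```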
 
Article
COVID-19 has taken a toll on the entire world, causing serious illness and a high mortality rate. In the present day, when the globe is hit by a pandemic, those suspected of being infected by the virus need to confirm its presence to seek immediate medical attention, avoid adverse outcomes, and prevent further transmission to their close contacts through timely isolation. The most reliable laboratory test currently available is the reverse transcription–polymerase chain reaction (RT-PCR) test. Although the test is considered the gold standard, 20–25% of results can still be false negatives, which has lately led physicians to recommend medical imaging in specific cases. Our research examines chest imaging as a method to diagnose COVID-19. This work is not intended to establish an alternative to RT-PCR, but to aid physicians in determining the presence of the virus in medical images. As the disease presents lung involvement, it provides a basis for exploring computer vision for the classification of radiographic images. In this paper, the authors compare the performance of various models, namely ResNet-50, EfficientNetB0, VGG-16 and a custom convolutional neural network (CNN), for detecting the presence of the virus in chest computed tomography (CT) scans and chest X-ray images. The most promising results were obtained using ResNet-50 on CT scans with an accuracy of 98.9% and ResNet-50 on X-rays with an accuracy of 98.7%, which offers an opportunity to explore these methods further for prospective use.
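The general transfer-learning recipe behind such comparisons can be sketched as follows: load an ImageNet-pretrained ResNet-50, replace its final layer with a two-class head, and fine-tune. The batch, image size, optimizer, and learning rate below are illustrative assumptions, not the authors' training configuration.

```python
# Sketch of ResNet-50 transfer learning for binary COVID / non-COVID chest-image
# classification: the ImageNet head is replaced and the network fine-tuned.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)  # ImageNet-pretrained backbone
model.fc = nn.Linear(model.fc.in_features, 2)                            # COVID vs. non-COVID head

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

images = torch.randn(8, 3, 224, 224)       # stand-in batch of preprocessed CT / X-ray images
labels = torch.randint(0, 2, (8,))

model.train()
optimizer.zero_grad()
loss = criterion(model(images), labels)    # one fine-tuning step shown for brevity
loss.backward()
optimizer.step()
print(float(loss))
```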
 
Figures: Denge SBB; Denge software screen after a completed performance (left foot); Denge data in .csv file format (sampled from the original data); training package names for the left and right foot; trained DDSS model (n = 128, m = 64).
Article
Monitoring the balance conditions and physical abilities of athletes is important for tracking their current situation, which enables appropriate training programs to be applied for recovery. For different branches of sports, there are three main balance board types: the non-swaying board (e.g., Wii board), the semi-spherical fulcrum (e.g., wobble board), and the springboard (e.g., spring balance board). In this study, the balance springboard, which is new to the literature, is used. The springboard is equipped with sensors and uses Bluetooth technology to transmit the collected balance data. There are various previous studies on assessing the balance performance of athletes with the first two types of balance boards, most of which are based on statistical analysis and machine learning (ML) techniques. This study introduces, as a contribution to the literature, a shallow deep learning model trained with the balance data gathered from the springboard. This model (DDSS, Denge Decision Support System) is compared with the baseline ANN model that led the study toward the DDSS design, as well as with ML techniques. The DDSS model outperforms the baseline ANN and the ML techniques Sequential Minimal Optimization and Random Forest, and offers appropriate training program suggestions with a success rate of 92.11%.
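A shallow network of the kind described, with two hidden layers of 128 and 64 units as in the n = 128, m = 64 configuration noted above, might be sketched as follows; the input dimensionality, activations, and number of output classes are assumptions.

```python
# Sketch of a shallow two-hidden-layer network for balance data, not the
# authors' DDSS implementation.
import torch
import torch.nn as nn

class ShallowBalanceNet(nn.Module):
    def __init__(self, n_features=20, n_classes=3, n=128, m=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, n), nn.ReLU(),
            nn.Linear(n, m), nn.ReLU(),
            nn.Linear(m, n_classes),              # e.g., suggested training programmes (assumed)
        )

    def forward(self, x):
        return self.net(x)

balance_batch = torch.randn(16, 20)               # stand-in sensor features from the springboard
print(ShallowBalanceNet()(balance_batch).shape)   # torch.Size([16, 3])
```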
 
Article
Motivated by the merit of the twin support vector machine (TSVM), this paper presents an improved parametric-margin Universum twin support vector machine (PM-U-TSVM), which utilizes the prior knowledge contained in the Universum samples to improve classification performance and exploits the parametric-margin strategy to better fit the error structure of the data and enhance the representation ability of the TSVM. Specifically, in contrast to classic SVM-type methods that need to solve a single large-scale quadratic programming problem (QPP), the proposed PM-U-TSVM extends the twin SVM learning model by determining a pair of smaller, non-parallel parametric-margin hyperplanes to provide a more flexible parametric-margin structure for the input data, and analyzes the prior information contained in the Universum samples to fully exploit latent useful knowledge when constructing the final classifier. This joint learning strategy helps our model perform better in terms of effectiveness and robustness. Furthermore, a kernel extension of the PM-U-TSVM is proposed to deal with the nonlinear case. Experimental results on several datasets show that the proposed PM-U-TSVM not only achieves higher classification accuracy but also has better generalization performance when dealing with noisy classification problems.
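For orientation, the standard (non-Universum) twin SVM baseline solves the following pair of smaller QPPs, one per non-parallel hyperplane; the parametric-margin and Universum terms of PM-U-TSVM extend this formulation and are not shown here.

```latex
% Standard twin SVM baseline: two smaller QPPs, one per non-parallel hyperplane.
% A and B stack the samples of class +1 and class -1; e_1, e_2 are all-ones vectors.
\begin{aligned}
\min_{w_1, b_1, \xi}\;& \tfrac{1}{2}\lVert A w_1 + e_1 b_1\rVert^2 + c_1 e_2^\top \xi
  &&\text{s.t. } -(B w_1 + e_2 b_1) + \xi \ge e_2,\; \xi \ge 0,\\
\min_{w_2, b_2, \eta}\;& \tfrac{1}{2}\lVert B w_2 + e_2 b_2\rVert^2 + c_2 e_1^\top \eta
  &&\text{s.t. } (A w_2 + e_1 b_2) + \eta \ge e_1,\; \eta \ge 0.
\end{aligned}
```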
 
Article
Maize is one of the world's most important food crops, but its cultivation is hampered by diseases. Rapid disease identification remains a challenge due to a lack of the necessary infrastructure, which necessitates the development of automated methods to identify diseases. In this research, the use of deep learning models to identify maize leaf diseases is proposed. We investigate transfer learning of deep convolutional neural networks for the detection of maize leaf diseases, transferring the knowledge of pre-trained models to our dataset. We employ pre-trained VGG16, ResNet50, InceptionV3, and Xception models to classify three common maize leaf diseases using a dataset of 18,888 images of healthy and diseased leaves. In addition, Bayesian optimization is used to choose optimal values for hyperparameters, and image augmentation is used to improve the models' ability to generalize. The work includes a comparative study and analysis of the proposed models. The results demonstrate that all trained models achieve an accuracy of more than 93% in classifying maize leaf diseases; in particular, VGG16, InceptionV3, and Xception achieved an accuracy of more than 99%. Furthermore, our methodology provides new avenues for the detection of maize leaf diseases.
 
Figures: average validation accuracy (Val_Accuracy) of AF results for the MNIST, Fashion MNIST, CIFAR-10, and SVHN datasets.
Article
Activation functions (AFs) are fundamental to the neural network architectures used in real-world problems to accurately model and learn complex relationships between variables: they process the input information arriving at a neuron and produce the corresponding output. The kernel-based activation function (KAF) offers an extended version of the ReLU and sigmoid AFs. However, KAF still faces the problems of bias shift originating from the negative region, vanishing gradients, limited adaptability and flexibility, and neuron death during the learning process. In this study, hybrid KAF + RSigELUS and KAF + RSigELUD AFs, which are extended versions of KAF, are proposed. The proposed AFs use the Gaussian kernel function and are effective in the positive, negative, and linear activation regions. Performance evaluations were conducted on the MNIST, Fashion MNIST, CIFAR-10, and SVHN benchmark datasets. The experimental evaluations show that the proposed AFs overcome the existing problems and outperform the ReLU, LReLU, ELU, PReLU, and KAF AFs.
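A kernel activation function with a Gaussian kernel can be sketched as a learned mixture of kernels centred on a fixed dictionary, as below; the RSigELUS/RSigELUD hybridisation proposed in the paper is not reproduced, and the dictionary size and kernel width are assumptions.

```python
# Sketch of a kernel activation function (KAF) layer: each unit's non-linearity
# is a learned mixture of Gaussian kernels over a fixed dictionary of anchor points.
import torch
import torch.nn as nn

class KAF(nn.Module):
    def __init__(self, num_units, dict_size=20, boundary=3.0):
        super().__init__()
        d = torch.linspace(-boundary, boundary, dict_size)            # fixed dictionary
        self.register_buffer("dictionary", d.view(1, 1, dict_size))
        self.gamma = float(1.0 / (2 * (d[1] - d[0]) ** 2))            # kernel width from spacing (assumed rule)
        self.alpha = nn.Parameter(torch.randn(1, num_units, dict_size) * 0.1)  # per-unit mixing weights

    def forward(self, s):                                             # s: (batch, num_units)
        k = torch.exp(-self.gamma * (s.unsqueeze(-1) - self.dictionary) ** 2)  # Gaussian kernels
        return (k * self.alpha).sum(dim=-1)

layer = nn.Sequential(nn.Linear(32, 64), KAF(64), nn.Linear(64, 10))
print(layer(torch.randn(8, 32)).shape)                                # torch.Size([8, 10])
```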
 
Article
Assessment of crop maturity and quality is pivotal in the food industry and for harvesting. The manual classification of crops based on their maturity levels for harvesting and packaging purposes is a tedious process. The emergence of machine learning and deep learning techniques has opened up new possibilities in this direction, but their practical success is still limited. In this research, we examined two convolutional neural network paradigms (AlexNet and VGG16) using a transfer-learning approach for classifying jujube fruits based on their maturity level (unripe, ripe, and over-ripe). The training and testing of the models was performed on a collected dataset of around 400 images, which was further augmented to 4398 images across the three maturity classes. The best accuracies achieved for correct classification of the maturity classes on the original and augmented images are 94.17% and 97.65% with AlexNet, and 98.26% and 99.17% with VGG16, respectively. The examined models were compared with two existing methods for jujube maturity classification and found to perform better. The significantly improved success rate of VGG16 over AlexNet and the existing models for jujube classification makes it recommendable for developing an efficient system for the automated harvesting and sorting of jujube fruits.
 
Article
Detecting cardiac abnormalities between 14 and 28 weeks of gestation with an apical four-chamber view is a difficult undertaking. Several unfavorable factors can prevent such detection, such as the fetal heart's relatively small size, unclear appearance of anatomical structures (e.g., shadows), and incomplete tissue boundaries. Cardiac defects are not always straightforward to detect without segmentation, yet segmentation alone cannot produce a defect interpretation. This paper proposes an improved semantic segmentation approach that uses a region proposal network for septal defect detection and combines two processes: contour segmentation with a U-Net architecture and defect detection with a Faster R-CNN architecture. The model is trained on 764 ultrasound images that include three abnormal conditions (atrial septal defect, ventricular septal defect, and atrioventricular septal defect) and normal conditions from an apical four-chamber view. The proposed model achieves satisfactory mean intersection over union, mean average precision, and Dice similarity coefficient metrics of about 75%, 87.80%, and 96.37%, respectively. Furthermore, the proposed model has also been validated on 71 unseen images of normal conditions and produces 100% sensitivity, meaning that all normal conditions without septal defects are detected effectively. The developed model has the potential to accurately identify the fetal heart in normal and pathological settings, and its practical use in identifying congenital heart disorders holds substantial future promise.
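The detection stage alone can be sketched with a torchvision Faster R-CNN whose box predictor is re-sized for the three defect classes plus background; the U-Net contour-segmentation stage, the ultrasound data pipeline, and the label mapping below are assumptions.

```python
# Sketch of the defect-detection stage only: a torchvision Faster R-CNN
# configured for three septal-defect classes plus background.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_CLASSES = 4  # background + ASD + VSD + AVSD (assumed label mapping)

model = fasterrcnn_resnet50_fpn(weights=None)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)  # replace the box head

model.eval()
with torch.no_grad():
    detections = model([torch.rand(3, 512, 512)])  # one stand-in four-chamber frame
print(detections[0].keys())                         # boxes, labels, scores
```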
 
Article
Cardiovascular diseases cause approximately 17 million deaths each year and account for 31% of deaths worldwide. These diseases generally manifest as myocardial infarction and heart failure. The survival status, which we used as the target in our classification study, indicates whether the patient died or survived before the end of the follow-up period, which averaged 130 days. Various machine learning classifiers were used both to predict the survival of patients and to rank the characteristics corresponding to the most important risk factors. For this purpose, the dataset, comprising 299 samples in total, is divided in the traditional way into 70% for training and 30% for testing and analyzed with methods such as Artificial Neural Networks, Fine Gaussian SVM, Fine KNN, Weighted KNN, Subspace KNN, Boosted Trees, and Bagged Trees. The results show that some of these algorithms can predict the outcome with full accuracy (100%). It was therefore concluded that machine learning algorithms are appropriate for predicting whether a heart failure patient will survive, and this study has the potential to be used as a new supportive tool for doctors when making such predictions.
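The split-and-compare protocol can be sketched with scikit-learn counterparts of some of the listed classifiers (the MATLAB-style names such as Fine KNN map only loosely onto these); the feature matrix below is a random stand-in for the clinical data, and the feature count is an assumption.

```python
# Sketch of a 70/30 train/test split with several classifiers compared on accuracy.
import numpy as np
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((299, 12))                 # 299 samples, 12 assumed clinical features
y = rng.integers(0, 2, size=299)                   # stand-in survival labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30, random_state=0)

models = {
    "ANN": make_pipeline(StandardScaler(), MLPClassifier(max_iter=500, random_state=0)),
    "Gaussian SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Boosted Trees": GradientBoostingClassifier(random_state=0),
    "Bagged Trees": BaggingClassifier(DecisionTreeClassifier(), random_state=0),
}
for name, clf in models.items():
    print(name, round(clf.fit(X_tr, y_tr).score(X_te, y_te), 3))
```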
 
Article
Drowsiness is a principal cause of road crashes nowadays, according to existing data, and puts many precious lives in jeopardy; detecting it early and accurately can save lives. Using computer vision and deep learning techniques, this research proposes a new approach to detect driver drowsiness at an early stage with improved accuracy. In our model, we consider the most significant temporal features, such as head pose angles (yaw, pitch, and roll), pupil-movement centers, and a distance-based emotional feature, which help detect the drowsiness state more accurately. Our method handles possible occluded frames at an initial stage by imposing an occlusion criterion based on the relationship between the distance between pupil centers and the horizontal length of the eye. As a result, it outperforms existing approaches in terms of overall system accuracy and consistency. Furthermore, features retrieved from valid frames are used as training and test data by a long short-term memory network to classify the driver's state. Results are reported in terms of area under the receiver operating characteristic curve (AUC-ROC) scores.
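The temporal classification stage can be sketched as an LSTM over per-frame feature vectors (head-pose angles, pupil-centre coordinates, and the distance feature) producing a drowsy/alert probability; the feature layout, sequence length, and hidden size are assumptions.

```python
# Sketch of the temporal stage: per-frame features from non-occluded frames are
# stacked into a sequence and classified by an LSTM as drowsy vs. alert.
import torch
import torch.nn as nn

class DrowsinessLSTM(nn.Module):
    def __init__(self, feat_dim=8, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, seq):                      # seq: (batch, frames, feat_dim)
        _, (h_n, _) = self.lstm(seq)
        return torch.sigmoid(self.head(h_n[-1]))   # probability of drowsiness

# 8 assumed per-frame features: yaw, pitch, roll, left/right pupil (x, y), distance feature
frames = torch.randn(4, 90, 8)                   # 4 clips of 90 valid frames each
print(DrowsinessLSTM()(frames).shape)            # torch.Size([4, 1])
```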
 
Top-cited authors
Muhammad Asif Zahoor Raja
  • COMSATS University Islamabad, Attock Campus, Attock, Pakistan
Danial Jahed Armaghani
  • University of Technology Sydney
Jianqiang Wang
  • Central South University
Nilanjan Dey
  • JIS University
U Rajendra Acharya