Conference Paper

ImageNet classification with deep convolutional neural networks

Authors:
Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton

Abstract

We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called dropout that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
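The architecture described in this abstract (five convolutional layers interleaved with max-pooling, three fully-connected layers with dropout, ReLU non-linearities, and a final 1000-way softmax) maps naturally onto a few dozen lines of modern deep-learning code. Below is a minimal PyTorch sketch of an AlexNet-style network for illustration only; it runs on a single device and omits details of the original implementation such as local response normalization and the two-GPU split.

```python
import torch
import torch.nn as nn

class AlexNetLike(nn.Module):
    """AlexNet-style network: 5 conv layers (some followed by max-pooling),
    3 fully-connected layers with dropout, ReLU activations, 1000-way output."""
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),  # softmax is applied inside the cross-entropy loss
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)

# A 227x227 crop yields 6x6x256 features before the fully-connected layers.
logits = AlexNetLike()(torch.randn(1, 3, 227, 227))
print(logits.shape)  # torch.Size([1, 1000])
```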


... One effective method to solve this problem is the technique of data augmentation, which helps CNNs to learn invariant features from augmented samples with abundant variations of the original samples. Almost all successful models such as AlexNet [26], VGG [44], and ResNet [18] adopt data augmentation before training. ...
... Data augmentation has become a necessary component for training convolutional neural networks. For example, the training pipelines of AlexNet [26], VGG [44] and ResNet [18] all include image translation, horizontal reflection and intensity alteration for data augmentation. In addition to these basic augmentation methods, many sophisticated methods have been proposed. ...
Article
Full-text available
In the process of training convolutional neural networks, the training data is often insufficient to obtain ideal performance and encounters the overfitting problem. To address this issue, traditional data augmentation (DA) techniques, which are designed manually based on empirical results, are often adopted in supervised learning. Essentially, traditional DA techniques are in the implicit form of feature engineering. The augmentation strategies should be designed carefully, for example, the distribution of augmented samples should be close to the original data distribution. Otherwise, it will reduce the performance on the test set. Instead of designing augmentation strategies manually, we propose to learn the data distribution directly. New samples can then be generated from the estimated data distribution. Specifically, a deep DA framework is proposed which consists of two neural networks. One is a generative adversarial network, which is used to learn the data distribution, and the other one is a convolutional neural network classifier. We evaluate the proposed model on a handwritten Chinese character dataset and a digit dataset, and the experimental results show it outperforms baseline methods including one manually well-designed DA method and two state-of-the-art DA methods.
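The "traditional" augmentations referenced in the contexts above (image translation, horizontal reflection, intensity alteration) can be written as a short on-the-fly transform pipeline. A minimal torchvision sketch, shown only to illustrate the manually designed baseline that the article contrasts with its learned, GAN-based augmentation:

```python
from torchvision import transforms

# Hedged sketch of manually designed augmentation: random translation,
# horizontal reflection, and brightness/contrast (intensity) changes.
augment = transforms.Compose([
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),  # small random shifts
    transforms.RandomHorizontalFlip(p=0.5),                    # horizontal reflection
    transforms.ColorJitter(brightness=0.4, contrast=0.4),      # intensity alteration
    transforms.ToTensor(),
])

# Typically applied on the fly during training, e.g.:
# dataset = torchvision.datasets.ImageFolder("train/", transform=augment)
```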
... Neural networks that date back to the 1940s [10] have grown their impact tremendously in the last decade. Today, we encounter many deep neural network applications in various fields because of the huge amount of data and computational power we have today [11,12,13]. The gathering of computing power with various kinds of big data has greatly helped the recent rise of deep learning. ...
... We know that as the amount of training data increases, neural networks perform better and their generalization ability improves [11,31,32]. On the other side, data collection has a cost, or it may not be possible to collect large enough data. ...
Article
Full-text available
Neural networks have become popular in many fields of science since they serve as promising, reliable and powerful tools. In this work, we study the effect of data augmentation on the predictive power of neural network models for nuclear physics data. We present two different data augmentation techniques, and we conduct a detailed analysis in terms of different depths, optimizers, activation functions and random seed values to show the success and robustness of the model. Using the experimental uncertainties for data augmentation for the first time, the size of the training data set is artificially boosted and the changes in the root-mean-square error between the model predictions on the test set and the experimental data are investigated. Our results show that the data augmentation decreases the prediction errors, stabilizes the model and prevents overfitting. The extrapolation capabilities of the MLP models are also tested for newly measured nuclei in AME2020 mass table, and it is shown that the predictions are significantly improved by using data augmentation.
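The abstract above describes boosting the training set by exploiting the experimental uncertainties of the measurements. One plausible reading of that idea, shown here as a hedged sketch (the function name, noise model, and sample values are assumptions, not taken from the paper), is to resample each measured target from a normal distribution whose standard deviation is its reported uncertainty:

```python
import numpy as np

def augment_with_uncertainties(y, sigma, n_copies=10, seed=None):
    """Resample each measurement from N(value, uncertainty) to enlarge the training set."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # Each copy perturbs every target independently within its error bar.
    return np.concatenate([rng.normal(y, sigma) for _ in range(n_copies)])

# Illustrative placeholder values (binding-energy-like targets in MeV with uncertainties)
y = np.array([8.551, 7.976, 8.032])
sigma = np.array([0.002, 0.005, 0.010])
augmented = augment_with_uncertainties(y, sigma, n_copies=100, seed=0)
print(augmented.shape)  # (300,)
```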
... In comparison with six other deep learning networks and four published works, the proposed work demonstrates the efficiency of the system. The proposed MLIIF system (Algorithm-7, A7) is compared with: (a) existing DNNs and (b) published works, as follows: existing DNN networks such as VGG-19 [4] as Algorithm-1 (A1), VGG-16 [28] as Algorithm-2 (A2), VGG-13 [29] as Algorithm-3 (A3), VGG-11 [11] as Algorithm-4 (A4), AlexNet [16] as Algorithm-5 (A5), and SqueezeNet [10] as Algorithm-6 (A6); and published works such as AlgorithmRefered-1 (Ar1) [14], AlgorithmRefered-2 (Ar2) [32], AlgorithmRefered-3 (Ar3) [7], and AlgorithmRefered-4 (Ar4) [8]. ...
... Existing algorithms vs. MLIIF: pre-trained existing VGG-11 [11], VGG-13 [29], VGG-16 [28], VGG-19 [4], SqueezeNet [10] and AlexNet [16], whereas the hybrid MLIIF can provide more features and texture information than a single DNN. ...
Article
Full-text available
Recently, deep learning has high popularity in the field of image processing due to its unique feature extraction property. This paper, proposes a novel multi-layer, multi-tier system called Multi-Layer Intelligent Image Fusion(MLIIF) with deep learning(DL) networks for visually enhanced medical images through fusion. Implemented deep feature based multilayer fusion strategy for both high frequency and low frequency components to obtain more informative fused image from the source image sets. The hybrid MLIIF consists of VGG-19, VGG-11, and Squeezenet DL networks for different layer deep feature extraction from approximation and detailed frequency components of the source images. The robustness of the proposed multi-layer, multi-tier fusion system is validated by subjective and objective analysis. The effectiveness of the proposed MLIIF system is evaluated by error image calculation with the ground truth image and thus accuracy of the system. The source images utilized for the experimentations are collected from the website www.med.harvard.edu and the proposed MLIIF system obtained an accuracy of 95%. The experimental findings indicate that the proposed system outperforms compared with existing DL networks.
... Most traditional supervised and unsupervised machine learning algorithms deal with classification, regression, and representation learning of independent and identically distributed (i.i.d.) data instances. If there is a regular structure, such as the regular grid present in images, convolutional neural networks (CNNs) [130,131] constitute a highly successful learning approach by exploiting local translational invariance of such data to extract rich features. Similarly, recurrent neural networks (RNNs) [132,133] are applied widely in addressing learning tasks concerning sequential data. ...
... Subsequently, a SAE is employed to generate high-level representations, which are used as inputs of a LSTM model for forecasting. The long and short-term time-series network (LSTNet) [240] approach combines an RNN with a convolutional neural network (CNN) [130,131]. The first layer consists of a CNN without pooling, which strives to learn time-localized patterns. ...
Thesis
Full-text available
Computational Bayesian inference has numerous applications in many branches of signal processing and machine learning. Bayesian techniques allow for principled modeling of uncertainty in the design of inference algorithms and offer better empirical performance than their non-Bayesian counterparts in diverse problem settings. However, the posterior distribution of the quantity of interest is not analytically tractable in most practical inference tasks. Approximating the posterior distribution using Monte Carlo methods provides an avenue for implementing Bayesian approaches in these cases. Such approximations often require tremendous computational efforts and scale poorly to complex, high-dimensional settings. Designing scalable and computationally efficient Monte Carlo methods for Bayesian inference is thus of paramount importance in many tasks, and is the main topic of this thesis. The main novel contributions of this thesis can be organized into the following three categories. First, we address the task of sequential inference of the state of a hidden Markov model in the presence of Gaussian mixture distributed noises. In this setting, the filtering distribution is multi-modal and existing techniques fall short in providing accurate state estimates. We propose several particle flow based algorithms which are suitable for this scenario and offer significant improvement compared to the current state-of-the-art filtering techniques. Second, we develop a general Bayesian Graph Convolutional Neural Network (BGCN) framework and apply it in a semi-supervised node classification problem. Viewing the observed graph as a realization from a parametric family of random graphs, the Bayesian approach targets inference of the joint posterior of the random graph parameters and the node labels. We also propose an extension of the BGCN by incorporating a non-parametric graph inference approach, which extends the applicability of the Bayesian framework to other learning tasks beyond node classification. Third, we tackle a probabilistic spatio-temporal forecasting task, where utilizing the spatial relationships among the time-series is beneficial for accurate forecasting. We treat the time series data as a random realization from a nonlinear state-space model and target Bayesian inference of the hidden states for probabilistic forecasting. Particle flow is used as the tool for posterior inference of the states, due to its effectiveness in complex, high-dimensional scenarios. Our approach provides a better characterization of uncertainty while maintaining comparable accuracy to the state-of-the-art point forecasting methods.
... Other terms that appear in this cluster, such as "computer vision" and "image processing," refer to the methods used to process the millions of astronomical images that are produced every day (through, among other instruments, telescopes) and from which it is possible to obtain knowledge on the underlying physical processes (Müller et al., 2018) and determine the classification of the objects that appear (Ma et al., 2019). These methods are usually based on CNN (Dieleman et al., 2015;Hoyle, 2016) due to the power of this technique when classifying (Krizhevsky et al., 2012). ...
Article
Full-text available
Since the beginning of the 21st century, the fields of astronomy and astrophysics have experienced significant growth at observational and computational levels, leading to the acquisition of increasingly huge volumes of data. In order to process this vast quantity of information, artificial intelligence (AI) techniques are being combined with data mining to detect patterns with the aim of modeling, classifying or predicting the behavior of certain astronomical phenomena or objects. Parallel to the exponential development of the aforementioned techniques, the scientific output related to the application of AI and machine learning (ML) in astronomy and astrophysics has also experienced considerable growth in recent years. Therefore, the increasingly abundant articles make it difficult to monitor this field in terms of which research topics are the most prolific or novel, or which countries or authors are leading them. In this article, a text‐mining‐based scientometric analysis of scientific documents published over the last three decades on the application of AI and ML in the fields of astronomy and astrophysics is presented. The VOSviewer software and data from the Web of Science (WoS) are used to elucidate the evolution of publications in this research field, their distribution by country (including co‐authorship), the most relevant topics addressed, and the most cited elements and most significant co‐citations according to publication source and authorship. The obtained results demonstrate how application of AI/ML to the fields of astronomy/astrophysics represents an established and rapidly growing field of research that is crucial to obtaining scientific understanding of the universe. This article is categorized under: Algorithmic Development > Text Mining; Technologies > Machine Learning; Application Areas > Science and Technology.
... In industrial deep learning [1] applications, we use the re-label method [2] to achieve more than 95% accuracy on the dev dataset. But under human evaluation of the deployed industry-application inference, the accuracy is only around 90%-94%. ...
Preprint
Full-text available
In industrial deep learning applications, bad cases still need to be fixed after we achieve more than 95% accuracy on the dev dataset. These bad cases stem from wrong rules/knowledge in human labeling and cause low accuracy under human evaluation. In this paper, we propose a pattern-based method to fix bad cases for industry application inference. We propose a pipeline to solve the problem and improve the accuracy under human evaluation. The experimental results verify our idea.
... For several computer vision applications, including picture classification and object identification, deep learning has recently shown remarkable improvements (Krizhevsky et al. 2012) (Chugh and Jain 2021;Chen et al. 2020;Zhang et al. 2020;Daniel 2021;Xu et al. 2021). While there have been major advances in face recognition, DeepFace, DeepIDs, Visual Geometry Group (VGG) Face, FaceNet, SphereFace and ArcFace have also found tremendous success. ...
Article
Full-text available
With the rapid technological development of video transmission, it is applied on various applications such as security, forgery detection and surveillance. Moreover, the images and videos supply a huge volume of variations in intra-personal and also make the face recognition significant in a biometric security area. In addition, automatic face recognition process is a widely applied concept in security. The spoofing attacks account for reproducing a human face by applying photographs or videos. The face recognition and spoof detection processes are performed by using machine learning and deep learning algorithms by analysing the images in videos. For the purpose of enhancing the prediction accuracy, we propose a new hybrid deep learning technique called hybrid convolutional neural network (CNN)-based architecture with long short-term memory (LSTM) units to study the impact in classification. This hybrid approach uses a non-softmax function for making effective decision on classification. The experiments have been performed for evaluating the proposed technique and proved better than the existing deep learning algorithms in terms of precision, recall, f-measure and accuracy.
... C. Implementation Details. In the encoder module, we considered two typical network architectures, i.e., ResNet-152 [10] and Inception-ResNet-v2 [22], as the image feature extractors, which are both pre-trained on the ImageNet dataset [16]. The extracted image features are 2048-dimensional and 1536-dimensional, respectively. ...
Article
Full-text available
Video captioning is an important problem involved in many applications. It aims to generate some descriptions of the content of a video. Most of the existing methods for video captioning are based on the deep encoder–decoder models, particularly, the attention-based models (say Transformer). However, the existing transformer-based models may not fully exploit the semantic context, that is, only using the left-to-right style of context but ignoring the right-to-left counterpart. In this paper, we introduce a bidirectional (forward-backward) decoder to exploit both the left-to-right and right-to-left styles of context for the Transformer-based video captioning model. Thus, our model is called bidirectional Transformer (dubbed BiTransformer). Specifically, in the bridge of the encoder and forward decoder (aiming to capture the left-to-right context) used in the existing Transformer-based models, we plug in a backward decoder to capture the right-to-left context. Equipped with such a bidirectional decoder, the semantic context of videos will be more fully exploited, resulting in better video captions. The effectiveness of our model is demonstrated over two benchmark datasets, i.e., MSVD and MSR-VTT, via comparison with the state-of-the-art methods. Particularly, in terms of the important evaluation metric CIDEr, the proposed model outperforms the state-of-the-art models with improvements of 1.2% in both datasets.
... For the classification task here, we adopted CNNs, which have been widely used in the research community for image classification and segmentation in recent years (Lawrence et al., 1997; Krizhevsky et al., 2012; LeCun, 2021). ...
Article
Full-text available
Food samples are routinely screened for food-contaminating beetles (i.e., pantry beetles) due to their adverse impact on the economy, environment, public health and safety. If found, their remains are subsequently analyzed to identify the species responsible for the contamination; each species poses different levels of risk, requiring different regulatory and management steps. At present, this identification is done through manual microscopic examination since each species of beetle has a unique pattern on its elytra (hardened forewing). Our study sought to automate the pattern recognition process through machine learning. Such automation will enable more efficient identification of pantry beetle species and could potentially be scaled up and implemented across various analysis centers in a consistent manner. In our earlier studies, we demonstrated that automated species identification of pantry beetles is feasible through elytral pattern recognition. Due to poor image quality, however, we failed to achieve prediction accuracies of more than 80%. Subsequently, we modified the traditional imaging technique, allowing us to acquire high-quality elytral images. In this study, we explored whether high-quality elytral images can truly achieve near-perfect prediction accuracies for 27 different species of pantry beetles. To test this hypothesis, we developed a convolutional neural network (CNN) model and compared performance between two different image sets for various pantry beetles. Our study indicates improved image quality indeed leads to better prediction accuracy; however, it was not the only requirement for achieving good accuracy. Also required are many high-quality images, especially for species with a high number of variations in their elytral patterns. The current study provided a direction toward achieving our ultimate goal of automated species identification through elytral pattern recognition.
... All studies on analysis of photographic images used deep learning (neural networks with more than one hidden layer), the most popular architecture of which is the VGG family of neural networks 17,22,25,26,30,51 . This is perhaps unsurprising since VGGNet was developed as an extension of the revolutionary AlexNet 54,55 . ...
Article
Full-text available
Machine learning (ML) algorithms are becoming increasingly pervasive in the domains of medical diagnostics and prognostication, afforded by complex deep learning architectures that overcome the limitations of manual feature extraction. In this systematic review and meta-analysis, we provide an update on current progress of ML algorithms in point-of-care (POC) automated diagnostic classification systems for lesions of the oral cavity. Studies reporting performance metrics on ML algorithms used in automatic classification of oral regions of interest were identified and screened by 2 independent reviewers from 4 databases. Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed. 35 studies were suitable for qualitative synthesis, and 31 for quantitative analysis. Outcomes were assessed using a bivariate random-effects model following an assessment of bias and heterogeneity. 4 distinct methodologies were identified for POC diagnosis: (1) clinical photography; (2) optical imaging; (3) thermal imaging; (4) analysis of volatile organic compounds. Estimated AUROC across all studies was 0.935, and no difference in performance was identified between methodologies. We discuss the various classical and modern approaches to ML employed within identified studies, and highlight issues that will need to be addressed for implementation of automated classification systems in screening and early detection.
... Most well-known CNN architectures have been built for image recognition. For example, AlexNet [52], VGG-16 [53] and ResNet-34 [54], each showed excellent performance, in some cases superior to human performance, for detecting patterns in images. The use of CNN architectures for diagnosis and classification from multivariate time series data has a similar set of commonly used architectures. ...
Article
Full-text available
Background Cardiopulmonary exercise testing (CPET) provides a reliable and reproducible approach to measuring fitness in patients and diagnosing their health problems. However, the data from CPET consist of multiple time series that require training to interpret. Part of this training teaches the use of flow charts or nested decision trees to interpret the CPET results. This paper investigates the use of two machine learning techniques using neural networks to predict patient health conditions with CPET data in contrast to flow charts. The data for this investigation comes from a small sample of patients with known health problems and who had CPET results. The small size of the sample data also allows us to investigate the use and performance of deep learning neural networks on health care problems with limited amounts of labeled training and testing data. Methods This paper compares the current standard for interpreting and classifying CPET data, flowcharts, to neural network techniques, autoencoders and convolutional neural networks (CNN). The study also investigated the performance of principal component analysis (PCA) with logistic regression to provide an additional baseline of comparison to the neural network techniques. Results The patients in the sample had two primary diagnoses: heart failure and metabolic syndrome. All model-based testing was done with 5-fold cross-validation and metrics of precision, recall, F1 score, and accuracy. As a baseline for comparison to our models, the highest performing flow chart method achieved an accuracy of 77%. Both PCA regression and CNN achieved an average accuracy of 90% and outperformed the flow chart methods on all metrics. The autoencoder with logistic regression performed the best on each of the metrics and had an average accuracy of 94%. Conclusions This study suggests that machine learning and neural network techniques, in particular, can provide higher levels of accuracy with CPET data than traditional flowchart methods. Further, the CNN performed well with a small data set showing that these techniques can be designed to perform well on small data problems that are often found in health care and the life sciences. Further testing with larger data sets is needed to continue evaluating the use of machine learning to interpret CPET data.
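As a concrete illustration of the simplest baseline mentioned in this abstract, the snippet below sketches PCA followed by logistic regression evaluated with 5-fold cross-validation. It is a hedged sketch only: the feature matrix and labels are random placeholders standing in for CPET-derived features, and the dimensions are assumptions rather than values from the study.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 200))   # placeholder: 30 patients, 200 summary features
y = rng.integers(0, 2, size=30)  # placeholder: heart failure vs metabolic syndrome

model = make_pipeline(StandardScaler(), PCA(n_components=10),
                      LogisticRegression(max_iter=1000))
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(scores.mean())  # 5-fold cross-validated accuracy
```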
... Transfer learning is also utilized in [33] for document image quality assessment. The pre-trained AlexNet [28] model is used in this study. The authors used a two-step strategy based on transfer learning. ...
Article
Full-text available
Recently, Convolutional Neural Networks with 3D kernels (3D CNNs) have shown great superiority over 2D CNNs for video processing applications. In the field of Stereoscopic Video Quality Assessment (SVQA), 3D CNNs are utilized to extract the spatio-temporal features from the stereoscopic video. Besides, the emergence of substantial video datasets such as Kinetics has made it possible to use pre-trained 3D CNNs in other video-related fields. In this paper, we fine-tune 3D Residual Networks (3D ResNets) pre-trained on the Kinetics dataset for measuring the quality of stereoscopic videos and propose a no-reference SVQA method. Specifically, our aim is twofold: Firstly, we answer the question: can we use 3D CNNs as a quality-aware feature extractor from stereoscopic videos or not. Secondly, we explore which ResNet architecture is more appropriate for SVQA. Experimental results on two publicly available SVQA datasets of LFOVIAS3DPh2 and NAMA3DS1-COSPAD1 show the effectiveness of the proposed transfer learning-based method for SVQA that provides the RMSE of 0.332 in LFOVIAS3DPh2 dataset. Also, the results show that deeper 3D ResNet models extract more efficient quality-aware features.
... In mobile applications, it is desirable to limit the number of DNN parameters to achieve a smaller memory footprint and better performance. SqueezeNet [21] was one of the first to achieve AlexNet-level [23] accuracy with 50 times fewer parameters. Recently, several lightweight networks [11,26] with small models have been introduced. ...
Article
Full-text available
We present CrossInfoMobileNet, a hand pose estimation convolutional neural network based on CrossInfoNet, specifically tuned to mobile phone processors through the optimization, modification, and replacement of computationally critical CrossInfoNet components. By introducing a state-of-the-art MobileNetV3 network as a feature extractor and refiner, replacing ReLU activation with a better performing H-Swish activation function, we have achieved a network that requires 2.37 times less multiply-add operations and 2.22 times less parameters than the CrossInfoNet network, while maintaining the same error on the state-of-the-art datasets. This reduction of multiply-add operations resulted in an average 1.56 times faster real-world performance on both desktop and mobile devices, making it more suitable for embedded applications. The full source code of CrossInfoMobileNet including the sample dataset and its evaluation is available online through Code Ocean.
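The H-Swish activation mentioned above has a simple closed form, h-swish(x) = x * relu6(x + 3) / 6, a cheap piecewise-linear approximation of Swish/SiLU. A minimal PyTorch sketch (PyTorch also ships an equivalent built-in, nn.Hardswish):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HSwish(nn.Module):
    """Hard-Swish as used in MobileNetV3: x * relu6(x + 3) / 6."""
    def forward(self, x):
        return x * F.relu6(x + 3.0) / 6.0

x = torch.linspace(-4, 4, 9)
print(HSwish()(x))         # piecewise-linear gating of the input
print(nn.Hardswish()(x))   # built-in equivalent for comparison
```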
... It includes 18 convolutional layers and uses skip connection function to add output from one layer to the next layer, which can ease the training load of deeper networks and mitigate the gradient vanishing problem. We explore several models, including AlexNet [23], ResNet18, ResNet34, and Xception [24]. ResNet18 achieves the best performance in malware detection over our ME dataset. ...
Conference Paper
Full-text available
Detecting malware by analyzing raw bytes of programs using deep neural networks, also referred to as end-to-end malware detection, is considered a new promising approach to simplify feature selection in static analysis while still providing accurate detection. Unfortunately, recent studies show that evasion attacks can modify raw bytes of malware and force a well-trained detector to predict the crafted malware as benign. In this paper, we propose a new evasion attack and validate this vulnerability of end-to-end malware detection in the context of multiple detectors, where our evasion attack MultiEvasion can defeat 2 (and even 3) classifiers simultaneously without affecting functionalities of malware. This raises emerging alerts for end-to-end malware detection, as running multiple classifiers was considered one of the major countermeasures against evasion attacks. Specifically, our experimental results over real-world datasets show that our proposed attack can achieve a 99.5% evasion rate against 2 classifiers and an 18.3% evasion rate against 3 classifiers. Our findings suggest that the robustness and reliability of end-to-end malware detection, especially the ones based on neural networks, need to be carefully examined before applying it in real-world applications. (preprint version)
... Figure 1 shows a simplified representation of a few common deep learning architectures, applicable to visual imagery [9]. Figure 1 shows a schematic representation of two examples of the most commonly used networks. As can be seen, in Figure 1, one type of deep neural network architecture can also form the backbone of more sophisticated architectures for advanced applications [5], [7], [8], [9]. In this paper, the CNN architecture of interest is U-net. ...
Preprint
Full-text available
In this paper, the main objective is to determine the best size of late gadolinium enhancement (LGE)-magnetic resonance imaging (MRI) images for the training dataset to achieve optimal deep learning training outcomes. To determine the new size of LGE-MRI images from the reference training dataset, non-extra pixel and extra pixel interpolation algorithms are used. A novel strategy based on thresholding, median filtering, and subtraction operations is introduced and applied to remove extra class labels in interpolated ground truth (GT) segmentation masks. Fully automated quantification is achieved using the expectation maximization, weighted intensity, a priori information (EWA) algorithm, and the outcome of automatic semantic segmentation of LGE-MRI images with the convolutional neural network (CNN). In the experiments, common class metrics are used to evaluate the quality of semantic segmentation with a CNN architecture of interest (U-net) against the GT segmentation. Arbitrary threshold, comparison of the sums, and sums of differences are criteria or options used to estimate the relationship between semi-automatic and fully automated quantification of MI results. A close relationship between semi-automatic or manual and fully automated quantification of MI results was identified more in the case involving the dataset of bigger LGE-MRI images than in that of the dataset of smaller LGE-MRI images, where the best quantification results based on the dataset of bigger LGE-MRI images were 55.5% closer to the manual or semi-automatic results, while the best quantification results based on the dataset of smaller LGE-MRI images were 22.2% closer to the manual results.
Article
Deep learning techniques have achieved remarkable success in lesion segmentation and classification between benign and malignant tumors in breast ultrasound images. However, existing studies are predominantly focused on devising efficient neural network-based learning structures to tackle specific tasks individually. By contrast, in clinical practice, sonographers perform segmentation and classification as a whole; they investigate the border contours of the tissue while detecting abnormal masses and performing diagnostic analysis. Performing multiple cognitive tasks simultaneously in this manner facilitates exploitation of the commonalities and differences between tasks. Inspired by this unified recognition process, this study proposes a novel learning scheme, called the cross-task guided network (CTG-Net), for efficient ultrasound breast image understanding. CTG-Net integrates the two most significant tasks in computerized breast lesion pattern investigation: lesion segmentation and tumor classification. Further, it enables the learning of efficient feature representations across tasks from ultrasound images and the task-specific discriminative features that can greatly facilitate lesion detection. This is achieved using task-specific attention models to share the prediction results between tasks. Then, following the guidance of task-specific attention soft masks, the joint feature responses are efficiently calibrated through iterative model training. Finally, a simple feature fusion scheme is used to aggregate the attention-guided features for efficient ultrasound pattern analysis. We performed extensive experimental comparisons on multiple ultrasound datasets. Compared to state-of-the-art multi-task learning approaches, the proposed approach can improve the Dice’s coefficient, true-positive rate of segmentation, AUC, and sensitivity of classification by 11%, 17%, 2%, and 6%, respectively. The results demonstrate that the proposed cross-task guided feature learning framework can effectively fuse the complementary information of ultrasound image segmentation and classification tasks to achieve accurate tumor localization. Thus, it can aid sonographers to detect and diagnose breast cancer.
Article
Full-text available
Optical coherence tomography angiography (OCTA) is an emerging non-invasive technique for imaging the retinal vasculature. While there are many promising clinical applications for OCTA, determination of image quality remains a challenge. We developed a deep learning-based system using a ResNet152 neural network classifier, pretrained using ImageNet, to classify images of the superficial capillary plexus in 347 scans from 134 patients. Images were also manually graded by two independent graders as a ground truth for the supervised learning models. Because requirements for image quality may vary depending on the clinical or research setting, two models were trained—one to identify high-quality images and one to identify low-quality images. Our neural network models demonstrated outstanding area under the curve (AUC) metrics for both low quality image identification (AUC = 0.99, 95%CI 0.98–1.00, κ = 0.90) and high quality image identification (AUC = 0.97, 95%CI 0.96–0.99, κ = 0.81), significantly outperforming machine-reported signal strength (AUC = 0.82, 95%CI 0.77–0.86, κ = 0.52 and AUC = 0.78, 95%CI 0.73–0.83, κ = 0.27 respectively). Our study demonstrates that techniques from machine learning may be used to develop flexible and robust methods for quality control of OCTA images.
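The transfer-learning setup described above (an ImageNet-pretrained ResNet152 repurposed as a binary quality classifier) commonly amounts to swapping the final fully-connected layer. The sketch below shows that pattern in torchvision; the weight enum, freezing choices, and training schedule are assumptions for illustration, not details from the paper.

```python
import torch
import torch.nn as nn
from torchvision import models

def make_quality_classifier(num_classes: int = 2) -> nn.Module:
    # ImageNet-pretrained backbone with its classification head replaced.
    backbone = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
    backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)
    return backbone

# Two separate models could be trained this way: one for "high quality"
# and one for "low quality" images, as described in the abstract.
model = make_quality_classifier()
logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 2])
```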
Article
As large-scale laser 3D point clouds data contains massive and complex data, it faces great challenges in the automatic intelligent processing and classification of large-scale 3D point clouds. Aiming at the problem that 3D point clouds in complex scenes are self-occluded or occluded, which could reduce the object classification accuracy, we propose a multidimension feature optimal combination classification method named MFOC-CliqueNet based on CliqueNet for large-scale laser point clouds. The optimal combination matrix of multidimension features is constructed by extracting the three-dimensional features and multidirectional two-dimension features of 3D point cloud. This is the first time that multidimensional optimal combination features are introduced into cyclic convolutional networks CliqueNet. It is important for large-scale 3D point cloud classification. The experimental results show that the MFOC-CliqueNet framework can realize the latest level with fewer parameters. The experiments on the Large-Scale Scene Point Cloud Oakland dataset show that the classification accuracy of our method is 98.9%, which is better than other classification algorithms mentioned in this paper.
Article
Long Short-Term Memory networks are making significant inroads into improving time series applications, including human action recognition. In a human action video, the spatial and temporal streams carry distinctive yet prominent information, hence many researchers turn to spatio-temporal models for human action recognition. A spatio-temporal model integrates the temporal network (e.g. Long Short-Term Memory) and spatial network (e.g. Convolutional Neural Networks). There are a few challenges in existing human action recognition: (1) the uni-directional modeling of Long Short-Term Memory makes it unable to preserve information from the future, (2) the sparse sampling strategy tends to lose prominent information when performing dimension reduction on the input of Long Short-Term Memory, and (3) the fusion strategy for consolidating the temporal network and spatial network. In view of this, we propose a Bidirectional Long Short-Term Memory with Temporal Dense Sampling and Fusion Network method to address the above-mentioned challenges. The Temporal Dense Sampling partitions the human action video into segments and then performs a max-pooling operation along the temporal axis in each segment. A multi-stream bidirectional Long Short-Term Memory network is adopted to encode the long-term spatial and temporal dependencies in both forward and backward directions. Instead of assigning fixed weights to the spatial network and temporal network, we propose a fusion network where a fully-connected layer is trained to adaptively assign the weights for the networks. The empirical results demonstrate that the proposed Bidirectional Long Short-Term Memory with Temporal Dense Sampling and Fusion Network method outshines the state-of-the-art methods with an accuracy of 94.78% on the UCF101 dataset and 70.72% on the HMDB51 dataset.
Article
For the protection and management of coastal ecosystems, it is crucial to monitor typical coastal objects and examine their characteristics of spatial and temporal variation. There are limitations to the conventional object-oriented and spectrum-based approaches to HSRIs interpretation. The majority of recently conducted studies on semantic segmentation based on DCNNs concentrate on improving the accuracy of single objects at local scales. The completeness, generalization, and edge accuracy of the extraction and classification of multiple objects with the complex background at regional scales still need to be improved. We created a benchmark dataset CSRSD for coastal supervision using HSRIs and GIS in this study to address the aforementioned problems. In the meantime, by combining the traditional U-Net and DeepLabV3+ feature fusion strategies, we propose a novel fully connected fusion pattern by switching to deepwise separable convolution from conventional convolution and introducing spatial dropout to create a brand new CBS module. The LFCSDN, a new lightweight fully connected spatial dropout network, has been suggested. The findings demonstrate that our constructed semantic segmentation dataset, which has produced reliable results on U-Net and DeepLabV3+, can be used as a benchmark for applications based on DCNNs for coastal scenes. While maintaining high accuracy, LFCSDN can significantly reduce the number of parameters. Our suggested CBS module can increase the model’s generalization by reducing overfitting. In order to analyze the spatiotemporal characteristics of target changes in the study area, tests on expansive remote sensing imagery were also conducted. The findings can be applied to ecological restoration, coastal area mapping, and integrated management. Additionally, it serves as a resource for studies on multiscale semantic segmentation in computer vision.
Article
Knowledge Graph (KG) provides high-quality structured knowledge for various downstream knowledge-aware tasks (such as recommendation and intelligent question-answering) with its unique advantages of representing and managing massive knowledge. The quality and completeness of KGs largely determine the effectiveness of the downstream tasks. But given the inherently incomplete nature of KGs, a large amount of valuable knowledge is still missing from them. Therefore, it is necessary to improve the existing KGs to supplement the missing knowledge. Knowledge Graph Completion (KGC) is one of the popular technologies for knowledge supplement. Accordingly, there has been growing interest in KGC technologies, and recently many studies have focused on the KGC field. To serve as a helpful resource for researchers to grasp the main ideas and results of KGC studies, and to further highlight ongoing research in KGC, in this paper we provide an all-round, up-to-date overview of the current state-of-the-art in KGC. According to the information sources used in KGC methods, we divide the existing KGC methods into two main categories: the KGC methods relying on structural information and the KGC methods using other additional information. Further, each category is subdivided at different granularities for summarizing and comparing the methods. Besides, the other KGC methods for KGs of special fields (including temporal KGC, commonsense KGC, and hyper-relational KGC) are also introduced. In particular, we discuss comparisons and analyses for each category in our overview. Finally, some discussions and directions for future research are provided.
Article
U-Net series models have achieved considerable success in various fields such as image segmentation and image classification. However, the decoders in these models often use transposed convolution (TC) from level to level, reducing the representation ability of the features. Therefore, we proposed a DenseUNet model that adopts dense TC to ensure maximum information flow from lower layers to higher ones. DenseUNet comprises four blocks, the output of each block being transposed several times with different strides based on its size. All transposed results of the same size are concatenated together through a skip connection and then fused with the same size result of the first convolutional layer of the corresponding block. Multiscale TC operations restore the feature size at different scales to supplement the important features lost in the pooling layers. By progressively accumulating features from different paths, DenseUNet improves the representational ability of features and enhances the robustness of classification. We evaluated the model on four image classification benchmark datasets—namely, CIFAR-10, CIFAR-100, SVHN, and FMNIST—using three U-Net series models (U-Net, TernausNet, and CrackU-Net) and four classical classification models (VGG16, VGG19, ResNeXt, and DenseNet). The experimental results showed that our model has stable training performance and excellent test accuracy.
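The multiscale transposed-convolution idea described above (transposing each block's output several times with different strides and concatenating same-size results through skip connections) can be illustrated in a few lines of PyTorch. Channel counts, strides, and spatial sizes below are illustrative assumptions, not the model's actual configuration:

```python
import torch
import torch.nn as nn

x_8 = torch.randn(1, 64, 8, 8)   # output of a shallower block (8x8)
x_4 = torch.randn(1, 64, 4, 4)   # output of a deeper block (4x4)

# Transpose feature maps with different strides...
up2_from_8 = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)(x_8)  # -> 16x16
up4_from_8 = nn.ConvTranspose2d(64, 32, kernel_size=4, stride=4)(x_8)  # -> 32x32
up4_from_4 = nn.ConvTranspose2d(64, 32, kernel_size=4, stride=4)(x_4)  # -> 16x16

# ...and concatenate all transposed results of the same size via a skip connection.
fused_16 = torch.cat([up2_from_8, up4_from_4], dim=1)
print(fused_16.shape)  # torch.Size([1, 64, 16, 16])
```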
Article
Electroencephalography (EEG) has been widely used in the research of stress detection in recent years; yet, how to analyze an EEG is an important issue for upgrading the accuracy of stress detection. This study aims to collect the EEG of table tennis players by a stress test and analyze it with machine learning to identify the models with optimal accuracy. The research methods are collecting the EEG of table tennis players using the Stroop color and word test and mental arithmetic, extracting features by data preprocessing and then making comparisons using the algorithms of logistic regression, support vector machine, decision tree C4.5, classification and regression tree, random forest, and extreme gradient boosting (XGBoost). The research findings indicated that, in three-level stress classification, XGBoost had an 86.49% accuracy in the case of the generalized model. This study outperformed other studies by up to 11.27% in three-level classification. The conclusion of this study is that a stress detection model that was built with the data on the brain waves of table tennis players could distinguish high stress, medium stress, and low stress, as this study provided the best classifying results based on the past research in three-level stress classification with an EEG.
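The three-level stress classifier described above pairs extracted EEG features with gradient-boosted trees. Below is a hedged sketch of that setup with XGBoost and 5-fold cross-validation; the feature matrix and labels are random placeholders, and the hyperparameters are assumptions rather than the study's settings:

```python
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 40))    # placeholder EEG features (e.g. band powers)
y = rng.integers(0, 3, size=120)  # 0 = low, 1 = medium, 2 = high stress

clf = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1,
                    objective="multi:softprob")
print(cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean())
```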
Article
Full-text available
People around the world relish a variety of flavourful foods. Spices add flavour to food without adding any fat or calories. People have used spices for many centuries, and they are an integral part of our food. In addition to aroma, spices also have anti-bacterial, anti-inflammatory and other health-promoting properties. Recognizing spices from images is a challenging problem for a machine as they come in varying sizes and shapes, different colours, high visual similarity, and texture. The classification of spices presents useful applications in the field of Artificial Intelligence-driven food industry, e-commerce, and health care. In the billion-dollar spice industry, image classification of spices finds applications ranging from receiving, processing, labelling, and packaging them. As there is no dataset currently available for spices, in this work, a Spice10 dataset with 2000 images of spices is first created. This study aims to find out whether the accurate classification of spices is possible using computer vision technology. Instead of building models from scratch, a pre-trained transfer learning approach has been implemented in this work to classify the commonly used spices. The images in the dataset are of different sizes and have to be resized and pre-processed before being used with the transfer learning approach. A few different pre-trained networks are modified and used for the image classification of spices. The best classification average accuracy, obtained by the VGG16 model, is nearly 93.06%, which is better than the other models. The high accuracy of the VGG16 model indicates it can be successfully used for the classification of spices.
Article
Like engineers designing buildings, competent generative design methods try to understand the prescriptive requirements in text and architectural sketches, apply engineering principles and develop the structural design. However, this requirement may be challenging for existing methods because they are not good at simultaneously taking text and image input and then generating designs. This study proposed an innovative design approach, TxtImg2Img, to overcome these difficulties. Based on the generative adversarial network architecture, the generator is proposed to encode, extract and fuse texts and images, and generate new design images; the discriminator is developed to judge real and fake images and texts. Consequently, TxtImg2Img is advantageous in extracting features from multimodal text and image data, fusing the features using the Hadamard product, and generating designs that satisfy the text-image requirements after learning from a limited number of design samples. Specifically, TxtImg2Img can generate structural design images without distortion, and the corresponding structural design meets the mechanical requirements, after being trained on dozens of words and hundreds of images. The case studies confirm performance improvement of up to 21% and that the proposed approach presents a promising breakthrough for intelligent construction.
Article
In English writing, automatic evaluation systems are becoming more and more common and are receiving more and more attention. As a teaching application, an automatic evaluation system uses information technologies such as corpus and artificial intelligence to quickly score and modify students’ writing. To expedite the realization of teaching information, this system records students’ writing processes, which are closely related to the instruction requirements of the college’s English program and meets the personalized needs by combining the current situation of English writing and teaching. This study investigates the impact of field cognitive style and automatic evaluation system on college writing training. From the perspective of cognitive style differences, this paper summarizes the application strategies of an automatic evaluation system in College English writing and teaching, to better realize the integration of information technology and subject teaching and improve students’ English writing ability and level. The deep learning-based system for grading students in the classroom is completed by this paper. The network data transmission module allows several ends to all experience stable data interaction.
Article
Knowing the modality of a medical image is crucial in understanding the characteristics of the image. Therefore, it is important to classify medical images as per their modality. The image and its accompanying text caption contain information that could help in identifying the modality of a given medical image. This work proposes an approach for medical image modality classification using visual and textual features. The proposed approach uses convolutional neural networks to extract visual features from a medical image. Word embeddings obtained from biomedical word2vec models are used to generate textual features from the image captions. Support vector machine based classifiers are then used to classify medical images using these features. We propose to use the late fusion approach to combine visual and textual features. The proposed approach performs better than the state-of-the-art methods.
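The late-fusion scheme described above can be sketched as two independent SVMs, one per modality, whose class probabilities are combined at decision time. Everything below is a hedged illustration: the feature dimensions, the equal fusion weights, and the placeholder data are assumptions, and the CNN/word2vec feature extraction is presumed to have already happened.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_visual = rng.normal(size=(100, 512))  # placeholder CNN image features
X_text = rng.normal(size=(100, 200))    # placeholder word2vec caption features
y = rng.integers(0, 4, size=100)        # placeholder modality labels

svm_visual = SVC(probability=True).fit(X_visual, y)
svm_text = SVC(probability=True).fit(X_text, y)

# Late fusion: average the per-modality class probabilities, then pick the argmax.
proba = 0.5 * svm_visual.predict_proba(X_visual) + 0.5 * svm_text.predict_proba(X_text)
pred = proba.argmax(axis=1)
print(pred[:10])
```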
Article
The goal of this paper is to classify the various cough and breath sounds of COVID-19 artefacts in signals from dynamic real-life environments. Cough and breath sounds were chosen over other common symptoms so that COVID-19 patients can monitor themselves regularly from the comfort of their homes, thereby neither overloading the Medicare system nor unwittingly spreading the disease. The presented model includes two main phases. The first phase is the sound-to-image transformation, which is improved by the Mel-scale spectrogram approach. The second phase consists of extraction of features and classification using nine deep transfer models (ResNet18/34/50/100/101, GoogLeNet, SqueezeNet, MobileNetv2, and NasNetmobile). The dataset contains data from almost 1600 people (1185 male and 415 female) from all over the world. Our classification model is the most accurate, with an accuracy of 99.2% using the SGDM optimizer. The accuracy is good enough that a large set of labelled cough and breath data may be used to check the possibility for generalization. The results demonstrate that ResNet18 is the most stable model for classifying cough and breath tones from a restricted dataset, with a sensitivity of 98.3% and a specificity of 97.8%. Finally, the presented model is shown to be more trustworthy and accurate than other present models. Cough and breath study accuracy is promising enough to put extrapolation and generalization to the test.
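The sound-to-image phase described in this abstract typically means converting each recording into a log-Mel spectrogram that a 2-D CNN such as ResNet18 can consume. A minimal librosa sketch; the file name and parameter values are illustrative assumptions:

```python
import numpy as np
import librosa

# Load a cough/breath recording and turn it into an image-like log-Mel spectrogram.
y, sr = librosa.load("cough_sample.wav", sr=16000)          # hypothetical file
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                     hop_length=256, n_mels=128)
log_mel = librosa.power_to_db(mel, ref=np.max)               # shape: (n_mels, n_frames)
print(log_mel.shape)
```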
Article
The use of automatic systems for medical image classification has revolutionized the diagnosis of a high number of diseases. These alternatives, which are usually based on artificial intelligence (AI), provide a helpful tool for clinicians, eliminating the inter and intra-observer variability that the diagnostic process entails. Convolutional Neural Network (CNNs) have proved to be an excellent option for this purpose, demonstrating a large performance in a wide range of contexts. However, it is also extremely important to quantify the reliability of the model’s predictions in order to guarantee the confidence in the classification. In this work, we propose a multi-level ensemble classification system based on a Bayesian Deep Learning approach in order to maximize performance while providing the uncertainty of each classification decision. This tool combines the information extracted from different architectures by weighting their results according to the uncertainty of their predictions. Performance is evaluated in a wide range of real scenarios: in the first one, the aim is to differentiate between different pulmonary pathologies: controls vs bacterial pneumonia vs viral pneumonia. A two-level decision tree is employed to divide the 3-class classification into two binary classifications, yielding an accuracy of 98.19%. In the second context, performance is assessed for the diagnosis of Parkinson’s disease, leading to an accuracy of 95.31%. The reduced preprocessing needed for obtaining this high performance, in addition to the information provided about the reliability of the predictions evidence the applicability of the system to be used as an aid for clinicians.
Article
Federated Learning (FL) has achieved great success in many intelligent applications of the Internet of Vehicles (IoV); however, the large number of vehicles and the increasing size of models bring challenges to FL-empowered connected vehicles. Federated Distillation (FD) has become a novel paradigm to address communication bottlenecks by exchanging model outputs among devices rather than model parameters. In this paper, we investigate several key factors that affect the communication efficiency of FD, including communication frequency, soft-label quantization, and coding methods. Based on the findings of the preceding analysis, we propose FedDQ, a communication-efficient federated distillation method. Specifically, we propose a controlled averaging algorithm based on control variates to solve the drift problem arising from local updates. Then, we design a new quantization approach and coding method to reduce overhead for both upstream and downstream communications. Extensive experiments on image classification tasks at different levels of data heterogeneity show that our method can reduce the amount of communication required to achieve a fixed performance target by around 2 or 3 orders of magnitude compared to benchmark methods while achieving equivalent or higher classification accuracy.
Article
Rock glaciers (RG) are landforms that occur in high latitudes or elevations and — in their active state — consist of a mixture of rock debris and ice. Despite serving as a form of groundwater storage, they are an indicator for the occurrence of (former) permafrost and therefore carry significance in the research for the ongoing climate change. For these reasons, the past years have shown rising interest in the establishment of RG inventories to investigate the extent of permafrost and quantify water storages. Creating these inventories, however, usually involves manual, laborious, and subjective mapping of the landforms based on aerial image - and digital elevation model analysis. We propose an approach for RG mapping based on supervised machine learning which can help to increase the mapping efficiency and permits rapid RG mapping in vast and not yet covered areas. We found deep convolutional artificial neural networks (ANN) that are specifically designed for image segmentation (U-Net architecture) to be well suited for this classification problem. The general workflow consists of training the ANNs with orthophotos and slope maps of digital elevation models as input. The output (RG label-maps) is derived from a recently published RG inventory of the Austrian Alps that features 5769 individual RGs and was compiled manually by several scientists. To increase the generalization capabilities, we use live data augmentation during training. Based on this inventory, the ANNs have learned the average expert opinion and the RG map generated by the ANN can be used to increase the consistency and completeness of already existing RG inventories. Moreover, this ANN approach might be valuable for other landform mapping tasks beyond rock glaciers (e.g., other mass movements).
Article
Full-text available
Due to the COVID-19 pandemic, computerized COVID-19 diagnosis studies are proliferating. The diversity of COVID-19 models raises the questions of which COVID-19 diagnostic model should be selected and which performance criteria decision-makers of healthcare organizations should consider. Because of this, a selection scheme is necessary to address all the above issues. This study proposes an integrated method for selecting the optimal deep learning model based on a novel crow swarm optimization algorithm for COVID-19 diagnosis. The crow swarm optimization is employed to find an optimal set of coefficients using a designed fitness function for evaluating the performance of the deep learning models. The crow swarm optimization is modified to obtain a good selected coefficient distribution by considering the best average fitness. We have utilized two datasets: the first dataset includes 746 computed tomography images, 349 of them of confirmed COVID-19 cases and the other 397 of healthy individuals, and the second dataset is composed of unimproved computed tomography images of the lung for 632 positive cases of COVID-19; 15 trained and pretrained deep learning models with nine evaluation metrics are used to evaluate the developed methodology. Among the pretrained CNN and deep models using the first dataset, ResNet50 has an accuracy of 91.46% and a F1-score of 90.49%. For the first dataset, the ResNet50 algorithm is the optimal deep learning model, selected as the ideal identification approach for COVID-19, with a closeness overall fitness value of 5715.988 for the COVID-19 computed tomography lung images case considering differential advancement. In contrast, for the second dataset, the VGG16 algorithm is the optimal deep learning model selected as the ideal identification approach for COVID-19, with a closeness overall fitness value of 5758.791. Overall, InceptionV3 had the lowest performance for both datasets. The proposed evaluation methodology is a helpful tool to assist healthcare managers in selecting and evaluating the optimal COVID-19 diagnosis models based on deep learning.
Conference Paper
Full-text available
Artificial intelligence appears to be the focus of this decade. Without question, AI plays a significant role in the current economy around the world. However, pursuing innovation or research within a business requires a fresh approach, and artificial intelligence can undoubtedly help. Application-oriented learning research has grown in popularity since 2009. For automation-oriented applications such as robotics, current advances in "deep learning" have the potential to serve as a general-purpose method of invention. This can be described as a paradigm shift away from labor-intensive, systematic research and toward research that incorporates passively generated huge datasets and improved prediction algorithms. It will not only assist organisations in mastering this form of study, but will also provide potential commercial advantages. This strategy can assist in the acquisition and control of big datasets and application-specific algorithms. We believe that organisations should adopt rules that foster transparency and the sharing of essential datasets across public and private players, since these will be critical instruments for boosting research productivity and innovation-driven competition in the future.
Article
Full-text available
YOLOv5 is a high-performance real-time object detector that plays an important role among one-stage detectors. However, there are two problems with the design of the YOLOv5 head: the branch shared by the classification and regression tasks hurts the training process, and the correlation between classification score and localization accuracy is low. We propose a Double IoU-aware Decoupled Head (DDH) and apply it to YOLOv5. The improved model, named DDH-YOLOv5, substantially improves the localization accuracy of the model without significantly increasing FLOPS or parameters. Extensive experiments on the PASCAL VOC2007 dataset show that DDH-YOLOv5 performs well. Compared with YOLOv5, the proposed DDH-YOLOv5m and DDH-YOLOv5l achieve 2.4% and 1.3% improvements in Average Precision (AP), respectively. Compared with Deformable DETR, which is known for its fast convergence, DDH-YOLOv5 completely outperforms Deformable DETR on COCO2017 Val with half the FLOPS and only a quarter of the epochs.
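For readers unfamiliar with decoupled heads, the sketch below shows the general idea of splitting classification and box regression into separate branches over a shared feature map; the channel sizes, the extra IoU/objectness branch, and all other details are assumptions and do not reproduce the paper's Double IoU-aware design.

    import torch
    import torch.nn as nn

    class DecoupledHead(nn.Module):
        # Separate classification and box-regression branches over a shared
        # feature map, in contrast to a single coupled output convolution.
        # Channel counts and the IoU branch here are illustrative only.
        def __init__(self, in_ch=256, num_classes=80, num_anchors=3):
            super().__init__()
            self.cls_branch = nn.Sequential(
                nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.SiLU(),
                nn.Conv2d(in_ch, num_anchors * num_classes, 1))
            self.reg_branch = nn.Sequential(
                nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.SiLU(),
                nn.Conv2d(in_ch, num_anchors * 4, 1))
            self.iou_branch = nn.Conv2d(in_ch, num_anchors, 1)  # objectness / IoU score

        def forward(self, feat):
            return self.cls_branch(feat), self.reg_branch(feat), self.iou_branch(feat)

    head = DecoupledHead()
    cls, box, iou = head(torch.rand(1, 256, 20, 20))
    print(cls.shape, box.shape, iou.shape)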
Article
Full-text available
Facial expression recognition is a computer vision problem that has benefited considerably from research in deep learning. Recent deep neural networks achieve superior results, demonstrating the feasibility of recognizing a user's expression from a single picture or a video recording of the face dynamics. Research studies reveal that the most discriminating portions of the face surface contributing to the recognition of facial expressions are located around the mouth and the eyes. The restrictions imposed during the COVID pandemic have also revealed that state-of-the-art solutions for face analysis can fail severely due to occlusion by facial masks. This study explores to what extent expression recognition can deal with occluded faces in the presence of masks. For a fairer comparison, the analysis is performed in different occlusion scenarios to effectively assess whether facial masks really imply a decrease in recognition accuracy. The experiments performed on two public datasets show that some well-known deep classifiers exhibit a significant reduction in accuracy in the presence of masks, down to half of the accuracy achieved in non-occluded conditions. Moreover, a relevant decrease in performance is also reported in the case of occluded eyes, but the overall drop is not as severe as with facial masks, confirming that, as with face biometric recognition, faces occluded by masks still represent a challenging limitation for computer vision solutions.
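A simple way to emulate such occlusion experiments is to blank the lower face (mask-like occlusion) or a horizontal eye band on aligned face crops before classification; the sketch below does exactly that, with region boundaries chosen arbitrarily rather than taken from the paper.

    import numpy as np

    def occlude_lower_face(img, fraction=0.45):
        # Simulate a surgical-mask-style occlusion on an aligned face crop by
        # blanking the lower part of the image. The 45% cut-off is an assumed
        # value, not taken from the paper.
        out = img.copy()
        h = img.shape[0]
        out[int(h * (1 - fraction)):, :, :] = 0
        return out

    def occlude_eyes(img, top=0.2, bottom=0.45):
        # Simulate eye occlusion (e.g., sunglasses) by blanking a horizontal band.
        out = img.copy()
        h = img.shape[0]
        out[int(h * top):int(h * bottom), :, :] = 0
        return out

    face = np.random.rand(224, 224, 3).astype(np.float32)  # dummy aligned face crop
    masked = occlude_lower_face(face)
    eyes_covered = occlude_eyes(face)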
Chapter
Artificial intelligence techniques such as neural networks (NNs), fuzzy systems, and hybrid NN-fuzzy inference systems can aid in rock property estimation and reduce uncertainty in the construction of reservoir static models. In this chapter, the main applications of intelligent systems or machine learning (ML) models are discussed for rock property estimation in tight gas sandstones. It is shown how intelligent systems have been used successfully for the estimation of petrophysical properties (e.g., porosity, permeability, water saturation, and pore type), seismic reservoir characterization, and the estimation of fracture parameters such as fracture density and fracture aperture. Seismic attributes are related to reservoir rock properties using appropriate intelligent systems; in this way, a continuous or discrete volume of rock properties such as porosity can be obtained. ML methods are also considered an indirect source of fracture parameter estimation from conventional well logs. Using ML methods, uncertainty in the estimation of reservoir rock properties and geological models can be minimized.
Article
Although the world is shifting toward using more renewable energy resources, combustion systems will still play an important role in the immediate future of global energy. To follow a sustainable path and reduce global warming impacts, it is important to improve the efficiency and performance of combustion processes and minimize their emissions. Machine learning techniques are a cost-effective solution for improving the sustainability of combustion systems through modeling, prediction, forecasting, optimization, fault detection, and control of processes. The objective of this study is to review and discuss the current state of research on the applications of machine learning techniques in different combustion processes related to power generation. Depending on the type of combustion process, the applications of machine learning techniques are categorized into three main groups: (1) coal and natural gas power plants, (2) biomass combustion, and (3) carbon capture systems. This study discusses the potential benefits and challenges of machine learning in the combustion area and provides some research directions for future studies. Overall, the review demonstrates that machine learning techniques can play a substantial role in shifting combustion systems toward lower-emission processes with improved operational flexibility and reduced operating costs.
Article
Efficient image-recognition algorithms that classify pixels accurately are required for computer-vision-based inspection of concrete defects. This study proposes a deep learning-based model called sparse-sensing and superpixel-based segmentation (SSSeg) for accurate and efficient crack segmentation. The model employs a sparse-sensing-based encoder and a superpixel-based decoder and was compared with six state-of-the-art models. An input pipeline of 1231 diverse crack images was specially designed to train and evaluate the models. The results indicate that SSSeg achieves a good balance between recognition correctness and completeness and outperforms the other models in both accuracy and efficiency. SSSeg also exhibits good resistance to interference from surface roughness, dirty stains, and moisture. The increased depth and receptive field of the sparse-sensing units guarantee representational capacity, while the structured sparsity protects the network from overfitting. The lightweight superpixel-based decoder omits skip connections, which greatly reduces the computation and memory footprint and enlarges the feasible input size at inference.
Article
Distracted driving is one of the key factors that cause drivers to ignore potential road hazards and thus lead to accidents. Existing efforts in distracted behavior recognition are mainly based on deep learning (DL) methods, which identify distracted behaviors by analyzing static characteristics of images. However, convolutional neural network (CNN)-based DL methods lack causal reasoning ability for behavior patterns. The uncertainty of driving behaviors, noise in the collected data, and occlusion between body agents bring additional challenges for existing DL methods in recognizing distracted behaviors continuously and accurately. Therefore, in this paper, we propose a distracted behavior recognition method based on a Temporal-Spatial double-line DL network (TSD-DLN) and a causal And-Or graph (C-AOG). TSD-DLN fuses attention features extracted from dynamic optical flow information with spatial features of the single video frame to recognize the distracted driving posture. Furthermore, a causal knowledge fence based on the C-AOG is fused with TSD-DLN to improve recognition robustness. The C-AOG represents the causality of behavior state fluent changes and adopts counterfactual reasoning to suppress behavior recognition failures caused by frame feature distortion or occlusion between body agents. We compared the performance of the proposed method with other state-of-the-art (SOTA) DL methods on two public datasets and a self-collected dataset. Experimental results demonstrate that the proposed method significantly outperforms other SOTA methods when recognizing distracted driving behavior from consecutive frames. In addition, the proposed method exhibits accurate continuous recognition and robustness under incomplete observation scenarios.
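The general two-stream idea behind such temporal-spatial networks, fusing an optical-flow branch with a single-frame branch before classification, can be sketched as follows; the toy backbones, channel counts, and concatenation-based fusion are assumptions and do not reproduce the TSD-DLN architecture.

    import torch
    import torch.nn as nn

    class TwoStreamFusion(nn.Module):
        # Toy two-stream setup: one branch over stacked optical-flow maps
        # (temporal line) and one over a single RGB frame (spatial line),
        # fused by concatenation before the classifier.
        def __init__(self, num_classes=10, flow_stack=10):
            super().__init__()
            def branch(in_ch):
                return nn.Sequential(
                    nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten())
            self.temporal = branch(2 * flow_stack)  # x/y flow for each stacked frame
            self.spatial = branch(3)                # single RGB frame
            self.classifier = nn.Linear(64, num_classes)

        def forward(self, flow, frame):
            fused = torch.cat([self.temporal(flow), self.spatial(frame)], dim=1)
            return self.classifier(fused)

    model = TwoStreamFusion()
    logits = model(torch.rand(1, 20, 112, 112), torch.rand(1, 3, 112, 112))
    print(logits.shape)  # (1, 10)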
Article
Fire detection in its early stages is of great importance in many environment-related applications. Among the visual signs of fire, smoke appears earlier than the flames in many cases and quickly spreads into the environment; it can therefore be used for early fire detection with machine vision techniques. Existing approaches have tried to do this either with traditional machine learning methods applying various combinations of color, texture, and motion features, or with deep learning-based methods that can automatically capture smoke features from images. However, the former approaches face a number of challenges due to the transparent nature of smoke and its varying visual appearance in different environments, and the latter are not able to capture the motion characteristics of smoke, so their efficiency in various environmental conditions remains problematic and often causes false alarms. In this study, we introduce a hybrid approach based on deep learning and on the spatial and spatio-temporal characteristics of smoke, combining the strengths of these techniques to detect smoke as accurately as possible. The proposed method consists of four stages: 1) moving pixels are extracted from input images by an efficient motion detection scheme; 2) the extracted moving areas are given individually to a tailored convolutional neural network to identify candidate smoke regions; 3) an efficient combination of spatial and spatio-temporal features is extracted from each candidate region based on the distinct characteristics of smoke; 4) a support vector machine classifier further separates real smoke from non-smoke regions using the extracted features. The proposed method is implemented in the Python programming language, and extensive experiments conducted on the "Visor", "Bilkent", and "Yuan" benchmark datasets show that it has high performance and accuracy. The results also indicate that its reliability, in terms of false alarm rate, is far better than that of the competitors. The average accuracy and false positive rates obtained on 10 testing videos are 97.63% and 3.8%, respectively. The proposed method is also able to detect both white and black smoke, which, to the best of our knowledge, has not been addressed in related research.
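A skeleton of this four-stage pipeline might look like the snippet below, where every stage is a deliberately simplified placeholder (frame differencing instead of the paper's motion detector, a dummy score instead of the tailored CNN, toy features, and an SVM trained on synthetic data):

    import numpy as np
    from sklearn.svm import SVC

    def detect_moving_regions(prev_frame, frame, thresh=25):
        # Stage 1: crude frame differencing as a stand-in for motion detection.
        diff = np.abs(frame.astype(int) - prev_frame.astype(int)).mean(axis=-1)
        return diff > thresh                      # boolean motion mask

    def cnn_candidate_score(region):
        # Stage 2: placeholder for the CNN that scores candidate smoke regions.
        return float(region.mean() / 255.0)

    def extract_features(region):
        # Stage 3: placeholder spatial / spatio-temporal features (color,
        # texture, and motion statistics in the real method).
        return [region.mean(), region.std(), np.median(region)]

    # Stage 4: SVM classifier over the extracted features (toy training data).
    X_train = np.random.rand(20, 3)
    y_train = np.array([0, 1] * 10)
    clf = SVC(kernel="rbf").fit(X_train, y_train)

    prev, cur = (np.random.randint(0, 256, (240, 320, 3)) for _ in range(2))
    mask = detect_moving_regions(prev, cur)
    if mask.any() and cnn_candidate_score(cur[mask]) > 0.3:
        print("smoke" if clf.predict([extract_features(cur[mask])])[0] else "non-smoke")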
Article
Bulk metallic glasses (BMGs) have been widely used in different fields owing to their unique and excellent properties. To accelerate the development of BMGs, different feasible parameters or criteria for their glass forming ability (GFA) have been proposed. With the advent of the era of big data, machine learning (ML) methods provide novel insights into the study of BMGs. In this paper, we train a convolutional neural network (CNN) model to investigate the GFA of BMGs. One hundred alloying elements and their possible combinations are taken into account by mapping an alloy composition onto a 10 × 10 feature graph. Compared with other methods for predicting the GFA of BMGs, the alloy composition is the only input variable, with no requirement for various physical and chemical properties obtained through pre-experiments. The predictive ability of the proposed model is quantified by a training R² of 0.9745 and a test R² of 0.8137. This work suggests that ML approaches have great potential in guiding the design of new BMG materials.
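A minimal sketch of such a composition encoding could look as follows: each of the one hundred elements is assigned a fixed cell in a 10 × 10 grid that holds its atomic fraction, giving a single-channel "image" a CNN can consume. The element ordering and the example alloy are assumptions for illustration only.

    import numpy as np

    # Assumed (truncated) element ordering for the 10 x 10 grid; the paper's
    # actual ordering of the one hundred elements is not reproduced here.
    ELEMENTS = ["H", "Li", "Be", "B", "C", "Mg", "Al", "Si", "Ti", "V",
                "Cr", "Mn", "Fe", "Co", "Ni", "Cu", "Zn", "Zr", "Nb", "Mo",
                "Pd", "Ag", "La", "Ce", "Gd", "Hf", "Ta", "W", "Pt", "Au"]
    # ... padded out to 100 entries in a full implementation.

    def composition_to_grid(composition, elements=ELEMENTS, size=10):
        # Map {element: atomic fraction} to a size x size feature grid,
        # one fixed cell per element.
        grid = np.zeros(size * size, dtype=np.float32)
        for symbol, fraction in composition.items():
            grid[elements.index(symbol)] = fraction
        return grid.reshape(size, size)

    # Example: the well-known Zr-based glass former Zr65Cu17.5Ni10Al7.5 (at.%).
    x = composition_to_grid({"Zr": 0.65, "Cu": 0.175, "Ni": 0.10, "Al": 0.075})
    print(x.shape)  # (10, 10), ready as single-channel CNN input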