Fig 4 - uploaded by Nitish Srivastava
Some Imagenet test cases with the probabilities of the best 5 labels underneath. Many of the top 5 labels are quite plausible. 
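As a rough illustration of what "the best 5 labels" means in the caption above, the following sketch ranks class probabilities and keeps the five largest. The function name `top_k` and the labels and scores are hypothetical, chosen only for illustration; they are not taken from the figure.

```python
def top_k(probs, labels, k=5):
    """Return the k (label, probability) pairs with the highest probability."""
    # Pair each label with its probability and sort from most to least likely.
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical labels and scores, not values read from the figure.
labels = ["leopard", "jaguar", "cheetah", "snow leopard", "Egyptian cat", "tabby"]
probs = [0.55, 0.20, 0.10, 0.08, 0.05, 0.02]

best5 = top_k(probs, labels)
for label, p in best5:
    print(f"{label}: {p:.2f}")
```

A test case counts as correct under the top-5 criterion when the true label appears anywhere in this list, which is why several plausible labels can share the probability mass.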

Source publication
Article
Full-text available
When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data. This "overfitting" is greatly reduced by randomly omitting half of the feature detectors on each training case. This prevents complex co-adaptations in which a feature detector is only helpful in the context of several oth...
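The random omission described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `dropout`, the list-based activations, and the `rng` parameter are assumptions made for the example. At test time the sketch scales activations by the keep probability, which is one common way to compensate for the units that were dropped during training.

```python
import random


def dropout(activations, drop_prob=0.5, training=True, rng=random):
    """Randomly zero each activation with probability drop_prob during training.

    At test time no unit is dropped; activations are scaled by the keep
    probability so their expected value matches what was seen in training.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if training:
        # A fresh random mask is drawn on every call, i.e. for every training case.
        return [0.0 if rng.random() < drop_prob else a for a in activations]
    return [a * (1.0 - drop_prob) for a in activations]


# Usage: omit roughly half of the hidden units for one training case.
rng = random.Random(0)  # seeded only to make the example repeatable
hidden = [0.3, 1.2, 0.7, 0.0, 2.1, 0.9]
train_out = dropout(hidden, drop_prob=0.5, training=True, rng=rng)
test_out = dropout(hidden, drop_prob=0.5, training=False)
```

Because a different subset of units is omitted for each training case, no unit can rely on specific other units being present.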

Similar publications

Article
Full-text available
We present a generative model which can automatically summarize the stroke composition of free-hand sketches of a given category. When our model is fit to a collection of sketches with similar poses, it discovers and learns the structure and appearance of a set of coherent parts, with each part represented by a group of strokes. It represents both...
Article
Full-text available
In this paper, we present an algorithm to automatically detect meaningful modes in a histogram. The proposed method is based on the behavior of local minima in a scale-space representation. We show that the detection of such meaningful modes is equivalent to a two-class clustering problem on the length of minima scale-space curves. The algorithm...
Article
Full-text available
Advancements in Sonar image capture have enabled researchers to apply sophisticated object identification algorithms in order to locate targets of interest in images such as mines. Despite progress in this field, modern sonar automatic target recognition (ATR) approaches lack robustness to the amount of noise one would expect in real-world scenario...
Article
Full-text available
Multi-person event recognition is a challenging task, often with many people active in the scene but only a small subset contributing to an actual event. In this paper, we propose a model which learns to detect events in such videos while automatically "attending" to the people responsible for the event. Our model does not use explicit annotations...

Citations

... A Global Average Pooling layer is used in place of Max-Pooling to prevent loss of information contained in the pooling neighbourhood of an activation [25]. Regularization is applied to the fully connected dense layer in the sequential model via a Dropout layer [26] to prevent overfitting of the network on the training data. The dropout rate is varied from 0.2 to 0.5 within each model series created for each of the architectures used, so that a well-generalized model with the best possible prediction capability can be obtained. ...
Article
Full-text available
Due to the stochastic occurrence of surface defects in a structure, the size of acquired image datasets may vary between the cracked and un-cracked classes. Further, among misclassified predictions in crack detection and classification, false positives can add a safety factor by prompting the structural health monitoring system to adopt early preventive measures, whereas false negatives can result in an overconfident health monitoring system, thereby seriously affecting the durability of a structure. In this study, the authors aimed to address these two problems by applying transfer learning to five pre-trained deep convolutional neural network (DCNN) models on the same target dataset using binary focal loss, and evaluated the models’ performance against the binary cross-entropy loss function. Five model sets, each consisting of twenty-four variations, were generated by varying the dropout and loss function parameters, from which the best-performing model was proposed. The influence of the focusing parameter γ on model accuracy was also investigated. Finally, three independent test datasets were used to evaluate the generalization capacity of the proposed model under optimal thresholds, which yielded appreciable metrics.
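To make the role of the focusing parameter γ concrete, here is a small sketch of binary focal loss in its commonly used form, FL = -α_t (1 - p_t)^γ log(p_t). The exact parametrization used in the cited study is not shown in the abstract, so the function names, the optional `alpha` class weight, and the `eps` clamp are assumptions made for illustration.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=None, eps=1e-12):
    """Focal loss for one example.

    p     -- predicted probability of the positive class
    y     -- true label, 0 or 1
    gamma -- focusing parameter; gamma = 0 recovers (weighted) cross-entropy
    alpha -- optional weight for the positive class (assumed, not from the source)
    """
    p = min(max(p, eps), 1.0 - eps)        # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p         # probability assigned to the true class
    weight = 1.0 if alpha is None else (alpha if y == 1 else 1.0 - alpha)
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


def binary_cross_entropy(p, y, eps=1e-12):
    """Cross-entropy expressed as focal loss with gamma = 0."""
    return binary_focal_loss(p, y, gamma=0.0, eps=eps)


# Usage: compare an easy example (p_t = 0.9) with a hard one (p_t = 0.1).
easy_ratio = binary_focal_loss(0.9, 1) / binary_cross_entropy(0.9, 1)
hard_ratio = binary_focal_loss(0.1, 1) / binary_cross_entropy(0.1, 1)
```

With γ > 0 the loss on well-classified examples is scaled down much more than the loss on misclassified ones, so training concentrates on the hard cases, which can matter when one class is scarce.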
... The advantage of deep learning is to use unsupervised or semi-supervised feature learning and hierarchical feature extraction to replace manual feature extraction. The concept of deep learning was proposed by Hinton et al. in 2006 [14]. Models are usually trained by a large amount of labeled data. ...
... To generalise the model and prevent overfitting during training, we regularised the stack of densely connected layers in the emission module by applying a 5% dropout rate to the hidden layers (Hinton et al., 2012). The combination of machine learning components and numerical implementations of physical processes such as atmospheric transport in a single model adds a novel aspect to the overfitting problem. ...
Preprint
Full-text available
Aeolian dust has significant impacts on climate, public health, infrastructure and ecosystems. Assessing dust concentrations and these impacts is challenging because the emissions depend on many environmental factors and can vary greatly with meteorological conditions. We present a data-driven aeolian dust scheme that combines machine learning components and physical equations to predict atmospheric dust concentrations and quantify the sources. The numerical scheme was trained to reproduce dust aerosol optical depth retrievals by the Infrared Atmospheric Sounding Interferometer on board the MetOp-A satellite. The input parameters included meteorological variables from the fifth generation atmospheric reanalysis of the European Centre for Medium-Range Weather Forecasts. The trained dust scheme can be applied as an emission submodel, to be used in climate and Earth system models, which is reproducibly derived from observational data so that a priori assumptions and manual parameter tuning can be largely avoided. We compared the trained emission submodel to a state-of-the-art emission parametrisation, showing that it substantially improves the representation of aeolian dust in the global atmospheric chemistry-climate model EMAC.
... The Dropout method is another regularization technique introduced by Hinton et al. (2012) and experimentally extended by Srivastava et al. (2014). The technique is straightforward to apply during the training phase of a model and reduces the connections between nodes within the hidden layers. ...
... The details of the simple network are given in Table 4. To show the effectiveness of the proposed method within well-known network architectures, the VGG and AlexNet Convolutional Neural Networks are employed as basic networks (Yu et al. 2016; Krizhevsky 2012). In addition to the basic network, the proposed approach inherently has two neural networks, the actor and the critic networks of the DDPG. ...
Article
Convolutional Neural Networks are machine learning models with proven abilities in many kinds of tasks. These powerful models sometimes suffer from overfitting. This paper proposes a method based on Reinforcement Learning to address this problem. In this research, the parameters of a target layer in the Convolutional Neural Network are taken as the state for the Agent in the Reinforcement Learning component. The Agent then produces actions that form the parameters of a hyperbolic secant function. The form of this function is changed gradually and implicitly by the proposed method. The inputs of the function are the weights of the layer, and its outputs are multiplied by the same weights to update them. The proposed method is inspired by the Deep Deterministic Policy Gradient model because the Agent’s actions lie in a continuous domain. To show the proposed method’s effectiveness, a classification task using Convolutional Neural Networks is considered. Seven datasets were used to evaluate the model: MNIST, Extended MNIST, small-notMNIST, Fashion-MNIST, sign language MNIST, CIFAR-10, and CIFAR-100.
... distribution of the labels. For regularization, DETexT employs dropout in the penultimate layer to constrain the l2 norms of the weight vectors (Hinton et al., 2012). The dropout method prevents the hidden units from co-adapting through random rejection; that is, a certain proportion of hidden units are set to zero during forward propagation and backpropagation. ...
Article
Full-text available
Detecting SNV at very low read depths helps to reduce sequencing requirements, lowers sequencing costs, and aids in the early screening, diagnosis, and treatment of cancer. However, the accuracy of SNV detection is significantly reduced at read depths below ×34 due to the lack of a sufficient number of read pairs to help filter out false positives. Many recent studies have revealed the potential of mutational signature (MS) in detecting true SNV, understanding the mutational processes that lead to the development of human cancers, and analyzing the endogenous and exogenous causes. Here, we present DETexT, an SNV detection method better suited to low read depths, which classifies false positive variants by combining MS with deep learning algorithms to mine correlation information around bases in individual reads without relying on the support of duplicate read pairs. We have validated the effectiveness of DETexT on simulated and real datasets and conducted comparative experiments. The source code has been uploaded to https://github.com/TrinaZ/extra-lowRD for academic use only.
... Hence, the optimized framework of 2D-CNN consisted of a sequential distribution of single Conv blocks with one channel and a fully connected classifier. To further improve the accuracy, the dropout regularization technique [34] was used to avoid overfitting due to the relatively small amount of data. The validation accuracy with a dropout rate ranging from 0 to 0.9 was compared. ...
Article
Full-text available
Due to their similar chemical composition and the matrix effect, the accurate identification of mineral pigments on wall paintings poses great challenges. This work implemented an identification study on three mineral pigments with similar chemical compositions by combining LIBS technology with the K-nearest neighbor algorithm (KNN), random forest (RF), support vector machine (SVM), back propagation artificial neural network (Bp-ANN) and convolutional neural network (CNN) to find the most suitable identification method for mural research. Using the SelectKBest algorithm, 300 characteristic lines with the largest difference among the three pigments were determined. The identification models of KNN, RF, SVM, Bp-ANN and CNN were established and optimized. The results showed that, except for the KNN model, the identification accuracy of the other models for mock-up mural samples was above 99%. However, only the identification accuracy of the 2D-CNN models reached above 94% for actual mural samples. Therefore, the 2D-CNN model was determined to be the most suitable model for the identification and analysis of mural pigments.
... Various techniques have been developed to tackle overfitting; an incomplete list includes early stopping [33,34], data augmentation [35,36], adding statistical noise to inputs [37], and regularization [38][39][40]. Dropout, which was first introduced by Hinton et al. [41] and subsequently proved to be a stochastic regularization technique by Srivastava et al. [42], is an effective technique for tackling overfitting. Dropout can be applied to nodes [15] or edges [43]. ...
... Dropout was first introduced by Hinton et al. [41] as a way to train deep neural networks, in which a collection of hidden neurons is stochastically "dropped out" at each iteration of a training procedure. It has been proven effective in controlling overfitting. ...
Article
Full-text available
Enhancing message propagation is critical for solving the problem of node classification in sparse graph with few labels. The recently popularized Graph Convolutional Network (GCN) lacks the ability to propagate messages effectively to distant nodes because of over-smoothing. Besides, the GCN with numerous trainable parameters suffers from overfitting when the labeled nodes are scarce. This article addresses the problem via building GCN on Enhanced Message-Passing Graph (EMPG). The key idea is that node classification can benefit from various variants of the input graph that can propagate messages more efficiently, based on the assumption that the structure of each variant is reasonable when more unlabeled nodes are labeled properly. Specifically, the proposed method first maps the nodes to a latent space through graph embedding that captures the structural information of the input graph. Considering the node attributes together, the proposed method constructs the EMPG by adding connections between the nodes in close proximity in the latent space. With the help of the added connections, the EMPG allows a node to propagate its message to the right nodes at long distances, so that the GCN built on the EMPG need not stack multiple layers. As a result, over-smoothing is avoided. However, dense connections may cause message propagation saturation and lead to overfitting. Seeing the EMPG as an accumulation of some potential variants of the original graph, the proposed method utilizes dropout to extract a group of variants from the EMPG and then builds multichannel GCNs on them. The multichannel features learned from different dropout EMPGs are aggregated to compute the final prediction jointly. The proposed method is flexible, as a broad range of GCNs can be incorporated easily. Additionally, it is efficient and robust. Experimental results demonstrate that the proposed method yields improvements in node classification.
... In the following observation, the features extracted by the CNN model are passed to other classifiers [29]. Figure 13 shows the confusion matrix of the convolutional neural network with the random forest model. ...
Article
Full-text available
Although statistics show a slow decline in traffic accidents in many countries over the last few years, drunk or drug-influenced driving still accounts for a large enough share of those accidents to warrant action. Nowadays, breath analysers are used by law enforcement in many countries to estimate breath alcohol content (BAC) as a preliminary alcohol screening. However, since breath analysers and field sobriety testers do not accurately measure BAC, analysis of individuals’ blood samples is required for further action. Many researchers have presented approaches to detect drunk driving, for example using sensors, face recognition, and a driver’s behaviour, to overcome the shortcomings of the time-honoured breath-analyser approach, but each has some limitations. This study proposes a method to distinguish between drivers’ states, that is, sober or drunk, by transferring convolutional neural network (CNN) features to a random forest (RF) classifier, with an accuracy of up to 93%, which is higher than that of existing models. To validate our research, a comparative analysis was performed on the same dataset with other existing classifiers, such as the support vector machine (SVM) with an accuracy of 65% and the K-nearest neighbour (KNN) with an accuracy of 62%, and our approach was found to perform best in terms of accuracy, precision, recall, F1-score, AUC-ROC curve, and Matthews correlation coefficient (MCC), along with the confusion matrix.
... Overfitting is a phenomenon in which the model performs well on the training data and poorly on the test data. Hinton et al. (2012) proposed to use dropout to reduce overfitting of fully connected neurons. Dropout reduces complex co-adaptation between neurons by randomly ignoring a certain proportion of neurons during training. ...
Article
Full-text available
Computer vision based on machine learning theory has been widely used in the surface damage detection of concrete structures, but the characterization of internal damage in concrete still remains a challenge for researchers. Aiming at this problem, we propose a nondestructive evaluation (NDE) method to classify diverse conditions of internal damage in concrete based on short-time Fourier transform (STFT) and convolutional neural networks (CNN). The STFT converts the self-resonant vibration signals into two-dimensional time-frequency images that can be used as the input data for the CNN. The training set is fed into the CNN for feature extraction and classification, and then the testing set is brought into the trained model for verification. Both a simple case of virgin state and damaged state, as well as a complicated case covering all four internal damage states were successfully classified with an excellent recognition rate of testing samples. The key CNN hyperparameters were optimized and the classification accuracy rate of spectrum images was as high as 98.8%. Optimal data set size was also found to balance the accuracy and efficiency. The findings in this work validate the feasibility of the CNN for the detection and differentiation of invisible damage in concrete nondestructively.
... When the number of parameters of the neural network is too large, a problem of overfitting input data occurs during neural network training. To solve this problem, Hinton et al. [29] proposed a dropout layer. Dropout is a concept that randomly turns off the nodes constituting the FCL with a set probability between 0 and 1. ...
Article
Full-text available
As the demand for ocean exploration increases, studies are being actively conducted on autonomous underwater vehicles (AUVs) that can efficiently perform various missions. To successfully perform long-term, wide-ranging missions, it is necessary to apply fault diagnosis technology to AUVs. In this study, a system that can monitor the health of in situ AUV thrusters using a convolutional neural network (CNN) was developed. As input data, an acoustic signal that comprehensively contains the mechanical and hydrodynamic information of the AUV thruster was adopted. The acoustic signal was pre-processed into two-dimensional data through continuous wavelet transform. The neural network was trained with three different pre-processing methods and the accuracy was compared. The decibel scale was more effective than the linear scale, and the normalized decibel scale was more effective than the decibel scale. Through tests on off-training conditions that deviate from the neural network learning condition, the developed system properly recognized the distribution characteristics of noise sources even when the operating speed and the thruster rotation speed changed, and correctly diagnosed the state of the thruster. These results showed that the acoustic signal-based CNN can be effectively used for monitoring the health of the AUV’s thrusters.