Fig 6 - uploaded by Nitish Srivastava
Frame classification error and cross-entropy on the (a) Training and (b) Validation set as learning progresses. The training error is computed using the stochastic nets.


Source publication
Article
Full-text available
When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data. This "overfitting" is greatly reduced by randomly omitting half of the feature detectors on each training case. This prevents complex co-adaptations in which a feature detector is only helpful in the context of several oth...
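The mechanism the abstract describes, randomly omitting half of the feature detectors on each training case, can be sketched in a few lines of NumPy. This is an illustrative "inverted dropout" variant that rescales the surviving units at training time; the original paper instead halves the outgoing weights at test time.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p=0.5, training=True):
    """Zero each unit with probability p (the paper uses p = 0.5) and scale
    the survivors by 1/(1-p), so expected activations match at test time.
    At test time the layer is the identity."""
    if not training:
        return h  # no units dropped when evaluating
    mask = rng.random(h.shape) >= p  # keep each unit with probability 1-p
    return h * mask / (1.0 - p)

h = np.ones((4, 8))       # a batch of hidden-layer activations
out = dropout(h, p=0.5)
print(out.shape)          # (4, 8); roughly half the entries are zero
```

Because each training case sees a different random mask, a unit cannot rely on the presence of particular other units, which is what discourages the complex co-adaptations mentioned above.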

Contexts in source publication

Context 1
... tried several network architectures by varying the number of input frames (15 and 31), number of layers in the neural network (3, 4 and 5) and the number of hidden units in each layer (2000 and 4000). Figure 6 shows the validation error curves for a number of these combinations. Using dropout consistently leads to lower error rates. ...
Context 2
... model needs to be run for about 200 epochs to converge. The same network was also finetuned with standard backpropagation using a smaller learning rate of 0.1, keeping all other hyperparameters fixed. Figure 6 shows the frame classification error and cross-entropy objective value on the training and validation sets. We compare the performance of dropout and standard backpropagation on several network architectures and input representations. ...

Similar publications

Article
Full-text available
Sparse Representation (or coding) based Classification (SRC) has gained great success in face recognition in recent years. However, SRC emphasizes the sparsity too much and overlooks the correlation information which has been demonstrated to be critical in real-world face recognition problems. Besides, some work considers the correlation but overlo...
Article
Full-text available
Dictionary learning for sparse representations is traditionally approached with sequential atom updates, in which an optimized atom is used immediately for the optimization of the next atoms. We propose instead a Jacobi version, in which groups of atoms are updated independently, in parallel. Extensive numerical evidence for sparse image representa...
Article
Full-text available
In this position paper we describe a general framework for applying machine learning and pattern recognition techniques in healthcare. In particular, we are interested in providing an automated tool for monitoring and incrementing the level of awareness in the operating room and for identifying human errors which occur during the laparoscopy surgic...
Article
Full-text available
When deep learning is applied to visual object recognition, data augmentation is often used to generate additional training data without extra labeling cost. It helps to reduce overfitting and increase the performance of the algorithm. In this paper we investigate if it is possible to use data augmentation as the main component of an unsupervised f...

Citations

... Global Average Pooling layer is used in place of Max-Pooling to prevent loss of information contained in the pooling neighbourhood of an activation [25]. A regularization technique is applied to the fully connected dense layer in the sequential model via a Dropout layer [26] to prevent overfitting of the network on the training data. The dropout rate is varied from 0.2 to 0.5 within each model series created for each of the architectures used, so that a well-generalized model with the best possible prediction capability can be obtained. ...
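As a rough illustration of the pooling choice mentioned in this excerpt, global average pooling reduces each feature map to its spatial mean, so every activation in the neighbourhood contributes; a minimal NumPy sketch (the tensor layout and shapes are illustrative assumptions, not the cited model's code):

```python
import numpy as np

def global_average_pool(feature_maps):
    """Collapse feature maps of shape (N, H, W, C) to one value per channel
    by averaging over the spatial dimensions H and W."""
    return feature_maps.mean(axis=(1, 2))  # -> shape (N, C)

x = np.arange(2 * 4 * 4 * 3, dtype=float).reshape(2, 4, 4, 3)
pooled = global_average_pool(x)
print(pooled.shape)  # (2, 3)
```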
Article
Full-text available
Due to the stochastic occurrence of surface defects in a structure, the size of acquired image datasets may vary between cracked and un-cracked classes. Further, in crack detection and classification, among misclassified predictions, false positives can be particularly important, as they provide an added safety factor that lets the structural health monitoring system adopt early preventive measures, whereas false negatives can result in an overconfident health monitoring system, seriously affecting the durability of a structure. In this study, the authors aimed to address these two problems by transfer learning five pre-trained deep convolutional neural network (DCNN) models on the same target dataset using binary focal loss, and evaluated the models’ performance in comparison to the binary cross-entropy loss function. Five model sets, each consisting of twenty-four variations, were generated by varying the dropout and loss function parameters, from which the best-performing model has been proposed. The influence of the focusing parameter γ on the model accuracy has also been investigated. Finally, three independent test datasets were used to evaluate the generalization capacity of the proposed model under optimal thresholds, which yielded appreciable metric outcomes.
... In the following observation, feature extraction from the CNN model is passed to other classifiers [29]. Figure 13 shows the confusion metric of the convolutional neural network with the random forest model. ...
Article
Full-text available
Although statistics show a slow decline in traffic accidents in many countries over the last few years, drunk or drug-influenced driving still contributes enough to those records to warrant action. Nowadays, breath analysers are used by law enforcement to estimate breath alcohol content (BAC) as a preliminary alcohol screening in many countries. However, since breath analysers and field sobriety tests do not accurately measure BAC, the analysis of blood samples of individuals is required for further action. Many researchers have presented various approaches to detect drunk driving, for example, using sensors, face recognition, and a driver’s behaviour, to overcome the shortcomings of the time-honoured approach using breath analysers, but each one has some limitations. This study proposed a plan to distinguish between drivers’ states, that is, sober or drunk, by the use of transfer learning from convolutional neural network (CNN) features to a random forest (RF) classifier, with an accuracy of up to 93%, which is higher than that of existing models. With the same dataset, to validate our research, a comparative analysis was performed with other existing classifiers such as the support vector machine (SVM), with an accuracy of 65%, and the K-nearest neighbour (KNN), with an accuracy of 62%, and it was found that our approach is optimized in terms of accuracy, precision, recall, F1-score, AUC-ROC curve, and Matthew’s correlation coefficient (MCC) with confusion matrix.
... Overfitting is a phenomenon in which a model performs well during training but poorly during testing. Hinton et al. (2012) proposed to use dropout to reduce overfitting of fully connected neurons. Dropout reduces the complexity of mutual adaptation between neurons by randomly ignoring a certain proportion of neurons during training. ...
Article
Full-text available
Computer vision based on machine learning theory has been widely used in the surface damage detection of concrete structures, but the characterization of internal damage in concrete still remains a challenge for researchers. Aiming at this problem, we propose a nondestructive evaluation (NDE) method to classify diverse conditions of internal damage in concrete based on short-time Fourier transform (STFT) and convolutional neural networks (CNN). The STFT converts the self-resonant vibration signals into two-dimensional time-frequency images that can be used as the input data for the CNN. The training set is fed into the CNN for feature extraction and classification, and then the testing set is brought into the trained model for verification. Both a simple case of virgin state and damaged state, as well as a complicated case covering all four internal damage states were successfully classified with an excellent recognition rate of testing samples. The key CNN hyperparameters were optimized and the classification accuracy rate of spectrum images was as high as 98.8%. Optimal data set size was also found to balance the accuracy and efficiency. The findings in this work validate the feasibility of the CNN for the detection and differentiation of invisible damage in concrete nondestructively.
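The STFT pre-processing this abstract describes, turning a one-dimensional vibration signal into a two-dimensional time-frequency image for a CNN, can be sketched with plain NumPy (the frame length, hop size, and test signal are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def stft_magnitude(signal, frame_len=256, hop=128):
    """Short-time Fourier transform: slide a Hann window over the signal,
    FFT each frame, and stack the magnitudes into a 2-D image of shape
    (freq_bins, n_frames) suitable as CNN input."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

t = np.linspace(0, 1, 4096, endpoint=False)
image = stft_magnitude(np.sin(2 * np.pi * 440 * t))
print(image.shape)  # (129, 31): 256-point frames give 129 frequency bins
```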
... When the number of parameters of the neural network is too large, a problem of overfitting input data occurs during neural network training. To solve this problem, Hinton et al. [29] proposed a dropout layer. Dropout is a concept that randomly turns off the nodes constituting the FCL with a set probability between 0 and 1. ...
Article
Full-text available
As the demand for ocean exploration increases, studies are being actively conducted on autonomous underwater vehicles (AUVs) that can efficiently perform various missions. To successfully perform long-term, wide-ranging missions, it is necessary to apply fault diagnosis technology to AUVs. In this study, a system that can monitor the health of in situ AUV thrusters using a convolutional neural network (CNN) was developed. As input data, an acoustic signal that comprehensively contains the mechanical and hydrodynamic information of the AUV thruster was adopted. The acoustic signal was pre-processed into two-dimensional data through continuous wavelet transform. The neural network was trained with three different pre-processing methods and the accuracy was compared. The decibel scale was more effective than the linear scale, and the normalized decibel scale was more effective than the decibel scale. Through tests on off-training conditions that deviate from the neural network learning condition, the developed system properly recognized the distribution characteristics of noise sources even when the operating speed and the thruster rotation speed changed, and correctly diagnosed the state of the thruster. These results showed that the acoustic signal-based CNN can be effectively used for monitoring the health of the AUV’s thrusters.
... For a ground acceleration dynamic excitation ü_g(s), the response function R(ü_g)(t) experiences a retardation effect due to the causality of the physical process. Therefore, applying the universal approximation of function (15) to input function space ...
... We also study the different methods with different network sizes and training samples, which are discussed in subsequent sections. To avoid overfitting, dropout [15] is applied during training in a few of the cases, but is disabled during evaluation. ...
Preprint
Full-text available
In this paper, we propose a DeepONet structure with causality to represent the causal linear operators between Banach spaces of time-dependent signals. The theorem of universal approximations to nonlinear operators proposed in \cite{tianpingchen1995} is extended to operators with causalities, and the proposed Causality-DeepONet implements the physical causality in its framework. The proposed Causality-DeepONet considers causality (the state of the system at the current time is not affected by that of the future, but only by its current state and past history) and uses a convolution-type weight in its design. To demonstrate its effectiveness in handling the causal response of a physical system, the Causality-DeepONet is applied to learn the operator representing the response of a building due to earthquake ground accelerations. Extensive numerical tests and comparisons with some existing variants of DeepONet are carried out, and the Causality-DeepONet clearly shows its unique capability to learn the retarded dynamic responses of the seismic response operator with good accuracy.
... Here, we use L 2 regularization for preventing overfitting, and it yields satisfactory performance, as evidenced by our numerical studies. We note that, other techniques, such as early stopping (Li et al. 2020), L 1 regularization (Mollalo et al. 2019), and Monte Carlo dropout (Hinton et al. 2012;Gal and Ghahramani 2016), can also be used for addressing the problem of overfitting. It is beyond our scope to evaluate which technology is more efficient. ...
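The Monte Carlo dropout technique mentioned in this excerpt keeps dropout active at prediction time and averages many stochastic forward passes, with their spread serving as a rough uncertainty estimate; a minimal single-layer sketch (the layer shape and weights are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def mc_dropout_predict(x, W, b, p=0.5, n_samples=100):
    """Run n_samples stochastic forward passes through one linear layer
    with dropout left ON, then return the mean prediction and the
    standard deviation across passes as an uncertainty proxy."""
    preds = []
    for _ in range(n_samples):
        mask = rng.random(x.shape) >= p          # drop inputs with prob. p
        preds.append((x * mask / (1 - p)) @ W + b)
    preds = np.stack(preds)
    return preds.mean(axis=0), preds.std(axis=0)

x = np.ones(8)
W, b = np.full((8, 1), 0.1), np.zeros(1)
mean, std = mc_dropout_predict(x, W, b)
print(mean.shape, std.shape)  # (1,) (1,); mean is close to 0.8
```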
Article
Full-text available
Online peer-to-peer lending platforms provide loans directly from lenders to borrowers without passing through traditional financial institutions. For lenders on these platforms to avoid loss, it is crucial that they accurately assess default risk so that they can make appropriate decisions. In this study, we develop a penalized deep learning model to predict default risk based on survival data. As opposed to simply predicting whether default will occur, we focus on predicting the probability of default over time. Moreover, by adding an additional one-to-one layer in the neural network, we achieve feature selection and estimation simultaneously by incorporating an L1-penalty into the objective function. The minibatch gradient descent algorithm makes it possible to handle massive data. An analysis of real-world loan data and simulations demonstrate the model’s competitive practical performance, which suggests favorable potential applications in peer-to-peer lending platforms.
... In the FC layer we use a regularization technique in order to avoid "overfitting", which occurs when the model fits the peculiarities of the data set at hand so well that it generalizes poorly to unseen data. So, to prevent complex co-adaptations on the training data, we use the "Dropout" technique [55], which drops out units in the neural network during training. In practice, neurons are either dropped with probability p or kept with probability 1 − p [41]. ...
Article
Full-text available
Ultra-low frequency (ULF) magnetospheric plasma waves play a key role in the dynamics of the Earth’s magnetosphere and, therefore, their importance in Space Weather phenomena is indisputable. Magnetic field measurements from recent multi-satellite missions (e.g., Cluster, THEMIS, Van Allen Probes and Swarm) are currently advancing our knowledge on the physics of ULF waves. In particular, Swarm satellites, one of the most successful missions for the study of the near-Earth electromagnetic environment, have contributed to the expansion of data availability in the topside ionosphere, stimulating much recent progress in this area. Coupled with the new successful developments in artificial intelligence (AI), we are now able to use more robust approaches devoted to automated ULF wave event identification and classification. The goal of this effort is to use a popular machine learning method, widely used in the Earth Observation domain for classification of satellite images, to solve a Space Physics classification problem, namely to identify ULF wave events using magnetic field data from Swarm. We construct a Convolutional Neural Network (ConvNet) that takes as input the wavelet spectrum of the Earth’s magnetic field variations per track, as measured by Swarm, and whose building blocks consist of two alternating convolution and pooling layers, and one fully connected layer, aiming to classify ULF wave events within four different possible signal categories: (1) Pc3 wave events (i.e., frequency range 20–100 mHz), (2) background noise, (3) false positives, and (4) plasma instabilities. Our preliminary experiments show promising results, yielding successful identification with more than 97% accuracy. The same methodology can be easily applied to magnetometer data from other satellite missions and ground-based arrays.
... All the tested networks use batch normalized input feed [64], and a rectified linear unit [65] as activation function. The use of a dropout layer [66,67] worsens the accuracy so we do not use it here. The optimization of the weights of the network is done using Adam optimizer [68] -which is a variation of stochastic gradient descent-with respect to the mean squared error. ...
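The Adam optimizer this excerpt refers to maintains exponential moving averages of the gradient and of its square; one bias-corrected update step can be sketched as follows (a textbook sketch with the usual default hyperparameters, not the cited networks' training code):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: moving averages m (gradient) and v (squared
    gradient), bias-corrected by 1 - b^t, then a step scaled by
    m_hat / (sqrt(v_hat) + eps)."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(theta) = theta^2 (gradient 2*theta) starting from theta = 1.
theta, m, v = np.array(1.0), 0.0, 0.0
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
print(float(theta))  # driven close to the minimum at 0
```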
Preprint
Full-text available
We develop a machine learning (ML) approach for improving the accuracy of the horizontal distribution of the aerosol optical depth (AOD) simulated by the CHIMERE chemistry-transport model over Northern Africa using Moderate-Resolution Imaging Spectroradiometer (MODIS) AOD satellite observations. These observations are used during the training phase of the ML method for deriving a regional bias correction of AODs simulated by CHIMERE. The results are daily maps of regional bias corrected AODs with full horizontal coverage over Northern Africa. We test four types of ML models: multiple linear regression (MLR), random forests (RF), gradient boosting (XGB), and multiple layer perceptron networks (NN). We perform comparisons with satellite and independent ground-based observations of AOD that are not used in the training phase. They suggest that all models have overall comparable performances with a slight advantage of the RF model which expresses less spatial artifacts. While the method slightly underestimates the very high AODs, it significantly reduces biases and absolute errors, and clearly enhances linear correlations with respect to independent observations. This improvement for deriving the AOD is particularly relevant for high dust pollution regions like the Sahara Desert, which dramatically lack ground-based measurements for validations of chemistry-transport modeling which currently remains challenging and imprecise.
... Thanks to the unique feature of its convolutional layers, which contain learnable kernel filters as parameters, the CNN can obtain spatially correlated local features from a small region of the previous layers; activation functions such as rectified linear units (ReLU) and sigmoid functions pass the output of the nodes from one layer to the next, allowing non-linear input data to be modelled by the network [7]. ...
Chapter
Technology can give industries the ability to create products/materials/services that meet customer needs and comply with applicable regulatory obligations. In this context, an automatic damage detection system is proposed for sandwich panels. Instead of relying on manual inspection, the system is based on artificial vision and operates with high accuracy in an industrial environment, ensuring traceability in product quality, reducing the percentage of returns caused by imperfections. The adaptive thresholding method seeks to identify the pixel intensities found on the surface of the sandwich panel. Unlike existing methods, the proposed algorithm is based on an adaptive threshold that uses the local characteristics of an image to segment and classify damage on the surfaces of sandwich panels, seeking to reject or accept a product according to the quality levels defined by the standard. The experimental results propose to generate a comparison with a sandwich panel damage detection method based on a convolutional neural network. The results of the experiment show that the proposed thresholding-based method has better accuracy and F1-score than deep learning methods. Moreover, this system is able to improve the industrial standards of sandwich panel manufacturing according to the standard, which limits the allowable imperfections, pointing out only the maximum admissible value of manufacturing imperfections to obtain a quality product.

Keywords: Damage identification, Deep learning, Feature extraction, Computer vision
... The term "dropped-out" relates to the "filtering-out" of random neurons during the training stage. The process was initially introduced [25] to prevent overfitting by removing unnecessary sets of features for every iteration during the training stage, and it was partially proven to stabilize the prediction while acting as a regularizer. Table 3 lists the mean IU for respective facies and overall facies, representing sixteen seismic attributes as input and nine seismic attributes as input deployed in A Field, by using different dropout values. ...
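The mean IU metric this excerpt reports is the per-class Intersection over Union averaged across classes; a minimal sketch under the standard definition (the example labels are illustrative, not the paper's facies data):

```python
import numpy as np

def mean_iou(pred, truth, n_classes):
    """Mean Intersection over Union: for each class c, compute
    |pred == c AND truth == c| / |pred == c OR truth == c|,
    then average over the classes that actually occur."""
    ious = []
    for c in range(n_classes):
        inter = np.logical_and(pred == c, truth == c).sum()
        union = np.logical_or(pred == c, truth == c).sum()
        if union:
            ious.append(inter / union)
    return float(np.mean(ious))

pred  = np.array([0, 0, 1, 1, 2, 2])
truth = np.array([0, 0, 1, 2, 2, 2])
print(mean_iou(pred, truth, 3))  # (1 + 1/2 + 2/3) / 3 = 13/18
```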
Article
Full-text available
Automating geobodies using insufficient labeled training data as input for structural prediction may result in missing important features and a possibility of overfitting, leading to low accuracy. We adopt a deep learning (DL) predictive modeling scheme to facilitate the detection of channelized features based on classified seismic attributes (X) and different ground truth scenarios (y), imitating actual human interpreters’ tasks. In this approach, a diverse augmentation method was applied to increase the accuracy of the model after we were satisfied with the refined annotated ground truth dataset. We evaluated the effect of dropout as a training regularizer and of the facies’ spatial representation on optimized prediction results, apart from conventional hyperparameter tuning. From our findings, increasing the batch size helps speed up training and improve performance stability. Finally, we demonstrate that the designed Convolutional Neural Network (CNN) is capable of learning channelized variation in complex deepwater settings in a fluvial-dominated depositional environment while producing an outstanding mean Intersection over Union (IoU) of 95%, despite utilizing 6.4% of the overall dataset and avoiding overfitting.