Article PDF Available

Content uploaded by Y. Bengio on Nov 19, 2014. Content may be subject to copyright.

Theoretical results strongly suggest that in order to learn the kind of complicated functions that can represent high-level abstractions (e.g. in vision, language, and other AI-level tasks), one needs deep architectures. Deep architectures are composed of multiple levels of non-linear operations, such as in neural nets with many hidden layers or in complicated propositional formulae re-using many sub-formulae. Searching the parameter space of deep architectures is a difficult optimization task, but learning algorithms such as those for Deep Belief Networks have recently been proposed to tackle this problem with notable success, beating the state-of-the-art in certain areas. This paper discusses the motivations and principles regarding learning algorithms for deep architectures, in particular those exploiting as building blocks unsupervised learning of single-layer models such as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks.

Figures uploaded by Y. Bengio.


... Over the years significant progress has been made in training large models. Nevertheless, it is still not clear what makes a representation good for complex learning systems (Bottou et al., 2007; Vincent et al., 2008; Bengio, 2009; Zhang et al., 2021). ...

The statistical supervised learning framework assumes an input-output set with a joint probability distribution that is reliably represented by the training dataset. The learner is then required to output a prediction rule learned from the training dataset's input-output pairs. In this work, we provide meaningful insights into the asymptotic equipartition property (AEP) (Shannon, 1948) in the context of machine learning, and illuminate some of its potential ramifications for few-shot learning. We provide theoretical guarantees for reliable learning under the information-theoretic AEP, and for the generalization error with respect to the sample size. We then focus on a highly efficient recurrent neural net (RNN) framework and propose a reduced-entropy algorithm for few-shot learning. We also propose a mathematical intuition for the RNN as an approximation of a sparse coding solver. We verify the applicability, robustness, and computational efficiency of the proposed approach with image deblurring and optical coherence tomography (OCT) speckle suppression. Our experimental results demonstrate significant potential for improving learning models' sample efficiency, generalization, and time complexity, which can therefore be leveraged for practical real-time applications.

... Softmax is a multinomial logistic function that can produce a vector in the range (0, 1) to represent the classification probability distribution of each class. Softmax is often used as a conversion function for multiclass classification [51]. The sigmoid function is suitable for binary classification. ...
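As a minimal sketch of the two functions described above (plain Python, not taken from the cited work), softmax maps a vector of logits to class probabilities in (0, 1) that sum to 1, while sigmoid squashes a single score for binary classification:

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]  # each entry in (0, 1), summing to 1

def sigmoid(x):
    # Binary-classification squashing function.
    return 1.0 / (1.0 + math.exp(-x))

probs = softmax([2.0, 1.0, 0.1])
print(probs)          # three class probabilities summing to 1
print(sigmoid(0.0))   # 0.5
```

Subtracting the maximum logit before exponentiating leaves the output unchanged but avoids overflow for large inputs, a standard trick in softmax implementations.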

Damage to the surface of reinforced concrete (RC) structures affects the safety of the facility. Deep learning can effectively identify various types of damage, which is useful for taking protective measures to avoid further deterioration of the structure. Based on deep learning, the multi-convolutional neural network (MCNN) has the potential to identify multiple types of RC damage in images. The MCNN6 of this study was evaluated by several indicators (accuracy, loss, and efficiency), and the optimized architecture was confirmed. The results show that the identification performance for “crack and rebar exposure” (Type B) by MCNN6 is the best, with an accuracy of 96.81% and a loss of 0.07. The accuracy for the other five types of damage combinations is also higher than 80.0%, and the loss is less than 0.44. Finally, the MCNN6 model can be used to detect various kinds of damage, enabling automated assessment of RC facility surface conditions.

Historically, the intuition behind developing classification algorithms was often to identify a hypothesis that minimizes training error. A major problem encountered with this approach is overfitting, which occurs when the hypothesis becomes too complex in comparison to the size of the training data. In such cases, it is likely that an algorithm minimizing training error will find a hypothesis that fits the training data very well but generalizes poorly to previously unseen data. Good generalization here refers to low generalization error, which is defined as the difference between the training error and the true error.
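The gap described above can be made concrete with a small simulation (a sketch in plain Python, not from any of the cited works): a 1-nearest-neighbour rule memorizes the training set, including its label noise, and so achieves zero training error but a large generalization gap, while a simple threshold rule has a training error close to its true error.

```python
import random

random.seed(42)

def noisy_label(x):
    # Ground truth is sign(x); 20% of labels are flipped (label noise).
    y = 1 if x > 0 else 0
    return y if random.random() > 0.2 else 1 - y

train_x = [random.uniform(-1, 1) for _ in range(40)]
train_y = [noisy_label(x) for x in train_x]

def memorizer(x):
    # 1-nearest-neighbour: predicts the label of the closest training point,
    # i.e. it fits the training data (noise included) perfectly.
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]

def simple_rule(x):
    # A simple hypothesis that ignores the label noise.
    return 1 if x > 0 else 0

def error(rule, xs, ys):
    return sum(rule(x) != y for x, y in zip(xs, ys)) / len(xs)

# A large fresh sample approximates the true error.
test_x = [random.uniform(-1, 1) for _ in range(5000)]
test_y = [noisy_label(x) for x in test_x]

gap_memorizer = error(memorizer, test_x, test_y) - error(memorizer, train_x, train_y)
gap_simple = error(simple_rule, test_x, test_y) - error(simple_rule, train_x, train_y)
print(gap_memorizer, gap_simple)  # the memorizer's gap is much larger
```

The memorizer's training error is exactly zero (every training point is its own nearest neighbour), so its entire true error shows up as generalization gap; the simple rule's training and true errors are both near the 20% noise floor, so its gap stays small.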

The European Commission's launch of the twin Sentinel-2 satellites provides new opportunities for land use and land cover (LULC) classification because of the ready availability of their data and their enhanced spatial, temporal, and spectral resolutions. The rapid development of machine learning over the past decade has put data-driven models at the forefront of high-accuracy predictions of the physical world. However, the contribution of the driving variables behind these predictions cannot be explained beyond generalized metrics of overall performance. Here, we compared the performance of three shallow learners (support vector machines, random forest, and extreme gradient boosting) as well as two deep learners (a convolutional neural network and a residual network with 50 layers) in and around the city of Malmö in southern Sweden. Our complete analysis suite involved 141 input features, 85 scenarios, and 8 LULC classes. We explored the interpretability of the five learners using Shapley additive explanations to better understand feature importance at the level of individual LULC classes. The purpose of class-level feature importance was to identify the most parsimonious combination of features that could reasonably map a particular class and enhance overall map accuracy. We showed not only that overall accuracies increase from shallow (mean = 84.64%) to deep learners (mean = 92.63%) but also that the number of explanatory variables required to obtain maximum accuracy decreases along the same gradient. Furthermore, we demonstrated that class-level importance metrics can be successfully identified using Shapley additive explanations in both shallow and deep learners, which allows for a more detailed understanding of variable importance. We show that for certain LULC classes there is a convergence of variable importance across all the algorithms, which helps explain model predictions and aids the selection of more parsimonious models. The use of class-level feature importance metrics is still new in LULC classification, and this study provides important insight into the potential of more nuanced importance metrics.

Over time, machine learning experts have come to agree that finding good features is one of the most important problems in pattern classification [1]. In fact, if the given features are good, even a linear classifier suffices to give excellent classification results.

In the last few decades, deep learning techniques for diagnosing and predicting disease conditions from neuroimaging have attracted much attention and interest from the scientific community. Big data and artificial intelligence approaches and innovations are currently being utilized to generate large datasets from images, text, sounds, graphs, and signals. New trends in the utilization of deep learning for disease prediction in neurology, oncology, cardiology, and other areas entail converting patient electronic health records, biological system information, physiological signals, biomarkers, and biomedical images to cognitive functions. The current trends in deep learning techniques focus on utilizing neuroimaging analysis to evaluate alterations in local morphological topographies of different brain sub-regions and then predict novel disorder-linked brain patterns. Hence, this chapter presents a detailed overview of different approaches in deep learning for the prediction of major brain diseases such as mild cognitive impairment, Alzheimer's disease, brain tumors, depressive disorders, traumatic brain injury, schizophrenia, Parkinson's disease, autism spectrum disease, attention-deficit hyperactivity disorder, epilepsy, stroke, multiple sclerosis, and more. The chapter also discusses the current challenges of utilizing deep learning in assessing brain disorders in neuroimaging data.

The quantum circuit layout (QCL) problem is to map a quantum circuit onto a device such that the constraints of the device are satisfied. We introduce a quantum circuit mapping heuristic, QXX, and its machine learning version, QXX-MLP. The latter automatically infers the optimal QXX parameter values such that the laid-out circuit has a reduced depth. In order to speed up circuit compilation, before laying out the circuits, we use a Gaussian function to estimate the depth of the compiled circuits. This Gaussian also informs the compiler about the circuit region that most influences the resulting circuit's depth. We present empirical evidence for the feasibility of learning the layout method using approximation. QXX and QXX-MLP open the path to feasible large-scale QCL methods.

In this paper, we conduct in-depth research on the corresponding enterprises, combined with some problems existing in the process of data processing and use. We establish a deep learning model based on an extensive collection and comprehensive investigation of the research results of domestic and foreign enterprises in all aspects of the process of data processing and use, and determine the research directions. Firstly, in view of the increasing complexity and dimensionality of enterprise data, and the difficulties of enterprise data application, this paper studies related data preprocessing methods. Secondly, aiming at the problems of enterprise cost control and customer relationship management, this paper studies prediction based on enterprise data through the analysis of practical problems and the processing of corresponding data. Finally, in order to improve the efficiency and scientific usefulness of enterprise management, this paper studies evaluation based on enterprise data. The model is verified through simulations and compared with several models, i.e., cross-hybrid and sequential hybrid models. Under certain assumptions, the attained outcomes confirm that the accuracy of the single-model deep learning structure is greater than that of the cross-hybrid model, but lower than that of the sequential hybrid model.

Particulate matter has a significantly larger impact on human health than other toxins, which makes air pollution a highly serious problem. The air quality of a given region can be utilized as a primary determinant of the pollution index, as well as of how well the industries and population are controlled. With the development of industries, monitoring urban air quality has become a persistent issue. At the same time, the crucial effect of air pollution on individuals' health and the environment makes monitoring air quality increasingly important, mainly in urban areas. Several computing methods, ranging from machine learning to deep learning, have been studied and compared to date to verify the accuracy of air quality forecasting. This paper introduces a deep learning air quality forecasting approach for PM2.5 based on the convolutional bidirectional long short-term memory (CBLSTM) model, which combines 1D convolution and bidirectional LSTM neural networks. The experimental findings demonstrate that the suggested approach outperforms the LSTM, CBLSTM, and CBGRU comparison models and achieves a high accuracy rate (MAE = 6.8 and RMSE = 10.2).

For many types of machine learning algorithms, one can compute the statistically 'optimal' way to select training data. In this paper, we review how optimal data selection techniques have been used with feedforward neural networks. We then show how the same principles may be used to select data for two alternative, statistically-based learning architectures: mixtures of Gaussians and locally weighted regression. While the techniques for neural networks are computationally expensive and approximate, the techniques for mixtures of Gaussians and locally weighted regression are both efficient and accurate. Empirically, we observe that the optimality criterion sharply decreases the number of training examples the learner needs in order to achieve good performance.

The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimensional feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure the high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
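The non-linear mapping to a high-dimensional feature space mentioned above is usually realized implicitly through a kernel function. As a hedged illustration (a sketch in plain Python, not code from the original paper), a degree-2 polynomial kernel equals an inner product in a feature space of all monomials up to degree 2, without ever computing that map explicitly:

```python
import math

def poly_kernel(x, y, degree=2):
    # (x . y + 1)^d equals an inner product in a higher-dimensional
    # feature space of all monomials up to degree d.
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + 1) ** degree

def phi(x):
    # Explicit degree-2 feature map for 2-D inputs:
    # phi(x) = (1, sqrt(2)x1, sqrt(2)x2, x1^2, x2^2, sqrt(2)x1x2).
    x1, x2 = x
    return [1.0, math.sqrt(2) * x1, math.sqrt(2) * x2,
            x1 * x1, x2 * x2, math.sqrt(2) * x1 * x2]

x, y = (1.0, 2.0), (3.0, 0.5)
explicit = sum(a * b for a, b in zip(phi(x), phi(y)))
print(poly_kernel(x, y), explicit)  # both equal 25.0
```

The kernel evaluates a single dot product in the input space, while the explicit map lives in a 6-dimensional space for 2-D inputs; for high degrees and dimensions the implicit route is the only tractable one, which is what makes the high-dimensional decision surface practical.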