Theoretical results strongly suggest that in order to learn the kind of complicated functions that can represent high-level abstractions (e.g. in vision, language, and other AI-level tasks), one needs deep architectures. Deep architectures are composed of multiple levels of non-linear operations, such as in neural nets with many hidden layers or in complicated propositional formulae re-using many sub-formulae. Searching the parameter space of deep architectures is a difficult optimization task, but learning algorithms such as those for Deep Belief Networks have recently been proposed to tackle this problem with notable success, beating the state-of-the-art in certain areas. This paper discusses the motivations and principles regarding learning algorithms for deep architectures, in particular those exploiting as building blocks unsupervised learning of single-layer models such as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks.
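The abstract describes building deep models such as Deep Belief Networks by stacking single-layer models such as Restricted Boltzmann Machines trained without supervision. The sketch below is a minimal illustration of that greedy layer-wise idea, assuming binary units, 1-step contrastive divergence (CD-1), and toy layer sizes; it is not the paper's exact training recipe.

```python
# Minimal greedy layer-wise pretraining with Restricted Boltzmann Machines.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    def __init__(self, n_visible, n_hidden, lr=0.05):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)   # visible biases
        self.c = np.zeros(n_hidden)    # hidden biases
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b)

    def cd1_step(self, v0):
        # Positive phase: hidden activations given the data.
        ph0 = self.hidden_probs(v0)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        # Negative phase: one Gibbs step (reconstruction).
        pv1 = self.visible_probs(h0)
        ph1 = self.hidden_probs(pv1)
        # CD-1 parameter updates.
        self.W += self.lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
        self.b += self.lr * (v0 - pv1).mean(axis=0)
        self.c += self.lr * (ph0 - ph1).mean(axis=0)

def pretrain_stack(data, layer_sizes, epochs=20):
    """Train one RBM per level; each level's hidden activations feed the next."""
    rbms, x = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(x.shape[1], n_hidden)
        for _ in range(epochs):
            rbm.cd1_step(x)
        rbms.append(rbm)
        x = rbm.hidden_probs(x)  # representation passed up to the next layer
    return rbms

# Toy usage: 200 binary examples with 64 inputs, stacked into three levels.
toy = (rng.random((200, 64)) < 0.3).astype(float)
stack = pretrain_stack(toy, layer_sizes=[32, 16, 8])
```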
... We use some of the recently popularized parametric and non-parametric ML approaches, such as elastic net [59], support vector machines [60], random forest [61], gradient boosting [62], and feedforward artificial neural networks [63]. For each model, there are many variations proposed in the literature. ...
... In the remainder of this section, we provide a high-level description of these models. For further details on them, refer to Appendix B and various textbooks [14,37,63,64]. ...
... Given these results, we go back and adjust the weights and biases of the network. Typically, we need a large training dataset to achieve good performance using ANNs [63,67]. ...
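The excerpt names several model families (elastic net, support vector machines, random forests, gradient boosting, feedforward neural networks). The sketch below fits all of them with scikit-learn on synthetic data; the data, hyperparameters, and scoring choice are illustrative assumptions, not the cited paper's setup.

```python
# Fit and cross-validate the model families named in the excerpt.
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=300, n_features=20, noise=0.5, random_state=0)

models = {
    "elastic_net": ElasticNet(alpha=0.1),
    "svm": SVR(kernel="rbf", C=1.0),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
    "neural_net": MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000,
                               random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean R^2 = {scores.mean():.3f}")
```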
This paper assesses the usefulness of comprehensive payments data for macroeconomic predictions in Canada. Specifically, we evaluate which types of payments data are useful, when they are useful, why they are useful, and whether machine learning (ML) models enhance their predictive value. We find that payments data combined with a factor model can improve accuracy by up to 25% in predicting GDP, retail, and wholesale sales, and that nonlinear ML models can improve accuracy by up to a further 20%. Furthermore, we find that the retail payments data are more useful than the data from the wholesale system, and that they add more value during crises and at the nowcasting horizon because of their timeliness. The contribution of the payments data and ML models is small and linear during periods of low and normal economic growth. However, their contribution is large, asymmetric, and nonlinear during crises such as COVID-19. Moreover, we propose a cross-validation approach to mitigate overfitting and use tools to address the interpretability challenges of the ML models, improving their effectiveness for policy use.
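The abstract mentions a cross-validation approach to mitigate overfitting. The sketch below shows a generic expanding-window, time-ordered cross-validation of the kind commonly used for nowcasting; the splitter, model, and synthetic series are assumptions for illustration and may differ from the paper's own design.

```python
# Expanding-window cross-validation for a time-ordered prediction problem.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.standard_normal((120, 8))                           # e.g. monthly payment-flow features
y = X[:, :3].sum(axis=1) + 0.1 * rng.standard_normal(120)   # target growth rate

errors = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    model = Ridge(alpha=1.0).fit(X[train_idx], y[train_idx])  # train only on the past
    pred = model.predict(X[test_idx])                          # evaluate on the future fold
    errors.append(mean_squared_error(y[test_idx], pred))

print("out-of-sample MSE per fold:", np.round(errors, 4))
```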
... DNNs, like regular ANNs, can model complex non-linear relationships. DNNs have the advantage of being able to model complex data with fewer units (nodes) than similarly performing ANNs [21,25–28]. The DNN is trained using a standard error-backpropagation algorithm, and the weights are updated through stochastic gradient descent. ...
... The structure of a deep neural network [21,25–28]. ...
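The excerpt describes a deep feedforward network trained by error backpropagation with stochastic gradient descent. The sketch below is a minimal PyTorch version of that loop; layer widths, the loss, and the toy data are illustrative assumptions.

```python
# A deep feedforward network trained by backpropagation with SGD updates.
import torch
from torch import nn

torch.manual_seed(0)
X = torch.randn(256, 10)   # toy inputs
y = torch.randn(256, 1)    # toy regression targets

model = nn.Sequential(
    nn.Linear(10, 32), nn.ReLU(),
    nn.Linear(32, 16), nn.ReLU(),
    nn.Linear(16, 1),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)   # forward pass
    loss.backward()               # backpropagate the error to all weights and biases
    optimizer.step()              # gradient-descent weight update
```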
Drought has significant impacts on both society and the environment, but it is a gradual and comprehensive process that affects a region over time. Therefore, non-structural measures are necessary to prepare for and respond to the damage caused by drought in a flexible manner according to the stage of drought. In this study, an AI-based water demand prediction model was developed using deep neural network (DNN) and long short-term memory (LSTM) models. The models were trained on data from 2004 to 2015 and verified on data from 2016 to 2021. Model accuracy was evaluated on the verification data, with the LSTM model achieving a correlation coefficient (CC) of 0.95 and a normalized root mean square error (NRMSE) of 8.38, indicating excellent performance. The probability of the random variable X falling within the interval [a,b], as described by the probability density function f(x), was calculated using the water demand data, and the cumulative distribution function was used to calculate the probability of the random variable being less than or equal to a specific value. These calculations were used to establish the criteria for each stage of the crisis alert system. Decision tree (DT) and random forest (RF) models, based on AI-based classification, were then used to predict water demand at the Gurye intake station. The models took into account the previous day's water demand as well as the effects of rainfall, maximum temperature, and average temperature. Daily water demand data from the Gurye intake station and the previous day's rainfall, maximum temperature, and average temperature data from a nearby observatory were collected from 2004 to 2021. These models were likewise trained on data from 2004 to 2015 and validated on data from 2016 to 2021. Model accuracy was evaluated using the F1-score, with the random forest model achieving a score of 0.88, indicating excellent performance.
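In LaTeX form, the probability statements used above to set the crisis-alert thresholds are the standard density and distribution identities, written with the abstract's symbols:

```latex
% Probability that water demand X falls in [a, b] given its density f(x),
% and the cumulative distribution function used for the alert thresholds.
\[
  P(a \le X \le b) = \int_a^b f(x)\,dx,
  \qquad
  F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt .
\]
```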
... The depth of a deep neural network is increased by a generic procedure of adding and training one or more layers, until it can make good predictions on a new set of data (Bengio 2009). A convolutional neural network (CNN) is a class of deep neural networks and has become an efficient tool for solving pattern recognition problems. A CNN architecture typically consists of convolutional layers combined with pooling layers to extract essential features from the image, and a fully connected layer is used as the classifier. ...
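The excerpt describes the typical CNN layout: convolutional layers with pooling to extract features, followed by a fully connected classifier. The sketch below is a minimal PyTorch instance of that layout; channel counts, kernel sizes, the 32x32 input, and the 10-class output are illustrative assumptions.

```python
# A minimal CNN: convolution + pooling feature extractor, fully connected classifier.
import torch
from torch import nn

cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                       # 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                       # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),             # fully connected classifier
)

logits = cnn(torch.randn(4, 3, 32, 32))    # batch of four 32x32 RGB images
print(logits.shape)                        # torch.Size([4, 10])
```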
Hematopoiesis is a process in which hematopoietic stem cells produce other mature blood cells in the bone marrow through cell proliferation and differentiation. The hematopoietic cells are cultured on a petri dish to form different colony-forming units (CFUs). The idea is to identify the type of CFU produced by the stem cell. Several software tools have been developed to classify CFUs automatically; however, automated identification or classification of CFU types remains the main challenge. Most of the current software shares common drawbacks, such as high operating costs and complex machinery. The purpose of this study is to investigate several selected convolutional neural network (CNN) pre-trained models to overcome these constraints for automated CFU classification. Prior to CFU classification, the images are acquired from mouse stem cells and categorized into three types: CFU-erythroid (E), CFU-granulocyte/macrophage (GM), and CFU-PreB. These images are then pre-processed before being fed into the CNN pre-trained models. The models adopt a deep learning neural network approach to extract informative features from the CFU images. Classification results show that the models integrated with the pre-processing module can classify the CFUs with high accuracy and shorter computational time, reaching 96.33% in 61 minutes and 37 seconds. Hence, this finding could be used as a baseline reference for further research.
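As a hedged sketch of adapting a pre-trained CNN to the three CFU classes (CFU-E, CFU-GM, CFU-PreB), the snippet below loads an ImageNet-pretrained backbone and replaces its classification head. The choice of ResNet-18, the frozen backbone, and the input size are assumptions for illustration; the study compares several pre-trained models rather than prescribing this one.

```python
# Reuse a pre-trained CNN as a feature extractor for three CFU classes.
import torch
from torch import nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")   # ImageNet pre-trained weights
for param in model.parameters():                   # keep the learned features fixed
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 3)      # new head: 3 CFU classes

batch = torch.randn(2, 3, 224, 224)                # two pre-processed CFU images (toy tensors)
print(model(batch).shape)                          # torch.Size([2, 3])
```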
... The inherent non-linearity and ill-posed nature of the obstacle scattering inversion problem are further exacerbated in real-world conditions where varied forms of noise interfere with data collection and transmission, making the accurate deduction and reconstruction of obstacle geometric features problematic [5][6][7]. Machine learning's profound self-learning capabilities have been extensively highlighted in recent literature, pointing towards its potential efficacy in diminishing noise impacts and aptly managing the scattering inversion challenge [8][9][10]. In an attempt to tackle the scattering inversion problem, a synthesis of neural networks with the Tikhonov regularization strategy, termed NEET, was introduced by Li et al., displaying promising outcomes in sparse data environments [11]. ...
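Tikhonov regularization, mentioned in the excerpt as part of the NEET approach, stabilizes an ill-posed linear inverse problem by penalizing the solution norm. The sketch below is a generic least-squares illustration with an assumed forward operator and noise level; it is not the NEET method itself.

```python
# Tikhonov-regularized solution of a noisy linear inverse problem:
# minimize ||A x - b||^2 + lam * ||x||^2, solved in closed form.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 30))                  # assumed forward (scattering) operator
x_true = rng.standard_normal(30)
b = A @ x_true + 0.05 * rng.standard_normal(50)    # noisy measurements

lam = 1e-2                                         # regularization strength (assumed)
x_hat = np.linalg.solve(A.T @ A + lam * np.eye(30), A.T @ b)

print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```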
... However, training a CNN from scratch can be challenging due to the large amount of labelled data and the significant computational resources required. To address this challenge, researchers have proposed neural network training strategies to optimise the performance and accuracy of the model (Bengio 2009). One common approach is to use a pre-trained CNN and fine-tune it for a specific task. ...
Embedded applications are increasingly prevalent in various domains, from consumer electronics to industrial automation and smart cities. With the advances in integrated circuit manufacturing technologies, low-power chips can now execute complex algorithms, including machine learning models. However, the computational constraints of embedded devices require compact and efficient neural network models, as well as software frameworks and optimisation techniques tailored to their hardware resources. This study investigates the implementation of Convolutional Neural Network (CNN) models for gesture recognition on an STM32F4 microcontroller, by exploring the impact of freezing layers, fine-tuning and pruning techniques on pre-trained CNN models. The results demonstrate that fine-tuning and freezing layers improve accuracy by up to 18%; freezing 10% and 20% of the layers both improved accuracy. Finally, we demonstrate that pruning reduced the model size by 90%, enabling it to perform gesture recognition on small devices. These findings are significant for developing software and optimisation techniques for embedded systems, particularly in the context of the Internet of Things.
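The sketch below illustrates the layer freezing and magnitude pruning the study explores, using PyTorch's built-in pruning utilities. The tiny model, the choice of frozen block, and the 90% sparsity target are illustrative assumptions; deploying to an STM32F4 would additionally require export and quantization steps not shown here.

```python
# Freeze early layers and prune the classifier weights by magnitude.
import torch
from torch import nn
import torch.nn.utils.prune as prune

model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(8 * 14 * 14, 10),            # gesture classes (assumed count)
)

# Freeze the first convolutional block so only the classifier is fine-tuned.
for param in model[0].parameters():
    param.requires_grad = False

# Prune 90% of the classifier weights by magnitude, then make it permanent.
prune.l1_unstructured(model[4], name="weight", amount=0.9)
prune.remove(model[4], "weight")

sparsity = (model[4].weight == 0).float().mean().item()
print(f"classifier weight sparsity: {sparsity:.0%}")
```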
Much of the recent success of artificial neural networks needs to be attributed to larger systems of neurons, herein called architectures. This chapter exemplifies some of these systems on a conceptual level.
Technological progress has taken humanity to unexplored frontiers of science and innovation. With every advance, however, come new responsibilities, ethical dilemmas, and challenges that demand reflection and adaptation. In this scenario, artificial intelligence (AI) emerges as a tool with unprecedented transformative potential. While in some sectors this transformation is obvious, in others, such as the justice system, it raises questions deeply rooted in the very essence of our values and social structures. The intersection of artificial intelligence and justice is an emerging territory, full of possibilities but also of risks. It is imperative that we approach this subject with the seriousness and depth it deserves, ensuring that our judicial institutions remain pillars of Justice, Equality, and Equity in this new technological era. The incorporation of artificial intelligence (AI) into many areas of everyday life has revolutionized the way many industries operate, and the justice system is no exception. Although its implementation has aroused both enthusiasm and concern, the truth is that AI has the potential to improve the efficiency and accuracy of judicial processes. This article explores the applications, benefits, and challenges of using AI in justice, in pursuit of a new "Cybernetic Legal Certainty" ("Seguridad Jurídica Cibernética").
For many types of machine learning algorithms, one can compute the statistically "optimal" way to select training data. In this paper, we review how optimal data selection techniques have been used with feedforward neural networks. We then show how the same principles may be used to select data for two alternative, statistically-based learning architectures: mixtures of Gaussians and locally weighted regression. While the techniques for neural networks are computationally expensive and approximate, the techniques for mixtures of Gaussians and locally weighted regression are both efficient and accurate. Empirically, we observe that the optimality criterion sharply decreases the number of training examples the learner needs in order to achieve good performance.
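As a loose illustration of the data selection idea, the sketch below queries, at each round, the candidate input where a small ensemble's predictions disagree most, a variance-based proxy for the expected-error criteria reviewed in the abstract. The bagged ensemble, toy target function, and pool sizes are assumptions; the paper's criteria for mixtures of Gaussians and locally weighted regression are more precise than this.

```python
# Variance-based active data selection: label the pool point where a small
# bagged ensemble of regressors is most uncertain, then retrain.
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x).ravel()                 # unknown target function (toy)
pool = rng.uniform(-2, 2, size=(200, 1))            # unlabeled candidate inputs
X, y = pool[:5].copy(), f(pool[:5])                 # small initial labeled set

for _ in range(20):
    ensemble = BaggingRegressor(DecisionTreeRegressor(), n_estimators=20,
                                random_state=0).fit(X, y)
    preds = np.stack([est.predict(pool) for est in ensemble.estimators_])
    idx = preds.var(axis=0).argmax()                # point of greatest disagreement
    X = np.vstack([X, pool[idx]])                   # query its label and add it
    y = np.append(y, f(pool[idx:idx + 1]))

print("labels used:", len(y))
```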
The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
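The sketch below shows the idea in the abstract with scikit-learn: inputs are mapped non-linearly via a polynomial kernel and a linear decision surface with a soft margin is fit in that feature space. The digits dataset and hyperparameters are illustrative assumptions, not the original benchmark.

```python
# Support-vector classification with a polynomial kernel (soft margin).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="poly", degree=3, C=1.0)   # polynomial input transformation
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```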