Article · Literature Review

Deep Learning in Neural Networks: An Overview

Authors:
Jürgen Schmidhuber

Abstract

In recent years, deep neural networks (including recurrent ones) have won numerous contests in pattern recognition and machine learning. This historical survey compactly summarises relevant work, much of it from the previous millennium. Shallow and deep learners are distinguished by the depth of their credit assignment paths, which are chains of possibly learnable, causal links between actions and effects. I review deep supervised learning (also recapitulating the history of backpropagation), unsupervised learning, reinforcement learning & evolutionary computation, and indirect search for short programs encoding deep and large networks.
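For readers new to the survey's framing, here is a minimal numpy sketch (not from the paper) of supervised backpropagation: gradients flow backwards along the credit assignment path of a two-layer network, one learnable link per layer. All sizes and learning rates are illustrative.

```python
import numpy as np

# Minimal sketch: backpropagation assigns credit along a chain of
# differentiable links (input -> hidden -> output), one link per layer.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))                 # toy inputs
y = (X.sum(axis=1, keepdims=True) > 0) * 1.0 # toy binary targets

W1 = rng.normal(scale=0.5, size=(3, 8))      # layer-1 weights
W2 = rng.normal(scale=0.5, size=(8, 1))      # layer-2 weights

for step in range(500):
    # forward pass: two links in the credit assignment path
    h = np.tanh(X @ W1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2)))      # sigmoid output

    # backward pass: propagate the error back through each link
    dp = (p - y) / len(X)                    # d(cross-entropy)/d(logit)
    dW2 = h.T @ dp
    dh = dp @ W2.T * (1.0 - h ** 2)          # through the tanh link
    dW1 = X.T @ dh

    W1 -= 0.5 * dW1                          # gradient descent step
    W2 -= 0.5 * dW2
```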


... The reason for this, besides the hardware requirements, was the lack of internal structure and architecture of these approaches, which made them unsuitable for classical problems of higher reasoning. The problem of lack of structure has been mitigated somewhat in recent decades by the development of new special forms of artificial neural networks, as can be seen in [17]. Fig. 1 summarizes the division into symbol-based and connectionist approaches according to [13] and the categorization of the associated methods and systems according to [18]. ...
... It was not until the early 2000s that learning from representations could be transferred from theory to an efficiently usable practical application, as described in [17]. Initially, the input data is determined manually, but then the characteristics of the problem to be solved are determined independently by the system used. ...
... In addition, motivated by the successes of the early 2000s, a large number of specialized connectionist models were created or old specialized models were made usable in practice. These developments are described in [17]. The defining feature of deep connectionist models in deep learning is the high number of successive, distributed processing steps. ...
Preprint
Full-text available
Artificial intelligence (AI) has emerged as a transformative force across industries, driven by advances in deep learning and natural language processing, and fueled by large-scale data and computing resources. Despite its rapid adoption, the opacity of AI systems poses significant challenges to trust and acceptance. This work explores the intersection of connectionist and symbolic approaches to artificial intelligence, focusing on the derivation of interpretable symbolic models, such as decision trees, from feedforward neural networks (FNNs). Decision trees provide a transparent framework for elucidating the operations of neural networks while preserving their functionality. The derivation is presented in a step-by-step approach and illustrated with several examples. A systematic methodology is proposed to bridge neural and symbolic paradigms by exploiting distributed representations in FNNs to identify symbolic components, including fillers, roles, and their interrelationships. The process traces neuron activation values and input configurations across network layers, mapping activations and their underlying inputs to decision tree edges. The resulting symbolic structures effectively capture FNN decision processes and enable scalability to deeper networks through iterative refinement of subpaths for each hidden layer. To validate the theoretical framework, a prototype was developed using Keras .h5-data and emulating TensorFlow within the Java JDK/JavaFX environment. This prototype demonstrates the feasibility of extracting symbolic representations from neural networks, enhancing trust in AI systems, and promoting accountability.
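The paper's step-by-step tracing of activations is not reproduced here; as a hedged sketch of the broader neural-to-symbolic idea, one can fit a surrogate decision tree to a trained FNN's own predictions (a distillation-style approximation, not the authors' Keras/Java procedure):

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

# Sketch of the neural-to-symbolic idea: fit a decision tree that
# mimics a trained feedforward network's decision function.
X, y = make_classification(n_samples=2000, n_features=6, random_state=0)

fnn = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=1000,
                    random_state=0).fit(X, y)

# The tree is trained on the FNN's *predictions*, not the raw labels,
# so it approximates the network itself in a transparent form.
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X, fnn.predict(X))

print("fidelity to FNN:", tree.score(X, fnn.predict(X)))
print(export_text(tree))
```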
... weights are adjusted during the learning process to optimize the model's performance. Activation functions determine a neuron's output; sigmoid, tanh, and ReLU are commonly used (Schmidhuber, 2015). ...
... Parameters such as the learning rate affect the effectiveness of this process (Schmidhuber, 2015). ...
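For reference, a small numpy sketch of the three activation functions named in these excerpts:

```python
import numpy as np

def sigmoid(x):
    """Squashes to (0, 1); historically common in shallow nets."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Squashes to (-1, 1); a zero-centered variant of the sigmoid."""
    return np.tanh(x)

def relu(x):
    """max(0, x); the default choice in most modern deep networks."""
    return np.maximum(0.0, x)

x = np.linspace(-3, 3, 7)
print(sigmoid(x), tanh(x), relu(x), sep="\n")
```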
Chapter
Full-text available
In this study, three-phase-to-ground faults occurring on power transmission lines were modeled using the ATP software, and the current and voltage signals of these faults were processed with the Goertzel algorithm. Using the processed signals, Artificial Neural Network (ANN), Support Vector Regression (SVR), and Polynomial Regression (PR) models were developed to estimate the fault location, and the performances of these models were compared.
... This emphasizes the urgent need for objective, widely accepted artificial intelligence (AI) tools to support multicenter PET/CT IQA. Recent advances in computational power have enabled promising AI-driven solutions [4], with deep learning (DL) [5,6] demonstrating notable capabilities in extracting relevant patterns from large datasets [7]. DL has already achieved impressive results in medical image analysis tasks, such as reconstruction, denoising, registration, segmentation, and modeling [8][9][10][11][12]. ...
Article
Full-text available
Background The quality of clinical PET/CT images is critical for both accurate diagnosis and image-based research. However, current image quality assessment (IQA) methods predominantly rely on handcrafted features and region-specific analyses, thereby limiting automation in whole-body and multicenter evaluations. This study aims to develop an expert-perceptive deep learning-based IQA system for [18F]FDG PET/CT to tackle the lack of automated, interpretable assessments of clinical whole-body PET/CT image quality. Methods This retrospective multicenter study included clinical whole-body [18F]FDG PET/CT scans from 718 patients. Automated identification and localization algorithms were applied to select predefined pairs of PET and CT slices from whole-body images. Fifteen experienced experts, trained to conduct blinded slice-level subjective assessments, provided average visual scores as reference standards. Using the MANIQA framework, the developed IQA model integrates the Vision Transformer, Transposed Attention, and Scale Swin Transformer Blocks to categorize PET and CT images into five quality classes. The model’s correlation, consistency, and accuracy with expert evaluations on both PET and CT test sets were statistically analysed to assess the system’s IQA performance. Additionally, the model’s ability to distinguish high-quality images was evaluated using receiver operating characteristic (ROC) curves. Results The IQA model demonstrated high accuracy in predicting image quality categories and showed strong concordance with expert evaluations of PET/CT image quality. In predicting slice-level image quality across all body regions, the model achieved an average accuracy of 0.832 for PET and 0.902 for CT. The model’s scores showed substantial agreement with expert assessments, achieving average Spearman coefficients (ρ) of 0.891 for PET and 0.624 for CT, while the average Intraclass Correlation Coefficient (ICC) reached 0.953 for PET and 0.92 for CT. The PET IQA model demonstrated strong discriminative performance, achieving an area under the curve (AUC) of ≥ 0.88 for both the thoracic and abdominal regions. Conclusions This fully automated IQA system provides a robust and comprehensive framework for the objective evaluation of clinical image quality. Furthermore, it demonstrates significant potential as an impartial, expert-level tool for standardised multicenter clinical IQA.
... Within the above issue, this article presents the application of the criterion proposed by the intermittency ratio (IR) metric [4] for acoustic event detection, together with deep learning (DL) techniques for classifying the source(s) producing such events (e.g., automatically categorizing the detected events with a convolutional neural network (CNN) [14]). The integration of these methods provides a dual-layered analysis, temporal and categorical, that offers deeper insights into urban noise. ...
Article
Full-text available
Urban environments are characterized by a complex interplay of various sound sources, which significantly influence the overall soundscape quality. This study presents a methodology that combines the intermittency ratio (IR) metric for acoustic event detection with deep learning (DL) techniques for the classification of sound sources associated with these events. The aim is to provide an automated tool for detecting and categorizing polyphonic acoustic events, thereby enhancing our ability to assess and manage environmental noise. Using a dataset collected in the city center of Barcelona, our results demonstrate the effectiveness of the IR metric in successfully detecting events from diverse categories. Specifically, the IR captures the temporal variations of sound pressure levels due to significant noise events, enabling their detection but not providing information on the associated sound sources. To address this weakness, the DL-based classification system, which uses a MobileNet convolutional neural network, shows promise in identifying foreground sound sources. Our findings highlight the potential of DL techniques to automate the classification of sound sources, providing valuable insights into the acoustic environment. The proposed methodology, combining the two techniques above, represents a step forward in automating acoustic event detection and classification in urban soundscapes and providing important information to manage noise mitigation actions.
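A hedged sketch of the event-detection side: the intermittency ratio is commonly computed from 1-s equivalent levels, marking seconds that exceed the period's overall Leq by a threshold K and reporting their share of the total sound energy. The K = 3 dB constant and the toy signal below are assumptions, not values taken from the paper.

```python
import numpy as np

def intermittency_ratio(leq_1s, k_db=3.0):
    """Sketch of IR-style event detection on 1-s equivalent levels.

    Seconds whose level exceeds the period's overall Leq by more than
    k_db are treated as event seconds; IR is the share of total sound
    energy contributed by those seconds (in percent).
    """
    lv = np.asarray(leq_1s, dtype=float)
    e = 10.0 ** (lv / 10.0)               # relative energy per second
    leq_total = 10.0 * np.log10(e.mean()) # overall Leq of the period
    events = lv > leq_total + k_db        # event-second mask
    return 100.0 * e[events].sum() / e.sum(), events

# toy profile: steady 55 dB background with two loud pass-by events
levels = np.full(600, 55.0)
levels[100:110] = 75.0
levels[400:405] = 80.0
ir, mask = intermittency_ratio(levels)
print(f"IR = {ir:.1f}% over {mask.sum()} event seconds")
```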
... However, such techniques often have difficulty in truly uncovering anomalies in signals, which are easily apparent to experienced observers through visual inspection. Comparing spikes or irregularities across multiple example signals readily reveals, to a trained eye, variations indicative of wheel damage, as shown in Figure 2. Similarly to how MNIST treats handwritten digits visually [35], one can visually learn and recognize patterns and shapes within a grid-like topology of time series data [34,59]. For instance, one can apply 1D convolution along the time axis, with CNNs learning temporal features and capturing dependencies at multiple time scales through convolutional filters, and pooling layers for dimensionality reduction [51]. ...
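A minimal numpy sketch of the 1D convolution over the time axis described in this excerpt; the derivative-like kernel and the pooling window size are illustrative choices, not values from the cited work.

```python
import numpy as np

# 1D convolution over the time axis: the filter responds to a local
# temporal pattern; pooling then reduces the time resolution.
signal = np.sin(np.linspace(0, 20, 200)) \
    + 0.1 * np.random.default_rng(0).normal(size=200)

edge_filter = np.array([-1.0, 0.0, 1.0])   # crude derivative kernel
# np.convolve flips its second argument, so reversing the kernel
# yields the cross-correlation convention used by CNNs.
feature_map = np.convolve(signal, edge_filter[::-1], mode="valid")

# max pooling with window 4: keep the strongest response per window
pooled = feature_map[: len(feature_map) // 4 * 4].reshape(-1, 4).max(axis=1)
print(feature_map.shape, pooled.shape)
```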
Preprint
The integration of advanced sensor technologies with deep learning algorithms has revolutionized fault diagnosis in railway systems, particularly at the wheel-track interface. Although numerous models have been proposed to detect irregularities such as wheel out-of-roundness, they often fall short in real-world applications due to the dynamic and nonstationary nature of railway operations. This paper introduces BOLT-RM (Boosting-inspired Online Learning with Transfer for Railway Maintenance), a model designed to address these challenges using continual learning for predictive maintenance. By allowing the model to continuously learn and adapt as new data become available, BOLT-RM overcomes the issue of catastrophic forgetting that often plagues traditional models. It retains past knowledge while improving predictive accuracy with each new learning episode, using a boosting-like knowledge sharing mechanism to adapt to evolving operational conditions such as changes in speed, load, and track irregularities. The methodology is validated through comprehensive multi-domain simulations of train-track dynamic interactions, which capture realistic railway operating conditions. The proposed BOLT-RM model demonstrates significant improvements in identifying wheel anomalies, establishing a reliable sequence for maintenance interventions.
... New imaging technologies such as MRI scans, CT scans, and ultrasound imaging (e.g., echocardiograms) are used in personalized cardiology to evaluate blockages, understand the general and specific condition of the heart, and analyze the progression of diseases (Schmidhuber, 2015). These techniques have come a long way in providing more precise and specific diagnostic indications, which are essential for diagnosing and managing the disease. ...
Article
Full-text available
Cardiovascular diseases are the most common diseases worldwide and result in high morbidity and mortality rates, which underlines the need for new approaches to early diagnosis and prevention. Portable health monitoring, characterized by state-of-the-art sensors, produces a continuous flow of physiological data in real time and presents a great opportunity for preventative cardiovascular disease management. Deep learning algorithms are used to build accurate and effective models for the early diagnosis of disease. The authors discuss the integration of wearable health technology and deep learning towards the improvement of individual-oriented technologies in cardiology. The collected data are cleaned of noise and normalized to provide uniform input to the deep learning models, and state-of-the-art algorithms are applied to detect anomalies that are likely to lead to cardiovascular risks. The models are trained and validated using a dataset of wearable device data and clinically diagnosed cardiovascular disease cases. Wearable health devices combined with deep learning algorithms provide an innovative platform for cardiovascular disease screening and prevention. The proposed system proved its efficiency and accuracy in identifying probable cardiovascular disease risk, enabling early, real-time, noninvasive, and personalized healthcare services. This approach not only improves early detection but also provides useful information to patients. Future work includes the acquisition of larger databases, better models, and their incorporation into more clinics and healthcare practices.
... Convolutional neural networks (CNNs), recurrent neural networks (RNNs), and transformer models (e.g., BERT, GPT) significantly enhanced AI's proficiency in natural language processing (NLP), image recognition, and autonomous decision-making. Today, AI-powered systems are deployed in autonomous vehicles, medical diagnosis, and digital assistants [4]. ...
Article
Full-text available
The rapid development of artificial intelligence (AI) is transforming the job market and infrastructure, bringing both opportunities and challenges. This paper examines AI’s multidimensional impact by combining literature review and empirical research. It first explores the evolution of AI technology and its key drivers, laying the groundwork for assessing its broader implications. The study then analyzes AI’s effects on the job market, focusing on mechanisms of job displacement, transformation, and creation, supported by a regression model to quantify employment trends. Furthermore, it investigates AI’s role in higher education infrastructure, using university libraries as an example to illustrate how AI enhances library automation, resource management, and personalized services. The findings indicate that AI’s impact is highly sector-specific and influenced by automation intensity, workforce adaptability, and institutional policies. The paper concludes with policy recommendations emphasizing reskilling programs, ethical AI governance, and strategies for sustainable AI integration in education and employment sectors.
... Deep learning algorithms exhibit significant proficiency in visual pattern identification. RNNs have the capacity to adeptly address the previously described problem, mostly because of their robust modeling proficiency for sequential data (Schmidhuber, 2015). Zhao et al. (2018a) proposed a new approach that uses a recurrent neural network and an altered motion impact map to detect, identify, and report unusual behaviors in a school of fish in intensive aquaculture. ...
Article
Full-text available
Deep learning (DL) has changed aquaculture by offering automated solutions for species identification, health assessment, biomass calculation, feeding optimization, and water quality forecasting. Conventional aquaculture encounters obstacles like ineffective resource management, disease epidemics, and environmental deterioration; nevertheless, deep learning applications provide intelligent decision-making skills that improve sustainability and economic feasibility. This study carefully looks at 41 peer-reviewed papers from 2015 to 2024 to find out how useful deep learning is in aquaculture. It focuses on main AI models such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and generative adversarial networks (GANs). The results indicate that AI-driven solutions boost fish health evaluations, optimize feeding strategies, and improve water quality monitoring, hence minimizing waste and augmenting production efficiency. Nonetheless, obstacles like substantial computing demands, dataset restrictions, and regulatory limits impede extensive implementation. Comparative assessments demonstrate that deep learning models surpass conventional aquaculture methods in precision and prediction efficacy. In the future, researchers should investigate new AI technologies like federated learning, edge computing, and AI-integrated robotics to make deep learning easier to use and more scalable for aquaculture applications. By surmounting these obstacles and utilizing advanced AI technology, aquaculture may evolve into a more sustainable, efficient, and intelligent sector.
... Machine Learning methods like deep learning with neural networks are showing promising results (e.g. Schmidhuber et al. [31]), but their solutions lack interpretability. Symbolic Regression, on the other hand, delivers explainable models by learning analytical relationships from the input data. ...
Thesis
Full-text available
In quantum chemistry, material science, physics, and other fields, modeling atoms and molecular systems has become increasingly popular over the last decades. Approaches like the Hartree-Fock method (HF) do not include the total electronic energy, in contrast to more advanced ones (e.g. Coupled Cluster), which are computationally much more demanding and therefore several orders of magnitude slower for the required task. The gap between HF and post-HF methods is narrowed by adding the London dispersion interaction, an attractive van der Waals force. While its principal dependence on interatomic distance is well known, several improvements have been suggested in the past. In this work, interpretable correlations for this correction are sought using a machine learning method called Symbolic Regression, with data input from atomic pairs moving apart from each other.
... Gradient descent-based optimization methods [1] serve as the foundation of modern machine learning [2,3]. The basic stochastic gradient descent (SGD) algorithm [4,5] relies on noisy gradient estimates, while momentum methods [6,7] enhance convergence through smoothed updates. ...
Preprint
We present a manifestly covariant formulation of the gradient descent method, ensuring consistency across arbitrary coordinate systems and general curved trainable spaces. The optimization dynamics is defined using a covariant force vector and a covariant metric tensor, both computed from the first and second statistical moments of the gradients. These moments are estimated through time-averaging with an exponential weight function, which preserves linear computational complexity. We show that commonly used optimization methods such as RMSProp and Adam correspond to special limits of the covariant gradient descent (CGD) and demonstrate how these methods can be further generalized and improved.
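The covariant formulation itself is not reproduced here; as a hedged sketch of the special limit the preprint mentions, the Adam update below maintains exponentially weighted first and second moments of the gradient, the two statistical moments from which CGD builds its force vector and metric.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient's
    first moment (m) and second moment (v), with bias correction."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# minimize f(w) = |w|^2 as a toy objective (gradient is 2w)
w = np.array([3.0, -2.0]); m = np.zeros(2); v = np.zeros(2)
for t in range(1, 2001):
    w, m, v = adam_step(w, 2 * w, m, v, t, lr=0.05)
print(w)  # approaches the origin
```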
... It has been proposed that artificial intelligence could deal with these issues for parameter estimation 27 . Moreover, deep learning (DL) 28 can learn from large amounts of complex data and has various applications in physics [29][30][31][32][33][34][35][36][37] . Here we employ DL and demonstrate that it offers a promising approach for continuously tracking complicated time-dependent signals using spin sensors. ...
Article
Full-text available
Squeezing and entanglement play crucial roles in approaches for quantum metrology. Yet, demonstrating quantum enhancement in continuous signal tracking remains a challenging endeavour because simultaneous entanglement generation and signal perturbations are often incompatible. We demonstrate that concurrent steady-state spin squeezing and sensing are possible using continuous quantum non-demolition measurements under constant optical pumping. We achieve a sustained spin-squeezed state with a large ensemble of hot atoms using metrologically relevant steady-state squeezing. We further employ the system to track different types of continuous time-fluctuating magnetic fields, and we demonstrate the use of deep learning models to infer the time-varying fields from an optical measurement. The quantum enhancement due to spin squeezing was verified by a degraded performance in test experiments where the spin squeezing was deliberately prevented. These results represent an advance in continuous quantum-enhanced metrology with entangled atoms, including the training and application of a deep neural network to infer complex time-dependent perturbations.
... P REDICTING protein-ligand (PL) interactions is crucial in drug discovery, as it facilitates the screening of drug candidates from large compound libraries against specific protein targets [1], [2], [3]. Recent advancements in deep learning [4], [5] have significantly impacted this domain by speeding up the identification and development of novel therapeutic compounds [6], [7], [8], [9], [10]. A pivotal aspect of applying deep learning to PL interactions is the choice of molecular representation. ...
Article
Accurate prediction of the drug binding between proteins and ligands can significantly advance the development of structure-based drug design. Recent advances have shown great potential in applying equivariant graph neural network (EGNN)-based methods to learn representations of protein-ligand (PL) complexes. However, most of them typically focus on atom-level graph representations and omit the residue-level information in PL complexes, which are considered essential for understanding the binding mechanism. In this article, we develop a SO(3)-equivariant hierarchical graph neural network (EHGNN) that effectively captures the intrinsic hierarchy of biomolecular structures to enhance the predictive performance of PL interactions. Based on the SO(3)-EHGNN, we further propose a molecular dynamics-powered and energy-guided deep learning framework, called Dynamics-PLI, to capture the spatial structures and energetic information inside molecular dynamic (MD) trajectories. Extensive experimental results show significant improvements over current state-of-the-art methods, with a decrease of 4.03% in RMSE for the binding affinity problem and an average increase of 3.95% in AUROC and AUPRC for the ligand efficacy problem, demonstrating the superiority of Dynamics-PLI for PL interaction prediction. Our findings indicate that the SO(3)-EHGNN exhibits enhanced performance without the necessity of pre-training, emphasizing the inherent analytical strength of SO(3)-EHGNN.
... Advancements in image processing, fueled by the convergence of big data technologies and machine learning algorithms, underscore the pivotal role of Artificial Neural Networks (ANNs) like the MLP, CNN, and RNN (Liang et al. 2015). With its ability to introduce intricate model structures and adapt to changing data, deep learning enables ordered data representation across multiple levels of generalization (Schmidhuber et al. 2015). This paper used the VGG-16 model, devised by Simonyan and Zisserman in 2014, which stands out with its thirteen convolutional layers, two fully connected layers, and a SoftMax classifier. The design incorporates stacked convolutional layers with various filter sizes, culminating in a 1000-unit SoftMax output layer (Tammina et al. 2019). ...
... The feedforward neural network (FNN) is the earliest and simplest form of artificial neural networks (ANNs) introduced [43]. In this network structure, data flow in a single direction, from the input layer to the output layer. ...
Article
Full-text available
Quantum coherence is a crucial resource in numerous quantum processing tasks. The robustness of coherence provides an operational measure of quantum coherence, which can be calculated for various states using semidefinite programming. However, this method depends on convex optimization and can be time-intensive, especially as the dimensionality of the space increases. In this study, we employ machine learning techniques to quantify quantum coherence, focusing on the robustness of coherence. By leveraging artificial neural networks, we developed and trained models for systems with different dimensionalities. Testing on data samples shows that our approach substantially reduces computation time while maintaining strong generalizability.
... The possibility of representing the cry signal as an image by means of spectrograms opens a new perspective for using the visual representation of a cry signal's frequency spectrum as it varies over time. The advantage of this technique is that very powerful deep learning models now exist that are built specifically to classify images very efficiently [10]. Deep learning architectures such as convolutional neural networks (CNNs) have been applied with considerable success in many fields of science (in some cases surpassing expert human performance). ...
... In the 1990s and 2000s, owing to the prediction cost of Artificial Neural Networks (ANNs), Support Vector Machines, which work in a problem-specific way with manually engineered features and lean models, were the choice of the day (Cortes and Vapnik, 1995). Fundamental changes came as computers became much faster and graphics processing units (GPUs) were deployed. Within the following decade, computation became roughly a thousand times faster, and neural networks emerged as an alternative to support vector machines (Schmidhuber, 2015). In 2000, 'Deep Learning' was introduced in the context of ANNs by Igor Aizenberg, Naum Aizenberg, and Joos Vandewalle (Aizenberg et al., 2000). ...
Chapter
Full-text available
... Deep learning has gained widespread attention due to its advantages in handling large-scale data and its ability to automatically learn features from the data [26], [27]. RNNs and their improved versions, LSTM networks, have been widely used in trajectory prediction due to their advantages in handling time-series data. ...
Article
Full-text available
Aircraft trajectory prediction using deep learning models has become a significant challenge in the field of civil aviation. Traditional convolutional neural network (CNN) models struggle to capture the global features of aircraft trajectories because of the fixed receptive field of the convolution kernel, whereas the attention-based Transformer, despite being able to handle long-term dependencies, has high computational complexity and lacks local inductive bias. This paper proposes a multi-scale feature fusion architecture for aircraft trajectory prediction, called CALFUSE-KAN. This method employs a CNN to extract local spatial features, Multi-Head Self-Attention (MHSA) to capture global spatial information, and an adaptive weighted feature fusion block for dynamic feature balancing, subsequently utilizing Long Short-Term Memory (LSTM) to effectively capture temporal dependencies. Moreover, integrating a Kolmogorov–Arnold Network (KAN) to model high-dimensional nonlinear relationships further enhances prediction accuracy. Experiments demonstrate that the CALFUSE-KAN model reduces the average prediction error by 80.02%, 63.38%, 38.39%, and 33.49% compared to the LSTM, Transformer, CNN-KAN, and CNN-BiGRU models, respectively. These results indicate that the proposed hybrid model achieves high accuracy and robustness in predicting flight trajectories.
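As a hedged sketch of the fusion pattern described (a CNN branch for local features, multi-head self-attention for global context, adaptive weighting, then an LSTM), the PyTorch module below is illustrative only: all layer sizes are invented and the KAN block is omitted.

```python
import torch
import torch.nn as nn

class FusionTrajectoryNet(nn.Module):
    """Sketch of a CNN + self-attention + LSTM fusion model; layer
    sizes are illustrative assumptions, not the paper's values."""

    def __init__(self, n_feat=4, d_model=32):
        super().__init__()
        self.local = nn.Conv1d(n_feat, d_model, kernel_size=3, padding=1)
        self.attn = nn.MultiheadAttention(d_model, num_heads=4,
                                          batch_first=True)
        self.proj = nn.Linear(n_feat, d_model)
        self.alpha = nn.Parameter(torch.zeros(1))   # learned fusion weight
        self.lstm = nn.LSTM(d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, n_feat)      # predict next point

    def forward(self, x):                 # x: (batch, time, n_feat)
        local = self.local(x.transpose(1, 2)).transpose(1, 2)  # CNN branch
        g = self.proj(x)
        glob, _ = self.attn(g, g, g)                           # MHSA branch
        w = torch.sigmoid(self.alpha)
        fused = w * local + (1 - w) * glob        # adaptive weighted fusion
        out, _ = self.lstm(fused)                 # temporal dependencies
        return self.head(out[:, -1])              # next trajectory point

model = FusionTrajectoryNet()
pred = model(torch.randn(8, 20, 4))   # 8 tracks, 20 steps, 4 features
print(pred.shape)                     # torch.Size([8, 4])
```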
Article
Full-text available
This article explores the nuanced interplay between Machine Learning (ML) and human rights, emphasizing the need for a balanced approach amid evolving technology. ML's potential to empower global human rights initiatives is examined through applications like sentiment analysis and predictive policing, while acknowledging associated risks such as privacy concerns and algorithmic biases. The narrative underscores the importance of ethical guidelines, transparency, and comprehensibility in ML algorithms, aligning with human rights principles. It advocates for a human-centric development approach, focusing on augmenting human capabilities rather than replacement, and calls for inclusive development to ensure universal ML benefits. The article concludes by highlighting the imperative for collaborative efforts among technologists, policymakers, and human rights advocates to navigate this intricate intersection effectively. keywords: Machine Learning, Human Rights, Ethical Guidelines, Inclusive Development, Human-Centric Approach, Technology, Innovations, Risk Mitigation.
Article
Full-text available
A potential solution to the severe ecological problems that mankind is now facing may lie at the crossroads of sustainability and artificial intelligence (AI). This article examines the function of AI in promoting sustainability initiatives in a range of fields, such as conservation, energy management, resource optimisation, and climate modelling. AI solutions provide new ways to improve resource efficiency, decrease carbon emissions, and encourage environmental stewardship by utilising data analytics, machine learning, and predictive modelling. More nimble reactions to environmental opportunities and risks are possible with the help of AI-driven systems that integrate real-time monitoring, decision-making, and adaptive management tactics. Adopting AI in sustainability efforts brings up ethical, socioeconomic, and governance concerns that need careful evaluation, in addition to its possible advantages. The article discusses these difficulties and emphasises the significance of transparent, equitable, and accountable AI deployment practices. The ultimate goal of using AI for good is to help stakeholders work together to build a better, more sustainable world that can withstand the test of time.
Article
Artificial Intelligence (AI) has become a transformative tool in scientific research, reshaping traditional methodologies by enabling advanced data analysis, hypothesis testing, and predictive modeling. The integration of machine learning (ML), deep learning (DL), and natural language processing (NLP) has significantly accelerated discoveries in medicine, physics, chemistry, environmental science, and other disciplines. AI-driven technologies allow researchers to process large datasets, identify complex patterns, and generate predictive insights with unprecedented accuracy and speed. These innovations have led to breakthroughs in drug discovery, climate modeling, quantum physics simulations, and genetic research, demonstrating AI’s potential to enhance efficiency, automation, and precision in scientific investigations. Despite its numerous advantages, AI-driven research presents challenges, including ethical concerns, algorithmic bias, data security risks, and high computational demands. The reliance on large datasets and complex AI models raises concerns about data privacy, model transparency, and fairness in scientific conclusions. Additionally, AI systems require high-performance computing resources, making accessibility and affordability key concerns for many research institutions. Addressing these challenges through robust regulatory frameworks, ethical AI development, and improved AI model interpretability is crucial for ensuring responsible AI-driven scientific exploration. This study explores AI’s impact on scientific research, analyzing its applications, benefits, and challenges. The findings are supported by statistical data and two tables, illustrating AI’s adoption trends, efficiency improvements, and transformative role in modern research. Future advancements, such as AI-augmented automation, AI-driven robotics, and interdisciplinary AI applications, will further revolutionize scientific inquiry, making AI an indispensable tool for data-driven discovery and innovation.
Article
Recent trends in deep learning (DL) have made hardware accelerators essential for various high-performance computing (HPC) applications, including image classification, computer vision, and speech recognition. This survey summarizes and classifies the most recent developments in DL accelerators, focusing on their role in meeting the performance demands of HPC applications. We explore cutting-edge approaches to DL acceleration, covering not only GPU- and TPU-based platforms but also specialized hardware such as FPGA- and ASIC-based accelerators, Neural Processing Units, open hardware RISC-V-based accelerators, and co-processors. This survey also describes accelerators leveraging emerging memory technologies and computing paradigms, including 3D-stacked Processor-In-Memory, non-volatile memories like Resistive RAM and Phase Change Memories used for in-memory computing, as well as Neuromorphic Processing Units, and Multi-Chip Module-based accelerators. Furthermore, we provide insights into emerging quantum-based accelerators and photonics. Finally, this survey categorizes the most influential architectures and technologies from recent years, offering readers a comprehensive perspective on the rapidly evolving field of deep learning acceleration.
Chapter
This study examines the role of artificial intelligence (AI) in promoting sovereignty and carbon neutrality, with a focus on digital inclusion and climate-resilient AI strategies for emerging markets. By integrating insights from previous studies on AI's contribution to carbon neutrality and digital inclusion within the context of climate change, the chapter highlights the importance of technology policy frameworks in shaping effective AI strategies. Additionally, the chapter explores the use of confirmatory factor analysis (CFA) and exploratory factor analysis (EFA) in evaluating and validating the impact of AWI on economic resilience. These statistical methods are employed to identify and confirm key factors influencing the successful implementation of AI in climate action. The study also incorporates fuzzy-set qualitative comparative analysis (fsQCA) to assess the complex interplay of multiple factors contributing to AI-driven solutions.
Article
The MAXED and GRAVEL unfolding algorithms have been used to determine cross-sections with the NAXSUN method developed at JRC-Geel. This study explores the potential of a particular type of artificial neural network, the multilayer perceptron (MLP), as an alternative to traditional unfolding algorithms. By generating a training dataset using the TALYS 2.0 code and testing the MLP model on real experimental data, we compared the effectiveness of the MLP in unfolding neutron-induced reaction cross-sections involving indium and rhenium isotopes. The results were benchmarked against those obtained using standard unfolding algorithms and TALYS 2.0 simulations, demonstrating the advantages and limitations of the ANN approach. The obtained results show a much-reduced corridor of uncertainty in the derived cross-section curves compared to previous work using traditional unfolding techniques.
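A hedged sketch of the MLP-as-unfolder idea: learn the inverse map from measured quantities back to a cross-section curve using simulated pairs. The synthetic forward model below is a stand-in for the TALYS 2.0 training data, and all dimensions are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Sketch: learn the inverse map from measured activities to a
# cross-section curve from simulated (activity, curve) pairs.
rng = np.random.default_rng(0)
n_train, n_meas, n_energy = 5000, 6, 40

curves = np.abs(rng.normal(size=(n_train, n_energy))).cumsum(axis=1)
response = rng.uniform(size=(n_meas, n_energy))  # toy response matrix
activities = curves @ response.T                 # toy forward model
activities += rng.normal(scale=0.01, size=activities.shape)  # noise

mlp = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=300,
                   random_state=0).fit(activities, curves)

test_curve = np.abs(rng.normal(size=n_energy)).cumsum()
unfolded = mlp.predict((test_curve @ response.T).reshape(1, -1))
print(unfolded.shape)  # (1, 40): reconstructed cross-section curve
```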
Chapter
Malicious intrusions are constant threats to networks; intrusion detection systems (IDS) have been developed to identify and classify these attacks and prevent them from occurring. However, the accuracy and efficiency of these systems are still not satisfactory, and the actual detection accuracy of some models remains relatively low. Most earlier research approaches relied on regular neural networks, which had low accuracy. An IDS works well when machine learning (ML), and especially deep learning (DL), algorithms are employed to identify and prevent various threats. To address these problems, an enhanced Bidirectional Long Short-Term Memory (Bi-LSTM) model is proposed in this paper. To train our model, the up-to-date, publicly available NSL-KDD dataset is used, a widely adopted benchmark in the field of intrusion detection. We performed 10-fold cross-validation to demonstrate the unbiasedness of the results and compared the enhanced Bi-LSTM model with existing classifiers. Our proposed model achieves impressive accuracy of up to 97.93%.
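A minimal Keras sketch of a Bi-LSTM intrusion classifier of the kind described; NSL-KDD preprocessing (categorical encoding, scaling, reshaping to sequences) is assumed to have been done, and random stand-in data is used here.

```python
import numpy as np
import tensorflow as tf

# Stand-in data: NSL-KDD records have 41 features; we treat each
# record as a length-1 sequence for the recurrent layer.
n_features = 41
X = np.random.rand(1000, 1, n_features).astype("float32")
y = np.random.randint(0, 2, size=(1000,))   # 0 = normal, 1 = attack

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1, n_features)),
    # Bidirectional LSTM reads the sequence in both directions
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=3, batch_size=64, validation_split=0.1)
```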
Chapter
In many respects, computer vision practitioners are now being outpaced by neuroscientists, who are leading the way, modeling computer vision systems directly after neurobiology, and borrowing from computer vision and imaging to simulate the biology and theories of the human visual system. The state of the art in computer vision is rapidly moving toward synthetic brains and synthetic vision systems, similar to other biological sciences where we see synthetic biology such as prosthetics, robotics, and genomic engineering. Computer vision is becoming a subset of neuroscience and vision sciences, where researchers implement complete synthetic vision models, leveraging computer vision and imaging methods, leaving some computer vision, and imaging methods in the wake of history.
Article
Neural networks are a key component of formulation design, and the integration of artificial intelligence (AI) into drug development is revolutionizing the pharmaceutical industry. To solve the issues of cost, accuracy, and efficiency, AI-powered models—in particular, deep learning networks—are being used more and more to forecast and optimize medication compositions. Neural networks are capable of predicting solubility, stability, and bioavailability as well as suggesting optimal compositions by examining large datasets and identifying non-linear correlations between formulation components. The time required to produce new medications is greatly decreased by this methodology, which speeds up the conventional trial-and-error method. AI may also improve personalized medicine by customizing medication formulas to meet the demands of each patient. The use of neural networks in drug formulation is examined in this research, which also highlights recent developments, difficulties, and potential paths for AI-powered drug development.
Chapter
In this chapter, we look at a wide range of feature learning architectures and deep learning architectures, which incorporate a range of feature models and classification models. This chapter digs deeper into the background concepts of feature learning and artificial neural networks summarized in the taxonomy of Chap. 9, and complements the local and regional feature descriptor surveys in Chaps. 3, 4, 5, and 6. The architectures in the survey represent significant variations across neural-network approaches, local feature descriptor and classification-based approaches, and ensemble approaches. The architecture taken together as the sum of its parts is apparently more important than individual parts or components of the design, such as the choice of feature descriptor, number of levels in the feature hierarchy, number of features per layer, or the choice of classifier. Good results are being reported across a wide range of architectures.
Article
Full-text available
The evolution of machine learning (ML) has ushered in a new era of data-driven decision-making, where adaptive algorithms play a pivotal role in harnessing complex datasets. This paper delves into the diverse paradigms of ML, emphasizing the significance of adaptive algorithms and the insights derived from data-centric approaches. By exploring the interplay between various learning paradigms and adaptive methodologies, we aim to provide a comprehensive understanding of how data-driven insights can be effectively utilized across different domains.
Article
Full-text available
The distinction between riming and aggregation is of high relevance for model microphysics, data assimilation, and warnings of potential aircraft hazards due to the link between riming and updrafts as well as the presence of supercooled liquid water in the atmosphere. Even though the polarimetric fingerprints for aggregation and riming are qualitatively similar, we hypothesize that it is feasible to implement an area-wide discrimination algorithm based on national polarimetric weather radar networks only. Quasi-vertical profiles (QVPs) of reflectivity (ZH), differential reflectivity (ZDR), and estimated depolarization ratio (DR) are utilized to learn about the information content of each individual polarimetric variable and their combinations for riming detection. High-resolution Doppler spectra from the vertical (birdbath) scans of the C-band radar network of the German Meteorological Service serve as input and ground truth for algorithm development. Mean isolated spectra profiles (MISPs) of the Doppler velocity are used to infer regions with frozen hydrometeors falling faster than 1.5 m s⁻¹ and accordingly associated with significant riming. Several machine learning methods have been tested to detect riming from the corresponding QVPs of polarimetric variables. The best-performing algorithm is a fine-tuned gradient-boosting model based on decision trees. The precipitation event on 14 July 2021, which led to catastrophic flooding in the Ahr valley in western Germany, was selected to validate the performance. Considering balanced accuracy, the algorithm is able to correctly predict 74 % of the observed riming features; thus, the feasibility of reliable riming detection with national radar networks has been successfully demonstrated.
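As a hedged sketch of the best-performing approach (gradient boosting on QVP-derived polarimetric features, scored by balanced accuracy), with entirely synthetic stand-in data and invented feature distributions:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

# Sketch: classify riming from polarimetric QVP variables.  Feature
# names mirror the abstract (ZH, ZDR, DR); values are synthetic
# stand-ins for QVP samples labelled via Doppler-spectra fall speeds.
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([
    rng.normal(20, 10, n),    # ZH  [dBZ]
    rng.normal(0.5, 0.5, n),  # ZDR [dB]
    rng.normal(-20, 5, n),    # DR  [dB]
])
y = (X[:, 0] + 5 * X[:, 1] + rng.normal(0, 8, n) > 25).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
print("balanced accuracy:",
      balanced_accuracy_score(y_te, clf.predict(X_te)))
```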
Article
The advantages of guided wave detection, such as its ability to propagate over long distances and penetrate deeply, have led to its application in the field of anisotropic damage detection in carbon fiber-reinforced polymer (CFRP). Due to the anisotropy of CFRP, traditional guided wave-based detection methods have difficulty in precisely locating the defect. In this study, we proposed a novel deep learning-based detection method for CFRP by employing image recognition technology for guided wave field inspection. This method is capable of rapidly and accurately extracting defective features from the structure, thereby facilitating precise damage identification. To avoid time-consuming sample data generation by simulation for CFRP, the steady-state guided wave field of the aluminum plates was simulated instead. The isotropic wave field data were then stretched and applied for neural network training.
Article
Steganography is a critical information‐hiding technique widely used for the covert transmission of secret information on social media. In contrast, steganalysis plays a key role in ensuring information security. Although various effective steganalysis algorithms have been proposed, existing studies typically treat color images as three independent channels and do not fully consider robust features suitable for JPEG images. To address this limitation, we propose a robust steganalysis algorithm based on high‐dimensional mapping. By analyzing the changes in color images during the JPEG compression and decompression processes, we observe that the embedding of secret information causes shifts in the JPEG coefficients, which subsequently affects feature representation during decompression. Based on this observation, our method captures steganographic traces by utilizing the transformation errors produced during decompression. Additionally, due to the imbalance between luminance and chrominance, the feature weights of each channel are uneven. To ensure balanced analysis across the three channels, we adjust the distribution differences of each channel through high‐dimensional mapping, thereby reducing intraclass feature variations. Experimental results demonstrate that the proposed method outperforms existing approaches in most cases.
Article
Abstract The art of prediction has always been an essential part of medical practice. While in early history it was more intuitive and linked to the supernatural, patients today rely on our scientific medical knowledge to obtain reliable medical predictions. This involves estimating probabilities: whether a particular health condition is present (diagnostics) and whether a particular event will occur in the future (prognostics). Artificial intelligence (AI) is currently developing an unbeatable predictive competence in medicine, a potential we can harness for the benefit of our patients. At the same time, this development challenges the medical profession's self-conception. This narrative review examines the role of AI in spine surgery, with a particular focus on the prediction of clinical outcomes. Its aim is to give the reader an understanding of current developments in AI, to put them into context, and to reflect on their significance for the future of our profession.
Article
In-memory computing offers a transformative alternative to traditional von Neumann architecture, with memristors enabling accelerated, low-power computation. Halide perovskites, known for ion migration with low activation energy and synapse-like switching behavior, hold great potential but face challenges in conductance linearity and predictability. Here, we report flexible lead-free Cs3Bi2I9 8 × 8 crossbar memristors exhibiting bipolar resistive switching with a high on/off ratio (10⁶), endurance (10⁴ cycles), long retention (10⁵ s), and a device yield exceeding 93%. Electrical pulse engineering reveals synaptic behaviors such as paired-pulse facilitation, potentiation, and depression with excellent linearity and minimal variability. In situ training of artificial neural networks, including MLP and VGG-8, achieves 88.19% accuracy on reduced MNIST and 91.38% on CIFAR-10 data sets. This work demonstrates energy-efficient, high-performance neuromorphic hardware, paving the way for advanced parallel computing to address the growing demands of AI and data science.
Article
Full-text available
Discussing deep learning means not just talking about mindful, meaningful, and joyful learning; it implies a further leap in educational transformation. The history of applying deep learning in the classroom reflects a long journey from initial theory to transformational practical applications. Despite facing many challenges, this technology continues to develop and offers great potential to revolutionize learning. With the right investment in research and infrastructure, the future of deep learning-based education will become brighter. The deep learning curriculum in Indonesia must be designed to overcome challenges and take advantage of existing opportunities. By developing a comprehensive curriculum that includes theory, practical skills, ethics, and real-world applications, Indonesia can produce a workforce that is skilled and ready to compete in the era of artificial intelligence. The government, educational institutions, and industry need to work together to accelerate the implementation of this curriculum to drive broader digital transformation across the country.
Article
Car price prediction is an essential task in the automotive industry, benefiting manufacturers, dealers, and customers alike. The objective of this research is to predict the price of a car based on various parameters using Machine Learning (ML) techniques. In this study, we employ regression models such as Linear Regression, Decision Tree, and Random Forest to analyze and predict car prices. The dataset used contains multiple features, including brand, year, mileage, fuel type, and transmission type. The results indicate that the Random Forest model performs better than other models in terms of accuracy. The proposed model provides a reliable approach for estimating car prices, thereby aiding buyers and sellers in making informed decisions. Key Words: Car Price Prediction, Machine Learning, Regression Models, Random Forest, Linear Regression.
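A minimal sklearn sketch of the Random Forest variant described; the column names are assumed from the abstract's feature list, and the toy data is invented.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy dataset mirroring the features named in the abstract.
df = pd.DataFrame({
    "brand": ["A", "B", "A", "C"] * 50,
    "year": [2015, 2018, 2020, 2012] * 50,
    "mileage": [60000, 30000, 15000, 90000] * 50,
    "fuel": ["petrol", "diesel", "petrol", "petrol"] * 50,
    "transmission": ["manual", "auto", "auto", "manual"] * 50,
    "price": [7000, 12000, 18000, 4000] * 50,
})
X, y = df.drop(columns="price"), df["price"]

pre = ColumnTransformer([
    ("cat", OneHotEncoder(), ["brand", "fuel", "transmission"]),
], remainder="passthrough")           # keep year and mileage numeric

model = Pipeline([("pre", pre),
                  ("rf", RandomForestRegressor(random_state=0))]).fit(X, y)
print(model.predict(X.head(2)))       # estimated prices
```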
Article
Full-text available
Studies of cortical neurons in monkeys performing short-term memory tasks have shown that information about a stimulus can be maintained by persistent neuron firing for periods of many seconds after removal of the stimulus. The mechanism by which this sustained activity is initiated and maintained is unknown. In this article we present a spiking neural network model of short-term memory and use it to investigate the hypothesis that recurrent, or “re-entrant,” networks with constant connection strengths are sufficient to store graded information temporarily. The synaptic weights that enable the network to mimic the input-output characteristics of an active memory module are computed using an optimization procedure for recurrent networks with non-spiking neurons. This network is then transformed into one with spiking neurons by interpreting the continuous output values of the nonspiking model neurons as spiking probabilities. The behavior of the model neurons in this spiking network is compared with that of 179 single units previously recorded in monkey inferotemporal (IT) cortex during the performance of a short-term memory task. The spiking patterns of almost every model neuron are found to resemble closely those of IT neurons. About 40% of the IT neuron firing patterns are also found to be of the same types as those of model neurons. A property of the spiking model is that the neurons cannot maintain precise graded activity levels indefinitely, but eventually relax to one of a few constant activities called fixed-point attractors. The noise introduced into the model by the randomness of spiking causes the network to jump between these attractors. This switching between attractor states generates spike trains with a characteristic statistical temporal structure. We found evidence for the same kind of structure in the spike trains from about half of the IT neurons in our test set. These results show that the behavior of many real cortical memory neurons is consistent with an active storage mechanism based on recurrent activity in networks with fixed synaptic strengths.
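The conversion step the abstract describes, interpreting a non-spiking unit's continuous output as a per-timestep spiking probability, can be sketched directly in numpy:

```python
import numpy as np

# Sketch of the paper's conversion step: treat a non-spiking unit's
# continuous output in [0, 1] as a per-timestep spiking probability.
rng = np.random.default_rng(0)

rates = np.array([0.1, 0.5, 0.9])             # continuous activities
T = 1000                                       # timesteps
spikes = rng.random((T, rates.size)) < rates   # Bernoulli spike trains

print(spikes.mean(axis=0))  # empirical rates approach [0.1, 0.5, 0.9]
```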
Technical Report
Full-text available
In recent years connectionist models, or neural networks, have been used with some success in problems related to sensory perception, such as speech recognition and image processing. As these problems become more complex, they require larger networks, and this typically leads to very slow training times. Work on these problems has primarily involved the use of supervised models, networks with a 'teacher' which indicates the desired output. If it were possible to use unsupervised models in the early stages of systems to help with the solutions to these sensory problems, it might be possible to approach larger and more complex problems than are currently attempted. We may also gain more insight into the representation of sensory data used in such sensory systems, and this may also help with our understanding of biological sensory systems. In contrast to supervised models, unsupervised models are not provided with any teacher input to guide them as to what they should 'learn' to perform. In this thesis, an information-theoretic approach to this problem is explored: in particular, the principle that an unsupervised model should adjust itself to minimise the information loss, while in some way producing a simplified representation of its input data as output. Initially, general concepts about information theory, entropy and mutual information are reviewed, and some systems which use other information-theoretic principles are described. The concept of information loss and some of its properties are introduced, and this concept is related to Linsker's 'Infomax' principle. The information loss across supervised learning systems is briefly considered, and various conditions are described for a close match between minimisation of information loss and minimisation of various distortion measures. Next, information loss across a simple linear network with one layer of processing units is considered. In order to progress, an assumption must be made concerning the noise in the system. When noise on the input to the network is dominant, a network which performs a type of principal component analysis is optimal. A common framework for various neural network algorithms which find principal components of their input data is derived, and these are shown to be equivalent in an information transmission sense. The case of significant output noise for position- and time-invariant signals is then considered. Given a power cost constraint in our system, the form of the optimum linear filter required to minimise the information loss under this constraint is analysed. This filter changes in a non-trivial manner with varying noise levels, mirroring the way that the response of biological retinal systems changes as the background light level changes. When the output noise is dominant, the optimum configuration can be found by using anti-Hebbian algorithms to decorrelate the outputs. Various forms of networks of this type are considered, and an algorithm for a novel Skew-Symmetric Network which employs inhibitory interneurons is derived, which suggests a possible role for cortical back-projections. In conclusion, directions for further work are suggested, including the expansion of this analysis for systems with various non-linearities; and general problems of representation of sensory information are discussed.
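One of the PCA-type networks the thesis unifies can be sketched with Oja's Hebbian rule, under which a single linear unit converges to the top principal component of its input; the toy covariance below is an assumption for illustration.

```python
import numpy as np

# Oja's rule: a Hebbian update with a weight-decay term that keeps
# the weight vector normalized; it converges to the first principal
# component of the input distribution.
rng = np.random.default_rng(0)
C = np.array([[3.0, 1.0], [1.0, 1.0]])        # toy input covariance
X = rng.multivariate_normal([0, 0], C, size=20000)

w = rng.normal(size=2)
for x in X:
    y = w @ x
    w += 0.001 * y * (x - y * w)              # Oja update

eigvals, eigvecs = np.linalg.eigh(C)
print(w / np.linalg.norm(w))  # close to +/- the top eigenvector
print(eigvecs[:, -1])
```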
Article
Full-text available
How does the brain form a useful representation of its environment? It is shown here that a layer of simple Hebbian units connected by modifiable anti-Hebbian feed-back connections can learn to code a set of patterns in such a way that statistical dependency between the elements of the representation is reduced, while information is preserved. The resulting code is sparse, which is favourable if it is to be used as input to a subsequent supervised associative layer. The operation of the network is demonstrated on two simple problems
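A hedged numpy sketch of the anti-Hebbian feedback idea: lateral weights grow with output correlations and are subtracted from the responses, pushing the code toward decorrelation. This is a simplified, feedforward approximation of the recurrent network described, not the paper's exact model.

```python
import numpy as np

# Anti-Hebbian lateral learning: weights grow in proportion to
# output correlations and are subtracted from the responses, so the
# outputs are driven toward statistical decorrelation.
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[2.0, 1.5], [1.5, 2.0]], size=5000)

W = np.zeros((2, 2))                  # lateral (anti-Hebbian) weights
for x in X:
    y = x - W @ x                     # feedback-corrected response
    dW = 0.01 * np.outer(y, y)        # grow with output correlation
    np.fill_diagonal(dW, 0.0)         # no self-connections
    W += dW

Y = X - X @ W.T
print(np.corrcoef(Y.T))               # off-diagonal shrinks toward 0
```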
Chapter
Neural Networks for Control highlights key issues in learning control and identifies research directions that could lead to practical solutions for control problems in critical application domains. It addresses general issues of neural network based control and neural network learning with regard to specific problems of motion planning and control in robotics, and takes up application domains well suited to the capabilities of neural network controllers. The appendix describes seven benchmark control problems. Contributors Andrew G. Barto, Ronald J. Williams, Paul J. Werbos, Kumpati S. Narendra, L. Gordon Kraft, III, David P. Campagna, Mitsuo Kawato, Bartlett W. Met, Christopher G. Atkeson, David J. Reinkensmeyer, Derrick Nguyen, Bernard Widrow, James C. Houk, Satinder P. Singh, Charles Fisher, Judy A. Franklin, Oliver G. Selfridge, Arthur C. Sanderson, Lyle H. Ungar, Charles C. Jorgensen, C. Schley, Martin Herman, James S. Albus, Tsai-Hong Hong, Charles W. Anderson, W. Thomas Miller, III Bradford Books imprint
Chapter
This book brings together contributions to the Fourth Artificial Life Workshop, held at the Massachusetts Institute of Technology in the summer of 1994. July 6-8, 1994 · The Massachusetts Institute of Technology The field of artificial life has recently emerged through the interaction of research in biology, physics, parallel computing, artificial intelligence, and complex adaptive systems. The goal is to understand, through synthetic experiments, the organizational principles underlying the dynamics (usually the nonlinear dynamics) of living systems. This book brings together contributions to the Fourth Artificial Life Workshop, held at the Massachusetts Institute of Technology in the summer of 1994. Topics include: Self-organization and emergent functionality • Definitions of life • Origin of life • Self-reproduction • Computer viruses • Synthesis of "the living state." • Evolution and population genetics • Coevolution and ecological dynamics • Growth, development, and differentiation • Organization and behavior of social and colonial organisms • Animal behavior • Global and local ecosystems and their intersections • Autonomous agents (mobile robots and software agents) • Collective intelligence ("swarm" intelligence) • Theoretical biology • Philosophical issues in A-life (from ontology to ethics) • Formalisms and tools for A-life research • Guidelines and safeguards for the practice of A-life. Bradford Books imprint
Chapter
Graphical models use graphs to represent and manipulate joint probability distributions. They have their roots in artificial intelligence, statistics, and neural networks. The clean mathematical formalism of the graphical models framework makes it possible to understand a wide variety of network-based approaches to computation, and in particular to understand many neural network algorithms and architectures as instances of a broader probabilistic methodology. It also makes it possible to identify novel features of neural network algorithms and architectures and to extend them to more general graphical models. This book exemplifies the interplay between the general formal framework of graphical models and the exploration of new algorithms and architectures. The selections range from foundational papers of historical importance to results at the cutting edge of research. Contributors H. Attias, C. M. Bishop, B. J. Frey, Z. Ghahramani, D. Heckerman, G. E. Hinton, R. Hofmann, R. A. Jacobs, Michael I. Jordan, H. J. Kappen, A. Krogh, R. Neal, S. K. Riis, F. B. Rodríguez, L. K. Saul, Terrence J. Sejnowski, P. Smyth, M. E. Tipping, V. Tresp, Y. Weiss. Bradford Books imprint
Chapter
Papers from the 2006 flagship meeting on neural computation, with contributions from physicists, neuroscientists, mathematicians, statisticians, and computer scientists. The annual Neural Information Processing Systems (NIPS) conference is the flagship meeting on neural computation and machine learning. It draws a diverse group of attendees—physicists, neuroscientists, mathematicians, statisticians, and computer scientists—interested in theoretical and applied aspects of modeling, simulating, and building neural-like or intelligent systems. The presentations are interdisciplinary, with contributions in algorithms, learning theory, cognitive science, neuroscience, brain imaging, vision, speech and signal processing, reinforcement learning, and applications. Only twenty-five percent of the papers submitted are accepted for presentation at NIPS, so the quality is exceptionally high. This volume contains the papers presented at the December 2006 meeting, held in Vancouver. Bradford Books imprint
Chapter
The major purpose of this chapter is to further our understanding of optimal signal processing by integrating concepts from function approximation, linear and nonlinear regression, dynamic modeling, and delay operators, using concepts from the approximation of causal shift-invariant processes. We seek an integrating view of all these subjects with an emphasis on the choice of basis functions. We review proofs of the uniform approximation properties of dynamic networks created by a cascade of linear filters and static nonlinearities. We conclude by presenting a general class of linear operators that can implement the finite memory kernels required to approximate nonlinear operators with approximately finite memory.
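As a concrete illustration of the building block discussed above, the sketch below cascades a causal FIR filter (the memory) with a static tanh nonlinearity; the filter taps and input signal are arbitrary assumptions:

```python
import numpy as np

def fir_tanh_stage(x, taps):
    """One stage: causal linear shift-invariant (FIR) filter, then tanh."""
    linear = np.convolve(x, taps, mode="full")[: len(x)]  # causal FIR filtering
    return np.tanh(linear)

t = np.linspace(0.0, 1.0, 500)
x = np.sin(2 * np.pi * 5 * t)              # toy input signal

taps1 = np.array([0.5, 0.3, 0.2])          # stage-1 memory kernel (assumed)
taps2 = np.array([0.7, 0.3])               # stage-2 memory kernel (assumed)

# Cascading such stages yields a network that approximates causal,
# shift-invariant operators with approximately finite memory.
y = fir_tanh_stage(fir_tanh_stage(x, taps1), taps2)
```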
Article
In this study, a new multi-layered Group Method of Data Handling (GMDH)-type neural network that self-selects the optimum neural network architecture is proposed. We call this algorithm the revised GMDH-type neural network algorithm with self-selection of the optimum architecture. The revised GMDH-type neural network algorithm is able to self-select the optimum architecture from three candidates: a sigmoid-function neural network, a radial basis function (RBF) neural network and a polynomial neural network. It also self-selects the number of layers, the number of neurons in the hidden layers and the useful input variables. The algorithm is applied to medical image recognition, and it is shown to be useful for this task and very easy to apply to practical complex problems, because the optimum network architecture is organised automatically.
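The layer-wise self-selection idea can be sketched as follows: candidate two-input neurons with different basis functions (polynomial and sigmoid-like here, as stand-ins for the three candidate architectures) are fitted by least squares, and the best survivors form the next layer. For brevity this toy version selects on training error, whereas GMDH proper uses a held-out criterion:

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_candidate(u, v, target, kind):
    # Features for one two-input candidate neuron.
    if kind == "poly":
        Phi = np.column_stack([np.ones_like(u), u, v, u * v, u**2, v**2])
    else:  # crude sigmoid-like basis
        Phi = np.column_stack([np.ones_like(u), np.tanh(u), np.tanh(v)])
    coef, *_ = np.linalg.lstsq(Phi, target, rcond=None)
    return Phi @ coef

X = rng.normal(size=(200, 4))
y = X[:, 0] * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=200)

layer, n_keep = X, 4
for _ in range(2):                           # two GMDH layers
    candidates = []
    for i in range(layer.shape[1]):
        for j in range(i + 1, layer.shape[1]):
            for kind in ("poly", "sigmoid"):
                pred = fit_candidate(layer[:, i], layer[:, j], y, kind)
                mse = np.mean((pred - y) ** 2)
                candidates.append((mse, pred))
    candidates.sort(key=lambda c: c[0])      # self-select the best neurons
    layer = np.column_stack([pred for _, pred in candidates[:n_keep]])

print("best final-layer neuron MSE:", candidates[0][0])
```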
Book
The mathematical theory of computation has given rise to two important approaches to the informal notion of "complexity": Kolmogorov complexity, usually a complexity measure for a single object such as a string, a sequence etc., measures the amount of information necessary to describe the object. Computational complexity, usually a complexity measure for a set of objects, measures the computational resources necessary to recognize or produce elements of the set. The relation between these two complexity measures has been considered for more than two decades, and many interesting and deep observations have been obtained. In March 1990, the Symposium on Theory and Application of Minimal-Length Encoding was held at Stanford University as a part of the AAAI 1990 Spring Symposium Series. Some sessions of the symposium were dedicated to Kolmogorov complexity and its relations to computational complexity theory, and excellent expository talks were given there. Feeling that, due to the importance of the material, some way should be found to share these talks with researchers in the computer science community, I asked the speakers of those sessions to write survey papers based on their talks in the symposium. In response, five speakers from the sessions contributed the papers which appear in this book.
Conference Paper
Traditional methods of computer vision and machine learning cannot match human performance on tasks such as the recognition of handwritten digits or traffic signs. Our biologically plausible, wide and deep artificial neural network architectures can. Small (often minimal) receptive fields of convolutional winner-take-all neurons yield large network depth, resulting in roughly as many sparsely connected neural layers as found in mammals between retina and visual cortex. Only winner neurons are trained. Several deep neural columns become experts on inputs preprocessed in different ways; their predictions are averaged. Graphics cards allow for fast training. On the very competitive MNIST handwriting benchmark, our method is the first to achieve near-human performance. On a traffic sign recognition benchmark it outperforms humans by a factor of two. We also improve the state-of-the-art on a plethora of common image classification benchmarks.
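The column-averaging step at the heart of this method is simple; the sketch below stubs out the trained columns with random softmax outputs purely to show how their per-class posteriors are combined:

```python
import numpy as np

rng = np.random.default_rng(3)
n_columns, n_classes = 5, 10

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Stand-ins for trained columns, each of which would see a differently
# preprocessed version of the same input image.
column_outputs = [softmax(rng.normal(size=n_classes)) for _ in range(n_columns)]

avg = np.mean(column_outputs, axis=0)        # average the columns' predictions
print("ensemble prediction:", int(np.argmax(avg)))
```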
Conference Paper
Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT-14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.7 on the entire test set, where the LSTM's BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences. For comparison, a strong phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset. When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increases to 36.5, which beats the previous state of the art. The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the passive voice. Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
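A minimal PyTorch sketch of this encoder-decoder arrangement, with toy vocabularies and sizes (all assumed), might look as follows; note the source-sequence reversal mentioned in the abstract:

```python
import torch
import torch.nn as nn

src_vocab, tgt_vocab, emb, hid = 100, 120, 32, 64   # toy sizes (assumed)

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hid, num_layers=2, batch_first=True)
        self.decoder = nn.LSTM(emb, hid, num_layers=2, batch_first=True)
        self.out = nn.Linear(hid, tgt_vocab)

    def forward(self, src, tgt):
        # Reversing the source introduces short-range dependencies,
        # the trick the paper reports as markedly helpful.
        _, state = self.encoder(self.src_emb(torch.flip(src, dims=[1])))
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)  # decode from the fixed state
        return self.out(dec_out)                              # per-step target logits

model = Seq2Seq()
src = torch.randint(0, src_vocab, (4, 7))    # batch of 4 source sequences
tgt = torch.randint(0, tgt_vocab, (4, 9))    # teacher-forced target inputs
logits = model(src, tgt)                      # shape: (4, 9, tgt_vocab)
```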
Conference Paper
Can a large convolutional neural network trained for whole-image classification on ImageNet be coaxed into detecting objects in PASCAL? We show that the answer is yes, and that the resulting system is simple, scalable, and boosts mean average precision, relative to the venerable deformable part model, by more than 40% (achieving a final mAP of 48% on VOC 2007). Our framework combines powerful computer vision techniques for generating bottom-up region proposals with recent advances in learning high-capacity convolutional neural networks. We call the resulting system R-CNN: Regions with CNN features. The same framework is also competitive with state-of-the-art semantic segmentation methods, demonstrating its flexibility. Beyond these results, we execute a battery of experiments that provide insight into what the network learns to represent, revealing a rich hierarchy of discriminative and often semantically meaningful features.
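The recipe can be sketched as a skeleton: crop each region proposal, warp it to a fixed size, and score it with a CNN. Here random boxes stand in for selective-search proposals and a tiny untrained CNN stands in for the large pretrained network; both are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

cnn = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(4), nn.Flatten(),
    nn.Linear(8 * 4 * 4, 21),                 # e.g. 20 PASCAL classes + background
)

image = torch.rand(3, 224, 224)
proposals = [(10, 10, 100, 100), (50, 60, 180, 200)]  # (x1, y1, x2, y2) boxes

scores = []
for x1, y1, x2, y2 in proposals:
    region = image[:, y1:y2, x1:x2].unsqueeze(0)        # crop the proposal
    warped = F.interpolate(region, size=(64, 64), mode="bilinear",
                           align_corners=False)          # warp to a fixed size
    scores.append(cnn(warped))                           # per-region class scores
scores = torch.cat(scores)                               # (n_proposals, n_classes)
```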
Conference Paper
We consider the problem of designing models to leverage a recently introduced approximate model averaging technique called dropout. We define a simple new model called maxout (so named because its output is the max of a set of inputs, and because it is a natural companion to dropout) designed to both facilitate optimization by dropout and improve the accuracy of dropout's fast approximate model averaging technique. We empirically verify that the model successfully accomplishes both of these tasks. We use maxout and dropout to demonstrate state of the art classification performance on four benchmark datasets: MNIST, CIFAR-10, CIFAR-100, and SVHN.
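A maxout unit itself is easy to state: the elementwise max over k affine maps of the input, giving a learnable piecewise-linear activation. The sketch below shows just that unit, with toy sizes; in training, dropout would be applied around such units:

```python
import numpy as np

rng = np.random.default_rng(4)
d_in, d_out, k = 16, 8, 4                       # toy sizes (assumed)

W = rng.normal(scale=0.1, size=(k, d_out, d_in))
b = np.zeros((k, d_out))

def maxout(x):
    pieces = np.einsum("koi,i->ko", W, x) + b   # k affine maps of x
    return pieces.max(axis=0)                    # elementwise max over the pieces

y = maxout(rng.normal(size=d_in))                # shape: (d_out,)
```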
Conference Paper
The key limiting factor in graphical model inference and learning is the complexity of the partition function. We thus ask the question: what are the most general conditions under which the partition function is tractable? The answer leads to a new kind of deep architecture, which we call sum-product networks (SPNs). SPNs are directed acyclic graphs with variables as leaves, sums and products as internal nodes, and weighted edges. We show that if an SPN is complete and consistent it represents the partition function and all marginals of some graphical model, and give semantics to its nodes. Essentially all tractable graphical models can be cast as SPNs, but SPNs are also strictly more general. We then propose learning algorithms for SPNs, based on backpropagation and EM. Experiments show that inference and learning with SPNs can be both faster and more accurate than with standard deep networks. For example, SPNs perform face completion better than state-of-the-art deep networks for this task. SPNs also have intriguing potential connections to the architecture of the cortex.
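A toy SPN over two binary variables makes the node semantics concrete; the structure and weights below are illustrative assumptions:

```python
import math

def leaf(var, val):
    # Indicator leaf: 1 if variable `var` takes value `val`, else 0.
    return lambda x: 1.0 if x[var] == val else 0.0

def product(children):
    # Product node over children with disjoint scopes.
    return lambda x: math.prod(c(x) for c in children)

def weighted_sum(children, weights):
    # Sum node: weighted mixture of children over the same scope.
    return lambda x: sum(w * c(x) for w, c in zip(weights, children))

# S = 0.7 * [X0=1][X1=1] + 0.3 * [X0=0][X1=0]
spn = weighted_sum(
    [product([leaf(0, 1), leaf(1, 1)]), product([leaf(0, 0), leaf(1, 0)])],
    [0.7, 0.3],
)

print(spn({0: 1, 1: 1}))   # 0.7 -- joint probability of (1, 1)
print(spn({0: 1, 1: 0}))   # 0.0
```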
Conference Paper
We evaluate whether features extracted from the activation of a deep convolutional network trained in a fully supervised fashion on a large, fixed set of object recognition tasks can be re-purposed to novel generic tasks. Our generic tasks may differ significantly from the originally trained tasks and there may be insufficient labeled or unlabeled data to conventionally train or adapt a deep architecture to the new tasks. We investigate and visualize the semantic clustering of deep convolutional features with respect to a variety of such tasks, including scene recognition, domain adaptation, and fine-grained recognition challenges. We compare the efficacy of relying on various network levels to define a fixed feature, and report novel results that significantly outperform the state-of-the-art on several important vision challenges. We are releasing DeCAF, an open-source implementation of these deep convolutional activation features, along with all associated network parameters to enable vision researchers to be able to conduct experimentation with deep representations across a range of visual concept learning paradigms.
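The core move, reusing an intermediate activation of a trained network as a generic feature, can be sketched with a forward hook; the small untrained CNN here merely stands in for a large pretrained model:

```python
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),       # -> 32-dim "deep feature"
    nn.Linear(32, 10),                            # original task head (unused here)
)

features = {}
def hook(module, inputs, output):
    features["feat"] = output.detach()            # capture the activation

cnn[6].register_forward_hook(hook)                # hook the Flatten layer

with torch.no_grad():
    cnn(torch.rand(1, 3, 64, 64))

# features["feat"] (shape (1, 32)) would now feed a simple classifier
# (e.g. a linear model) trained for the *new* task.
print(features["feat"].shape)
```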
Conference Paper
Recurrent neural networks (RNNs) are a powerful model for sequential data. End-to-end training methods such as Connectionist Temporal Classification make it possible to train RNNs for sequence labelling problems where the input-output alignment is unknown. The combination of these methods with the Long Short-term Memory RNN architecture has proved particularly fruitful, delivering state-of-the-art results in cursive handwriting recognition. However, RNN performance in speech recognition has so far been disappointing, with better results returned by deep feedforward networks. This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs. When trained end-to-end with suitable regularisation, we find that deep Long Short-term Memory RNNs achieve a test set error of 17.7% on the TIMIT phoneme recognition benchmark, which to our knowledge is the best recorded score.
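A minimal PyTorch sketch of this setup, a stacked bidirectional LSTM trained per frame with the CTC loss so that no frame-level alignment is needed; all sizes and the toy batch are assumptions:

```python
import torch
import torch.nn as nn

n_feats, n_hidden, n_labels = 40, 128, 62        # e.g. 61 phoneme labels + CTC blank

lstm = nn.LSTM(n_feats, n_hidden, num_layers=3,
               bidirectional=True, batch_first=True)
proj = nn.Linear(2 * n_hidden, n_labels)
ctc = nn.CTCLoss(blank=0)

x = torch.rand(8, 100, n_feats)                  # 8 utterances, 100 frames each
out, _ = lstm(x)
log_probs = proj(out).log_softmax(-1)            # (batch, time, labels)

targets = torch.randint(1, n_labels, (8, 20))    # dummy label sequences (no blanks)
input_lens = torch.full((8,), 100, dtype=torch.long)
target_lens = torch.full((8,), 20, dtype=torch.long)

# CTCLoss expects log-probs shaped (time, batch, labels).
loss = ctc(log_probs.transpose(0, 1), targets, input_lens, target_lens)
```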
Conference Paper
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called dropout that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
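The architecture as described maps naturally onto a compact module definition; the sketch below follows the published layer sizes but is meant as an illustration, not a faithful reimplementation (it omits local response normalisation and the two-GPU split):

```python
import torch
import torch.nn as nn

# Five convolutional layers (some followed by max-pooling), ReLU units,
# dropout in the fully connected layers, and a final 1000-way classifier.
alexnet_like = nn.Sequential(
    nn.Conv2d(3, 96, 11, stride=4), nn.ReLU(), nn.MaxPool2d(3, 2),
    nn.Conv2d(96, 256, 5, padding=2), nn.ReLU(), nn.MaxPool2d(3, 2),
    nn.Conv2d(256, 384, 3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, 3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, 3, padding=1), nn.ReLU(), nn.MaxPool2d(3, 2),
    nn.Flatten(),
    nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),                    # 1000-way softmax via a CE loss
)

logits = alexnet_like(torch.rand(1, 3, 227, 227))  # shape: (1, 1000)
```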
Conference Paper
We investigate the use of information from all second order derivatives of the error function to perform network pruning (i.e., removing unimportant weights from a trained network) in order to improve generalization, simplify networks, reduce hardware or storage requirements, increase the speed of further training, and in some cases enable rule extraction. Our method, Optimal Brain Surgeon (OBS), is significantly better than magnitude-based methods and Optimal Brain Damage [Le Cun, Denker and Solla, 1990], which often remove the wrong weights. OBS permits the pruning of more weights than other methods (for the same error on the training set), and thus yields better generalization on test data. Crucial to OBS is a recursion relation for calculating the inverse Hessian matrix H^(-1) from training data and structural information of the net. OBS permits a 90%, a 76%, and a 62% reduction in weights over backpropagation with weight decay on three benchmark MONK's problems [Thrun et al., 1991]. Of OBS, Optimal Brain Damage, and magnitude-based methods, only OBS deletes the correct weights from a trained XOR network in every case. Finally, whereas Sejnowski and Rosenberg [1987] used 18,000 weights in their NETtalk network, we used OBS to prune a network to just 1560 weights, yielding better generalization.
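One OBS step is compact enough to sketch: rank weights by the saliency L_q = w_q^2 / (2 [H^-1]_qq), delete the least salient one, and adjust all remaining weights through the inverse Hessian. The Hessian below is a random positive-definite stand-in, whereas the paper computes H^-1 recursively from training data:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6
w = rng.normal(size=n)                         # trained weight vector (toy)

A = rng.normal(size=(n, n))
H = A @ A.T + 0.1 * np.eye(n)                  # positive-definite stand-in Hessian
H_inv = np.linalg.inv(H)

saliency = w**2 / (2.0 * np.diag(H_inv))       # error cost of deleting each weight
q = int(np.argmin(saliency))                   # least important weight

# Delete weight q and compensate all other weights in one step:
# delta_w = -(w_q / [H^-1]_qq) * H^-1 e_q, which drives w[q] to zero.
delta = -(w[q] / H_inv[q, q]) * H_inv[:, q]
w = w + delta
print("pruned index:", q, "new weights:", np.round(w, 3))
```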
Article
We describe a set of preliminary experiments to evolve spiking neural controllers for a vision-based mobile robot. All the evolutionary experiments are carried out on physical robots without human intervention. After discussing how to implement and interface these neurons with a physical robot, we show that evolution quickly finds functional spiking controllers capable of navigating in irregularly textured environments without hitting obstacles, using a very simple genetic encoding and fitness function. Neuroethological analysis of the network activity lets us understand the functioning of evolved controllers and tells us the relative importance of single neurons independently of their observed firing rate. Finally, a number of systematic lesion experiments indicate that evolved spiking controllers are very robust to the synaptic strength decay that typically occurs in hardware implementations of spiking circuits.
Article
A generalized form of the cross‐validation criterion is applied to the choice and assessment of prediction using the data‐analytic concept of a prescription. The examples used to illustrate the application are drawn from the problem areas of univariate estimation, linear regression and analysis of variance.
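For ordinary linear regression the cross-validatory criterion has a convenient closed form, which the sketch below computes alongside its generalised (GCV) variant; the data and model are illustrative assumptions:

```python
import numpy as np

# Leave-one-out residuals for linear regression have the closed form
# e_i / (1 - h_ii), with h_ii the hat-matrix diagonals; replacing each
# h_ii by their average gives the generalised cross-validation score.
rng = np.random.default_rng(6)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T           # hat matrix
resid = y - H @ y
h = np.diag(H)

loo = np.mean((resid / (1.0 - h)) ** 2)          # exact leave-one-out CV score
gcv = np.mean(resid**2) / (1.0 - h.mean()) ** 2  # GCV approximation
print(f"LOO-CV: {loo:.4f}   GCV: {gcv:.4f}")
```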