Article

Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position

Authors:
  • Kunihiko Fukushima (Fuzzy Logic Systems Institute)

Abstract

A neural network model for a mechanism of visual pattern recognition is proposed in this paper. The network is self-organized by "learning without a teacher" and acquires the ability to recognize stimulus patterns on the basis of the geometrical similarity (Gestalt) of their shapes, unaffected by their positions. The network is given the nickname "neocognitron". After completion of self-organization, the network has a structure similar to the hierarchy model of the visual nervous system proposed by Hubel and Wiesel. The network consists of an input layer (photoreceptor array) followed by a cascade of modular structures, each of which is composed of two layers of cells connected in cascade. The first layer of each module consists of "S-cells", which show characteristics similar to simple cells or lower-order hypercomplex cells, and the second layer consists of "C-cells", similar to complex cells or higher-order hypercomplex cells. The afferent synapses to each S-cell are plastic and modifiable. The network is capable of unsupervised learning: no "teacher" is needed during self-organization; it suffices to present a set of stimulus patterns repeatedly to the input layer of the network. The network has been simulated on a digital computer. After repeated presentation of a set of stimulus patterns, each stimulus pattern comes to elicit an output from only one of the C-cells of the last layer, and conversely, that C-cell becomes selectively responsive only to that stimulus pattern. That is, none of the C-cells of the last layer responds to more than one stimulus pattern. The response of the C-cells of the last layer is not affected at all by the pattern's position, nor by small changes in the shape or size of the stimulus pattern.
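The S-cell/C-cell cascade described in the abstract can be illustrated with a minimal NumPy toy (a sketch only, not the paper's actual model, which has modifiable synapses and multiple stacked modules): an "S-layer" detects a local feature by correlating the input with a template, and a "C-layer" pools the S-cell responses over position, so the pooled output is unchanged when the pattern shifts.

```python
import numpy as np

def s_layer(image, template):
    """Valid cross-correlation of the image with a feature template (S-cells)."""
    ih, iw = image.shape
    th, tw = template.shape
    out = np.zeros((ih - th + 1, iw - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+th, j:j+tw] * template)
    return out

def c_layer(responses):
    """C-cells pool S-cell responses over all positions (here, a global max)."""
    return responses.max()

template = np.array([[1.0, 1.0, 1.0]])     # horizontal-bar detector
img = np.zeros((5, 5)); img[2, 1:4] = 1.0  # a bar in one position
shifted = np.roll(img, shift=1, axis=1)    # the same bar, shifted right

r1 = c_layer(s_layer(img, template))
r2 = c_layer(s_layer(shifted, template))
assert r1 == r2 == 3.0  # pooled response is unchanged by the shift
```

The pooling step is what buys the position tolerance: the S-layer response moves with the pattern, but the C-layer discards where the match occurred.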


... All plotted models performed excellent predictions. [Table of error metrics for KNN, Regression Tree, Random Forest, SVM, LS-SVM, GBM, XGBoost, LGBM, MLP, CNN, RNN, LSTM, GRU, WaveNet, and evolving fuzzy systems omitted.] Table 18 presents the simulations' results for the Mackey-Glass time series. KNN obtained the lowest errors among the classical models, LSTM the lowest among the DL models, ePL-KRLS-DISCO the lowest among all models, and RF-NTSK the lowest among the proposed fuzzy models. ...
... The results suggest that the models with lower errors are more suitable for predicting the long-term chaotic series under controlled uncertainty. [Table of error metrics omitted.] Table 20 presents the simulations' results for the Nonlinear time series. KNN, RT, RF, and GBM obtained the lowest errors among the classical models, RNN among the DL models, Simpl_eTS and ePL-KRLS-DISCO among the eFSs, and RF-NTSK among the proposed models. ...
... Table 21 supports that RF-NTSK performs better predictions than all models but KNN, RT, RF, and LGBM. Figure 23 presents the best predictions of each class for the Nonlinear time series. [Table of error metrics omitted.] Table 22 presents the simulations' results for the Alice 1A time series. SVM and LS-SVM obtained the lowest errors among the classical models, LSTM among the DL models, eTS and ePL+ among the eFSs, and GEN-NTSK (wRLS) and R-NTSK among the proposed fuzzy models. ...
Thesis
Full-text available
Fuzzy inference systems, widely studied in the literature, are machine learning models that balance accuracy and interpretability/explainability. The two main types of fuzzy inference systems are Mamdani and Takagi-Sugeno-Kang. While Mamdani models prioritize interpretability, Takagi-Sugeno-Kang models achieve higher accuracy by approximating nonlinear systems through a collection of linear subsystems. However, there is no standardized method for designing fuzzy rules. Existing techniques often suffer from limitations, including a lack of direct control over the number of rules, an excessive number of hyperparameters, and increased complexity. To address these issues, this work introduces new fuzzy inference systems that comprise a novel data-driven mechanism to define Mamdani and Takagi-Sugeno-Kang rules while reducing complexity, minimizing hyperparameters, enabling direct control over the number of rules, and enhancing interpretability. Additionally, feature selection techniques, including genetic algorithms and ensemble methods, are incorporated to improve the models' ability to handle large datasets, optimize performance, increase interpretability, and prevent overfitting. The proposed models are evaluated using benchmark time series, renewable energy, financial, and cryptocurrency datasets. Their performance is compared against state-of-the-art machine learning models, including classical approaches, deep learning architectures, and rule-based evolving fuzzy systems. The evaluation considers both error metrics and the final number of rules. The results indicate that the proposed models effectively handle complex, non-stationary datasets, such as those in finance and cryptocurrency. All proposed models are available as a Python package, which can be installed via pip: pip install nfisis (https://pypi.org/project/nfisis/0.0.4/).
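The Takagi-Sugeno-Kang idea mentioned above (approximating a nonlinear system through a collection of linear subsystems) can be sketched in a few lines. This is an illustrative first-order TSK inference step with made-up rule parameters, not the thesis's nfisis implementation: each rule pairs a Gaussian membership function with a linear consequent, and the output is the firing-strength-weighted average of the local linear models.

```python
import numpy as np

def tsk_predict(x, centers, sigmas, coefs, intercepts):
    """First-order TSK inference for a scalar input x."""
    w = np.exp(-((x - centers) ** 2) / (2 * sigmas ** 2))  # rule firing strengths
    y = coefs * x + intercepts                             # local linear models
    return np.sum(w * y) / np.sum(w)                       # weighted average

# Two rules approximating y = |x| with the linear pieces y = -x and y = x.
centers = np.array([-1.0, 1.0])
sigmas = np.array([1.0, 1.0])
coefs = np.array([-1.0, 1.0])
intercepts = np.array([0.0, 0.0])

print(tsk_predict(-2.0, centers, sigmas, coefs, intercepts))  # close to 2
print(tsk_predict(2.0, centers, sigmas, coefs, intercepts))   # close to 2
```

Near a rule's center its linear model dominates; between centers the output blends smoothly, which is where the accuracy/interpretability trade-off of TSK systems comes from.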
... In 1979, Fukushima (1979) proposed a neural network called the neocognitron that mimics the vision mechanism of animals for pattern recognition. This model was motivated by an experiment conducted by Hubel and Wiesel (1962). ...
... Introduction to CNNs and their architecture: CNN is a type of feedforward neural network that was inspired by a visual-cortex-based neural model proposed by Fukushima (1979). This model was designed to mimic human vision for pattern recognition and gave rise to the convolution and downsampling methods used in CNNs. ...
Article
Full-text available
Deep learning stands at the forefront of contemporary machine learning techniques and is well-known for its outstanding predictive accuracy, adaptability to data variability, and remarkable ability to generalize across diverse domains. These attributes have spurred rapid progress and the emergence of novel iterations within the discipline. Yet, this swift evolution often obscures the foundational breakthroughs, with even trailblazing researchers at risk of fading into obscurity despite their seminal contributions. This study aims to provide a historical narrative of deep learning, tracing its origins from the cybernetic era to its current state-of-the-art status. We critically examine the contributions of individual pioneer scholars who have profoundly influenced the development of deep neural networks under the taxonomy of supervised, unsupervised, and reinforcement learning. Furthermore, the study also discusses the trending deep neural network architectures, explaining their operational principles, confronting associated challenges, exploring real-world applications, and outlining potential future trajectories that could offer a starting point for aspiring researchers in the field.
... The Convolutional Neural Network (CNN), one of the most representative deep learning models, evolved from the Neocognitron proposed by Fukushima [47] and has since given rise to numerous variants. A typical CNN architecture consists of an input layer, alternating convolutional and pooling layers, one or more fully connected layers, activation functions, and an output layer [48]. ...
... A typical CNN architecture consists of an input layer, alternating convolutional and pooling layers, one or more fully connected layers, activation functions, and an output layer [48]. CNNs are capable of recognizing stimulus patterns with robustness to minor variations in position or shape [47], and they have been widely applied in pattern recognition tasks with high accuracy [49,50]. In aquaculture production, the underwater environment is often complex, which significantly hinders the extraction of image features such as texture and shape [51]. ...
Article
Full-text available
With the rising global demand for aquatic products, aquaculture has become a cornerstone of food security and sustainability. This review comprehensively analyzes the application of deep learning in sustainable aquaculture, covering key areas such as fish detection and counting, growth prediction and health monitoring, intelligent feeding systems, water quality forecasting, and behavioral and stress analysis. The study discusses the suitability of deep learning architectures, including CNNs, RNNs, GANs, Transformers, and MobileNet, under complex aquatic environments characterized by poor image quality and severe occlusion. It highlights ongoing challenges related to data scarcity, real-time performance, model generalization, and cross-domain adaptability. Looking forward, the paper outlines future research directions including multimodal data fusion, edge computing, lightweight model design, synthetic data generation, and digital twin-based virtual farming platforms. Deep learning is poised to drive aquaculture toward greater intelligence, efficiency, and sustainability.
... With recent advancements in deep learning (DL), DL architectures have emerged as effective tools for designing robust high-level controllers based on sEMG signals [6]. As demonstrated in our previous work [6], Convolutional Neural Networks (CNNs) [7] are particularly well-suited for this purpose, due to their ability to automatically extract meaningful features from raw sensory data. Although CNNs are traditionally used for processing two-dimensional data such as images, they have also proven highly effective for analyzing one-dimensional signals, including sEMG. ...
Preprint
Full-text available
Accurate classification of lower limb movements using surface electromyography (sEMG) signals plays a crucial role in assistive robotics and rehabilitation systems. In this study, we present a lightweight attention-based deep neural network (DNN) for real-time movement classification using multi-channel sEMG data from the publicly available BASAN dataset. The proposed model consists of only 62,876 parameters and is designed without the need for computationally expensive preprocessing, making it suitable for real-time deployment. We employed a leave-one-out validation strategy to ensure generalizability across subjects, and evaluated the model on three movement classes: walking, standing with knee flexion, and sitting with knee extension. The network achieved 86.74% accuracy on the validation set and 85.38% on the test set, demonstrating strong classification performance under realistic conditions. Comparative analysis with existing models in the literature highlights the efficiency and effectiveness of our approach, especially in scenarios where computational cost and real-time response are critical. The results indicate that the proposed model is a promising candidate for integration into upper-level controllers in human-robot interaction systems.
... 1D CNN, a derivative of the widely used 2D CNN (Fukushima 1980), specializes in extracting features from sequential data like time series. Unlike their 2D counterpart, 1D CNNs handle one-dimensional data through 1D convolutions, making them ideal for real-time applications involving sensor signals, especially when data is limited (Shahid et al. 2022). ...
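The 1D convolution described above is a small operation: slide a kernel along the signal and take dot products at each offset. A minimal sketch (not any cited model; the signal and kernel here are invented for illustration):

```python
import numpy as np

def conv1d(signal, kernel):
    """Valid 1D cross-correlation: one dot product per kernel position."""
    n, k = len(signal), len(kernel)
    return np.array([signal[i:i+k] @ kernel for i in range(n - k + 1)])

signal = np.array([0.0, 0.0, 1.0, 2.0, 3.0, 0.0, 0.0])
edge_kernel = np.array([-1.0, 1.0])  # first-difference filter
print(conv1d(signal, edge_kernel))   # responds where the signal changes
```

Stacking such filters with nonlinearities and pooling is all a 1D CNN adds on top of this primitive; the learned kernels play the role of the hand-written `edge_kernel` here.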
Article
Construction workers frequently face the risk of adopting awkward work postures, which can lead to work-related musculoskeletal disorders. Many existing solutions using wearable sensors suffer from intrusiveness and the need for multiple sensor attachments. This study proposes a novel method for automatic recognition of awkward postures using wristband biosensors and deep learning algorithms. Physiological data from ten subjects were collected, processed, and used to train the models. Long short-term memory (LSTM), bidirectional long short-term memory (Bi-LSTM), and one-dimensional convolutional neural network (1D-CNN) models were compared. The Bi-LSTM model achieved the highest accuracy at 95.09%, followed by the LSTM model with 91.49%, and the 1D-CNN model with 89.50%. The study also conducted a comprehensive analysis of the impact of diverse signal combinations and time windows on posture recognition, providing valuable insights. The findings expand the use of physiological signals for safety enhancement, specifically in recognizing awkward postures. This study contributes to wearable sensor-based posture recognition, ultimately enhancing the health and safety of construction workers.
... CNN is a type of deep learning neural network introduced in 1980 (Fukushima, 1980). CNN is a powerful technology for image classification, and its architecture consists of different layers, the core building layers being the convolutional layer, the pooling layer, and the fully connected layer. ...
Conference Paper
The traditional methods for obtaining pavement condition data include manual and semiautomated surveys, which involve significant human intervention, are time-consuming, and variable in repeatability and accuracy. Various initiatives have often underestimated the complexities of developing fully automated pavement feature detection tools, primarily due to the challenge of replicating human cognitive abilities in computational processes. Recent advancements in deep learning (DL) provide promising solutions for fully automated condition evaluation. While DL has proven effective for detecting pavement cracking, simultaneous detection of other feature types, such as potholes, patching, and sealed cracks, remains limited. This study utilizes 3D laser imaging data to extract scale- and mode-specific features for multi-object DL detection across various pavement features. A diverse, high-resolution, manually annotated image library labeled with multiple feature categories was created for training, accompanied by intelligent data augmentation methods. Both traditional Convolutional Neural Networks (CNN) and the new generation Transformer architecture were developed for pixel-level multi-object semantic detection of various pavement surface features. CNN model was developed using the traditional supervised learning technique, and the transformer model was developed using the transfer learning technique. Model performance was assessed and compared at pixel accuracy. This multi-object detection DL approach has the potential to revolutionize pavement condition surveys by enabling swift, one-pass assessments of multiple pavement features at highway speeds.
... For change detection, the extraction and processing of image features have a significant impact on the overall detection performance of the model. Initially, with the introduction of CNNs [4], [5], most models adopted a twin structure based on CNNs to extract features [6], [7], often using stacked convolutional modules to enrich the extracted features. However, such methods often lead to complex model structures. ...
Article
Full-text available
Change detection in very high-resolution (VHR) remote sensing images has gained significant attention, particularly with the rise of deep learning techniques like CNNs and Transformers. The Mamba structure, successful in computer vision, has been applied to this domain, enhancing computational efficiency. However, much of the research focuses on improving global modeling, neglecting the role of local information crucial for change detection. Moreover, there remains a gap in understanding which structural modifications are more suited for the change detection task. This paper investigates the impact of different scanning mechanisms within Mamba, evaluating five mainstream methods to optimize its performance in change detection. We propose LBCDMamba, a novel architecture based on our proposed Local-Global Selective Scan Module, which effectively integrates global and local information through a unified scanning strategy. To address the lack of fine-grained details in current models, we propose a Multi-Branch Patch Attention module, which captures both local and global features by partitioning data into smaller patches. Additionally, a Bi-Temporal Feature Fusion module is proposed to fuse bi-temporal features, improving temporal-spatial feature representation. Extensive experiments on three benchmark datasets demonstrate that LBCDMamba outperforms existing popular methods in change detection tasks. This work also provides new insights into optimizing Mamba for change detection, with potential applications across remote sensing and related fields.
... This study introduces a lightweight CNN architecture specifically designed for NTC. While CNNs are widely known for their powerful feature extraction capabilities in applications like object detection and image classification [45], their computational demand can pose challenges in resourceconstrained environments, such as real-time traffic classification. To address this, we leverage separable convolution, introduced by Sifre and Mallat [46], to minimize the model's parameters while preserving high performance. ...
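The parameter saving from the separable convolution mentioned above comes from splitting one k × k × C_in × C_out filter bank into a per-channel depthwise stage plus a 1 × 1 pointwise stage. The arithmetic below is a sketch with invented layer sizes, not the cited model's configuration:

```python
# Parameter counts for a standard vs. depthwise separable convolution.

def standard_conv_params(k, c_in, c_out):
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution mixes the channels
    return depthwise + pointwise

k, c_in, c_out = 3, 64, 128
std = standard_conv_params(k, c_in, c_out)   # 73,728 parameters
sep = separable_conv_params(k, c_in, c_out)  # 8,768 parameters
print(std, sep, round(std / sep, 1))         # roughly 8.4x fewer parameters
```

The ratio grows with both the kernel size and the output channel count, which is why the technique pays off most in wide convolutional layers.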
Article
Full-text available
With the rapid growth of internet usage and the increasing number of connected devices, there is a critical need for advanced Network Traffic Classification (NTC) solutions to ensure optimal performance and robust security. Traditional NTC methods, such as port-based analysis and deep packet inspection, struggle to cope with modern network complexities, particularly dynamic port allocation and encrypted traffic. Recently, Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) have been employed to develop classification models to accomplish this task. Existing models for NTC often require significant computational resources due to their large number of parameters, leading to slower inference times and higher memory consumption. To overcome these limitations, we introduce a lightweight NTC model based on Depthwise Separable Convolutions and compare its performance against CNN, RNN, and state-of-the-art models. In terms of computational efficiency, our proposed lightweight CNN exhibits a markedly reduced computational footprint. It utilizes only 30,611 parameters and 0.627 MFLOPS, achieving inference times of 1.49 seconds on the CPU and 0.43 seconds on the GPU. This corresponds to roughly 4× fewer FLOPS than the RNN baseline and 16× fewer than the CNN baseline, while also offering an ultracompact design compared to state-of-the-art models. Such efficiency makes it exceptionally well-suited for real-time applications in resource-constrained environments. In addition, we have integrated eXplainable Artificial Intelligence techniques, specifically LIME and SHAP, to provide valuable insights into model predictions. LIME and SHAP help interpret the contribution of each feature in decision-making, enhancing the transparency and trust in the model's predictions, without compromising its lightweight nature. To support reproducibility and foster collaborative development, all associated code and resources have been made publicly available.
... The CNN architecture comprised four convolutional layers [78] with Rectified Linear Unit (ReLU) activations [79] and max-pooling [80], followed by a fully connected layer [81] and dropout regularization (rate = 0.2) [82]. The final classification layer used a sigmoid activation function [83] to produce class probabilities corresponding to the number of ripening stages. ...
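The building blocks named in that snippet (ReLU, max-pooling, a sigmoid output) compose into a forward pass like the toy below. This is a hedged sketch with an invented 4 × 4 feature map and a trivial stand-in for the fully connected layer, not the cited model:

```python
import numpy as np

relu = lambda x: np.maximum(0, x)
sigmoid = lambda x: 1 / (1 + np.exp(-x))

def maxpool2x2(x):
    """Non-overlapping 2x2 max pooling over a 2D feature map."""
    h, w = x.shape
    return x[:h//2*2, :w//2*2].reshape(h//2, 2, w//2, 2).max(axis=(1, 3))

fmap = np.array([[ 1.0, -2.0,  3.0,  0.0],
                 [-1.0,  4.0, -3.0,  2.0],
                 [ 0.5,  0.0,  1.0, -1.0],
                 [ 2.0, -0.5,  0.0,  1.0]])

pooled = maxpool2x2(relu(fmap))  # 2x2 map of the strongest activations
logit = pooled.sum()             # stand-in for a fully connected layer
print(sigmoid(logit))            # class probability in (0, 1)
```

ReLU zeroes the negative responses, pooling keeps the strongest activation per neighborhood, and the sigmoid squashes the final score into a probability; the real model repeats the conv/ReLU/pool stage four times before the dense head.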
Article
Full-text available
This study presents a non-invasive approach to monitoring post-harvest fruit quality by applying CO2 laser photoacoustic spectroscopy (CO2LPAS) to study the respiration of “Conference” pears from local and commercially stored (supermarket) sources. Concentrations of ethylene (C2H4), ethanol (C2H6O), and ammonia (NH3) were continuously monitored under shelf-life conditions. Our results reveal that ethylene emission peaks earlier in supermarket pears, likely due to post-harvest treatments, while ethanol accumulates over time, indicating fermentation-related deterioration. Significantly, ammonia levels increased during the late stages of senescence, suggesting its potential role as a novel biomarker for fruit degradation. The application of CO2LPAS enabled highly sensitive, real-time detection of trace gases without damaging the fruit, offering a powerful alternative to traditional monitoring methods. Additionally, artificial intelligence (AI) models, particularly convolutional neural networks (CNNs), were explored to enhance data interpretation, enabling early detection of ripening and spoilage patterns through volatile compound profiling. This study advances our understanding of post-harvest physiological processes and proposes new strategies for improving storage and distribution practices for climacteric fruits.
... Discriminative and generative models of the brain. Computational models of cortical processing can be roughly categorized into discriminative, such as those that use a feedforward neural network to filter visual inputs [22,23], and generative, which are rooted in the Bayesian Brain hypothesis [24,25] and instantiated by theories like predictive coding [10,11]. However, increasing evidence suggests that the brain takes a combined approach [26,3,4]. ...
Preprint
Full-text available
Predictive coding (PC) is an influential computational model of visual learning and inference in the brain. Classical PC was proposed as a top-down generative model, where the brain actively predicts upcoming visual inputs, and inference minimises the prediction errors. Recent studies have also shown that PC can be formulated as a discriminative model, where sensory inputs predict neural activities in a feedforward manner. However, experimental evidence suggests that the brain employs both generative and discriminative inference, while unidirectional PC models show degraded performance in tasks requiring bidirectional processing. In this work, we propose bidirectional PC (bPC), a PC model that incorporates both generative and discriminative inference while maintaining a biologically plausible circuit implementation. We show that bPC matches or outperforms unidirectional models in their specialised generative or discriminative tasks, by developing an energy landscape that simultaneously suits both tasks. We also demonstrate bPC's superior performance in two biologically relevant tasks including multimodal learning and inference with missing information, suggesting that bPC resembles biological visual inference more closely.
... Each specimen's chromosome represents a convolutional neural network (CNN) architecture. CNNs, inspired by the mammalian visual cortex and developed by researchers like Fukushima [13], and projects like LeNet-5 [14], process multidimensional data while preserving shape information. They consist of convolutional and pooling layers. ...
... Since the advent of the first multilayer neural network, the Neocognitron [1], CNN (convolutional neural network) architectures have been successfully applied in numerous fields requiring image processing and classification. AlexNet [2], which won the ImageNet Large-Scale Visual Recognition Challenge in 2012, has become an important milestone in the rapid development of CNN architectures [3]. ...
Article
Full-text available
Existing CNN architectures have been developed to train digital image datasets obtained from hardware systems operating with classical bits, such as optical cameras. With the increase of quantum computing algorithms and quantum system providers, academic research is being conducted to combine the strengths of classical computing and quantum algorithms. This fusion allows for the development of hybrid quantum systems, with proposed methods specifically for the quantum representation of digital images. While methods for transforming digital images into quantum-compatible circuits have been proposed, no study has been found on the quantum transformation of entire datasets, especially for the use of fully classical CNN architectures. This article presents the quantum image dataset transform method, which utilizes quantum circuits to transform digital images and create a new dataset of the transformed images. Each of the 10,000 digital images of 28 × 28 dimensions in the MNIST handwritten digits dataset is individually divided into sub-parts, and the common weight values of each segment are determined as the phase value to be used in the quantum circuit. The quantum outputs of each sub-part are converted into classical equivalents by creating a quantum converter, and a new digital image is obtained by combining all the sub-parts. The newly generated digital images are labeled as MNIST Q⁺_image and are publicly shared along with the original MNIST dataset. The paper evaluates both a custom 3-layer CNN architecture and several pre-trained models, including EfficientNetV2B3, ResNet-50, DenseNet-121, and ConvNeXt Tiny. After training for 30 epochs, the 3-layer CNN architecture achieved the highest accuracy of 99.23%, significantly outperforming the pre-trained models, with DenseNet-121 achieving 81.70%, EfficientNetV2B3 64.23%, ResNet-50 53.25%, and ConvNeXt Tiny 53.41%.
The results highlight the superior performance of the 3-layer CNN in adapting to the quantum-transformed dataset and demonstrate the potential of quantum transformations to enhance the learning ability of classical CNN models. This foundational research aims to pave the way for further exploration into the integration of quantum-transformed datasets in classical deep learning frameworks.
... The CNN (convolutional neural network) was proposed by Kunihiko Fukushima in 1980; he introduced the concepts of hierarchical structure and local connectivity, which make the architecture well suited to data with a grid structure [16]. In 1998, Yann LeCun et al. described in detail the structure of convolutional neural networks (CNNs), including convolutional, pooling, and fully connected layers, their application to handwritten digit recognition, and how to use the backpropagation algorithm for training and prediction [17]. ...
Article
Full-text available
In the context of new quality productive forces, enterprises must leverage technological innovation and intelligent management to enhance financial risk resilience. This article proposes a financial distress prediction model based on deep learning, combined with a CNN, BiLSTM, and attention mechanism, using SMOTE for sample imbalance and Hyperband for hyperparameter optimization. Among four CNN-BiLSTM-AT model structures and seven mainstream models (CNN, BiLSTM, CNN-BiLSTM, CNN-AT, BiLSTM-AT, CNN-GRU, and Transformer), the 1CNN-1BiLSTM-AT model achieved the highest validation accuracy and relatively faster training speed. We conducted 100 repeated experiments using data from two companies, with validation on 2025 data, confirming the model’s stability and effectiveness in real-world scenarios. This article lays a solid empirical foundation for further optimization of financial distress warning models.
... Later, Espeholt et al. (2018) proposed using a ResNet-based architecture which demonstrates significant performance improvements over the original CNN architecture. For pixel-based environments, this family of networks consists of a set of convolutional layers (Fukushima, 1980), which we will collectively refer to as φ (often referred to as the encoder or representation), followed by a set of dense layers, which we will collectively refer to as ψ. Thus, given an input x, the network approximates the Q-values as Q̃(x, ·) = ψ(φ(x)). ...
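The bottleneck this preprint targets sits between φ and ψ: a dense layer on the flattened encoder output needs C·H·W input weights per unit, while a dense layer after global average pooling (GAP) needs only C. A back-of-the-envelope sketch with invented sizes (not the paper's actual network dimensions):

```python
import numpy as np

c, h, w, n_actions = 64, 11, 11, 18  # illustrative encoder output and action count

flat_dense_params = (c * h * w) * n_actions  # dense layer on flattened features
gap_dense_params = c * n_actions             # dense layer on GAP features

feats = np.random.rand(c, h, w)
gap = feats.mean(axis=(1, 2))  # GAP: one scalar per channel, position discarded
assert gap.shape == (c,)

print(flat_dense_params, gap_dense_params)  # 139392 vs 1152
```

The spatial extent H × W drops out of the parameter count entirely, which is the sense in which GAP "targets the bottleneck" without architectural complexity.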
Preprint
Full-text available
Scaling deep reinforcement learning in pixel-based environments presents a significant challenge, often resulting in diminished performance. While recent works have proposed algorithmic and architectural approaches to address this, the underlying cause of the performance drop remains unclear. In this paper, we identify the connection between the output of the encoder (a stack of convolutional layers) and the ensuing dense layers as the main underlying factor limiting scaling capabilities; we denote this connection as the bottleneck, and we demonstrate that previous approaches implicitly target this bottleneck. As a result of our analyses, we present global average pooling as a simple yet effective way of targeting the bottleneck, thereby avoiding the complexity of earlier approaches.
... Convolutional neural networks (CNNs) (Fukushima, 1980; LeCun et al., 1998) long served as the main architecture for visual processing tasks and are widely used for classification and prediction based on magnetic resonance imaging (MRI) data. However, transformer models utilizing the attention mechanism (Bahdanau et al., 2014) and introduced by Vaswani et al. (2017) have recently revolutionized the field of natural language processing (NLP) and have also proven valuable for vision tasks involving image data (Dosovitskiy et al., 2020). ...
Preprint
Full-text available
Convolutional neural networks (CNNs) have been the standard for computer vision tasks and are frequently applied in medical conditions, such as in Alzheimer’s disease (AD). Recently, Vision Transformers (ViTs) have been introduced, which provide a strong alternative to CNNs by discarding the convolution approach in favor of the attention mechanism. This allows modeling global and distant relationships within distinct parts of an image without relying on the strong inductive biases present in CNNs. A common precursor stage of AD is a syndrome called mild cognitive impairment (MCI). However, not all individuals diagnosed with MCI progress to AD. The establishment of reliable classification models that predict converters versus non-converters would be a valuable tool to support clinical decision-making, such as enabling early treatment. Hence, in this investigation a transfer learning approach was used by applying a pretrained ViT model, fine-tuned on the ADNI dataset comprising 575 subjects with MCI. We included baseline T1-weighted structural MRI data from 299 stable MCI and 276 progressive MCI individuals, who developed Alzheimer’s disease within 36 months. Inputs to the model were three normalized axial slices covering areas of the hippocampal region, consisting of the combined gray and white matter segmentations. The final model was evaluated over multiple runs to obtain stable performance estimates, yielding an average area under the receiver operating characteristic curve (AUC-ROC) on the test set of 0.74 ± 0.02 (mean ± SD), an accuracy of 0.69 ± 0.03, a sensitivity of 0.65 ± 0.07, a specificity of 0.72 ± 0.06, and a F1-score for the pMCI class of 0.67 ± 0.04. 
By specifically focusing on axial slices covering the hippocampal region, we aimed to target the brain structure often reported as being the first affected by the disease, while our results indicate that a ViT approach achieves reasonable classification accuracy for predicting the conversion from MCI to AD.
... The domain shift experiments with the classification network necessitated careful architectural design. The first section of the architecture comprised a convolutional neural network [50] in the form of a modular residual network [51] subnetwork, using the lean ResNet18 variant. This was followed by a fully connected feedforward (FCNN) head for binary classification. ...
Preprint
Full-text available
Rapid and reliable automated identification of wood species can be a boon for applications across wood science, including forestry and biodiversity conservation, as well as in industrial contexts driven by timber trade regulations. However, robust machine learning classifiers must be properly analyzed and immunized against domain shift effects, which can degrade automated system performance when input data vary, as occurs in many scenarios. This work analyses the domain shift generated by using two differing sub-micro-scale and micro-scale computed tomography setups in the context of deep learning based binary wood classification from volumetric image data. Further, we examine several mitigation strategies and propose intertwined data- and model-level strategies to effectively minimize the domain performance gap. Core elements of the strategy include the combined usage of phase-correction methods, a low-pass pyramid representation of the data, and model normalization and regularization approaches. Vanishing domain performance differences led to the conclusion that the combined strategy ultimately prompted the model to learn robust features. These features are discriminative for input data from both the sub-micro-system and micro-system domains, despite the substantial differences in data acquisition setup that propagate into fundamental image quality metrics such as resolution, contrast and signal-to-noise ratio.
... Vision backbones Since the pioneering work of the Neocognitron [22] and LeNet [23], Convolutional Neural Networks (CNNs) have been propelling the advancements in data-driven computer vision. These networks typically employ a hierarchy of convolutional layers that apply a set of learnable filters to the input feature map, alternated with feature downsampling operations, yielding a hierarchy of multi-scale feature maps. ...
Preprint
Uniform downsampling remains the de facto standard for reducing spatial resolution in vision backbones. In this work, we propose an alternative design built around a content-aware spatial grouping layer that dynamically assigns tokens to a reduced set based on image boundaries and their semantic content. Stacking our grouping layer across consecutive backbone stages results in hierarchical segmentation that arises natively in the feature extraction process, giving rise to our coined Native Segmentation Vision Transformer. We show that a careful design of our architecture enables the emergence of strong segmentation masks solely from grouping layers, that is, without additional segmentation-specific heads. This sets the foundation for a new paradigm of native, backbone-level segmentation, which enables strong zero-shot results without mask supervision, as well as a minimal and efficient standalone model design for downstream segmentation tasks. Our project page is https://research.nvidia.com/labs/dvl/projects/native-segmentation.
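The conventional backbone pattern referenced above, learnable filters alternated with uniform downsampling to produce multi-scale feature maps, can be sketched in a few lines of NumPy. This is a minimal single-channel toy with one random stand-in for a learned filter, not any particular backbone implementation:

```python
import numpy as np

def conv2d(x, kernel):
    """Valid-mode 2D cross-correlation of one channel with one filter."""
    kh, kw = kernel.shape
    h, w = x.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling: the downsampling step between stages."""
    h, w = x.shape[0] // size * size, x.shape[1] // size * size
    x = x[:h, :w]
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

# One backbone "stage": filtering, a nonlinearity, then downsampling.
rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))
kernel = rng.standard_normal((3, 3))   # random stand-in for a learned filter

features = np.maximum(conv2d(image, kernel), 0.0)  # ReLU
pooled = max_pool(features)                        # half the resolution
print(features.shape, pooled.shape)  # (30, 30) (15, 15)
```

Stacking such stages is what yields the multi-scale hierarchy the snippet above describes; real backbones use many filters per layer and learn their weights by backpropagation.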
... Similar to RNNs, convolutional neural networks (CNNs) can also make use of data from previous time steps. While a CNN is mostly applied to image data or to other multidimensional datasets for pattern detection using trained filters (Fukushima, 1980), we used its 1D equivalent for time series analysis (Kiranyaz et al., 2021). Similarly to the RNN, a reduced set of architectural properties (i.e. between two and five layers with sizes between 32 and 128) has been assessed, while the rest of the tunable parameters were kept identical to the other networks. ...
Article
Full-text available
Weather types are used to characterise large-scale synoptic weather patterns over a region. Long-standing records of weather types hold important information about day-to-day variability and changes in atmospheric circulation and the associated effects on the surface. However, most weather type reconstructions are restricted in their temporal extent and suffer from methodological limitations. In our study, we assess various machine learning approaches for station-based weather type reconstruction over Europe based on the nine-class cluster analysis of principal components (CAP9) weather type classification. With a common feedforward neural network performing best in this model comparison, we reconstruct a daily CAP9 weather type series back to 1728. This new reconstruction constitutes the longest daily weather type series available. Detailed validation shows considerably better performance compared to previous statistical approaches and good agreement with the reference series for various climatological analyses. Our approach may serve as a guide for other weather type classifications.
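The 1D equivalent of a CNN mentioned in the snippet above slides a filter along the time axis instead of over an image. A minimal sketch, with a hand-set moving-average kernel standing in for the learned filter weights of a trained 1D CNN:

```python
import numpy as np

def conv1d(series, kernel):
    """Valid-mode 1D convolution: slide a filter along the time axis."""
    k = len(kernel)
    return np.array([np.dot(series[i:i + k], kernel)
                     for i in range(len(series) - k + 1)])

# Denoise a sine wave: each output step summarizes a local window of
# samples, the 1D analogue of an image patch in a 2D CNN.
rng = np.random.default_rng(1)
t = np.linspace(0, 2 * np.pi, 100)
series = np.sin(t) + 0.1 * rng.standard_normal(100)
kernel = np.ones(5) / 5  # hand-set here; learned from data in a real 1D CNN

filtered = conv1d(series, kernel)
print(len(filtered))  # 96 = 100 - 5 + 1
```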
... The dense layers at the end serve to deduce the desired output from the analyzed images, whether it be classification, noise removal, etc. [2,247]. There is a biological inspiration for this choice, but it is beyond the scope of this work and can be found in [94]. ...
Preprint
Full-text available
Machine learning techniques have emerged as powerful tools to tackle various challenges. The integration of machine learning methods with Physics has led to innovative approaches in understanding, controlling, and simulating physical phenomena. This article aims to provide a practical introduction to neural network and their basic concepts. It presents some perspectives on recent advances at the intersection of machine learning models with physical systems. We introduce practical material to guide the reader in taking their first steps in applying neural network to Physics problems. As an illustrative example, we provide four applications of increasing complexity for the problem of a simple pendulum, namely: parameter fitting of the pendulum's ODE for the small-angle approximation; Application of Physics-Inspired Neural Networks (PINNs) to find solutions of the pendulum's ODE in the small-angle regime; Autoencoders applied to an image dataset of the pendulum's oscillations for estimating the dimensionality of the parameter space in this physical system; and the use of Sparse Identification of Non-Linear Dynamics (SINDy) architectures for model discovery and analytical expressions for the nonlinear pendulum problem (large angles).
... So far, we have considered neural networks with a fully-connected architecture. In this section, we instead discuss convolutional networks, a classical architecture that originated in computer vision [10,17]. We start by recalling the basic definitions. ...
Preprint
Full-text available
Deep neural networks often infer sparse representations, converging to a subnetwork during the learning process. In this work, we theoretically analyze subnetworks and their bias through the lens of algebraic geometry. We consider fully-connected networks with polynomial activation functions, and focus on the geometry of the function space they parametrize, often referred to as neuromanifold. First, we compute the dimension of the subspace of the neuromanifold parametrized by subnetworks. Second, we show that this subspace is singular. Third, we argue that such singularities often correspond to critical points of the training dynamics. Lastly, we discuss convolutional networks, for which subnetworks and singularities are similarly related, but the bias does not arise.
... The next breakthrough in multilayer networks came from Kunihiko Fukushima, who described his new network architecture in 1980 in "Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position" (Fukushima, 1980). His network, dubbed the Neocognitron, was largely inspired by advances in neurobiology during the 1960s, when it was discovered that neurons at different stages of the visual system respond to different stimulus patterns. ...
Chapter
Full-text available
Artificial intelligence (AI) as a field of study has grown exponentially in recent years, as have its applications in a multitude of fields. Within it, deep learning techniques have shown truly impressive results on all kinds of pattern recognition tasks during the last decade, and their impact will only continue to grow. To understand the virtues and weaknesses of this technology, it may be beneficial to look back at the technological advances within the field of AI and, more specifically, of computer vision and neural networks, where the ideas that gave rise to what is now known as deep learning were first developed. That is why the main objective of this chapter is to give as complete a picture as possible of the history of this technology, throughout its nearly 80 years of development. By reviewing the scientific publications that brought progress in this area of research in chronological order, this chapter will elucidate the origins of the main components of neural networks and will highlight the main contributors in this field, as well as the factors that resulted in setbacks and advances.
Chapter
The transformative journey of artificial intelligence (AI) in software development, from its foundational stages of rule-based systems and symbolic reasoning to the emergence of machine learning paradigms and the revolutionary impact of generative AI, has redefined how software is designed, developed, and deployed. By analyzing the integration of AI technologies within the Software Development Life Cycle (SDLC), key milestones, including the rise of neural networks, natural language processing (NLP), and transformer-based models such as GPT and Codex, are discussed. These advancements have automated complex tasks, enhanced coding precision, and redefined productivity in software engineering. While addressing the challenges of ethical considerations, data dependency, and accessibility, this chapter also explores the opportunities for future intelligent systems to drive innovation, efficiency, and creativity. The findings underscore AI's pivotal role in reshaping software engineering, fostering a collaborative dynamic between human ingenuity and machine intelligence.
Article
Advances in neural networks have enabled unmanned aerial vehicles (UAVs) to detect and recognize objects in real time, which has facilitated the use of UAVs autonomously in a variety of scenarios, including fire detection in emergency situations. The paper reviews a number of existing neural network-based detection algorithms, including convolutional neural networks, regional convolutional neural networks and their variants, deep neural networks with convolutional long short-term memory (ConvLSTM), methods integrating deep learning with correlation filtering through self-training, Siamese neural networks for target tracking, and the YOLO (You Only Look Once) family of algorithms. The main characteristics and differences between neural network algorithms are described, and a comparison of their performance in terms of mean average precision (mAP) and frame rate per second (FPS) is given. The conclusions of the article provide insight into the trade-offs between accuracy, speed and task-specific requirements in detection tasks, which allows one to make an informed choice on the use of one or another algorithm.
Preprint
Full-text available
Evolving Fuzzy Systems (eFS) have gained significant attention due to their ability to adaptively update their structure in response to data dynamics while maintaining interpretability. However, the lack of publicly available implementations of these models limits their accessibility and widespread adoption. To address this gap, we present evolvingfuzzysystems, a Python library that provides implementations of several well-established eFS models, including ePL-KRLS-DISCO, ePL+, eMG, ePL, exTS, Simpl_eTS, and eTS. The library facilitates model evaluation and comparison by offering built-in tools for training, visualization, and performance assessment. The models are evaluated using the fetch_california_housing dataset, with performance measured in terms of normalized root-mean-square error (NRMSE), non-dimensional error index (NDEI), and mean absolute percentage error (MAPE). Additionally, computational complexity is analyzed by measuring execution times and rule evolution during training and testing phases. The results highlight ePL as a simple yet efficient model that balances accuracy and computational cost, making it particularly suitable for real-world applications. By making these models publicly available, evolvingfuzzysystems aims to foster research and practical applications in adaptive and interpretable machine learning.
Chapter
The integration of Neural Network Technologies (NNTs) and Brain-Computer Interfaces (BCIs) has transformed human-machine interaction, enabling direct brain-device communication. BCIs empower individuals with neurological disorders, while neural networks enhance signal processing, pattern recognition, and adaptive learning. Advances in deep learning, including CNNs, RNNs, GANs, and Federated Learning, have improved BCI accuracy in motor imagery, authentication, and neurofeedback. Hybrid BCIs combining multiple neural signals enhance adaptability and real-world application. However, challenges remain in ethics, data privacy, hardware, and personalized AI. Future AI-powered BCIs, brain-to-brain communication, and non-invasive neurostimulation promise breakthroughs in cognition, rehabilitation, and augmentation. This chapter explores key principles, innovations, applications, challenges, and future directions in AI-driven neuroscience.
Article
Artificial intelligence (AI) and machine learning (ML) are rapidly transforming clinical decision support systems (CDSSs) in intensive care units (ICUs), where vast amounts of real-time data present both an opportunity and a challenge for timely clinical decision-making. Here, we trace the evolution of machine intelligence in critical care. This technology has been applied across key ICU domains such as early warning systems, sepsis management, mechanical ventilation, and diagnostic support. We highlight a transition from rule-based systems to more sophisticated machine learning approaches, including emerging frontier models. While these tools demonstrate strong potential to improve predictive performance and workflow efficiency, their implementation remains constrained by concerns around transparency, workflow integration, bias, and regulatory challenges. Ensuring the safe, effective, and ethical use of AI in intensive care will depend on validated, human-centered systems supported by transdisciplinary collaboration, technological literacy, prospective evaluation, and continuous monitoring.
Article
With the development of high-performance computing and advanced experimental methods, data-driven machine learning, especially neural network technology, has shown great potential in fluid mechanics research and has become a fourth-paradigm research tool. In particular, remarkable achievements have been made in turbulence modeling, near-wall flow prediction, and combustion dynamic evolution. Researchers use neural network models to assist turbulence control, improve Reynolds-averaged turbulence models, and harness deep learning methods to solve the problem of predicting complex flow phenomena driven by large-scale data, which effectively improves the accuracy and efficiency of simulating internal flow and wall effects in supersonic combustion ramjet (scramjet) engines. These studies not only promote the development of fluid mechanics but also provide strong support for the design optimization of scramjet engines.
Article
The identification of compound mechanical faults in bearings from vibration signals is challenging due to the concurrence and coupling of different fault types and the exponentially growing number of possible fault modes. Existing AI-based models can extract fault features when there is a large number of labelled compound fault samples, but this is impractical in industrial scenarios. To address this gap, we propose a mechanism-constrained decomposition diffusion network (McDDN) framework tailored for compound bearing fault diagnosis in rotating machinery, which only requires labeled single-fault samples and unlabeled compound fault signals for training. The framework integrates a mechanism-constrained decomposition UNet (McD-UNet) into the diffusion process, leveraging the feature mode decomposition (FMD) principle as a physical constraint via a specially designed training loss. This allows the decomposition of compound fault signals into interpretable single-fault components, which are then diagnosed using a pre-trained single-fault classifier. Experimental validation on the PU bearing dataset and BJTU-RAO industrial dataset demonstrates that McDDN achieves high diagnostic accuracy for bearing compound faults, outperforming state-of-the-art methods in both closed-set and cross-domain scenarios. Rigorous analyses, including model interpretability, ablation studies, and hyperparameter sensitivity tests, validate the robustness and stability of the proposed approach. While focused on bearings, the framework provides a generalizable paradigm for compound fault diagnosis of other rotating machinery components by incorporating component-specific physical constraints, offering new insights for intelligent maintenance systems in industrial applications.
Article
Full-text available
This paper compares the performance of five different supervised machine learning (SL) algorithms to identify the practical model for tracking copper using solution cathode glow discharge optical emission spectroscopy (SCGD-OES). A total of 4500 SCGD-OES spectra were collected, corresponding to 15 different copper concentrations ranging from 10.3 mg l⁻¹ to 59.6 mg l⁻¹, for training and validating five models: K-nearest neighbors (KNN), random forest (RF), artificial neural network (ANN), convolutional neural network (CNN), and recurrent neural network (RNN). The performance of these models was then evaluated based on accuracy, precision of predictions, and training time. In terms of accuracy, the relative error (RE) was 3% for the CNN model and 4% for the ANN model. Conversely, the RE increased from 21% and 43% for the RF and RNN models to 103% for the KNN model due to the unpredictability of Cu concentration outside the training range. Regarding precision, the relative standard deviation (RSD) values were similar across all models, with the lowest RSD being 3% for the CNN model and the highest being 5% for the RF model. The training time indicated that the KNN and RF models provided the best learning speed, while the CNN model had the slowest learning speed, taking 5 times longer than the ANN model. The accuracy of the ANN and CNN models is in agreement with inductively coupled plasma optical emission spectroscopy (ICP-OES), with an average difference not exceeding 6% when tracking six different Cu concentrations, both inside and outside the training range. Additionally, this research investigated the weight distribution in the hidden layers of the ANN, CNN, and RNN models and the feature importances in the RF model to enhance understanding of the internal structure of the models and assist analysts in selecting the practical model for tracking heavy metals based on SCGD-OES.
Article
This doctoral seminar investigates the application of deep learning techniques, particularly Artificial Neural Networks (ANNs), to enhance the effectiveness and accuracy of risk analysis in project management. Traditional risk assessment methods often rely on subjective evaluations, limited datasets, and static assumptions, hindering accurate risk prediction in complex and dynamic project environments. Deep learning offers a powerful tool to overcome these limitations by learning complex patterns from large datasets and generating predictions. The seminar will explore various ANN architectures and algorithms (e.g., Multilayer Perceptrons, Convolutional Neural Networks, Recurrent Neural Networks), comparing their suitability and performance in processing and analyzing project risk data. The stages of deep learning-based risk analysis, including project risk factor identification, data preprocessing, model training, validation, and testing, will be explained in detail. Furthermore, the seminar will present application examples using real-world project data from diverse sectors such as construction, software development, and supply chain management. These examples will focus on demonstrating the practical applications and potential benefits of deep learning in predicting project risks, assessing risk impact, and developing risk mitigation strategies. For instance, in a construction project, factors like weather data, material prices, and labor availability can be used to predict project delay risks. In conclusion, this seminar aims to highlight the potential of deep learning-based risk analysis for developing more effective risk management strategies and improving project success, while addressing the opportunities and challenges presented by deep learning in the field of project management. Participants will gain practical insights into applying deep learning techniques to project risk analysis and acquire new perspectives for future research.
Article
Full-text available
Foreign objects such as packaging bags on the road pose a significant threat to driving safety, especially at high speeds or under low-visibility conditions. However, research on detecting road packaging bags remains limited, and existing object detection models face challenges in small object detection, computational efficiency, and embedded deployment. To address these issues, this contribution proposes a real-time detection technique built on the lightweight deep learning model RGE-YOLO. Built upon YOLOv8s, RGE-YOLO incorporates RepViTBlock, Grouped Spatial Convolution (GSConv), and Efficient Multi-Scale Attention (EMA) to optimize computational efficiency, model stability, and detection accuracy. GSConv reduces redundant computations, making the model more lightweight; EMA enhances the model’s ability to capture multi-scale information by integrating channel and spatial attention mechanisms; RepViTBlock integrates convolution and self-attention mechanisms to improve feature extraction capabilities. The proposed method was validated on a custom-built road plastic bag dataset comprising 6,000 augmented images. Experimental results demonstrate that RGE-YOLO outperforms state-of-the-art models such as Single Shot MultiBox Detector (SSD) and Faster Region-based Convolutional Neural Network (Faster R-CNN) in terms of mean average precision (mAP 92.2%) and detection speed (250 FPS), while significantly reducing model parameters (9.1 M) and computational complexity (23.9 GFLOPs), increasing its suitability for installation on computerized systems within vehicles. It introduces an effective and lightweight approach for detecting road packaging bags and contributes to increased driving safety.
Article
Full-text available
This work aims at obtaining Artificial Neural Networks (ANNs) to assess the seakeeping of ships navigating with forward speed. The targets of these ANNs are the Froude–Krylov and wave diffraction-radiation loads needed to compute the ship’s Response Amplitude Operators (RAOs). This research presents a methodology for obtaining the optimal ANN architecture, generating the ship database used for training, and treating the data to enable prediction of the targets. The dataset is generated with a tridimensional potential code used to solve the wave diffraction-radiation problem using the Boundary Element Method (BEM) for different wave headings and a range of Froude numbers. To assess the developed tool, six assessment ships not included within the training database are used to compare the ANNs’ predictions against BEM results. The results show deviations of less than 3% compared to BEM for RAO curves. Moreover, the RAO curves exhibit close agreement with BEM results for different encounter wave frequencies. Furthermore, the ANNs’ computational times show a speedup of ×3750 with respect to BEM computations.
Chapter
Traveling waves have been measured at a diversity of regions and scales in the brain; however, a consensus as to their computational purpose has yet to be reached. An intriguing hypothesis is that traveling waves serve to structure neural representations both in space and time, thereby acting as an inductive bias towards natural data. In this chapter, we investigate this hypothesis by introducing the Neural Wave Machine (NWM), a locally coupled oscillatory recurrent neural network capable of exhibiting traveling waves in its hidden state. After training on simple dynamic sequences, we show that this model indeed learns static spatial structure such as topographic organization, and further uses complex spatiotemporal structure such as traveling waves to encode observed transformations. To measure the computational implications of this structure, we use a suite of sequence classification and physical dynamics modeling tasks to show that the NWM is both more parameter-efficient and able to forecast future trajectories of simple physical dynamical systems more accurately than existing state-of-the-art counterparts. We conclude with a discussion of how this model may allow for novel investigations of the computational hypotheses surrounding traveling waves which were previously challenging or impossible.
Preprint
This work provides an interdisciplinary introduction to the mathematical modeling of biological neurons as the foundation of artificial neural networks. It presents the early theoretical work of McCulloch and Rosenblatt, who abstracted key aspects of neural biophysics, and concludes with practical applications in healthcare, including diagnostics, pharmaceutical research, and therapy prediction.
Article
Deep learning models trained for facial recognition now surpass the highest performing human participants. Recent evidence suggests that they also model some qualitative aspects of face processing in humans. This review compares the current understanding of deep learning models with psychological models of the face processing system. Psychological models consist of two components that operate on the information encoded when people perceive a face, which we refer to here as ‘face codes’. The first component, the core system, extracts face codes from retinal input that encode invariant and changeable properties. The second component, the extended system, links face codes to personal information about a person and their social context. Studies of face codes in existing deep learning models reveal some surprising results. For example, face codes in networks designed for identity recognition also encode expression information, which contrasts with psychological models that separate invariant and changeable properties. Deep learning can also be used to implement candidate models of the face processing system, for example to compare alternative cognitive architectures and codes that might support interchange between core and extended face processing systems. We conclude by summarizing seven key lessons from this research and outlining three open questions for future study.
Article
Full-text available
Artificial Intelligence is a field that lives many lives, and the term has come to encompass a motley collection of scientific and commercial endeavours. In this paper, I articulate the contours of a rather neglected but central scientific role that AI has to play, which I dub “AI-as-exploration”. The basic thrust of AI-as-exploration is that of creating and studying systems that can reveal candidate building blocks of intelligence that may differ from the forms of human and animal intelligence we are familiar with. In other words, I suggest that AI is one of the best tools we have for exploring intelligence space, namely the space of possible intelligent systems. I illustrate the value of AI-as-exploration by focusing on a specific case study, i.e., recent work on the capacity to combine novel and invented concepts in humans and Large Language Models. I show that the latter, despite showing human-level accuracy in such a task, most probably solve it in ways radically different from those hypothesised for humans, but no less relevant to intelligence research.
Conference Paper
Full-text available
Abstract— Because deep learning is so adept at finding patterns in complex, high-dimensional data, it has revolutionized computer vision, beating traditional machine learning models. The development of brain imaging technology has led to extensive multim... Keywords: deep learning algorithms, Alzheimer's disease, classification. This chapter presents a review of recent neuroimaging studies that used deep learning techniques to enhance AD classification and evaluation. Articles published between January 2013 and July 2018 were searched on PubMed and Google Scholar for "Alzheimer's disease" and "deep learning". Following a review of the articles, the findings were rated based on the classification technique and the kind of neuroimaging data utilized in each study. Twelve of the sixteen included studies exclusively used deep learning models (CNNs for eleven [5–14] and a Bidirectional Long Short-Term Memory network, Bi-LSTM, for one study); the other four studies mixed classical machine learning with deep learning techniques [1–4]. The most accurate diagnostic results were obtained when these multimodal techniques were combined with biomarkers found in cerebrospinal fluid (used for further investigations). Deep learning algorithms hold much promise for the diagnostic criteria used to classify AD, since they are becoming more efficient. Understanding of AD is still in its early stages, but deep learning and explainable technologies like these may help shed more light on the symptoms of the illness.
Chapter
Artificial systems for solving difficult problems of pattern recognition are still very inefficient compared to the human visual system — at least as regards the error rate. On the other hand, characters are formed in such a manner that they can be easily distinguished by the human visual system, which defines their meaning. It is not necessary that an artificial recognition system be constructed in the same way as neuronal systems; but if it is based on the same principles of perception — as far as they are known — it might have a better chance of performing the same recognition operation.
Conference Paper
In this paper, I propose a new algorithm for self-organizing a multilayered neural network which has an ability to recognize patterns based on the geometrical similarity of their shapes. This network, whose nickname is "neo-cognitron", has a structure similar to the hierarchy model of the visual nervous system proposed by Hubel and Wiesel. The network consists of a photoreceptor layer followed by a cascade connection of a number of modular structures, each of which is composed of two layers of cells connected in a cascade. The first layer of each module consists of "S-cells", which show characteristics similar to simple cells or lower order hypercomplex cells, and the second layer consists of "C-cells" similar to complex cells or higher order hypercomplex cells. The input synapses to each S-cell have plasticity and are modifiable. The network has an ability of unsupervised learning: we don't need any "teacher" during the process of self-organization; it is only necessary to present a set of stimulus patterns repeatedly to the input layer. The network has been simulated on a digital computer. After completion of self-organization, each stimulus pattern has come to elicit its own response from the last C-cell layer. That is, the response of the last C-cell layer changes without fail if a stimulus pattern of a different category is presented to the input layer. The response of that layer, however, is not affected by the pattern's position at all. Neither is it affected by a certain amount of change in the pattern's shape or size.
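The S-cell/C-cell alternation described in this abstract can be caricatured in a few lines of NumPy: an S-layer performing thresholded template matching, followed by a C-layer that pools responses over a neighborhood so a small shift of the detected feature leaves the output unchanged. This is a toy illustration of the layering idea only — it omits the paper's self-organizing learning rule and its specific S-cell response function, and all templates and values below are hand-set for illustration:

```python
import numpy as np

def s_layer(x, templates, threshold=0.5):
    """S-cells: thresholded template matching over local windows
    (feature extraction), one response map per template."""
    k = templates.shape[1]
    h, w = x.shape
    maps = np.empty((len(templates), h - k + 1, w - k + 1))
    for t, tpl in enumerate(templates):
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                maps[t, i, j] = np.sum(x[i:i + k, j:j + k] * tpl)
    return np.where(maps > threshold, maps, 0.0)

def c_layer(maps, size=2):
    """C-cells: max-pool S-cell responses over a neighborhood, giving
    tolerance to small shifts of the detected feature."""
    n, h, w = maps.shape
    h, w = h // size * size, w // size * size
    m = maps[:, :h, :w]
    return m.reshape(n, h // size, size, w // size, size).max(axis=(2, 4))

# A vertical bar and a vertical-edge "template" (weights hand-set here;
# in the neocognitron they self-organize from repeated presentations).
img = np.zeros((8, 8)); img[2:6, 3] = 1.0
tpl = np.zeros((3, 3)); tpl[:, 1] = 1.0

c = c_layer(s_layer(img, tpl[None]))

# Shift the bar one pixel to the right: the C-layer response is identical.
img2 = np.zeros((8, 8)); img2[2:6, 4] = 1.0
c2 = c_layer(s_layer(img2, tpl[None]))
print(np.allclose(c, c2))  # True
```

Cascading such modules, each C-layer pooling over a larger effective region, is how the model accumulates the position tolerance reported in the abstract.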
Article
Of the many possible functions of the macaque monkey primary visual cortex (striate cortex, area 17) two are now fairly well understood. First, the incoming information from the lateral geniculate bodies is rearranged so that most cells in the striate cortex respond to specifically oriented line segments, and, second, information originating from the two eyes converges upon single cells. The rearrangement and convergence do not take place immediately, however: in layer IVc, where the bulk of the afferents terminate, virtually all cells have fields with circular symmetry and are strictly monocular, driven from the left eye or from the right, but not both; at subsequent stages, in layers above and below IVc, most cells show orientation specificity, and about half are binocular. In a binocular cell the receptive fields in the two eyes are on corresponding regions in the two retinas and are identical in structure, but one eye is usually more effective than the other in influencing the cell; all shades of ocular dominance are seen. These two functions are strongly reflected in the architecture of the cortex, in that cells with common physiological properties are grouped together in vertically organized systems of columns. In an ocular dominance column all cells respond preferentially to the same eye. By four independent anatomical methods it has been shown that these columns have the form of vertically disposed alternating left-eye and right-eye slabs, which in horizontal section form alternating stripes about 400 μm thick, with occasional bifurcations and blind endings.
Cells of like orientation specificity are known from physiological recordings to be similarly grouped in much narrower vertical sheet-like aggregations, stacked in orderly sequences so that on traversing the cortex tangentially one normally encounters a succession of small shifts in orientation, clockwise or counterclockwise; a 1 mm traverse is usually accompanied by one or several full rotations through 180 degrees, broken at times by reversals in direction of rotation and occasionally by large abrupt shifts. A full complement of columns, of either type, left-plus-right eye or a complete 180 degrees sequence, is termed a hypercolumn. Columns (and hence hypercolumns) have roughly the same width throughout the binocular part of the cortex. The two independent systems of hypercolumns are engrafted upon the well known topographic representation of the visual field. The receptive fields mapped in a vertical penetration through cortex show a scatter in position roughly equal to the average size of the fields themselves, and the area thus covered, the aggregate receptive field, increases with distance from the fovea. A parallel increase is seen in reciprocal magnification (the number of degrees of visual field corresponding to 1 mm of cortex). Over most or all of the striate cortex a movement of 1-2 mm, traversing several hypercolumns, is accompanied by a movement through the visual field about equal in size to the local aggregate receptive field. Thus any 1-2 mm block of cortex contains roughly the machinery needed to subserve an aggregate receptive field.
In the cortex the fall-off in detail with which the visual field is analysed, as one moves out from the foveal area, is accompanied not by a reduction in thickness of layers, as is found in the retina, but by a reduction in the area of cortex (and hence the number of columnar units) devoted to a given amount of visual field: unlike the retina, the striate cortex is virtually uniform morphologically but varies in magnification. In most respects the above description fits the newborn monkey just as well as the adult, suggesting that area 17 is largely genetically programmed. The ocular dominance columns, however, are not fully developed at birth, since the geniculate terminals belonging to one eye occupy layer IVc throughout its length, segregating out into separate columns only after about the first 6 weeks, whether or not the animal has visual experience. If one eye is sutured closed during this early period the columns belonging to that eye become shrunken and their companions correspondingly expanded. This would seem to be at least in part the result of interference with normal maturation, though sprouting and retraction of axon terminals are not excluded.
Article
A new hypothesis for the organization of synapses between neurons is proposed: the synapse from neuron x to neuron y is reinforced when x fires, provided that no neuron in the vicinity of y is firing more strongly than y. From this hypothesis, a new algorithm with which a multilayered neural network is effectively organized can be deduced. A self-organizing multilayered neural network, named "cognitron", is constructed following this algorithm and simulated on a digital computer. Unlike the organization of usual brain models such as the three-layered perceptron, the self-organization of a cognitron progresses favorably without a teacher instructing in all particulars how the individual cells should respond. After repetitive presentation of several stimulus patterns, the cognitron is self-organized in such a way that the receptive fields of the cells become relatively larger in deeper layers. Each cell in the final layer integrates information from all parts of the first layer and selectively responds to a specific stimulus pattern or feature.
References
  • Fukushima, K.: Improvement in pattern-selectivity of a cognitron
  • Hubel, D.H., Wiesel, T.N.: Receptive fields and functional architecture in two nonstriate visual areas (18 and 19) of the cat. J. Neurophysiol. 28, 229-289 (1965)
  • Hubel, D.H., Wiesel, T.N.: Functional architecture of macaque monkey visual cortex. Proc. R. Soc. London, Ser. B 198, 1-59 (1977)
  • Kabrisky, M.: A proposed model for visual information processing in the human brain. Urbana, London: Univ. of Illinois Press 1966
  • Meyer, R.L., Sperry, R.W.: Explanatory models for neuroplasticity in retinotectal connections. In: Plasticity and function in the central nervous system. Stein, D.G., Rosen, J.J., Butters, N. (eds.), pp. 45-63. New York, San Francisco, London: Academic Press 1974
  • Rosenblatt, F.: Principles of neurodynamics. Washington, D.C.: Spartan Books 1962
  • Sato, T., Kawamura, T., Iwai, E.: Responsiveness of neurons to visual patterns in inferotemporal cortex of behaving monkeys. J. Physiol. Soc. Jpn. 40, 285-286 (1978)

Received: October 28, 1979
Dr. Kunihiko Fukushima, NHK Broadcasting Science Research Laboratories, 1-10-11, Kinuta, Setagaya, Tokyo 157