Conference Paper

Stochastic Schemata Exploiter-Based Optimization of Convolutional Neural Network


Epigenetics' flexibility in the fine-grained manipulation of genes makes possible evolutionary mechanisms of unprecedented refinement and diversity. From the epigenetic perspective, the main limitations to improving the stability and accuracy of genetic algorithms are: (1) the unchangeable nature of the external environment, which leads to excessive disorder in the changed phenotype after mutation and crossover; and (2) premature convergence caused by the limited types of epigenetic operators. In this paper, a probabilistic environmental gradient-driven genetic algorithm (PEGA) that considers epigenetic traits is proposed. To improve local convergence efficiency and obtain a stable local search, a probabilistic environmental gradient (PEG) descent strategy, together with a multi-dimensional heterogeneous exponential environmental vector, preferentially generates more offspring along the gradient in the solution space. Moreover, to balance exploration and exploitation at different evolutionary stages, a variable nucleosome reorganization (VNR) operator dynamically adjusts the number of genes involved in mutation and crossover. Building on these operators, three further epigenetic operators are introduced to mitigate premature convergence by enriching genetic diversity. Experimental results on the open Congress on Evolutionary Computation 2017 (CEC'17) benchmark in 10, 30, 50, and 100 dimensions indicate that the proposed method outperforms 10 state-of-the-art evolutionary and swarm algorithms in accuracy and stability. The ablation analysis shows that, for accuracy and stability, the fusion of PEG and VNR is effective on 96.55% of the test functions and can improve the indicators by up to four orders of magnitude.
Furthermore, PEGA obtains the best solution quality on a real-world spacecraft trajectory optimization problem.
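The abstract describes the VNR operator only at a high level: it dynamically adjusts the number of genes involved in mutation and crossover. The Python sketch below illustrates that idea under a loudly stated assumption, namely a hypothetical linear schedule that shrinks the number of mutated genes from `n_max` early in the run (exploration) to `n_min` late in the run (exploitation). The names `genes_to_mutate` and `vnr_like_mutation` are illustrative and are not PEGA's actual operator.

```python
import random
from typing import List

def genes_to_mutate(gen: int, max_gen: int, n_max: int, n_min: int) -> int:
    """Hypothetical linear schedule: many genes early (exploration), few late (exploitation)."""
    frac = gen / max(1, max_gen - 1)
    return round(n_max - frac * (n_max - n_min))

def vnr_like_mutation(x: List[float], k: int, step: float, rng: random.Random) -> List[float]:
    """Perturb exactly k distinct, randomly chosen genes by +step or -step."""
    child = list(x)
    for i in rng.sample(range(len(x)), k):
        child[i] += rng.choice((-1.0, 1.0)) * step
    return child

rng = random.Random(0)
parent = [0.0] * 10
for gen in (0, 50, 99):
    k = genes_to_mutate(gen, max_gen=100, n_max=8, n_min=1)
    child = vnr_like_mutation(parent, k, step=0.1, rng=rng)
    changed = sum(a != b for a, b in zip(parent, child))
    print(f"generation {gen}: k={k}, genes changed={changed}")
```

Any monotone schedule (exponential decay, fitness-driven adaptation) would fit the same slot; the linear one is used only because it is the easiest to read.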
The Stochastic Schemata Exploiter (SSE), an evolutionary algorithm, is designed to find the optimal solution of a function. SSE extracts common schemata from sets of high-fitness individuals and generates new individuals from those common schemata. For hyper-parameter optimization, the processes characteristic of SSE (the initialization method, the schema extraction method, and the new-individual generation method) are extended. In this paper, an SSE-based multi-objective optimization for AutoML is proposed. AutoML gives good results in terms of model accuracy; however, if only accuracy is considered, the resulting model may be too complex, and such complex models are not always acceptable because of their long computation time. The proposed method simultaneously maximizes the accuracy of the stacking model and minimizes its complexity. Compared with existing methods, SSE has attractive features such as fewer control parameters and faster convergence. The visualization method makes the optimization process transparent and helps users understand it.
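A two-objective criterion such as "maximize accuracy, minimize complexity" is commonly handled through Pareto dominance. The Python sketch below shows that comparison; the `(accuracy, complexity)` tuples are made-up values, and how the proposed SSE variant actually ranks or selects among non-dominated models is not specified here.

```python
from typing import List, Tuple

# Each candidate is (accuracy, complexity): accuracy is maximized, complexity minimized.
Candidate = Tuple[float, float]

def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b in both objectives and strictly better in at least one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better

def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates (a point never dominates itself)."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]

# Hypothetical models: (validation accuracy, complexity measure)
models = [(0.91, 120), (0.89, 40), (0.85, 60), (0.93, 300)]
print(pareto_front(models))
```

Here `(0.85, 60)` drops out because `(0.89, 40)` is both more accurate and simpler; the remaining three represent different accuracy/complexity trade-offs.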
Conference Paper
There has been a recent surge of success in applying Deep Learning (DL) to imaging and speech, owing to its relatively automatic feature generation and, in particular for convolutional neural networks (CNNs), its high-accuracy classification. While these models learn their parameters through data-driven methods, model selection (i.e., architecture construction) through hyper-parameter choices remains a tedious and highly intuition-driven task. To address this, Multi-node Evolutionary Neural Networks for Deep Learning (MENNDL) is proposed as a method for automating network selection on computational clusters through hyper-parameter optimization performed via genetic algorithms.
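MENNDL's concrete encoding and operators are not described in this abstract. The Python sketch below shows the general pattern of GA-based hyper-parameter search under explicit assumptions: the search space `SPACE`, truncation selection, uniform crossover, per-gene resampling mutation, and especially `toy_fitness` are all hypothetical. In a real run, `toy_fitness` would be replaced by training the candidate CNN and measuring validation accuracy, presumably the expensive step that running on a cluster is meant to parallelize.

```python
import math
import random

# Hypothetical search space; MENNDL's actual encoding may differ.
SPACE = {
    "num_layers": [1, 2, 3, 4, 5, 6],
    "filters": [16, 32, 64, 128],
    "kernel": [3, 5, 7],
}
KEYS = list(SPACE)

def toy_fitness(g):
    """Stand-in for validation accuracy; a real run would train and evaluate a CNN."""
    return -((g["num_layers"] - 4) ** 2
             + (math.log2(g["filters"]) - 6) ** 2
             + (g["kernel"] - 3) ** 2 / 4)

def random_genome(rng):
    return {k: rng.choice(SPACE[k]) for k in KEYS}

def crossover(a, b, rng):
    """Uniform crossover: each hyper-parameter comes from either parent."""
    return {k: (a if rng.random() < 0.5 else b)[k] for k in KEYS}

def mutate(g, rng, rate=0.2):
    """Resample each hyper-parameter from its allowed values with probability `rate`."""
    return {k: rng.choice(SPACE[k]) if rng.random() < rate else v for k, v in g.items()}

def evolve(pop_size=8, generations=20, seed=0):
    rng = random.Random(seed)
    pop = [random_genome(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=toy_fitness, reverse=True)
        history.append(toy_fitness(pop[0]))
        parents = pop[: pop_size // 2]          # truncation selection
        children = [pop[0]]                     # elitism: keep the best genome
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        pop = children
    return max(pop, key=toy_fitness), history

best, history = evolve()
print(best, history[-1])
```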
We present a highly accurate single-image super-resolution (SR) method. Our method uses a very deep convolutional network inspired by the VGG-net used for ImageNet classification \cite{simonyan2015very}. We find that increasing network depth significantly improves accuracy. Our final model uses 20 weight layers. By cascading small filters many times in a deep network structure, contextual information over large image regions is exploited efficiently. With very deep networks, however, convergence speed becomes a critical issue during training. We propose a simple yet effective training procedure: we learn residuals only and use extremely high learning rates ($10^4$ times higher than SRCNN \cite{dong2015image}), enabled by adjustable gradient clipping. Our method outperforms existing methods in accuracy, and the visual improvements in our results are easily noticeable.
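As we understand the VDSR paper, adjustable gradient clipping bounds each gradient component to $[-\theta/\gamma, \theta/\gamma]$, where $\gamma$ is the current learning rate, so the actual update $\gamma g$ never exceeds $\theta$ in magnitude even when $\gamma$ is very large. Residual learning means the network predicts only the difference between the high-resolution target and the interpolated low-resolution input. The minimal Python sketch below shows both ideas on plain lists; the function names are hypothetical and this is not the authors' training code.

```python
from typing import List

def adjustable_clip(grads: List[float], theta: float, lr: float) -> List[float]:
    """Clip each gradient to [-theta/lr, theta/lr] so that |lr * g| <= theta."""
    bound = theta / lr
    return [max(-bound, min(bound, g)) for g in grads]

def sgd_step(weights: List[float], grads: List[float], theta: float, lr: float) -> List[float]:
    """Plain SGD update using the adjustably clipped gradients."""
    clipped = adjustable_clip(grads, theta, lr)
    return [w - lr * g for w, g in zip(weights, clipped)]

def reconstruct(interpolated_lr: List[float], predicted_residual: List[float]) -> List[float]:
    """Residual learning: the network predicts HR minus interpolated LR; the output adds it back."""
    return [x + r for x, r in zip(interpolated_lr, predicted_residual)]

print(adjustable_clip([5.0, -0.5, -3.0], theta=1.0, lr=0.5))   # bound is 2.0
print(sgd_step([0.0], [100.0], theta=1.0, lr=0.5))            # step limited to theta
```

Raising the learning rate tightens the clipping bound proportionally, which is why the step size stays controlled at learning rates far above those of SRCNN.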
We propose a deep convolutional neural network architecture codenamed "Inception", which set the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014). The main hallmark of this architecture is the improved utilization of computing resources inside the network, achieved by a carefully crafted design that increases the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our ILSVRC 2014 submission is called GoogLeNet, a 22-layer-deep network whose quality is assessed in the context of classification and detection.
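Much of the efficiency of Inception modules comes from 1×1 convolutions that reduce the channel count before the expensive 3×3 and 5×5 convolutions, with the branch outputs concatenated along the channel dimension. The Python arithmetic below is a small sketch of that effect; the channel counts (192 in, reduced to 16, 32 out, and the four branch widths) are chosen for illustration, and biases are ignored.

```python
def conv_weights(c_in: int, c_out: int, k: int) -> int:
    """Number of weights in a k x k convolution (biases ignored)."""
    return c_in * c_out * k * k

def concat_channels(*branch_channels: int) -> int:
    """Concatenating branch outputs along the channel axis adds their channel counts."""
    return sum(branch_channels)

c_in, reduce_to, c_out = 192, 16, 32   # illustrative channel counts
direct = conv_weights(c_in, c_out, 5)
with_bottleneck = conv_weights(c_in, reduce_to, 1) + conv_weights(reduce_to, c_out, 5)
print(direct, with_bottleneck)         # 153600 15872
print(concat_channels(64, 128, 32, 32))  # 256
```

With these numbers the 1×1 bottleneck cuts the 5×5 branch's weights by almost a factor of ten, which is the kind of saving that lets the network grow deeper and wider at a fixed budget.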
This paper reviews the use of evolutionary algorithms (EAs) to optimize artificial neural networks (ANNs). First, we briefly introduce the basic principles of ANNs and EAs and, by analyzing the advantages and disadvantages of each, explain the benefits of using EAs to optimize ANNs. We then briefly survey the basic theories and algorithms for optimizing the weights, the network architecture, and the learning rules, and discuss recent research in each of these three areas. Finally, we speculate on new trends in the development of this area.
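To make "optimizing the weights" with an EA concrete, the Python sketch below evolves the nine weights of a hypothetical 2-2-1 tanh network on XOR with a simple elitist (1+λ) evolution strategy. The network, loss, offspring count `lam`, and mutation scale `sigma` are illustrative assumptions, not methods taken from the review; evolving architectures or learning rules is not shown.

```python
import math
import random

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def forward(w, x):
    """2-2-1 network with tanh hidden units; w holds 9 weights (including biases)."""
    h1 = math.tanh(w[0] * x[0] + w[1] * x[1] + w[2])
    h2 = math.tanh(w[3] * x[0] + w[4] * x[1] + w[5])
    return sigmoid(w[6] * h1 + w[7] * h2 + w[8])

def loss(w):
    """Mean squared error over the four XOR patterns."""
    return sum((forward(w, x) - y) ** 2 for x, y in XOR) / len(XOR)

def one_plus_lambda_es(generations=300, lam=10, sigma=0.5, seed=0):
    rng = random.Random(seed)
    parent = [rng.gauss(0.0, 1.0) for _ in range(9)]
    parent_loss = loss(parent)
    history = [parent_loss]
    for _ in range(generations):
        # lam offspring, each a Gaussian perturbation of the current parent
        children = [[w + rng.gauss(0.0, sigma) for w in parent] for _ in range(lam)]
        best = min(children, key=loss)
        best_loss = loss(best)
        if best_loss <= parent_loss:   # elitist replacement: the parent never gets worse
            parent, parent_loss = best, best_loss
        history.append(parent_loss)
    return parent, history

weights, history = one_plus_lambda_es()
print(history[0], history[-1])
```

No gradients are computed anywhere; the only requirement on the network is that its loss can be evaluated, which is the main practical advantage of EA-based weight optimization.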
Conference Paper
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state of the art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers, we employed a recently developed regularization method called dropout that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
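Top-1 and top-5 error count an example as wrong when its true class is not among the 1 (or 5) highest-scoring classes. A minimal Python sketch, using made-up scores for three examples and four classes (so top-2 stands in for top-5):

```python
from typing import Sequence

def top_k_error(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Fraction of examples whose true label is not among the k highest-scoring classes."""
    misses = 0
    for row, y in zip(scores, labels):
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        misses += y not in top_k
    return misses / len(labels)

# Hypothetical class scores (e.g. softmax outputs) and true labels
scores = [
    [0.1, 0.6, 0.2, 0.1],   # true class 1: correct at top-1
    [0.5, 0.1, 0.3, 0.1],   # true class 2: wrong at top-1, correct at top-2
    [0.4, 0.3, 0.2, 0.1],   # true class 3: wrong at top-1 and top-2
]
labels = [1, 2, 3]
print(top_k_error(scores, labels, 1), top_k_error(scores, labels, 2))
```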
Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets. © 2014 Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever and Ruslan Salakhutdinov.
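The dropout scheme described above can be sketched in a few lines of Python. Following the original formulation as we understand it, each unit is kept with probability `p_retain` during training, and at test time all units are used with their outputs (equivalently, their outgoing weights) scaled by `p_retain`, so the expected input to the next layer matches training. The function names are hypothetical.

```python
import random
from typing import List

def dropout_train(x: List[float], p_retain: float, rng: random.Random) -> List[float]:
    """Training: keep each unit with probability p_retain, otherwise set it to zero."""
    return [xi if rng.random() < p_retain else 0.0 for xi in x]

def dropout_test(x: List[float], p_retain: float) -> List[float]:
    """Test: use every unit but scale by p_retain (same as scaling the outgoing weights)."""
    return [p_retain * xi for xi in x]

rng = random.Random(0)
activations = [1.0] * 1000
thinned = dropout_train(activations, 0.5, rng)
print(sum(thinned) / len(thinned))           # close to 0.5 on average
print(dropout_test([2.0, 4.0], 0.5))         # [1.0, 2.0]
```

Many modern frameworks instead use "inverted" dropout, dividing by `p_retain` during training so that no rescaling is needed at test time; both give the same expected activations.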
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers, 8× deeper than VGG nets but still of lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won first place in the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are the foundation of our submissions to the ILSVRC & COCO 2015 competitions, where we also won first place in ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
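The residual reformulation means a block outputs $\mathrm{ReLU}(F(x) + x)$ instead of learning the mapping directly. The Python sketch below uses plain lists and fully connected layers in place of convolutions (a simplifying assumption) to show the key property: if the residual branch $F$ has all-zero weights, the block passes its input through unchanged apart from the final ReLU.

```python
from typing import List

Vec = List[float]
Mat = List[Vec]

def linear(weights: Mat, x: Vec) -> Vec:
    """Matrix-vector product (no bias)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def relu(x: Vec) -> Vec:
    return [max(0.0, v) for v in x]

def residual_block(x: Vec, w1: Mat, w2: Mat) -> Vec:
    """y = ReLU(F(x) + x) with F(x) = W2 ReLU(W1 x); W1, W2 must keep the dimension of x."""
    f = linear(w2, relu(linear(w1, x)))
    return relu([fi + xi for fi, xi in zip(f, x)])

zeros = [[0.0] * 3 for _ in range(3)]
print(residual_block([1.0, -2.0, 3.0], zeros, zeros))   # identity shortcut, then ReLU
```

Because the identity path is always present, a deeper stack can fall back to the behavior of a shallower one simply by driving $F$ toward zero, which is the intuition behind "easier to optimize".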
We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions. The method is straightforward to implement and is based on adaptive estimates of lower-order moments of the gradients. The method is computationally efficient, has modest memory requirements, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The method is invariant to diagonal rescaling of the gradients, adapting to the geometry of the objective function. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, by which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. We demonstrate that Adam works well in practice when experimentally compared to other stochastic optimization methods.
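The "adaptive estimates of lower-order moments" are exponential moving averages of the gradient and of its square, each corrected for its zero initialization, with the first divided by the square root of the second. Below is a scalar Python sketch using the default hyper-parameters suggested in the Adam paper (β1 = 0.9, β2 = 0.999, ε = 1e-8, step size 0.001); the function name and the quadratic test problem are illustrative.

```python
import math
from typing import Callable

def adam_minimize(grad: Callable[[float], float], x0: float, lr: float = 0.001,
                  beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8,
                  steps: int = 1000) -> float:
    """Scalar Adam using bias-corrected first and second moment estimates of the gradient."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g          # first moment (mean of gradients)
        v = beta2 * v + (1 - beta2) * g * g      # second raw moment (uncentered variance)
        m_hat = m / (1 - beta1 ** t)             # bias correction for zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3)
print(adam_minimize(lambda x: 2 * (x - 3), 0.0, lr=0.1, steps=2000))
```

Because of the bias correction, the first step has magnitude close to `lr` regardless of the gradient's scale, which is one concrete form of the rescaling invariance mentioned above.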
In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively.
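The increased depth in these networks is obtained by stacking very small 3×3 convolutions. A stack of $n$ stride-1 3×3 layers sees a $(2n+1)\times(2n+1)$ region, so three layers cover a 7×7 receptive field while using $27C^2$ weights instead of $49C^2$ for a single 7×7 layer with $C$ input and output channels. The Python arithmetic below checks this; $C = 256$ is an arbitrary illustrative choice and biases are ignored.

```python
def stacked_receptive_field(num_layers: int, k: int = 3) -> int:
    """Receptive field of num_layers stride-1 k x k convolutions stacked on each other."""
    return 1 + num_layers * (k - 1)

def stacked_weights(num_layers: int, channels: int, k: int = 3) -> int:
    """Weights (no biases) of num_layers k x k convs, each mapping C channels to C channels."""
    return num_layers * k * k * channels * channels

C = 256
print(stacked_receptive_field(3))                    # 7
print(stacked_weights(3, C), 7 * 7 * C * C)          # 1769472 3211264
```

The stack also inserts two extra nonlinearities between input and output, which the single 7×7 layer lacks.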
The ability to accurately represent sentences is central to language understanding. We describe a convolutional architecture dubbed the Dynamic Convolutional Neural Network (DCNN) that we adopt for the semantic modelling of sentences. The network uses Dynamic k-Max Pooling, a global pooling operation over linear sequences. The network handles input sentences of varying length and induces a feature graph over the sentence that is capable of explicitly capturing short and long-range relations. The network does not rely on a parse tree and is easily applicable to any language. We test the DCNN in four experiments: small scale binary and multi-class sentiment prediction, six-way question classification and Twitter sentiment prediction by distant supervision. The network achieves excellent performance in the first three tasks and a greater than 25% error reduction in the last task with respect to the strongest baseline.
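Dynamic k-max pooling keeps the k largest values of a sequence in their original order, with k chosen per layer from the sentence length. As we understand the DCNN paper, for convolutional layer $l$ of $L$, sentence length $s$, and a fixed top-level value $k_{top}$, $k_l = \max(k_{top}, \lceil \frac{L-l}{L} s \rceil)$. A minimal Python sketch on one feature row (function names hypothetical):

```python
from typing import List

def k_max_pool(seq: List[float], k: int) -> List[float]:
    """Keep the k largest values, preserving their original order in the sequence."""
    if k >= len(seq):
        return list(seq)
    idx = sorted(range(len(seq)), key=lambda i: seq[i], reverse=True)[:k]
    return [seq[i] for i in sorted(idx)]

def dynamic_k(layer: int, num_layers: int, sentence_len: int, k_top: int) -> int:
    """k_l = max(k_top, ceil((L - l) * s / L)), computed with exact integer ceiling division."""
    numerator = (num_layers - layer) * sentence_len
    return max(k_top, (numerator + num_layers - 1) // num_layers)

print(k_max_pool([3.0, 1.0, 4.0, 1.5, 5.0], 2))              # [4.0, 5.0], order kept
print([dynamic_k(l, 3, 18, 3) for l in (1, 2, 3)])            # [12, 6, 3]
```

Preserving the original order is what distinguishes k-max pooling from simply sorting: the pooled row still reflects where in the sentence the strongest features occurred.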
A new population-oriented search algorithm called the Stochastic Schemata Exploiter (SSE) is proposed here. Like genetic algorithms (GAs), SSE searches the space of solutions through a schemata processing mechanism; however, SSE is characterized by an emphasis on local search. The adaptive global search ability of GAs has so far been viewed positively. In contrast, SSE takes the view that, when applied to actual optimization problems, this adaptive global search ability is not necessarily effective; instead, SSE simplifies the population-oriented search of GAs and reduces the number of control parameters. After reviewing schemata processing in GAs, the paper defines the property of schemata exploitation, devises a schemata exploitation search algorithm, and evaluates the proposed method against a simple GA on GA-easy and GA-hard test problems.
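The core of SSE can be sketched on binary strings. In this Python sketch, the population is sorted by fitness, each subpopulation $S_k$ holds the top-$k$ individuals, its common schema fixes the bits on which all members agree and marks the rest as "don't care", and a new individual is produced by filling the don't-care bits at random and applying bit-flip mutation. Choosing a subpopulation by roulette on its mean fitness, keeping the current best individual, the OneMax test function, and all parameter values are simplifying assumptions; this is not a faithful reimplementation of the original SSE.

```python
import random
from typing import List, Optional

Bits = List[int]
Schema = List[Optional[int]]   # None plays the role of the "don't care" symbol *

def onemax(x: Bits) -> int:
    """Toy fitness: number of 1 bits."""
    return sum(x)

def common_schema(group: List[Bits]) -> Schema:
    """Positions where every member agrees are fixed; the rest become None (*)."""
    return [col[0] if all(b == col[0] for b in col) else None for col in zip(*group)]

def instantiate(schema: Schema, rng: random.Random, p_mut: float) -> Bits:
    """Fill * positions randomly, then flip each bit with probability p_mut."""
    child = [rng.randint(0, 1) if s is None else s for s in schema]
    return [1 - b if rng.random() < p_mut else b for b in child]

def sse(n_bits=30, pop_size=10, generations=60, p_mut=0.02, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=onemax, reverse=True)
        history.append(onemax(pop[0]))
        # Subpopulation S_k = top-k individuals; its fitness is the members' mean fitness.
        subpops = [pop[:k] for k in range(1, pop_size + 1)]
        weights = [sum(map(onemax, s)) / len(s) for s in subpops]
        new_pop = [pop[0]]                        # keep the current best (assumed elitism)
        while len(new_pop) < pop_size:
            group = rng.choices(subpops, weights=weights)[0]
            new_pop.append(instantiate(common_schema(group), rng, p_mut))
        pop = new_pop
    return max(pop, key=onemax), history

best, history = sse()
print(onemax(best), history[:5], history[-1])
```

Note that $S_1$ contains only the best individual, so its schema is fully specified and the offspring is a mutated copy of the best solution; this is one way the emphasis on local search shows up.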
Conference Paper
Restricted Boltzmann machines were developed using binary stochastic hidden units. These can be generalized by replacing each binary unit with an infinite number of copies that all have the same weights but progressively more negative biases. The learning and inference rules for these “Stepped Sigmoid Units” are unchanged. They can be approximated efficiently by noisy, rectified linear units. Compared with binary units, these units learn features that are better for object recognition on the NORB dataset and face verification on the Labeled Faces in the Wild dataset. Unlike binary units, rectified linear units preserve information about relative intensities as information travels through multiple layers of feature detectors.
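The approximation above can be checked numerically. Summing sigmoid copies with biases -0.5, -1.5, -2.5, ... gives approximately $\log(1 + e^x)$ (softplus), which is in turn close to $\max(0, x)$. The Python sketch below compares the truncated sum with softplus and also shows a noisy rectified linear unit; taking the noise variance to be $\sigma(x)$ follows our reading of the paper and should be treated as an assumption. `n_copies = 100` truncates the infinite sum.

```python
import math
import random

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def stepped_sigmoid(x: float, n_copies: int = 100) -> float:
    """Sum of n_copies sigmoid units sharing input x, with biases -0.5, -1.5, -2.5, ..."""
    return sum(sigmoid(x - i + 0.5) for i in range(1, n_copies + 1))

def softplus(x: float) -> float:
    return math.log1p(math.exp(x))

def noisy_relu(x: float, rng: random.Random) -> float:
    """NReLU sketch: max(0, x + N(0, sigmoid(x))), with sigmoid(x) used as the noise variance."""
    return max(0.0, x + rng.gauss(0.0, math.sqrt(sigmoid(x))))

for x in (-2.0, 0.0, 3.0):
    print(x, round(stepped_sigmoid(x), 3), round(softplus(x), 3), max(0.0, x))
```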
Investigation of Real-valued Stochastic Schemata Exploiter
  • T Maruyama
  • E Kita
Extension of Stochastic Schemata Exploiter to real-valued problem
  • T Maruyama
  • E Kita