ABSTRACT: Theoretical and empirical evidence indicates that the depth of neural
networks is crucial for their success. However, training becomes more difficult
as depth increases, and training of very deep networks remains an open problem.
Here we introduce a new architecture designed to overcome this. Our so-called
highway networks allow unimpeded information flow across many layers on
information highways. They are inspired by Long Short-Term Memory recurrent
networks and use adaptive gating units to regulate the information flow. Even
with hundreds of layers, highway networks can be trained directly through
simple gradient descent. This enables the study of extremely deep and efficient
architectures.
ABSTRACT: Dependable cyber-physical systems strive to deliver anticipative, multi-objective performance anytime, facing deluges of inputs with varying and limited resources. This is even more challenging for life-long learning rational agents as they also have to contend with the varying and growing know-how accumulated from experience. These issues are of crucial practical value, yet have been only marginally and unsatisfactorily addressed in AGI research. We present a value-driven computational model of anytime bounded rationality robust to variations of both resources and knowledge. It leverages continually learned knowledge to anticipate, revise and maintain concurrent courses of action spanning over arbitrary time scales for execution anytime necessary.
8th Conference on Artificial General Intelligence AGI 2015; 07/2015
ABSTRACT: Convolutional Neural Networks (CNNs) can be shifted across 2D images or 3D
videos to segment them. They have a fixed input size and typically perceive
only small local contexts of the pixels to be classified as foreground or
background. In contrast, Multi-Dimensional Recurrent NNs (MD-RNNs) can perceive
the entire spatio-temporal context of each pixel in a few sweeps through all
pixels, especially when the RNN is a Long Short-Term Memory (LSTM). Despite
these theoretical advantages, however, unlike CNNs, previous MD-LSTM variants
were hard to parallelize on GPUs. Here we re-arrange the traditional cuboid
order of computations in MD-LSTM in pyramidal fashion. The resulting
PyraMiD-LSTM is easy to parallelize, especially for 3D data such as stacks of
brain slice images. PyraMiD-LSTM achieved best known pixel-wise brain image
segmentation results on MRBrainS13 (and competitive results on EM-ISBI12).
ABSTRACT: There is plenty of theoretical and empirical evidence that depth of neural
networks is a crucial ingredient for their success. However, network training
becomes more difficult with increasing depth and training of very deep networks
remains an open problem. In this extended abstract, we introduce a new
architecture designed to ease gradient-based training of very deep networks. We
refer to networks with this architecture as highway networks, since they allow
unimpeded information flow across several layers on "information highways". The
architecture is characterized by the use of gating units which learn to
regulate the flow of information through a network. Highway networks with
hundreds of layers can be trained directly using stochastic gradient descent
and with a variety of activation functions, opening up the possibility of
studying extremely deep and efficient architectures.
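
The gating idea described in this abstract can be illustrated with a small sketch. The abstract itself gives no formula; the code below assumes the commonly cited highway formulation y = T(x) * H(x) + (1 - T(x)) * x, with a tanh transform H and a sigmoid gate T. The helper names (`affine`, `highway_layer`, `make_layer`), the layer width, and the strongly negative gate bias are hypothetical choices made only for illustration, not the authors' implementation.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, x):
    """Return W @ x + b for a matrix W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def highway_layer(x, W_h, b_h, W_t, b_t):
    """One highway layer (assumed form): y = T(x) * H(x) + (1 - T(x)) * x."""
    h = [math.tanh(v) for v in affine(W_h, b_h, x)]  # candidate transform H(x)
    t = [sigmoid(v) for v in affine(W_t, b_t, x)]    # transform gate T(x) in (0, 1)
    # Where the gate is near 0 the input is carried through unchanged;
    # where it is near 1 the layer behaves like an ordinary nonlinear layer.
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


def make_layer(dim, gate_bias, rng):
    """Hypothetical initializer: small random weights, constant gate bias."""
    W_h = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]
    W_t = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]
    return W_h, [0.0] * dim, W_t, [gate_bias] * dim


rng = random.Random(0)
x = [0.5, -1.0, 2.0, 0.25]

# Stack 100 layers whose gates start almost closed (illustrative bias value).
layers = [make_layer(len(x), gate_bias=-20.0, rng=rng) for _ in range(100)]
y = x
for params in layers:
    y = highway_layer(y, *params)
```

With the gates almost closed, the stack passes its input through nearly unchanged, which is one way to picture the "information highways" the abstract refers to; training would then adjust the gates per unit and per input.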
ABSTRACT: Several variants of the Long Short-Term Memory (LSTM) architecture for
recurrent neural networks have been proposed since its inception in 1995. In
recent years, these networks have become the state-of-the-art models for a
variety of machine learning problems. This has led to a renewed interest in
understanding the role and utility of various computational components of
typical LSTM variants. In this paper, we present the first large-scale analysis
of eight LSTM variants on three representative tasks: speech recognition,
handwriting recognition, and polyphonic music modeling. The hyperparameters of
all LSTM variants for each task were optimized separately using random search
and their importance was assessed using the powerful fANOVA framework. In
total, we summarize the results of 5400 experimental runs (about 15 years of
CPU time), which makes our study the largest of its kind on LSTM networks. Our
results show that none of the variants can improve upon the standard LSTM
architecture significantly, and demonstrate the forget gate and the output
activation function to be its most critical components. We further observe that
the studied hyperparameters are virtually independent and derive guidelines for
their efficient adjustment.
ABSTRACT: In the absence of external guidance, how can a robot learn to map the many raw pixels of high-dimensional visual inputs to useful action sequences? We propose here Continual Curiosity driven Skill Acquisition (CCSA). CCSA makes robots intrinsically motivated to acquire, store and reuse skills. Previous curiosity-based agents acquired skills by associating intrinsic rewards with world model improvements, and used reinforcement learning to learn how to get these intrinsic rewards. CCSA also does this, but unlike previous implementations, the world model is a set of compact low-dimensional representations of the streams of high-dimensional visual information, which are learned through incremental slow feature analysis. These representations augment the robot's state space with new information about the environment. We show how this information can have a higher-level (compared to pixels) and useful interpretation, for example, whether or not the robot has grasped a cup in its field of view. After learning a representation, large intrinsic rewards are given to the robot for performing actions that greatly change the feature output, which otherwise tends to change slowly in time. We show empirically what these actions are (e.g., grasping the cup) and how they can be useful as skills. An acquired skill includes both the learned actions and the learned slow feature representation. Skills are stored and reused to generate new observations, enabling continual acquisition of complex skills. We present results of experiments with an iCub humanoid robot that uses CCSA to incrementally acquire skills to topple, grasp and pick-place a cup, driven by its intrinsic motivation from raw pixel vision.
ABSTRACT: The proliferative activity of breast tumors, which is routinely estimated by
counting of mitotic figures in hematoxylin and eosin stained histology
sections, is considered to be one of the most important prognostic markers.
However, mitosis counting is laborious, subjective and may suffer from low
inter-observer agreement. With the wider acceptance of whole slide images in
pathology labs, automatic image analysis has been proposed as a potential
solution for these issues. In this paper, the results from the Assessment of
Mitosis Detection Algorithms 2013 (AMIDA13) challenge are described. The
challenge was based on a data set consisting of 12 training and 11 testing
subjects, with more than one thousand annotated mitotic figures by multiple
observers. Short descriptions and results from the evaluation of eleven methods
are presented. The top performing method has an error rate that is comparable
to the inter-observer agreement among pathologists.
Medical Image Analysis 11/2014; DOI:10.1016/j.media.2014.11.010 · 3.65 Impact Factor
ABSTRACT: Recently proposed neural network activation functions such as rectified
linear, maxout, and local winner-take-all have allowed for faster and more
effective training of deep neural architectures on large and complex datasets.
The common trait among these functions is that they implement local competition
between small groups of units within a layer, so that only part of the network
is activated for any given input pattern. In this paper, we attempt to
visualize and understand this self-modularization, and suggest a unified
explanation for the beneficial properties of such networks. We also show how
our insights can be directly useful for efficiently performing retrieval over
large datasets using neural networks.
ABSTRACT: The automatic reconstruction of neurons from stacks of electron microscopy sections is an important computer vision problem in neuroscience. Recent advances are based on a two-step approach: First, a set of possible 2D neuron candidates is generated for each section independently based on membrane predictions of a local classifier. Second, the candidates of all sections of the stack are fed to a neuron tracker that selects and connects them in 3D to yield a reconstruction. The accuracy of the result is currently limited by the quality of the generated candidates. In this paper, we propose to replace the heuristic set of candidates used in previous methods with samples drawn from a conditional random field (CRF) that is trained to label sections of neural tissue. We show on a stack of Drosophila melanogaster neural tissue that neuron candidates generated with our method produce 30% fewer reconstruction errors than current candidate generation methods. Two properties of our CRF are crucial for the accuracy and applicability of our method: (1) The CRF models the orientation of membranes to produce more plausible neuron candidates. (2) The interactions in the CRF are restricted to form a bipartite graph, which allows a great sampling speed-up without loss of accuracy.
ABSTRACT: Four principal features of autonomous control systems are left both unaddressed and unaddressable by present-day engineering methodologies: (1) The ability to operate effectively in environments that are only partially known at design time; (2) A level of generality that allows a system to reassess and redefine the fulfillment of its mission in light of unexpected constraints or other unforeseen changes in the environment; (3) The ability to operate effectively in environments of significant complexity; and (4) The ability to degrade gracefully, that is, to continue striving to achieve its main goals when resources become scarce, or in light of other expected or unexpected constraining factors that impede its progress. We describe new methodological and engineering principles for addressing these shortcomings, which we have used to design a machine that becomes increasingly better at behaving in underspecified circumstances, in a goal-directed way, on the job, by modeling itself and its environment as experience accumulates. The work provides an architectural blueprint for constructing systems with high levels of operational autonomy in underspecified circumstances, starting from only a small amount of designer-specified code (a seed). Using value-driven dynamic priority scheduling to control the parallel execution of a vast number of lines of reasoning, the system accumulates increasingly useful models of its experience, resulting in recursive self-improvement that can be autonomously sustained after the machine leaves the lab, within the boundaries imposed by its designers. A prototype system named AERA has been implemented and demonstrated to learn a complex real-world task, real-time multimodal dialogue with humans, by on-line observation. Our work presents solutions to several challenges that must be solved for achieving artificial general intelligence.
ABSTRACT: An important part of human intelligence is the ability to use language. Humans learn how to use language in a society of language users, which is probably the most effective way to learn a language from the ground up. Principles that might allow artificial agents to learn language this way are not known at present. Here we present a framework which begins to address this challenge. Our auto-catalytic, endogenous, reflective architecture (AERA) supports the creation of agents that can learn natural language by observation. We present results from two experiments where our S1 agent learns human communication by observing two humans interacting in a real-time mock television interview, using gesture and situated language. Results show that S1 can learn multimodal complex language and multimodal communicative acts, using a vocabulary of 100 words with numerous sentence formats, by observing unscripted interaction between the humans, with no grammar being provided to it a priori, and only high-level information about the format of the human interaction in the form of high-level goals of the interviewer and interviewee and a small ontology. The agent learns the pragmatics, semantics, and syntax of complex sentences spoken by the human subjects on the topic of recycling of objects such as aluminum cans, glass bottles, plastic, and wood, as well as the use of manual deictic reference and anaphora.
IADIS International Conference on Intelligent Systems & Agents 2014; 07/2014
ABSTRACT: Dealing with high-dimensional input spaces, like visual input, is a challenging task for reinforcement learning (RL). Neuroevolution (NE), used for continuous RL problems, has to either reduce the problem dimensionality by (1) compressing the representation of the neural network controllers or (2) employing a pre-processor (compressor) that transforms the high-dimensional raw inputs into low-dimensional features. In this paper, we are able to evolve extremely small recurrent neural network (RNN) controllers for a task that previously required networks with over a million weights. The high-dimensional visual input, which the controller would normally receive, is first transformed into a compact feature vector through a deep, max-pooling convolutional neural network (MPCNN). Both the MPCNN preprocessor and the RNN controller are evolved successfully to control a car in the TORCS racing simulator using only visual input. This is the first use of deep learning in the context of evolutionary RL.
ABSTRACT: Traditional convolutional neural networks (CNN) are stationary and
feedforward. They neither change their parameters during evaluation nor use
feedback from higher to lower layers. Real brains, however, do. So does our
Deep Attention Selective Network (dasNet) architecture. DasNet's feedback
structure can dynamically alter its convolutional filter sensitivities during
classification. It harnesses the power of sequential processing to improve
classification performance, by allowing the network to iteratively focus its
internal attention on some of its convolutional filters. Feedback is trained
through direct policy search in a huge million-dimensional parameter space,
through scalable natural evolution strategies (SNES). On the CIFAR-10 and
CIFAR-100 datasets, dasNet outperforms the previous state-of-the-art model.
Advances in Neural Information Processing Systems 07/2014; 4.
ABSTRACT: We propose a method for learning specific object representations that can be applied (and reused) in visual detection and identification tasks. A machine learning technique called Cartesian Genetic Programming (CGP) is used to create these models based on a series of images. Our research investigates how manipulation actions might allow for the development of better visual models and therefore better robot vision.
This paper describes how visual object representations can be learned and improved by performing object manipulation actions, such as poke, push and pick-up, with a humanoid robot. The improvement can be measured and allows the robot to select and perform the ‘right’ action, i.e. the action with the best possible improvement of the detector.
World Congress on Computational Intelligence 2014 - International Joint Conference on Neural Networks (IJCNN), Beijing, China; 07/2014
ABSTRACT: How can a humanoid robot autonomously learn and refine multiple sensorimotor skills as a byproduct of curiosity-driven exploration, based on its high-dimensional unprocessed visual input? We present SKILLABILITY, which makes this possible. It combines the recently introduced Curiosity Driven Modular Incremental Slow Feature Analysis (Curious Dr. MISFA) with the well-known options framework. Curious Dr. MISFA's objective is to acquire abstractions as quickly as possible. These abstractions map high-dimensional pixel-level vision to a low-dimensional manifold. We find that each learnable abstraction augments the robot's state space (a set of poses) with new information about the environment, for example, when the robot is grasping a cup. The abstraction is a function on an image, called a slow feature, which can effectively discretize a high-dimensional visual sequence. For example, it maps the sequence of the robot watching its arm as it moves around, grasping randomly, then grasping a cup, and moving around some more while holding the cup, into a step function having two outputs: when the cup is or is not currently grasped. The new state space includes this grasped/not grasped information. Each abstraction is coupled with an option. The reward function for the option's policy (learned through Least Squares Policy Iteration) is high for transitions that produce a large change in the step-function-like slow features. This corresponds to finding bottleneck states, which are known good subgoals for hierarchical reinforcement learning; in the example, the subgoal corresponds to grasping the cup. The final skill includes both the learned policy and the learned abstraction. SKILLABILITY makes our iCub the first humanoid robot to learn complex skills such as to topple or grasp an object, from raw high-dimensional video input, driven purely by its intrinsic motivations.
IEEE International Joint Conference on Neural Networks, Beijing; 07/2014
ABSTRACT: In recent years, deep neural networks (including recurrent ones) have won
numerous contests in pattern recognition and machine learning. This historical
survey compactly summarises relevant work, much of it from the previous
millennium. Shallow and deep learners are distinguished by the depth of their
credit assignment paths, which are chains of possibly learnable, causal links
between actions and effects. I review deep supervised learning (also
recapitulating the history of backpropagation), unsupervised learning,
reinforcement learning & evolutionary computation, and indirect search for
short programs encoding deep and large networks.
ABSTRACT: Sequence prediction and classification are ubiquitous and challenging
problems in machine learning that can require identifying complex dependencies
between temporally distant inputs. Recurrent Neural Networks (RNNs) have the
ability, in theory, to cope with these temporal dependencies by virtue of the
short-term memory implemented by their recurrent (feedback) connections.
However, in practice they are difficult to train successfully when
long-term memory is required. This paper introduces a simple, yet powerful
modification to the standard RNN architecture, the Clockwork RNN (CW-RNN), in
which the hidden layer is partitioned into separate modules, each processing
inputs at its own temporal granularity, making computations only at its
prescribed clock rate. Rather than making the standard RNN models more complex,
CW-RNN reduces the number of RNN parameters, improves the performance
significantly in the tasks tested, and speeds up the network evaluation. The
network is demonstrated in preliminary experiments involving two tasks: audio
signal generation and TIMIT spoken word classification, where it outperforms
both RNN and LSTM networks.
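
The clocked update described in this abstract can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: each module holds a single hidden unit, the periods 1, 2 and 4 and all weights are made-up values, and the connectivity (each module reads from itself and from modules with longer periods) is an assumption about the architecture. The function name `cw_rnn_step` is hypothetical.

```python
import math


def cw_rnn_step(t, h, x, periods, w_in, w_rec):
    """One Clockwork-RNN-style step with one hidden unit per module.

    Module i is recomputed only when t is a multiple of periods[i];
    otherwise it keeps its previous value. Module i reads from itself and
    from modules with longer periods (index j >= i), an assumed connectivity.
    """
    new_h = list(h)
    for i, period in enumerate(periods):
        if t % period == 0:
            rec = sum(w_rec[i][j] * h[j] for j in range(i, len(h)))
            new_h[i] = math.tanh(w_in[i] * x + rec)
    return new_h


periods = [1, 2, 4]                     # illustrative clock periods, fastest first
w_in = [0.5, 0.5, 0.5]                  # made-up input weights
w_rec = [[0.1] * 3 for _ in range(3)]   # made-up recurrent weights

h = [0.0, 0.0, 0.0]
history = []   # hidden state after each time step
updated = []   # indices of the modules recomputed at each time step
for t, x in enumerate([1.0, 1.0, 1.0, 1.0]):
    h = cw_rnn_step(t, h, x, periods, w_in, w_rec)
    history.append(h)
    updated.append([i for i, p in enumerate(periods) if t % p == 0])
```

Because slow modules are skipped on most steps, they hold their state over longer spans and cost nothing on those steps, which is the intuition behind the parameter and evaluation savings the abstract mentions.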