Jurgen Schmidhuber’s research while affiliated with Dalle Molle Institute for Artificial Intelligence Research and other places


Publications (85)


Motion Dynamics Improve Speaker-Independent Lipreading
  • Conference Paper

May 2020 · 37 Reads · 5 Citations

Matteo Riva · Michael Wand · Jurgen Schmidhuber


Fig. 1: Example frame from GRID with extracted mouth area
Result breakdown with different noise levels, for the best setup. Improvement is given as error reduction compared to audio-only baseline. Averaged over development speakers.
Investigations on End-to-End Audiovisual Fusion
  • Conference Paper
  • Full-text available

April 2018 · 114 Reads · 31 Citations

Audiovisual speech recognition (AVSR) is a method to alleviate the adverse effect of noise in the acoustic signal. Leveraging recent developments in deep neural network-based speech recognition, we present an AVSR neural network architecture which is trained end-to-end, without the need to separately model the process of decision fusion as in conventional (e.g. HMM-based) systems. The fusion system outperforms single-modality recognition under all noise conditions. Investigation of the saliency of the input features shows that the neural network automatically adapts to different noise levels in the acoustic signal.


Discovering Boolean Gates in Slime Mould

January 2018 · 49 Reads · 18 Citations

Slime mould of Physarum polycephalum is a large cell exhibiting rich spatial non-linear electrical characteristics. We exploit the electrical properties of the slime mould to implement logic gates using a flexible hardware platform designed for investigating the electrical properties of a substrate (Mecobo). We apply arbitrary electrical signals to ‘configure’ the slime mould, i.e. change the shape of its body, and measure the slime mould’s electrical response. We show that it is possible to find configurations that allow the Physarum to act as any 2-input Boolean gate. The occurrence frequency of the gates discovered in the slime was analysed and compared to complexity hierarchies of logical gates obtained in other unconventional materials. The search for gates was performed both by sweeping across configurations in the real material and by training a neural network-based model and searching for the gates therein using gradient descent.
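The notion of "any 2-input Boolean gate" can be made concrete with a small sketch. There are 16 possible truth tables over two binary inputs, and a measured response can be thresholded and matched against them. The function name `classify_gate`, the response dictionary, and the 0.5 threshold are illustrative assumptions, not part of the Mecobo platform or the authors' method.

```python
from itertools import product

# The four input combinations (x, y) in a fixed order: (0,0), (0,1), (1,0), (1,1).
INPUTS = list(product([0, 1], repeat=2))

# All 16 two-input truth tables, keyed by their outputs in INPUTS order.
GATE_NAMES = {
    (0, 0, 0, 0): "FALSE",
    (0, 0, 0, 1): "x AND y",
    (0, 0, 1, 0): "x AND NOT y",
    (0, 0, 1, 1): "SELECT x",
    (0, 1, 0, 0): "NOT x AND y",
    (0, 1, 0, 1): "SELECT y",
    (0, 1, 1, 0): "x XOR y",
    (0, 1, 1, 1): "x OR y",
    (1, 0, 0, 0): "x NOR y",
    (1, 0, 0, 1): "x XNOR y",
    (1, 0, 1, 0): "NOT y",
    (1, 0, 1, 1): "x OR NOT y",
    (1, 1, 0, 0): "NOT x",
    (1, 1, 0, 1): "NOT x OR y",
    (1, 1, 1, 0): "x NAND y",
    (1, 1, 1, 1): "TRUE",
}


def classify_gate(response, threshold=0.5):
    """Threshold a (hypothetical) measured response per input pair and name the gate."""
    table = tuple(int(response[pair] > threshold) for pair in INPUTS)
    return GATE_NAMES[table]


# Hypothetical measurements: high output only when exactly one input is high.
example_response = {(0, 0): 0.1, (0, 1): 0.9, (1, 0): 0.8, (1, 1): 0.2}
example_gate = classify_gate(example_response)
```

A sweep over configurations would call `classify_gate` once per configuration and count how often each name occurs, which is one way to obtain the occurrence frequencies the abstract mentions.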


Table 2: Number of XOR gates found for given Input pin configurations.
Figure 3: Time-lapse of Physarum growing on agar with electrodes. Images are approximately 6 hours apart.  
Figure 6: Loss, true output, and prediction for all validation samples on each of the three tasks. The examples are sorted by their associated training loss.  
Discovering Boolean Gates in Slime Mould

July 2016 · 303 Reads · 14 Citations

Jan Koutnik · [...] · Andy Adamatzky

Slime mould of Physarum polycephalum is a large cell exhibiting rich spatial non-linear electrical characteristics. We exploit the electrical properties of the slime mould to implement logic gates using a flexible hardware platform designed for investigating the electrical properties of a substrate (MECOBO). We apply arbitrary electrical signals to ‘configure’ the slime mould, i.e. change the shape of its body, and measure the slime mould's electrical response. We show that it is possible to find configurations that allow the Physarum to act as any 2-input Boolean gate. The occurrence frequency of the gates discovered in the slime was analysed and compared to complexity hierarchies of logical gates obtained in other unconventional materials. The search for gates was performed both by sweeping across configurations in the real material and by training a neural network-based model and searching for the gates therein using gradient descent.





Fig. 2. System architecture. See text for details.
Fig. 3. Shaping improves the speed of policy learning, with an appropriate exploration setting.
Fig. 4. Value functions in various conditions of learning viewed as graphs. All graphs were constructed in the following way. The weight for the edge shown between each two EBs is the larger of the two directional transitions in the value function. In this way, a symmetric weighted adjacency matrix is constructed, which is row-normalized. (a). Initial value function using zeros. (b). State-action values of SARSA(λ) using shaping and memory (eligibility trace) after 20,000 transitions. (c). Transition values of T-learning using shaping and memory after 20,000 transitions. (d). State-action values when shaping was not used. (e). State-action values (outcome of SARSA) after the experiment where several EBs had a high chance of failure. (f). The result of T-learning using the same conditions as (e).
Reinforcement and shaping in learning action sequences with neural dynamics

December 2014 · 142 Reads · 3 Citations

Neural dynamics offer a theoretical and computational framework in which cognitive architectures may be developed that are suitable both to model the psychophysics of human behaviour and to control robotic behaviour. Recently, we have introduced reinforcement learning in this framework, which allows an agent to learn goal-directed sequences of behaviours based on a reward signal perceived at the end of a sequence. Although the stability of the dynamic neural fields and behavioural organisation allowed us to demonstrate autonomous learning in the robotic system, learning of longer sequences took a prohibitively long time. Here, we combine the neural dynamic reinforcement learning with shaping, which consists in providing intermediate rewards and accelerates learning. We have implemented the new learning algorithm on a simulated Kuka YouBot robot and evaluated the robustness and efficacy of learning in a pick-and-place task.
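The effect of shaping can be illustrated with a minimal tabular sketch. It is not the neural-dynamic implementation described in the abstract: the three-step sequence, the single action, the bonus of 0.1, and the learning parameters are all assumptions chosen for illustration. The update is standard SARSA(λ) with accumulating eligibility traces.

```python
def shaped_reward(step, n_steps, shaping, final_reward=1.0, bonus=0.1):
    """Reward for completing step `step` (0-based) of an n_steps sequence.

    With shaping, each completed intermediate step earns a small bonus
    (hypothetical value); the final step always earns the full reward.
    """
    if step == n_steps - 1:
        return final_reward
    return bonus if shaping else 0.0


def sarsa_lambda_step(Q, E, s, a, r, s_next, a_next,
                      alpha=0.1, gamma=0.9, lam=0.8):
    """One tabular SARSA(lambda) update with accumulating traces, in place."""
    delta = r + gamma * Q[s_next][a_next] - Q[s][a]  # temporal-difference error
    E[s][a] += 1.0                                   # mark the visited pair
    for i in range(len(Q)):
        for j in range(len(Q[i])):
            Q[i][j] += alpha * delta * E[i][j]       # credit recently visited pairs
            E[i][j] *= gamma * lam                   # decay all traces
    return delta


def run_episode(shaping, n_steps=3):
    """Walk once through a fixed sequence of n_steps behaviours (one action each)."""
    Q = [[0.0] for _ in range(n_steps + 1)]  # last row is the terminal state
    E = [[0.0] for _ in range(n_steps + 1)]
    for step in range(n_steps):
        r = shaped_reward(step, n_steps, shaping)
        sarsa_lambda_step(Q, E, step, 0, r, step + 1, 0)
    return Q


q_shaped = run_episode(shaping=True)
q_plain = run_episode(shaping=False)
```

In this sketch the shaped run produces a non-zero learning signal from the very first step, whereas the unshaped run updates nothing until the final reward arrives. Comparing `q_shaped[0][0]` with `q_plain[0][0]` shows the difference after a single episode.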


A comparison of algorithms and humans for mitosis detection

April 2014 · 51 Reads · 19 Citations

We consider the problem of detecting mitotic figures in breast cancer histology slides. We investigate whether the performance of state-of-the-art detection algorithms is comparable to the performance of humans when they are compared under fair conditions: our test subjects were not previously exposed to the task, and were required to learn their own classification criteria solely by studying the same training set available to algorithms. We designed and implemented a standardized web-based test based on the publicly available MITOS dataset, and compared results with the performance of the 6 top-scoring algorithms in the ICPR 2012 Mitosis Detection Contest. The problem is presented as a classification task on a balanced dataset. 45 different test subjects produced a total of 3009 classifications. The best individual (accuracy = 0.859 ± 0.012) is outperformed by the most accurate algorithm (accuracy = 0.873 ± 0.004). This suggests that state-of-the-art detection algorithms are likely limited by the size of the training set, rather than by lack of generalization ability.


Citations (72)


... Instead of fusing decisions, representation fusion is an alternative fusion approach for AVSR, e.g., via multi-modal attentions [16] or via gating [17,18]; for example, [18] proposed the gated multi-modal unit to dynamically fuse different feature streams. Another example of representation fusion is in [19][20][21], which used deep feed-forward networks to first create and then fuse audio and video representations. ...

Reference:

Reliability-Based Large-Vocabulary Audio-Visual Speech Recognition
Motion Dynamics Improve Speaker-Independent Lipreading
  • Citing Conference Paper
  • May 2020

... Surface electromyography, which can be used to detect the activities of facial muscles during speech production using electrodes, intrinsically suffers from high signal variability between speech sessions owing to variations in sensor placement [32]. Moreover, their usability is limited because they require the attachment of electrodes to the skin. ...

Adaptation of an EMG-Based Speech Recognizer via Meta-Learning
  • Citing Conference Paper
  • November 2019

... Audio speech recognition faces challenges in accurately transcribing spoken language due to background noise, variations in speech patterns, speaker accents, and homophones that can result in different words being transcribed identically. To address these challenges, researchers have explored combining audio features with visual features to improve the overall accuracy of speech recognition systems [1,2]. Despite the numerous advantages of audio-visual speech recognition, it encounters challenges that can lead to reduced accuracy. ...

Investigations on End-to-End Audiovisual Fusion

... The distribution demonstrates frequencies of discoveries of the four-input-one-output logical gates and could be used for characterisation of a computational power of the fungal substrates. This is accompanied by distributions of gates detected in experimental laboratory reservoir computing with slime mould Physarum polycephalum [18], succulent plant [8] and numerical modelling experiments on computing with protein verotoxin [3], actin bundles network [9], and actin monomer [4]. The distributions of gates discovered in natural systems are alike to each other in the hierarchies of the gates frequencies. ...

Discovering Boolean Gates in Slime Mould
  • Citing Chapter
  • January 2018

... In this way, the adversarial loss has been recently employed to learn multi-modality by assigning different weights to loss for boundary consistency (Connelly Barnes, 2009; Kaiming He J. S., 2014). In addition, the multi-column structure (Dan Ciregan, 2012; Yingying Zhang, 2016; Forest Agostinelli, 2013) is used in the model since it can decompose images into components with different receptive fields and feature resolutions. Unlike multi-scale or coarse-to-fine strategies (Chao Yang, 2017; T. ...

Multi-column Deep Neural Networks for Image Classification
  • Citing Conference Paper
  • June 2012

... And the most rare gate is a logical exclusion "x XOR y". Meanwhile, the frequencies of Boolean gates for the plasmodium of Physarum polycephalum are quite different [14]. First, the two gates "SELECT x" and "SELECT y" occur most frequently, but the gates "NOT x AND y" and "x AND NOT y" do not occur at all. ...

Discovering Boolean Gates in Slime Mould

... In the analysis phase, more and more successful results are obtained with deep learning methods [4]. When lip-reading studies using deep learning methods are examined, it is seen that the studies focus mainly on the English language [5][6][7][8]. At the same time, it is seen that there are many data sets for lip reading in English in order to be able to do these studies. ...

Lipreading with long short-term memory
  • Citing Conference Paper
  • March 2016

... Instead of relying on an explicitly defined reward function, IL uses expert demonstrations to model the desired behavior. This paradigm has proven effective in various real-world applications, such as autonomous driving (Bojarski et al., 2016; Codevilla et al., 2019) and robotics (Giusti et al., 2015; Finn et al., 2016). A prominent method within IL is Behavior Cloning (BC), where supervised learning is applied to a dataset of state-action pairs D = {(s, a = π_e(s))}, provided by an expert policy π_e. ...

A Machine Learning Approach to Visual Perception of Forest Trails for Mobile Robots
  • Citing Article
  • January 2015

IEEE Robotics and Automation Letters

... An agent can learn a goal-directed sequence of actions using Reinforcement Learning (RL). For example, [16] demonstrates a robot learning a pick-and-place behavior, but policies learned from RL are not necessarily general and applicable to a wide variety of novel scenarios. We demonstrate here that the robot can apply a generalized action in a scenario that it has not seen before. ...

Reinforcement and shaping in learning action sequences with neural dynamics

... After the pre-processing operations, the region of interest (ROI) is divided or broken down, which is referred to as "image segmentation" [26,34]. This division of an image into multiple simple and meaningful components is then termed as image segmentation. ...

A comparison of algorithms and humans for mitosis detection
  • Citing Conference Paper
  • April 2014