A Machine Learning Based Approach to Control Network Activity
Sreedhar S. Kumar1*, Jan Wülfing2, Samora Okujeni1, Joschka Boedecker3, Martin
Riedmiller3 and Ulrich Egert1
1 University of Freiburg, Biomicrotechnology, Institute of Microsystems Engineering, Germany
2 University of Freiburg, Bernstein Center Freiburg, Germany
3 University of Freiburg, Machine Learning Lab, Germany
Keywords: reinforcement learning, microelectrode arrays, neuronal cultures, closed-loop stimulation
Motivation
Electrical stimulation of the brain is increasingly used as a strategy to alleviate the symptoms of a
range of neurological disorders, and as a possible means to artificially inject information into neural
circuits, e.g. towards bidirectional neural prostheses [1]. Conventionally, stimulation of neuronal
networks explicitly or implicitly assumes that the response to repeated constant stimuli is
predictable. The measured response, however, typically results from interaction with additional
neuronal activity not controlled by the stimulus [2]. Constant stimuli are therefore not optimal for
reliably inducing specific responses. Yet, without suitable models of the interaction between stimulus
and ongoing activity it is not possible to adjust individual stimuli such that a defined response feature
is achieved optimally.
To address these challenges, we propose an autonomous closed-loop paradigm using techniques of
Reinforcement Learning (RL). The approach raises three questions: how to (1) identify and
capture informative activity patterns in a quantifiable ‘state’ so that a well-posed control problem
can be formulated, (2) find the optimal stimulation strategy given a goal, and (3) evaluate the quality
of the solution found. In this study we consider a toy control problem defined for a generic network
of neurons. Our objective is to demonstrate how these questions could be addressed and thus apply
an RL controller to autonomously adjust stimulus settings without prior knowledge of the rules
governing the interaction of electrical stimulation with ongoing activity in the network.
Material and Methods
To develop the concept and techniques, we employed generic neuronal networks in vitro as a model
system. Cultured neuronal networks exhibit activity characterized by intermittent network-wide
spontaneous bursts (SB), separated by periods of reduced activity. Electrical stimulation of the
network between SBs also evokes bursts of action potentials (responses). For our experiments, we
selected one stimulation channel and one recording channel. Response strength depends on the latency
of the stimulus relative to the previous SB and can be described by a saturating exponential model
[3]. This latency period, however, may be cut short by an intervening SB. Stimulus efficacy, defined
here as the response strength per SB, therefore depends on both of these opposing factors. Using
phenomenological models, we show that their dynamic interplay presents a trade-off that admits a
unique, network-specific optimal stimulus latency maximizing stimulus efficacy. In this study, we asked
whether an RL-based controller can autonomously find the best balance in this trade-off, i.e., the
optimal stimulus latency. An open-loop characterization of each network was used
to make parametric model-based predictions of optimal stimulus latencies. The quality of the
controller's learned strategy was evaluated using these predictions.
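For illustration, the following sketch computes such an objective under assumed functional forms and parameter values; the saturation level, time constant, and SB rate below are hypothetical and merely stand in for the network-specific estimates obtained experimentally.

```python
import numpy as np

# Illustrative sketch only: functional forms and parameters are assumptions,
# not values from the study. Response strength saturates with stimulus latency
# t after the previous SB [3]; the chance that the waiting period passes
# without an intervening SB is modeled with an exponential survival function.

R_MAX, TAU = 1.0, 0.8     # hypothetical saturation level and time constant (s)
SB_RATE = 0.4             # hypothetical SB rate (1/s)

def response_strength(t):
    """Expected response strength for a stimulus delivered at latency t (s)."""
    return R_MAX * (1.0 - np.exp(-t / TAU))

def p_uninterrupted(t):
    """Probability that no SB occurs before latency t (survival model)."""
    return np.exp(-SB_RATE * t)

def stimulus_efficacy(t):
    """Weighted objective: expected response strength per SB."""
    return response_strength(t) * p_uninterrupted(t)

latencies = np.linspace(0.05, 10.0, 400)
t_opt = latencies[np.argmax(stimulus_efficacy(latencies))]
print(f"Predicted optimal stimulus latency: {t_opt:.2f} s")
```

Longer latencies increase response strength but also the risk that an SB intervenes first; the product of the two curves is quasi-concave and peaks at a single latency.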
Results
In order to extract the parameters of the response strength model, stimuli were first delivered at
random latencies relative to SBs in an open-loop setting. A statistical model of the probability of
occurrence of SBs was estimated from spontaneous activity recordings. Weighting the response
strengths with the probability of avoiding interruption yielded quasi-concave objective functions and
unique optimal latencies for each of the 20 networks studied (Fig. 1A,B).
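A minimal sketch of this estimation step is given below; the data arrays, the exponential response model, and the use of an empirical survival function for SB timing are illustrative assumptions rather than the exact procedure of the study.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical open-loop data: stimulus latencies (s) relative to the previous
# SB and the corresponding measured response strengths (arbitrary units).
stim_latencies = np.array([0.2, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
responses = np.array([0.20, 0.45, 0.65, 0.80, 0.85, 0.93, 0.96, 0.99])

def saturating_exp(t, r_max, tau):
    """Saturating exponential response model, as in [3]."""
    return r_max * (1.0 - np.exp(-t / tau))

(r_max_hat, tau_hat), _ = curve_fit(saturating_exp, stim_latencies, responses,
                                    p0=(1.0, 1.0))

# Hypothetical inter-burst intervals (s) from spontaneous recordings; the
# empirical survival function estimates the probability of avoiding an SB.
ibis = np.array([1.2, 2.5, 3.1, 0.9, 4.2, 2.0, 1.7, 3.8])
def p_avoid_interruption(t):
    return np.mean(ibis > t)

grid = np.linspace(0.05, 8.0, 400)
objective = saturating_exp(grid, r_max_hat, tau_hat) * \
    np.array([p_avoid_interruption(t) for t in grid])
print(f"Estimated optimal latency: {grid[np.argmax(objective)]:.2f} s")
```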
In a closed-loop session, an RL controller interacted with the network with the goal of autonomously
maximizing stimulus efficacy. Learning proceeded in alternating training and testing sessions. During
training, the controller explored the parameter space; during testing, the learned strategy was
executed.
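The abstract does not specify the learning algorithm; as a minimal illustration of the closed-loop scheme, the sketch below uses a simple epsilon-greedy value learner over a discretized set of stimulus latencies, interacting with a surrogate network model whose parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
R_MAX, TAU, SB_RATE = 1.0, 0.8, 0.4          # hypothetical network parameters

def deliver_stimulus(latency):
    """Surrogate environment: reward is the response strength if no SB
    interrupts the waiting period, and zero otherwise."""
    interrupted = rng.random() > np.exp(-SB_RATE * latency)
    return 0.0 if interrupted else R_MAX * (1.0 - np.exp(-latency / TAU))

actions = np.linspace(0.25, 6.0, 24)         # candidate stimulus latencies (s)
q_values = np.zeros(len(actions))            # running estimates of expected reward
counts = np.zeros(len(actions))
EPSILON = 0.2                                # exploration rate during training

for trial in range(5000):                    # training: explore and update estimates
    explore = rng.random() < EPSILON
    a = rng.integers(len(actions)) if explore else int(np.argmax(q_values))
    reward = deliver_stimulus(actions[a])
    counts[a] += 1
    q_values[a] += (reward - q_values[a]) / counts[a]   # incremental mean update

# Testing: execute the learned (greedy) strategy.
print(f"Learned stimulus latency: {actions[np.argmax(q_values)]:.2f} s")
```

In the actual experiments the controller must additionally cope with the non-stationarity of ongoing activity, which the phenomenological surrogate above does not capture.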
Stimulus latencies learned by the controller were strongly correlated with optimal latencies as
predicted from open-loop studies (r = 0.94, p < 10⁻⁸, n = 17 networks, Fig. 1C). Moreover, in 94.2% of the
sessions (n = 52, 11 networks), the percentage of interrupted events per session diminished after
learning. Stimulus efficacy likewise improved in each of these networks, further supporting the
effectiveness of the learning algorithm (Fig. 1D).
Discussion
Closed-loop stimulation has been proposed as a promising strategy to intervene in the dynamics of
pathological networks while adapting to ongoing activity. The selection of signal features to close
such a loop and strategies to identify optimal stimulus settings given a desired network response
remain open problems. We propose RL methods to autonomously choose optimal control policies
given a pre-defined goal. We considered a toy problem that captures some of the major challenges
that a closed-loop paradigm would face in a biomedical application, i.e. in a complex, adaptive
environment. Balancing the trade-off between response strengths and interruptions requires identifying
the dependence of response strength on stimulus latency while simultaneously adapting to the
dynamics of ongoing activity. In this study, we demonstrate the capacity of RL-based techniques to
address such a challenge. Using phenomenological models derived from prior studies on such
networks, we independently validate the performance of the controller.
Conclusion
We show that an autonomous RL-based controller is capable of choosing optimal strategies in the
context of a dynamic neuronal system. We focused on a trade-off problem: maximizing a derived
feature of the response (stimulus efficacy, measured as the response strength per SB) whose value is
a priori unknown. Estimates of a unique, network-specific optimal strategy can be computed for this
problem. This allowed us to validate the latencies learned autonomously by the controller. Our
paradigm offers the ability to learn optimal interaction strategies in the absence of complete
knowledge about the network or quantitative principles defining its dynamics.
Acknowledgements
This project was supported by BrainLinks-BrainTools Cluster of Excellence (DFG-EXC 1086) and the
Bernstein Focus Neurotechnology Freiburg*Tübingen (BMBF FKZ 01GQ0830).
References
[1] Raspopovic, S. et al. (2014) Sci Transl Med 6, 222ra19.
[2] Arieli, A. et al. (1996) Science 273, 1868-1871.
[3] Weihberger, O. et al. (2013) J Neurophysiol 109, 1764-1774.
Figure Legend
(A) Fitted models of the probability of avoiding interruptions due to SBs (blue), response
strengths (orange), and the resulting weighted response curve (black), shown for one example network.
An optimal latency of ~1.5 s emerges in this case.
(B) The predicted objective functions of all 20 networks studied were quasi-concave, and a unique
optimal latency was available for each.
(C) Across networks, learned stimulus latencies showed a positive correlation with predicted optimal
values.
(D) After learning, mean rewards increased in each network, indicative of the improvement in
stimulus efficacy.