September 2022 · 121 Reads
Optimizing combinatorial structures is core to many real-world problems, such as those encountered in life sciences. For example, one of the crucial steps involved in antibody design is to find an arrangement of amino acids in a protein sequence that improves its binding with a pathogen. Combinatorial optimization of antibodies is difficult due to extremely large search spaces and non-linear objectives. Even for modest antibody design problems, where proteins have a sequence length of eleven, we are faced with searching over 2.05 × 10^14 structures. Applying traditional Reinforcement Learning algorithms such as Q-learning to combinatorial optimization results in poor performance. We propose Structured Q-learning (SQL), an extension of Q-learning that incorporates structural priors for combinatorial optimization. Using a molecular docking simulator, we demonstrate that SQL finds high-binding-energy sequences and performs favourably against baselines on eight challenging antibody design tasks, including designing antibodies for SARS-CoV.
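To make the scale above concrete, here is a quick back-of-the-envelope check in Python; the one-evaluation-per-millisecond rate is purely an illustrative assumption, not a figure from the paper.

```python
# Size of the search space for a length-11 sequence over the 20 canonical
# amino acids, as quoted in the abstract above.
N_AMINO_ACIDS = 20
SEQ_LEN = 11

search_space = N_AMINO_ACIDS ** SEQ_LEN
print(f"{search_space:.3e}")  # 2.048e+14, i.e. the 2.05 x 10^14 figure

# Assumed rate (illustrative only): one docking evaluation per millisecond.
years = search_space / 1e3 / (3600 * 24 * 365)
print(f"~{years:.0f} years for exhaustive search")  # ~6494 years
```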
June 2022 · 45 Reads
Safe exploration is a challenging and important problem in model-free reinforcement learning (RL). Often the safety cost is sparse and unknown, which unavoidably leads to constraint violations -- a phenomenon that should ideally be avoided in safety-critical applications. We tackle this problem by augmenting the state-space with a safety state, which is nonnegative if and only if the constraint is satisfied. The value of this state serves as a distance to constraint violation, while its initial value indicates the available safety budget. This idea allows us to derive policies for scheduling the safety budget during training. We call our approach Simmer (Safe policy IMproveMEnt for RL) to reflect the careful nature of these schedules. We apply this idea to two safe RL problems: RL with constraints imposed on an average cost, and RL with constraints imposed on a cost with probability one. Our experiments suggest that simmering a safe algorithm can improve safety during training for both settings. We further show that Simmer can stabilize training and improve the performance of safe RL with average constraints.
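As a rough illustration of what scheduling the safety budget could look like, the sketch below anneals the initial budget over a fixed training horizon. Both schedules and all names are assumptions made for illustration; the paper derives scheduling policies rather than fixing schedules a priori.

```python
# Minimal sketch of safety-budget scheduling over a fixed training horizon.
# These fixed schedules are illustrative stand-ins, not the paper's method.
def linear_simmer(step: int, total_steps: int, start: float, target: float) -> float:
    """Anneal the safety budget linearly from a cautious starting value."""
    frac = min(step / total_steps, 1.0)
    return start + frac * (target - start)

def staged_simmer(step: int, total_steps: int, levels: list) -> float:
    """Increase the budget in discrete stages, one level per training phase."""
    idx = min(int(len(levels) * step / total_steps), len(levels) - 1)
    return levels[idx]

# Usage: the scheduled value becomes the initial safety state of an episode.
budget = linear_simmer(step=10_000, total_steps=100_000, start=5.0, target=50.0)
print(budget)  # 9.5
```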
May 2022 · 24 Reads
Many real-world settings involve costs for performing actions; transaction costs in financial systems and fuel costs are common examples. In these settings, performing actions at each time step quickly accumulates costs, leading to vastly suboptimal outcomes. Additionally, repeatedly acting produces wear and tear and, ultimately, damage. Determining when to act is crucial for achieving successful outcomes, and yet the challenge of efficiently learning to behave optimally when actions incur minimally bounded costs remains unresolved. In this paper, we introduce a reinforcement learning (RL) framework named Learnable Impulse Control Reinforcement Algorithm (LICRA) for learning to optimally select both when to act and which actions to take when actions incur costs. At the core of LICRA is a nested structure that combines RL with a form of policy known as impulse control, which learns to maximise objectives when actions incur costs. We prove that LICRA, which seamlessly adopts any RL method, converges to policies that optimally select when to perform actions and their optimal magnitudes. We then augment LICRA to handle problems in which the agent can perform at most k actions and, more generally, faces a budget constraint. We show LICRA learns the optimal value function and ensures budget constraints are satisfied almost surely. We demonstrate empirically LICRA's superior performance against benchmark RL methods in OpenAI Gym's Lunar Lander and Highway environments, and in a variant of the Merton portfolio problem from finance.
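The nested "when to act / what to do" structure can be sketched as two coupled policies: a gate that decides whether intervening is worth the action cost, and an actor that chooses the action magnitude when it is. The network shapes and names below are illustrative assumptions, not the paper's implementation; LICRA itself can wrap any base RL method.

```python
# Sketch of a nested impulse-control policy: a gate decides *when* to act,
# an actor decides *which* action to take. Shapes and names are assumptions.
import torch
import torch.nn as nn

class ImpulsePolicy(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        # Gate: logits over {skip, intervene}; skipping avoids the action cost.
        self.gate = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                  nn.Linear(64, 2))
        # Actor: chooses the action magnitude when the gate fires.
        self.actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                   nn.Linear(64, act_dim), nn.Tanh())

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        intervene = self.gate(obs).argmax(dim=-1, keepdim=True).bool()
        action = self.actor(obs)
        # A zero action stands in for "do nothing" (no action cost incurred).
        return torch.where(intervene, action, torch.zeros_like(action))

policy = ImpulsePolicy(obs_dim=8, act_dim=2)   # Lunar Lander-sized, as an example
print(policy(torch.randn(4, 8)).shape)         # torch.Size([4, 2])
```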
May 2022 · 32 Reads
Efficient reinforcement learning (RL) involves a trade-off between "exploitative" actions that maximise expected reward and "explorative" ones that sample unvisited states. To encourage exploration, recent approaches proposed adding stochasticity to actions, separating exploration and exploitation phases, or equating reduction in uncertainty with reward. However, these techniques do not necessarily offer a systematic approach to making this trade-off. Here we introduce SElective Reinforcement Exploration Network (SEREN), which poses the exploration-exploitation trade-off as a game between an RL agent -- Exploiter, which purely exploits known rewards -- and another RL agent -- Switcher, which chooses at which states to activate a pure exploration policy that is trained to minimise system uncertainty and override Exploiter. Using a form of policy known as impulse control, Switcher is able to determine the best set of states in which to switch to the exploration policy, while Exploiter is free to execute its actions everywhere else. We prove that SEREN converges quickly and induces a natural schedule towards pure exploitation. Through extensive empirical studies in both discrete (MiniGrid) and continuous (MuJoCo) control benchmarks, we show that SEREN can be readily combined with existing RL algorithms to yield significant improvements in performance relative to state-of-the-art algorithms.
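A toy rendering of the Exploiter/Switcher division of labour follows; the visit-count uncertainty proxy and the threshold rule are illustrative stand-ins for SEREN's learned impulse-control policy.

```python
# Toy Exploiter/Switcher split: Switcher hands control to a pure-exploration
# policy only in high-uncertainty states. Visit counts stand in (as an
# assumption) for SEREN's learned uncertainty measure.
import random
from collections import defaultdict

class ExploiterSwitcher:
    def __init__(self, n_actions: int, switch_threshold: float = 0.2):
        self.q = defaultdict(lambda: [0.0] * n_actions)  # Exploiter's Q-values
        self.visits = defaultdict(int)
        self.n_actions = n_actions
        self.switch_threshold = switch_threshold

    def act(self, state) -> int:
        self.visits[state] += 1
        uncertainty = 1.0 / self.visits[state]
        if uncertainty > self.switch_threshold:          # Switcher: explore
            return random.randrange(self.n_actions)
        qs = self.q[state]                               # Exploiter: exploit
        return qs.index(max(qs))
```

As visit counts grow, the uncertainty proxy decays, giving the natural drift towards pure exploitation that the abstract mentions.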
April 2022 · 50 Reads · 13 Citations
We introduce a new design framework for implementing negative feedback regulation in synthetic biology, which we term 'dichotomous feedback'. Our approach differs from current methods in that it sequesters existing fluxes in the process to be controlled, and in this way takes advantage of the process's architecture to design the control law. This signal sequestration mechanism appears in many natural biological systems and can potentially be easier to realize than 'molecular sequestration' and other comparison motifs that are nowadays common in biomolecular feedback control design. The loop is closed by linking the strength of signal sequestration to the process output. Our feedback regulation mechanism is motivated by two-component signalling systems, where a second response regulator could compete with the natural response regulator, thus sequestering kinase activity. Here, dichotomous feedback is established by increasing the concentration of the second response regulator as the output level of the natural process increases. Extensive analysis demonstrates how this type of feedback shapes the signal response and attenuates intrinsic noise while increasing robustness and reducing crosstalk.
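As a cartoon of the mechanism, the ODE sketch below lets a second species sequester kinase activity, with its production driven by the process output so that the loop closes. The equations and all rate constants are invented for illustration; they are not the model from the paper (see the citation snippet further down referencing its equations (2.7)).

```python
# Cartoon ODE model of dichotomous feedback: r2 sequesters kinase activity,
# and r2 production is driven by the output r, closing the negative loop.
# Equations and rate constants are illustrative assumptions only.
import numpy as np
from scipy.integrate import solve_ivp

k_cat, k_seq, k_prod, k_deg = 1.0, 5.0, 0.5, 0.1

def rhs(t, y):
    r, r2 = y                                  # output regulator, sequesterer
    kinase_flux = k_cat / (1.0 + k_seq * r2)   # sequestration reduces the flux
    return [kinase_flux - k_deg * r,           # dr/dt
            k_prod * r - k_deg * r2]           # dr2/dt: output drives r2

sol = solve_ivp(rhs, (0.0, 200.0), [0.0, 0.0])
print(f"steady-state output: {sol.y[0, -1]:.2f}")
```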
February 2022 · 51 Reads
Satisfying safety constraints almost surely (or with probability one) can be critical for the deployment of Reinforcement Learning (RL) in real-life applications. For example, plane landing and take-off should ideally occur with probability one. We address the problem by introducing Safety Augmented (Saute) Markov Decision Processes (MDPs), where the safety constraints are eliminated by augmenting them into the state-space and reshaping the objective. We show that the Saute MDP satisfies the Bellman equation and moves us closer to solving Safe RL with constraints satisfied almost surely. We argue that Saute MDPs allow the Safe RL problem to be viewed from a different perspective, enabling new features. For instance, our approach has a plug-and-play nature, i.e., any RL algorithm can be "sauteed". Additionally, state augmentation allows for policy generalization across safety constraints. We finally show that Saute RL algorithms can outperform their state-of-the-art counterparts when constraint satisfaction is of high importance.
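A minimal sketch of the augmentation described above: the remaining safety budget is appended to the observation, and the reward is reshaped to a large penalty once the budget is exhausted, so any off-the-shelf RL algorithm can be applied unchanged. The gym-style API, the cost field in `info`, and the penalty value are assumptions for illustration.

```python
# Minimal "saute"-style state augmentation, assuming a gym-style env that
# reports per-step safety cost in info["cost"]. Penalty value is an assumption.
import numpy as np

class SauteWrapper:
    def __init__(self, env, safety_budget: float, penalty: float = -1e3):
        self.env, self.budget0, self.penalty = env, safety_budget, penalty
        self.budget = safety_budget

    def reset(self):
        self.budget = self.budget0
        return np.append(self.env.reset(), self.budget)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.budget -= info.get("cost", 0.0)  # safety state: distance to violation
        if self.budget < 0.0:
            reward = self.penalty             # reshaped objective: violations dominate
        return np.append(obs, self.budget), reward, done, info
```

Because the budget is part of the state, a policy trained this way can in principle condition on, and generalize across, different safety budgets, as the abstract notes.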
February 2022 · 24 Reads
We consider a context-dependent Reinforcement Learning (RL) setting, which is characterized by: a) an unknown finite number of not directly observable contexts; b) abrupt (discontinuous) context changes occurring during an episode; and c) Markovian context evolution. We argue that this challenging case is often met in applications, and we tackle it using a Bayesian approach and variational inference. We adapt a sticky Hierarchical Dirichlet Process (HDP) prior for model learning, which is arguably best-suited for Markov process modeling. We then derive a context distillation procedure, which identifies and removes spurious contexts in an unsupervised fashion. We argue that the combination of these two components makes it possible to infer the number of contexts from data, thus dispensing with the assumption that the context cardinality is known in advance. We then find a representation of the optimal policy that enables efficient policy learning using off-the-shelf RL algorithms. Finally, we demonstrate empirically (using the gym environments cart-pole swing-up, drone, and intersection) that our approach succeeds where state-of-the-art methods of other frameworks fail, and we elaborate on the reasons for such failures.
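The following toy generator shows the kind of process described in points a)-c): a hidden context evolves as a "sticky" Markov chain (high self-transition probability) and switches abruptly mid-episode, while only context-dependent observations are emitted. All parameters are illustrative assumptions, not the paper's experimental setup.

```python
# Toy data generator with a sticky, hidden Markov context.
import numpy as np

rng = np.random.default_rng(0)
n_contexts, stickiness = 3, 0.95

# Stay in the current context w.p. 0.95, otherwise jump uniformly at random.
P = np.full((n_contexts, n_contexts), (1 - stickiness) / (n_contexts - 1))
np.fill_diagonal(P, stickiness)

def sample_episode(length: int = 200) -> list:
    context, observations = int(rng.integers(n_contexts)), []
    for _ in range(length):
        # Observations depend on the hidden context; the index is never revealed.
        observations.append(rng.normal(loc=float(context), scale=0.5))
        context = int(rng.choice(n_contexts, p=P[context]))
    return observations
```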
January 2022 · 131 Reads · 23 Citations · Machine Learning
In this paper, we propose SAMBA, a novel framework for safe reinforcement learning that combines aspects of probabilistic modelling, information theory, and statistics. Our method builds upon PILCO to enable active exploration using novel acquisition functions for out-of-sample Gaussian process evaluation, optimised through a multi-objective problem that supports conditional-value-at-risk constraints. We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low- and high-dimensional state representations. Our results show orders-of-magnitude reductions in samples and violations compared to state-of-the-art methods. Lastly, we provide intuition about the effectiveness of the framework through a detailed analysis of our acquisition functions and safety constraints.
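The conditional-value-at-risk constraint mentioned above has a standard empirical estimator, the mean of the worst alpha-fraction of sampled costs, shown here for intuition rather than taken from the paper's code.

```python
# Textbook empirical CVaR estimator for costs: the mean of the worst
# alpha-fraction of samples. Shown for intuition; not the paper's code.
import numpy as np

def empirical_cvar(costs: np.ndarray, alpha: float) -> float:
    """Mean cost over the worst alpha-fraction of samples."""
    var = np.quantile(costs, 1.0 - alpha)   # value-at-risk threshold
    return float(costs[costs >= var].mean())

costs = np.random.default_rng(1).exponential(scale=1.0, size=10_000)
print(empirical_cvar(costs, alpha=0.05))    # roughly ln(20) + 1, i.e. about 4.0
```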
January 2022 · 69 Reads · 16 Citations · IEEE Transactions on Automatic Control
Semidefinite and sum-of-squares (SOS) optimization are fundamental computational tools in many areas, including linear and nonlinear systems theory. However, the scale of problems that can be addressed reliably and efficiently is still limited. In this paper, we introduce a new notion of block factor-width-two matrices and build a new hierarchy of inner and outer approximations of the cone of positive semidefinite (PSD) matrices. This notion is a block extension of the standard factor-width-two matrices and allows for an improved inner approximation of the PSD cone. In the context of SOS optimization, this leads to a block extension of the scaled diagonally dominant sum-of-squares (SDSOS) polynomials. By varying the matrix partition, the notion of block factor-width-two matrices can balance a trade-off between computational scalability and solution quality when solving semidefinite and SOS optimization problems. Numerical experiments on a range of large-scale instances confirm our theoretical findings.
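To see why factor-width-two sums give an inner approximation of the PSD cone, the snippet below assembles a matrix from PSD pieces supported on 2×2 principal submatrices and verifies that the sum is PSD. This illustrates the scalar (non-block) notion, not the paper's decomposition algorithm; the block version replaces scalar entries with matrix blocks.

```python
# Build a matrix as a sum of PSD pieces, each supported on one 2x2 principal
# submatrix (factor width two), and check that the sum is PSD.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = np.zeros((n, n))
for i, j in itertools.combinations(range(n), 2):
    v = rng.normal(size=2)
    A[np.ix_([i, j], [i, j])] += np.outer(v, v)   # rank-one PSD 2x2 piece

# Any such sum is PSD, hence factor-width-two matrices inner-approximate
# the PSD cone.
print(np.linalg.eigvalsh(A).min() >= -1e-10)      # True
```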
... Natural biological systems may exhibit dichotomous feedback, which works through sequestration of a molecule or a signal. Sootla et al. (2022) proposed several ways of implementing this functionality. Here we study the following model, which is described in equations (2.7) of their article: ...
April 2022
... the idea to optimize over the dual cone of FW_n(k) by utilizing clique trees [24]; a variation on the factor-width cone involving fewer blocks [25]. ...
January 2022 · IEEE Transactions on Automatic Control
... In model-based RL an approximate model $\hat{P}$ of the true dynamics is learned, and optionally an approximate reward function $\hat{R}$. This model is then used for data generation [4], [5], [42], planning [2], [3], [43]-[45], or stochastic optimization [14], [16], [19], [46]. ...
January 2022 · Machine Learning
... For instance, specific Koopman eigenfunctions were used in [7] to obtain necessary and sufficient conditions for global stability of hyperbolic attractors, a result which mirrors well-known spectral stability results for linear systems. Moreover, a numerical method was proposed in [13] to compute Lyapunov functions from a finite dimensional approximation of the Koopman operator. ...
February 2020 · Lecture Notes in Control and Information Sciences
... for some $\mu \in \mathbb{R}$. Since $W^\top W = W$, pre- and post-multiplying (19) with $\hat{H}^\top$ and $\hat{H}$ yields the matrix inequality for $\hat{K}_{S,\mathrm{ext}}$ in (11), implying that $K_{S,\mathrm{ext}} \subset \hat{K}_{S,\mathrm{ext}}$. Next, for $K_S \subset K_{S,\mathrm{ext}}$, one can choose $\hat{G} = \hat{G}^\top = \hat{Q}$ to recover the LMIs in (8). ...
October 2019 · IEEE Transactions on Automatic Control
... Verifying stability of the signalling cascades is well studied in the literature, and our sequestration-of-phosphorylation motif can be treated by existing methods (e.g. [33,40]). First we will slightly simplify the system ...
June 2019
... Recently, a new block extension of SDD matrices, called block factor-width-two matrices, has been introduced in [20], [21], where each $Q_i \succeq 0$ is supported on a $2 \times 2$ block principal submatrix. This notion works on block-partitioned matrices, and the block partition brings flexibility in terms of both solution quality and numerical efficiency in solving (1), as demonstrated extensively in [20], [21]. ...
June 2019
... On the other hand, a canonical decentralized LQ optimal control problem can be cast as a centralized problem where the stabilizing feedback gain matrix is restricted to lie in a particular subspace K. If K denotes the block-diagonal matrix space, classic convex parametrization methods can be utilized to obtain an approximate convex problem [9], and an ADMM (Alternating Direction Method of Multipliers) based method has been shown to converge to the global optimizer of this approximate convex problem [52,49,24]. If K possesses a general sparsity structure, the SDP (Semidefinite Programming) relaxation perspective becomes relevant. ...
August 2019 · IEEE Transactions on Control of Network Systems
... This encompasses many current implementations of feedback, such as sigma/anti-sigma factors [18], scaffold/anti-scaffold proteins [19], mRNA-sRNA interactions [20] and others. In fact, most of the recent research in feedback control for synthetic biology is aimed at analysing, realizing and applying different versions of this motif [21][22][23][24][25]. All of these motifs, however, are usually designed without taking into account the architecture of the process to be controlled; the process is instead used to guide tuning of the controller parameters in order to achieve the desired performance. ...
December 2018
... The chordal decomposition [12] has been used to reduce the complexity of sparse semidefinite programs, as in [13], where near-linear time complexity is guaranteed for off-the-shelf interior-point methods implemented in SeDuMi and MOSEK. However, for controller design, only linear systems have been considered, in specific applications such as distributed robustness analysis of interconnected uncertain systems [14], design of distributed control of interconnected linear systems [15], [16], and decentralized control design for weakly coupled linear systems [17]. Thus, to the best of the authors' knowledge, the literature lacks a study of the possible advantages of using chordal decomposition to obtain distributed controllers for LSSs composed of nonlinear subsystems. ...
June 2018