Aleksandr I. Panov
Russian Academy of Sciences (RAS) · Federal Research Center "Computer Science and Control"

PhD

About

68
Publications
10,991
Reads
407
Citations
Additional affiliations
September 2015 - August 2020
National Research University Higher School of Economics
Position
  • Research Associate
September 2011 - June 2016
Peoples' Friendship University of Russia
Position
  • Assistant Professor
September 2011 - October 2020
Moscow Institute of Physics and Technology
Position
  • Principal Investigator
Description
  • Head of Cognitive Dynamic Systems Laboratory
Education
July 2011 - August 2014
Institute for Systems Analysis RAS
Field of study
  • Theoretical Bases of Computer Science
September 2009 - July 2011
Moscow Institute of Physics and Technology
Field of study
  • Applied Mathematics and Physics
September 2005 - July 2009
Novosibirsk State University
Field of study
  • Physics

Publications (68)
Article
Full-text available
Biologically plausible models of learning may provide a crucial insight for building autonomous intelligent agents capable of performing a wide range of tasks. In this work, we propose a hierarchical model of an agent operating in an unfamiliar environment driven by a reinforcement signal. We use temporal memory to learn sparse distributed represen...
Preprint
We introduce POGEMA (https://github.com/AIRI-Institute/pogema), a sandbox for challenging partially observable multi-agent pathfinding (PO-MAPF) problems. This is a grid-based environment that was specifically designed to be a flexible, tunable and scalable benchmark. It can be tailored to a variety of PO-MAPF problems, which can serve as an excellent testi...
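For orientation only, here is a minimal, self-contained Python toy in the spirit of such a PO-MAPF benchmark: several agents on a grid, each receiving only a local observation window. It is an illustrative sketch and deliberately does not reproduce the actual POGEMA API; the class and parameter names below are invented for the example.

    import numpy as np

    class ToyPoMapfEnv:
        """Toy partially observable multi-agent grid world (NOT the POGEMA API)."""
        MOVES = {0: (0, 0), 1: (-1, 0), 2: (1, 0), 3: (0, -1), 4: (0, 1)}  # wait, up, down, left, right

        def __init__(self, size=8, num_agents=4, obs_radius=2, seed=0):
            self.size, self.num_agents, self.r = size, num_agents, obs_radius
            self.rng = np.random.default_rng(seed)

        def reset(self):
            # Sample distinct start and goal cells for every agent.
            cells = self.rng.choice(self.size * self.size, 2 * self.num_agents, replace=False)
            self.pos = [divmod(int(c), self.size) for c in cells[:self.num_agents]]
            self.goal = [divmod(int(c), self.size) for c in cells[self.num_agents:]]
            return self._observations()

        def step(self, actions):
            for i, a in enumerate(actions):
                dy, dx = self.MOVES[a]
                y, x = self.pos[i][0] + dy, self.pos[i][1] + dx
                if 0 <= y < self.size and 0 <= x < self.size and (y, x) not in self.pos:
                    self.pos[i] = (y, x)  # move only into free, in-bounds cells
            rewards = [1.0 if self.pos[i] == self.goal[i] else 0.0 for i in range(self.num_agents)]
            return self._observations(), rewards, all(r > 0 for r in rewards), {}

        def _observations(self):
            # Each agent sees only a (2r+1) x (2r+1) occupancy window around itself.
            grid = np.zeros((self.size, self.size), dtype=np.int8)
            for y, x in self.pos:
                grid[y, x] = 1
            padded = np.pad(grid, self.r)
            return [padded[y:y + 2 * self.r + 1, x:x + 2 * self.r + 1] for y, x in self.pos]

    env = ToyPoMapfEnv()
    obs = env.reset()
    obs, rewards, done, info = env.step([0, 1, 2, 3])  # one joint step with arbitrary actions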
Preprint
Full-text available
We present the IGLU Gridworld: a reinforcement learning environment for building and evaluating language-conditioned embodied agents in a scalable way. The environment features visual agent embodiment, interactive learning through collaboration, language-conditioned RL, and a combinatorially hard task space (3D block building).
Preprint
Full-text available
Human intelligence has the remarkable ability to adapt to new tasks and environments quickly. Starting from a very young age, humans acquire new skills and learn how to solve new tasks either by imitating the behavior of others or by following provided natural language instructions. To facilitate research in this direction, we propose IGLU: Interac...
Preprint
Full-text available
Human intelligence has the remarkable ability to quickly adapt to new tasks and environments. Starting from a very young age, humans acquire new skills and learn how to solve new tasks either by imitating the behavior of others or by following provided natural language instructions. To facilitate research in this direction, we propose IGLU: I...
Chapter
In this paper, we consider the problem of multi-agent navigation in partially observable grid environments. This problem is challenging for centralized planning approaches as they typically rely on full knowledge of the environment. To this end, we suggest utilizing the reinforcement learning approach when the agents first learn the policies that m...
Chapter
This paper explores an application of image augmentation, a popular regularization technique in computer vision, to reinforcement learning tasks. The analysis is based on model-free off-policy algorithms. As a regularization, we consider the augmentation of the frames that are sampled from the model's replay buffer. Evaluated augm...
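As a rough illustration of this kind of regularization (not the exact augmentations evaluated in the paper), the sketch below applies a random pad-and-crop shift, in the style of DrQ/RAD, to a batch of frames sampled from a replay buffer; names and shapes are assumptions made for the example.

    import numpy as np

    def random_shift(frames, pad=4, rng=None):
        """Randomly shift a batch of frames by padding and re-cropping.

        frames: uint8 array of shape (batch, height, width, channels).
        Every sampled transition gets a slightly different crop, so the
        off-policy learner never sees exactly the same pixels twice.
        """
        rng = rng or np.random.default_rng()
        b, h, w, _ = frames.shape
        padded = np.pad(frames, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode="edge")
        out = np.empty_like(frames)
        for i in range(b):
            dy, dx = rng.integers(0, 2 * pad + 1, size=2)
            out[i] = padded[i, dy:dy + h, dx:dx + w]
        return out

    # Typical use: augment only the observations of a minibatch sampled from the
    # replay buffer, right before computing the TD targets and the Q-loss.
    batch = np.random.randint(0, 256, size=(32, 84, 84, 4), dtype=np.uint8)
    augmented = random_shift(batch)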
Chapter
Exploration is an essential part of reinforcement learning and restricts the quality of the learned policy. Hard-exploration environments are defined by a huge state space and sparse rewards. In such conditions, an exhaustive exploration of the environment is often impossible, and the successful training of an agent requires a lot of interaction steps...
Chapter
In this paper, we propose an HISNav VQA dataset – a challenging dataset for a Visual Question Answering task that is aimed at the needs of Visual Navigation in human-centered environments. The dataset consists of images of various room scenes that were captured using the Habitat virtual environment and of questions important for navigation tasks us...
Article
In this paper, we propose a Vector Semiotic Model as a possible solution to the symbol grounding problem in the context of Visual Question Answering. The Vector Semiotic Model combines the advantages of a Semiotic Approach implemented in the Sign-Based World Model and Vector Symbolic Architectures. The Sign-Based World Model represents information...
Preprint
Full-text available
Exploration is an essential part of reinforcement learning and restricts the quality of the learned policy. Hard-exploration environments are defined by a huge state space and sparse rewards. In such conditions, an exhaustive exploration of the environment is often impossible, and the successful training of an agent requires a lot of interaction steps...
Preprint
This work studies the object goal navigation task, which involves navigating to the closest object related to a given semantic category in unseen environments. Recent works have shown significant achievements both in the end-to-end Reinforcement Learning approach and in modular systems, but a big step forward is still needed for them to be robust and optimal. We propose a...
Chapter
In this paper, we propose a biologically plausible model for learning the decision-making sequence in an external environment with internal motivation. As a computational model, we propose a hierarchical architecture of an intelligent agent acquiring experience based on reinforcement learning. We use the basal ganglia model to aggregate a reward, a...
Chapter
Multi-modal tasks have started to play a significant role in research on Artificial Intelligence. A particular example of that domain is visual-linguistic tasks, such as Visual Question Answering and its extension, Visual Dialog. In this paper, we concentrate on the Visual Dialog task and dataset. The task involves two agents. The first age...
Article
Full-text available
In this work we study the behavior of groups of autonomous vehicles, which are part of Internet of Vehicles systems. One of the challenging modes of operation of such systems is the case when the observability of each vehicle is limited and the global/local communication is unstable, e.g. in crowded parking lots. In such scenarios the v...
Preprint
In this paper, we consider the problem of multi-agent navigation in partially observable grid environments. This problem is challenging for centralized planning approaches as they typically rely on full knowledge of the environment. We suggest utilizing the reinforcement learning approach, where the agents first learn the policies that map ob...
Article
Full-text available
Deep reinforcement learning (RL) shows impressive results in complex gaming and robotic environments. These results are commonly achieved at the expense of huge computational costs and require an incredible number of episodes of interactions between the agent and the environment. Hierarchical methods and expert demonstrations are among the most pro...
Chapter
In recent years, the task of visual question answering (VQA) at the intersection of computer vision and natural language processing is gaining interest in the scientific community. Even though modern systems achieve good results on standard datasets, these results are far from what is achieved in Computer Vision or Natural Language Processing separ...
Article
We present Hierarchical Deep Q-Network (HDQfD) that won first place in the MineRL competition. The HDQfD works on imperfect demonstrations and utilizes the hierarchical structure of expert trajectories. We introduce the procedure of extracting an effective sequence of meta-actions and subgoals from the demonstration data. We present a structured ta...
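One way to picture the idea of extracting subgoals from demonstrations (a simplified, hedged illustration rather than the actual HDQfD procedure) is to split each expert trajectory at the steps where a monitored state feature, such as the agent's inventory in MineRL, changes; the field name "inventory" below is an assumption made for the example.

    def split_into_subgoals(trajectory):
        """Split a demonstration into segments ending where 'inventory' changes.

        trajectory: list of per-step dicts, each with an 'inventory' entry.
        Returns a list of {'steps': [...], 'subgoal': inventory_after_segment}.
        """
        segments, current = [], []
        for prev, step in zip(trajectory, trajectory[1:]):
            current.append(prev)
            if step["inventory"] != prev["inventory"]:  # something new was obtained
                segments.append({"steps": current, "subgoal": step["inventory"]})
                current = []
        if trajectory:  # close the final segment, including the last step
            current.append(trajectory[-1])
            segments.append({"steps": current, "subgoal": trajectory[-1]["inventory"]})
        return segments

    demo = [{"inventory": {"log": t // 3}} for t in range(10)]  # toy trajectory
    print([len(seg["steps"]) for seg in split_into_subgoals(demo)])  # [3, 3, 3, 1]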
Chapter
In this work, we explore the application of the Self-other-Modelling algorithm (SOM) to several agent architectures for the collaborative grid-based environment. Asynchronous Advantage Actor-Critic (A3C) algorithm was compared with the OpenAI Hide-and-seek (HNS) agent. We expand their implementation by adding the SOM algorithm. As an extension of t...
Article
Deep learning, and especially deep reinforcement learning, usually requires a huge amount of data for training, and using simulators is a promising approach to providing this data. A model trained in a simulator can be transferred to a real robot without wasting a lot of time on data collection. Training in a simulator also allows the use of different techniques to spe...
Book
This book constitutes the proceedings of the 19th Russian Conference on Artificial Intelligence, RCAI 2021, held in Moscow, Russia, in October 2021. The 19 full papers and 7 short papers presented in this volume were carefully reviewed and selected from 80 submissions. The conference deals with a wide range of topics, categorized into the following...
Chapter
In this paper, we consider the problem of robotic movement inaccuracy. We suggest that refining the abstract actions of the behavior planner will help build more precise control of the robot. A multi-agent planner for the synthesis of abstract actions with refinement for a two-dimensional movement task is proposed. We analyze the problem...
Chapter
In this paper, we consider the problem of controlling an agent that simulates the behavior of a self-driving car when passing a road intersection together with other vehicles. We consider the case of using smart city systems, which allow the agent to obtain full information about what is happening at the intersection in the form of video frames from...
Chapter
This work is devoted to one of the unresolved problems of Artificial General Intelligence: the inefficiency of transfer learning. One of the mechanisms used to address this problem in the area of reinforcement learning is the model-based approach. In this paper we extend the schema networks method, which allows extracting the logical relationships...
Chapter
The paper is dedicated to the use of distributed hyperdimensional vectors to represent sensory information in the sign-based cognitive architecture, in which the image component of a sign is encoded by a causal matrix. The hyperdimensional representation allows us to update the precedent dimension of the causal matrix and accumulate information in...
Chapter
Planning algorithms are currently among the most sought after. One of the main such algorithms is Monte Carlo Tree Search (MCTS). However, this architecture is complex in terms of parallelization and development. We present possible approximations of the MCTS algorithm, which allow us to significantly increase the learning speed of the agent.
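For readers less familiar with the baseline being approximated, a compact sketch of the standard MCTS loop (selection with UCB1, expansion, rollout, backpropagation) is given below; the env hooks (legal_actions, step, rollout) are assumed interfaces for the example, not a specific library, and the sketch does not include the approximations proposed in the paper.

    import math, random

    class Node:
        def __init__(self, state, parent=None):
            self.state, self.parent = state, parent
            self.children, self.visits, self.value = {}, 0, 0.0

    def ucb1(child, parent_visits, c=1.4):
        # Standard UCB1: average return plus an exploration bonus for rarely visited children.
        if child.visits == 0:
            return float("inf")
        return child.value / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

    def mcts(root, env, n_iterations=200):
        """env is an assumed interface: legal_actions(s), step(s, a) -> next_s, rollout(s) -> return."""
        for _ in range(n_iterations):
            node = root
            # 1. Selection: descend through fully expanded nodes by the UCB1 score.
            while node.children and len(node.children) == len(env.legal_actions(node.state)):
                node = max(node.children.values(), key=lambda ch: ucb1(ch, node.visits))
            # 2. Expansion: add one untried action, if any remain.
            untried = [a for a in env.legal_actions(node.state) if a not in node.children]
            if untried:
                a = random.choice(untried)
                node.children[a] = Node(env.step(node.state, a), parent=node)
                node = node.children[a]
            # 3. Simulation: estimate the return of the new node with a random rollout.
            ret = env.rollout(node.state)
            # 4. Backpropagation: update visit counts and values up to the root.
            while node is not None:
                node.visits += 1
                node.value += ret
                node = node.parent
        return max(root.children.items(), key=lambda kv: kv[1].visits)[0]  # most visited action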
Preprint
This work is devoted to one of the unresolved problems of Artificial General Intelligence: the inefficiency of transfer learning. One of the mechanisms used to address this problem in the area of reinforcement learning is the model-based approach. In this paper we extend the schema networks method, which allows extracting the logical relationships...
Preprint
Currently, deep reinforcement learning (RL) shows impressive results in complex gaming and robotic environments. Often these results are achieved at the expense of huge computational costs and require an incredible number of episodes of interaction between the agent and the environment. There are two main approaches to improving the sample efficien...
Chapter
Full-text available
The task of building unmanned automated vehicle (UAV) control systems is developing toward more complex options for interaction of the UAV with the environment and closer approximation of real-life situations. A new concept of the so-called “smart city” has been proposed, and the view of transportation has shifted toward self-driving cars. In this work we d...
Chapter
Full-text available
Hierarchies are used in reinforcement learning to increase learning speed in sparse-reward tasks. In this kind of task, the main problem is the time required for the initial policy to reach the goal during the first steps. Hierarchies can split a problem into a set of subproblems that can be solved in less time. In order to implement this...
Article
Full-text available
Our knowledge of the brain is currently expanding rapidly. Hierarchical Temporal Memory is a technology that arose from new discoveries in neurobiology, such as research on the structure of the neocortex. Among the most popular applications of this technology are image recognition and anomaly detection. Nevertheless, both in the neocortex a...
Book
This book constitutes the refereed proceedings of the 13th International Conference on Artificial General Intelligence, AGI 2020, held in St. Petersburg, Russia, in September 2020. The 30 full papers and 8 short papers presented in this book were carefully reviewed and selected from 60 submissions. The papers cover topics such as AGI architectures,...
Book
This book constitutes the proceedings of the 18th Russian Conference on Artificial Intelligence, RCAI 2020, held in Moscow, Russia, in October 2020. The 27 full papers and 8 short papers presented in this volume were carefully reviewed and selected from 140 submissions. The conference deals with a wide range of topics, including data mining and kno...
Article
Full-text available
In the last years, deep learning and reinforcement learning methods have significantly improved mobile robots in such fields as perception, navigation, and planning. But there are still gaps in applying these methods to real robots due to the low computational efficiency of recent neural network architectures and their poor adaptability to robotic...
Preprint
Full-text available
We present a hierarchical Deep Q-Network with Forgetting (HDQF) that took first place in the MineRL competition. HDQF works on imperfect demonstrations and utilizes the hierarchical structure of expert trajectories, extracting an effective sequence of meta-actions and subgoals. We introduce a structured task-dependent replay buffer and a forgetting technique that allow t...
Chapter
Hierarchical reinforcement learning (HRL) is another step towards the convergence of learning and planning methods. The resulting reusable abstract plans facilitate both the applicability of transfer learning and increased resilience in difficult environments with delayed rewards. However, on the way to the practical application of HRL, especia...
Chapter
Standard robotic control works perfectly under ordinary conditions, but when the conditions change (e.g. one of the motors is damaged), the robot can no longer achieve its task. We need an algorithm that provides the robot with the ability to adapt to unforeseen situations. Reinforcement learning provides a framework correspon...
Article
Full-text available
Among the problems in behavior planning for an unmanned vehicle, the central one is movement in difficult areas. In particular, such areas are intersections at which direct interaction with other road agents takes place. In our work, we offer a new approach to training the intelligent agent that simulates the behavior of an unmanned vehic...
Chapter
This paper presents a new algorithm for hierarchical case-based behavior planning in a coalition of agents – HierMAP. The considered algorithm, in contrast to the well-known planners HEART, PANDA, and others, is intended primarily for use in multi-agent tasks. For this, the possibility of dynamically distributing agent roles with different function...
Chapter
The article describes the functionality of the Sign-Based World Model (SBWM) cognitive architecture through an algorithm implementing a particular case of reasoning. The SBWM architecture is a multigraph, called a semiotic network, with special rules of activation spreading. In a semiotic network, there are four subgraphs that have specific...
Chapter
Full-text available
This paper describes the application of hierarchical temporal memory (HTM) to the task of anomaly detection in human motions. A number of model experiments with the well-known Carnegie Mellon University motion dataset have been carried out. An extended version of HTM is proposed, in which feedback on the movement of the sensor’s focus on the video f...
Chapter
Full-text available
The “curse of dimensionality” and environments with sparse delayed rewards are among the main challenges in reinforcement learning (RL). To tackle these problems we can use hierarchical reinforcement learning (HRL), which provides abstraction over both actions and states of the environment. This work proposes an algorithm that combines hierarchical ap...
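To make the action/state abstraction concrete, here is a minimal sketch of the generic options framework that HRL methods of this kind build on (an option bundles an initiation set, an intra-option policy, and a termination condition); it is a generic illustration with an assumed gym-style env interface, not the algorithm proposed in this work.

    from dataclasses import dataclass
    from typing import Any, Callable

    @dataclass
    class Option:
        can_start: Callable[[Any], bool]    # initiation set I(s)
        policy: Callable[[Any], int]        # intra-option policy pi(s) -> primitive action
        should_stop: Callable[[Any], bool]  # termination condition beta(s)

    def run_with_options(env, options, choose_option, max_steps=1000):
        """choose_option(state, available) is the high-level policy over options.

        env is an assumed gym-style interface: reset() -> state, step(a) -> (state, reward, done).
        """
        state, done, steps = env.reset(), False, 0
        current = None
        while not done and steps < max_steps:
            # Pick a new option when none is active or the current one terminates.
            if current is None or current.should_stop(state):
                available = [o for o in options if o.can_start(state)]
                current = choose_option(state, available)
            state, reward, done = env.step(current.policy(state))
            steps += 1
        return state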
Book
This book constitutes the proceedings of the 17th Russian Conference on Artificial Intelligence, RCAI 2019, held in Ulyanovsk, Russia, in October 2019. The 23 full papers presented along with 7 short papers in this volume were carefully reviewed and selected from 130 submissions. The conference deals with a wide range of topics, including multi-ag...
Book
This volume contains selected tutorial and young scientist school papers of the 5th RAAI Summer School on Artificial Intelligence, held in July 2019 at the Moscow Institute of Physics and Technology (MIPT) campus in Dolgoprudny, a suburb of Moscow, Russia. The 11 chapters in this volume present papers focusing on various important aspects of Multiagent system...
Article
Full-text available
According to the modern theories on the emergence of mental functions and the respective role of neurophysiological processes, the formation of mental functions is associated with the existence or communicative synthesis of specific information structures that contain three types of information of different origins: information from the external en...
Chapter
The paper discusses the interaction between methods of modeling reasoning and behavior planning in a sign-based world model for the task of synthesizing a hierarchical plan of relocation. Such interaction is represented by the formalism of intelligent rule-based dynamic systems in the form of alternate use of transition functions (planning) and clo...
Preprint
Full-text available
We introduce a new approach to hierarchy formation and task decomposition in hierarchical reinforcement learning. Our method is based on the Hierarchy of Abstract Machines (HAM) framework, because the HAM approach is able to design efficient controllers that realize specific behaviors in real robots. The key to our algorithm is the introduction of...
Article
Full-text available
In this paper we consider the problem of role distribution during the construction of a general plan of actions in a coalition of cognitive agents. Cognitive agents realize the basic functions of an intelligent agent using models of human cognitive functions. As a psychological basis for constructing models of cognitive functions, the theory...
Conference Paper
Full-text available
Reinforcement learning has recently advanced significantly with the discovery of new techniques and tools for training. This paper is devoted to the application of convolutional and recurrent neural networks to the task of planning in a reinforcement learning setting. The aim of the work is to check whether neural networks are fit for this...
Conference Paper
Full-text available
We present a model of Reinforcement Learning, which consists of modified neural-network architecture with spatio-temporal connections, known as Temporal Hebbian Self-Organizing Map (THSOM). A number of experiments were conducted to test the model on the maze solving problem. The algorithm demonstrates sustainable learning, building a near to optima...
Article
Full-text available
Single-shot grid-based path finding is an important problem with applications in robotics, video games, etc. Typically, in the AI community, heuristic search methods (based on A* and its variations) are used to solve it. In this work we present the results of preliminary studies of how neural networks can be utilized for path planning on square grids,...
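For reference, the heuristic-search baseline mentioned here is easy to state in full; below is a self-contained A* on a 4-connected square grid with a Manhattan-distance heuristic (the neural-network planner studied in the paper is not reproduced).

    import heapq

    def astar(grid, start, goal):
        """A* on a 4-connected grid. grid: 2D list, 0 = free, 1 = blocked; cells are (row, col)."""
        h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan-distance heuristic
        open_heap = [(h(start), 0, start, None)]                 # (f, g, cell, parent)
        came_from, g_score = {}, {start: 0}
        while open_heap:
            _, g, cur, parent = heapq.heappop(open_heap)
            if cur in came_from:
                continue                                          # already expanded with a better g
            came_from[cur] = parent
            if cur == goal:                                       # reconstruct the path back to start
                path = []
                while cur is not None:
                    path.append(cur)
                    cur = came_from[cur]
                return path[::-1]
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nxt = (cur[0] + dy, cur[1] + dx)
                if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                        and grid[nxt[0]][nxt[1]] == 0 and g + 1 < g_score.get(nxt, float("inf"))):
                    g_score[nxt] = g + 1
                    heapq.heappush(open_heap, (g + 1 + h(nxt), g + 1, nxt, cur))
        return None                                               # no path exists

    grid = [[0, 0, 0, 0],
            [1, 1, 0, 1],
            [0, 0, 0, 0]]
    print(astar(grid, (0, 0), (2, 0)))  # [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0)]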
Article
Full-text available
We carried out an empirical study of aggression in relation to different personal traits. In this article we present results obtained for different forms of aggression, including results of machine-learning experiments with the AQJSM method. The method distinguishes several classes with different levels of aggression defined with a special form, as...
Conference Paper
Full-text available
The paper considers the task of constructing a collective plan for a group of intelligent agents. The agents are robotic systems possessing a manipulator and acting on objects in a deterministic external environment. The MultiMAP planning algorithm proposed in the article is hierarchical. It is iterative and based on the original sign representation of kn...
Article
Behavior planning is an important function of the intelligent control system of any complex technical facility. Presently, the symbolic paradigm of artificial intelligence offers a variety of planning algorithms, including those that use precedent information, i.e. algorithms based on acquired knowledge. The symbol grounding problem within the existing approache...
Conference Paper
Full-text available
The increasing number of scientific publications makes it difficult to conduct a comprehensive review and objectively compare the results of previous research. In some areas it is also difficult to extract regularities without computer aid due to the complexity of experimental setups and results. Cancer treatment using dendritic cell vaccines is s...
Article
Full-text available
Behavior planning is known to be one of the basic cognitive functions, which is essential for any cognitive architecture of any control system used in robotics. At the same time most of the widespread planning algorithms employed in those systems are developed using only approaches and models of Artificial Intelligence and don't take into account n...
Chapter
Hierarchical temporal memory is an online machine learning model that simulates some of the structural and algorithmic properties of the neocortex. A new implementation of hierarchical temporal memory is proposed in the paper. The main distinction of the implementation is the chain extraction module, which complements the spatial and temporal pooling modul...
Article
Extensive use of unmanned aerial vehicles (UAVs) in recent years has induced the rapid growth of research areas related to UAV production. Among these, the design of control systems capable of automating a wide range of UAV activities is one of the most actively explored and evolving. Currently, researchers and developers are interested in designin...
Article
Full-text available
A technique for retrieving causal connections of binary relationships from a set of fact bases is suggested. The fact bases are formed for the target properties of each class of objects. The class descriptions are formed by training on data from a loosely formalized object domain. The training is organized using a co-evolutionary genetic algor...
Chapter
Full-text available
In this paper we outline an approach to solving a special type of navigation task for robotic systems, when a coalition of robots (agents) acts in a 2D environment, which can be modified by their actions, and shares the same goal location. The latter is originally unreachable for some members of the coalition, but the common task can still be accomp...
Article
Full-text available
Procedures for forming an element of an actor's world model (a sign), introduced in the first part of this study, are considered. The process of forming the image–significance pair of a sign, taking into account the modern understanding of the operation of the human brain cortex, is investigated. An algorithm for synthesizing a behavior plan is constructed, and a n...
Conference Paper
Full-text available
Dendritic cell (DC) vaccination is a promising way to combat cancer metastases, especially in the case of immunogenic tumors. Unfortunately, it is only rarely possible to achieve a satisfactory clinical outcome in the majority of patients treated with a particular DC vaccine. Apparently, DC vaccination can be successful with certain combinations...
Article
Full-text available
The combined use of AQ learning and the JSM method for extracting cause-effect relationships from psychological test data is considered. AQ learning is used to describe a test group using rules. The group description is the basis for constructing the fact base of the JSM method. The first stage of the JSM method is used for hypothesizing about cause-effec...
Article
Full-text available
Functions that are referred to in psychology as functions of consciousness are considered. These functions include reflection, awareness of activity motivation, goal setting, synthesis of goal-oriented behavior, and some others. The description is based on the concept of a sign, which is widely used in psychology and, in particular, in the cultural-...
