Article

PDP: Parallel dynamic programming

Abstract

Deep reinforcement learning is a major focus of research in artificial intelligence. The principle of optimality in dynamic programming is key to the success of reinforcement learning methods. The principle of adaptive dynamic programming (ADP), rather than direct dynamic programming (DP), is presented first, and the inherent relationship between ADP and deep reinforcement learning is developed. Next, analytics intelligence is discussed as a necessary requirement for real reinforcement learning. Finally, the principle of parallel dynamic programming, which integrates dynamic programming and analytics intelligence, is presented as the future of computational intelligence.
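The principle of optimality that the abstract credits for the success of reinforcement learning is what classical DP exploits: the optimal value of a state equals the best immediate reward plus the discounted optimal value of its successor. A minimal sketch, assuming a small, fully known, discounted MDP (the two-state example and the discount factor 0.9 are hypothetical), is tabular value iteration. It illustrates direct DP only; ADP and PDP replace the exact backup with learned approximations, which this code does not attempt.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Tabular value iteration using Bellman optimality backups.

    P[s][a] is a list of (next_state, probability) pairs and R[s][a] is the
    immediate reward for taking action a in state s.
    Returns the converged state values and a greedy policy.
    """
    def q(s, a, V):
        # Action value: immediate reward plus discounted expected next value.
        return R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])

    V = [0.0] * len(P)
    while True:
        # Synchronous sweep: every state is backed up from the previous V.
        V_new = [max(q(s, a, V) for a in range(len(P[s]))) for s in range(len(P))]
        done = max(abs(x - y) for x, y in zip(V, V_new)) < tol
        V = V_new
        if done:
            break
    policy = [max(range(len(P[s])), key=lambda a: q(s, a, V)) for s in range(len(P))]
    return V, policy


# Hypothetical two-state MDP: in state 0, action 1 moves to state 1 (reward 1);
# in state 1, action 0 stays put and earns reward 2 per step.
P = [[[(0, 1.0)], [(1, 1.0)]],
     [[(1, 1.0)], [(0, 1.0)]]]
R = [[0.0, 1.0],
     [2.0, 0.0]]
V, policy = value_iteration(P, R)
print(V, policy)
```

Here the optimal plan is to move to state 1 once and then stay, so the values approach 2/(1 - 0.9) = 20 for state 1 and 1 + 0.9 * 20 = 19 for state 0.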

... Based on the above model, parallel dynamic programming (PDP) [21] is utilized in RTS to estimate the parameters. PDP can model the strongly coupled nonlinear relations between different data sources. ...
... From 2013 to 2014, a parallel transportation system for Qingdao, PTS-Qingdao, was constructed to improve the traffic environment in Qingdao, China [21]. This project was the first large-scale, city-wide application of the parallel transportation system. ...
... The comparison of traffic flow (veh/hour) before (Jul. 15-21) and after (Aug. 19-25). ...
Article
IoT-driven intelligent transportation systems (ITS) have great potential and capacity to make transportation systems efficient, safe, smart, reliable, and sustainable. The IoT provides the access and driving forces for seamlessly integrating transportation systems in the physical world with their virtual counterparts in the cyber world. In this paper, we present visions and work on integrating artificial intelligent transportation systems and real intelligent transportation systems to create and enhance the "intelligence" of IoT-enabled ITS. With the increasingly ubiquitous and deep sensing capacity of IoT-enabled ITS, we can quickly create, in computers, artificial transportation systems equivalent to physical ones, and thus obtain parallel intelligent transportation systems, i.e., the real intelligent transportation systems and the artificial intelligent transportation systems. The evolution of transportation systems is studied from the viewpoint of the parallel world. A large number of long-term iterative simulations can be used to predict and analyze the expected results of operations. Thus, truly effective and smart ITS can be planned, designed, built, operated, and used. The foundation of parallel intelligent transportation systems is the ACP theory, which is composed of artificial societies, computational experiments, and parallel execution. We also present some case studies to demonstrate the effectiveness of parallel transportation systems.
... In order to calculate $x$ and $y$, the properties of congruent triangles have to be applied. From the red congruent triangles, the distance between $O_{(N+1)/2}$ and the left border is equal to $x$, as shown in (4). The blue congruent triangles show that the width from $O_{(N+1)/2+1}$ to the right border is equal to the width from $O_{(N+1)/2+2}$ to the right border, which is $x + D_N$. ...
... $O_1$, $O_2$, $O_3$, $O_4$, and $O_5$ are the centres of the above circles that guide the vehicle through the turn-around manoeuvre. $S_1$ and $S_3$ are the transition points where the vehicle reaches the right border during the Five-Point Turn; $S_2$ and $S_4$ are those at the left border. ...
Conference Paper
This paper proposes a new methodology for achieving human-like automated driving, and presents a decision-making framework and the minimum thresholds of the occupied widths of multi-point turns for autonomous vehicles. The concept of human-like automated driving and the multi-point turn decision-making framework for autonomous vehicles are first proposed. Then, the geometric characteristics provided by the reference paths of turn-around manoeuvres are analysed. The minimum operation widths from the U-Turn to the Five-Point Turn are investigated in turn, and the methodology and results are then generalized to the multi-point turn (i.e., N-Point Turn) scenario. Finally, using the derived results and the characteristics analysed above, functions that evaluate the most feasible turn-around manoeuvre in the current situation are provided.
... By investigating and shaping the virtual systems and interacting with the real physical system, the goal of managing and controlling the real physical system is achieved [14]. It is worth noting that parallel learning [15] and parallel dynamic programming [16] have recently been derived from parallel system theory. ...
... Integrating adaptive dynamic programming theory and parallel system theory, a new dynamic programming framework, parallel dynamic programming (PDP), is proposed in [16], [19]. PDP can incorporate various modern intelligent techniques into its framework, such as the aforementioned parallel learning, deep learning, deep networks, reinforcement learning, and rule-based expert systems. ...
Article
Modern power systems are evolving into sociotechnical systems of massive complexity, whose real-time operation and dispatch go beyond human capability. Thus, developing and applying new intelligent power system dispatch tools is of great practical significance. In this paper, we introduce the overall business model of power system dispatch, the top-level design approach of an intelligent dispatch system, and parallel intelligent technology with its dispatch applications. We expect that a new dispatch paradigm, namely parallel dispatch, can be established by incorporating various intelligent technologies, especially parallel intelligent technology, to enable secure operation of complex power grids, extend system operators' capabilities, suggest optimal dispatch strategies, and provide decision-making recommendations according to power system operational goals.
... To improve the run time of our algorithm, we can employ parallelization techniques. Parallelization, whereby large problems are divided into smaller subproblems which can be solved individually and simultaneously across multiple processing units [32], integrates nicely with dynamic programming models [3] [44]. We parallelize two steps in our dynamic pricing algorithm. ...
Article
Full-text available
In many online markets we observe fierce competition and highly dynamic price adjustments. Competitors frequently adjust their prices in order to respond to changing market situations caused by competitors' price adjustments. In this paper, we examine price response strategies within an infinite horizon duopoly where the competitor's strategy has to be learned. The goal is to derive knowledge about the opponent's pricing strategy in a self-adaptive way and to balance exploration and exploitation. Our models are based on anticipated price reaction probabilities and efficient dynamic programming techniques. We show that our approach works when played against unknown strategies. Further, we analyze the mutual interplay of our self-learning strategies as well as their tendencies to form a cartel when motivated accordingly. Moreover, we propose two extensions of our model to integrate risk aversion. Finally, we demonstrate the effectiveness of parallelization techniques to speed up the computation of strategies as well as their simulation.
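The snippet above describes splitting a DP into subproblems that are solved simultaneously. A minimal sketch of that idea, assuming a generic finite-horizon DP rather than the authors' pricing model (the two-state MDP, the chunking scheme, and the helper names are hypothetical), parallelizes each stage of backward induction: within one stage, every state's backup reads only the next stage's values, so chunks of states are independent subproblems.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def backup_chunk(states, V, P, R, gamma):
    """Bellman optimality backups for one chunk of states, reading only V."""
    return [
        max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
            for a in range(len(P[s])))
        for s in states
    ]


def parallel_backward_induction(P, R, horizon, gamma=1.0, workers=4):
    """Finite-horizon DP in which each stage's backups run in parallel chunks.

    Within one stage every state depends only on the next stage's values,
    so the state space can be split into independent subproblems.
    """
    n = len(P)
    size = max(1, -(-n // workers))  # ceiling division
    chunks = [range(i, min(i + size, n)) for i in range(0, n, size)]
    V = [0.0] * n  # terminal values at the end of the horizon
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(horizon):
            step = partial(backup_chunk, V=V, P=P, R=R, gamma=gamma)
            # map() preserves chunk order, so concatenation keeps state order.
            V = [v for part in pool.map(step, chunks) for v in part]
    return V


# Hypothetical MDP: P[s][a] = [(next_state, prob)], R[s][a] = reward.
P = [[[(0, 1.0)], [(1, 1.0)]],
     [[(1, 1.0)], [(0, 1.0)]]]
R = [[0.0, 1.0],
     [2.0, 0.0]]
print(parallel_backward_induction(P, R, horizon=3))
```

Threads keep the sketch portable, but in CPython the global interpreter lock limits their benefit for pure-Python, CPU-bound backups; a real speedup would more likely come from `ProcessPoolExecutor` or vectorized native code, with the same stage-by-stage structure.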
... Unlike conventional connected automated driving, parallel driving includes artificial drivers and artificial vehicles. The parameters and information of the real and artificial vehicles co-exist within the three parallel domains [12]. Specifically, real driving exists in the physical world, which consists of the physical behavior of the real vehicle and the real driver. ...
Preprint
Full-text available
Inspired by ACP-based parallel driving, digital quadruplets are proposed to improve road safety, traffic efficiency, and driving cooperation for future connected automated vehicles. The ACP method denotes the Artificial societies, Computational experiments, and Parallel execution modules for cyber-physical-social systems. Four agents are designed in the framework of digital quadruplets: descriptive vehicles, predictive vehicles, prescriptive vehicles, and real vehicles. The three virtual vehicles (descriptive, predictive, and prescriptive) dynamically interact with the real one in order to enhance the safety and performance of the real vehicle. The details of the three virtual vehicles in the digital quadruplets are described. Then, the interactions between the virtual and real vehicles are presented. The experimental results of the digital quadruplets demonstrate the effectiveness of the proposed framework.
Article
Full-text available
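This paper combines the parallel-systems idea based on ACP (Artificial societies, computational experiments, parallel execution) with the field of robotics to form a framework that integrates software and hardware, providing a convenient and safe platform for unmanned aerial vehicles, unmanned ground vehicles, and unmanned surface vessels to experiment, learn, and work in complex environments, namely parallel unmanned systems. Starting from the basic concept of parallel robots, this paper proposes the basic framework of parallel unmanned systems, introduces the basic functions and implementation methods of each module, and discusses the key technologies involved. It then looks ahead to the practical applications and challenges of parallel unmanned systems for unmanned aerial vehicles, unmanned ground vehicles, and unmanned surface vessels, and proposes future development directions for parallel unmanned systems.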
... Artificial systems are constructed to represent the actual system, computational experiments are utilized to learn and evaluate various computational models, and parallel execution is implemented to improve the performance of the actual system. In parallel systems, the artificial systems and the actual system work together in a virtual-real interactive manner [57], [58]. The parallel systems theory and ACP approach have now evolved into a more generalized parallel intelligence theory [59]. ...
Article
Recently, generative adversarial networks (GANs) have become a research focus of artificial intelligence. Inspired by the two-player zero-sum game, GANs comprise a generator and a discriminator, both trained under the adversarial learning idea. The goal of GANs is to estimate the underlying distribution of real data samples and generate new samples from that distribution. Since their introduction, GANs have been widely studied because of their enormous application potential in areas including image and vision computing, speech and language processing, etc. In this review paper, we summarize the state of the art of GANs and look into the future. First, we survey the background of GANs' proposal, their theoretical and implementation models, and their application fields. Then, we discuss GANs' advantages and disadvantages, and their development trends. In particular, we investigate the relation between GANs and parallel intelligence, concluding that GANs have great potential in parallel systems research in terms of virtual-real interaction and integration. Clearly, GANs can provide substantial algorithmic support for parallel intelligence.
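In the original GAN formulation (Goodfellow et al., 2014), the two-player zero-sum game described above is the minimax problem below, where $D(x)$ is the discriminator's estimated probability that $x$ is a real sample and $G(z)$ maps noise $z \sim p_z$ to a generated sample. For a fixed generator with sample distribution $p_g$, the optimal discriminator has the closed form in the second line, and substituting it back shows that the generator minimizes a Jensen-Shannon divergence. Individual GAN variants covered by the review may use different losses.

```latex
\min_{G}\max_{D}\; V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]

D^{*}_{G}(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_{g}(x)}

V(D^{*}_{G}, G) = -\log 4 + 2\,\mathrm{JSD}\big(p_{\text{data}} \,\|\, p_{g}\big)
```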
... In [9], [10], Wang et al. proposed the theoretical framework of Parallel Vision by extending the ACP approach [12]-[14] and elaborated on the significance of virtual data. The ACP methodology establishes the foundation for parallel intelligence [15]-[17], which provides new insight into tackling issues in complex systems [18]. Under the framework of Parallel Vision depicted in Fig. 1, the virtual world clearly has a great advantage in producing diverse labeled datasets with different environmental conditions and texture changes, which are usually regarded as important image features for object detection [19]. ...
Article
Full-text available
In the area of computer vision, deep learning has produced a variety of state-of-the-art models that rely on massive labeled data. However, collecting and annotating images from the real world is too demanding in terms of labor and money, and it is usually inflexible for building datasets with specific characteristics, such as small object areas and high occlusion levels. Under the framework of Parallel Vision, this paper presents a purposeful way to design artificial scenes and automatically generate virtual images with precise annotations. A virtual dataset named ParallelEye is built, which can be used for several computer vision tasks. Then, by training the DPM (Deformable parts model) and Faster R-CNN detectors, we show that model performance can be significantly improved by combining ParallelEye with publicly available real-world datasets during the training phase. In addition, we investigate the potential of testing the trained models from a specific aspect using intentionally designed virtual datasets, in order to discover the flaws of trained models. From the experimental results, we conclude that our virtual dataset is viable for training and testing object detectors.
... With the rapid development of automated driving [1], [2], parallel unmanned systems [3]-[5], control and computer science [6], [7], intelligent transportation systems (ITS) [8], [9], and advanced driver assistance systems (ADAS), vehicle handling stability and active safety have been increasingly emphasized since the last century. As a result, various ADAS and vehicle stability control systems have been developed, such as the anti-lock braking system (ABS) [10], [11], adaptive cruise system [12], and traction control system (TCS) [13], [14], which are based on vehicle longitudinal control; the electronic stability program (ESP) [15], [16] and active front steering (AFS) [17], which are concerned with lateral stability; and active suspension control (ASC) [18], [19] and active body control (ABC) [20], which emphasize vehicle vertical control. ...
Article
Next-generation vehicle control and future autonomous driving require further advances in vehicle dynamic state estimation. This article provides a concise review, along with the perspectives, of the recent developments in the estimation of vehicle dynamic states. The definitions used in vehicle dynamic state estimation are first introduced, and alternative estimation structures are presented. Then, the sensor configuration schemes used to estimate vehicle velocity, sideslip angle, yaw rate and roll angle are presented. The vehicle models used for vehicle dynamic state estimation are further summarized, and representative estimation approaches are discussed. Future concerns and perspectives for vehicle dynamic state estimation are also discussed.
... The term social transportation was first introduced in [10]. Traffic or transportation analytics with social signals, using techniques such as data mining, parallel intelligence, parallel learning, and natural language processing, has recently attracted widespread research interest [11]-[15]. Sasaki et al. analyzed the feasibility of detecting transportation information with Twitter and demonstrated the high potential of using Twitter to detect train status information [16]. ...
Article
Mining traffic-relevant information from social media data has become an emerging topic due to the real-time and ubiquitous features of social media. In this paper, we focus on a specific problem in social media mining which is to extract traffic relevant microblogs from Sina Weibo, a Chinese microblogging platform. It is transformed into a machine learning problem of short text classification. First, we apply the continuous bag-of-word model to learn word embedding representations based on a data set of three billion microblogs. Compared to the traditional one-hot vector representation of words, word embedding can capture semantic similarity between words and has been proved effective in natural language processing tasks. Next, we propose using convolutional neural networks (CNNs), long short-term memory (LSTM) models and their combination LSTM-CNN to extract traffic relevant microblogs with the learned word embeddings as inputs. We compare the proposed methods with competitive approaches, including the support vector machine (SVM) model based on a bag of n-gram features, the SVM model based on word vector features, and the multi-layer perceptron model based on word vector features. Experiments show the effectiveness of the proposed deep learning approaches.
... Generally speaking, the results of all three kinds of clustering are not ideal, and the clustering accuracy is relatively low. This is caused by the characteristics of the Glass dataset, because high-dimensional data suffer from the curse of dimensionality [21]: in high-dimensional space all data are very sparse and the volume grows exponentially, so the differences in distance between objects become less significant, which leads to large deviations in similarity measurement and distance calculation. This has a great impact on the DBSCAN algorithm and its improved variants, which define density-based clusters through the spatial distance between points. ...
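The distance-concentration effect invoked here can be checked numerically. A minimal sketch, assuming points drawn uniformly from the unit hypercube (a hypothetical setup, not the Glass dataset): it measures the relative contrast (max distance - min distance) / min distance from a random query point. As the dimension grows this ratio shrinks, so any fixed radius such as DBSCAN's Eps separates near neighbors from far points less and less sharply.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(max_dist - min_dist) / min_dist from a random query to uniform points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


for d in (2, 10, 100, 1000):
    print(d, round(relative_contrast(d), 3))
```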
Conference Paper
The DBSCAN algorithm is a density-based clustering algorithm that is widely used in various fields and can identify clusters of arbitrary shape. Existing approaches have several problems: selecting the neighborhood radius parameter Eps depends on manual experimentation, judgment, or computational estimation; the accuracy of the clustering results is sensitive to the Eps value; and using uniform global parameters on data sets of non-uniform density makes the clustering unsatisfactory. To address these problems, an adaptive Eps parameter estimation method based on Gaussian kernel density is proposed. The kernel probability density of each point is calculated by Gaussian kernel density estimation, and a positive correlation between the estimated kernel probability density and the Eps value is then found. The method adaptively matches an appropriate Eps value to the current point, rather than setting a single global Eps parameter. Theoretical analysis and experimental results show that the method clusters non-uniform data sets well and is also effective for uniform data sets.
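The two steps in this abstract (estimate each point's density with a Gaussian kernel, then derive a per-point Eps positively correlated with that density) can be sketched as follows. The bandwidth, the Eps range, and the linear density-to-Eps mapping are hypothetical choices for illustration; the paper's actual mapping is not given in the abstract. The density step is O(n²) in this naive form.

```python
import math


def gaussian_kde_at_points(points, bandwidth):
    """Gaussian kernel density estimate evaluated at every input point."""
    n, d = len(points), len(points[0])
    norm = n * (math.sqrt(2 * math.pi) * bandwidth) ** d
    densities = []
    for x in points:
        total = sum(
            math.exp(-math.dist(x, y) ** 2 / (2 * bandwidth ** 2)) for y in points
        )
        densities.append(total / norm)
    return densities


def adaptive_eps(points, bandwidth=1.0, eps_min=0.2, eps_max=1.0):
    """Hypothetical linear map from per-point density to a per-point Eps."""
    dens = gaussian_kde_at_points(points, bandwidth)
    lo, hi = min(dens), max(dens)
    if hi == lo:
        return [(eps_min + eps_max) / 2] * len(points)
    # Positive correlation, as the abstract states: higher density -> larger Eps.
    return [eps_min + (p - lo) / (hi - lo) * (eps_max - eps_min) for p in dens]


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
print(adaptive_eps(points))
```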
... To speed up the computation, several well-known lower-bound-based techniques [37,42,72] have been proposed. There have also been attempts at parallelizing DP [80] and at GPU acceleration [74]. ...
Article
The Closest Pair problem aims to identify the closest pair of points in a metric space under some similarity measure (e.g., Euclidean distance or Dynamic Time Warping distance). It is a fundamental problem with a wide range of applications in data mining, since most data can be represented as vectors in a high-dimensional space, and we would like to identify the relationships among those data points. Typical applications include, but are not limited to, social data analysis, user pattern identification, motif mining in biological data, and data clustering. This classical problem has been well studied over the past decades. In this thesis, we study the Closest Pair problem and its variants, and also bring a machine learning perspective to some closely related problems. In particular, we propose two approximate algorithms to efficiently address the Closest Pair of Points (CPP) problem, and one deterministic approach to solve the Closest Pair of Subsequences (CPS) problem, using the Euclidean distance measure. In addition, to identify the closest subsequences in time series data, we propose a learnable feature extractor embedded in an artificial neural network to learn patterns under the Dynamic Time Warping metric. Finally, to speed up inference of the proposed algorithm, we also propose a neural network pruning technique that yields a smaller network with similar capacity. All the proposed methods achieve state-of-the-art performance on various standard benchmark datasets.
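The Dynamic Time Warping distance used here is itself a dynamic program, and the lower-bound, parallelization, and GPU techniques mentioned in the preceding snippet all aim to speed up its quadratic recurrence. A minimal sketch, assuming an absolute-difference local cost and no warping-window constraint (both are illustrative choices, not details from the thesis):

```python
import math


def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D sequences.

    D[i][j] is the minimum cumulative cost of aligning a[:i] with b[:j];
    each alignment step advances in a, in b, or in both.
    """
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])  # local cost (assumed: absolute difference)
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # repeated 2 is absorbed by warping
print(dtw_distance([0, 0], [1, 1]))
```

Unlike the Euclidean distance, DTW lets one sequence stretch against the other, which is why the first pair has zero distance despite different lengths.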
... However, if the driver intends to take over control, the vehicle will switch to a lower level of automation, in which the driver's physical behaviors must re-engage in the driving tasks. Thus, different HD-RD-ADAV units could exhibit very different automation-driving patterns (e.g., from Level 0 to Level 6), with possibly frequent shifts among automation levels during real-world driving [41], [42]. ...
Article
The emerging development of connected and automated vehicles poses a significant challenge to current vehicle control and transportation systems. This paper proposes a novel unified approach, Parallel Driving, a cloud-based cyber-physical-social systems (CPSS) framework aimed at synergizing connected automated driving. This study first introduces CPSS and ACP-based intelligent machine systems. Then parallel driving is proposed in the cyber-physical-social space, considering interactions among vehicles, human drivers, and information. Within the framework, parallel testing, parallel learning, and parallel reinforcement learning are developed and concisely reviewed. Developments in the intelligent horizon (iHorizon) and its applications are also presented towards a parallel horizon. The proposed parallel driving offers an ample solution for achieving smooth, safe, and efficient cooperation among connected automated vehicles with different levels of automation in future road transportation systems.
... Unlike ordinary automated driving, parallel driving includes the artificial driver and the artificial vehicle. The parameters and information of the real and artificial vehicles co-exist in the three parallel worlds [12]. Specifically, real driving exists in the physical world, which consists of the physical behavior of the real vehicle and the real driver. ...
Preprint
Full-text available
Parallel driving is a novel framework to synthesize vehicle intelligence and transport automation. This article aims to define digital quadruplets in parallel driving. In cyber-physical-social systems (CPSS), based on the ACP method, the names of the digital quadruplets are first given: descriptive, predictive, prescriptive, and real vehicles. The three virtual digital vehicles are intended to interact with, guide, simulate, and improve the real vehicles. Then, the three virtual components of the digital quadruplets are introduced in detail and their applications are illustrated. Finally, the real vehicles in the parallel driving system and the research process of the digital quadruplets are described. The presented digital quadruplets in parallel driving are expected to make future connected automated driving safe, efficient, and synergistic.
... desired trajectory and initial position) is iteration-varying throughout the control process [12]-[14]. For iteration-varying desired trajectories, Saab et al. [15] proposed D-, PD-, and PID-type ILC learning algorithms, and a bounded tracking error was guaranteed in the presence of nonparametric system uncertainties; Chen and Moore [16] described ideas on how to harness nonrepetitiveness that follows a known or unknown repeating pattern; Zhang et al. [17] presented an observer-based ILC to track nonidentical trajectories; and Jin [14] put forward a hybrid adaptive ILC for nonuniform trajectory tracking. Xu et al. successively described adaptive ILC algorithms [18], a recursive direct learning control method, and internal model principle-based ILC [19] for nonrepetitive trajectory tracking. ...
Article
Full-text available
A novel iterative learning control (ILC) for perspective dynamic system (PDS) is designed and illustrated in detail in this article to overcome the uncertainties in path tracking of mobile service robots. PDS, which transmits the motion information of mobile service robots to image planes (such as a camera), provides a good control theoretical framework to estimate the robot motion problem. The proposed ILC algorithm is applied in accordance with the observed motion information to increase the robustness of the system in path tracking. The convergence of the presented learning algorithm is derived as the number of iterations tends to infinity under a specified condition. Simulation results show that the designed framework performs efficiently and satisfies the requirements of trajectory precision for path tracking of mobile service robots.
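The core ILC idea behind these works (reuse the previous trial's tracking error to correct the next trial's input) can be sketched with the simplest P-type update, u_{k+1}(t) = u_k(t) + L e_k(t+1). This is not the PDS-based algorithm of the article: it assumes a hypothetical scalar plant x(t+1) = a x(t) + b u(t), y = x, with the same initial state and reference on every trial (the repetitive case, unlike the iteration-varying settings surveyed above). For this plant the error converges when |1 - L b| < 1.

```python
import math


def run_trial(u, a=0.3, b=1.0):
    """One trial of the plant x(t+1) = a*x(t) + b*u(t), y = x, from x(0) = 0.

    Returns [y(1), ..., y(N)]; y[t] is the response to input u[t].
    """
    x, ys = 0.0, []
    for ut in u:
        x = a * x + b * ut
        ys.append(x)
    return ys


def p_type_ilc(y_desired, gain=0.8, iterations=50, a=0.3, b=1.0):
    """P-type ILC: u_{k+1}(t) = u_k(t) + gain * e_k(t+1).

    Assumes the same initial state and reference on every trial. The error
    converges when |1 - gain * b| < 1.
    """
    u = [0.0] * len(y_desired)
    max_errors = []
    for _ in range(iterations):
        y = run_trial(u, a, b)
        e = [yd - yk for yd, yk in zip(y_desired, y)]  # e[t] is e_k(t+1)
        max_errors.append(max(abs(v) for v in e))
        u = [uk + gain * ek for uk, ek in zip(u, e)]  # learn from the last trial
    return u, max_errors


N = 20
y_d = [math.sin(2 * math.pi * t / N) for t in range(1, N + 1)]
u, max_errors = p_type_ilc(y_d)
print(max_errors[0], max_errors[-1])
```

With these hypothetical values, |1 - L b| + L b a / (1 - a) is about 0.54 < 1, so the peak error also shrinks on every trial; with a slower plant (larger a) the same update still converges but the peak error can grow transiently first.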
... However, this progress takes different forms depending on whether it relies on rapidly analyzing all possible states or on reasoning. It has been stated that the success in the game of Go of the AlphaGo² algorithm, developed as part of Google's DeepMind¹ project, could not have been achieved by algorithms based on analyzing all possible states, such as brute force (Wang, 2017). On the other hand, many machine learning algorithms may require a large amount of previously collected training data. ...
... HMMs are used to solve three main problems: (1) Evaluation: given an observation sequence $O$ and an HMM $\lambda$, how can the probability of the observations, $P(O \mid \lambda)$, be assessed? For this problem, a forward-backward dynamic programming procedure [19] is used to compute the probability of the observation sequence efficiently. (2) Finding the most likely path: given an observation sequence $O$ and an HMM $\lambda$, how can one find the state sequence that maximizes the probability of the observation sequence? ...
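The two DP procedures named in this snippet can be sketched directly: the forward recursion answers the evaluation problem, and the Viterbi recursion (the same recursion with max in place of sum, plus backpointers) answers the most-likely-path problem. A minimal sketch, assuming a hypothetical two-state, two-symbol HMM with parameters $\lambda = (\pi, A, B)$:

```python
def forward(obs, pi, A, B):
    """Evaluation problem: P(obs | model) via the forward recursion.

    alpha[j] holds P(o_1..o_t, state_t = j); each step sums over predecessors.
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    return sum(alpha)


def viterbi(obs, pi, A, B):
    """Most likely state path: max instead of sum, plus backpointers."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backpointers = []
    for o in obs[1:]:
        ptr, new_delta = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: delta[i] * A[i][j])
            ptr.append(best)
            new_delta.append(delta[best] * A[best][j] * B[j][o])
        backpointers.append(ptr)
        delta = new_delta
    state = max(range(n), key=lambda i: delta[i])
    best_prob = delta[state]
    path = [state]
    for ptr in reversed(backpointers):  # walk the backpointers from the end
        state = ptr[state]
        path.append(state)
    return path[::-1], best_prob


# Hypothetical 2-state, 2-symbol HMM.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 0]
print(forward(obs, pi, A, B))
print(viterbi(obs, pi, A, B))
```

Both recursions cost O(T n²) for T observations and n states, instead of the O(nᵀ) cost of enumerating every state path.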
Article
Full-text available
Hidden Markov models (HMMs) are a class of machine learning models that have been widely used and have demonstrated their efficiency in many conventional applications. This paper proposes a modified posterior decoding algorithm that solves the HMM decoding problem based on the MapReduce paradigm and Spark's resilient distributed dataset (RDD) concept for large-scale data processing. The objective of this work is to improve the performance of HMMs in dealing with big data challenges. The proposed algorithm greatly reduces time complexity and gives good results in terms of running time, speedup, and parallelization efficiency for large amounts of data, i.e., large numbers of states and sequences.
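Posterior decoding, the problem this abstract distributes with MapReduce and Spark, picks at each time step the state with the highest posterior probability computed by forward-backward. A minimal single-machine sketch follows; it does not reproduce the paper's RDD-based distribution, and the HMM parameters are hypothetical. Per-step scaling factors keep the recursions from underflowing on long sequences, and their logs sum to $\log P(O \mid \lambda)$.

```python
import math


def posterior_decode(obs, pi, A, B):
    """Scaled forward-backward; returns (argmax states, posteriors, log P(obs))."""
    n, T = len(pi), len(obs)
    alphas, scales = [], []
    a = [pi[i] * B[i][obs[0]] for i in range(n)]
    for t in range(T):
        if t > 0:
            prev = alphas[-1]
            a = [sum(prev[i] * A[i][j] for i in range(n)) * B[j][obs[t]] for j in range(n)]
        c = sum(a)  # scaling factor c_t
        scales.append(c)
        alphas.append([x / c for x in a])
    betas = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        betas[t] = [
            sum(A[i][j] * B[j][obs[t + 1]] * betas[t + 1][j] for j in range(n)) / scales[t + 1]
            for i in range(n)
        ]
    # With this scaling, alpha_hat * beta_hat is exactly the posterior P(state_t = i | obs).
    posteriors = [[alphas[t][i] * betas[t][i] for i in range(n)] for t in range(T)]
    states = [max(range(n), key=lambda i: p[i]) for p in posteriors]
    return states, posteriors, sum(math.log(c) for c in scales)


pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 0]
print(posterior_decode(obs, pi, A, B))
```

Unlike Viterbi, posterior decoding maximizes each step's marginal separately, so its state sequence need not be a single jointly most likely path.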
Article
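Generative adversarial networks (GANs) have become a hot research direction in the artificial intelligence community. The basic idea of GANs comes from the two-player zero-sum game in game theory: a GAN consists of a generator and a discriminator and is trained through adversarial learning. The goal is to estimate the underlying distribution of data samples and generate new data samples. GANs are being widely studied in fields such as image and vision computing, speech and language processing, information security, and board-game competitions, and have huge application prospects. This paper summarizes the research progress of GANs and offers an outlook. After summarizing the background, theory and implementation models, application fields, advantages and disadvantages, and development trends of GANs, this paper also discusses the relationship between GANs and parallel intelligence, arguing that GANs can deepen the parallel-systems concepts of virtual-real interaction and integration, especially the idea of computational experiments, and provide very concrete and rich algorithmic support for the ACP (Artificial societies, computational experiments, and parallel execution) theory.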
Conference Paper
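Parallel steel incorporates the concept of cyber-physical-social systems (CPSS) from Industry 5.0, combining the complexity of physical, information, and social systems, and uses the ACP method to build an intelligent steel-plant system featuring parallel evolution, closed-loop feedback, and collaborative optimization. The system consists of three parts: the software-defined steel plant establishes its descriptive intelligence, computational-experiment optimization establishes its predictive intelligence, and parallel execution with virtual-real interaction builds its prescriptive intelligence. Through the comprehensive use of descriptive, predictive, and prescriptive intelligence, an intelligent steel-enterprise system is realized in which all elements (personnel, machines, materials, methods, and environment) are digitized and transparent.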
Article
Full-text available
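This paper proposes a new theoretical framework for machine learning. The framework combines the advantages of several existing machine learning frameworks and is specifically designed to address important problems currently facing the field, such as how to use software-defined artificial systems to extract useful data from big data, how to combine predictive learning with ensemble learning, and how to use Merton's laws for prescriptive learning.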
Article
Knowledge automation is the organic integration of intelligentization, human-machine systems, automation, etc. From the perspective of social signals and Merton systems, we address issues related to the significance and development of knowledge automation. Key topics discussed are the technical foundation for smart algorithms and knowledge robots, software-defined systems and processes from the viewpoint of systems engineering, and the important role played by knowledge automation in parallel systems for the control and management of complex systems.
Article
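This paper proposes the conceptual framework, basic theory, and research methodology of parallel blockchain, and discusses its connotations. Parallel blockchain technology is the organic combination of parallel intelligence theory and methods with blockchain technology. Through the parallel interaction and co-evolution of actual and artificial blockchain systems, it aims to add computational-experiment and parallel decision-making functions to current blockchain technology, achieving blockchain system management and decision-making that combines description, prediction, and prescription. This new research paradigm is expected to provide useful inspiration and reference for future blockchain research and industrial applications.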
Article
Full-text available
In order to solve the problem of how to efficiently control a large-scale swarm Unmanned Aerial Vehicle (UAV) system that performs complex tasks with limited manpower in a non-ideal environment, this paper proposes a parallel UAV swarm control method. The key technology of parallel control is to establish, on the ground, a one-to-one artificial UAV system corresponding to the aerial UAV swarm. This paper focuses on the computational experiments algorithm for establishing the artificial UAV system, including data processing, model identification, model verification, and state prediction. Furthermore, this paper performs a comprehensive flight mission with four common modes (climbing, level flight, turning, and descending) for verification. The results of the identification experiment show good consistency between the outputs of the refined dynamics model and the real flight data. The prediction experiment results show that the proposed prediction method can keep the state prediction error within about 10% for roughly 16 s.
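The identify-then-predict steps in this abstract (fit a dynamics model to flight data, then predict future states with it) can be illustrated with ordinary least squares. A minimal sketch, assuming a hypothetical first-order ARX model, y[t+1] = a*y[t] + b*u[t], and synthetic data; the paper's refined UAV dynamics model is not specified here and is likely much richer.

```python
import random


def fit_first_order(y, u):
    """Least-squares estimate of (a, b) in the model y[t+1] = a*y[t] + b*u[t].

    Solves the 2x2 normal equations
        [Syy Syu] [a]   [Sy'y]
        [Syu Suu] [b] = [Sy'u]
    by Cramer's rule, where y' denotes the one-step-ahead output.
    """
    syy = syu = suu = r_y = r_u = 0.0
    for t in range(len(y) - 1):
        syy += y[t] * y[t]
        syu += y[t] * u[t]
        suu += u[t] * u[t]
        r_y += y[t + 1] * y[t]
        r_u += y[t + 1] * u[t]
    det = syy * suu - syu * syu
    a = (r_y * suu - r_u * syu) / det
    b = (syy * r_u - syu * r_y) / det
    return a, b


def predict(a, b, y0, u):
    """Multi-step state prediction by simulating the identified model."""
    ys = [y0]
    for ut in u:
        ys.append(a * ys[-1] + b * ut)
    return ys


# Synthetic "flight log" from a hypothetical plant with a = 0.9, b = 0.2.
rng = random.Random(1)
u = [rng.uniform(-1.0, 1.0) for _ in range(500)]
y = [0.0]
for ut in u[:-1]:
    y.append(0.9 * y[-1] + 0.2 * ut + rng.gauss(0.0, 0.01))

a_hat, b_hat = fit_first_order(y, u)
print(a_hat, b_hat)
print(predict(a_hat, b_hat, y[-1], [0.5] * 5))
```

A verification step in the spirit of the abstract would compare `predict` against held-out measured data and track how long the relative error stays within a tolerance.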
Conference Paper
A Controlled Ecological Life Support System (CELSS) is necessary for long-term manned space exploration. A primary goal of plant research for CELSS is to generate the largest possible amount of edible biomass for the least amount of electrical energy. A key factor in implementing crop production systems will be the development of energy-efficient lighting approaches. Artificial lighting is essential for plant production in a CELSS, and energy reduction is one of the most important problems to be solved. The objective of this study is to provide a light intensity control scheme that determines the most suitable light intensity for plant cultivation in CELSS, minimizing energy consumption while meeting the demands of plant production and of oxygen for the crew, based on a knowledge- and data-driven modeling approach. The results indicate that the optimization method can increase light use efficiency by 11.5% compared with the previous experimental setup. Moreover, biomass production increases under the optimized light intensity. This approach provides a computational basis for life-time optimization of cabin design and the experimental setup of CELSS.
Article
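To address the difficulty of precise decision-making in gout diagnosis and treatment in complex environments, overcome the limitations that differences in physicians' expertise impose on gout care, and improve diagnostic accuracy and treatment effectiveness, this paper proposes a framework for a parallel gout diagnosis and treatment system based on ACP theory, called "Parallel Gout". Parallel Gout constructs an artificial gout diagnosis and treatment system to simulate and represent the actual one, uses computational experiments to train and evaluate various gout diagnosis and treatment models, and uses parallel execution to support management decisions and real-time optimization of the actual system, making the diagnosis and treatment process automated and intelligent. This parallel process can help doctors reduce misdiagnosis and mistreatment, improve efficiency, and raise their level of practice, while also helping patients manage their chronic disease and stay healthy. Given the severity of gout in today's society, the application of Parallel Gout has important practical significance; it is an effective path and a natural choice for moving the traditional medical model toward smart, parallel healthcare, helping advance the Healthy China initiative and achieve a higher level of health for all.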
Chapter
During peak periods, the entire subway system is overloaded by large numbers of passengers. To solve this problem, there are many strategies to reduce or limit the number of passengers entering subway stations. This paper proposes a parallel passenger flow management system (PPFMS) based on ACP theory to manage the inbound passenger flow. The ACP theory, comprising artificial societies, computational experiments, and parallel execution, plays an essential role in modeling and controlling complex systems. In a parallel passenger flow management system, realistic artificial scenes are used to model and represent complex real scenes; computational experiments are used to train and evaluate a variety of estimation models; and parallel execution is conducted to optimize the passenger flow system and achieve perception and understanding of complex environments.
Article
To give a high-level summary of current approaches to implementing artificial intelligence (AI), in this perspective we explain the key commonalities and major differences between Turing's approach and Wiener's approach. In particular, the problems, successful achievements, limitations, and future research directions of existing approaches that follow Wiener's ideas are addressed, aiming to provide readers with a good starting point and a roadmap. Some other related topics, for example, the role of human experts in developing AI, are also discussed to seek potential solutions to some existing difficulties.
Article
In this paper, we present a dynamics analysis and vibration control design for a nanoscale beam. The nanobeam is modelled based on the modified couple stress theory, which incorporates small length-scale effects. The governing equation and boundary conditions of the nanobeam are derived from Hamilton's principle. For the control part, we design a model-based control and an adaptive control with output constraints. By applying a time-varying function to the constraint, the displacement of the nanobeam can be kept within a prescribed trajectory. With appropriately chosen design parameters, the signals in the nanobeam system converge to a small neighbourhood of zero and the vibration of the nanobeam is suppressed with prescribed performance. Simulation results verify the suitability of the proposed control.
Article
This paper presents the concept, basic framework, methodology, and applications of the parallel city. The parallel city is an application of ACP-based parallel intelligence to urban domains. The real city runs in parallel and interactively with its equivalent artificial city in virtual space. The artificial city provides descriptive, predictive, and prescriptive functions for the real city. A closed-loop workflow between the real city and the artificial city iteratively optimizes urban systems, leading to a new paradigm of intelligent urban management.
Article
Nearly all work in neural machine translation (NMT) is limited to a fairly restricted vocabulary, crudely treating all other words as a single <unk> symbol. For the translation of morphologically rich languages, unknown (UNK) words also arise when the translation model misinterprets morphological changes. In this study, we explore two ways to alleviate the UNK problem in NMT: a new generative adversarial network (with added value constraints and semantic enhancement) and a preprocessing technique that mixes in morphological noise. The training process resembles a win-win game among three adversarial submodels (generator, filter, and discriminator). In this game, the filter directs the discriminator's attention to negative generations that contain noise, which improves training efficiency. Eventually, the discriminator cannot easily distinguish the negative samples produced by the generator with the filter from human translations. The experimental results show that the proposed method significantly improves over several strong baseline models across various language pairs and achieves state-of-the-art results on the newly emerging Mongolian-Chinese task.
Article
Inspired by ACP-based parallel driving, digital quadruplets are proposed to improve road safety, traffic efficiency, and driving cooperation for future connected automated vehicles. The ACP method denotes artificial societies, computational experiments, and parallel execution modules for cyber-physical-social systems. Four agents are designed in the digital quadruplet framework: descriptive vehicles, predictive vehicles, prescriptive vehicles, and real vehicles. The three virtual vehicles (descriptive, predictive, and prescriptive) interact dynamically with the real vehicle to enhance its safety and performance. The details of the three virtual vehicles are described, followed by the interactions between the virtual and real vehicles. Experimental results demonstrate the effectiveness of the proposed framework.
Article
Robots are increasingly used to help humans in harsh environments such as the deep sea or coal mines. Waste landfills are similar places, with risks of casualties, gas poisoning, and explosions, so it is reasonable to use robots for tasks such as burying operations, transportation, and inspection. In these tasks, an important issue is obtaining appropriate paths for the robots, especially in complex applications. In this context, this article proposes a novel hybrid swarm intelligence algorithm, ant colony optimization enhanced by chaos-based particle swarm optimization, to solve the path planning problem for landfill inspection robots in Asahikawa, Japan. In chaos-based particle swarm optimization, a Chebyshev chaotic sequence is used to generate the random factors in the particle swarm optimization update formula so as to adjust the particle swarm optimization parameters effectively. This improved model is applied to optimize and determine the hyperparameters of ant colony optimization. In addition, an improved pheromone updating strategy that combines a global asynchronous feature with an “Elitist Strategy” is employed in ant colony optimization to make more appropriate use of global information. As a result, the number of ant colony optimization iterations invoked by chaos-based particle swarm optimization can be reasonably reduced, effectively decreasing the search time. Comparative simulation experiments show that the chaos-based particle swarm optimization-ant colony optimization algorithm searches rapidly and obtains solutions of similar quality.
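The chaotic random factors in the particle swarm update can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the standard Chebyshev map x_{k+1} = cos(m arccos x_k) with a hypothetical order m = 4 and seed x0 = 0.3, rescales its output to [0, 1], and uses it in place of uniform random numbers in a plain global-best PSO velocity update. The inertia weight and acceleration constants (w = 0.7, c1 = c2 = 1.5) are illustrative choices, and the ant colony optimization stage is omitted.

```python
import math
from typing import Callable, Iterator, List, Sequence, Tuple


def chebyshev_sequence(x0: float, order: int = 4) -> Iterator[float]:
    """Yield chaotic numbers in [0, 1] from the Chebyshev map
    x_{k+1} = cos(order * arccos(x_k)), with x_k in [-1, 1]."""
    x = x0
    while True:
        x = math.cos(order * math.acos(x))
        yield (x + 1.0) / 2.0  # rescale [-1, 1] to [0, 1]


def chaos_pso(f: Callable[[Sequence[float]], float], dim: int,
              bounds: Tuple[float, float], n_particles: int = 20,
              iters: int = 200, w: float = 0.7, c1: float = 1.5,
              c2: float = 1.5, x0: float = 0.3) -> Tuple[List[float], float]:
    """Global-best PSO whose random factors come from a Chebyshev chaotic sequence."""
    lo, hi = bounds
    chaos = chebyshev_sequence(x0)
    # Initial positions are also drawn from the chaotic sequence.
    xs = [[lo + (hi - lo) * next(chaos) for _ in range(dim)]
          for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_val = [f(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                # Chaotic values replace the usual uniform random numbers r1, r2.
                r1, r2 = next(chaos), next(chaos)
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - xs[i][d])
                            + c2 * r2 * (gbest[d] - xs[i][d]))
                # Move the particle and clip it to the search box.
                xs[i][d] = min(max(xs[i][d] + vs[i][d], lo), hi)
            val = f(xs[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = xs[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = xs[i][:], val
    return gbest, gbest_val
```

For example, `chaos_pso(lambda x: sum(v * v for v in x), dim=3, bounds=(-5.0, 5.0))` minimizes a sphere function; in a hybrid scheme like the one described above, the objective would instead score candidate ant colony optimization hyperparameters.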
Article
In this article, the decentralized control of nonlinear large-scale systems is investigated via critic-only adaptive dynamic programming learning methods. First, we establish a specific relation between the decentralized control of interconnected subsystems and the optimal control of isolated subsystems. Then, we use neural networks (NNs) to implement a critic-only online adaptive learning method. The stability analysis provides the range of the online learning rate and indicates that the NN can approximate the optimal solution. Subsequently, it is proven that small NN approximation errors do not affect system stability. Finally, simulation examples verify the feasibility of the method.
Article
This paper presents an approach to generating 3D parallel tiled code that implements an Optimal Binary Search Tree (OBST) algorithm. We demonstrate that the data dependences in code implementing Knuth's OBST algorithm allow only 2D tiled code to be generated. We propose a way to transform Knuth's OBST algorithm into a modified one whose dependences allow 3D parallel tiled code to be generated. We use the polyhedral model and tools supporting that model to generate 3D target tiled code from the modified Knuth's OBST algorithm. Program parallelism is based on the wavefront technique and is expressed in the OpenMP C/C++ standard. Our experiments with the resulting 3D tiled code demonstrate that it considerably outperforms 2D tiled code generated from serial code implementing the classic Knuth's OBST algorithm. The performance gain is due to the much better locality of the 3D tiled code compared with the 2D code.
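For reference, the serial computation that such tiled codes parallelize can be sketched as below. This is a simplified, assumed formulation, not the paper's code: it considers only key access frequencies (no unsuccessful-search weights) and uses Knuth's bound root[i][j-1] <= root[i][j] <= root[i+1][j] to restrict the candidate roots, which gives O(n^2) total work. The comment on anti-diagonals marks the independence that a wavefront schedule can exploit; the loop itself runs sequentially.

```python
from itertools import accumulate
from typing import List


def obst_knuth(freq: List[int]) -> int:
    """Minimum weighted search cost of a BST over keys 0..n-1 (in sorted order)
    with access frequencies freq; a key at depth d (root depth 1) costs d * freq."""
    n = len(freq)
    if n == 0:
        return 0
    prefix = [0] + list(accumulate(freq))         # sum(freq[i:j]) == prefix[j] - prefix[i]
    cost = [[0] * (n + 1) for _ in range(n + 1)]  # cost[i][j]: keys i..j-1
    root = [[0] * (n + 1) for _ in range(n + 1)]  # root[i][j]: optimal root for keys i..j-1
    for i in range(n):
        cost[i][i + 1] = freq[i]
        root[i][i + 1] = i
    # Intervals of length d depend only on shorter intervals, so all cells on one
    # anti-diagonal are independent: this is the wavefront a parallel code can exploit.
    for d in range(2, n + 1):
        for i in range(n - d + 1):
            j = i + d
            # Knuth's bound: root[i][j-1] <= root[i][j] <= root[i+1][j].
            candidates = range(root[i][j - 1], root[i + 1][j] + 1)
            best_r = min(candidates, key=lambda r: cost[i][r] + cost[r + 1][j])
            cost[i][j] = cost[i][best_r] + cost[best_r + 1][j] + prefix[j] - prefix[i]
            root[i][j] = best_r
    return cost[0][n]
```

For example, `obst_knuth([34, 8, 50])` returns 142, the cost of the tree rooted at the key with frequency 50.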
Article
Research on intelligent vehicles has been popular over the past decade. To bridge the gap between automatic approaches and man-machine control systems, it is indispensable to integrate visual human-computer interactions (VHCIs) into intelligent vehicle systems. In this article, we review existing studies on VHCI in intelligent vehicles from three aspects: 1) visual intelligence; 2) decision making; and 3) macro deployment. We discuss how VHCI has evolved in intelligent vehicles and how it enhances their capabilities. We present several simulated scenarios and cases for future intelligent transportation systems.
Article
A boundary control strategy is developed to address the vibration problem of the offshore ocean thermal energy conversion (OTEC) system and to constrain the bottom tension and top motion. To capture the dynamic behavior of the OTEC system accurately, this distributed parameter system is modeled by a governing equation and boundary conditions (a PDE-ODEs model). Two robust adaptive boundary controllers are designed and placed at the endpoints of the system, and stability of the controlled system under unknown disturbances is achieved. With appropriately selected parameters, the offset of the offshore OTEC system can be suppressed to the equilibrium position. Finally, the effectiveness of the proposed control is illustrated by simulation.
Article
In this paper, we present a vibration control design for a string with a time-varying boundary output constraint. The dynamics of the string form a distributed parameter system described by a partial differential equation and two ordinary differential equations. A barrier Lyapunov function of logarithmic form is adopted to prevent violations of the time-varying constraint. Adaptive control is designed to handle parametric uncertainties in the system. Stability analysis and the solvability of the inequalities are provided. Numerical simulations illustrate the effectiveness of the proposed control design.
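A common logarithmic barrier Lyapunov function has the form V = (1/2) ln(k_b^2 / (k_b^2 - z^2)), where z is the constrained output error and k_b its bound. V is finite only while |z| < k_b and grows without bound as |z| approaches k_b, so any control law that keeps V bounded also keeps the output inside the constraint. The sketch below assumes this standard form and a hypothetical exponentially shrinking bound k_b(t); the paper's exact barrier function and bound may differ.

```python
import math


def log_blf(z: float, kb: float) -> float:
    """Log-type barrier Lyapunov function V = 0.5 * ln(kb^2 / (kb^2 - z^2)).
    Finite only for |z| < kb; it tends to infinity as |z| approaches kb."""
    if abs(z) >= kb:
        raise ValueError("constraint violated: need |z| < kb")
    return 0.5 * math.log(kb * kb / (kb * kb - z * z))


def kb_t(t: float, k0: float = 1.0, k_inf: float = 0.2, decay: float = 0.5) -> float:
    """Hypothetical time-varying bound kb(t) = (k0 - k_inf) * exp(-decay * t) + k_inf."""
    return (k0 - k_inf) * math.exp(-decay * t) + k_inf
```

For instance, `[log_blf(z, kb_t(2.0)) for z in (0.1, 0.3, 0.45)]` grows sharply as z approaches kb_t(2.0), which is approximately 0.494. Near z = 0 the function behaves like the quadratic z^2 / (2 k_b^2), so away from the constraint boundary it acts like an ordinary quadratic Lyapunov function.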
Article
In this paper, an adaptive neural bounded control scheme is proposed for an $n$-link rigid robotic manipulator with unknown dynamics. By combining neural approximation with the backstepping technique, an adaptive neural network control policy is developed to guarantee the tracking performance of the robot. Unlike existing results, the bounds of the designed controller are known a priori and are determined by the controller gains, making the controller applicable within actuator limitations. Furthermore, the designed controller can compensate for the effect of unknown robotic dynamics. Using Lyapunov stability theory, it is proved that all signals are uniformly ultimately bounded. Simulations verify the effectiveness of the proposed scheme.
Article
In this paper, we propose a new framework of machine learning theory, parallel learning, which incorporates and inherits many elements from various existing machine learning theories. Special designs are also presented to deal with some important problems in the machine learning research field, e.g., useful data retrieval from big data using software defined artificial systems, combination of predictive learning and ensemble learning, application of Merton's law to prescriptive learning.
Article
This paper presents a concurrent learning-based actor-critic-identifier architecture to obtain an approximate feedback-Nash equilibrium solution to an infinite horizon N-player nonzero-sum differential game. The solution is obtained online for a nonlinear control-affine system with uncertain linearly parameterized drift dynamics. It is shown that under a condition milder than persistence of excitation (PE), uniformly ultimately bounded convergence of the developed control policies to the feedback-Nash equilibrium policies can be established. Simulation results are presented to demonstrate the performance of the developed technique without an added excitation signal.
Article
The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.
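The way the search combines the two networks can be sketched as follows. This is an illustrative sketch based on the selection and leaf-evaluation rules as we understand them from the AlphaGo paper, not DeepMind's code. Each edge stores a prior P(s, a) from the policy network, a visit count N(s, a), and accumulated values. Selection maximizes Q(s, a) + c_puct P(s, a) sqrt(sum_b N(s, b)) / (1 + N(s, a)), and a leaf is scored by mixing the value-network estimate v with the rollout outcome z as (1 - lambda) v + lambda z. The constants c_puct = 5.0 and lambda = 0.5 are assumed defaults, and tree expansion, rollouts, and the networks themselves are omitted.

```python
import math
from dataclasses import dataclass
from typing import Dict, Hashable


@dataclass
class Edge:
    prior: float            # P(s, a), supplied by the policy network
    visits: int = 0         # N(s, a)
    value_sum: float = 0.0  # sum of leaf evaluations backed up through this edge

    @property
    def q(self) -> float:
        """Mean action value Q(s, a); 0 for an unvisited edge."""
        return self.value_sum / self.visits if self.visits else 0.0


def select_action(edges: Dict[Hashable, Edge], c_puct: float = 5.0) -> Hashable:
    """Return argmax_a Q(s,a) + c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a)).
    Ties (e.g. before any visit, when the bonus is 0) go to the first key."""
    total = sum(e.visits for e in edges.values())

    def score(a: Hashable) -> float:
        e = edges[a]
        return e.q + c_puct * e.prior * math.sqrt(total) / (1 + e.visits)

    return max(edges, key=score)


def leaf_value(v_net: float, rollout_z: float, lam: float = 0.5) -> float:
    """Mix the value-network estimate v and the rollout outcome z: (1 - lam) v + lam z."""
    return (1.0 - lam) * v_net + lam * rollout_z


def backup(edge: Edge, value: float) -> None:
    """Record one simulation's leaf evaluation on an edge along the searched path."""
    edge.visits += 1
    edge.value_sum += value
```

The prior term dominates for rarely visited moves and decays as 1/(1 + N(s, a)), so search effort gradually shifts from the policy network's suggestions toward moves whose backed-up values are high.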
Article
Stochastic dynamic programming (SDP) is applied to the optimal control of a hybrid electric vehicle in a concerted attempt to deploy and evaluate such a controller in the real world. Practical considerations for robust implementation of the SDP algorithm are addressed, such as the choice of discount factor and how the charge-sustaining characteristics of the SDP controller can be examined and adjusted. A novel cost function incorporates the square of the battery charge/discharge rate (C-rate) as an indicator of electrical powertrain stress, with the aim of mitigating real-world concerns such as battery health and motor temperature while still allowing short spells of operation near the system's peak power limits where advantageous. This paper presents simulation and chassis dynamometer results over the LA92 drive cycle, as well as results from testing on open roads. The hybrid system is operated at several levels of aggressivity, allowing the tradeoff between fuel savings and electrical powertrain stress to be evaluated. In dynamometer testing, this approach yielded a 13% reduction in electrical powertrain stress without sacrificing any fuel savings, compared with a controller that does not consider aggressivity in its optimization.
Article
This paper proposes a three-layer optimization and intelligent control algorithm for a microgrid with multiple renewable resources. A system control layer based on dual heuristic dynamic programming ensures the dynamic performance and voltage dynamics of the microgrid as system operating conditions change. A local layer maximizes the capability of the photovoltaic (PV) and wind power generators and the battery systems, and a device layer based on model predictive control increases the tracking accuracy of the converter control. The proposed control scheme, system-wide adaptive predictive supervisory control (SWAPSC), smooths the output of the PV and wind generators under intermittency, maintains bus voltage by providing dynamic reactive power support to the grid, and reduces total system losses while minimizing degradation of battery life span. Performance is compared with and without SWAPSC for an IEEE 13-node test system with a PV farm, a wind farm, and two battery-based energy storage systems.
Book
From the Publisher: This is the first textbook that fully explains the neuro-dynamic programming/reinforcement learning methodology, which is a recent breakthrough in the practical application of neural networks and dynamic programming to complex problems of planning, optimal decision making, and intelligent control.
Article
Unlike the many soft computing applications where it suffices to achieve a "good approximation most of the time," a control system must be stable all of the time. As such, if one desires to learn a control law in real-time, a fusion of soft computing techniques to learn the appropriate control law with hard computing techniques to maintain the stability constraint and guarantee convergence is required. The objective of the paper is to describe an adaptive dynamic programming algorithm (ADPA) which fuses soft computing techniques to learn the optimal cost (or return) functional for a stabilizable nonlinear system with unknown dynamics and hard computing techniques to verify the stability and convergence of the algorithm. Specifically, the algorithm is initialized with a (stabilizing) cost functional and the system is run with the corresponding control law (defined by the Hamilton-Jacobi-Bellman equation), with the resultant state trajectories used to update the cost functional in a soft computing mode. Hard computing techniques are then used to show that this process is globally convergent with stepwise stability to the optimal cost functional/control law pair for an (unknown) input affine system with an input quadratic performance measure (modulo the appropriate technical conditions). Three specific implementations of the ADPA are developed for 1) the linear case, 2) for the nonlinear case using a locally quadratic approximation to the cost functional, and 3) the nonlinear case using a radial basis function approximation of the cost functional; illustrated by applications to flight control.
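The linear case can be illustrated with its classical model-based counterpart, Kleinman-style policy iteration for a scalar LQR problem. The sketch assumes known scalar dynamics dx/dt = a x + b u and the cost integral of (q x^2 + r u^2). The ADPA described above instead estimates the cost from measured state trajectories, so this only shows the evaluate-then-improve structure starting from a stabilizing controller. Each iteration solves the Lyapunov equation for the current gain (policy evaluation) and then sets k = b p / r (policy improvement); the result can be checked against the closed-form Riccati solution.

```python
import math
from typing import Tuple


def kleinman_scalar(a: float, b: float, q: float, r: float, k0: float,
                    iters: int = 50, tol: float = 1e-12) -> Tuple[float, float]:
    """Policy iteration for dx/dt = a x + b u with cost integral of (q x^2 + r u^2),
    using the feedback u = -k x. Returns the converged cost coefficient p
    (V(x) = p x^2) and gain k. k0 must be stabilizing: a - b*k0 < 0."""
    if a - b * k0 >= 0:
        raise ValueError("initial gain must be stabilizing")
    k, p = k0, None
    for _ in range(iters):
        a_cl = a - b * k  # closed-loop coefficient under the current policy
        # Policy evaluation: solve the Lyapunov equation 2*a_cl*p + q + r*k^2 = 0.
        p_new = (q + r * k * k) / (-2.0 * a_cl)
        # Policy improvement: minimizing the Hamiltonian gives k = b p / r.
        k = b * p_new / r
        converged = p is not None and abs(p_new - p) < tol
        p = p_new
        if converged:
            break
    return p, k


def care_scalar(a: float, b: float, q: float, r: float) -> float:
    """Stabilizing root of the scalar Riccati equation 2 a p - (b^2 / r) p^2 + q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
```

With a = b = q = r = 1 and initial gain k0 = 2, the cost coefficient goes 2.5, 2.4167, 2.4142, and so on, converging quadratically to 1 + sqrt(2), and every intermediate gain keeps a - b k negative.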
Article
This article proposes a design of adaptive fuzzy logic based control systems (FLCSs) using neural networks. A detailed discussion of the effects of different reasoning methods on fuzzy control is given and used to illustrate the need for an adaptive implementation of fuzzy control. The decision-making procedure of an FLCS leads to a neuro-fuzzy network consisting of three types of subnets for pattern recognition, fuzzy reasoning, and control synthesis, respectively. The unique knowledge structure embedded in this structured network enables it to make adaptive changes to the fuzzy reasoning methods and membership functions for both input signal patterns and output control actions, and later to recover these changes individually and completely from its subnets. Gradient-based optimization methods are used to derive off-line training rules and on-line learning algorithms for the structured neuro-fuzzy network. Issues related to rule modification and generation for an FLCS are addressed based on its network implementation.
Article
In this paper, output-feedback-based finite-horizon near-optimal regulation of nonlinear affine discrete-time systems with unknown system dynamics is considered, using neural networks (NNs) to approximate the solution of the Hamilton-Jacobi-Bellman (HJB) equation. First, an NN-based Luenberger observer is proposed to reconstruct both the system states and the control coefficient matrix. Next, a reinforcement learning methodology with an actor-critic structure is used to approximate the time-varying solution of the HJB equation, referred to as the value function, with an NN. To properly satisfy the terminal constraint, a new error term is defined and incorporated into the NN update law so that the terminal constraint error is also minimized over time. An NN with constant weights and time-dependent activation functions is employed to approximate the time-varying value function, which is then used to generate the finite-horizon control policy; the policy is near-optimal rather than optimal because of NN reconstruction errors. The proposed scheme operates forward in time without an offline training phase. Lyapunov analysis is used to investigate the stability of the overall closed-loop system. Simulation results show the effectiveness and feasibility of the proposed method.
Article
This paper presents a novel optimal self-learning battery sequential control scheme for smart home energy systems. The main idea is to use the adaptive dynamic programming (ADP) technique to obtain the optimal battery sequential control iteratively. First, the battery energy management system model is established, taking the power efficiency of the battery into account. Next, considering the power constraints of the battery, a new non-quadratic performance index function is established, which guarantees that the iterative control law cannot exceed the maximum charging/discharging power of the battery, thereby extending the battery's service life. Then, the convergence properties of the iterative ADP algorithm are analyzed, guaranteeing that both the iterative value function and the iterative control law reach their optimum. Finally, simulation and comparison results illustrate the performance of the presented method.
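A non-quadratic performance index of the kind described above can be sketched with the form commonly used in the ADP literature for bounded inputs, W(u) = 2 u_max r ∫_0^u tanh^{-1}(v / u_max) dv. This specific form, the scalar setting, and the names below are assumptions for illustration; the paper's own index may differ. Because W grows steeply as |u| approaches u_max, the control that minimizes W(u) + g u for any gradient term g is u = -u_max tanh(g / (2 u_max r)), which never exceeds the power limit.

```python
import math


def nonquadratic_cost(u: float, u_max: float, r: float = 1.0) -> float:
    """W(u) = 2 * u_max * r * integral_0^u atanh(v / u_max) dv, for |u| < u_max.
    Closed form: 2 * u_max * r * (u * atanh(u / u_max) + (u_max / 2) * ln(1 - (u / u_max)^2))."""
    if abs(u) >= u_max:
        raise ValueError("|u| must stay below the power limit u_max")
    x = u / u_max
    return 2.0 * u_max * r * (u * math.atanh(x) + 0.5 * u_max * math.log(1.0 - x * x))


def saturated_control(g: float, u_max: float, r: float = 1.0) -> float:
    """Minimizer of W(u) + g * u: setting 2 * u_max * r * atanh(u / u_max) + g = 0
    gives u = -u_max * tanh(g / (2 * u_max * r)), which satisfies |u| <= u_max."""
    return -u_max * math.tanh(g / (2.0 * u_max * r))
```

Even for a very large gradient term, `saturated_control(1e6, 2.0)` stays within the limit of 2.0, whereas a quadratic index r u^2 would give an unbounded control.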
Article
In this paper, convergence properties are established for the newly developed discrete-time local value iteration adaptive dynamic programming (ADP) algorithm. The present local iterative ADP algorithm permits an arbitrary positive semidefinite function to initialize the algorithm. Employing a state-dependent learning rate function, for the first time, the iterative value function and iterative control law can be updated in a subset of the state space instead of the whole state space, which effectively relaxes the computational burden. A new analysis method for the convergence property is developed to prove that the iterative value functions will converge to the optimum under some mild constraints. Monotonicity of the local value iteration ADP algorithm is presented, which shows that under some special conditions of the initial value function and the learning rate function, the iterative value function can monotonically converge to the optimum. Finally, three simulation examples and comparisons are given to illustrate the performance of the developed algorithm.
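The local update can be illustrated on a toy problem. The sketch below is an assumed example, not taken from the paper: a deterministic chain of states 0..n-1 with absorbing target 0, controls u in {-1, 0, +1}, unit utility per step away from the target, and a hypothetical state-dependent learning rate alpha(x) = 1 / (1 + 0.1 x). Each iteration updates only one subset of the state space (odd states, then even states) via V(x) <- V(x) + alpha(x) [min_u (U(x, u) + V(F(x, u))) - V(x)], starting from the positive semidefinite initial function V0 = 0.

```python
from typing import List


def local_value_iteration(n_states: int = 11, iters: int = 2000) -> List[float]:
    """Toy local value iteration on a chain 0..n_states-1 with absorbing target 0."""
    controls = (-1, 0, 1)

    def step(x: int, u: int) -> int:
        # Deterministic dynamics F(x, u), clipped to the chain.
        return min(max(x + u, 0), n_states - 1)

    def alpha(x: int) -> float:
        # Hypothetical state-dependent learning rate in (0, 1].
        return 1.0 / (1.0 + 0.1 * x)

    V = [0.0] * n_states  # positive semidefinite initial value function
    subsets = [range(1, n_states, 2), range(2, n_states, 2)]  # odd states, even states
    for i in range(iters):
        for x in subsets[i % 2]:  # only this subset is updated in iteration i
            # Bellman target with unit utility U(x, u) = 1 away from the target.
            target = min(1.0 + V[step(x, u)] for u in controls)
            V[x] += alpha(x) * (target - V[x])
    return V  # V[0] is never updated and stays 0 (the target)
```

Here the optimal value is V*(x) = x (the number of steps to the target), and the iterates approach it from below even though each iteration touches only about half of the states.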
Article
An investigation of the impact and significance of the AlphaGo vs. Lee Sedol Go match is conducted, concluding by conjecturing the AlphaGo Thesis and its extension, in accordance with the Church-Turing Thesis in the history of computing. It is postulated that the architecture and method used by the AlphaGo program provide an engineering solution for tackling issues of complexity and intelligence. Specifically, the AlphaGo Thesis implies that any effective procedure for hard decision problems, such as NP-hard ones, can be implemented with an AlphaGo-like approach. Deep rule-based networks are proposed in an attempt to establish an understandable structure for deep neural networks in deep learning. The success of AlphaGo and the corresponding thesis ensure the technical soundness of the parallel intelligence approach for intelligent control and management of complex systems and knowledge automation.
Article
The future of control in cyberspace of parallel worlds is discussed. It argues for the coming age of Control 5.0, the control technology for the new IT capable of dealing with artificial worlds with VR, AR, AI and robotics. The discipline of automation needs a new interpretation of its core knowledge and skill set of modeling, analysis, and control for cyber-social-physical systems, and a paradigm shift from Newtonian Systems with Newton's Laws or Big Laws with Small Data to Mertonian Systems with Merton's Laws or Small Laws with Big Data.
Article
In this paper, a novel local value iteration adaptive dynamic programming (ADP) algorithm is developed to solve infinite-horizon optimal control problems for discrete-time nonlinear systems. This paper focuses on the admissibility properties and termination criteria of discrete-time local value iteration ADP algorithms. In the discrete-time local value iteration ADP algorithm, the iterative value functions and iterative control laws are updated in a given subset of the state space in each iteration, instead of over the whole state space. For the first time, admissibility properties of the iterative control laws are analyzed for the local value iteration ADP algorithm. New termination criteria are established that terminate the iterative local ADP algorithm with an admissible approximate optimal control law. Finally, simulation results illustrate the performance of the developed algorithm.
Article
In this paper, a discrete-time optimal control scheme is developed via a novel local policy iteration adaptive dynamic programming algorithm. In the discrete-time local policy iteration algorithm, the iterative value function and iterative control law can be updated in a subset of the state space, where the computational burden is relaxed compared with the traditional policy iteration algorithm. Convergence properties of the local policy iteration algorithm are presented to show that the iterative value function is monotonically nonincreasing and converges to the optimum under some mild conditions. The admissibility of the iterative control law is proven, which shows that the control system can be stabilized under any of the iterative control laws, even if the iterative control law is updated in a subset of the state space. Finally, two simulation examples are given to illustrate the performance of the developed method.
Article
Deep learning has shown great potential and advantages in feature extraction and model fitting, which makes it significant for control problems involving high-dimensional data. Several investigations have already focused on deep learning in control. This paper reviews related work on control object recognition, state feature extraction, system parameter identification, and control strategy calculation. It also describes the approaches and ideas of deep control, adaptive dynamic programming, and parallel control as they relate to deep learning in control. Finally, it summarizes the main functions and existing problems of deep learning in control and presents prospects for future work.
Article
In this paper, a novel discrete-time deterministic Q-learning algorithm is developed. In each iteration of the developed Q-learning algorithm, the iterative Q function is updated over the entire state and control spaces, instead of for a single state and a single control as in the traditional Q-learning algorithm. A new convergence criterion is established to guarantee that the iterative Q function converges to the optimum, which simplifies the learning-rate convergence criterion of traditional Q-learning algorithms. In the convergence analysis, the upper and lower bounds of the iterative Q function are analyzed to obtain the convergence criterion, instead of the iterative $Q$ function itself. For convenience of analysis, the convergence properties of the undiscounted deterministic Q-learning algorithm are developed first. Then, taking the discount factor into account, the convergence criterion for the discounted case is established. Neural networks are used to approximate the iterative Q function and to compute the iterative control law, respectively, to facilitate the implementation of the deterministic Q-learning algorithm. Finally, simulation results and comparisons illustrate the performance of the developed algorithm.
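A synchronous sweep of the kind described above, in which every state-control pair is updated from the previous Q function, can be sketched on a toy problem. Everything here is an illustrative assumption rather than the paper's setup: a deterministic chain with absorbing target 0, controls u in {-1, 0, +1}, unit utility away from the target, discount factor gamma = 0.9, and a constant learning rate of 0.8 standing in for the paper's learning-rate criterion. A tabular dictionary replaces the neural-network approximation used in the paper.

```python
from typing import Dict, Tuple


def deterministic_q_learning(n_states: int = 6, gamma: float = 0.9, iters: int = 500,
                             lr: float = 0.8) -> Dict[Tuple[int, int], float]:
    controls = (-1, 0, 1)

    def step(x: int, u: int) -> int:
        if x == 0:
            return 0  # absorbing target
        return min(max(x + u, 0), n_states - 1)

    def utility(x: int, u: int) -> float:
        return 0.0 if x == 0 else 1.0

    Q = {(x, u): 0.0 for x in range(n_states) for u in controls}
    for _ in range(iters):
        # Synchronous sweep: every (state, control) pair is updated from the previous Q.
        Q = {
            (x, u): (1 - lr) * q
            + lr * (utility(x, u) + gamma * min(Q[(step(x, u), v)] for v in controls))
            for (x, u), q in Q.items()
        }
    return Q
```

For this chain the optimal value is min_u Q*(x, u) = (1 - gamma^x) / (1 - gamma), the discounted cost of moving straight to the target, and each sweep contracts the error by at most (1 - lr) + lr gamma = 0.92.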
Article
This note studies the adaptive optimal output regulation problem for continuous-time linear systems, which aims to achieve asymptotic tracking and disturbance rejection by minimizing predefined costs. Reinforcement learning and adaptive dynamic programming techniques are employed to compute an approximate optimal controller from input/partial-state data despite unknown system dynamics and unmeasurable disturbances. Rigorous stability analysis shows that the proposed controller exponentially stabilizes the closed-loop system and that the plant output asymptotically tracks the given reference signal. Simulation results on an LCL-coupled inverter-based distributed generation system demonstrate the effectiveness of the proposed approach.
Article
We study stochastic motion planning problems which involve a controlled process, with possibly discontinuous sample paths, visiting certain subsets of the state-space while avoiding others in a sequential fashion. For this purpose, we first introduce two basic notions of motion planning, and then establish a connection to a class of stochastic optimal control problems concerned with sequential stopping times. A weak dynamic programming principle (DPP) is then proposed, which characterizes the set of initial states that admit a control enabling the process to execute the desired maneuver with probability no less than some pre-specified value. The proposed DPP comprises auxiliary value functions defined in terms of discontinuous payoff functions. A concrete instance of the use of this novel DPP in the case of diffusion processes is also presented. In this case, we establish that the aforementioned set of initial states can be characterized as the level set of a discontinuous viscosity solution to a sequence of partial differential equations, for which the first one has a known boundary condition, while the boundary conditions of the subsequent ones are determined by the solutions to the preceding steps. Finally, the generality and flexibility of the theoretical results are illustrated on an example involving biological switches.
Article
In this paper, a value iteration adaptive dynamic programming (ADP) algorithm is developed to solve infinite horizon undiscounted optimal control problems for discrete-time nonlinear systems. The present value iteration ADP algorithm permits an arbitrary positive semi-definite function to initialize the algorithm. A novel convergence analysis is developed to guarantee that the iterative value function converges to the optimal performance index function. Initialized by different initial functions, it is proven that the iterative value function will be monotonically nonincreasing, monotonically nondecreasing, or nonmonotonic and will converge to the optimum. In this paper, for the first time, the admissibility properties of the iterative control laws are developed for value iteration algorithms. It is emphasized that new termination criteria are established to guarantee the effectiveness of the iterative control laws. Neural networks are used to approximate the iterative value function and compute the iterative control law, respectively, for facilitating the implementation of the iterative ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the present method.
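The effect of the initial function can be seen on a toy problem. In this assumed example (not from the paper), value iteration V_{i+1}(x) = min_u [U(x, u) + V_i(F(x, u))] runs on a deterministic chain with absorbing target 0 and unit utility elsewhere, once from V0 = 0 and once from the hypothetical V0(x) = 2x. Both initial functions are positive semidefinite, and the sketch records every iterate so that monotonicity can be checked.

```python
from typing import List


def value_iteration(v0: List[float], iters: int) -> List[List[float]]:
    """Value iteration on a chain 0..n-1 with absorbing target 0; returns all iterates."""
    n = len(v0)
    controls = (-1, 0, 1)

    def step(x: int, u: int) -> int:
        return 0 if x == 0 else min(max(x + u, 0), n - 1)

    def utility(x: int) -> float:
        return 0.0 if x == 0 else 1.0  # same for every control in this toy problem

    history = [list(v0)]
    for _ in range(iters):
        V = history[-1]
        history.append([utility(x) + min(V[step(x, u)] for u in controls)
                        for x in range(n)])
    return history
```

Starting from V0 = 0 the iterates are monotonically nondecreasing, starting from V0(x) = 2x they are monotonically nonincreasing, and both reach the optimum V*(x) = x once the number of iterations is at least n - 1.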
Article
This paper addresses a new data-driven zero-sum neuro-optimal control problem for continuous-time unknown nonlinear systems with disturbances. Based on the input-output data of the nonlinear system, an effective recurrent neural network is introduced to reconstruct the system dynamics. Treating the system disturbance as a control input, a two-player zero-sum optimal control problem is established. Adaptive dynamic programming (ADP) is developed to obtain the optimal control under the worst-case disturbance. Three single-layer neural networks, one critic and two action networks, are employed to approximate the performance index function, the optimal control law, and the disturbance, respectively, to facilitate implementation of the ADP method. Convergence properties of the ADP method show that the system state converges to a finite neighborhood of the equilibrium, and that the weight matrices of the critic and the two action networks converge to finite neighborhoods of their optimal values. Finally, simulation results show the effectiveness of the developed data-driven ADP methods.
Article
In this paper, a novel iterative $Q$-learning method called “dual iterative $Q$-learning algorithm” is developed to solve the optimal battery management and control problem in smart residential environments. In the developed algorithm, two iterations are introduced, which are internal and external iterations, where internal iteration minimizes the total cost of power loads in each period, and the external iteration makes the iterative $Q$-function converge to the optimum. Based on the dual iterative $Q$-learning algorithm, the convergence property of the iterative $Q$-learning method for the optimal battery management and control problem is proven for the first time, which guarantees that both the iterative $Q$-function and the iterative control law reach the optimum. Implementing the algorithm by neural networks, numerical results and comparisons are given to illustrate the performance of the developed algorithm.
Article
In this paper, a novel iterative adaptive dynamic programming (ADP)-based infinite-horizon self-learning optimal control algorithm, called the generalized policy iteration algorithm, is developed for nonaffine discrete-time (DT) nonlinear systems. The generalized policy iteration algorithm is a general framework that interleaves the policy iteration and value iteration algorithms of ADP. It permits an arbitrary positive semidefinite function to initialize the algorithm and uses two iteration indices, one for policy improvement and one for policy evaluation. The convergence, admissibility, and optimality properties of the generalized policy iteration algorithm for DT nonlinear systems are analyzed for the first time. Neural networks are used to implement the developed algorithm. Finally, numerical examples are presented to illustrate its performance.
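The interplay of the two iteration indices can be sketched on a hypothetical seven-state deterministic system; `f`, `utility`, and `n_eval` are our illustrative choices, not the paper's. Each outer step improves the policy greedily from the current value function and then runs a fixed number of policy-evaluation sweeps. Setting `n_eval = 0` reduces this to value iteration, and a large `n_eval` approaches policy iteration. The initial function `v0` may be any nonnegative function with `v0[0] = 0`.

```python
# Hypothetical discrete system: states -3..3, actions move one step or stay.
STATES = range(-3, 4)
ACTIONS = (-1, 0, 1)


def f(x, u):
    """System dynamics, clipped to the state range."""
    return max(-3, min(3, x + u))


def utility(x, u):
    """Positive-definite stage cost U(x, u) = x^2 + u^2."""
    return x * x + u * u


def greedy(v):
    """Policy improvement: v_i(x) = argmin_u [U(x, u) + V(f(x, u))]."""
    return {x: min(ACTIONS, key=lambda u: utility(x, u) + v[f(x, u)]) for x in STATES}


def generalized_policy_iteration(v0, n_eval=3, tol=1e-12, max_outer=1000):
    """Alternate greedy improvement with n_eval extra evaluation sweeps."""
    v = dict(v0)
    policy = greedy(v)
    for _ in range(max_outer):
        policy = greedy(v)
        # One sweep under the new policy, then n_eval further evaluation sweeps.
        v_new = {x: utility(x, policy[x]) + v[f(x, policy[x])] for x in STATES}
        for _ in range(n_eval):
            v_new = {x: utility(x, policy[x]) + v_new[f(x, policy[x])] for x in STATES}
        if max(abs(v_new[x] - v[x]) for x in STATES) < tol:
            return v_new, policy
        v = v_new
    return v, policy
```

Starting from either `{x: 0.0 for x in STATES}` or a large quadratic such as `{x: 100.0 * x * x for x in STATES}`, the iteration settles on the same optimal values, $V^*(\pm1)=2$, $V^*(\pm2)=7$, $V^*(\pm3)=17$.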
Article
In this paper, a novel distributed iterative adaptive dynamic programming (ADP) method is developed to solve the multibattery optimal coordination control problem for home energy management systems. Through system transformations, the multi-input optimal control problem is converted into a single-input optimal control problem in which all the batteries operate at their worst performance. Next, based on the worst-performance optimal control law, an effective distributed iterative ADP algorithm is developed in which only a single-input optimization problem is solved in each iteration. Convergence properties of the distributed iterative ADP algorithm are established, showing that the iterative performance index function converges to the optimum. Finally, numerical analysis illustrates the performance of the developed algorithm.
Article
This paper introduces the concept, architecture, process, and application of a new data-driven and computational control approach called parallel control. The emphasis is on the difference between the proposed system-expansion scheme for parallel control through cyber-physical-social interaction and the traditional divide-and-conquer method for parallel computing through concurrent task execution. The theory of parallel control comes directly from the ACP approach, in which artificial systems are used for modeling and representation, computational experiments are used for analysis and evaluation, and parallel execution is conducted for the control and management of complex systems. Parallel control can be considered an extension of feedback control, especially adaptive control, for problems involving both engineering and social complexities. Case studies of this new control approach in transportation, production, and social management are presented and discussed.
Article
In computer vision, video stitching is a very challenging problem. In this paper, we propose an efficient and effective wide-view video stitching method based on fast structure deformation that achieves both high stitching quality and computational efficiency. For each group of synchronized frames, an effective double-seam selection scheme is first designed to search for two distinct but structurally corresponding seams in the two original images; the seam location of the previous frame is also taken into account to preserve interframe consistency. Second, 1-D feature detection and matching is performed along the double seams to capture the structural relationship between the two adjacent views. Third, after feature matching, an efficient algorithm linearly propagates the deformation vectors to eliminate structure misalignment. Finally, image intensity misalignment is corrected by rapid gradient fusion based on the successive over-relaxation iteration (SORI) solver, and a principled initialization of the SORI significantly reduces the number of iterations required. Our method compares favorably with seven state-of-the-art image and video stitching algorithms as well as traditional ones, and experimental results show that it outperforms the compared methods in overall stitching quality and computational efficiency.
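The benefit of a good initial guess for an SOR solver can be seen on a one-dimensional model problem. The sketch below is a stand-in under our own assumptions, not the paper's 2-D gradient-fusion solver or its initialization rule: it solves the discrete Poisson equation `u[i-1] - 2*u[i] + u[i+1] = b[i]` with fixed end values and reports how many sweeps were needed. The relaxation factor `omega = 1.8` and the stopping rule are our choices.

```python
def sor_poisson_1d(b, left, right, x0, omega=1.8, tol=1e-8, max_iter=100_000):
    """Solve u[i-1] - 2*u[i] + u[i+1] = b[i] on the interior points by SOR.

    `left` and `right` are fixed boundary values and `x0` is the initial guess
    for the interior. Returns (interior solution, number of sweeps used).
    """
    n = len(b)
    u = [left, *x0, right]
    for sweep in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(1, n + 1):
            # Gauss-Seidel value, using the already-updated left neighbour u[i-1].
            gauss_seidel = 0.5 * (u[i - 1] + u[i + 1] - b[i - 1])
            # Over-relax: move past the Gauss-Seidel value by the factor omega.
            new_value = (1.0 - omega) * u[i] + omega * gauss_seidel
            max_change = max(max_change, abs(new_value - u[i]))
            u[i] = new_value
        if max_change < tol:
            return u[1:-1], sweep
    return u[1:-1], max_iter
```

With `b` all zeros and boundary values 0 and 1, a cold start from zeros needs many sweeps, whereas passing the exact linear profile as `x0` stops after a single sweep. This is the effect a principled initialization exploits.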
Article
This paper is concerned with a new iterative $\theta$-adaptive dynamic programming (ADP) technique for solving optimal control problems of infinite-horizon discrete-time nonlinear systems. The idea is to use an iterative ADP algorithm to obtain the iterative control law that optimizes the iterative performance index function. The iterative $\theta$-ADP algorithm avoids the requirement of an initial admissible control imposed by the policy iteration algorithm. It is proved that all the iterative controls obtained by the iterative $\theta$-ADP algorithm stabilize the nonlinear system, which means that the algorithm is feasible for both online and offline implementation. A convergence analysis guarantees that the iterative performance index function converges monotonically to the optimum. Neural networks are used to approximate the performance index function and to compute the optimal control policy, respectively, to facilitate the implementation of the iterative $\theta$-ADP algorithm. Finally, two simulation examples illustrate the performance of the established method.
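To illustrate why a large initial function can make every intermediate controller stabilizing, here is a scalar linear-quadratic sketch. We assume, as our reading of the abstract rather than a statement of the paper's algorithm, that the iteration starts from $V_0(x) = \theta\Psi(x)$ with $\Psi(x) = x^2$. For the hypothetical unstable plant below, value iteration from a large $\theta$ then decreases monotonically and every intermediate controller is stabilizing. Starting from $V_0 = 0$ instead, the first controller is not stabilizing.

```python
import math

# Hypothetical unstable scalar plant x_{k+1} = A x_k + B u_k, stage cost Q x^2 + R u^2.
A, B, Q, R = 1.2, 1.0, 1.0, 1.0


def riccati_step(p):
    """One value-iteration step for V(x) = p x^2.

    Returns (p_next, k), where u = -k x minimizes Q x^2 + R u^2 + p (A x + B u)^2.
    """
    k = A * B * p / (R + B * B * p)
    p_next = Q + A * A * R * p / (R + B * B * p)
    return p_next, k


def theta_value_iteration(theta, n_iter=30):
    """Start from V_0(x) = theta * x^2 (assumed form) and record closed-loop factors."""
    p = theta
    values, closed_loop = [p], []
    for _ in range(n_iter):
        p, k = riccati_step(p)
        closed_loop.append(A - B * k)  # closed loop under the control derived from V_i
        values.append(p)
    return values, closed_loop


def riccati_fixed_point():
    """Positive root of B^2 p^2 + (R - Q B^2 - A^2 R) p - Q R = 0 (optimal p)."""
    c = R - Q * B * B - A * A * R
    return (-c + math.sqrt(c * c + 4 * B * B * Q * R)) / (2 * B * B)
```

Calling `theta_value_iteration(10.0)` gives a nonincreasing sequence converging to `riccati_fixed_point()`, with every closed-loop factor inside the unit interval. Calling `theta_value_iteration(0.0)` starts with the factor `A = 1.2`.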
Article
In this paper, a new iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal control problems for infinite-horizon discrete-time nonlinear systems with finite approximation errors. First, a new generalized value iteration algorithm of ADP is developed to make the iterative performance index function converge to the solution of the Hamilton-Jacobi-Bellman equation. The generalized value iteration algorithm can be initialized with an arbitrary positive semidefinite function, which overcomes a limitation of traditional value iteration algorithms. When the iterative control law and the iterative performance index function cannot be obtained accurately in each iteration, a new "design method of the convergence criteria" for the finite-approximation-error-based generalized value iteration algorithm is established for the first time. A suitable approximation error can be designed adaptively so that the iterative performance index function converges to a finite neighborhood of the optimal performance index function. Neural networks are used to implement the iterative ADP algorithm. Finally, two simulation examples illustrate the performance of the developed method.
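The two claims, convergence from an arbitrary positive semidefinite start and convergence to a neighborhood under bounded approximation errors, can be illustrated on a hypothetical six-state system. In the sketch below the dynamics, the cost, and the uniform error model with bound `eps` are our assumptions. The paper's convergence-criterion design is not reproduced. We also assume $V_0(0) = 0$ and an exact value at the equilibrium.

```python
import random

# Hypothetical system: distance-to-target levels 0..5, step sizes toward the target.
STATES = range(6)
ACTIONS = (0, 1, 2)


def f(x, u):
    """Move u levels toward the equilibrium x = 0."""
    return max(x - u, 0)


def utility(x, u):
    """Stage cost, zero only at the equilibrium with u = 0."""
    return x + u * u


def value_iteration(v0, n_iter=200, eps=0.0, seed=0):
    """Iterate V <- min_u [U(x, u) + V(f(x, u))] + e(x), with |e(x)| <= eps.

    The error e is an assumed stand-in for approximation error; it is kept at
    zero at the equilibrium so that V(0) stays 0.
    """
    rng = random.Random(seed)
    v = dict(v0)
    for _ in range(n_iter):
        v = {
            x: min(utility(x, u) + v[f(x, u)] for u in ACTIONS)
            + (rng.uniform(-eps, eps) if x != 0 else 0.0)
            for x in STATES
        }
    return v
```

With `eps = 0` both `{x: 0.0 for x in STATES}` and `{x: 7.0 * x * x for x in STATES}` reach the optimal values `{0: 0, 1: 2, 2: 5, 3: 9, 4: 13, 5: 18}`. With `eps = 0.01` the iterates stay in a small band around them rather than converging exactly.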
Article
The problem of H∞ state feedback control of affine nonlinear discrete-time systems with unknown dynamics is investigated in this paper. An online adaptive policy learning algorithm (APLA) based on adaptive dynamic programming (ADP) is proposed for learning, in real time, the solution to the Hamilton-Jacobi-Isaacs (HJI) equation that arises in the H∞ control problem. In the proposed algorithm, three neural networks (NNs) are used to approximate the optimal value function and the saddle-point feedback control and disturbance policies. Novel weight update laws tune the critic, actor, and disturbance NNs simultaneously using data generated in real time along the system trajectories. Taking NN approximation errors into account, the stability of the proposed algorithm is analyzed with a Lyapunov approach. Moreover, an NN identification scheme relaxes the algorithm's need for knowledge of the system input dynamics. Finally, simulation examples show the effectiveness of the proposed algorithm.
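The saddle-point structure behind the HJI equation is easiest to see in the scalar linear-quadratic case. There $V(x) = p x^2$, and each value-iteration step solves a 2x2 linear system for the minimizing control and the maximizing disturbance. The sketch below assumes a known model, unlike the paper's online algorithm with NN identification, and uses hypothetical constants `A`, `B`, `D`, `Q`, `R`, `GAMMA`.

```python
# Hypothetical plant x+ = A x + B u + D w, stage cost Q x^2 + R u^2 - GAMMA^2 w^2.
A, B, D = 0.9, 1.0, 0.5
Q, R, GAMMA = 1.0, 1.0, 2.0


def game_step(p):
    """One value-iteration step on V(x) = p x^2 for the scalar zero-sum game.

    Returns (p_next, u, w): the control and disturbance at x = 1, i.e. the
    linear gains of the saddle-point policies u = u*x and w = w*x.
    """
    # Stationarity of Q + R u^2 - GAMMA^2 w^2 + p (A + B u + D w)^2 at x = 1
    # gives M [u, w]^T = -p A [B, D]^T with the symmetric matrix M below.
    m11 = R + p * B * B
    m12 = p * B * D
    m22 = p * D * D - GAMMA ** 2
    if m22 >= 0:
        raise ValueError("no saddle point: GAMMA is too small for this p")
    det = m11 * m22 - m12 * m12
    r1, r2 = -p * A * B, -p * A * D
    u = (r1 * m22 - m12 * r2) / det  # Cramer's rule
    w = (m11 * r2 - m12 * r1) / det
    x_next = A + B * u + D * w
    p_next = Q + R * u * u - GAMMA ** 2 * w * w + p * x_next * x_next
    return p_next, u, w


def solve_game(tol=1e-12, max_iter=10_000):
    """Value iteration from V_0 = 0; returns (p, u, w) at the converged p."""
    p = 0.0
    for _ in range(max_iter):
        p_next, _, _ = game_step(p)
        if abs(p_next - p) < tol:
            _, u, w = game_step(p_next)
            return p_next, u, w
        p = p_next
    raise RuntimeError("value iteration did not converge")
```

The check `m22 < 0` is the scalar form of the condition that the disturbance attenuation level `GAMMA` is large enough for a saddle point to exist.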
Article
In this study, the authors propose a novel adaptive dynamic programming scheme based on general value iteration (VI) to obtain near-optimal control for discrete-time affine non-linear systems with continuous state and control spaces. First, the initial value function is selected differently from traditional VI, and a new method is introduced to demonstrate the convergence property and convergence speed of the value function. Then, the control law obtained at each iteration can stabilise the system under certain conditions. Finally, an error-bound-based condition is derived that takes the approximation errors of the neural networks into account, so that the error between the optimal and approximated value functions can also be estimated. To facilitate the implementation of the iterative scheme, three neural networks trained with the Levenberg-Marquardt algorithm are used to approximate the unknown system, the value function, and the control law. Two simulation examples demonstrate the effectiveness of the proposed scheme.
Article
The flood of big data in cyberspace will require immediate action from the AI and intelligent systems community to address how we manage knowledge. Besides new methods and systems, we need a total knowledge-management approach that will require a new perspective on AI. We need "Merton's systems" in which machine intelligence and human intelligence work in tandem. This should become the normal mode of operation for the next generation of AI and intelligent systems.
Conference Paper
Theoretical procedures are developed for comparing the performance of arbitrarily selected admissible feedback controls among themselves and with that of the optimal solution of a nonlinear optimal stochastic control problem. Iterative design schemes are proposed for successively improving the performance of a controller until a satisfactory design is achieved. Specifically, the exact design procedure is based on the generalized Hamilton-Jacobi-Bellman equation for the value function of nonlinear stochastic systems, and the approximate design procedure for the infinite-horizon nonlinear stochastic regulator problem is developed using upper and lower bounds on the value functions. For a given controller, both the upper and lower bounds on its value function can be obtained by solving a partial differential inequality. In particular, the upper and lower bounds on the optimal value function, which may be used as a measure of the acceptability of suboptimal controllers, can be constructed without actually knowing the optimal controller.
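For a linear plant with quadratic cost, the generalized HJB equation for a fixed controller reduces to an algebraic equation, so the successive-improvement scheme can be sketched in a few lines. The deterministic scalar plant and its constants below are our simplifying assumptions; the paper treats nonlinear stochastic systems and also constructs lower bounds, which this sketch does not. Each evaluated cost `p` is the exact cost of an admissible controller and is therefore an upper bound on the optimal value.

```python
import math

# Hypothetical unstable scalar plant dx/dt = A x + B u, cost integral of Q x^2 + R u^2.
A, B, Q, R = 1.0, 1.0, 1.0, 1.0


def evaluate(k):
    """Solve the linear-case GHJB 2 p (A - B k) + Q + R k^2 = 0 for u = -k x."""
    closed_loop = A - B * k
    if closed_loop >= 0:
        raise ValueError("controller is not admissible (closed loop not stable)")
    return (Q + R * k * k) / (-2.0 * closed_loop)


def improve(p):
    """Minimize R u^2 + 2 p x (A x + B u) over u, giving u = -(B p / R) x."""
    return B * p / R


def policy_iteration(k0, n_iter=10):
    """Start from an admissible gain k0 and alternate evaluation and improvement."""
    k, costs = k0, []
    for _ in range(n_iter):
        p = evaluate(k)
        costs.append(p)  # cost coefficient of the current controller: V(x) = p x^2
        k = improve(p)
    return costs, k


# Optimal cost coefficient from the algebraic Riccati equation 2 A p - B^2 p^2 / R + Q = 0.
P_OPT = R * (A + math.sqrt(A * A + B * B * Q / R)) / (B * B)
```

Starting from the admissible gain `k0 = 3.0`, the evaluated costs decrease from 2.5 toward `P_OPT` $= 1 + \sqrt{2}$, and no evaluated cost falls below it.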
Article
This paper advances a neural-network-based approximate dynamic programming control mechanism that can be applied to complex control problems such as helicopter flight control design. Based on direct neural dynamic programming (DNDP), an approximate dynamic programming methodology, the control system is tailored to learn to maneuver a helicopter. The paper consists of a comprehensive treatise of this DNDP-based tracking control framework and extensive simulation studies for an Apache helicopter. A trim network is developed and seamlessly integrated into the neural dynamic programming (NDP) controller as part of a baseline structure for controlling complex nonlinear systems such as a helicopter. Design robustness is addressed by performing simulations under various disturbance conditions. All designs are tested using FLYRT, a sophisticated industrial scale nonlinear validated model of the Apache helicopter. This is probably the first time that an approximate dynamic programming methodology has been systematically applied to, and evaluated on, a complex, continuous state, multiple-input multiple-output nonlinear system with uncertainty. Though illustrated for helicopters, the DNDP control system framework should be applicable to general purpose tracking control.
Article
Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur; those which are accompanied or closely followed by discomfort to the animal will, other things being equal, have their connections with that situation weakened, so that, when it recurs, they will be less likely to occur. The greater the satisfaction or discomfort, the greater the strengthening or weakening of the bond. (Thorndike, 1911) The idea of learning to make appropriate responses based on reinforcing events has its roots in early psychological theories such as Thorndike's "law of effect" (quoted above). Although several important contributions were made in the 1950s, 1960s and 1970s by illustrious luminaries such as Bellman, Minsky, Klopf and others (Farley and Clark, 1954; Bellman, 1957; Minsky, 1961; Samuel, 1963; Michie and Chambers, 1968; Grossberg, 1975; Klopf, 1982), the last two decades have witnessed perhaps the strongest advances in the mathematical foundations of reinforcement learning, in addition to several impressive demonstrations of the performance of reinforcement learning algorithms in real world tasks. The introductory book by Sutton and Barto, two of the most influential and recognized leaders in the field, is therefore both timely and welcome. The book is divided into three parts. In the first part, the authors introduce and elaborate on the essential characteristics of the reinforcement learning problem, namely, the problem of learning "policies" or mappings from environmental states to actions so as to maximize the amount of "reward"
Parallel dynamic programming with an average-greedy mechanism for discrete systems
  • J Zhang
  • Q Wei
  • F.-Y. Wang
J. Zhang, Q. Wei, and F.-Y. Wang, "Parallel dynamic programming with an average-greedy mechanism for discrete systems," SKL-MCCS/QAII Tech Report 01-09-2016, ASIA, Beijing, China.
  • Fei-Yue Wang
Fei-Yue Wang (S'87-M'89-SM'94-F'03) received his Ph.D. in Computer and Systems Engineering from Rensselaer Polytechnic Institute, Troy, New York, in 1990. He joined the University of Arizona in 1990 and became a Professor and Director of the Robotics and Automation Lab (RAL) and Program in Advanced Research for Complex Systems (PARC-S). In 1999, he founded the Intelligent Control and Systems Engineering Center at the Institute of Automation, Chinese Academy of Sciences (CAS), Beijing, China, under the support of the Outstanding Oversea Chinese Talents Program from the State Planning Council and "100
Predictive analytics white paper American Institute for Chartered Property Casualty Underwriters/Insurance Institute of America
  • C Nyce
C. Nyce, "Predictive analytics white paper," American Institute for Chartered Property Casualty Underwriters/Insurance Institute of America, 2007.
Business analytics: The next frontier for decision sciences
  • J R Evans
  • C H Lindner
J. R. Evans and C. H. Lindner, "Business analytics: The next frontier for decision sciences," Decision Line, vol. 43, no. 2, pp. 1-4, Mar. 2012.
Extending the value of your data warehousing investment
  • W Eckerson
W. Eckerson, "Extending the value of your data warehousing investment," The Data Warehouse Institute, USA, 2007.
Building knowledge structure in neural nets using fuzzy logic
  • F.-Y. Wang
F.-Y. Wang, "Building knowledge structure in neural nets using fuzzy logic," in Robotics and Manufacturing: Recent Trends in Research, Education and Applications, M. Jamshidi (Ed.), New York, NY: ASME (American Society of Mechanical Engineers) Press, 1992.
Talent Program" from CAS, and in 2002, was appointed as the Director of the Key Lab of Complex Systems and Intelligence Science, CAS. In 2011, he became the State Specially Appointed Expert and the Director of The State Key Laboratory of Management and Control for Complex Systems. Dr. Wang's current research focuses on methods and applications for parallel systems, social computing, and knowledge automation. He was the Founding Editor-in-Chief of the International Journal of Intelligent Control and Systems (1995-2000), Founding EiC of IEEE ITS Magazine (2006-2007), EiC of IEEE Intelligent Systems (2009-2012), and EiC of IEEE Transactions on ITS (2009-2016). Currently he is EiC of China's Journal of Command and Control. Since 1997, he has served as General or Program Chair of more than 20 IEEE, INFORMS, ACM, and ASME conferences. He was the President of the IEEE ITS Society (2005-2007), the Chinese Association for Science and Technology (CAST, USA) in 2005, and the American Zhu Kezhen Education Foundation (2007-2008), and the Vice President of the ACM China Council (2010-2011). Since 2008, he has been the Vice President and Secretary General of the Chinese Association of Automation. Dr. Wang is an elected Fellow of IEEE, INCOSE, IFAC, ASME, and AAAS. In 2007, he received the 2nd Class National Prize in Natural Sciences of China and was named an Outstanding Scientist by ACM for his work in intelligent control and social computing. He received the IEEE ITS Outstanding Application and Research Awards in 2009 and 2011, and the IEEE SMC Norbert Wiener Award in 2014. Corresponding author of this paper.
  • Jie Zhang
Jie Zhang (M'16) is an associate professor with The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences. His current research interests include mechanism design and optimal control in e-commerce and traffic systems. He received his Ph.D. degree in Technology of Computer Application from the University of Chinese Academy of Sciences in 2015, his B.Sc. degree in Information and Computing Science from Tsinghua University in 2005, and his M.Sc. degree in Operations Research and Control Theory from Renmin University of China in 2009.
Suboptimal control of nonlinear stochastic systems
  • G N Saridis
  • F.-Y. Wang
G. N. Saridis and F.-Y. Wang, "Suboptimal control of nonlinear stochastic systems," Control Theory and Advanced Technology, vol. 10, no. 4, pp. 847-871, 1994.