Article

Parallel vision: An ACP-based approach to intelligent vision computing


Abstract

In vision computing, the adaptability of an algorithm to complex environments often determines whether it is able to work in the real world. This issue has become a focus of recent vision computing research. Currently, the ACP theory that comprises artificial societies, computational experiments, and parallel execution is playing an essential role in modeling and control of complex systems. This paper introduces the ACP theory into the vision computing field, and proposes parallel vision and its basic framework and key techniques. For parallel vision, photo-realistic artificial scenes are used to model and represent complex real scenes, computational experiments are utilized to train and evaluate a variety of visual models, and parallel execution is conducted to optimize the vision system and achieve perception and understanding of complex environments. This virtual/real interactive vision computing approach integrates many technologies including computer graphics, virtual reality, machine learning, and knowledge automation, and is developing towards practically effective vision systems.


... Therefore, the effect of each component factor of the scene on vision algorithms cannot be analyzed separately. To address the above issues, Wang et al. [9], [10] proposed the theoretical framework of parallel vision. They use computer graphics and virtual reality technologies to build photorealistic artificial scenes. ...
... Parallel vision [9], [10] is an extension of the ACP (Artificial societies, Computational experiments, and Parallel execution) approach [22], [26] into the computer vision field. ...
... Fig. 2 shows the technical flowchart of parallel imaging: real images ("small data") → parallel imaging → "big data" → task-specific "small knowledge". The relationship between parallel vision [9], [10], parallel imaging [11], and parallel learning [25] is also illustrated. We have designed the ParallelEye dataset (see Fig. 4). ParallelEye [27] is synthesized by referring to the real urban network of Zhongguancun Area, Beijing. ...
Conference Paper
Video image datasets play an essential role in the design and evaluation of traffic vision methods. However, there is a longstanding difficulty: manually collecting and annotating large-scale, diversified datasets from real scenes is time-consuming and prone to error. In 2016, we proposed the parallel vision methodology to tackle the issues of the conventional vision computing approach in data collection, model learning, and evaluation. We built the ParallelEye dataset with virtual reality and the scene-specific virtual pedestrian dataset with augmented reality. In the dataset compilation process, a graphics rendering engine was used to render the artificial scenes and generate virtual images. However, the fidelity of virtual images is not satisfactory due to limitations of the rendering engine, so there is a distribution gap between virtual data and real data. In our opinion, Generative Adversarial Networks (GANs) can generate more realistic images for parallel vision research. We introduce some GANs and explain their utility in parallel vision.
... Parallel vision [23]-[25] is an extension of the ACP (Artificial systems, Computational experiments, and Parallel execution) theory [26]-[30] into the computer vision field. For parallel vision, photo-realistic artificial scenes are used to model and represent complex real scenes, computational experiments are utilized to learn and evaluate a variety of vision models, and parallel execution is conducted to optimize the vision system online and realize perception and understanding of complex scenes. ...
... For parallel vision, photo-realistic artificial scenes are used to model and represent complex real scenes, computational experiments are utilized to learn and evaluate a variety of vision models, and parallel execution is conducted to optimize the vision system online and realize perception and understanding of complex scenes. The basic framework and architecture for parallel vision [23] is shown in Fig. 2. Based on parallel vision theory, this paper constructs a large-scale virtual urban network and synthesizes a large number of realistic images. ...
... Basic framework and architecture for parallel vision [23]. ...
Conference Paper
Full-text available
Video image datasets are playing an essential role in the design and evaluation of traffic vision algorithms. Nevertheless, a longstanding inconvenience concerning image datasets is that manually collecting and annotating large-scale, diversified datasets from real scenes is time-consuming and prone to error. For that reason, virtual datasets have begun to function as a proxy for real datasets. In this paper, we propose to construct large-scale artificial scenes for traffic vision research and generate a new virtual dataset called "ParallelEye". First of all, street map data is used to build a 3D scene model of Zhongguancun Area, Beijing. Then, computer graphics, virtual reality, and rule-based modeling technologies are utilized to synthesize large-scale, realistic virtual urban traffic scenes, whose fidelity and geography match the real world well. Furthermore, the Unity3D platform is used to render the artificial scenes and generate accurate ground-truth labels, e.g., semantic/instance segmentation, object bounding boxes, object tracking, optical flow, and depth. The environmental conditions in the artificial scenes can be controlled completely. As a result, we present a viable implementation pipeline for constructing large-scale artificial scenes for traffic vision research. The experimental results demonstrate that this pipeline is able to generate photorealistic virtual datasets with low modeling time and high labeling accuracy.
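The pixel-accurate ground truth described above falls out of the renderer almost for free: if each object is drawn into an instance-ID buffer, labels such as bounding boxes can be derived by a single scan over that buffer. A minimal sketch (the function name and the list-of-lists input format are illustrative, not from the paper):

```python
def instance_boxes(id_map, background=0):
    """Derive per-object bounding boxes from a rendered instance-ID map.

    id_map: 2-D list of integer instance IDs, one per pixel.
    Returns {instance_id: (xmin, ymin, xmax, ymax)} in pixel coordinates.
    """
    boxes = {}
    for y, row in enumerate(id_map):
        for x, obj in enumerate(row):
            if obj == background:
                continue
            if obj not in boxes:
                boxes[obj] = (x, y, x, y)
            else:
                xmin, ymin, xmax, ymax = boxes[obj]
                boxes[obj] = (min(xmin, x), min(ymin, y),
                              max(xmax, x), max(ymax, y))
    return boxes
```

Because the IDs come from the renderer rather than a human annotator, the resulting boxes are exact, which is the accuracy advantage the abstract claims over manual labeling.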
... Learning-by-synthesis has been successfully applied to eye center detection, where the model is learned from purely synthetic data with automatic annotations generated by computer graphics techniques [6]. Parallel Vision (PV) [7,8], which extends the ACP methodology [9] to computer vision, was proposed as a unified framework comprising Artificial scenes, Computational experiments, and Parallel execution for perception and understanding of complex scenes, and it also emphasizes the significance of synthesis. As pointed out in [10], large amounts of artificial data make it possible to model the real distributions. ...
... Synthetic data is often not realistic enough and cannot cover the variations in appearance found in the real world. Wang et al. [7,8] proposed a unified framework termed Parallel Vision (PV), extended from the ACP (Artificial systems, Computational experiments, and Parallel execution) methodology [9], for image perception and understanding. From the perspective of PV, conventional vision algorithms that learn from synthetic images cover the artificial-scene and computational-experiment components of PV, and they can be further improved by the parallel execution of online optimization through interaction between synthetic and real images. ...
... Wang et al. [7,8] further extended the ACP framework to computer vision as the PV (Parallel Vision) framework, as shown in Fig. 2. PV also consists of three major parts: artificial scenes, computational experiments, and parallel execution. ...
Article
Image-based pupil detection, which aims to find the pupil location in an image, has been an active research topic in the computer vision community. Learning-based approaches can achieve preferable results given large amounts of training data with eye center annotations. However, there are limited publicly available datasets with accurate eye center annotations, and manually labeling large amounts of training data is unreliable and time-consuming. In this paper, inspired by learning from synthetic data in the Parallel Vision framework, we introduce a parallel imaging step built upon Generative Adversarial Networks (GANs) to generate adversarial synthetic images. In particular, we refine the synthetic eye images with the improved SimGAN using an adversarial training scheme. For the computational experiments, we further propose a coarse-to-fine pupil detection framework based on shape-augmented cascade regression models learned from the adversarial synthetic images. Experiments on the benchmark databases BioID, GI4E, and LFW show that the proposed method performs significantly better than other state-of-the-art methods by leveraging the power of cascade regression and adversarial image synthesis.
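The SimGAN-style refinement mentioned above trains the refiner with an adversarial term plus a self-regularization term that keeps the refined image close to its synthetic source. A toy sketch of that combined objective on a single flattened image (the λ value, the identity feature map, and the exact sign conventions are assumptions; the actual method operates on image batches inside a full GAN training loop):

```python
import math

def refiner_loss(refined, synthetic, d_prob_real, lam=0.1):
    """SimGAN-style refiner objective (sketch).

    An adversarial term rewards fooling the discriminator, and an L1
    self-regularization term penalizes drifting away from the synthetic
    input, preserving its automatic annotations.

    refined, synthetic: flat lists of pixel values in [0, 1].
    d_prob_real: discriminator's probability that `refined` is real.
    """
    adversarial = -math.log(max(d_prob_real, 1e-12))          # -log D(R(x))
    self_reg = sum(abs(r - s) for r, s in zip(refined, synthetic))
    return adversarial + lam * self_reg
```

The self-regularization term is what lets refined images keep the ground-truth pupil positions that came with the synthetic data.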
... [Citation contexts from the Chinese-language version of this article; most of the CJK text was lost during extraction. The recoverable fragments mention Pepik [82], Pascal 3D+ [83], and R-CNN [85−86]; note that in 2016, [85] extended the ACP approach (artificial societies, computational experiments, and parallel execution) [87−89]; reference Fig. 11, "Basic framework of parallel vision" [85]; discuss computational experiments; and cite Johnson-Roberson [91].] ...
Article
Visual object detection is an important problem in computer vision, with significant research value and application prospects in areas such as video surveillance, autonomous driving, and human-computer interaction. In recent years, deep learning has achieved breakthrough progress in image classification, which has in turn driven rapid advances in visual object detection. This paper surveys the progress and prospects of deep learning as applied to visual object detection. We first summarize the basic pipeline of visual object detection and introduce the public datasets commonly used in this research. We then focus on the latest progress in applying the rapidly developing deep learning methods to visual object detection. Finally, we discuss the difficulties and challenges of applying deep learning methods to visual object detection, and offer an outlook on future trends.
Article
Visual object detection is an important topic in computer vision, and has great theoretical and practical merits in applications such as visual surveillance, autonomous driving, and human-machine interaction. In recent years, significant breakthroughs of deep learning methods in image recognition research have attracted much attention from researchers and accordingly led to the rapid development of visual object detection. In this paper, we review the current advances in and perspectives on the applications of deep learning in visual object detection. Firstly, we present the basic procedure for visual object detection and introduce some newly emerging and commonly used datasets. Then we detail the applications of deep learning techniques in visual object detection. Finally, we discuss in depth the difficulties and challenges that deep learning faces when applied to visual object detection, and propose some perspectives on future trends.
... In 2016, Wang et al. extended the parallel system theory and ACP approach [8]-[14] to the vision computing field, and proposed the concept, framework, and key techniques of parallel vision [15], [16]. Parallel vision focuses on constructing systematic theories and methods for visual perception and understanding of complex scenes, as shown in Figure 1. ...
... With the aid of advanced computer graphics, virtual reality, and micro-simulation, we construct artificial scenes to simulate and represent complex and challenging actual scenes, and then render and generate artificial images in a given style [15], [16]. There are many open-source or commercial game engines and simulation tools, such as OpenStreetMap, CityEngine, and Unity3D, which can be used for artificial scene construction and graphics rendering. ...
Conference Paper
In order to build computer vision systems with good generalization capability, one usually needs large-scale, diversified labeled image data for learning and evaluating the in-hand computer vision models. Since it is difficult to obtain satisfactory image data from real scenes, in this paper we propose a unified theoretical framework for image generation, called parallel imaging. The core component of parallel imaging is software-defined artificial imaging systems. Artificial imaging systems receive small-scale image data collected from real scenes, and then generate large amounts of artificial image data. In this paper, we survey the realization methods of parallel imaging, including graphics rendering, image style transfer, generative models, and so on. Furthermore, we compare the properties of artificial images and actual images, and discuss the domain adaptation strategies.
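The core idea of a software-defined artificial imaging system, small real data in, large artificial data out, can be caricatured with hand-coded transforms standing in for the graphics rendering, style transfer, or generative models the paper surveys (the function name, parameter ranges, and flat-list image format are illustrative, not from the paper):

```python
import random

def parallel_imaging(real_images, n_artificial, seed=0):
    """Toy 'software-defined imaging' sketch.

    Expands a small set of real images (flat pixel lists in [0, 1]) into
    a large artificial set by sampling imaging parameters per generated
    image: here only a brightness gain and an optional horizontal flip.
    In the full framework, rendering or generative models would replace
    these hand-coded transforms.
    """
    rng = random.Random(seed)  # fixed seed for reproducible generation
    artificial = []
    for _ in range(n_artificial):
        img = list(rng.choice(real_images))
        gain = rng.uniform(0.8, 1.2)          # brightness variation
        if rng.random() < 0.5:                # horizontal flip
            img.reverse()
        artificial.append([min(1.0, p * gain) for p in img])
    return artificial
```

Even this caricature shows the key property: the artificial set can be made arbitrarily large and its "imaging conditions" (here, gain and flip) are fully controllable, unlike images collected from real scenes.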
... Through this analysis, we are able to find research hotspots, active authors and institutions, and potential fields. We also come up with some thoughts and perspectives on future research trends of visual tracking, especially on how tracking could be integrated with the parallel vision approach [7], [8]. ...
... The tracking field has less data available for learning tracking features under diverse scenarios. The parallel vision approach [7], [8] could augment the data and improve the existing tracking effect via virtual-real interaction. ...
Article
Full-text available
Benefitting from continuous progress in computer architecture and computer vision algorithms, the visual tracking field has experienced rapid development in recent years. This paper surveys this interesting field through bibliographic analysis of the Web of Science literature from 1990 to 2019. Specifically, statistical analysis methods are used to obtain the most productive authors and countries/regions, the most cited papers, and so on. In order to realize an in-depth analysis, the co-author, co-keyword, and keyword-author co-occurrence networks are built to intuitively exhibit the evolution of research hotspots and the collaboration patterns among worldwide researchers. Brief introductions to the topics that occur frequently in the co-keyword networks are provided as well. Furthermore, existing challenges and future research directions within the visual tracking field are discussed, revealing that tracking-by-detection and deep learning will continue receiving much attention. In addition, the parallel vision approach should be adopted for training and evaluating visual tracking models in a virtual-real interactive manner.
... In fact, with the recent development of computer graphics and virtual reality technologies, researchers are already able to build realistic and diversified artificial scenes and obtain detailed and precise annotations automatically. In 2016, Wang et al. [1], [2] proposed the theoretical framework of parallel vision, which is an extension of the ACP (artificial systems, computational experiments, and parallel execution) approach [3], [4] into the computer vision field. For parallel vision, realistic artificial scenes are used to model and represent complex real scenes, computational experiments are utilized to train and evaluate a variety of vision models, and parallel execution is conducted to optimize the vision system online and realize perception and understanding of complex scenes. ...
... Finally, it should be mentioned that we proposed the theoretical framework of virtual-real interactive parallel vision, in order to build more robust and more intelligent vision systems through the combination of artificial scenes, computational experiments, and parallel execution [1], [2]. The artificial scenes and virtual images play an essential role in parallel vision research. ...
Article
Datasets play an essential role in the training and testing of traffic vision algorithms. However, the collection and annotation of images from the real world is time-consuming, labor-intensive, and error-prone. Therefore, more and more researchers have begun to explore virtual datasets to overcome the disadvantages of real datasets. In this paper, we propose a systematic method to construct large-scale artificial scenes and collect a new virtual dataset (named "ParallelEye") for traffic vision research. The Unity3D rendering software is used to simulate environmental changes in the artificial scenes and generate ground-truth labels automatically, including semantic/instance segmentation, object bounding boxes, and so on. In addition, we utilize ParallelEye in combination with real datasets to conduct experiments. The experimental results show that the inclusion of virtual data helps to enhance the per-class accuracy in object detection and semantic segmentation. Meanwhile, it is also illustrated that the virtual data, with controllable imaging conditions, can be used to design evaluation experiments flexibly.
... With the development of game engines [4,5] and virtual reality [6][7][8], the construction of vivid artificial scenes has made great progress. Therefore, parallel vision technology [9][10][11], which combines real and virtual data to conduct experiments, has gained greater research value and broader application prospects. In the computer vision field, parallel vision [9] is the popularization and application of ACP (artificial societies, computational experiments, and parallel execution), a theory of complex system modeling and control [12][13][14]. ...
... Therefore, parallel vision technology [9][10][11], which combines real and virtual data to conduct experiments, has gained greater research value and broader application prospects. In the computer vision field, parallel vision [9] is the popularization and application of ACP (artificial societies, computational experiments, and parallel execution), a theory of complex system modeling and control [12][13][14]. It first simulates and represents complex actual scenes through artificial scenes. ...
Article
Full-text available
Autonomous driving has become a prevalent research topic in recent years, attracting the attention of many academic institutions and commercial companies. As human drivers rely on visual information to discern road conditions and make driving decisions, autonomous driving calls for vision systems such as vehicle detection models. These vision models require a large amount of labeled data, while collecting and annotating real traffic data is time-consuming and costly. Therefore, we present a novel vehicle detection framework based on parallel vision to tackle this issue, using specially designed virtual data to help train the vehicle detection model. We also propose a method to construct large-scale artificial scenes and generate virtual data for vision-based autonomous driving schemes. Experimental results verify the effectiveness of our proposed framework, demonstrating that the combination of virtual and real data yields better performance for training the vehicle detection model than using real data alone.
... Parallel Vision theory was proposed by Wang et al. [9], [10], [11] based on the ACP (Artificial systems, Computational experiments, and Parallel execution) theory [12], [13], [14], [15], in an attempt to solve vision problems in the real world. A parallel vision system follows the idea of "We can only understand what we create". ...
Conference Paper
As a special type of object detection, pedestrian detection in generic scenes has made significant progress when trained with large amounts of manually labeled data. However, models trained on generic datasets perform poorly when directly applied to specific scenes. With their special viewpoints, lighting, and backgrounds, datasets from specific scenes differ greatly from generic-scene datasets. To make generic-scene pedestrian detectors work well in specific scenes, labeled data from those scenes are needed to adapt the models. Yet labeling data manually costs much time and money; each new specific scene requires large numbers of images to be labeled. What's more, manual labeling is not pixel-accurate, and different annotators produce different labels. In this paper, we propose an ACP-based method: with the help of augmented reality, we build virtual worlds of specific scenes and make virtual pedestrians walk wherever they could plausibly appear, to address the lack of labeled data. The results show that data from the virtual world are helpful for adapting generic pedestrian detectors to specific scenes.
... In this subsection, we discuss the relation between GANs and parallel intelligence from three aspects. 1) GANs and Parallel Vision: Parallel vision [60] is an extension of ACP approach into the vision computing field. Fig. 8 shows the basic framework and architecture of parallel vision. ...
Article
Recently, generative adversarial networks (GANs) have become a research focus of artificial intelligence. Inspired by two-player zero-sum game, GANs comprise a generator and a discriminator, both trained under the adversarial learning idea. The goal of GANs is to estimate the potential distribution of real data samples and generate new samples from that distribution. Since their initiation, GANs have been widely studied due to their enormous prospect for applications, including image and vision computing, speech and language processing, etc. In this review paper, we summarize the state of the art of GANs and look into the future. Firstly, we survey GANs’ proposal background, theoretic and implementation models, and application fields. Then, we discuss GANs’ advantages and disadvantages, and their development trends. In particular, we investigate the relation between GANs and parallel intelligence, with the conclusion that GANs have a great potential in parallel systems research in terms of virtual-real interaction and integration. Clearly, GANs can provide substantial algorithmic support for parallel intelligence.
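The two-player zero-sum game this abstract refers to is conventionally written as the minimax objective of Goodfellow et al., in which the discriminator D maximizes and the generator G minimizes the same value function:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big] +
  \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

At the equilibrium of this game, the generator's distribution matches the data distribution, which is exactly the "estimate the potential distribution of real data samples" goal stated above.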
... It requires massive manpower and time to collect and annotate large-scale diversified data. To address this difficulty, we intend to use the Parallel Vision methodology, which was proposed by Wang et al. [68][69][70]. The core of Parallel Vision is to use artificial scenes to simulate complex real scenes and generate precise ground-truth annotations automatically. ...
... However, the latter occupies a significant position in addressing the problems of visual perception and understanding [9]-[11]. In [9], [10], Wang et al. proposed the theoretical framework of Parallel Vision by extending the ACP approach [12]-[14] and elaborated the significance of virtual data. The ACP methodology establishes the foundation for parallel intelligence [15]-[17], which provides new insight for tackling issues in complex systems [18]. ...
Article
Full-text available
In the area of computer vision, deep learning has produced a variety of state-of-the-art models that rely on massive labeled data. However, collecting and annotating images from the real world is too demanding in terms of labor and money, and is usually too inflexible to build datasets with specific characteristics, such as small object areas and high occlusion levels. Under the framework of Parallel Vision, this paper presents a purposeful way to design artificial scenes and automatically generate virtual images with precise annotations. A virtual dataset named ParallelEye is built, which can be used for several computer vision tasks. Then, by training the DPM (Deformable Part Model) and Faster R-CNN detectors, we prove that the performance of models can be significantly improved by combining ParallelEye with publicly available real-world datasets during the training phase. In addition, we investigate the potential of testing the trained models from a specific aspect using intentionally designed virtual datasets, in order to discover the flaws of trained models. From the experimental results, we conclude that our virtual dataset is viable for training and testing object detectors.
... Parallel Vision theory was proposed by Wang et al. [9], [10], [11] based on the ACP (Artificial systems, Computational experiments, and Parallel execution) theory [12], [13], [14], [15], in an attempt to solve vision problems in the real world. A parallel vision system follows the idea of "We can only understand what we create". ...
Article
Full-text available
As a special type of object detection, pedestrian detection in generic scenes has made significant progress when trained with large amounts of manually labeled data. However, models trained on generic datasets perform poorly when directly applied to specific scenes. With their special viewpoints, lighting, and backgrounds, datasets from specific scenes differ greatly from generic-scene datasets. To make generic-scene pedestrian detectors work well in specific scenes, labeled data from those scenes are needed to adapt the models. Yet labeling data manually costs much time and money; each new specific scene requires large numbers of images to be labeled. What's more, manual labeling is not pixel-accurate, and different annotators produce different labels. In this paper, we propose an ACP-based method: with the help of augmented reality, we build virtual worlds of specific scenes and make virtual pedestrians walk wherever they could plausibly appear, to address the lack of labeled data. The results show that data from the virtual world are helpful for adapting generic pedestrian detectors to specific scenes.
... Finally, parallel execution between the artificial and real systems is expected to enable optimal operation of these systems [20]. Although the artificial system is initially built from prior data of the real system, it is rectified and improved through further observation. ...
Article
Full-text available
In this paper, a new machine learning framework called parallel reinforcement learning is developed for complex system control. To overcome the data deficiency of current data-driven algorithms, a parallel system is built to improve the complex learning system through self-guidance. Based on Markov chain (MC) theory, we combine transfer learning, predictive learning, deep learning, and reinforcement learning to tackle the data and action processes and to express the knowledge. The parallel reinforcement learning framework is formulated, and several case studies on real-world problems are finally introduced.
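As a concrete reference point, the reinforcement-learning building block such a framework extends is the standard tabular Q-learning update; a self-contained sketch (the environment interface, hyperparameters, and the toy two-state task below are illustrative, not from the paper):

```python
import random

def q_learning(step, n_states, n_actions, episodes=300,
               alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Standard tabular Q-learning.

    step(state, action) -> (next_state, reward, done) defines the
    environment. Returns the learned Q-table as a list of lists,
    updated with Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    """
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            a = (rng.randrange(n_actions) if rng.random() < eps
                 else max(range(n_actions), key=lambda i: q[s][i]))
            s2, r, done = step(s, a)
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q
```

In the parallel setting described by the abstract, the artificial system would supply additional simulated transitions to such a learner, compensating for scarce real-world data.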
... Through training, a high-quality discriminative model is generated, which is widely used in image restoration, semantic segmentation, image prediction, etc., because GANs can flexibly relate real and virtual scenes in tasks such as artificial systems, computational experiments, and synchronous execution. Wang et al. [24,25] surveyed the state of the art of GANs and found that they have great potential for parallel systems research in terms of virtual-real interaction and integration. Liang et al. [26] considered the original traffic state and the traffic state of the training samples based on a GAN, and estimated the missing traffic state under spatio-temporal and flow-speed relationships. ...
Article
Full-text available
Traffic prediction is essential for advanced traffic planning, design, management, and network sustainability. Current prediction methods are mostly offline, which fail to capture the real-time variation of traffic flows. This paper establishes a sustainable online generative adversarial network (GAN) by combining bidirectional long short-term memory (BiLSTM) and a convolutional neural network (CNN) as the generative model and discriminative model, respectively, to keep learning with continuous feedback. BiLSTM constantly generates temporal candidate flows based on valuable memory units, and CNN screens out the best spatial prediction by returning the feedback gradient to BiLSTM. Multi-dimensional indicators are selected to map the multi-view fusion local trend for accurate prediction. To balance computing efficiency and accuracy, different batch sizes are pre-tested and allocated to different lanes. The models are trained with rectified adaptive moment estimation (RAdam) by dividing the dataset into the training and testing sets with a rolling time-domain scheme. In comparison with the autoregressive integrated moving average (ARIMA), BiLSTM, generating adversarial network for traffic flow (GAN-TF), and generating adversarial network for non-signal traffic (GAN-NST), the proposed improved generating adversarial network for traffic flow (IGAN-TF) successfully generates more accurate and stable flows and performs better.
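The rolling time-domain scheme used above to divide the data might be sketched as follows (the window sizes and stride are assumptions; the abstract does not give them):

```python
def rolling_splits(n_samples, train_size, test_size, stride=None):
    """Rolling time-domain train/test splits.

    Yields (train_indices, test_indices) windows that slide forward in
    time, so the model is always evaluated on data that strictly follows
    its training window; no future information leaks into training.
    """
    stride = stride or test_size
    splits = []
    start = 0
    while start + train_size + test_size <= n_samples:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size,
                          start + train_size + test_size))
        splits.append((train, test))
        start += stride
    return splits
```

This is the usual alternative to a single chronological split when the goal, as here, is online learning with continuous feedback.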
... In the framework, the physically-defined machine (or called Newton machine) interacts with the softwaredefined machine (or called Merton machine) through three coupled modules, namely management and control, experiment and evaluation, and learning and training, within the cyber-physical-social spaces. This parallel execution between the physically-and software-defined machines is expected to enable an optimal operation of the machines [27], [28]. ...
Article
The emerging development of connected and automated vehicles imposes a significant challenge on current vehicle control and transportation systems. This paper proposes a novel unified approach, Parallel Driving, a cloud-based cyber-physical-social systems (CPSS) framework aiming at synergizing connected automated driving. This study first introduces the CPSS and ACP-based intelligent machine systems. Then the parallel driving is proposed in the cyber-physical-social space, considering interactions among vehicles, human drivers, and information. Within the framework, parallel testing, parallel learning and parallel reinforcement learning are developed and concisely reviewed. Development of the intelligent horizon (iHorizon) and its applications are also presented towards parallel horizon. The proposed parallel driving offers an ample solution for achieving a smooth, safe and efficient cooperation among connected automated vehicles with different levels of automation in future road transportation systems.
... 1) GANs and Parallel Vision: Parallel vision [60] is an extension of ACP approach into the vision computing field. Fig. 8 shows the basic framework and architecture of parallel vision. ...
Article
Full-text available
Recently, generative adversarial networks (GANs) have become a research focus of artificial intelligence. Inspired by the two-player zero-sum game, GANs comprise a generator and a discriminator, both trained under the adversarial learning idea. The goal of GANs is to estimate the potential distribution of real data samples and generate new samples from that distribution. Since their initiation, GANs have been widely studied due to their enormous prospect for applications, including image and vision computing, speech and language processing, etc. In this review paper, we summarize the state of the art of GANs and look into the future. Firstly, we survey GANs' proposal background, theoretic and implementation models, and application fields. Then, we discuss GANs' advantages and disadvantages, and their development trends. In particular, we investigate the relation between GANs and parallel intelligence, with the conclusion that GANs have a great potential in parallel systems research in terms of virtual-real interaction and integration. Clearly, GANs can provide substantial algorithmic support for parallel intelligence.
... Second, an ideal software platform with a standard environment of information processing and system control is lacking for academics and engineers. The ACP-based parallel system theory [33] and the Data Engine technology [34] can help change the interaction behaviors of the virtual environment and real system and reduce the system management and control difficulties, which may be an effective approach for the research and application of WECS. ...
Article
Full-text available
The parallel system is a kind of scientific research method based on an artificial system and computational experiments, which can not only reflect the dynamic process of the real system but also optimize its control process in real time. Given the rapid development of wind energy technology, how to shorten the development and deployment cycle and decrease the programming difficulties of wind energy conversion system (WECS) are major issues for improving the utilization of this form of energy. In this paper, the Data Engine is used as a computing environment to form a parallel WECS for studying the engineering application of WECS. With the support of the programming methods of graphical component configurations, visualization technology and dynamic reconfiguration technology, a maximum power point tracking (MPPT) computing experiment of the parallel WECS is carried out. After comparing with MATLAB simulation results, the parallel WECS is verified as having good performance. The Data Engine is an ideal computing unit for modeling and computation of the parallel system and can establish a parallel relationship between the artificial system and the real system so as to achieve the optimal control of WECS.
... In this paper, we focus on virtual testing of the visual intelligence of intelligent vehicles. Specifically, we build a virtual driving scene dataset and construct a series of challenging vision tasks based on parallel vision theory [16], [17] and our previous works [18]- [20]. In our previous work [19], we demonstrate that this pipeline is able to generate photo-realistic virtual images with low modeling time and accurate labeling. ...
Article
Virtual simulation testing is becoming indispensable for the intelligence testing of intelligent vehicles. However, even the most advanced simulation software provides rather limited test conditions. In the long run, intelligent vehicles are expected to work at SAE (Society of Automotive Engineers) level 4 or level 5. Researchers should make full use of virtual simulation scenarios to test the visual intelligence algorithms of intelligent vehicles under various imaging conditions. In this paper, we create realistic artificial scenes to simulate the self-driving scenarios, and collect a dataset of synthetic images from the virtual driving scenes, named “ParallelEye-CS”. In the artificial scenes, we can flexibly change environmental conditions and automatically acquire accurate and diverse ground-truth labels. As a result, ParallelEye-CS has six ground-truth labels and includes twenty types of tests, which are divided into normal, environmental, and difficult tasks. Furthermore, we utilize ParallelEye-CS in combination with other publicly available datasets to conduct experiments for visual object detection. The experimental results indicate that: 1) object detection algorithms of intelligent vehicles can be tested under various scenario challenges; 2) mixed dataset can improve the accuracy of object detection algorithms, but domain shift is a serious issue worthy of attention.
... Wang et al. studied parallel vision, and also investigated generative adversarial networks (GANs) to realize the intelligent perception and understanding of complex environments by means of parallel execution [99]. Xiong et al. [100] provided the parallel transportation management and control system for subways. ...
Article
Full-text available
Based on the ACP (artificial systems, computational experiments, and parallel execution) methodology, parallel control and management has become a popular, systematic, and complete solution for the control and management of complex systems. This paper presents a comprehensive review of the research literature on parallel control and management from recent years, covering the theoretical framework, core technologies, and application demonstrations. Future research and application directions, along with suggestions, are also discussed.
Conference Paper
Offline training and testing play an essential role in the design and evaluation of intelligent vehicle vision algorithms. Nevertheless, a long-standing inconvenience of traditional image datasets is that manually collecting and annotating images from real scenes yields few testing tasks and limited environmental conditions. Virtual datasets can make up for these shortcomings. In this paper, we propose to construct artificial scenes for evaluating the visual intelligence of intelligent vehicles and generate a new virtual dataset called "ParallelEye-CS". First of all, actual track map data are used to build a 3D scene model of the Chinese Flagship Intelligent Vehicle Proving Center Area, Changshu. Then, computer graphics and virtual reality technologies are utilized to simulate virtual testing tasks according to the Chinese Intelligent Vehicles Future Challenge (IVFC) tasks. Furthermore, the Unity3D platform is used to generate accurate ground-truth labels and to change environmental conditions. As a result, we present a viable implementation method for constructing artificial scenes for traffic vision research. The experimental results show that our method is able to generate photorealistic virtual datasets with diverse testing tasks.
Article
Object instance segmentation in traffic scenes is an important research topic. For training instance segmentation models, synthetic data can potentially complement real data, alleviating manual effort on annotating real images. However, the data distribution discrepancy between synthetic data and real data hampers the wide applications of synthetic data. In light of that, we propose a virtual-real interaction method for object instance segmentation. This method works over synthetic images with accurate annotations and real images without any labels. The virtual-real interaction guides the model to learn useful information from synthetic data while keeping consistent with real data. We first analyze the data distribution discrepancy from a probabilistic perspective, and divide it into image-level and instance-level discrepancies. Then, we design two components to align these discrepancies, i.e., global-level alignment and local-level alignment. Furthermore, a consistency alignment component is proposed to encourage the consistency between the global-level and the local-level alignment components. We evaluate the proposed approach on the real Cityscapes dataset by adapting from virtual SYNTHIA, Virtual KITTI, and VIPER datasets. The experimental results demonstrate that it achieves significantly better performance than state-of-the-art methods.
Conference Paper
Full-text available
Intelligent video surveillance technology has changed the traditional passive reception mode. It can analyze video data automatically and intelligently in real time, improving efficiency and reliability. This paper presents an efficient expression and deep analysis platform for massive traffic video, which can connect to the surveillance cameras of urban transport systems, perceive the state of people, vehicles, roads, and other elements using intelligent visual perception technology, extract relevant semantic structure information from unstructured video content, and display it in structured text. Its main contents and key technologies are then analyzed. Finally, an application case in Guangzhou, China is given. The system can be used for intelligent expression, deep analysis, detection, identification, archiving, and management of massive traffic video data, providing complete information services for managers and users.
Article
A Digital Twins system for Front End Engineering Design (FEED) in offshore oil/gas field development is introduced. First, 3D digital models of typical subsea equipment such as X-trees and manifolds are built using techniques including Building Information Modeling (BIM). Based on these models, multiple gas/oil fields in the northern South China Sea are built as a digital case set enriched with real production data. Second, the AI method of Case-Based Reasoning (CBR) is introduced to utilize and maintain the Digital Twins system. The FEED procedure, including visualization, training, design, and evaluation, is accelerated by taking advantage of accumulated knowledge and experience.
Article
Welcome to the second issue of IEEE Transactions on Computational Social Systems (TCSS) this year. First, I am grateful to report that, as of February 7, 2021, the Citescore of TCSS has leapfrogged back to 5.8, a new high, which indicates the high quality and relevance of IEEE TCSS in the field of social computing and computational social systems research. Many thanks to all of you for your great effort and support.
Conference Paper
To achieve effective protection of digital image information and provide anti-attack capability for encrypted images, this paper proposes an ACP-based approach to color image encryption using DNA sequence operations and a hyper-chaotic system. Using the ACP method, a way of solving social computing problems, the influences on the encryption of chaotic data from the real world and of chaotic data from simulation are combined. First, after obtaining chaotic data from reality, artificial random images are made using the cloud model. Then, the real-world chaotic data are used to encrypt the artificial random image, while the simulated chaotic data are used to encrypt the original image. Finally, with the method of parallel execution, the two groups of encryption results are combined by a DNA-XOR operation to obtain the final encrypted image. The simulation results show that the algorithm has a good encryption effect and a large key space. In addition, the algorithm can resist brute-force and differential attacks, and achieves hyper-chaotic image encryption despite the low degree of chaos of the underlying data.
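The core of such schemes is that a chaotic map drives a keystream that is XORed with the data, so decryption is simply re-encryption with the same key. The sketch below illustrates only that XOR stage with a logistic map; the DNA encoding, cloud model, and parallel-execution steps of the actual method are omitted, and the parameters are illustrative assumptions:

```python
# Minimal chaos-based XOR cipher sketch (not the paper's full DNA-XOR scheme).

def logistic_keystream(x0, r, n):
    """Generate n pseudo-random bytes from the logistic map x <- r*x*(1-x)."""
    x, out = x0, []
    for _ in range(n):
        x = r * x * (1.0 - x)
        out.append(int(x * 256) % 256)
    return out

def xor_cipher(data, x0=0.3141, r=3.9999):
    """Encrypt (or decrypt, since XOR is an involution) a byte sequence."""
    ks = logistic_keystream(x0, r, len(data))
    return bytes(b ^ k for b, k in zip(data, ks))

plain = bytes(range(16))          # toy "image" data
cipher = xor_cipher(plain)
recovered = xor_cipher(cipher)    # the same key recovers the plaintext
```

The key here is the pair (x0, r); sensitivity of the logistic map to x0 is what gives such ciphers their large effective key space.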
Chapter
In applications of deep learning algorithms based on large-scale datasets, problems such as insufficient samples, imperfect sample quality, and the high cost of building large datasets emerge and restrict algorithm performance. In this paper, a target recognition framework and learning modes based on parallel images are proposed and verified in application, taking insulator recognition in transmission lines as an example. Artificial image generation technology is used to establish the insulator dataset NCEPU-J, and then the target recognition framework PITR and three learning modes, namely OriPITR, TrsPITR, and MutiPITR, are proposed. Insulator strings with 7, 11, and 14 pieces are verified, and the recognition accuracy is significantly improved. The results show that the target recognition framework and learning modes based on parallel images are feasible and effective.
Article
Generative adversarial networks (GANs) have become a hot research direction in the artificial intelligence community. The basic idea of GANs stems from the two-player zero-sum game of game theory: a GAN consists of a generator and a discriminator, trained by adversarial learning, with the goal of estimating the potential distribution of data samples and generating new samples. GANs are being widely studied in image and vision computing, speech and language processing, information security, board games, and other fields, and have enormous application prospects. This paper summarizes the research progress of GANs and looks into their future. After reviewing GANs' background, theoretical and implementation models, application fields, advantages and disadvantages, and development trends, the paper also discusses the relation between GANs and parallel intelligence, concluding that GANs can deepen the virtual-real interaction and integration concept of parallel systems, especially the idea of computational experiments, and provide very concrete and rich algorithmic support for the ACP (artificial societies, computational experiments, and parallel execution) theory.
Chapter
With the development of social science and technology, the popularization of artificial intelligence keeps increasing. People apply artificial intelligence to all aspects of life, making daily life more convenient. Competition management is an important part of sports competitions: the quality of a competition must be ensured on the principles of fairness, impartiality, and openness; competition discipline must be guaranteed; and stadium order, safety, and security must be maintained. The management results of traditional sports competition management systems are relatively unreliable and require substantial human resources. Therefore, this paper proposes a method for managing sports competitions based on intelligent machine computing, and uses this method to design a management system. The system transmits data over a network platform. By analyzing user needs and functional requirements, the network platform is designed to implement data transmission between the system and the client, and the system is tested. The research results show that the sports competition management system designed in this paper produces reliable management results and strong system performance.
Article
Automatic diagnosis based on medical imaging necessitates both lesion segmentation and disease classification. Lesion segmentation requires pixel-level annotations while disease classification only requires image-level annotations. The two tasks are usually studied separately even though the latter relies on the former. Motivated by the close correlation between them, we propose a mixed-supervision guided method and a residual-aided classification U-Net model (ResCU-Net) for joint segmentation and benign-malignant classification. By coupling strong supervision in the form of a segmentation mask and weak supervision in the form of a benign-malignant label through a simple annotation procedure, our method efficiently segments tumor regions while simultaneously predicting a discriminative map for identifying the benign-malignant types of tumors. Our network, ResCU-Net, extends U-Net by incorporating the residual module and the SegNet architecture to exploit multilevel information for improved tissue identification. With experiments on the public mammogram database INbreast, we validate the effectiveness of our method and achieve consistent improvements over state-of-the-art models.
Article
Parallel systems are a kind of scientific research method based on artificial society and computational experiments, which can not only reflect the dynamic process of a real system but also optimize its control process in real time. The automatic container terminal is a typical complex system with numerous operating schemes and a large number of constraints. How to accomplish the container transport task, with its intermittent and batch features, in minimum time and with minimum energy consumption is a major issue that involves many disciplines such as mathematics, control, management, and computer science. In this paper, the data engine is used as the basic computing unit of the artificial society of parallel systems, to study the information control system of the container terminal. As a computing environment for graphical configuration, the data engine is ideal for the modeling and computation of complex systems. With the support of visualization and dynamic reconfiguration technologies, 380 data engines are used to perform computational experiments on the automation process of a port system consisting of 8 bridge cranes, 25 AGVs, and 16 gantry cranes. The results indicate the effectiveness of the data engine technology for parallel systems; a computing environment composed of multiple data engines can greatly reduce the modeling complexity of the port information control system and enables the information management to work cooperatively with the control process. The proposed parallel systems can connect to port devices directly to establish a parallel relationship between the "artificial container terminal" and the "physical container terminal" so as to achieve the optimal control of the port devices.
Article
Full-text available
In the past decade, benefitting from the progress in computer vision theories and computing resources, there has been a rapid development in visual object tracking. Among all the methods, the tracklet-based object tracking method has gained popularity due to its robustness in occlusion scenarios and high computational efficiency. This paper presents a comprehensive survey of research methods related to tracklet-based object tracking. First, the basic concepts, research significance and research status of visual object tracking are introduced briefly. Then, the tracklet-based tracking approach is described from four aspects, including object detection, feature extraction, tracklet generation, and tracklet association and completion. Afterwards, we provide a detailed review and analyze the characteristics of state-of-the-art tracklet-based tracking methods. Finally, potential challenges and research fields are discussed. In our opinion, more advanced object tracking models should be proposed and the parallel vision approach should be adopted to learn and evaluate tracking models in a virtual-real interactive way.
Article
In recent years, with the development of computing power and deep learning algorithms, pedestrian detection has made great progress. Nevertheless, once a detection model trained on generic datasets (such as PASCAL VOC and MS COCO) is applied to a specific scene, its precision is limited by the distribution gap between the generic data and the specific scene data. It is difficult to train the model for a specific scene, due to the lack of labeled data from that scene. Even if we manage to get some labeled data from a specific scene, the changing environmental conditions make the pre-trained model perform badly. In light of these issues, we propose a parallel vision approach to scene-specific pedestrian detection. Given an object detection model, it is trained via two sequential stages: (1) the model is pre-trained on augmented-reality data, to address the lack of scene-specific training data; (2) the pre-trained model is incrementally optimized with newly synthesized data as the specific scene evolves over time. On publicly available datasets, our approach leads to higher precision than models trained on generic data. To tackle the dynamically changing scene, we further evaluate our approach on webcam data collected from Church Street Market Place, and the results are also encouraging.
Article
Knowledge automation is the organic integration of intelligentization, human-machine interaction, automation, and related technologies. From the perspective of social signals and Merton systems, we address issues related to the significance and development of knowledge automation. Key topics discussed are the technical foundation for smart algorithms and knowledge robots, software-defined systems and processes from the viewpoint of systems engineering, and the important role played by knowledge automation in parallel systems for the control and management of complex systems.
Article
Surrounding vehicle detection is one of the most important modules for a vision-based driver assistance system (VB-DAS) or an autonomous vehicle. In this paper, we put forward a wireless panoramic camera system for real-time and seamless imaging of the 360-degree driving scene. Using an embedded FPGA design, the proposed panoramic camera system can perform fast image stitching and produce panoramic videos in real time, which greatly relieves the computation and storage burden of a traditional multi-camera-based panoramic system. For surrounding vehicle detection, we present a novel deep convolutional neural network, EZ-Net, which perceives the potential vehicles using 13 convolutional layers and locates the vehicles by a local non-maximum suppression process. Experimental results demonstrate that the proposed EZ-Net performs vehicle detection on the panoramic video at a speed of 140 fps while holding a competing accuracy with the state-of-the-art detectors.
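Non-maximum suppression, used above to localize vehicles from dense detections, can be sketched in a few lines. The box format (x1, y1, x2, y2) and the IoU threshold are common conventions assumed here, not details taken from EZ-Net:

```python
# Greedy non-maximum suppression over scored boxes (illustrative sketch).

def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, thresh=0.5):
    """Keep the highest-scoring box in each cluster of overlapping boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the second box overlaps the first and is suppressed
```

The greedy pass keeps the strongest detection and discards neighbors whose overlap exceeds the threshold, which is why dense per-pixel responses collapse to one box per vehicle.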
Article
Generative Adversarial Networks (GANs) have emerged as a promising and effective mechanism for machine learning due to their recent successful applications. GANs share the same idea of producing, testing, acquiring, and utilizing data as well as knowledge based on artificial systems, computational experiments, and parallel execution of actual and virtual scenarios, as outlined in the theory of parallel transportation. Clearly, the adversarial concept is embedded implicitly or explicitly in both GANs and parallel transportation systems. In this article, we first introduce the basics of GANs and parallel transportation systems, and then present an approach to using GANs in parallel transportation systems for traffic data generation, traffic modeling, traffic prediction and traffic control. Our preliminary investigation indicates that GANs have great potential and provide specific algorithmic support for implementing parallel transportation systems.
Article
To solve the problem of precise decision-making in gout diagnosis and treatment in complex environments, overcome the limitations imposed by the varying professional levels of different physicians, and improve the accuracy of gout diagnosis and the effectiveness of treatment, this paper proposes a framework for an ACP-based parallel gout diagnosis and treatment system, called "Parallel Gout". Parallel Gout constructs an artificial gout diagnosis and treatment system to model and represent the actual one, applies computational experiments to train and evaluate various diagnosis and treatment models, and uses parallel execution for management decision-making and real-time optimization of the actual system, thereby automating and intelligentizing the gout diagnosis and treatment process. This parallel process can help physicians reduce misdiagnosis and mistreatment, improve efficiency, and raise their professional level, and can also help patients manage chronic disease and stay away from illness. Considering the severity of gout in today's society, the application of Parallel Gout in gout diagnosis and treatment has important practical significance; it is an effective approach and natural choice for the traditional healthcare model to move toward intelligence and parallelization, and is conducive to advancing the Healthy China initiative and achieving a higher level of health for all.
Article
Background subtraction is one of the key techniques in computer vision and pattern recognition. A new background subtraction algorithm is proposed, which firstly uses the median filtering algorithm for extracting background and then trains the network based on Bayesian generative adversarial network. The work uses Bayesian generative adversarial network to classify each pixel effectively, thereby addressing the issues of sudden and slow illumination changes, non-stationary background, and ghost. Deep convolutional neural networks are adopted to construct the generator and the discriminator of Bayesian generative adversarial network. Experiments show that the proposed algorithm results in better performance than others in most cases. The contribution of the work is to apply Bayesian generative adversarial network to background subtraction for the first time and achieve good experimental results.
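The first stage of the pipeline above, extracting a background image by temporal median filtering and then flagging deviating pixels, can be sketched directly. The Bayesian GAN classifier that refines the result in the paper is not reproduced, and the threshold is an illustrative assumption:

```python
# Sketch of median-based background extraction and simple thresholding.
from statistics import median

def median_background(frames):
    """Per-pixel temporal median over a list of equally sized 2-D frames."""
    h, w = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(w)]
            for y in range(h)]

def foreground_mask(frame, background, thresh=30):
    """1 where the frame deviates from the background by more than thresh."""
    return [[1 if abs(p - b) > thresh else 0
             for p, b in zip(row, brow)]
            for row, brow in zip(frame, background)]

# Toy 2x2 video: a static background near intensity 10 plus sensor noise.
frames = [[[10, 10], [10, 10]],
          [[12, 10], [10, 10]],
          [[10, 11], [10, 10]]]
bg = median_background(frames)
mask = foreground_mask([[200, 10], [10, 10]], bg)  # one bright "object" pixel
```

The median is robust to transient foreground objects in the history, which is exactly why it is a common choice for the initial background estimate before a learned classifier takes over.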
Article
To handle the issue of preventing emergencies in motion planning for autonomous driving, we present a novel parallel motion planning framework. Artificial traffic scenes are first constructed based on real traffic scenes. A deep planning model, which can learn from both real and artificial scenes, is developed and used to make planning decisions in an end-to-end mode. To prevent emergencies, a generative adversarial network (GAN) model is designed and learns from artificial emergencies in artificial traffic scenes. During deployment, the well-trained GAN model is used to generate multiple virtual emergencies based on the current real scene, and the well-trained planning model simultaneously makes different planning decisions for both the virtual scenes and the current scene. The final planning decision is made by comprehensively analyzing observations and virtual emergencies. Through parallel planning, the planner can make rational decisions in a timely manner, without a large number of calculations, when an emergency occurs.
Article
To improve the accuracy of diagnosis and the effectiveness of treatment, a framework of parallel healthcare systems (PHSs) based on the artificial systems + computational experiments + parallel execution (ACP) approach is proposed in this paper. PHS uses artificial healthcare systems to model and represent patients' conditions, diagnosis, and treatment process, then applies computational experiments to analyze and evaluate various therapeutic regimens, and implements parallel execution for decision-making support and real-time optimization in both actual and artificial healthcare processes. In addition, we combine the emerging blockchain technology with PHS, via constructing a consortium blockchain linking patients, hospitals, health bureaus, and healthcare communities for comprehensive healthcare data sharing, medical records review, and care auditability. Finally, a prototype named parallel gout diagnosis and treatment system is built and deployed to verify and demonstrate the effectiveness and efficiency of the blockchain-powered PHS framework.
Article
Full-text available
Data seems cheap to get, and in many ways it is, but the process of creating a high quality labeled dataset from a mass of data is time-consuming and expensive. With the advent of rich 3D repositories, photo-realistic rendering systems offer the opportunity to provide nearly limitless data. Yet, their primary value for visual learning may be the quality of the data they can provide rather than the quantity. Rendering engines offer the promise of perfect labels in addition to the data: what the precise camera pose is; what the precise lighting location, temperature, and distribution is; what the geometry of the object is. In this work we focus on semi-automating dataset creation through use of synthetic data and apply this method to an important task -- object viewpoint estimation. Using state-of-the-art rendering software we generate a large labeled dataset of cars rendered densely in viewpoint space. We investigate the effect of rendering parameters on estimation performance and show realism is important. We show that generalizing from synthetic data is not harder than the domain adaptation required between two real-image datasets and that combining synthetic images with a small amount of real data improves estimation accuracy.
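The finding above, that a large synthetic set combined with a small amount of real data improves accuracy, implies a mixed sampling strategy at training time. The sketch below shows one simple way to draw such mixed batches; the mixing ratio and the toy datasets are illustrative assumptions, and in practice the ratio would be tuned on validation data:

```python
# Sketch of drawing training batches from a large synthetic set plus a
# small real set (ratio is an assumption, not a value from the paper).
import random

def mixed_batch(synthetic, real, batch_size, real_fraction=0.25, seed=0):
    """Sample a batch drawing real_fraction of its items from real data."""
    rng = random.Random(seed)
    n_real = int(round(batch_size * real_fraction))
    n_syn = batch_size - n_real
    batch = [rng.choice(real) for _ in range(n_real)]
    batch += [rng.choice(synthetic) for _ in range(n_syn)]
    rng.shuffle(batch)
    return batch

synthetic = [("syn", i) for i in range(1000)]  # rendered, perfectly labeled
real = [("real", i) for i in range(50)]        # few manually labeled images
batch = mixed_batch(synthetic, real, batch_size=32)
```

Oversampling the small real set like this keeps every batch anchored to the real-image distribution while the synthetic data supplies coverage of viewpoint space.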
Article
Full-text available
The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.
Article
Full-text available
Rapid advances in computation, combined with the latest advances in computer graphics simulations, have facilitated the development of vision systems and their training in virtual environments. One major stumbling block is certifying that the designs and tuned parameters of these systems work in the real world. In this paper, we begin to explore the fundamental question: which type of information transfer is more analogous to the real world? Inspired by the performance characterization methodology outlined in the 90's, we note that insights derived from simulations can be qualitative or quantitative depending on the degree of fidelity of the models used in simulations and the nature of the questions posed by the experimenter. We adapt the methodology in the context of current graphics simulation tools for modeling data generation processes and for systematic performance characterization and trade-off analysis for vision system design, leading to qualitative and quantitative insights. Concretely, we examine invariance assumptions used in vision algorithms for video surveillance settings as a case study and assess the degree to which those invariance assumptions deviate as a function of contextual variables on both graphics simulations and real data. As computer graphics rendering quality improves, we believe teasing apart the degree to which model assumptions are valid via systematic graphics simulation can be a significant aid to more principled ways of approaching vision system design and performance modeling.
Article
Full-text available
As computer vision matures into a systems science and engineering discipline, there is a trend toward leveraging the latest advances in computer graphics simulations for performance evaluation, learning, and inference. However, there is an open question on the utility of graphics simulations for vision, with apparently contradicting views in the literature. In this paper, we place the results from the recent literature in the context of the performance characterization methodology outlined in the 90's, and note that insights derived from simulations can be qualitative or quantitative depending on the degree of fidelity of the models used in simulation and the nature of the question posed by the experimenter. We describe a simulation platform that incorporates the latest graphics advances and use it for systematic performance characterization and trade-off analysis for vision system design. We verify the utility of the platform in a case study validating a generative-model-inspired vision hypothesis, the Rank-Order consistency model, in the contexts of global and local illumination changes, bad weather, and high-frequency noise. Our approach establishes the link between alternative viewpoints, involving models with physics-based semantics and signal-and-perturbation semantics, and confirms insights in the literature on robust change detection.
Article
Full-text available
Detecting pedestrians with on-board vision systems is of paramount interest for assisting drivers to prevent vehicle-to-pedestrian accidents. The core of a pedestrian detector is its classification module, which aims at deciding if a given image window contains a pedestrian. Given the difficulty of this task, many classifiers have been proposed during the last fifteen years. Among them, the so-called (deformable) part-based classifiers including multi-view modeling are usually top ranked in accuracy. Training such classifiers is not trivial since a proper aspect clustering and spatial part alignment of the pedestrian training samples are crucial for obtaining an accurate classifier. In this paper, first we perform automatic aspect clustering and part alignment by using virtual-world pedestrians, i.e., human annotations are not required. Second, we use a mixture-of-parts approach that allows part sharing among different aspects. Third, these proposals are integrated in a learning framework which also allows incorporating real-world training data to perform domain adaptation between virtual- and real-world cameras. Overall, the obtained results on four popular on-board datasets show that our proposal clearly outperforms the state-of-the-art deformable part-based detector known as latent SVM.
Conference Paper
Full-text available
Ground truth optical flow is difficult to measure in real scenes with natural motion. As a result, optical flow data sets are restricted in terms of size, complexity, and diversity, making optical flow algorithms difficult to train and test on realistic data. We introduce a new optical flow data set derived from the open source 3D animated short film Sintel. This data set has important features not present in the popular Middlebury flow evaluation: long sequences, large motions, specular reflections, motion blur, defocus blur, and atmospheric effects. Because the graphics data that generated the movie is open source, we are able to render scenes under conditions of varying complexity to evaluate where existing flow algorithms fail. We evaluate several recent optical flow algorithms and find that current highly-ranked methods on the Middlebury evaluation have difficulty with this more complex data set suggesting further research on optical flow estimation is needed. To validate the use of synthetic data, we compare the image- and flow-statistics of Sintel to those of real films and videos and show that they are similar. The data set, metrics, and evaluation website are publicly available.
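As a hedged illustration of how such benchmarks score flow estimates, the standard average endpoint error (EPE) metric can be computed as follows. This is a generic sketch, not the Sintel benchmark's own code; the flat list-of-vectors representation is a simplifying assumption:

```python
import math

def endpoint_error(flow_est, flow_gt):
    """Average endpoint error (EPE) between two flow fields.
    Each flow field is a flat list of (u, v) vectors, one per pixel."""
    assert len(flow_est) == len(flow_gt)
    total = 0.0
    for (u1, v1), (u2, v2) in zip(flow_est, flow_gt):
        total += math.hypot(u1 - u2, v1 - v2)  # Euclidean distance per pixel
    return total / len(flow_est)
```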
Article
Full-text available
Video analysis often begins with background subtraction. This problem is often approached in two steps—a background model followed by a regularisation scheme. A model of the background allows it to be distinguished on a per-pixel basis from the foreground, whilst the regularisation combines information from adjacent pixels. We present a new method based on Dirichlet process Gaussian mixture models, which are used to estimate per-pixel background distributions. It is followed by probabilistic regularisation. Using a non-parametric Bayesian method allows per-pixel mode counts to be automatically inferred, avoiding over-/under- fitting. We also develop novel model learning algorithms for continuous update of the model in a principled fashion as the scene changes. These key advantages enable us to outperform the state-of-the-art alternatives on four benchmarks.
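As a deliberately simplified illustration of per-pixel background modeling, the sketch below maintains a single running Gaussian per pixel. This is a stand-in for intuition only; the paper's Dirichlet process Gaussian mixture model is substantially richer (multiple modes, automatic mode counts, probabilistic regularisation). All names and parameter values here are hypothetical:

```python
class PixelBackground:
    """Single-Gaussian running background model for one pixel.
    A much simpler stand-in for a Dirichlet process GMM."""

    def __init__(self, init_value, alpha=0.05, threshold=2.5):
        self.mean = float(init_value)
        self.var = 100.0            # start with a wide variance
        self.alpha = alpha          # learning rate for adaptation
        self.threshold = threshold  # Mahalanobis-style cutoff

    def update(self, value):
        """Classify the pixel, then adapt the model. Returns True if foreground."""
        d = value - self.mean
        foreground = d * d > (self.threshold ** 2) * self.var
        # adapt mean and variance toward the new observation
        self.mean += self.alpha * d
        self.var = (1 - self.alpha) * self.var + self.alpha * d * d
        return foreground
```

A real system would run one such model per pixel and follow it with the spatial regularisation step the abstract describes.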
Article
Full-text available
The Tokyo Virtual Living Lab is an experimental space based on 3D Internet technology that lets researchers conduct controlled driving and travel studies, including those involving multiple users in the same shared space. This shared-use feature is crucial for analyzing interactive driving behaviors in future smart cities. The lab's novelty is two-fold: it outputs a semantically enriched graphical navigation network using free map data as input, and it includes a navigation segment agent that coordinates a multiagent traffic simulator. This simulator, which is based on the navigation network, supports the integration of user-controlled vehicles. The lab's approach can significantly reduce the effort of preparing realistic driving behavior studies. To demonstrate this, the authors built a 3D model of a part of Tokyo to perform experiments with human drivers in two conditions: normal traffic and ubiquitous eco-traffic.
Article
Full-text available
Pedestrian detection is of paramount interest for many applications. Most promising detectors rely on discriminatively learnt classifiers, i.e., trained with annotated samples. However, the annotation step is a human-intensive and subjective task worth minimizing. By using virtual worlds we can automatically obtain precise and rich annotations. Thus, we face the question: can a pedestrian appearance model learnt in realistic virtual worlds work successfully for pedestrian detection in real-world images? Conducted experiments show that virtual-world based training can provide excellent testing performance in the real world, but it can also suffer the dataset shift problem as real-world based training does. Accordingly, we have designed a domain adaptation framework, V-AYLA, in which we have tested different techniques to collect a few pedestrian samples from the target domain (real world) and combine them with the many examples of the source domain (virtual world) in order to train a domain-adapted pedestrian classifier that will operate in the target domain. V-AYLA reports the same detection performance as training with many human-provided pedestrian annotations and testing with real-world images of the same domain. To the best of our knowledge, this is the first work demonstrating adaptation of virtual and real worlds for developing an object detector.
Article
Full-text available
Society is rapidly accepting the use of video cameras in many new and varied locations, but effective methods to utilize and manage the massive resulting amounts of visual data are only slowly developing. This paper presents a framework for live video analysis in which the behaviors of surveillance subjects are described using a vocabulary learned from recurrent motion patterns, for real-time characterization and prediction of future activities, as well as the detection of abnormalities. The repetitive nature of object trajectories is utilized to automatically build activity models in a 3-stage hierarchical learning process. Interesting nodes are learned through Gaussian mixture modeling, connecting routes formed through trajectory clustering, and spatio-temporal dynamics of activities probabilistically encoded using hidden Markov models. Activity models are adapted to small temporal variations in an online fashion using maximum likelihood regression and new behaviors are discovered from a periodic retraining for long-term monitoring. Extensive evaluation on various data sets, typically missing from other work, demonstrates the efficacy and generality of the proposed framework for surveillance-based activity analysis.
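As an illustrative stand-in for the node-learning stage described above (the paper's actual pipeline uses Gaussian mixture modeling followed by trajectory clustering and HMMs), plain k-means over 2D trajectory points can be sketched as follows; all names are hypothetical:

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on 2D points; an illustrative stand-in for the
    Gaussian-mixture 'interesting node' learning stage."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise from the data
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[i].append(p)
        # update step: move each center to its cluster mean
        for i, cl in enumerate(clusters):
            if cl:
                centers[i] = (sum(x for x, _ in cl) / len(cl),
                              sum(y for _, y in cl) / len(cl))
    return centers
```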
Article
Full-text available
We present a novel concept, Virtualized Traffic, to reconstruct and visualize continuous traffic flows from discrete spatiotemporal data provided by traffic sensors or generated artificially to enhance a sense of immersion in a dynamic virtual world. Given the positions of each car at two recorded locations on a highway and the corresponding time instances, our approach can reconstruct the traffic flows (i.e., the dynamic motions of multiple cars over time) between the two locations along the highway for immersive visualization of virtual cities or other environments. Our algorithm is applicable to high-density traffic on highways with an arbitrary number of lanes and takes into account the geometric, kinematic, and dynamic constraints on the cars. Our method reconstructs the car motion that automatically minimizes the number of lane changes, respects safety distance to other cars, and computes the acceleration necessary to obtain a smooth traffic flow subject to the given constraints. Furthermore, our framework can process a continuous stream of input data in real time, enabling the users to view virtualized traffic events in a virtual world as they occur. We demonstrate our reconstruction technique with both synthetic and real-world input.
Conference Paper
Modern computer vision algorithms typically require expensive data acquisition and accurate manual labeling. In this work, we instead leverage the recent progress in computer graphics to generate fully labeled, dynamic, and photo-realistic proxy virtual worlds. We propose an efficient real-to-virtual world cloning method, and validate our approach by building and publicly releasing a new video dataset, called Virtual KITTI (see http://www.xrce.xerox.com/Research-Development/Computer-Vision/Proxy-Virtual-Worlds), automatically labeled with accurate ground truth for object detection, tracking, scene and instance segmentation, depth, and optical flow. We provide quantitative experimental evidence suggesting that (i) modern deep learning algorithms pre-trained on real data behave similarly in real and virtual worlds, and (ii) pre-training on virtual data improves performance. As the gap between real and virtual worlds is small, virtual worlds enable measuring the impact of various weather and imaging conditions on recognition performance, all other things being equal. We show these factors may drastically affect otherwise high-performing deep models for tracking.
Article
An investigation on the impact and significance of the AlphaGo vs. Lee Sedol Go match is conducted, and concludes with a conjecture of the AlphaGo Thesis and its extension in accordance with the Church-Turing Thesis in the history of computing. It is postulated that the architecture and method utilized by the AlphaGo program provide an engineering solution for tackling issues in complexity and intelligence. Specifically, the AlphaGo Thesis implies that any effective procedure for hard decision problems such as NP-hard can be implemented with AlphaGo-like approach. Deep rule-based networks are proposed in attempt to establish an understandable structure for deep neural networks in deep learning. The success of AlphaGo and corresponding thesis ensure the technical soundness of the parallel intelligence approach for intelligent control and management of complex systems and knowledge automation.
Article
Relating visual information to its linguistic semantic meaning remains an open and challenging area of research. The semantic meaning of images depends on the presence of objects, their attributes and their relations to other objects. But precisely characterizing this dependence requires extracting complex visual information from an image, which is in general a difficult and yet unsolved problem. In this paper, we propose studying semantic information in abstract images created from collections of clip art. Abstract images provide several advantages. They allow for the direct study of how to infer high-level semantic information, since they remove the reliance on noisy low-level object, attribute and relation detectors, or the tedious hand-labeling of images. Importantly, abstract images also allow the ability to generate sets of semantically similar scenes. Finding analogous sets of semantically similar real images would be nearly impossible. We create 1,002 sets of 10 semantically similar abstract images with corresponding written descriptions. We thoroughly analyze this dataset to discover semantically important features, the relations of words to visual features and methods for measuring semantic similarity. Finally, we study the relation between the saliency and memorability of objects and their semantic importance.
Conference Paper
The most successful 2D object detection methods require a large number of images annotated with object bounding boxes to be collected for training. We present an alternative approach that trains on virtual data rendered from 3D models, avoiding the need for manual labeling. Growing demand for virtual reality applications is quickly bringing about an abundance of available 3D models for a large variety of object categories. While mainstream use of 3D models in vision has focused on predicting the 3D pose of objects, we investigate the use of such freely available 3D models for multicategory 2D object detection. To address the issue of dataset bias that arises from training on virtual data and testing on real images, we propose a simple and fast adaptation approach based on decorrelated features. We also compare two kinds of virtual data, one rendered with real-image textures and one without. Evaluation on a benchmark domain adaptation dataset demonstrates that our method performs comparably to existing methods trained on large-scale real image domains.
Article
Performance evaluation is considered an important part of unmanned ground vehicle (UGV) development; it helps to discover research problems and improves driving safety. In this paper, a task-specific performance evaluation model for UGVs in the Intelligent Vehicle Future Challenge (IVFC) annual competitions is discussed. It is defined at functional levels with a formal evaluation process, including metrics analysis, metrics preprocessing, weights calculation, the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS), and fuzzy comprehensive evaluation methods. IVFC 2012 is selected as a case study, and the overall performance of five UGVs is evaluated on the analyzed autonomous driving tasks of environment perception, structured on-road driving, unstructured zone driving, and dynamic path planning. The model proves helpful for evaluating UGV performance across the IVFC competition series.
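The ranking technique the abstract refers to, TOPSIS, scores each alternative by its closeness to an ideal solution. A minimal generic sketch (benefit-type criteria only, not the paper's exact formulation; all names are hypothetical):

```python
import math

def topsis(matrix, weights):
    """Rank alternatives with TOPSIS (all criteria assumed benefit-type).
    matrix[i][j] = score of alternative i on criterion j."""
    m, n = len(matrix), len(matrix[0])
    # vector-normalise each criterion column, then apply weights
    norms = [math.sqrt(sum(matrix[i][j] ** 2 for i in range(m))) for j in range(n)]
    v = [[weights[j] * matrix[i][j] / norms[j] for j in range(n)] for i in range(m)]
    # ideal (best) and anti-ideal (worst) points per criterion
    best = [max(v[i][j] for i in range(m)) for j in range(n)]
    worst = [min(v[i][j] for i in range(m)) for j in range(n)]
    scores = []
    for i in range(m):
        d_best = math.sqrt(sum((v[i][j] - best[j]) ** 2 for j in range(n)))
        d_worst = math.sqrt(sum((v[i][j] - worst[j]) ** 2 for j in range(n)))
        scores.append(d_worst / (d_best + d_worst))
    return scores  # higher = closer to the ideal solution
```

Cost-type criteria (where smaller is better) would swap `max` and `min` for those columns.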
Article
We present a distributed virtual vision simulator capable of simulating large-scale camera networks. Our virtual vision simulator can simulate pedestrian traffic in different 3D environments. Simulated cameras deployed in these virtual environments generate synthetic video feeds that are fed into a vision processing pipeline supporting pedestrian detection and tracking. The visual analysis results are then used for subsequent processing, such as camera control, coordination, and handoff. Our virtual vision simulator is realized as a collection of modules that communicate with each other over the network. Consequently, we can deploy our simulator over a network of computers, allowing us to simulate much larger camera networks and much more complex scenes than is otherwise possible. Specifically, we show that our proposed virtual vision simulator can model a camera network, comprising more than one hundred active pan/tilt/zoom and passive wide field-of-view cameras, deployed in an upper floor of an office tower in downtown Toronto.
Article
Using massive amounts of data to recognize photos and speech, deep-learning computers are taking a big step towards true artificial intelligence.
ObjectVideo Virtual Video (OVVV) is a publicly available visual surveillance simulation test bed based on a commercial game engine. The tool simulates multiple synchronized video streams from a variety of camera configurations, including static, PTZ and omni-directional cameras, in a virtual environment populated with computer- or player-controlled humans and vehicles. To support performance evaluation, OVVV generates detailed automatic ground truth for each frame including target centroids, bounding boxes and pixel-wise foreground segmentation. We describe several realistic, controllable noise effects including pixel noise, video ghosting and radial distortion to improve the realism of synthetic video and provide additional dimensions for performance testing. Several indoor and outdoor virtual environments developed by the authors are described to illustrate the range of testing scenarios possible using OVVV. Finally, we provide a practical demonstration of using OVVV to develop and evaluate surveillance algorithms.
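Ground-truth bounding boxes like those OVVV emits are typically matched against detector output via intersection-over-union. A generic sketch of that standard metric (not part of OVVV itself):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    # intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    iw = max(0, ix2 - ix1)
    ih = max(0, iy2 - iy1)
    inter = iw * ih
    # union = sum of areas minus overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

A detection is conventionally counted as a true positive when its IoU with a ground-truth box exceeds a threshold such as 0.5.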
Article
This paper presents our research towards smart camera networks capable of carrying out advanced surveillance tasks with little or no human supervision. A unique centerpiece of our work is the combination of computer graphics, artificial life, and computer vision simulation technologies to develop such networks and experiment with them. Specifically, we demonstrate a smart camera network comprising static and active simulated video surveillance cameras that provides extensive coverage of a large virtual public space, a train station populated by autonomously self-animating virtual pedestrians. The realistically simulated network of smart cameras performs persistent visual surveillance of individual pedestrians with minimal intervention. Our innovative camera control strategy naturally addresses camera aggregation and handoff, is robust against camera and communication failures, and requires no camera calibration, detailed world model, or central controller.
Datasets are an integral part of contemporary object recognition research. They have been the chief reason for the considerable progress in the field, not just as source of large amounts of training data, but also as means of measuring and comparing performance of competing algorithms. At the same time, datasets have often been blamed for narrowing the focus of object recognition research, reducing it to a single benchmark performance number. Indeed, some datasets, that started out as data capture efforts aimed at representing the visual world, have become closed worlds unto themselves (e.g. the Corel world, the Caltech-101 world, the PASCAL VOC world). With the focus on beating the latest benchmark numbers on the latest dataset, have we perhaps lost sight of the original purpose? The goal of this paper is to take stock of the current state of recognition datasets. We present a comparison study using a set of popular datasets, evaluated based on a number of criteria including: relative data bias, cross-dataset generalization, effects of closed-world assumption, and sample value. The experimental results, some rather surprising, suggest directions that can improve dataset collection as well as algorithm evaluation protocols. But more broadly, the hope is to stimulate discussion in the community regarding this very important, but largely neglected issue.
Article
Recent advancements in local methods have significantly improved the collision avoidance behavior of virtual characters. However, existing methods fail to take into account that in real life pedestrians tend to walk in small groups, consisting mainly of pairs or triples of individuals. We present a novel approach to simulate the walking behavior of such small groups. Our model describes how group members interact with each other, with other groups and individuals. We highlight the potential of our method through a wide range of test-case scenarios. We evaluate the results from our simulations using a number of quantitative quality metrics, and also provide visual and numerical comparisons with video footages of real crowds.
Conference Paper
Image features are widely used in computer vision applications. They need to be robust to scene changes and image transformations. Designing and comparing feature descriptors requires the ability to evaluate their performance with respect to those transformations. We want to know how robust the descriptors are to changes in the lighting, scene, or viewing conditions. For this, we need ground truth data of different scenes viewed under different camera or lighting conditions in a controlled way. Such data is very difficult to gather in a real-world setting. We propose using a photorealistic virtual world to gain complete and repeatable control of the environment in order to evaluate image features. We calibrate our virtual world evaluations by comparing against feature rankings made from photographic data of the same subject matter (the Statue of Liberty). We find very similar feature rankings between the two datasets. We then use our virtual world to study the effects on descriptor performance of controlled changes in viewpoint and illumination. We also study the effect of augmenting the descriptors with depth information to improve performance.
Article
Over the last 30 years, video surveillance systems have been a key part of intelligent transportation systems (ITSs), which use various image sensors to capture visual information about vehicles and pedestrians to obtain real-time knowledge of traffic conditions. Specifically, they capture vehicles' visual appearances and support mining more information about them through vehicle detection, localization, and classification; license plate recognition; vehicle-behavior analysis; and so forth. They also help generate overall vehicle statistics such as estimations of flow rate, average speed, and density. In addition, video surveillance systems can capture pedestrian visual information to support their detection and behavior analysis, especially their interactions with vehicles, which can help identify impending traffic accidents.
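The flow-rate, speed, and density statistics mentioned above are tied together by the fundamental relation of traffic flow, q = k · v. A trivial sketch (function names are hypothetical):

```python
def density_from_counts(vehicle_count, road_length_km):
    """Density k (vehicles per km) from a per-frame vehicle count
    over a monitored road segment."""
    return vehicle_count / road_length_km

def flow_rate(density_veh_per_km, avg_speed_km_per_h):
    """Fundamental relation of traffic flow: q = k * v (vehicles per hour)."""
    return density_veh_per_km * avg_speed_km_per_h
```

So a camera that counts 30 vehicles on a 1.5 km segment moving at an average of 60 km/h implies a flow of 1200 vehicles per hour.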
Article
Online virtual worlds, electronic environments where people can work and interact in a somewhat realistic manner, have great potential as sites for research in the social, behavioral, and economic sciences, as well as in human-centered computer science. This article uses Second Life and World of Warcraft as two very different examples of current virtual worlds that foreshadow future developments, introducing a number of research methodologies that scientists are now exploring, including formal experimentation, observational ethnography, and quantitative analysis of economic markets or social networks.
Conference Paper
We present smart camera network research in the context of a unique new synthesis of advanced computer graphics and vision simulation technologies. We design and experiment with simulated camera networks within visually and behaviorally realistic virtual environments. Specifically, we demonstrate a smart camera network comprising static and active simulated video surveillance cameras that provides perceptive coverage of a large virtual public space, a train station populated by autonomously self-animating virtual pedestrians. In the context of human surveillance, we propose a camera network control strategy that enables a collection of smart cameras to provide perceptive scene coverage and perform persistent surveillance with minimal intervention. Our novel control strategy naturally addresses camera aggregation and camera handoff, it does not require camera calibration, a detailed world model, or a central controller, and it is robust against camera and communication failures.
24 Prendinger H, Gajananan K, Zaki A B, Fares A, Molenaar R, Urbano D, van Lint H, Gomaa W. Tokyo Virtual Living Lab: designing smart cities based on the 3D Internet. IEEE Internet Computing, 2013, 17(6): 30−38
25 Karamouzas I, Overmars M. Simulating and evaluating the local behavior of small pedestrian groups. IEEE Transactions on Visualization and Computer Graphics, 2012, 18(3)
Handa A, Pǎrǎcean V, Badrinarayanan V, Stent S, Cipolla R. Understanding real world indoor scenes with synthetic data. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV: IEEE, 2016. 4077−4085
34 Ros G, Sellart L, Materzynska J, Vazquez D, López A M. The SYNTHIA dataset: a large collection of synthetic images for semantic segmentation of urban scenes. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV: IEEE, 2016. 3234−3243
35 Movshovitz-Attias Y, Kanade T, Sheikh Y. How useful is photo-realistic rendering for visual learning? arXiv:1603.08152, 2016
45 Qingdao "Integrated Multi-Mode" Parallel Transportation Operation Demo Project. Notice from the National Development and Reform Commission. [Online], available: http://www.ndrc.gov.cn/zcfb/zcfbtz/201608/t20160805 814065.html, August 5, 2016