Zhiwei Xu
Chinese Academy of Sciences | CAS · Institute of Computing Technology

About

Publications: 173
Reads: 29,454
Citations: 3,651
Citations since 2017: 1,815 (13 research items)
Additional affiliations
January 1996 - present
Chinese Academy of Sciences
Position
  • Professor (Full)

Publications (173)
Article
Full-text available
As machine learning (ML) becomes the prominent technology for many emerging problems, dedicated ML computers are being developed at a variety of scales, from clouds to edge devices. However, the heterogeneous, parallel, and multilayer characteristics of conventional ML computers concentrate the cost of development on the software stack, namely, ML...
Article
Due to the broad success of deep learning, many CPU-centric artificial intelligence computing systems employ specialized devices such as GPUs, FPGAs, and ASICs, collectively termed Deep Learning Processing Units (DLPUs), to process computation-intensive deep learning tasks. The separation between the scalar control operations mapped on CPUs...
Article
Neural network (NN) processors are specially designed to handle deep learning tasks by utilizing multilayer artificial NNs. They have been demonstrated to be useful in broad application fields such as image recognition, speech processing, machine translation, and scientific computing. Meanwhile, innovative self-aware techniques, whereby a system ca...
Article
Machine learning techniques are pervasive tools for emerging commercial applications and many dedicated machine learning computers on different scales have been deployed in embedded devices, servers, and data centers. Currently, most machine learning computer architectures still focus on optimizing performance and energy efficiency instead of progr...
Article
Edge computing is a continuum that includes the computing resources from cloud to things. Ecosystem of things (EoT) is a subsystem of the ecosystem of edge computing, which potentially contains trillions of devices of things and directly interacts with the physical world. This paper surveys the state of the art of EoT by focusing on the computing i...
Conference Paper
Machine learning techniques are pervasive tools for emerging commercial applications and many dedicated machine learning computers on different scales have been deployed in embedded devices, servers, and data centers. Currently, most machine learning computer architectures still focus on optimizing performance and energy efficiency instead of progr...
Article
Computation offloading is a key challenge for emerging Internet of Things (IoT) computing paradigms such as edge computing, which shifts computation as close to the data sources as possible to gain benefits such as low latency and energy efficiency. However, the fragmentation problem of IoT devices results in a heterogeneous and disordered ecosy...
Article
Full-text available
Mapping global shipping density, including vessel density and traffic density, is important to reveal the distribution of ships and traffic. The Automatic Identification System (AIS) is an automatic reporting system widely installed on ships initially for collision avoidance by reporting their kinematic and identity information continuously. An alg...
Article
Machine Learning (ML) tasks are becoming pervasive in a broad range of applications, and in a broad range of systems (from embedded systems to data centers). As computer architectures evolve toward heterogeneous multi-cores composed of a mix of cores and hardware accelerators, designing hardware accelerators for ML techniques can simultaneously ach...
Article
Many companies are deploying services largely based on machine-learning algorithms for sophisticated processing of large amounts of data, either for consumers or industry. The state-of-the-art and most popular such machine-learning algorithms are Convolutional and Deep Neural Networks (CNNs and DNNs), which are known to be computationally and memor...
Conference Paper
Functional connectivity, a data-driven modelling of spontaneous fluctuations in activity in spatially segregated brain regions, has emerged as a promising approach to generate hypotheses and features for prediction. The most widely used method for inferring functional connectivity is full correlation, but it cannot differentiate direct and indirect...
Article
Data warehouse systems, like Apache Hive, have been widely used in the distributed computing field. However, current-generation data warehouse systems have not fully embraced High Performance Computing (HPC) technologies even though the trend of converging Big Data and HPC is emerging. For example, in the traditional HPC field, Message Passing Interfac...
Article
Comparing serially acquired fMRI scans is a typical way to detect functional brain changes in different conditions. However, this approach introduces additional variation on physical and physiological conditions, which results in substantial noise. To improve sensitivity and accuracy of signal detection in such highly noisy fMRI data, potentially i...
Article
As a fundamental task in computer architecture research, performance comparison has been continuously hampered by the variability of computer performance. In traditional performance comparisons, the impact of performance variability is usually ignored (i.e., the means of performance observations are compared regardless of the variability), or in th...
Article
Full-text available
Beacon node placement, node-to-node measurement, and target node positioning are the three key steps of a localization process. However, compared with the other two steps, beacon node placement still lacks a comprehensive, systematic study in the research literature. To fill this gap, we address the Beacon Node Placement (BNP) problem that deploys bea...
Article
Full-text available
In many real-world applications, the information about an object can be obtained from multiple sources. The sources may provide different points of view based on their own origin. As a consequence, conflicting pieces of information are inevitable, which gives rise to a crucial problem: how to find the truth from these conflicts. Many truth-finding met...
Article
Full-text available
Current digital currency schemes provide instantaneous exchange of precise commodities, in which "precise" means a buyer can verify the function of the commodity without error. However, imprecise commodities that contain errors, e.g. statistical data, are abundant in the digital world. Existing digital currency schemes do not offer a mechanism t...
Article
Many companies are deploying services, either for consumers or industry, which are largely based on machine-learning algorithms for sophisticated processing of large amounts of data. The state-of-the-art and most popular such machine-learning algorithms are Convolutional and Deep Neural Networks (CNNs and DNNs), which are known to be both computati...
Article
A ring topology is a common network-on-chip (NoC) solution in industry, but is frequently criticized for poor scalability. In this paper, we present a novel type of multi-ring NoC called isolated multi-ring (IMR), which can even support chip multi-processors (CMPs) with 1,024 cores. In IMR, any pair of cores is connected via at least one is...
Article
A cloud database usually refers to a database based on cloud computing technology. However, as far as we know, existing cloud database solutions cannot integrate data from multi-sourced heterogeneous databases and supply only an isolated homogeneous database cluster. This paper presents a new implementation approach for cloud database:...
Article
Architectural Design Space Exploration (DSE) is a notoriously difficult problem due to the exponentially large size of the design space and long simulation times. Previously, many studies proposed to formulate DSE as a regression problem which predicts architecture responses (e.g., time, power) of a given architectural configuration. Several of the...
Conference Paper
This contribution presents the application of Dempster-Shafer theory to the prediction of China’s stock market. Specifically, on every trading day we predicted the most promising industry for the following month. This prediction can help investors select stocks, but is rarely seen in the previous literature. Instead of predicting the fluctuation of the st...
Conference Paper
Amdahl's second law has been seen as a useful guideline for designing and evaluating balanced computer systems for decades. This law has been mainly used for hardware systems and peak capacities. This paper utilizes Amdahl's second law from a new angle, i.e., evaluating the influence on system performance and balance of the application framework s...
Conference Paper
Architectural Design Space Exploration (DSE) is a notoriously difficult problem due to the exponentially large size of the design space and long simulation times. Previously, many studies proposed to formulate DSE as a regression problem which predicts architecture responses (e.g., time, power) of a given architectural configuration. Several of the...
Conference Paper
MPI has been widely used in High Performance Computing. In contrast, such efficient communication support is lacking in the field of Big Data Computing, where communication is realized by time consuming techniques such as HTTP/RPC. This paper takes a step in bridging these two fields by extending MPI to support Hadoop-like Big Data Computing jobs,...
Article
Full-text available
Signalling network inference is a central problem in systems biology. Previous studies investigate this problem by independently inferring local signalling networks and then linking them together via crosstalk. Since a cellular signalling system is in fact indivisible, this reductionistic approach may have an impact on the accuracy of the inference...
Conference Paper
Full-text available
Apache Hadoop and Spark are gaining prominence in Big Data processing and analytics. Both are widely deployed at Internet companies. On the other hand, high-performance data analysis requirements are causing academic and industrial communities to adopt state-of-the-art HPC technologies to solve Big Data problems. Recently, we have prop...
Article
Given a directed graph G and a threshold L(r) for each node r, the rule of deterministic threshold cascading is that a node r fails if and only if it has at least L(r) failed in-neighbors. The cascading failure minimization problem is to find at most k edges to delete, such that the number of failed nodes is minimized. We prove an n^(1−ϵ)...
Article
As a main subfield of cloud computing applications, internet services require large-scale data computing. Their workloads can be divided into two classes: customer-facing query-processing interactive tasks that serve hundreds of millions of users within a short response time and backend data analysis batch tasks that involve petabytes of data. Hado...
Article
Full-text available
In the cloud age, heterogeneous application modes on large-scale infrastructures bring challenges of resource utilization and manageability to data centers. Many resource and runtime management systems have been developed or have evolved to address these challenges and related problems from different perspectives. This paper tries to identify the m...
Article
Full-text available
The ability to discover neighboring nodes, namely neighbor discovery, is essential for the self-organization of wireless ad hoc networks. In this paper, we first propose a history-aware adaptive backoff algorithm for neighbor discovery using collision detection and feedback mechanisms. Given successful discovery feedback, undiscovered nodes can...
Chapter
As a main subfield of cloud computing applications, internet services require large-scale data computing. Their workloads can be divided into two classes: customer-facing query-processing interactive tasks that serve hundreds of millions of users within a short response time and backend data analysis batch tasks that involve petabytes of data. Hado...
Conference Paper
Checkpointing is the predominant storage driver in today's petascale supercomputers and is expected to remain as such in tomorrow's exascale supercomputers. Users typically prefer to checkpoint into a shared file yet parallel file systems often perform ...
Conference Paper
In this paper, we propose an online search system based on Key-Value Store which aims to provide real-time k-NN (k-Nearest Neighbor) search in large-scale high-dimensional vector spaces. Through an improved indexing method based on KD-tree, the vector space is divided into a number of fixed-size heaps, only vectors of a specified heap need to do k-...
Article
With the advent of Internet services, big data and cloud computing, high-throughput computing has generated much research interest, especially on high-throughput cloud servers. However, three basic questions are still not satisfactorily answered: (1) What are the basic metrics (what throughput and high-throughput of what)? (2) What are the main fac...
Article
Full-text available
The first challenge to address is the sobering fact that IT market growth appears to have reached a point of stagnation. The IT market size is measured by the total expenditure on computer and network hardware, software, and services. The second challenge is that inertial and incremental technology progress faces limitations. The International Tech...
Conference Paper
Multi-dimensional range queries are fundamental requirements in large scale Internet applications using Distributed Ordered Tables. Apache Cassandra is a Distributed Ordered Table when it employs order-preserving hashing as data partitioner. Cassandra supports multi-dimensional range queries with poor performance and with a limitation that there mu...
Conference Paper
This paper formulates and studies the problem of accurately acquiring energy consumption information of physical objects influenced by human behavior into the cyber space. We formulate this input-sensing problem within the ternary computing framework, which allows real-world problem instances to be studied, with different constraints on human effor...
Conference Paper
The Message Passing Interface (MPI) standard and its implementations (such as MPICH and OpenMPI) have been widely used in the high-performance computing area to provide an efficient communication infrastructure. This paper investigates whether MPI can be adapted to the data intensive computing area to substantially speed up Hadoop and MapReduce app...
Conference Paper
Full-text available
In this paper, we study in-vehicle digital network systems. We propose a switch-based architecture for in-vehicle networks and focus on the critical related issue: routing schemes. We note that in-vehicle networks are fundamentally different from many other switch-based networks (e.g., the Internet). This is because within an in-vehicle network, me...
Conference Paper
Full-text available
In large organizations or IDCs, different departments always occupy and maintain dedicated resources to satisfy their own or their customers' heterogeneous application loads. This situation easily makes infrastructure management repetitive and inefficient. Even worse, it is difficult to share the resources owned by different departments even w...
Conference Paper
Full-text available
MapReduce-based data warehouse systems play an important role in supporting big data analytics to quickly understand the dynamics of user behavior trends and user needs at typical Web service providers and social network sites (e.g., Facebook). In such a system, the data placement structure is a critical factor that can affect the warehouse p...
Conference Paper
Full-text available
Ever-increasing design complexity and advances of technology impose great challenges on the design of modern microprocessors. One such challenge is to determine promising microprocessor configurations to meet specific design constraints, which is called Design Space Exploration (DSE). In the computer architecture community, supervised learning tech...
Conference Paper
Currently, major Internet services are designed for consumer/personal usage and depend tightly on their service providers, which shows a trend toward monopolization. In this paper, we propose a new Internet service model, PHCMM, based on a systematic and thorough summary of the content lifecycle. Equipped with PHCMM, we identify and analyze existing Internet s...
Conference Paper
Massive-scale distributed databases such as Google’s BigTable and Yahoo!’s PNUTS can be modeled as a Distributed Ordered Table, or DOT, which partitions data regions and supports range queries on key. Multi-dimensional range queries on DOTs are fundamental requirements; however, none of the existing schemes works well when considering three critical issues:...
Conference Paper
Full-text available
In this paper, we propose and analyze a new GPS positioning algorithm. Our algorithm uses the direct linearization technique to reduce the computation time overhead. We invoke the general least squares method in order to achieve optimality in the situation when the trilateration system of equations becomes over-determined. We systematically evaluat...
Article
Server utilization is typically low (10%-30%) in today’s datacenters (or clouds), especially when executing computational jobs with deadlines. Previous studies have shown that it is difficult to improve utilization above 20% without significantly increasing the failure rate of job execution. It is still unknown how to increase utilization while mai...
Article
This paper focuses on the problem of improving throughput of distributed query processing in an RDBMS-based data integration system. Although a buffer pool can be used in an RDBMS to cache disk pages in memory to reduce disk accesses, it cannot be used for data integration queries since its foundation, the memory-disk hierarchy, does not exist. The...
Article
This paper reviews the programming landscape for parallel and network computing systems, focusing on four styles of concurrent programming models, and example languages/libraries. The four styles correspond to four scales of the targeted systems. At the smallest coprocessor scale, Single Instruction Multiple Thread (SIMT) and Compute Unified Device...
Conference Paper
Message-based debugging facilities for Web or Grid Services are separated from an infrastructure of source level debugging and can work in a self-identifying and coexisting mode within a normal services container. In this paper, we discuss problems for services debugging and approaches we take. We present the operational model and context inspectio...
Article
The rendezvous is a type of distributed decision task that includes many well-known tasks such as set agreement, simplex agreement, and approximation agreement. An n-dimensional rendezvous task, n≥1, allows n+2 distinct input values, and each execution produces at most n+2 distinct output values. A rendezvous task is said to implement another if an in...
Article
As we enter the 21st century, a profound transformation is emerging in the field of computer science and technology; this is also true for the subfield of computer systems. The main characteristic of this transformation is the leap from man-computer symbiosis to a man-cyber-physical society (a tri-world of people, computers, and things). This raises...
Article
GSML is a programming language that has been designed for grid end-users to overcome the programming hurdle and the high learning curve associated with Grid infrastructures that are complex distributed computing systems. This paper defines its formal semantics in terms of a chemical programming language called HOCL. This translation of GSML program...
Article
Full-text available
A sustainable market-like computational grid has two characteristics: it must allow resource providers and resource consumers to make autonomous scheduling decisions, and both parties of providers and consumers must have sufficient incentives to stay and play in the market. In this paper, we formulate this intuition of optimizing incentives for bot...
Conference Paper
Multi-attribute range queries on top of P2P networks have attracted much attention. Such research has direct application in grid resource monitoring and discovery. In existing research, the overheads (number of hops and number of messages required) of query algorithms depend on both the size of range to be queried and the number of peers, and a hig...
Chapter
This paper presents research work conducted at the Chinese Academy of Sciences, on the Vega Grid technology and dynamic geometry technology, and how the two can integrate to provide a dynamic geometry education system based on grid technology. Such an approach could help solve the interconnect problem, the performance problem and the intellectual p...
Conference Paper
Full-text available
A long-term trend in computing platform innovation is the appearance, every 15 years or so, of a new class of platform that drastically reduces barriers and expands the user base. We have seen this trend several times in the computer’s 60-year history, with inventions like the mainframe, personal computer (PC), Internet, and Web. To explore opportunities brough...
Conference Paper
Full-text available
In this paper, we first introduce some issues that are encountered in building a service debugger and briefly describe our approach to addressing them. Next, we outline some debugging modes and components of a simple composite debugger. Then, we mainly describe its message-based front-end and back-end, which are a co-existing, self-identifying, and...
Article
The International Nucleotide Sequence Database Collaboration (INSDC) exchanges sequence data on a daily basis across its three member organizations in the USA, UK and Japan. This paper studies how this sequence database in MySQL can best take advantage of the increased transfer bandwidth of a Grid-optimized data communication protocol. Within the c...
Conference Paper
An asynchronous distributed system consists of a collection of processes interacting by accessing shared services or variables. Failure-tolerant computability for such systems is an important issue, but too little attention has been paid to the case where the services themselves can fail. Recently, it has been proved that the consensus problem cannot be (f+...
Conference Paper
The trend towards ISPs providing heterogeneous services concurrently and the low utilization of servers make it necessary to consolidate the computing of various services onto a single platform. In such a shared environment, meeting application-level QoS goals and ...
Conference Paper
Anycast routing is very useful for many applications such as resource discovery in delay tolerant networks (DTNs). In this paper, based on a new DTN model, we first analyze the anycast semantics for DTNs. Then we present a novel metric named EMDDA (expected multi-destination delay for anycast) and a corresponding routing algorithm for anycast rout...
Conference Paper
In this talk, the speaker will review the history and trends of high-performance computing from the users’ viewpoint. Evolutionary milestones in workload, usage modes, programming models and systems architectures will be identified. Essential challenges and bottlenecks will be analyzed. He will highlight the newly formed e-Nation strategy for China...
Article
With the rapid development of high-speed wide-area networks and powerful yet low-cost computational resources, Grid computing has emerged as an attractive computing paradigm. In typical Grid environments, there are two distinct parties, resource consumers ...
Article
With the rapid development of high-speed wide-area networks and powerful yet low-cost computational resources, Grid computing has emerged as an attractive computing paradigm. In typical Grid environments, there are two distinct parties, resource consumers and resource providers. Enabling an effective interaction between the two parties (i.e. schedu...
Conference Paper
Service-computing is the computing paradigm that utilizes services as building blocks for developing applications or solutions. Because simplicity of services, applications, and their interaction are critical for low cost and high productivity of service-computing, this paper tries to explore a proper bound of the simplicity. By proving that any Tu...
Conference Paper
Full-text available
This paper presents a language-based approach to service deployment. The language is called Abacus, which is a service-oriented programming language for grid applications. In Abacus, a service is abstracted as a basic language construct, and service deployment is expressed by a deployment statement. This approach allows an Abacus application to aut...
Conference Paper
In this paper, we propose four general queueing models based on input and server distributions, to analyze a special grid system, VEGA grid system version 1.1 (VEGA1.1). The mean queue lengths and mean waiting times of these models are deduced. The two classic applications, the computing-oriented application (blast computing) and online transaction...
Conference Paper
Full-text available
Grid and service computing technologies have been explored by enterprises to promote integration, sharing, and collaboration. However, quick response to business environment changes is still a challenging issue. For end users, developing, customizing, and reengineering applications remain a difficult and time-consuming task. Users still need to deal...
Conference Paper
Full-text available
The China National Grid project developed and deployed a suite of grid system software called CNGrid Software. This paper presents the features and implementation of the software suite from the viewpoints of grid system deployment, grid application developers, grid resource providers, grid system administrators, and the end users.
Conference Paper
For end users, building applications with current Grid programming paradigms remains a difficult and time-consuming task because it involves dealing with excessive low-level details of the provided APIs. We present a high-level application description language called Grid Service Markup Language (GSML) and its supporting development environment, to facilitate end...
Conference Paper
With the dramatic development of grid technologies, performance analysis and prediction of grid systems are increasingly significant for developing a variety of new grid technologies. The VEGA grid, a new grid infrastructure developed by the Institute of Computing Technology, CAS, views a grid as a distributed computer system. In this paper, we propose some...