About
173 Publications
29,454 Reads
3,651 Citations
Introduction
Additional affiliations
January 1996 - present
Publications (173)
As machine learning (ML) becomes the prominent technology for many emerging problems, dedicated ML computers are being developed at a variety of scales, from clouds to edge devices. However, the heterogeneous, parallel, and multilayer characteristics of conventional ML computers concentrate the cost of development on the software stack, namely, ML...
Due to the broad successes of deep learning, many CPU-centric artificial intelligence computing systems employ specialized devices such as GPUs, FPGAs, and ASICs, which can be collectively referred to as Deep Learning Processing Units (DLPUs), for processing computation-intensive deep learning tasks. The separation between the scalar control operations mapped on CPUs...
Neural network (NN) processors are specially designed to handle deep learning tasks by utilizing multilayer artificial NNs. They have been demonstrated to be useful in broad application fields such as image recognition, speech processing, machine translation, and scientific computing. Meanwhile, innovative self-aware techniques, whereby a system ca...
Machine learning techniques are pervasive tools for emerging commercial applications and many dedicated machine learning computers on different scales have been deployed in embedded devices, servers, and data centers. Currently, most machine learning computer architectures still focus on optimizing performance and energy efficiency instead of progr...
Edge computing is a continuum that includes the computing resources from cloud to things. Ecosystem of things (EoT) is a subsystem of the ecosystem of edge computing, which potentially contains trillions of devices of things and directly interacts with the physical world. This paper surveys the state of the art of EoT by focusing on the computing i...
Machine learning techniques are pervasive tools for emerging commercial applications and many dedicated machine learning computers on different scales have been deployed in embedded devices, servers, and data centers. Currently, most machine learning computer architectures still focus on optimizing performance and energy efficiency instead of progr...
Computing offloading is a key challenge of new rising computing paradigms of the Internet of Things (IoT) like edge computing, which shifts computations to data sources as near as possible to gain the benefits such as low latency and energy efficiency. However, the fragmentation problem of IoT devices results in a heterogeneous and disordered ecosy...
Mapping global shipping density, including vessel density and traffic density, is important to reveal the distribution of ships and traffic. The Automatic Identification System (AIS) is an automatic reporting system widely installed on ships initially for collision avoidance by reporting their kinematic and identity information continuously. An alg...
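The abstract is truncated before the algorithm itself, so the following is only a minimal sketch of the general grid-binning idea behind traffic density maps from AIS reports; the 0.1° cell size, the traffic_density helper, and the use of raw report counts are illustrative assumptions, not the paper's method.

```python
# Minimal sketch (not the paper's algorithm): bin AIS position reports onto a
# regular lat/lon grid to approximate traffic density. Cell size is assumed.
import numpy as np

def traffic_density(lats, lons, cell_deg=0.1):
    """Count AIS position reports per cell_deg x cell_deg grid cell."""
    lat_edges = np.arange(-90.0, 90.0 + cell_deg, cell_deg)
    lon_edges = np.arange(-180.0, 180.0 + cell_deg, cell_deg)
    counts, _, _ = np.histogram2d(lats, lons, bins=[lat_edges, lon_edges])
    return counts  # counts[i, j] = number of reports falling in grid cell (i, j)
```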
Machine Learning (ML) tasks are becoming pervasive in a broad range of applications, and in a broad range of systems (from embedded systems to data centers). As computer architectures evolve toward heterogeneous multi-cores composed of a mix of cores and hardware accelerators, designing hardware accelerators for ML techniques can simultaneously ach...
Many companies are deploying services largely based on machine-learning algorithms for sophisticated processing of large amounts of data, either for consumers or industry. The state-of-the-art and most popular such machine-learning algorithms are Convolutional and Deep Neural Networks (CNNs and DNNs), which are known to be computationally and memor...
Functional connectivity, a data-driven modelling of spontaneous fluctuations in activity in spatially segregated brain regions, has emerged as a promising approach to generate hypotheses and features for prediction. The most widely used method for inferring functional connectivity is full correlation, but it cannot differentiate direct and indirect...
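As context for the full-correlation limitation mentioned here, the sketch below contrasts full correlation with partial correlation derived from the precision (inverse covariance) matrix, which suppresses indirect effects; it is a plain unregularized illustration with an assumed data layout, not the estimator proposed in the paper.

```python
# Illustration only: full vs. partial correlation of regional time series.
# In practice a regularized precision estimate (e.g., graphical lasso) is needed.
import numpy as np

def full_and_partial_correlation(ts):
    """ts: (n_timepoints, n_regions) array of time series, one column per region."""
    full = np.corrcoef(ts, rowvar=False)             # full (marginal) correlation
    prec = np.linalg.inv(np.cov(ts, rowvar=False))   # precision matrix
    d = np.sqrt(np.diag(prec))
    partial = -prec / np.outer(d, d)                 # partial correlation coefficients
    np.fill_diagonal(partial, 1.0)
    return full, partial
```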
Data warehouse systems, like Apache Hive, have been widely used in the distributed computing field. However, current generation data warehouse systems have not fully embraced High Performance Computing (HPC) technologies even though the trend of converging Big Data and HPC is emerging. For example, in traditional HPC field, Message Passing Interfac...
Comparing serially acquired fMRI scans is a typical way to detect functional brain changes in different conditions. However, this approach introduces additional variation on physical and physiological conditions, which results in substantial noise. To improve sensitivity and accuracy of signal detection in such highly noisy fMRI data, potentially i...
As a fundamental task in computer architecture research, performance comparison has been continuously hampered by the variability of computer performance. In traditional performance comparisons, the impact of performance variability is usually ignored (i.e., the means of performance observations are compared regardless of the variability), or in th...
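To make the variability point concrete, here is a small hedged sketch of comparing two sets of repeated performance observations via a confidence interval on the difference of means, rather than comparing bare means; the normal approximation, the 95% level, and the toy numbers are assumptions, and this is not the paper's proposed comparison method.

```python
# Sketch: compare two configurations by a confidence interval on the difference
# of mean runtimes, instead of comparing the means alone.
import numpy as np

def mean_diff_ci(a, b, z=1.96):
    """95% normal-approximation CI for mean(a) - mean(b)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    diff = a.mean() - b.mean()
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return diff - z * se, diff + z * se

lo, hi = mean_diff_ci([10.2, 10.9, 11.4, 10.5], [10.8, 11.1, 10.6, 11.3])
# If the interval contains 0, the observed difference may be due to variability.
```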
Beacon node placement, node-to-node measurement, and target node positioning are the three key steps of a localization process. However, compared with the other two steps, beacon node placement still lacks a comprehensive, systematic study in the research literature. To fill this gap, we address the Beacon Node Placement (BNP) problem that deploys bea...
In many real-world applications, the information about an object can be obtained from multiple sources. The sources may provide different points of view based on their own origins. As a consequence, conflicting pieces of information are inevitable, which gives rise to a crucial problem: how to find the truth from these conflicts. Many truth-finding met...
Current digital currency schemes provide instantaneous exchange of precise commodities, where "precise" means a buyer can possibly verify the function of the commodity without error. However, imprecise commodities, e.g. statistical data, for which errors exist, are abundant in the digital world. Existing digital currency schemes do not offer a mechanism t...
Many companies are deploying services, either for consumers or industry, which are largely based on machine-learning algorithms for sophisticated processing of large amounts of data. The state-of-the-art and most popular such machine-learning algorithms are Convolutional and Deep Neural Networks (CNNs and DNNs), which are known to be both computati...
A ring topology is a common network-on-chip (NoC) solution in industry, but it is frequently criticized for poor scalability. In this paper, we present a novel type of multi-ring NoC called isolated multi-ring (IMR), which can support chip multi-processors (CMPs) with as many as 1,024 cores. In IMR, any pair of cores is connected via at least one is...
A cloud database usually refers to a database built on cloud computing technology. However, as far as we know, pre-existing cloud database solutions cannot integrate data from multiple heterogeneous source databases and only supply an isolated homogeneous database cluster. This paper presents a new implementation approach for cloud database:...
Architectural Design Space Exploration (DSE) is a notoriously difficult problem due to the exponentially large size of the design space and long simulation times. Previously, many studies proposed to formulate DSE as a regression problem which predicts architecture responses (e.g., time, power) of a given architectural configuration. Several of the...
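The sketch below illustrates the DSE-as-regression formulation in its simplest form: fit a surrogate model on a handful of simulated configurations and predict the response of unseen ones. The feature set, the toy numbers, and the choice of a random forest are assumptions, not the models studied in the paper.

```python
# Toy illustration of DSE as regression: a surrogate predicts responses
# (e.g., runtime) for configurations that were never simulated.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Rows: [issue_width, rob_size, l2_kb, freq_ghz]; targets: simulated runtime (s).
X_sim = np.array([[2,  64,  256, 2.0],
                  [4, 128,  512, 2.5],
                  [4, 192, 1024, 3.0],
                  [8, 256, 2048, 3.5]])
y_sim = np.array([12.1, 8.7, 7.9, 6.4])

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_sim, y_sim)
pred = model.predict(np.array([[8, 192, 1024, 3.0]]))  # predicted without simulating
```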
This contribution presents the application of Dempster-Shafer theory to the prediction of China's stock market. To be specific, we predicted the most promising industry for the next month on every trading day. This prediction can help investors to select stocks, but is rarely seen in the previous literature. Instead of predicting the fluctuation of the st...
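For reference, the combination rule at the core of Dempster-Shafer theory is the standard one below; how the paper constructs mass functions from market evidence is not visible in the truncated abstract and is not reproduced here.

```latex
% Dempster's rule of combination for two mass functions m_1, m_2 over a frame \Theta:
(m_1 \oplus m_2)(A) = \frac{1}{1-K}\sum_{B \cap C = A} m_1(B)\, m_2(C), \quad A \neq \emptyset,
\qquad K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C).
```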
Amdahl's second law has been seen as a useful guideline for designing and evaluating balanced computer systems for decades. This law has mainly been used for hardware systems and peak capacities. This paper utilizes Amdahl's second law from a new angle, i.e., evaluating the influence on system performance and balance of the application framework s...
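As background, Amdahl's second law (his I/O balance rule of thumb) is commonly stated as below; the exact form the paper applies to application-framework stacks is not visible in the truncated abstract.

```latex
% Amdahl's balance rule of thumb (one common statement): a balanced system provides
% roughly one bit per second of I/O bandwidth per instruction per second of compute.
\frac{\text{I/O bandwidth [bit/s]}}{\text{compute rate [instructions/s]}} \approx 1
```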
Architectural Design Space Exploration (DSE) is a notoriously difficult problem due to the exponentially large size of the design space and long simulation times. Previously, many studies proposed to formulate DSE as a regression problem which predicts architecture responses (e.g., time, power) of a given architectural configuration. Several of the...
MPI has been widely used in High Performance Computing. In contrast, such efficient communication support is lacking in the field of Big Data Computing, where communication is realized by time-consuming techniques such as HTTP/RPC. This paper takes a step in bridging these two fields by extending MPI to support Hadoop-like Big Data Computing jobs,...
Signalling network inference is a central problem in system biology. Previous studies investigate this problem by independently inferring local signalling networks and then linking them together via crosstalk. Since a cellular signalling system is in fact indivisible, this reductionistic approach may have an impact on the accuracy of the inference...
Apache Hadoop and Spark are gaining prominence in Big Data processing and analytics, and both are widely deployed at Internet companies. On the other hand, high-performance data analysis requirements are leading academic and industrial communities to adopt state-of-the-art HPC technologies to solve Big Data problems. Recently, we have prop...
Given a directed graph G and a threshold L(r) for each node r, the rule of deterministic threshold cascading is that a node r fails if and only if it has at least L(r) failed in-neighbors. The cascading failure minimization problem is to find at most k edges to delete, such that the number of failed nodes is minimized. We prove an n^{1−ϵ}...
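A minimal simulation of the deterministic threshold cascading rule defined here is sketched below; the edge-deletion optimization and the hardness result are not reproduced, and the small example graph is made up for illustration.

```python
# Deterministic threshold cascade: node r fails iff at least L(r) in-neighbors have failed.
def cascade(in_neighbors, L, seeds):
    """in_neighbors: node -> list of in-neighbors; L: node -> threshold;
    seeds: initially failed nodes. Returns all failed nodes at the fixpoint."""
    failed = set(seeds)
    changed = True
    while changed:
        changed = False
        for r, preds in in_neighbors.items():
            if r not in failed and sum(p in failed for p in preds) >= L[r]:
                failed.add(r)
                changed = True
    return failed

# Toy example: b fails once a fails; c needs two failed in-neighbors.
graph = {"a": [], "b": ["a"], "c": ["a", "b"]}
print(cascade(graph, {"a": 1, "b": 1, "c": 2}, {"a"}))  # {'a', 'b', 'c'}
```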
As a main subfield of cloud computing applications, internet services require large-scale data computing. Their workloads can be divided into two classes: customer-facing query-processing interactive tasks that serve hundreds of millions of users within a short response time and backend data analysis batch tasks that involve petabytes of data. Hado...
In the cloud age, heterogeneous application modes on large-scale infrastructures bring about the challenges on resource utilization and manageability to data centers. Many resource and runtime management systems are developed or evolved to address these challenges and relevant problems from different perspectives. This paper tries to identify the m...
The ability of discovering neighboring nodes, namely neighbor discovery, is essential for the self-organization of wireless ad hoc networks. In this paper, we first propose a history-aware adaptive backoff algorithm for neighbor discovery using collision detection and feedback mechanisms. Given successful discovery feedbacks, undiscovered nodes can...
As a main subfield of cloud computing applications, internet services require large-scale data computing. Their workloads can be divided into two classes: customer-facing query-processing interactive tasks that serve hundreds of millions of users within a short response time and backend data analysis batch tasks that involve petabytes of data. Hado...
Checkpointing is the predominant storage driver in today's petascale supercomputers and is expected to remain as such in tomorrow's exascale supercomputers. Users typically prefer to checkpoint into a shared file yet parallel file systems often perform ...
In this paper, we propose an online search system based on Key-Value Store which aims to provide real-time k-NN (k-Nearest Neighbor) search in large-scale high-dimensional vector spaces. Through an improved indexing method based on KD-tree, the vector space is divided into a number of fixed-size heaps, only vectors of a specified heap need to do k-...
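As a baseline for the indexing idea described here, the sketch below runs a plain KD-tree k-NN query with SciPy; the paper's fixed-size heap partitioning over a key-value store is a distributed layer on top of this and is not reproduced, and the data dimensions are assumptions.

```python
# Baseline k-NN with a KD-tree on a single node; the paper's system shards the
# vector space into fixed-size heaps over a key-value store, which is not shown here.
import numpy as np
from scipy.spatial import cKDTree

vectors = np.random.rand(100_000, 16)    # indexed vectors (assumed size and dimension)
tree = cKDTree(vectors)
query = np.random.rand(16)
dist, idx = tree.query(query, k=5)        # distances and indices of the 5 nearest vectors
```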
With the advent of Internet services, big data and cloud computing, high-throughput computing has generated much research interest, especially on high-throughput cloud servers. However, three basic questions are still not satisfactorily answered: (1) What are the basic metrics (what throughput and high-throughput of what)? (2) What are the main fac...
The first challenge to address is the sobering fact that IT market growth appears to have reached a point of stagnation. The IT market size is measured by the total expenditure on computer and network hardware, software, and services. The second challenge is that inertial and incremental technology progress faces limitations. The International Tech...
Multi-dimensional range queries are fundamental requirements in large scale Internet applications using Distributed Ordered Tables. Apache Cassandra is a Distributed Ordered Table when it employs order-preserving hashing as data partitioner. Cassandra supports multi-dimensional range queries with poor performance and with a limitation that there mu...
This paper formulates and studies the problem of accurately acquiring energy consumption information of physical objects influenced by human behavior into the cyber space. We formulate this input-sensing problem within the ternary computing framework, which allows real-world problem instances to be studied, with different constraints on human effor...
The Message Passing Interface (MPI) standard and its implementations (such as MPICH and OpenMPI) have been widely used in the high-performance computing area to provide an efficient communication infrastructure. This paper investigates whether MPI can be adapted to the data intensive computing area to substantially speed up Hadoop and MapReduce app...
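To make the contrast with HTTP/RPC concrete, here is a minimal mpi4py point-to-point exchange of intermediate key-value pairs, i.e., the kind of primitive such work builds on for Hadoop-style shuffles; the script name and the pair contents are illustrative, and this is not the paper's extended interface.

```python
# Plain MPI point-to-point transfer of intermediate key-value pairs (mpi4py).
# Run with e.g.: mpiexec -n 2 python shuffle_sketch.py   (file name is illustrative)
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    # "map" side: send intermediate pairs directly over MPI instead of HTTP/RPC
    comm.send([("word", 1), ("count", 2)], dest=1, tag=0)
elif rank == 1:
    pairs = comm.recv(source=0, tag=0)   # "reduce" side receives the pairs
    print(pairs)
```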
In this paper, we study in-vehicle digital network systems. We propose a switch-based architecture for in-vehicle networks and focus on the critical related issue: routing schemes. We note that in-vehicle networks are fundamentally different from many other switch-based networks (e.g., the Internet). This is because within an in-vehicle network, me...
In large organizations or IDCs, different departments always occupy and maintain dedicated resources to satisfy their own or their customers' heterogeneous application loads. This situation easily makes infrastructure management repetitive and inefficient. Even worse, it is difficult to share the resources owned by different departments even w...
MapReduce-based data warehouse systems are playing important roles of supporting big data analytics to understand quickly the dynamics of user behavior trends and their needs in typical Web service providers and social network sites (e.g., Facebook). In such a system, the data placement structure is a critical factor that can affect the warehouse p...
Ever-increasing design complexity and advances of technology impose great challenges on the design of modern microprocessors. One such challenge is to determine promising microprocessor configurations to meet specific design constraints, which is called Design Space Exploration (DSE). In the computer architecture community, supervised learning tech...
Currently, major Internet services are designed for consumer/personal usage and depend tightly on the service provider, which shows a trend toward monopolization. In this paper, we propose a new Internet service model, PHCMM, based on a systematic and thorough summary of the content lifecycle. Equipped with PHCMM, we identify and analyze existing Internet s...
Massive-scale distributed databases like Google's BigTable and Yahoo!'s PNUTS can be modeled as a Distributed Ordered Table, or DOT, which partitions data into regions and supports range queries on the key. Multi-dimensional range queries on DOTs are a fundamental requirement; however, none of the existing schemes work well when considering three critical issues:...
Wei Li, Deng Li, Shuhui Yang, [...], Wei Zhao
In this paper, we propose and analyze a new GPS positioning algorithm. Our algorithm uses the direct linearization technique to reduce the computation time overhead. We invoke the general least squares method in order to achieve optimality in the situation when the trilateration system of equations becomes over-determined. We systematically evaluat...
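The sketch below shows the generic direct-linearization-plus-least-squares idea the abstract describes, in two dimensions with range measurements to known anchors; satellite geometry, clock bias, and weighting are omitted, so it illustrates the technique rather than the paper's algorithm.

```python
# Trilateration by direct linearization: subtracting the first range equation from
# the rest yields a linear system, solved by least squares when over-determined.
import numpy as np

def solve_position(anchors, ranges):
    """anchors: (m, d) known coordinates; ranges: (m,) measured distances."""
    p0, r0 = anchors[0], ranges[0]
    A = 2.0 * (p0 - anchors[1:])
    b = ranges[1:] ** 2 - r0 ** 2 - np.sum(anchors[1:] ** 2, axis=1) + np.dot(p0, p0)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)  # least squares handles over-determination
    return x

anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
true_pos = np.array([3.0, 4.0])
ranges = np.linalg.norm(anchors - true_pos, axis=1)
print(solve_position(anchors, ranges))  # approximately [3. 4.]
```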
Server utilization is typically low (10%-30%) in today’s datacenters (or clouds), especially when executing computational jobs with deadlines. Previous studies have shown that it is difficult to improve utilization above 20% without significantly increasing the failure rate of job execution. It is still unknown how to increase utilization while mai...
This paper focuses on the problem of improving throughput of distributed query processing in an RDBMS-based data integration system. Although a buffer pool can be used in an RDBMS to cache disk pages in memory to reduce disk accesses, it cannot be used for data integration queries since its foundation, the memory-disk hierarchy, does not exist. The...
This paper reviews the programming landscape for parallel and network computing systems, focusing on four styles of concurrent programming models and example languages/libraries. The four styles correspond to four scales of the targeted systems. At the smallest coprocessor scale, Single Instruction Multiple Thread (SIMT) and Compute Unified Device...
Message-based debugging facilities for Web or Grid Services are separate from the infrastructure of source-level debugging and can work in a self-identifying and coexisting mode within a normal services container. In this paper, we discuss problems of services debugging and the approaches we take. We present the operational model and context inspectio...
Rendezvous is a type of distributed decision task that includes many well-known tasks such as set agreement, simplex agreement, and approximation agreement. An n-dimensional rendezvous task, n≥1, allows n+2 distinct input values, and each execution produces at most n+2 distinct output values. A rendezvous task is said to implement another if an in...
As we enter the 21st century, a profound transformation is emerging in the field of computer science and technology; this is also true for the subfield of computer systems. The main characteristic of this transformation is the leap from man-computer symbiosis to man-cyber-physical society (a tri-world of people, computers, and things). This raises...
GSML is a programming language that has been designed for grid end-users to overcome the programming hurdle and the high learning curve associated with Grid infrastructures that are complex distributed computing systems. This paper defines its formal semantics in terms of a chemical programming language called HOCL. This translation of GSML program...
A sustainable market-like computational grid has two characteristics: it must allow resource providers and resource consumers to make autonomous scheduling decisions, and both parties of providers and consumers must have sufficient incentives to stay and play in the market. In this paper, we formulate this intuition of optimizing incentives for bot...
Multi-attribute range queries on top of P2P networks have attracted much attention. Such research has direct application in grid resource monitoring and discovery. In existing research, the overheads (number of hops and number of messages required) of query algorithms depend on both the size of range to be queried and the number of peers, and a hig...
This paper presents research work conducted at the Chinese Academy of Sciences, on the Vega Grid technology and dynamic geometry technology, and how the two can integrate to provide a dynamic geometry education system based on grid technology. Such an approach could help solve the interconnect problem, the performance problem and the intellectual p...
A long-term trend in computing platform innovation is the appearance of a new class of platform every 15 years or so that drastically reduces barriers and expands the user base. We have seen this trend several times in the computer's 60-year history, with inventions like the mainframe, personal computer (PC), Internet, and Web. To explore opportunities brough...
In this paper, we first introduce some issues that are encountered in building a service debugger and briefly describe our approach to addressing them. Next, we outline some debugging modes and components of a simple composite debugger. Then, we mainly describe its message-based front-end and back-end, which are a co-existing, self-identifying, and...
The International Nucleotide Sequence Database Collaboration (INSDC) exchanges sequence data on a daily basis across its three member organizations in the USA, UK and Japan. This paper studies how this sequence database in MySQL can best take advantage of the increased transfer bandwidth of a Grid-optimized data communication protocol. Within the c...
An asynchronous distributed system consists of a collection of processes interacting via accessing shared services or variables. Failure-tolerant computability for such systems is an important issue, but too little attention has been paid to the case where the services themselves can fail. Recently, it has been proved that the consensus problem cannot be (f+...
The trend of ISPs concurrently providing heterogeneous services, together with the low utilization of servers, makes it necessary to consolidate various services onto a single computing platform. In such a shared environment, meeting application-level QoS goals and ...
Anycast routing is very useful for many applications such as resource discovery in delay-tolerant networks (DTNs). In this paper, based on a new DTN model, we first analyze the anycast semantics for DTNs. Then we present a novel metric named EMDDA (expected multi-destination delay for anycast) and a corresponding routing algorithm for anycast rout...
In this talk, the speaker will review the history and trends of high-performance computing from the users' viewpoint. Evolutionary milestones in workloads, usage modes, programming models, and system architectures will be identified. Essential challenges and bottlenecks will be analyzed. He will highlight the newly formed e-Nation strategy for China...
With the rapid development of high-speed wide-area networks and powerful yet low-cost computational resources, Grid computing has emerged as an attractive computing paradigm. In typical Grid environments, there are two distinct parties, resource consumers ...
With the rapid development of high-speed wide-area networks and powerful yet low-cost computational resources, Grid computing has emerged as an attractive computing paradigm. In typical Grid environments, there are two distinct parties, resource consumers and resource providers. Enabling an effective interaction between the two parties (i.e. schedu...
Service computing is the computing paradigm that utilizes services as building blocks for developing applications or solutions. Because the simplicity of services, applications, and their interactions is critical for the low cost and high productivity of service computing, this paper tries to explore a proper bound on that simplicity. By proving that any Tu...
This paper presents a language-based approach to service deployment. The language is called Abacus, which is a service-oriented programming language for grid applications. In Abacus, a service is abstracted as a basic language construct, and service deployment is expressed by a deployment statement. This approach allows an Abacus application to aut...
In this paper, we propose four general queueing models based on input and server distributions, to analyze a special grid system, VEGA grid system version 1.1 (VEGA1.1). The mean queue lengths and mean waiting times of these models are deduced. The two classic applications, the computing-oriented application (blast computing) and online transaction...
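For orientation, the classical M/M/1 quantities that such mean-queue-length and mean-waiting-time analyses build on are given below; the paper's four models generalize the input and service distributions, so these formulas are only the textbook special case, not the paper's results.

```latex
% M/M/1 queue with arrival rate \lambda and service rate \mu (\lambda < \mu):
\rho = \frac{\lambda}{\mu}, \qquad
L = \frac{\rho}{1-\rho} \ \text{(mean number in system)}, \qquad
W = \frac{1}{\mu - \lambda} \ \text{(mean time in system)}.
```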
Grid and service computing technologies have been explored by enterprises to promote integration, sharing, and collaboration. However, quick response to business environment changes is still a challenging issue. For end users, developing, customizing, and reengineering applications remains a difficult and time-consuming task. Users still need to deal...
The China National Grid project developed and deployed a suite of grid system software called CNGrid Software. This paper presents the features and implementation of the software suite from the viewpoints of grid system deployment, grid application developers, grid resource providers, grid system administrators, and the end users.
For end users, building applications with current Grid programming paradigms still remains a difficult and time-consuming task, as they must deal with excessive low-level details of the provided APIs. We present a high-level application description language called Grid Service Markup Language (GSML) and its supporting development environment, to facilitate end...
With the dramatic development of grid technologies, performance analysis and prediction of grid systems is increasingly significant for developing a variety of new grid technologies. The VEGA grid, a new grid infrastructure developed by the Institute of Computing Technology, CAS, views a grid as a distributed computer system. In this paper, we propose some...