Jie Tao
Karlsruhe Institute of Technology | KIT · Steinbuch Centre for Computing

PhD

About

134 Publications
30,076 Reads
2,929 Citations
Citations since 2017: 1,077 (3 research items)
[Chart: citations per year, 2017–2023]
Additional affiliations
October 2003 - present
Karlsruhe Institute of Technology
Position
  • Senior researcher, project leader
Description
  • Research & development, project planning, project proposal coordination, PhD/Master's supervision, software development in: Cloud computing, Big Data, Parallel and distributed systems, HPC, Parallel programming models, Performance tools, Simulation.
April 1999 - September 2003
Technische Universität München
Position
  • Doctoral researcher, research associate
July 1989 - March 1999
Jilin University
Position
  • Lecturer, associate professor
Education
April 1999 - October 2002
Technische Universität München
Field of study
  • Computer science, PhD
September 1982 - July 1989
Jilin University
Field of study
  • Computer science, Bachelor & Master

Publications (134)
Article
Full-text available
Resource sharing can gain economies of scale and increase utilization of cloud infrastructure, a critical challenge of which is how to design efficient resource sharing solutions among self-interested cloud providers. Cloud federation can realize resource sharing, but the existing methods of forming federation need complex computation to guarantee...
Article
The selection of cloud resources is important to users, as it directly influences their utility. To improve the users' utility of purchased resources, we present data-driven cloud resource procurement (CRP) auctions which, based on historical bidding information, can help the resource buyer make an optimized selection of cloud resources. Fir...
Article
Full-text available
Distributed data storage has received more attention due to its advantages in reliability, availability and scalability, and it brings both opportunities and challenges for distributed data storage transaction. The traditional transaction system of storage resources, which generally runs in a centralized mode, results in high cost, vendor lock-in a...
Article
During the last decade, cloud technology has presented considerable opportunities for high-performance computing (HPC). In addition, technical computing data centers have been able to maximize their return on investment (ROI). HPC system managers can leverage the benefits of a cloud model for their traditional HPC environments to improve scalability,...
Article
Today, we are observing a transition of science paradigms from computational science to data-intensive science. With the exponential increase of input and intermediate data, more applications are developed using the MapReduce programming model, which is regarded as an appropriate programming model for analysing large data sets. A MapReduce fram...
Article
Internet of Things (IoT), a part of Future Internet, comprises many billions of Internet connected Objects (ICOs) or ‘things’ where things can sense, communicate, compute and potentially actuate as well as have intelligence, multi-modal interfaces, physical/ virtual identities and attributes. The IoT vision has recently given rise to emerging IoT b...
Article
Cloud computing provided as a utility has emerged in recent years. Resource allocation mechanisms play a critical role toward the success of cloud computing. Maximization of social welfare is the reasonable objective of private clouds. Cloud resources are expiring goods, and users come to cloud randomly. However, because of the dynamic behavior of...
Article
Full-text available
Dynamic VM allocation plays an important role in resource allocation of cloud computing. In general, a cloud provider needs both to maximize the efficiency of resources and to improve the satisfaction of in-house users simultaneously. However, industrial experience has often shown only maximizing the efficiency of resources and providing poor or li...
Article
Monitoring of the system performance in highly distributed computing environments is a wide research area. In cloud and grid computing, it is usually restricted to the utilization and reliability of the resources. However, in today’s Computational Grids (CGs) and Clouds (CCs), the end users may define the special personal requirements and preferenc...
Article
With the increasing complexity of both data structures and computer architectures, the performance of applications needs fine tuning in order to achieve the expected execution time. Performance tuning is traditionally based on the analysis of performance data. The analysis results may not be accurate, depending on the quality of the data an...
Article
The fact that cloud computing is widely accepted results in an increasing number of cloud providers. Customers are now burdened with the task of deciding which provider to choose for serving their requirements. This work developed a broker-based framework capable of automatically selecting cloud services based on user-defined requirement parameters...
Conference Paper
Full-text available
A Partitioned Global Address Space (PGAS) approach treats a distributed system as if the memory were shared on a global level. Given such a global view on memory, the user may program applications very much like shared memory systems. This greatly simplifies the tasks of developing parallel applications, because no explicit communication has to be...
Article
Computing Clouds offer a new way of using IT facilities including the hardware, storage, applications and networks. The huge resource pool on the Cloud forms an appropriate platform for running applications with both computing and data intensity, like the DNA sequencing workflows. This paper studies the topic of running scientific workflows on mult...
Article
Full-text available
Scheduling virtual machines is a major research topic for cloud computing, because it directly influences the performance, the operation cost and the quality of services. A large cloud center is normally equipped with several hundred thousand physical machines. The mission of the scheduler is to select the best one to host a virtual machine. This i...
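To give a concrete sense of the placement problem sketched in this abstract, here is a minimal, hypothetical best-fit selection loop in C++; the struct fields and the slack metric are invented for illustration and do not reproduce the scheduler proposed in the paper.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Naive best-fit VM placement: scan every physical machine and pick the one
// that leaves the least spare capacity after hosting the VM. With several
// hundred thousand machines, this per-request linear scan is exactly the
// cost a real cloud scheduler has to avoid or approximate.
// (Illustrative sketch only; fields and metric are assumptions.)
struct Machine { double free_cpu; double free_mem; };
struct VmRequest { double cpu; double mem; };

int best_fit(const std::vector<Machine>& pms, const VmRequest& vm) {
    int best = -1;
    double best_slack = std::numeric_limits<double>::max();
    for (std::size_t i = 0; i < pms.size(); ++i) {
        if (pms[i].free_cpu < vm.cpu || pms[i].free_mem < vm.mem)
            continue;                       // machine cannot host the VM
        double slack = (pms[i].free_cpu - vm.cpu) + (pms[i].free_mem - vm.mem);
        if (slack < best_slack) {
            best_slack = slack;
            best = static_cast<int>(i);
        }
    }
    return best;                            // -1 if no feasible machine exists
}
```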
Conference Paper
Full-text available
DASH is a realization of the PGAS (partitioned global address space) model in the form of a C++ template library. Operator overloading is used to provide global-view PGAS semantics without the need for a custom PGAS (pre-)compiler. The DASH library is implemented on top of our runtime system DART, which provides an abstraction layer on top of exist...
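As an illustration of how operator overloading can give a global view on distributed memory, the sketch below builds a toy global-view array directly on MPI one-sided communication. The class and method names are invented for this illustration; this is not the actual DASH/DART API.

```cpp
#include <mpi.h>
#include <cstdio>

// Toy global-view array: a[i] works on every rank, regardless of which
// rank owns element i. Invented for illustration; not the DASH/DART API.
class GlobalArray {
public:
    explicit GlobalArray(int elems_per_rank) : local_n_(elems_per_rank) {
        MPI_Comm_rank(MPI_COMM_WORLD, &rank_);
        MPI_Comm_size(MPI_COMM_WORLD, &size_);
        // Each rank exposes its local block through an MPI window.
        MPI_Win_allocate(local_n_ * sizeof(double), sizeof(double),
                         MPI_INFO_NULL, MPI_COMM_WORLD, &local_, &win_);
        MPI_Win_lock_all(0, win_);
    }
    ~GlobalArray() { MPI_Win_unlock_all(win_); MPI_Win_free(&win_); }

    // Proxy returned by operator[], so a[i] can be read or assigned.
    class Ref {
    public:
        Ref(GlobalArray& a, long i) : a_(a), i_(i) {}
        Ref& operator=(double v) {          // remote (or local) write
            MPI_Put(&v, 1, MPI_DOUBLE, a_.owner(i_), a_.offset(i_),
                    1, MPI_DOUBLE, a_.win_);
            MPI_Win_flush(a_.owner(i_), a_.win_);
            return *this;
        }
        operator double() const {           // remote (or local) read
            double v;
            MPI_Get(&v, 1, MPI_DOUBLE, a_.owner(i_), a_.offset(i_),
                    1, MPI_DOUBLE, a_.win_);
            MPI_Win_flush(a_.owner(i_), a_.win_);
            return v;
        }
    private:
        GlobalArray& a_;
        long i_;
    };

    Ref operator[](long global_index) { return Ref(*this, global_index); }
    long size() const { return static_cast<long>(local_n_) * size_; }

private:
    int owner(long i) const { return static_cast<int>(i / local_n_); }
    MPI_Aint offset(long i) const { return i % local_n_; }

    int local_n_, rank_, size_;
    double* local_;
    MPI_Win win_;
};

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    GlobalArray a(4);                        // 4 elements per rank
    int rank; MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)                           // rank 0 writes across the whole array
        for (long i = 0; i < a.size(); ++i) a[i] = 10.0 * i;
    MPI_Barrier(MPI_COMM_WORLD);
    std::printf("rank %d sees a[last] = %.1f\n", rank,
                static_cast<double>(a[a.size() - 1]));
    MPI_Finalize();
}
```

The point of the proxy object is that the user sees plain array syntax, while the translation of a global index to an owner rank and local offset happens behind the overloaded operators, so no custom PGAS compiler is needed.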
Article
MapReduce is regarded as an adequate programming model for large-scale data-intensive applications. The Hadoop framework is a well-known MapReduce implementation that runs the MapReduce tasks on a cluster system. G-Hadoop is an extension of the Hadoop MapReduce framework with the functionality of allowing the MapReduce tasks to run on multiple clus...
Conference Paper
Full-text available
Task scheduling and resource allocations are the key issues for computational grids. Distributed resources usually work at different autonomous domains with their own access and security policies that impact successful job executions across the domain boundaries. In this paper, we propose an Artificial Neural Network (ANN) approach for supporting t...
Conference Paper
Large-scale applications have emerged as an important class of applications in distributed computing. Today, the economic and technical benefits offered by Cloud computing technology encourage many users to migrate their applications to the Cloud. On the other hand, the variety of the existing Clouds requires them to make decisions about which provi...
Article
Cloud Computing introduces a novel computing paradigm that allows the users to run their applications on a customized environment using on-demand resources. This novel computing concept is enabled by several technologies including the Web, virtualization, distributed file systems as well as parallel programming models. For parallel computing on the...
Article
Virtualisation technology is widely used today in various research fields, including high performance computing, grid computing, cloud computing as well as server-client systems. Virtualisation introduces advantages such as on-demand customised resource provision, easy management and support for multiple Operating Systems, etc. Howeve...
Conference Paper
Full-text available
Virtualization is one of the key technologies that enable Cloud Computing, a novel computing paradigm aiming at provisioning on-demand computing capacities as services. With the special features of self-service and pay-as-you-use, Cloud Computing is attracting not only personal users but also small and middle enterprises. By running applications on...
Article
The Software as a Service (SaaS) methodology is a key paradigm of Cloud computing. In this paper, we focus on an interesting topic—to dynamically host services on existing production Grid infrastructures. In general, production Grids normally employ a Job-Submission-Execution (JSE) model with rigid access interfaces. In this paper, we implement the...
Conference Paper
Full-text available
The data sets produced in our daily life are getting larger and larger. How to manage and analyze such big data is currently a grand challenge for scientists in various research fields. MapReduce is regarded as an appropriate programming model for processing such big data. However, the users or developers still need to efficiently program appropriat...
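For readers unfamiliar with the programming model named here, the two user-supplied functions can be sketched as a single-process toy word count; this is an illustration of the map/shuffle/reduce idea only, not the Hadoop or G-Hadoop API.

```cpp
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Toy word count illustrating the two user-supplied MapReduce functions.
// In a real framework the emitted pairs would be partitioned, shuffled and
// reduced across many machines; here everything runs in one process.

// map: one input record -> list of (key, value) pairs
std::vector<std::pair<std::string, int>> map_fn(const std::string& line) {
    std::vector<std::pair<std::string, int>> out;
    std::istringstream words(line);
    std::string w;
    while (words >> w) out.emplace_back(w, 1);
    return out;
}

// reduce: one key with all its values -> aggregated value
int reduce_fn(const std::string&, const std::vector<int>& counts) {
    int total = 0;
    for (int c : counts) total += c;
    return total;
}

int main() {
    std::vector<std::string> input = {"big data on the cloud",
                                      "big data with mapreduce"};
    std::map<std::string, std::vector<int>> shuffled;   // shuffle phase
    for (const auto& line : input)
        for (const auto& kv : map_fn(line))
            shuffled[kv.first].push_back(kv.second);
    for (const auto& kv : shuffled)
        std::cout << kv.first << ": " << reduce_fn(kv.first, kv.second) << "\n";
}
```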
Conference Paper
Performance tuning is a common topic in the research domain of High Performance Computing. Currently, various tools have been developed to help programmers understand the runtime execution behavior of their applications. It is clear that such tools are also required for performance analysis on virtual machines, where applications, together with their...
Article
Recently, the computational requirements for large-scale data-intensive analysis of scientific data have grown significantly. In High Energy Physics (HEP) for example, the Large Hadron Collider (LHC) produced 13 petabytes of data in 2010. This huge amount of data is processed on more than 140 computing centers distributed across 34 countries. The M...
Conference Paper
Computational science workflows have been successfully run on traditional HPC systems like clusters and Grids for many years. Today, users are interested to execute their workflow applications in the Cloud to exploit the economic and technical benefits of this new emerging technology. The deployment and management of workflows over the current exis...
Article
Recently, the computational requirements for large-scale data-intensive analysis of scientific data have grown significantly. In High Energy Physics (HEP) for example, the Large Hadron Collider (LHC) produced 13 petabytes of data in 2010. This huge amount of data is processed on more than 140 computing centers distributed across 34 countries. The...
Conference Paper
Remote Sensing (RS) data processing is characterized by massive remote sensing images and an increasing number of algorithms of higher complexity. Parallel programming for data-intensive applications like massive remote sensing image processing on parallel systems is especially non-trivial and challenging. We propose a C++ template mechanism e...
Conference Paper
Full-text available
The lack of common standards in the fast emerging Cloud computing market over the last years resulted in “vendor lock-in” and interoperability issues across heterogeneous Cloud platforms. Therefore, the Cloud user now faces the challenging problem of selecting the Cloud provider that fits his needs. A new promising research approach is the use...
Article
Using Virtual Machines (VMs) as a computing resource within a Service Oriented Architecture (SOA) creates a variety of new issues and challenges. Traditionally, parallel task scheduling algorithms focus only on handling CPU resources, but with the use of a VM there are many more resource properties to monitor and manage. The objective of this paper is t...
Conference Paper
The RIVER architecture is a run-time configurable and programmable fabric for parallel stream processing on FPGAs. RIVER's memory architecture has been designed to support non-trivial data flows efficiently and in real-time. The individual data processing cores are called Dynamic Streaming Engines (DSE). Our cloud computing supported design flow ge...
Conference Paper
Cloud Computing is a novel computing paradigm. It provides on-demand computing power in a simple way: users access the resources via the Internet with a friendly interface based on Web services. Grid Computing is another computing paradigm that has been established for more than a dozen years. In contrast to Cloud Computing, Grid Computing a...
Conference Paper
We present a scalable run-time configurable and programmable signal processing architecture for real-time applications which covers a wide performance spectrum. Our approach goes beyond conventional special purpose signal processing engines. Scalability has multiple dimensions: on core- and network-level. We base our novel architecture on programma...
Conference Paper
Full-text available
The fast emerging Cloud computing market over the last years resulted in a variety of heterogeneous and less interoperable Cloud infrastructures. This leads to a challenging and urgent problem for Cloud users when selecting their best fitting Cloud provider and hence it ties them to a particular provider. A new growing research paradigm, which envi...
Article
Cloud computing has promoted the widespread use of virtualized machines. A question arises: How does virtualization influence the performance of running applications? The answer must be a common interest of application developers and users. This paper describes the results of our performance evaluation on a virtualized multicore machine. We teste...
Article
Full-text available
This short paper introduces the workshop on Tools for program development and analysis in computational science, a special section of the conference ICCS 2011. It describes the goal of the workshop, followed by a brief introduction of the accepted papers.
Article
Full-text available
Cloud computing introduces a novel computing paradigm that provisions on-demand computational capacity as a service. Increasing numbers of users are migrating their applications to the computing Clouds to remove or reduce the costs on resource investment and management. However, individual Cloud platforms, either private or public, provide their ow...
Article
Full-text available
This paper presents a methodology for analyzing communication of multi-threaded applications. Previous work relies on more or less accurate architectural models. Our measurement methodology has been designed to be completely architecture independent, since we want architects to have an undistorted view of the communication behavior. One part of our...
Article
Full-text available
Reducing energy consumption and building a sustainable computing infrastructure has become a major goal of the high performance computing community. A number of research projects have been carried out in the field of energy-aware high performance computing. This paper is devoted to categorizing energy-aware computing methods for high-end computing infras...
Article
Grid computing uses a job submission model that requires the users to perform a set of interactive operations for executing an application on the Grid. Grid users are therefore burdened with the tasks of understanding the basic concept of Grid computing and the details of job management. Cloud computing, on the other hand, applies a utility model t...
Article
Cloud computing has become an innovative computing paradigm, which aims to provide reliable, customized and QoS-guaranteed computing infrastructures for users. This paper presents our early experience of Cloud computing based on the Cumulus project for compute centers. In this paper, we give the Cloud computing definition and Cloud computing funct...
Conference Paper
Full-text available
Virtualization technology has been applied to a variety of areas including server consolidation, High Performance Computing, as well as Grid and Cloud computing. Due to the fact that applications do not run directly on the hardware of a host machine, virtualization generally causes a performance loss for both sequential and parallel applications....
Conference Paper
It has been widely known that various benefits can be achieved by reducing energy consumption for high end computing. This paper aims to develop power aware scheduling heuristics for parallel tasks in a cluster with the DVFS technique. In this paper, formal models are presented for precedence-constrained parallel tasks, DVFS enabled clusters, and e...
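For context, power-aware heuristics of this kind typically start from the standard CMOS dynamic power model; a minimal sketch of that model follows (the paper's own formal model may differ in detail).

```latex
% Standard dynamic power/energy model commonly assumed in DVFS scheduling;
% the paper's exact formulation may differ.
P_{\mathrm{dyn}} \approx C_{\mathrm{eff}}\, V^{2} f,
\qquad V \propto f \;\Rightarrow\; P_{\mathrm{dyn}} \propto f^{3}.
% A task of w cycles run at frequency f takes t = w/f, hence
E(f) = P_{\mathrm{dyn}} \cdot t \;\propto\; f^{3}\cdot\frac{w}{f} \;=\; w\, f^{2},
% so lowering f saves energy roughly quadratically but lengthens execution
% time, which the heuristics must trade off against precedence constraints.
```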
Article
Full-text available
Distributed virtual machines can help to build scalable, manageable, and efficient grid infrastructures. The work proposed in this paper focuses on employing virtual machines for grid computing. In order to efficiently run grid applications, virtual machine resource information should be provided. This paper first discusses the system architecture...
Conference Paper
Full-text available
The Software as a Service (SaaS) methodology is a key paradigm of Cloud computing. In this paper, we focus on an interesting topic - to implement a Cloud computing functionality, the SaaS model, on existing production Grid infrastructures. In general, production Grids employ a Job-Submission-Execution (JSE) model with rigid access interfaces. In th...
Conference Paper
This paper describes a toolkit developed for supporting Grid users in the task of application deployment on computing resources. The toolkit presents a graphical interface where users provide the required information simply via context menus and mouse actions. More importantly, the jobs for starting the deployment process are automatically created and su...
Article
With modern techniques that allow billions of transistors on a chip, microprocessor design is going to a world of multicore. A cluster of multicores will be commonly used as an efficient computational platform for high performance computing in the near future. Correspondingly, the resource providers, who share their computing elements and storage...
Conference Paper
Filter caches have been proposed to decrease the energy consumption on embedded systems. However, the achievement in energy is usually acquired with a loss in performance. This work investigates a novel filter cache architecture that outperforms a traditional one, while maintaining the energy advantages. The performance gain is achieved by only all...
Article
Full-text available
The use of supercomputing technology, parallel and distributed processing, and sophisticated algorithms is of major importance for computational scientists. Yet, the scientists' goals are to solve their challenging scientific problems, not the software engineering tasks associated with it. For that reason, computational science and engineering must...
Article
Full-text available
Cloud computing emerges as a new computing paradigm which aims to provide reliable, customized and QoS-guaranteed dynamic computing environments for end-users. In this paper, we study the Cloud computing paradigm from various aspects, such as definitions, distinct features, and enabling technologies. This paper brings an introductory review on...
Article
Grid users face challenges in employing Grid resources, such as customized computing environments and QoS support. In this paper, we propose a new methodology for Grid computing – to use virtual machines as computing resources and provide Virtual Distributed Environments (VDE) for Grid users. It is argued that employing virtual...
Article
Computational science uses supercomputing technology and thereby relies on dedicated support from program development and analysis tools. The contributions mainly focus on providing developers with the possibility to demonstrate the way their tools support scientists and engineers during program devel...
Article
Processor speed is increasing exponentially, while the increase in memory speed is relatively slow. As a result, the overall performance of a computing system is increasingly constrained by memory performance. This paper describes an approach for improving the cache hit ratio and thereby the efficiency of the memory system. The a...
Article
Full-text available
This paper presents the work of building a Grid workflow system on distributed virtual machines. A Grid Virtualisation Engine (GVE) is implemented to manage virtual machines as computing resources for Grid applications. The Virtual Data System (VDS) functions as a Grid workflow engine. This paper designs and implements the VDS on distributed virtua...
Conference Paper
Full-text available
Virtual machines offer unique advantages to the scientific computing community, such as Quality of Service(QoS) guarantee, performance isolation, easy resource management, and the on-demand deployment of computing environments. Using virtual machines as a computing resource within a distributed environment, such as Service Oriented Architecture (SO...
Article
Full-text available
Virtual machines offer various advantages such as easy configuration, management, development and deployment of computing resources for cyberinfrastructures. Recent advances of employing virtual machines for Grid computing can help Grid communities to solve research issues, for example, qualities of service (QoS) provision and computing environment...
Conference Paper
Full-text available
Cyberinfrastructure offers a vision of advanced knowledge infrastructure for research and education. It integrates diverse resources and human communities distributed across geographic locations. Cyberaide is a service-oriented architecture and abstraction framework that allows us to access cyberinfrastructure through Web 2.0 technologies, whic...
Conference Paper
An increasing number of computing clouds are delivered to customers. Each cloud, however, provides an individual, non-standard user interface. These differences in cloud interfaces burden users who work with several clouds to acquire services at the expected price. This paper introduces an integrated framework that can be used by cloud users to...
Conference Paper
Data cache is a commodity in modern microprocessor systems. While the size of data caches keeps growing, application working sets grow even faster. As a result, it is usually not possible to store the complete working set in the cache memory. This paper proposes an approach that allows the data access of some load/store...
Article
Full-text available
Cloud computing has become an innovative computing paradigm, which aims to provide reliable, customized and QoS-guaranteed computing infrastructures for users. This paper presents our early experience of Cloud computing based on the Cumulus project for compute centers. In this paper, we introduce the Cumulus project with its various aspects, such...
Conference Paper
Today, computers and computational methods are increasingly important and powerful tools for science and engineering. Yet, using them effectively and efficiently requires both expert knowledge of the respective application domain and solid experience applying the technologies. Only the combination allows new and faster advancement in the ar...
Conference Paper
Full-text available
The overall performance of a computing system increasingly depends on the efficient use of the cache memories. Traditional approaches for cache tuning deploy performance tools to help the user optimize the source program towards a better runtime data locality. Following this conventional way, we developed a set of such toolkits including data profi...
Conference Paper
Cloud computing emerges as a new computing paradigm which aims to provide reliable, customized and QoS-guaranteed dynamic computing environments for end-users. This paper reviews recent advances of Cloud computing, identifies the concepts and characteristics of scientific Clouds, and finally presents an example of a scientific Cloud for data centers.
Article
Grid computing has now become the de facto standard for distributed computing and provides huge computing resources for high-performance scientific and engineering applications. Recent advances in virtual computing technologies make it possible to efficiently employ computing resources on-demand. Distributed virtual machines (VMs) can help to bui...
Conference Paper
A typical e-Science infrastructure uses GridSAM as Grid middleware and the ActiveBPEL engine as workflow engine to build a workflow system. Grid Virtualization Service (GVS) is used to provide virtual machines as computing resources for the e-Science infrastructure. This paper emphasizes that a virtual e-Science infrastructure could be dynamically buil...
Conference Paper
Cache prefetching is a basic technique for removing cache misses and the resulting access penalty. This work proposes a kind of guided prefetching which uses the access pattern of an application to avoid loading data that is not required. The access pattern is obtained with a data analyzer capable of discovering the affinity and regularit...
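To make the idea of pattern-guided prefetching concrete, here is a small software-level sketch that prefetches along a detected stride using the GCC/Clang __builtin_prefetch intrinsic; the mechanism studied in the paper itself operates at the hardware level and is not reproduced here.

```cpp
#include <cstddef>
#include <vector>

// Sum every 'stride'-th element of a large array. Once an analyzer has
// detected the regular stride, we can prefetch the element that will be
// needed a few iterations ahead and avoid touching anything else.
// (Illustrative software sketch only.)
double strided_sum(const std::vector<double>& data, std::size_t stride) {
    constexpr std::size_t kDistance = 8;      // prefetch distance in iterations
    double sum = 0.0;
    for (std::size_t i = 0; i < data.size(); i += stride) {
        std::size_t ahead = i + kDistance * stride;
        if (ahead < data.size())
            __builtin_prefetch(&data[ahead], /*rw=*/0, /*locality=*/1);
        sum += data[i];
    }
    return sum;
}
```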
Conference Paper
The use of supercomputing technology, parallel and distributed processing, and sophisticated algorithms is of major importance for computational scientists. Yet, the scientists' goals are to solve their challenging scientific problems, not the software engineering tasks associated with it. For this reason, computational science and engineering must...
Article
Full-text available
With the trends of microprocessor design towards multicore, cache performance becomes more important because an off-chip access would be increasingly expensive due to the competition across the processor cores. A question arises: How to design the cache architecture to prevent a performance bottleneck caused by data accesses? This work studies a re...
Conference Paper
Microprocessor architecture for both commercial and academic purposes is coming into a new generation: multiprocessors on a chip. Together with this novel architecture, questions and research topics also arise. For example, how to design the on-chip caches to avoid memory operations becoming the performance bottleneck? In this work, we study the i...