
Gregor von Laszewski, Ph.D.
University of Virginia | UVa
Cyberinfrastructure for Deep Learning, Benchmarking, Scientific Impact, Distributed Computing, Parallel Computing
About
255 Publications
62,305 Reads
9,239 Citations (since 2017)
Additional affiliations
July 2009 - present
Indiana University Bloomington
Position
- Assistant Director; Associate Professor, Computer Science Department
Description
- Research on cloud computing; software architect of FutureGrid
January 2001 - December 2006
University of Chicago
Education
September 1990 - November 1996
September 1989 - August 1990
September 1984 - November 1989
Publications (255)
In this paper we report on the features of the Java Commodity Grid Kit (Java CoG Kit). The Java CoG Kit provides middleware for accessing Grid functionality from the Java framework. Java CoG Kit middleware is general enough to design a variety of advanced Grid applications with quite different user requirements. Access to the Grid is established vi...
FutureGrid provides novel computing capabilities that enable reproducible experiments while simultaneously supporting dynamic provisioning. This paper describes the FutureGrid experiment management framework to create and execute large scale scientific experiments for researchers around the globe. The experiments executed are performed by the vario...
Future Grid (FG) is an experimental, high-performance test bed that supports HPC, cloud, and grid computing experiments for both application and computer scientists. FutureGrid includes the use of virtualization technology to allow the support of a wide range of operating systems in order to include a test bed for various cloud computing infrastruct...
With machine learning (ML) becoming a transformative tool for science, the scientific community needs a clear catalogue of ML techniques and their relative benefits on various scientific problems if it is to make significant advances in science using AI. Although this comes under the purview of benchmarking, conventional benchmarking initiati...
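The "time to reach a target quality" style of measurement that such benchmarking efforts rely on can be illustrated with a small sketch. The harness below is purely illustrative (the dataset, model, and the 0.90 target are assumptions for the example, not the paper's benchmarks): it trains incrementally and reports how long the model takes to hit the target score.

```python
# Hypothetical time-to-target harness: train incrementally and record
# how long it takes to reach a target validation score.
import time

from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

TARGET = 0.90  # illustrative accuracy target
model = SGDClassifier(random_state=0)
start = time.perf_counter()
for epoch in range(50):
    model.partial_fit(X_train, y_train, classes=sorted(set(y)))
    score = model.score(X_val, y_val)
    if score >= TARGET:
        print(f"reached {score:.3f} after {epoch + 1} epochs "
              f"({time.perf_counter() - start:.2f}s)")
        break
else:
    print("target not reached within the epoch budget")
```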
In this paper, we summarize our effort to create and utilize a simple framework to coordinate computational analytics tasks with the help of a workflow system. Our design is based on a minimalistic approach while allowing access to computational resources offered through the owner's computer, HPC computing centers, cloud resources,...
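As a rough illustration of the minimalistic coordination idea, the sketch below runs tasks in dependency order. The task names and the dependency table are invented for the example, and a real runner would dispatch each task to a local shell, a batch queue, or a cloud API rather than calling Python functions directly.

```python
# A toy workflow runner: tasks declare prerequisites, and a scheduler
# executes them in topological order (Python 3.9+ for graphlib).
from graphlib import TopologicalSorter

def fetch():  print("fetch data (local machine)")
def train():  print("train model (HPC center)")
def report(): print("write report (local machine)")

tasks = {"fetch": fetch, "train": train, "report": report}
deps = {"train": {"fetch"}, "report": {"train"}}  # task -> prerequisites

for name in TopologicalSorter(deps).static_order():
    tasks[name]()  # a real runner would dispatch to ssh/batch/cloud here
```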
Digitization is changing our world, creating innovative finance channels and emerging technology such as cryptocurrencies, which are applications of blockchain technology. However, cryptocurrency price volatility is one of this technology’s main trade-offs. In this paper, we explore a time series analysis using deep learning to study the volatility...
In this paper we apply neural networks and Artificial Intelligence (AI) to historical records of high-risk cryptocurrency coins to train a prediction model that estimates their price. This paper's code contains Jupyter notebooks, one of which outputs a timeseries graph of any cryptocurrency price once a CSV file of the historical data is inputted int...
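For readers unfamiliar with this style of model, the sketch below shows the usual pattern: fixed-length windows of past prices predict the next price. It is not the paper's notebook; synthetic data stands in for the CSV of historical prices, and the window size and network shape are arbitrary choices.

```python
# Hedged sketch of windowed next-step price prediction with an LSTM.
import numpy as np
from tensorflow import keras

prices = np.cumsum(np.random.randn(500)).astype("float32")  # stand-in series
WINDOW = 30
X = np.stack([prices[i:i + WINDOW] for i in range(len(prices) - WINDOW)])
y = prices[WINDOW:]
X = X[..., None]  # (samples, timesteps, features), as Keras LSTMs expect

model = keras.Sequential([
    keras.layers.LSTM(32, input_shape=(WINDOW, 1)),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
print("next-step estimate:", float(model.predict(X[-1:], verbose=0)[0, 0]))
```

In a real run, `prices` would come from the historical-data CSV the abstract mentions, for example via a pandas column.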
Data-intensive applications are becoming commonplace in all science disciplines. They comprise a rich set of sub-domains such as data engineering, deep learning, and machine learning. These applications are built around efficient data abstractions and operators that suit the applications of different domains. Often the lack of a clear definitio...
Data-intensive applications impact many domains, and their steadily increasing size and complexity demand high-performance, highly usable environments. We integrate a set of ideas developed in various data science and data engineering frameworks. They employ a set of operators on specific data abstractions that include vectors, matrices, tensors, g...
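The "operators on data abstractions" idea can be seen in miniature with an everyday dataframe library. The example below is illustrative only (pandas rather than the paper's own engine, with invented columns), but the join/map/reduce chain is the pattern the abstract describes.

```python
# Operator chain over a dataframe abstraction: join, map, reduce.
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3], "value": [10.0, 20.0, 30.0]})
right = pd.DataFrame({"id": [1, 2, 2], "tag": ["a", "b", "c"]})

result = (
    left.merge(right, on="id", how="inner")              # join operator
        .assign(scaled=lambda df: df.value * 2)          # map operator
        .groupby("tag", as_index=False)["scaled"].sum()  # reduce operator
)
print(result)
```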
The SPIDAL (Scalable Parallel Interoperable Data Analytics Library) project was begun in Fall 2014 and reached technical completion in Fall 2020, with outreach activities continuing in 2021. The February poster summarizes the 2020 status and activity very well, with previous work through September 2018 summarized in a book chapter with extensiv...
The COVID-19 pandemic has profound global consequences for health, economic, social, political, and almost every other major aspect of human life. Therefore, it is of great importance to model COVID-19 and other pandemics in terms of the broader social contexts in which they take place. We present the architecture of AICov, which provides an integrative...
Our project is at the interface of Big Data and HPC (High-Performance Big Data computing), and this paper describes a collaboration between seven universities: Arizona State, Indiana (lead), Kansas, Rutgers, Stony Brook, Virginia Tech, and Utah. It addresses the intersection of High-performance and Big Data computing with several differe...
The Comet petascale system is an XSEDE resource with the goal of serving a large user community. The Comet project has served a large number of users while using traditional supercomputing as well as science gateways. In addition to these offerings, Comet also includes a non-traditional virtual machine framework that allows users to access entire V...
Our project is at the interface of Big Data and HPC (High-Performance Big Data computing), and this paper describes a collaboration between seven universities: Arizona State, Indiana (lead), Kansas, Rutgers, Stony Brook, Virginia Tech, and Utah. It addresses the intersection of High-performance and Big Data computing with several different app...
We use the bibliometrics approach to evaluate the scientific impact of XSEDE. By utilizing publication data from various sources, e.g., ISI Web of Science and Microsoft Academic Graph, we calculate the impact metrics of XSEDE publications and show how they compare with non-XSEDE publications from the same field of study, or non-XSEDE peers from the...
The Comet petascale supercomputer was put into production as an XSEDE resource in early 2015 with the goal of serving a much larger user community than HPC systems of similar size. The Comet project set an audacious goal of reaching over 10,000 users in its four years of planned operation. That goal was achieved in less than two years, due in large...
Two major trends in computing systems are the growth in high performance computing (HPC) with an international exascale initiative, and the big data phenomenon with an accompanying cloud infrastructure of well-publicized, dramatically increasing size and sophistication. This tutorial weaves these trends together using some key building blocks. The f...
Status of NSF 1443054 Project. Big Data Application Analysis identifies features of data-intensive applications that need to be supported in software and represented in benchmarks. This analysis was started for the proposal and has been extended to support HPC-Simulations-Big Data convergence. The proje...
This covers the streaming workshops held, IoTCloud for cloud control of robots, the SPIDAL project, HPC-ABDS, WebPlotviz visualization and stock market data, and scientific paper impact analysis for XSEDE.
This poster covers the Harp HPC Hadoop plugin, the RaPyDLI deep learning system, virtual clusters on the XSEDE Comet system, Cloudmesh to deploy Big Data applications via Ansible, Big Data Ogres and Diamonds to converge HPC and Big Data, and the performance of Flink on machine learning.
This poster introduces all of the DSC projects below and covers 1), 3), 4), and 5): 1) Digital Science Center Facilities; 2) RaPyDLI Deep Learning Environment; 3) SPIDAL Scalable Data Analytics Library and applications, including Bioinformatics and Polar Remote Sensing Data Analysis; 4) MIDAS Big Data Software, Harp for HPC-ABDS; 5) Big Data Ogres Classification an...
This is a 21-month progress report on the NSF-funded project NSF 14-43054, started October 1, 2014, and involving a collaboration between university teams at Arizona, Emory, Indiana (lead), Kansas, Rutgers, Virginia Tech, and Utah. The project is constructing data building blocks to address major cyberinfrastructure challenges in seven different communi...
Hardware virtualization has been gaining a significant share of computing time in recent years. Using virtual machines (VMs) for parallel computing is an attractive option for many users. A VM gives users the freedom to choose an operating system, software stack, and security policies, leaving the physical hardware, OS management, and billing to p...
We present a framework that compares publication impact based on a comprehensive peer analysis of papers produced by scientists using XSEDE and NCAR resources. The analysis introduces a percentile-ranking approach that compares citations of the XSEDE and NCAR papers to peer publications in the same journal that do not use these resource...
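A minimal sketch of the percentile idea follows. The numbers are invented; the real framework draws peer sets and citation counts from publication databases rather than hard-coded lists.

```python
# Percentile rank of a paper's citations among same-journal peers.
def citation_percentile(paper_citations: int, peer_citations: list[int]) -> float:
    """Percentage of peer papers this paper's citation count meets or exceeds."""
    if not peer_citations:
        return 0.0
    beaten = sum(1 for c in peer_citations if c <= paper_citations)
    return 100.0 * beaten / len(peer_citations)

peers = [0, 1, 3, 5, 8, 13, 40]        # citation counts of peer papers
print(citation_percentile(5, peers))   # ~57.1: at or above ~57% of peers
```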
The Technology Audit Service has developed XDMoD, a resource management tool. This paper utilizes XDMoD and the XDMoD data warehouse that it draws from to provide a broad overview of several aspects of XSEDE users and their usage. Some important trends include: 1) in spite of a large yearly turnover, there is a core of users persisting over many y...
In this chapter we introduce you to FutureGrid, which provides a testbed to conduct research for Cloud, Grid, and High Performance Computing. Although FutureGrid has only a modest number of compute cores (about 4,500 regular cores and 14,000 GPU cores), it provides an ideal playground to test out various frameworks that may be useful for users to consi...
The important role high-performance computing (HPC) resources play in science and engineering research, coupled with their high cost (capital, power, and manpower), short life, and oversubscription, requires us to optimize their usage – an outcome that is only possible if adequate analytical data are collected and used to drive systems management...
We present a framework that (a) integrates publication and citation data retrieval, (b) allows scientific impact metrics generation at different aggregation levels, and (c) provides correlation analysis of impact metrics based on publication and citation data with resource allocation for a computing facility. Furthermore, we use this framework to c...
We present the design of a toolkit that can be employed by users and administrators to manage virtual machines on multi-cloud environments. It can be run by individual users or offered as a service to a shared user community. We have practically demonstrated its use as part of a FutureGrid service, allowing users of FutureGrid to utilize such a s...
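The multi-cloud pattern such a toolkit implements can be sketched with Apache Libcloud's uniform driver API. This is not the toolkit's own code, and the credentials are placeholders.

```python
# One code path, many clouds: Libcloud resolves a provider constant to a
# driver class with a shared interface.
from libcloud.compute.providers import get_driver
from libcloud.compute.types import Provider

def list_all_nodes(accounts):
    """accounts: iterable of (Provider constant, key, secret) tuples."""
    for provider, key, secret in accounts:
        driver = get_driver(provider)(key, secret)
        for node in driver.list_nodes():
            print(provider, node.name, node.state)

# Example (placeholder credentials):
# list_all_nodes([(Provider.EC2, "ACCESS_KEY", "SECRET_KEY")])
```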
The Software as a Service (SaaS) methodology is a key paradigm of Cloud computing. In this paper, we focus on an interesting topic: dynamically hosting services on existing production Grid infrastructures. In general, production Grids normally employ a Job-Submission-Execution (JSE) model with rigid access interfaces. In this paper, we implement the...
Heterogeneous parallel systems with multiple processors and accelerators are becoming ubiquitous due to better cost-performance and energy efficiency. These heterogeneous processor architectures have different instruction sets and are optimized for either task-latency or throughput purposes. Challenges occur in regard to programmability and performanc...
The XDMoD auditing tool provides, for the first time, a comprehensive tool to measure both utilization and performance of high-end cyberinfrastructure (CI), with initial focus on XSEDE. Here, we demonstrate, through several case studies, its utility for providing important metrics regarding resource utilization and performance of TeraGrid/XSEDE tha...
We present the design of a dynamic provisioning system that is able to manage the resources of a federated cloud environment by focusing on their utilization. With our framework, it is not only possible to allocate resources at a particular time to a specific Infrastructure as a Service framework, but also to utilize them as part of a typical HPC e...
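The utilization-driven idea reduces, in toy form, to moving nodes between pools when load crosses thresholds. The thresholds and pool handling below are invented for illustration and are far simpler than an actual provisioner.

```python
# Toy rebalancer: shift one node toward whichever pool is under pressure.
def rebalance(hpc_nodes, iaas_nodes, hpc_util, high=0.85, low=0.30):
    if hpc_util > high and iaas_nodes:
        hpc_nodes.append(iaas_nodes.pop())   # HPC queue backed up: grow HPC
    elif hpc_util < low and len(hpc_nodes) > 1:
        iaas_nodes.append(hpc_nodes.pop())   # HPC mostly idle: lend to cloud
    return hpc_nodes, iaas_nodes

print(rebalance(["n1", "n2"], ["n3"], hpc_util=0.95))  # (['n1','n2','n3'], [])
```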
The ability to conduct consistent, controlled, and repeatable large-scale experiments in all areas of computer science related to parallel, large-scale, or distributed computing and networking is critical to the future and development of computer science. Yet conducting such experiments is still too often a challenge for researchers, students, and...
Historically, high-end cyberinfrastructure planning at the campus, regional, and national levels has been based on episodic analysis of limited data and/or projections of demand with minimal supporting comprehensive usage data. However, a repository of usage data for the TeraGrid and the follow-on XSEDE program provides a unique source of extensive d...
Many cloud Infrastructure as a Service (IaaS) frameworks exist, and users, developers, and administrators have to decide which environment is best suited for them. Unfortunately, the comparison of such frameworks is difficult, as users may not have access to all of them or are comparing the performance of such systems on different resources, making...
Today, many cloud Infrastructure as a Service (IaaS) frameworks exist. Users, developers, and administrators have to make a decision about which environment is best suited for them. Unfortunately, the comparison of such frameworks is difficult because either users do not have access to all of them or they are comparing the performance of such syste...
Cloud computing has become an important driver for delivering infrastructure as a service (IaaS) to users with on-demand requests for customized environments and sophisticated software stacks. Within the FutureGrid (FG) project, we offer different IaaS frameworks as well as high performance computing infrastructures by allowing users to explore th...
Through the development of advanced middleware, Grid computing has evolved to a mature technology that scientists and researchers can leverage to gain knowledge that was previously unobtainable in a wide variety of scientific
This paper describes a comprehensive auditing framework, XDMoD, for use by high performance computing centers to readily provide metrics regarding resource utilization (CPU hours, job size, wait time, etc.), resource performance, and the center's impact in terms of scholarship and research. This role-based auditing framework is designed to meet the...
Electronic Health Records (EHRs) have many potential advantages over traditional paper records, such as wide-scale access, error checking, and protection from physical damage to a record. As with any medical record, paper or electronic, both the patient's privacy and the document's integrity must be guaranteed. With initiatives such as Integrating...
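The integrity half of that requirement is commonly met with keyed digests; the sketch below shows the general technique only (a plain HMAC with a demo key), not the protocol the paper builds on.

```python
# Detect tampering by comparing a keyed digest of the record.
import hashlib
import hmac

SECRET = b"demo-key"  # placeholder; real systems manage keys carefully

def seal(record: bytes) -> str:
    return hmac.new(SECRET, record, hashlib.sha256).hexdigest()

record = b"patient: 123; rx: 10mg"
tag = seal(record)
print(hmac.compare_digest(tag, seal(record)))         # True: record intact
print(hmac.compare_digest(tag, seal(record + b"!")))  # False: record altered
```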
High temperatures within a data center can cause a number of problems, such as increased cooling costs and increased hardware failure rates. To overcome this problem, researchers have shown that workload management, focused on a data center's thermal properties, effectively reduces temperatures within a data center. In this paper, we propose a meth...
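A toy version of thermal-aware placement sends each job to the coolest node so hot spots get time to cool. The temperatures and the constant heat-per-job model below are invented for the example and are much cruder than any published method.

```python
# Greedy coolest-node-first job placement.
def place_jobs(jobs, node_temps, heat_per_job=2.0):
    placement = {}
    for job in jobs:
        coolest = min(node_temps, key=node_temps.get)
        placement[job] = coolest
        node_temps[coolest] += heat_per_job  # running a job warms the node
    return placement

temps = {"rack1": 24.0, "rack2": 30.0, "rack3": 26.0}
print(place_jobs(["j1", "j2", "j3"], temps))
```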
Cloud computing is becoming an innovative computing paradigm that aims to provide reliable, customized, and QoS-guaranteed computing infrastructures for users. This paper presents our early experience of Cloud computing based on the Cumulus project for compute centers. In this paper, we give a definition of Cloud computing and Cloud computing funct...
As Cloud computing emerges as a dominant paradigm in distributed systems, it is important to fully understand the underlying technologies that make Clouds possible. One technology, and perhaps the most important, is virtualization. Recently virtualization, through the use of hypervisors, has become widely used and well understood by many. However,...
In this paper, we briefly outline the current design of a generic image management service for FutureGrid. The service is intended to generate, store, and verify images while interfacing with different localized cloud IaaS image repositories. Additionally, we will also use the service to generate images for traditional bare-metal deployments.
With the advent of Cloud computing, a wide variety of Infrastructure as a Service models have grown to provide users with one of the greatest benefits of Clouds: a customized system environment. These services, while extremely useful, often suffer from their inability to interoperate and communicate across administratively separate domains. Within...
We present a workflow-based algorithm for identifying threats to an urban water management system. Through Grid computing we provide the necessary high-performance computing resources to quickly deliver solutions to the problem. We prototyped a new middleware called cyberaide, which enables easy access to Grid resources through portals or the comman...
It has been widely known that various benefits can be achieved by reducing energy consumption in high-end computing. This paper aims to develop power-aware scheduling heuristics for parallel tasks in a cluster with the DVFS technique. In this paper, formal models are presented for precedence-constrained parallel tasks, DVFS-enabled clusters, and e...
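The trade-off behind such heuristics: for a compute-bound task, runtime scales roughly as 1/f while dynamic power scales roughly as f^3, so energy grows with f^2 and the scheduler wants the lowest frequency that still meets the deadline. The sketch below applies that rule to a single task; the numbers are illustrative, and the paper's heuristics additionally handle precedence constraints across tasks.

```python
# Pick the lowest DVFS frequency that meets a single task's deadline.
def pick_frequency(work_cycles, deadline_s, freqs_hz):
    for f in sorted(freqs_hz):
        runtime = work_cycles / f
        if runtime <= deadline_s:
            energy = (f ** 3) * runtime  # dynamic energy up to a constant
            return f, runtime, energy
    f = max(freqs_hz)                    # infeasible: run as fast as possible
    return f, work_cycles / f, None

# 2e9 cycles, 1.5 s deadline: 1.6 GHz (1.25 s) wins over 2.4 GHz (0.83 s).
print(pick_frequency(2e9, deadline_s=1.5, freqs_hz=[1.0e9, 1.6e9, 2.4e9]))
```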
Analysis of neural signals (such as EEG) has long been a hot topic in the neuroscience community due to neural signals' nonlinear and non-stationary features. Recent advances in experimental methods and neuroscience research have made neural signal datasets increasingly massive and analysis of these signals highly compute-intensive. Analysis of neural signals ha...
Distributed virtual machines can help to build scalable, manageable, and efficient grid infrastructures. The work proposed in this paper focuses on employing virtual machines for grid computing. In order to efficiently run grid applications, virtual machine resource information should be provided. This paper first discusses the system architecture...