Michael E. Papka

Northern Illinois University · Department of Computer Science

Ph.D. - University of Chicago

About

242 Publications
23,839 Reads
3,440 Citations
Additional affiliations
August 2017 - present
Northern Illinois University
Position
  • Professor
August 2012 - August 2017
Northern Illinois University
Position
  • Associate Professor
Education
August 1995 - December 2009
University of Chicago
Field of study
  • Computer Science
August 1991 - December 1994
University of Illinois at Chicago
Field of study
  • Computer Science
June 1986 - December 1990
Northern Illinois University
Field of study
  • Physics

Publications

Publications (242)
Preprint
Full-text available
Powerful detectors at modern experimental facilities routinely collect data at multiple GB/s. Online analysis methods are needed to enable the collection of only interesting subsets of such massive data streams, such as by explicitly discarding some data elements or by directing instruments to relevant areas of experimental space. Such online analy...
Article
Cluster schedulers are crucial in high-performance computing (HPC). They determine when and which user jobs should be allocated to available system resources. Existing cluster scheduling heuristics are developed by human experts based on their experience with specific HPC systems and workloads. However, the increasing complexity of computing system...
Preprint
Traditionally, on-demand, rigid, and malleable applications have been scheduled and executed on separate systems. The ever-growing workload demands and rapidly developing HPC infrastructure have triggered interest in converging these applications on a single HPC system. Although allocating the hybrid workloads within one system could potentially impro...
Preprint
Supercomputer FCFS-based scheduling policies result in many transient idle nodes, a phenomenon that is only partially alleviated by backfill scheduling methods that promote small jobs to run before large jobs. Here we describe how to realize a novel use for these otherwise wasted resources, namely, deep neural network (DNN) training. This important...
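The core idea lends itself to a short sketch. The loop below is illustrative only, with an invented scheduler interface (`poll_windows`, `launch_training`), not the system described above: it claims idle windows that backfill could not fill and runs checkpointed, preemptible training in them.

```python
# Illustrative sketch (not the paper's implementation) of claiming
# transient idle nodes for preemptible DNN training. The scheduler
# interface here is hypothetical.
import time
from dataclasses import dataclass

@dataclass
class IdleWindow:
    nodes: list        # node names currently idle
    duration: int      # estimated seconds until the next reservation

MIN_WINDOW = 600       # skip gaps too short to amortize job startup

def harvest(poll_windows, launch_training, poll_interval=30):
    """Repeatedly poll for idle windows and fill the long ones with
    checkpointed, preemptible training jobs."""
    while True:
        for w in poll_windows():
            if w.duration >= MIN_WINDOW:
                # Frequent checkpoints bound the work lost when a
                # regular batch job preempts the training run.
                launch_training(w.nodes, checkpoint_every=60)
        time.sleep(poll_interval)
```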
Article
Color encoding is foundational to visualizing quantitative data. Guidelines for colormap design have traditionally emphasized perceptual principles, such as order and uniformity. However, colors also evoke cognitive and linguistic associations whose role in data interpretation remains underexplored. We study how two linguistic factors, name salienc...
Preprint
Full-text available
Massive upgrades to science infrastructure are driving data velocities upwards while stimulating adoption of increasingly data-intensive analytics. While next-generation exascale supercomputers promise strong support for I/O-intensive workflows, HPC remains largely untapped by live experiments, because data transfers and disparate batch-queueing po...
Conference Paper
Full-text available
The cluster scheduler is crucial in high-performance computing (HPC). It determines when and which user jobs should be allocated to available system resources. Existing cluster scheduling heuristics are developed by human experts based on their experience with specific HPC systems and workloads. However, the increasing complexity of computing systems a...
Preprint
Color encoding is foundational to visualizing quantitative data. Guidelines for colormap design have traditionally emphasized perceptual principles, such as order and uniformity. However, colors also evoke cognitive and linguistic associations whose role in data interpretation remains underexplored. We study how two linguistic factors, name salienc...
Article
Surround-view panoramic images and videos have become a popular form of media for interactive viewing on mobile devices and virtual reality headsets. Viewing such media provides a sense of immersion by allowing users to control their view direction and experience an entire environment. When using a virtual reality headset, the level of immersion ca...
Article
viewSq is a Visual Molecular Dynamics (VMD) module for calculating structure factors (S(q)) and partial structure factors for any user-selected atomic selections (Ssel1,sel2(q)) derived from computer simulation trajectories, as well as quantifying, analyzing, and visualizing atomic contributions to them. viewSq offers radial distribution functions...
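For context, the static structure factor of an isotropic system is conventionally related to the radial distribution function g(r) by the standard expression below; viewSq's exact conventions and normalization may differ.

```latex
S(q) = 1 + 4\pi\rho \int_0^{\infty} r^2 \,\bigl[g(r) - 1\bigr]\, \frac{\sin(qr)}{qr}\, dr
```

Here ρ is the number density and q the magnitude of the scattering wavevector.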
Article
Our exploratory work finds that the SambaNova Reconfigurable Dataflow Architecture (RDA), along with the SambaFlow software stack, provides an attractive system and solution for accelerating AI for science workloads. We have observed the efficacy of using the system with a diverse set of science applications and reasoned about their suitability for perfor...
Preprint
Full-text available
The cluster scheduler is crucial in high-performance computing (HPC). It determines when and which user jobs should be allocated to available system resources. Existing cluster scheduling heuristics are developed by human experts based on their experience with specific HPC systems and workloads. However, the increasing complexity of computing systems a...
Preprint
Full-text available
High performance computing (HPC) is undergoing significant changes. The emerging HPC applications comprise both compute- and data-intensive applications. To meet the intense I/O demand from emerging data-intensive applications, burst buffers are deployed in production systems. Existing HPC schedulers are mainly CPU-centric. The extreme heterogeneit...
Article
U.S. computing leaders, including Department of Energy National Laboratories, have partnered with universities, government agencies, and the private sector to research responses to COVID-19, providing an unprecedented collection of resources that include some of the fastest computers in the world. For HPC users, these leadership machines will drive...
Article
The Chicago Array of Things (AoT) project, funded by the US National Science Foundation, created an experimental, urban-scale measurement capability to support diverse scientific studies. Initially conceived as a traditional sensor network, collaborations with many science communities guided the project to design a system that is remotely programma...
Conference Paper
In-situ data analysis and visualization is a promising technique to handle the enormous amount of data an extreme-scale application produces. One challenge users often face in adopting in-situ techniques is setting the right environment on a target machine. Platforms such as SENSEI require complex software stacks that consist of various analysis pa...
Preprint
We introduce the Balsam service to manage high-throughput task scheduling and execution on supercomputing systems. Balsam allows users to populate a task database with a variety of tasks ranging from simple independent tasks to dynamic multi-task workflows. With abstractions for the local resource scheduler and MPI environment, Balsam dynamically p...
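A minimal sketch of the task-database idea, assuming a simple SQLite table; this illustrates the pattern only and is not Balsam's actual schema or API.

```python
# Minimal SQLite-backed task database, in the spirit of (but not
# reproducing) Balsam's design; table layout and states are invented.
import sqlite3

conn = sqlite3.connect("tasks.db")
conn.execute("""CREATE TABLE IF NOT EXISTS tasks (
    id INTEGER PRIMARY KEY,
    command TEXT NOT NULL,
    state TEXT NOT NULL DEFAULT 'queued')""")

def add_task(command):
    conn.execute("INSERT INTO tasks (command) VALUES (?)", (command,))
    conn.commit()

def acquire_task():
    """Claim one queued task (single-process sketch; a multi-worker
    service would need stronger locking than this)."""
    with conn:  # transaction
        row = conn.execute(
            "SELECT id, command FROM tasks WHERE state = 'queued' LIMIT 1"
        ).fetchone()
        if row:
            conn.execute("UPDATE tasks SET state = 'running' WHERE id = ?",
                         (row[0],))
    return row
```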
Preprint
Color mapping is a commonly used technique for visualizing scalar fields. While there exists advice for choosing effective colormaps, it is unclear if current guidelines apply equally across task types. We study the perception of gradients and evaluate the effectiveness of three colormaps at depicting gradient magnitudes. In a crowd-sourced experi...
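As a small illustration of the kind of stimulus involved, one can compute a gradient-magnitude field and push it through a colormap; 'viridis' here is just a placeholder, not necessarily one of the three colormaps studied.

```python
# Map the gradient magnitude of a synthetic scalar field through a
# colormap to produce an RGBA image.
import numpy as np
from matplotlib import cm

# Synthetic 2-D scalar field
x, y = np.meshgrid(np.linspace(-2, 2, 256), np.linspace(-2, 2, 256))
field = np.exp(-(x**2 + y**2))

# Gradient magnitude via central differences
gy, gx = np.gradient(field)
gmag = np.hypot(gx, gy)

# Normalize to [0, 1] and apply the colormap -> RGBA image
rgba = cm.viridis(gmag / gmag.max())
```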
Conference Paper
Full-text available
We report on our experiences deploying and operating Petrel, a data service designed to support science projects that must organize and distribute large quantities of data. Building on a high-performance 3.2 PB parallel file system and embedded in Argonne National Laboratory's 100+ Gbps network fabric, Petrel leverages Science DMZ concepts and Glob...
Conference Paper
Full-text available
High performance computing (HPC) is undergoing significant changes. The emerging HPC applications comprise both compute- and data-intensive applications. To meet the intense I/O demand from emerging data-intensive applications, burst buffers are deployed in production systems. Existing HPC schedulers are mainly CPU-centric. The extreme heterogeneit...
Article
The Argonne Leadership Computing Facility is deploying Singularity to allow HPC resources to adopt “containers”—a technology that has benefitted non-HPC resources like cloud computing servers for a few years now. For HPC users, containerization will allow them to easily migrate their software stack between resources with minimal effort. For HPC fac...
Article
Performing analysis or generating visualizations concurrently with high performance simulations can yield great benefits compared to post-processing data. Writing and reading large volumes of data can be reduced or eliminated, thereby producing an I/O cost savings. One such method for concurrent simulation and analysis is in transit – streaming dat...
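A toy sketch of the in transit pattern, using raw sockets purely for illustration; production systems use dedicated staging and streaming libraries rather than hand-rolled framing like this.

```python
# Simulation streams arrays over a socket to a separate analysis
# process instead of writing them to disk.
import socket
import numpy as np

def send_timestep(sock, array):
    """Simulation side: ship one timestep's data to the analysis node."""
    payload = array.astype(np.float64).tobytes()
    sock.sendall(len(payload).to_bytes(8, "big"))  # length prefix
    sock.sendall(payload)

def recv_timestep(sock):
    """Analysis side: receive one timestep and rebuild the array."""
    n = int.from_bytes(_recv_exact(sock, 8), "big")
    return np.frombuffer(_recv_exact(sock, n), dtype=np.float64)

def _recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("stream closed mid-message")
        buf += chunk
    return buf
```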
Conference Paper
Full-text available
As simulations grow in scale, the need for in situ analysis methods to handle the large data produced grows correspondingly. One desirable approach to in situ visualization is in transit visualization. By decoupling the simulation and visualization code, in transit approaches alleviate common difficulties with regard to the scalability of the analy...
Article
Full-text available
While performance remains a major objective in the field of high-performance computing (HPC), future systems will have to deliver desired performance under both reliability and energy constraints. Although a number of resilience methods and power management techniques have been presented to address the reliability and energy concerns, the trade-off...
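One standard way to reason about such trade-offs, though not necessarily the model used in this work, is Young's approximation for the optimal checkpoint interval:

```latex
\tau_{\mathrm{opt}} \approx \sqrt{2\,\delta\, M}
```

where δ is the time to write a checkpoint and M is the mean time between failures; intervals longer than this waste recomputation (and its energy) after failures, while shorter ones waste checkpoint I/O.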
Article
For the first time, an automatically triggered, between-pulse fusion science analysis code was run on-demand at a remotely located supercomputer at Argonne Leadership Computing Facility (ALCF, Lemont, Illinois) in support of in-process experiments being performed at DIII-D (San Diego, California). This represents a new paradigm for combining geogra...
Conference Paper
Analysis of scientific simulation data enables scientists to glean insight from simulations. In situ analysis, which can be executed simultaneously with the simulation, mitigates I/O bottlenecks and can accelerate discovery of new phenomena. However, in typical modes of operation, this requires either stalling the simulation during the analysis phase or tr...
Conference Paper
Large experimental collaborations, such as those at the Large Hadron Collider at CERN, have developed large job management systems running hundreds of thousands of jobs across worldwide computing grids. HPC facilities are becoming more important to these data-intensive workflows and integrating them into the experiment job management system is non-...
Poster
Full-text available
Visualizing large real-world networks, such as social networks and scientific collaboration networks, is challenging not only because they contain large numbers of nodes and links but also due to their multivariate nature. Applications that analyze such datasets tend to focus on problems related to visualizing either multiple attributes on nodes or...
Conference Paper
Full-text available
Current and upcoming supercomputers have more than thousands of compute nodes interconnected with high-dimensional networks and complex network topologies for improved performance. Application developers are required to write scalable parallel programs in order to achieve high throughput on these machines. Application performance is largely determi...
Conference Paper
Full-text available
Job runtime estimates provided by users are widely acknowledged to be overestimated and runtime overestimation can greatly degrade job scheduling performance. Previous studies focus on improving accuracy of job runtime estimates by reducing runtime overestimation, but fail to address the underestimation problem (i.e., the underestimation of job run...
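For intuition, a naive correction scheme (illustrative only, not the method proposed above) might rescale a user's estimate by that user's historical accuracy while padding against the underestimation problem.

```python
# Hypothetical runtime-estimate correction: rescale by historical
# actual/estimated ratios, then pad upward so that fixing
# overestimation does not introduce underestimation.
import statistics

def adjust_estimate(user_estimate, history, pad=1.2):
    """history: list of (actual_runtime, user_estimate) pairs."""
    ratios = [actual / est for actual, est in history if est > 0]
    if not ratios:
        return user_estimate
    predicted = user_estimate * statistics.median(ratios)
    # Never exceed the original request (the scheduler's hard limit),
    # and keep a safety pad against underestimation.
    return min(user_estimate, predicted * pad)
```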
Conference Paper
Full-text available
In this study we performed an initial investigation and evaluation of altmetrics and their relationship with public policy citation of research papers. We examined methods for using altmetrics and other data to predict whether a research paper is cited in public policy and applied receiver operating characteristic (ROC) curve analysis to various feature groups in...
Article
Full-text available
Scientific publications and other genres of research output are increasingly being cited in policy documents. Citations in documents of this nature could be considered a critical indicator of the significance and societal impact of the research output. In this study, we built classification models that predict whether a particular research work is...
Conference Paper
Ab initio molecular dynamics (AIMD) simulations are increasingly useful in modeling, optimizing and synthesizing materials in energy sciences. In solving Schrödinger’s equation, they generate the electronic structure of the simulated atoms as a scalar field. However, methods for analyzing these volume data are not yet common in molecular visualizat...
Conference Paper
Efficient RDF graph-based queries are becoming more pertinent given the increased interest in data analytics and its intersection with large, unstructured but connected data. Many commercial systems have adopted distributed RDF graph systems in order to handle increasing dataset sizes and complex queries. This paper introduces a distributed grap...
Conference Paper
The scale of scientific data generated by experimental facilities and simulations on high-performance computing facilities has been growing rapidly. In many cases, this data needs to be transferred rapidly and reliably to remote facilities for storage, analysis, sharing, etc. At the same time, users want to verify the integrity of the data by doing...
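A minimal sketch of block-level integrity checking, assuming a hypothetical fixed block size, which lets verification overlap the transfer rather than re-reading whole files afterward.

```python
# Hash fixed-size blocks so verification can be pipelined with the
# transfer. The 4 MiB block size is illustrative.
import hashlib

BLOCK = 4 * 1024 * 1024  # 4 MiB

def block_checksums(path):
    """Yield (offset, sha256) for each block of the file."""
    with open(path, "rb") as f:
        offset = 0
        while True:
            data = f.read(BLOCK)
            if not data:
                break
            yield offset, hashlib.sha256(data).hexdigest()
            offset += len(data)

# The receiver compares its own block_checksums() output against the
# sender's; only mismatched blocks need to be retransmitted.
```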
Article
We present a system for interactive in situ visualization of large particle simulations, suitable for general CPU-based HPC architectures. As simulations grow in scale, in situ methods are needed to alleviate I/O bottlenecks and visualize data at full spatio-temporal resolution. We use a lightweight loosely-coupled layer serving distributed data fro...
Conference Paper
The analysis of scientific simulation data enables scientists to derive insights from their simulations. This analysis of the simulation output can be performed at the same execution site as the simulation using the same resources or can be done at a different site. The optimal output frequency is challenging to decide and is often chosen empirical...
Conference Paper
Classic visual analysis relies on a single medium for displaying and interacting with data. Large-scale tiled display walls, virtual reality using head-mounted displays or CAVE systems, and collaborative touch screens have all been utilized for data exploration and analysis. We present our initial findings of combining numerous display environments...
Article
The power consumption of state-of-the-art supercomputers, because of their complexity and unpredictable workloads, is extremely difficult to estimate. Accurate and precise results, as are now possible with the latest generation of IBM Blue Gene/Q, are therefore a welcome addition to the landscape. Only recently have end users been afforded the abil...
Article
The high-performance computing centers of the future will expand their roles as service providers, and as the machines scale up, so should the sizes of the communities they serve. National facilities must cultivate their users as much as they focus on operating machines reliably. The authors present five interrelated topic areas that are essential...
Article
Empirical evaluation methods for visualizations have traditionally focused on assessing the outcome of the visual analytic process as opposed to characterizing how that process unfolds. There are only a handful of methods that can be used to systematically study how people use visualizations, making it difficult for researchers to capture and chara...
Article
Full-text available
Demand for Grid resources is expected to double during LHC Run II as compared to Run I; the capacity of the Grid, however, will not double. The HEP community must consider how to bridge this computing gap by targeting larger compute resources and using the available compute resources as efficiently as possible. Argonne's Mira, the fifth fastest sup...