Michael E. Papka

University of Illinois Chicago | UIC · Department of Computer Science

Ph.D. - University of Chicago

About

Publications: 286 · Reads: 32,832 · Citations: 4,384
Additional affiliations
August 2017 - present: Northern Illinois University, Professor
August 2012 - August 2017: Northern Illinois University, Associate Professor
May 2012 - present: Argonne National Laboratory, Senior Researcher
Education
August 1995 - December 2009: University of Chicago, Computer Science
August 1991 - December 1994: University of Illinois Chicago, Computer Science
June 1986 - December 1990: Northern Illinois University, Physics

Publications

Article
People commonly utilize visualizations not only to examine a given dataset, but also to draw generalizable conclusions about the underlying models or phenomena. Prior research has compared human visual inference to that of an optimal Bayesian agent, with deviations from rational analysis viewed as problematic. However, human reliance on non-normati...
Article
Scientists often explore and analyze large-scale scientific simulation data by leveraging two- and three-dimensional visualizations. The data and tasks can be complex and therefore best supported using myriad display technologies, from mobile devices to large high-resolution display walls to virtual reality headsets. Using a simulation of neuron co...
Article
Implicit neural representations (INRs) have emerged as a powerful tool for compressing large-scale volume data. This opens up new possibilities for in situ visualization. However, the efficient application of INRs to distributed data remains an underexplored area. In this work, we develop a distributed volumetric neural representation and optimiz...
Preprint
People commonly utilize visualizations not only to examine a given dataset, but also to draw generalizable conclusions about the underlying models or phenomena. Prior research has compared human visual inference to that of an optimal Bayesian agent, with deviations from rational analysis viewed as problematic. However, human reliance on non-normati...
Article
September 2023 marks the 50th Anniversary of the Electronic Visualization Laboratory (EVL) at the University of Illinois Chicago (UIC). EVL's introduction of the CAVE Automatic Virtual Environment in 1992, the first widely replicated, projection-based, walk-in, virtual-reality (VR) system in the world, put EVL at the forefront of collaborative, immersi...
Article
We seek to transform how new and emergent variants of pandemic-causing viruses, specifically SARS-CoV-2, are identified and classified. By adapting large language models (LLMs) for genomic data, we build genome-scale language models (GenSLMs) which can learn the evolutionary landscape of SARS-CoV-2 genomes. By pre-training on over 110 million proka...
Article
Full-text available
Most real-world networks are both dynamic and multivariate in nature, meaning that the network is associated with various attributes and both the network structure and attributes evolve over time. Visualizing dynamic multivariate networks is of great significance to the visualization community because of their wide applications across multiple doma...
Article
Full-text available
Exploratory analysis of the chemical space is an important task in the field of cheminformatics. For example, in drug discovery research, chemists investigate sets of thousands of chemical compounds in order to identify novel yet structurally similar synthetic compounds to replace natural products. Manually exploring the chemical space inhabited by...
Preprint
Full-text available
The ability to monitor and interpret hardware system events and behaviors is crucial to improving the robustness and reliability of these systems, especially in a supercomputing facility. The growing complexity and scale of these systems demand an increase in monitoring data collected at multiple fidelity levels and varying temporal resolutions...
Preprint
Full-text available
In situ visualization and steering of computational modeling can be effectively achieved using reactive programming, which leverages temporal abstraction and data caching mechanisms to create dynamic workflows. However, implementing a temporal cache for large-scale simulations can be challenging. Implicit neural networks have proven effective in co...
Article
Full-text available
Increasing diversity in STEM disciplines has been a goal at scientific institutions for many decades. Black representation in STEM, however, has remained critically low at all levels (high school, undergraduate, graduate, and professional) for over 40 years, highlighting the need for innovative strategies that promote and retain Black students and...
Chapter
Reinforcement learning (RL) has been exploited for cluster scheduling in the field of high-performance computing (HPC). One of the key challenges for RL-driven scheduling is state representation for the RL agent (i.e., capturing essential features of the dynamic scheduling environment for decision making). Existing state encoding approaches either lack critical...
Chapter
We present a containerized workflow demonstrating in situ analysis of simulation data rendered by a ParaView/Catalyst adapter for the generic SENSEI in situ interface, then streamed to a remote site for visualization. We use Cinema, a database approach for navigating the metadata produced in situ. We developed a web socket tool, cinema_transfer, fo...
Chapter
Virtual reality offers unique affordances that can benefit the scientific discovery process. However, virtual reality applications must maintain very high frame rates to provide immersion and prevent adverse events such as visual fatigue and motion sickness. Maintaining high frame rates can be challenging when visualizing scientific data that is la...
Article
Cluster schedulers are crucial in high-performance computing (HPC). They determine when and which user jobs should be allocated to available system resources. Existing cluster scheduling heuristics are developed by human experts based on their experience with specific HPC systems and workloads. However, the increasing complexity of computing system...
Preprint
Full-text available
Our work seeks to transform how new and emergent variants of pandemic-causing viruses, especially SARS-CoV-2, are identified and classified. By adapting large language models (LLMs) for genomic data, we build genome-scale language models (GenSLMs) which can learn the evolutionary landscape of SARS-CoV-2 genomes. By pretraining on over 110 million pr...
Article
Full-text available
Powerful detectors at modern experimental facilities routinely collect data at multiple GB/s. Online analysis methods are needed to enable the collection of only interesting subsets of such massive data streams, such as by explicitly discarding some data elements or by directing instruments to relevant areas of experimental space. Thus, methods are...
Chapter
This chapter describes methodologies to perform in situ computations at desired intervals along with the simulations for different execution modes. This needs to be done in a way such that the simulation throughput is minimally impacted and the analysis output is available immediately within desired intervals. We describe the formulation of optimal...
Preprint
Full-text available
Powerful detectors at modern experimental facilities routinely collect data at multiple GB/s. Online analysis methods are needed to enable the collection of only interesting subsets of such massive data streams, such as by explicitly discarding some data elements or by directing instruments to relevant areas of experimental space. Such online analy...
Preprint
Full-text available
Flooding is one of the most dangerous weather events today. Between 2015 and 2019, flooding caused, on average, more than 130 deaths every year in the USA alone. The devastating nature of floods necessitates the continuous monitoring of water levels in rivers and streams to detect incoming floods. In this work, we have designed and implemen...
Article
Chicago's Array of Things (AoT) project is aptly described as a technology experiment or a "smart city" prototype. The concept of such an extensible "instrument" arose within a larger translational research vision applying computer science and engineering research for the multidimensional benefit of people and communities in cities. The AoT project...
Conference Paper
Full-text available
We discuss challenges with data management, provenance, and data curation for science at large-scale high-performance computing (HPC) facilities. We explore promising research avenues to tackle challenges ranging from support for automated metadata management complying with FAIR principles, intuitive interfaces for ac...
Preprint
Traditionally, on-demand, rigid, and malleable applications have been scheduled and executed on separate systems. The ever-growing workload demands and rapidly developing HPC infrastructure have triggered interest in converging these applications on a single HPC system. Although allocating the hybrid workloads within one system could potentially impro...
Article
Full-text available
Color encoding is foundational to visualizing quantitative data. Guidelines for colormap design have traditionally emphasized perceptual principles, such as order and uniformity. However, colors also evoke cognitive and linguistic associations whose role in data interpretation remains underexplored. We study how two linguistic factors, name salienc...
Preprint
Supercomputer FCFS-based scheduling policies result in many transient idle nodes, a phenomenon that is only partially alleviated by backfill scheduling methods that promote small jobs to run before large jobs. Here we describe how to realize a novel use for these otherwise wasted resources, namely, deep neural network (DNN) training. This important...
Preprint
Full-text available
Massive upgrades to science infrastructure are driving data velocities upwards while stimulating adoption of increasingly data-intensive analytics. While next-generation exascale supercomputers promise strong support for I/O-intensive workflows, HPC remains largely untapped by live experiments, because data transfers and disparate batch-queueing po...
Conference Paper
Full-text available
Cluster schedulers are crucial in high-performance computing (HPC). They determine when and which user jobs should be allocated to available system resources. Existing cluster scheduling heuristics are developed by human experts based on their experience with specific HPC systems and workloads. However, the increasing complexity of computing systems a...
Preprint
Color encoding is foundational to visualizing quantitative data. Guidelines for colormap design have traditionally emphasized perceptual principles, such as order and uniformity. However, colors also evoke cognitive and linguistic associations whose role in data interpretation remains underexplored. We study how two linguistic factors, name salienc...
Article
Surround-view panoramic images and videos have become a popular form of media for interactive viewing on mobile devices and virtual reality headsets. Viewing such media provides a sense of immersion by allowing users to control their view direction and experience an entire environment. When using a virtual reality headset, the level of immersion ca...
Article
viewSq is a Visual Molecular Dynamics (VMD) module for calculating structure factors (S(q)) and partial structure factors for any user-selected atomic selections (Ssel1,sel2(q)) derived from computer simulation trajectories, as well as quantifying, analyzing, and visualizing atomic contributions to them. viewSq offers radial distribution functions...
Article
Our exploratory work finds that the SambaNova Reconfigurable Dataflow Architecture (RDA) along with the SambaFlow software stack provides for an attractive system and solution to accelerate AI for science workloads. We have observed the efficacy of using the system with a diverse set of science applications and reasoned their suitability for perfor...
Preprint
Full-text available
Cluster schedulers are crucial in high-performance computing (HPC). They determine when and which user jobs should be allocated to available system resources. Existing cluster scheduling heuristics are developed by human experts based on their experience with specific HPC systems and workloads. However, the increasing complexity of computing systems a...
Preprint
Full-text available
High performance computing (HPC) is undergoing significant changes. The emerging HPC applications comprise both compute- and data-intensive applications. To meet the intense I/O demand from emerging data-intensive applications, burst buffers are deployed in production systems. Existing HPC schedulers are mainly CPU-centric. The extreme heterogeneit...
Article
U.S. computing leaders, including Department of Energy National Laboratories, have partnered with universities, government agencies, and the private sector to research responses to COVID-19, providing an unprecedented collection of resources that include some of the fastest computers in the world. For HPC users, these leadership machines will drive...
Article
The Chicago Array of Things (AoT) project, funded by the US National Science Foundation, created an experimental, urban-scale measurement capability to support diverse scientific studies. Initially conceived as a traditional sensor network, collaborations with many science communities guided the project to design a system that is remotely programma...
Conference Paper
In-situ data analysis and visualization is a promising technique to handle the enormous amount of data an extreme-scale application produces. One challenge users often face in adopting in-situ techniques is setting the right environment on a target machine. Platforms such as SENSEI require complex software stacks that consist of various analysis pa...
Preprint
We introduce the Balsam service to manage high-throughput task scheduling and execution on supercomputing systems. Balsam allows users to populate a task database with a variety of tasks ranging from simple independent tasks to dynamic multi-task workflows. With abstractions for the local resource scheduler and MPI environment, Balsam dynamically p...
Preprint
Color mapping is a commonly used technique for visualizing scalar fields. While there exists advice for choosing effective colormaps, it is unclear if current guidelines apply equally across task types. We study the perception of gradients and evaluate the effectiveness of three colormaps at depicting gradient magnitudes. In a crowd-sourced experi...
Conference Paper
Full-text available
We report on our experiences deploying and operating Petrel, a data service designed to support science projects that must organize and distribute large quantities of data. Building on a high-performance 3.2 PB parallel file system and embedded in Argonne National Laboratory's 100+ Gbps network fabric, Petrel leverages Science DMZ concepts and Glob...
Conference Paper
Full-text available
High performance computing (HPC) is undergoing significant changes. The emerging HPC applications comprise both compute- and data-intensive applications. To meet the intense I/O demand from emerging data-intensive applications, burst buffers are deployed in production systems. Existing HPC schedulers are mainly CPU-centric. The extreme heterogeneit...
Article
The Argonne Leadership Computing Facility is deploying Singularity to allow HPC resources to adopt “containers”—a technology that has benefitted non-HPC resources like cloud computing servers for a few years now. For HPC users, containerization will allow them to easily migrate their software stack between resources with minimal effort. For HPC fac...
Article
Performing analysis or generating visualizations concurrently with high performance simulations can yield great benefits compared to post-processing data. Writing and reading large volumes of data can be reduced or eliminated, thereby producing an I/O cost savings. One such method for concurrent simulation and analysis is in transit – streaming dat...
Conference Paper
Full-text available
As simulations grow in scale, the need for in situ analysis methods to handle the large data produced grows correspondingly. One desirable approach to in situ visualization is in transit visualization. By decoupling the simulation and visualization code, in transit approaches alleviate common difficulties with regard to the scalability of the analy...
Article
Full-text available
While performance remains a major objective in the field of high-performance computing (HPC), future systems will have to deliver desired performance under both reliability and energy constraints. Although a number of resilience methods and power management techniques have been presented to address the reliability and energy concerns, the trade-off...