# Thomas D. Uram's research while affiliated with Argonne National Laboratory and other places

## Publications (60)

Article
In this paper we introduce the Farpoint simulation, the latest member of the Hardware/Hybrid Accelerated Cosmology Code (HACC) gravity-only simulation family. The domain covers a volume of $(1000\,h^{-1}\,\mathrm{Mpc})^3$ and evolves close to two trillion particles, corresponding to a mass resolution of $m_p \sim 4.6 \times 10^7\,h^{-1}\,M_\odot$. These specifications enable comp...
Conference Paper
Chapter
Electron microscopy (EM) enables the reconstruction of neural circuits at the level of individual synapses, which has been transformative for scientific discoveries. However, due to the complex morphology, an accurate reconstruction of cortical axons has become a major challenge. Worse still, there is no publicly available large-scale EM dataset fr...
Preprint
Full-text available
In this paper we introduce the Farpoint simulation, the latest member of the Hardware/Hybrid Accelerated Cosmology Code (HACC) gravity-only simulation family. The domain covers a volume of (1000$h^{-1}$Mpc)$^3$ and evolves close to two trillion particles, corresponding to a mass resolution of $m_p\sim 4.6\cdot 10^7 h^{-1}$M$_\odot$. These specifica...
Preprint
The synapse is a central player in the nervous system serving as the key structure that permits the relay of electrical and chemical signals from one neuron to another. The anatomy of the synapse contains important information about the signals and the strength of signal it transmits. Because of their small size, however, electron microscopy (EM) i...
Preprint
Full-text available
Electron microscopy (EM) enables the reconstruction of neural circuits at the level of individual synapses, which has been transformative for scientific discoveries. However, due to the complex morphology, an accurate reconstruction of cortical axons has become a major challenge. Worse still, there is no publicly available large-scale EM dataset fr...
Preprint
Full-text available
Massive upgrades to science infrastructure are driving data velocities upwards while stimulating adoption of increasingly data-intensive analytics. While next-generation exascale supercomputers promise strong support for I/O-intensive workflows, HPC remains largely untapped by live experiments, because data transfers and disparate batch-queueing po...
Preprint
Full-text available
Scientific workflows have been used almost universally across scientific domains, and have underpinned some of the most significant discoveries of the past several decades. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upco...
Article
Full-text available
We describe the simulated sky survey underlying the second data challenge (DC2) carried out in preparation for analysis of the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST) by the LSST Dark Energy Science Collaboration (LSST DESC). Significant connections across multiple science domains will be a hallmark of LSST; the DC2 program...
Article
The Last Journey is a large-volume, gravity-only, cosmological N-body simulation evolving more than 1.24 trillion particles in a periodic box with a side length of 5.025 Gpc. It was implemented using the HACC simulation and analysis framework on the BG/Q system Mira. The cosmological parameters are chosen to be consistent with the results from the...
Preprint
Full-text available
In preparation for cosmological analyses of the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST), the LSST Dark Energy Science Collaboration (LSST DESC) has created a 300 deg$^2$ simulated survey as part of an effort called Data Challenge 2 (DC2). The DC2 simulated sky survey, in six optical bands with observations following a refer...
Preprint
Full-text available
We present a fully modular and scalable software pipeline for processing electron microscope (EM) images of brain slices into 3D visualization of individual neurons and demonstrate an end-to-end segmentation of a large EM volume using a supercomputer. Our pipeline scales multiple packages used by the EM community with minimal changes to the origina...
Preprint
Full-text available
We describe the simulated sky survey underlying the second data challenge (DC2) carried out in preparation for analysis of the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST) by the LSST Dark Energy Science Collaboration (LSST DESC). Significant connections across multiple science domains will be a hallmark of LSST; the DC2 program...
Preprint
The Last Journey is a large-volume, gravity-only, cosmological N-body simulation evolving more than 1.24 trillion particles in a periodic box with a side length of 5.025 Gpc. It was implemented using the HACC simulation and analysis framework on the BG/Q system, Mira. The cosmological parameters are chosen to be consistent with the results from the...
Preprint
We construct an emulator for the halo mass function over group and cluster mass scales for a range of cosmologies, including the effects of dynamical dark energy and massive neutrinos. The emulator is based on the recently completed Mira-Titan Universe suite of cosmological $N$-body simulations. The main set of simulations spans 111 cosmological mo...
Preprint
We introduce the Balsam service to manage high-throughput task scheduling and execution on supercomputing systems. Balsam allows users to populate a task database with a variety of tasks ranging from simple independent tasks to dynamic multi-task workflows. With abstractions for the local resource scheduler and MPI environment, Balsam dynamically p...
Preprint
We describe the Outer Rim cosmological simulation, one of the largest high-resolution N-body simulations performed to date, aimed at promoting science to be carried out with large-scale structure surveys. The simulation covers a volume of $(4.225\,\mathrm{Gpc})^3$ and evolves more than one trillion particles. It was executed on Mira, a BlueGene/Q system at the...
Preprint
Full-text available
We describe the first major public data release from cosmological simulations carried out with Argonne's HACC code. This initial release covers a range of datasets from large gravity-only simulations. The data products include halo information for multiple redshifts, down-sampled particles, and lightcone outputs. We provide data from two very large...
Article
For the first time, an automatically triggered, between-pulse fusion science analysis code was run on-demand at a remotely located supercomputer at Argonne Leadership Computing Facility (ALCF, Lemont, Illinois) in support of in-process experiments being performed at DIII-D (San Diego, California). This represents a new paradigm for combining geogra...
Conference Paper
Large experimental collaborations, such as those at the Large Hadron Collider at CERN, have developed large job management systems running hundreds of thousands of jobs across worldwide computing grids. HPC facilities are becoming more important to these data-intensive workflows and integrating them into the experiment job management system is non-...
Article
Full-text available
The use of high-quality simulated sky catalogs is essential for the success of cosmological surveys. The catalogs have diverse applications, such as investigating signatures of fundamental physics in cosmological observables, understanding the effect of systematic uncertainties on measured signals and testing mitigation strategies for reducing thes...
Article
The high-performance computing centers of the future will expand their roles as service providers, and as the machines scale up, so should the sizes of the communities they serve. National facilities must cultivate their users as much as they focus on operating machines reliably. The authors present five interrelated topic areas that are essential...
Article
Full-text available
HEP's demand for computing resources has grown beyond the capacity of the Grid, and these demands will accelerate with the higher energy and luminosity planned for Run II. Mira, the ten petaFLOPs supercomputer at the Argonne Leadership Computing Facility, is a potentially significant compute resource for HEP research. Through an award of fifty mill...
Article
Full-text available
Demand for Grid resources is expected to double during LHC Run II as compared to Run I; the capacity of the Grid, however, will not double. The HEP community must consider how to bridge this computing gap by targeting larger compute resources and using the available compute resources as efficiently as possible. Argonne's Mira, the fifth fastest sup...
Article
As the LHC moves to higher energies and luminosity, the demand for computing resources increases accordingly and will soon outpace the growth of the Worldwide LHC Computing Grid. To meet this greater demand, event generation Monte Carlo was targeted for adaptation to run on Mira, the supercomputer at the Argonne Leadership Computing Facility. Alpge...
Conference Paper
Large format displays are commonplace for viewing large scientific datasets. These displays often find their way into collaborative spaces, allowing for multiple individuals to be collocated with the display, though multi-modal interaction with the displayed content remains a challenge. We have begun development of a tablet-based interaction mode f...
Article
Full-text available
A number of HEP software packages used by the ATLAS experiment, including GEANT4, ROOT and ALPGEN, have been adapted to run on the IBM Blue Gene supercomputers at the Argonne Leadership Computing Facility. These computers use a non-x86 architecture and have a considerably less rich operating environment than in common use in HEP, but also represent...
Conference Paper
Full-text available
PDACS (Portal for Data Analysis Services for Cosmological Simulations) is a Web-based analysis portal that provides access to large simulations and large-scale parallel analysis tools to the research community. It provides opportunities to access, transfer, manipulate, search, and record simulation data, as well as to contribute applications and ca...
Article
Proton computed tomography (pCT) is an imaging modality that has been in development to support targeted dose delivery in proton therapy. It aims to accurately map the distribution of relative stopping power. Because protons traverse material media in non-linear paths, pCT requires individual proton processing. Image reconstruction then becomes a t...
Conference Paper
Proton computed tomography (pCT) is an imaging modality being developed to support targeted dose delivery in proton therapy. It aims to accurately map the distribution of relative stopping power in the imaged body. Because protons traverse material in non-linear paths, pCT requires individual proton processing and image reconstruction becomes a tim...
Conference Paper
We propose GROPHECY, a GPU performance projection framework that can estimate the performance benefit of GPU acceleration without actual GPU programming or hardware. Users need only to skeletonize pieces of CPU code that are targets for GPU acceleration. Code skeletons are automatically transformed in various ways to mimic tuned GPU codes with char...
Article
Full-text available
Science gateways have dramatically simplified the work required by science communities to run their codes on TeraGrid resources. Gateway development typically spans the duration of a particular grant, with the first production runs occurring some months after the award and concluding near the end of the project. Scientists use gateways as a means t...
Conference Paper
Full-text available
A significant obstacle to building usable, web-based interfaces for computational science in a Grid environment is how to deploy scientific applications on computational resources and expose these applications as web services. To streamline the development of these interfaces, we propose a new application framework that can deliver user-defined sci...
Article
Full-text available
A Science Gateway is a computational web portal that includes a community-developed set of tools, applications, and data customized to enable scientists to run scientific simulations, data analysis, and visualization through their web browsers. The major problem of building a science gateway on a Grid environment such as TeraGrid is how to deploy s...
Article
Full-text available
In multi-party collaborative environments, a group of users can share multiple media streams via IP multicasting. However, despite the efficiency of IP multicast, it is not widely available, and alternative application-layer multicast approaches are introduced. Application-layer multicast is advantageous; however, it incurs additional processing...
Article
Full-text available
The Computer Supported Collaborative Work research community has identified that the technology used to support distributed teams of researchers, such as email, instant messaging, and conferencing environments, is not enough. Building from a list of areas where it is believed technology can help support distributed teams, we have divided our effor...
Conference Paper
Full-text available
The Social Informatics Data Grid (SIDGrid) is a new cyberinfrastructure designed to transform how social and behavioral scientists collect and annotate data, collaborate and share data, and analyze and mine large data repositories. The major design goals for the SIDGrid are to integrate those commonly used social and behavior science tools and prov...
Article
Full-text available
Increasingly massive datasets produced by simulations beg the question: how will we connect this data to the computational and display resources that support visualization and analysis? This question is driving research into new approaches to allocating computational, storage, and network resources. In this paper we explore potential solutions that...
Conference Paper
Full-text available
Problem solving environments (PSEs) are increasingly important for scientific discovery. Today's most challenging problems often require multi-disciplinary teams, the ability to analyze very large amounts of data, and the need to rely on infrastructure built by others rather than reinventing solutions for each science team. The TeraGrid Science Gat...
Article
Full-text available
Connecting expensive and scarce visual data analysis resources to end-users is a major challenge today. We describe a flexible mechanism for meeting this challenge based on commodity compression technologies for streaming video. The advantages of this approach include simplified application development, access to generic client components for viewi...
Conference Paper
Full-text available
Collaboration is often an afterthought to a project or development. In this paper we describe and analyze our experiences in developing collaborative technologies, most often involving the sharing of visual information. We have often developed these in a context that required us to retrofit existing analysis applications with collaboration capabili...
Article
Full-text available
The Social Informatics Data Grid is a new infrastructure designed to transform how social and behavioral scientists collect and annotate data, collaborate and share data, and analyze and mine large data repositories. An important goal of the project is to be compatible with existing databases and tools that support the sharing, storage and retrieva...
Conference Paper
Advanced collaborative computing environments are one of the most important tools for integrating high-performance computers and computations and for interacting with colleagues around the world. However, heterogeneous characteristics such as network transfer rates, computational abilities, and hierarchical systems make the seamless integration of...
Conference Paper
Full-text available
Distributed computing middleware needs to support a wide range of resources, such as diverse software components, various hardware devices, and heterogeneous operating systems and architectures. Current technologies are unable to implement a maintenance-free platform to be compatible with such different computing environments. This situation is pre...
Article
Web services, using WSDL and SOAP and following the WS-I's Basic Profile 1.0, have become the lingua franca for building service-oriented systems. Until recently, the development of tools for Web service took a significant amount of time, hindering the deployment and thus the adoption of Web services as a real technology. With the advent of the Web...
Article
An Access Grid Venue Server provides access to individual Virtual Venues, virtual spaces where users can collaborate using the Access Grid Venue Client software. This manual describes the Venue Server component of the Access Grid Toolkit, version 2.3. Covered here are the basic operations of starting a venue server, modifying its configuration, and...

## Citations

... Many projects have developed customized solutions for linking various scientific instruments with HPC, for example in biomedicine [55], environmental science [56,57], and disaster response [58,59]; using HPC to analyze large data [60]; and providing on-demand access to HPC [61,62]. Such applications have motivated the development of specialized interfaces for remote job submission [63,64] and for managing workloads across systems [65,66,67,68]. The LBNL superfacility project has studied requirements for linking instruments with HPC [69] and proposed an OAuth-based asynchronous API [70] that is similar to our action provider interface. ...
... "Anton" [615] is a family of supercomputers designed from scratch to solve precisely this one problem; recall from Fig. 5 that MD is one subset of the broader spatiotemporal space needed to simulate dynamics of our world. Recently, researchers at Argonne National Laboratory developed an AI-driven simulation framework for solving the same MD problem, yielding 50x speedup in time to solutions over the traditional HPC method [616]. And some work suggests these approaches need not be mutually exclusive: it has been shown in the context of computational fluid dynamics that traditional HPC workloads can be run alongside AI training to provide accelerated data-feed paths [617]. ...
... Medical image datasets are generally annotated by professional physicians (Demner-Fushman et al., 2016; Almazroa et al., 2017; Johnson et al., 2019; Zhang et al., 2019; Lin et al., 2021; Ma et al., 2021; Wei et al., 2021). To construct an annotated dataset for edge detection or image segmentation tasks, annotators often need to annotate points and connect them into an object outline. ...
... AlignTK (Bock et al., 2011) implements scalable deformable 2D stitching and serial section alignment for large serial section datasets using local cross-correlation. An end-to-end pipeline to perform volume assembly and segmentation using existing tools was developed by R. Vescovi, 2020 and was designed to run on varied computational systems. The pipeline was shown to process smaller datasets through supercomputers efficiently. ...
... Next we use the Dark Energy Science Collaboration's (DESC; Abolfathi et al. 2021) Core Cosmology Library (CCL; Chisari et al. 2019) to predict theory values for the convergence and clustering power spectra $C_\ell$. CCL takes the redshift distributions n(z) and cosmological parameters and computes $C_\ell$, which we pass with the n(z) to the Full-sky Lognormal Astro-fields Simulation Kit (FLASK; Xavier et al. 2016) to simulate the convergence and density maps. ...
... Meanwhile, numerical simulation is essential to understand or interpret the large body of observational data because of the nonlinear nature of cosmic structure formation and evolution. Recent cosmological simulations are able to predict not only the abundance and clustering of galaxies and their dark matter haloes (Boylan-Kolchin et al. 2009; Klypin et al. 2011; Prada et al. 2012; Angulo et al. 2012; Gao et al. 2012; Heitmann et al. 2015; Ishiyama et al. 2015; Klypin et al. 2016; Habib et al. 2016; Makiya et al. 2016; Potter et al. 2017; Garrison et al. 2018; Heitmann et al. 2019; Vogelsberger et al. 2020; Heitmann et al. 2021; Maksimova et al. 2021; Frontiere et al. 2021; Ishiyama et al. 2021; Angulo & Hahn 2022), but also their internal properties, for example, morphological types, metallicity, and some gaseous properties (Vogelsberger et al. 2014; Schaye et al. 2015; Sijacki et al. 2015; McAlpine et al. 2016). ...
... Emulators also have the advantage of being able to model complex trends in the data, so that they are being progressively constructed for many more kinds of large-scale structure statistics. For instance, emulators have been used in modelling the nonlinear matter power spectrum (Lawrence et al. 2017; DeRose et al. 2019; Euclid Collaboration 2019; Angulo et al. 2021), even beyond ΛCDM (Winther et al. 2019; Ramachandra et al. 2021; Arnold et al. 2021) and for baryonic effects (Aricò et al. 2021b; Giri and Schneider 2021), the galaxy power spectrum and correlation function (Kwan et al. 2015; Zhai et al. 2019), weak lensing peak counts and power spectra (Petri et al. 2015), the 21-cm power spectrum (Jennings et al. 2019), the halo mass function (Bocquet et al. 2020), and halo clustering statistics (Nishimichi et al. 2019; Kobayashi et al. 2020). Traditionally, emulators have been built with Gaussian processes (GP), but more recently feed-forward neural networks are gaining popularity as they can handle larger datasets and a high number of dimensions. ...
... Recently, a variety of scientific applications adopt deep learning methods to solve their classification and regression problems [1]-[4]. However, training deep and large networks is an extremely time-consuming task that can take hours or even days. ...
... The CosmoDC2 extragalactic catalog (CosmoDC2, Kovacs et al. (2022b)) was produced using a data-driven approach to semianalytic modeling of the galaxy population, applied to the large-volume 'Outer Rim' simulation (Heitmann et al. 2019). By using empirical methods for comparing the simulated galaxy population with real observational datasets, the results of the semianalytic model Galacticus (Benson 2012) were resampled (Hearin et al. 2020) to provide a galaxy population that met the simulation specifications for use of the catalog for cosmological science cases, including realistic effects such as variation in galaxy properties with redshift, and an appropriate level of galaxy blending. ...