Elizabeth R. Jessup’s research while affiliated with University of Colorado Boulder and other places


Publications (49)


Making Root Cause Analysis Feasible for Large Code Bases: A Solution Approach for a Climate Model
  • Conference Paper

June 2019 · 23 Reads · 12 Citations · Dorit M. Hammerling · [...] · Thomas Hauser

Large-scale simulation codes that model complicated science and engineering applications typically have huge and complex code bases. For such simulation codes, where bit-for-bit comparisons are too restrictive, finding the source of statistically significant discrepancies (e.g., from a previous version, alternative hardware or supporting software stack) in output is non-trivial at best. Although there are many tools for program comprehension through debugging or slicing, few (if any) scale to a model as large as the Community Earth System Model (CESM™), which consists of more than 1.5 million lines of Fortran code. Currently for the CESM, we can easily determine whether a discrepancy exists in the output using a now well-established statistical consistency testing tool. However, this tool provides no information as to the possible cause of the detected discrepancy, leaving developers in a seemingly impossible (and frustrating) situation. Therefore, our aim in this work is to provide the tools to enable developers to trace a problem detected through the CESM output to its source. To this end, our strategy is to reduce the search space for the root cause(s) to a tractable size via a series of techniques that include creating a directed graph of internal CESM variables, extracting a subgraph (using a form of hybrid program slicing), partitioning into communities, and ranking nodes by centrality. Runtime variable sampling then becomes feasible in this reduced search space. We demonstrate the utility of this process on multiple examples of CESM simulation output by illustrating how sampling can be performed as part of an efficient parallel iterative refinement procedure to locate error sources, including sensitivity to CPU instructions. By providing CESM developers with tools to identify and understand the reason for statistically distinct output, we have positively impacted the CESM software development cycle and, in particular, its focus on quality assurance.
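To make the search-space reduction concrete, the sketch below walks through the same chain of steps named in the abstract (dependence graph, slice, communities, centrality ranking) on a made-up variable graph. It is an illustrative reconstruction using networkx, not the authors' tooling; the variable names, the modularity-based community algorithm, and the top-k cutoff are assumptions.

```python
# Hypothetical sketch (not the paper's implementation): reduce a variable-dependence
# graph to a small set of variables worth sampling at runtime.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def candidate_sample_points(edges, suspect_outputs, k=3):
    """edges: (src_var, dst_var) pairs meaning dst_var is computed from src_var.
    suspect_outputs: output variables flagged by the consistency test."""
    g = nx.DiGraph(edges)

    # Slice: keep only variables that can influence the flagged outputs.
    keep = set(suspect_outputs)
    for out in suspect_outputs:
        keep |= nx.ancestors(g, out)
    sub = g.subgraph(keep).copy()

    # Partition the slice into communities (modularity-based here; the paper's
    # community-detection choice may differ).
    communities = greedy_modularity_communities(sub.to_undirected())

    # Rank variables within each community by eigenvector in-centrality and
    # propose the top-k per community as runtime sampling points.
    centrality = nx.eigenvector_centrality(sub, max_iter=1000)
    proposals = []
    for comm in communities:
        ranked = sorted(comm, key=lambda v: centrality[v], reverse=True)
        proposals.append(ranked[:k])
    return proposals

# Toy usage with invented variable names and a feedback edge across time steps.
edges = [("state_a", "state_b"), ("state_b", "t_out"), ("forcing_c", "t_out"),
         ("t_out", "state_a"), ("forcing_d", "forcing_c")]
print(candidate_sample_points(edges, ["t_out"]))
```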


Figure 1: Process flow schematic of our methods.
Figure 2: Example statement as source code and its directed graph representation.
Figure 3: Converting Fortran files into a metagraph.
Figure 5: RAND-MT first iteration. Variables computed using numbers generated by the Mersenne Twister PRNG are larger red nodes. Larger orange nodes indicate those with the largest eigenvector in-centrality.
Figure 6: RAND-MT second iteration. Variables computed using numbers generated by the Mersenne Twister PRNG are larger red nodes. Subfigure (a) is the result of algorithm 5.4 step 8a (no different values detected), subfigure (b) colors members of each community detected by step 5, and subfigure (c) represents the output of step 7 for the community containing the bugs. Larger orange nodes indicate those with the largest eigenvector in-centrality that can be sampled at runtime. We choose three here given the small size of the subgraph.


Making root cause analysis feasible for large code bases: a solution approach for a climate model
  • Preprint
  • File available

October 2018 · 218 Reads

For large-scale simulation codes with huge and complex code bases, where bit-for-bit comparisons are too restrictive, finding the source of statistically significant discrepancies (e.g., from a previous version, alternative hardware or supporting software stack) in output is non-trivial at best. Although there are many tools for program comprehension through debugging or slicing, few (if any) scale to a model as large as the Community Earth System Model (CESM; trademarked), which consists of more than 1.5 million lines of Fortran code. Currently for the CESM, we can easily determine whether a discrepancy exists in the output using a now well-established statistical consistency testing tool. However, this tool provides no information as to the possible cause of the detected discrepancy, leaving developers in a seemingly impossible (and frustrating) situation. Therefore, our aim in this work is to provide the tools to enable developers to trace a problem detected through the CESM output to its source. To this end, our strategy is to reduce the search space for the root cause(s) to a tractable size via a series of techniques that include creating a directed graph of internal CESM variables, extracting a subgraph (using a form of hybrid program slicing), partitioning into communities, and ranking nodes by centrality. Runtime variable sampling then becomes feasible in this reduced search space. We demonstrate the utility of this process on multiple examples of CESM simulation output by illustrating how sampling can be performed as part of an efficient parallel iterative refinement procedure to locate error sources, including sensitivity to CPU instructions. By providing CESM developers with tools to identify and understand the reason for statistically distinct output, we have positively impacted the CESM software development cycle and, in particular, its focus on quality assurance.


Table 2. These CLM experiments show agreement between CAM-ECT and UF-CAM-ECT as well as with the expected outcome. The CAM-ECT column is the result of a single ECT test on three runs. The UF-CAM-ECT column represents EET failure rates from 30 runs.
Figure 3. Box plot of EET failure rate distributions as a function of ensemble size. The distributions are generated by randomly selecting a number of simulations (ensemble size) from a set of 801 simulations to compute PC loadings. From the remaining set, 30 simulations are chosen at random. These simulations are projected into the PC space of the ensemble and evaluated via EET. For each ensemble size, 100 ensembles are created and 100 experimental sets are selected and evaluated. Thus each distribution contains 10 000 EET results (40 600 000 total tests per distribution). The red horizontal line indicates the chosen false positive rate of 0.5 %.
Table 3. These experiments represent disagreement between UF-CAM-ECT and CAM-ECT. Shown are the EET failure rates from 30 runs.
Figure 4. Each box plot represents the statistical distribution of the difference between the global mean of each variable and the unperturbed ensemble global mean, scaled by the unperturbed ensemble global mean, for both the 30 ensemble members and 30 CLM_ALBICE_00 members. The plots on the left (a) are generated from nine-time-step simulations, while those on the right (b) are from one simulation year.
Nine time steps: ultra-fast statistical consistency testing of the Community Earth System Model (pyCECT v3.0)

February 2018 · 102 Reads · 24 Citations

The Community Earth System Model Ensemble Consistency Test (CESM-ECT) suite was developed as an alternative to requiring bitwise identical output for quality assurance. This objective test provides a statistical measurement of consistency between an accepted ensemble created by small initial temperature perturbations and a test set of CESM simulations. In this work, we extend the CESM-ECT suite with an inexpensive and robust test for ensemble consistency that is applied to Community Atmosphere Model (CAM) output after only nine model time steps. We demonstrate that adequate ensemble variability is achieved with instantaneous variable values at the ninth step, despite rapid perturbation growth and heterogeneous variable spread. We refer to this new test as the Ultra-Fast CAM Ensemble Consistency Test (UF-CAM-ECT) and demonstrate its effectiveness in practice, including its ability to detect small-scale events and its applicability to the Community Land Model (CLM). The new ultra-fast test facilitates CESM development, porting, and optimization efforts, particularly when used to complement information from the original CESM-ECT suite of tools.
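As a rough illustration of the kind of PCA-based consistency check the CESM-ECT family performs, the sketch below builds PC loadings from an accepted ensemble and flags test runs whose PC scores fall far outside the ensemble distribution. The thresholds, scoring rule, and failure criterion are simplified stand-ins and do not reproduce pyCECT.

```python
# Schematic of a PCA-based ensemble consistency check in the spirit of CESM-ECT.
# The variable set and all numbers below are made up.
import numpy as np

def ect_like_test(ensemble, test_runs, n_pcs=10, z_limit=2.0, max_failing_pcs=3):
    """ensemble: (n_runs, n_vars) global-mean variables from accepted runs.
    test_runs: (n_test, n_vars) values from the runs being evaluated."""
    mu = ensemble.mean(axis=0)
    sigma = ensemble.std(axis=0, ddof=1)
    sigma[sigma == 0] = 1.0                      # guard against constant variables
    z_ens = (ensemble - mu) / sigma

    # PC loadings computed from the accepted ensemble.
    _, _, vt = np.linalg.svd(z_ens, full_matrices=False)
    loadings = vt[:n_pcs]                        # (n_pcs, n_vars)

    scores_ens = z_ens @ loadings.T              # ensemble scores per PC
    score_std = scores_ens.std(axis=0, ddof=1)

    results = []
    for run in test_runs:
        scores = ((run - mu) / sigma) @ loadings.T
        failing = np.sum(np.abs(scores) > z_limit * score_std)
        results.append("fail" if failing > max_failing_pcs else "pass")
    return results

# Toy usage: 100 accepted runs, 40 variables, 3 test runs.
rng = np.random.default_rng(0)
ens = rng.normal(size=(100, 40))
tests = rng.normal(size=(3, 40))
print(ect_like_test(ens, tests))
```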


Figure 1. Representation of effects of initial CAM temperature perturbation over 11 time steps (including t = 0). CAM variables are listed on the vertical axis, and the horizontal axis records the simulation time step. The color bar designates equality of the corresponding variables between the unperturbed and perturbed simulations' area weighted global means after being rounded to n significant digits (n is the color) at each time step. Time steps where the corresponding variable was not computed (subcycled variables) are colored black. White indicates equality of greater than 9 significant digits (i.e. 10-17). Red variable names are not used by UF-CAM-ECT.
Nine time steps: ultra-fast statistical consistency testing of the Community Earth System Model (pyCECT v3.0)

April 2017 · 56 Reads · 3 Citations

Geoscientific Model Development Discussions

The Community Earth System Model Ensemble Consistency Test (CESM-ECT) suite was developed as an alternative to requiring bitwise identical output for quality assurance. This objective test provides a statistical measurement of consistency between an accepted ensemble created by small initial temperature perturbations and a test set of CESM simulations. In this work, we extend the CESM-ECT suite by the addition of an inexpensive and robust test for ensemble consistency that is applied to Community Atmosphere Model (CAM) output after only nine model time steps. We demonstrate that adequate ensemble variability is achieved with instantaneous variable values at the ninth step, despite rapid perturbation growth and heterogeneous variable spread. We refer to this new test as the Ultra-Fast CAM Ensemble Consistency Test (UF-CAM-ECT) and demonstrate its effectiveness in practice, including its ability to detect small-scale events and its applicability to the Community Land Model (CLM). The new ultra-fast test facilitates CESM development, porting, and optimization efforts, particularly when used to complement information from the original CESM-ECT suite of tools.


Figure 1: Exhaustive failure percentages for code modifications from Sect. 3 against the original size-151 ensembles from [2].
Figure 3: EET failure percentage grouped by code modification experiment (Sect. 3). Colors and hatching indicate the ensemble used in each comparison. For example, sz300-r1 is 100 Intel-r1, 100 GNU-r1, and 100 PGI-r1 combined. The failure rates of these experiments against the sz453 ensembles are close to 0.5%.
Towards Characterizing the Variability of Statistically Consistent Community Earth System Model Simulations

December 2016 · 150 Reads · 27 Citations

Procedia Computer Science

Large, complex codes such as earth system models are in a constant state of development, requiring frequent software quality assurance. The recently developed Community Earth System Model (CESM) Ensemble Consistency Test (CESM-ECT) provides an objective measure of statistical consistency for new CESM simulation runs, which has greatly facilitated error detection and rapid feedback for model users and developers. CESM-ECT determines consistency based on an ensemble of simulations that represent the same earth system model. Its statistical distribution embodies the natural variability of the model. Clearly the composition of the employed ensemble is critical to CESM-ECT's effectiveness. In this work we examine whether the composition of the CESM-ECT ensemble is adequate for characterizing the variability of a consistent climate. To this end, we introduce minimal code changes into CESM that should pass the CESM-ECT, and we evaluate the composition of the CESM-ECT ensemble in this context. We suggest an improved ensemble composition that better captures the accepted variability induced by code changes, compiler changes, and optimizations, thus more precisely facilitating the detection of errors in the CESM hardware or software stack as well as enabling more in-depth code optimization and the adoption of new technologies.
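The experiment below is a toy version of the kind of ensemble-composition study described above: it builds accepted ensembles from one or several "compiler" populations and measures how often held-out accepted runs fail a simple consistency check. The compiler labels, offsets, and z-score test are invented for illustration and are much cruder than CESM-ECT.

```python
# Illustrative only: estimate how often held-out accepted runs "fail" a consistency
# test when the accepted ensemble is drawn from different compiler mixes.
import numpy as np

rng = np.random.default_rng(1)

def simple_consistency_fail(ensemble, run, z_limit=3.0):
    """Crude stand-in for an ECT-style test: any variable more than z_limit
    ensemble standard deviations from the ensemble mean counts as a failure."""
    mu, sd = ensemble.mean(axis=0), ensemble.std(axis=0, ddof=1) + 1e-12
    return np.any(np.abs((run - mu) / sd) > z_limit)

def failure_rate(runs_by_compiler, ensemble_compilers, n_trials=200, n_members=100):
    """Build ensembles only from `ensemble_compilers`, evaluate held-out runs
    from all compilers, and return the empirical failure rate."""
    pool = np.vstack([runs_by_compiler[c] for c in ensemble_compilers])
    everything = np.vstack(list(runs_by_compiler.values()))
    fails = 0
    for _ in range(n_trials):
        idx = rng.choice(len(pool), size=n_members, replace=False)
        ens = pool[idx]
        test = everything[rng.integers(len(everything))]
        fails += simple_consistency_fail(ens, test)
    return fails / n_trials

# Made-up data: 150 runs per compiler, 20 variables, with a tiny per-compiler offset.
runs = {c: rng.normal(loc=off, size=(150, 20))
        for c, off in [("intel", 0.0), ("gnu", 0.02), ("pgi", -0.02)]}
print("single-compiler ensemble:", failure_rate(runs, ["intel"]))
print("mixed ensemble:          ", failure_rate(runs, ["intel", "gnu", "pgi"]))
```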


Optimizing Weather Model Radiative Transfer Physics for Intel’s Many Integrated Core (MIC) Architecture

December 2016 · 31 Reads · 13 Citations

Parallel Processing Letters

Large numerical weather prediction (NWP) codes such as the Weather Research and Forecast (WRF) model and the NOAA Nonhydrostatic Multiscale Model (NMM-B) port easily to Intel's Many Integrated Core (MIC) architecture. But for NWP to significantly realize MIC's one- to two-TFLOP/s peak computational power, we must expose and exploit thread and fine-grained (vector) parallelism while overcoming memory system bottlenecks that starve floating-point performance. We report on our work to improve the Rapid Radiative Transfer Model (RRTMG), responsible for 10-20 percent of total NMM-B run time. We isolated a standalone RRTMG benchmark code and workload from NMM-B and then analyzed performance using hardware performance counters and scaling studies. We restructured the code to improve vectorization, thread parallelism, locality, and thread contention. The restructured code ran three times faster than the original on MIC and, also importantly, 1.3x faster than the original on the host Xeon Sandy Bridge.
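The restructuring described above is Fortran- and hardware-specific, but the core idea of batching work so that a long, contiguous dimension can be vectorized is easy to show in miniature. The NumPy toy below is only an analogy under that assumption; it is not the RRTMG code, and the formula is invented.

```python
# Toy analogy (NumPy, not the authors' Fortran): process many atmospheric columns
# together so the computation becomes one contiguous, vectorizable expression,
# instead of handling one column and one level at a time.
import numpy as np

def per_column(tau, w0):
    """Column-at-a-time: short inner trip counts, poor vector utilization."""
    ncol, nlev = tau.shape
    out = np.empty_like(tau)
    for c in range(ncol):
        for k in range(nlev):
            out[c, k] = np.exp(-tau[c, k]) * (1.0 - w0[c, k])
    return out

def batched(tau, w0):
    """Batched over columns: one vectorizable array expression."""
    return np.exp(-tau) * (1.0 - w0)

rng = np.random.default_rng(2)
tau = rng.random((1024, 60))   # made-up optical depths
w0 = rng.random((1024, 60))    # made-up single-scattering albedos
assert np.allclose(per_column(tau, w0), batched(tau, w0))
```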


A survey on software methods to improve the energy efficiency of parallel computing

September 2016 · 304 Reads · 46 Citations

The International Journal of High Performance Computing Applications

Energy consumption is one of the top challenges for achieving the next generation of supercomputing. Codesign of hardware and software is critical for improving energy efficiency (EE) for future large-scale systems. Many architectural power-saving techniques have been developed, and most hardware components are approaching physical limits. Accordingly, parallel computing software, including both applications and systems, should exploit power-saving hardware innovations and manage efficient energy use. In addition, new power-aware parallel computing methods are essential to decrease energy usage further. This article surveys software-based methods that aim to improve EE for parallel computing. It reviews the methods that exploit the characteristics of parallel scientific applications, including load imbalance and mixed precision of floating-point (FP) calculations, to improve EE. In addition, this article summarizes widely used methods to improve power usage at different granularities, such as the whole system and per application. In particular, it describes the most important techniques to measure and to achieve energy-efficient usage of various parallel computing facilities, including processors, memories, and networks. Overall, this article reviews the state-of-the-art of energy-efficient methods for parallel computing to motivate researchers to achieve optimal parallel computing under a power budget constraint.
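One of the software techniques the survey covers, mixed-precision floating-point computation, can be sketched briefly: do the bulk of the arithmetic in single precision and correct in double precision. The example below is a generic mixed-precision iterative refinement sketch, not code from the article, and a real implementation would reuse a single low-precision factorization rather than re-solving at each step.

```python
# Generic mixed-precision iterative refinement sketch: single-precision solves,
# double-precision residuals and corrections.
import numpy as np

def mixed_precision_solve(a, b, iters=5):
    a32 = a.astype(np.float32)
    x = np.linalg.solve(a32, b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b - a @ x                                    # residual in float64
        dx = np.linalg.solve(a32, r.astype(np.float32))  # cheap float32 correction
        x += dx.astype(np.float64)
    return x

rng = np.random.default_rng(3)
a = rng.random((200, 200)) + 200 * np.eye(200)           # well-conditioned test matrix
b = rng.random(200)
x = mixed_precision_solve(a, b)
print(np.linalg.norm(a @ x - b))                         # residual near double precision
```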


Fig. 2. Scopes for counter association.
Table 2. The data movement for a single iteration of the preconditioned conjugate gradient solver in POP using the test grid. The values of W_SL^P and W_SL^M are in Kbytes.
Fig. 3. Translation of the code in Figure 2.
Table 4. Source code lines for the MATLAB prototype.
Table 5. Barotropic execution time in seconds for 20 timesteps of POP using the test grid on a single processor.
SLAMM – Automating Memory Analysis for Numerical Algorithms

September 2010 · 56 Reads · 1 Citation

Electronic Notes in Theoretical Computer Science

Memory efficiency is overtaking the number of floating-point operations as a performance determinant for numerical algorithms. Integrating memory efficiency into an algorithm from the start is made easier by computational tools that can quantify its memory traffic. The Sparse Linear Algebra Memory Model (SLAMM) is implemented by a source-to-source translator that accepts a MATLAB specification of an algorithm and adds code to predict memory traffic. Our tests on numerous small kernels and complete implementations of algorithms for solving sparse linear systems show that SLAMM accurately predicts the amount of data loaded from the memory hierarchy to the L1 cache to within 20% error on three different compute platforms. SLAMM allows us to evaluate the memory efficiency of particular choices rapidly during the design phase of an iterative algorithm, and it provides an automated mechanism for tuning existing implementations. It reduces the time to perform a priori memory analysis from as long as several days to 20 minutes.
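The flavor of accounting that SLAMM automates can be imitated by hand for a single kernel. The function below gives a back-of-the-envelope lower bound on memory traffic for a CSR sparse matrix-vector product; the byte sizes and the assumption that the vectors stay cache-resident are simplifications, and SLAMM's generated predictions are considerably more detailed.

```python
# Back-of-the-envelope memory-traffic estimate for a CSR sparse matrix-vector
# product y = A*x (hand-written illustration, not SLAMM's generated model).
def csr_spmv_traffic(nrows, nnz, val_bytes=8, idx_bytes=4):
    """Bytes that one CSR SpMV must at least stream through the memory hierarchy,
    assuming the x and y vectors remain cache-resident (an optimistic bound)."""
    vals = nnz * val_bytes            # nonzero values
    colidx = nnz * idx_bytes          # column indices
    rowptr = (nrows + 1) * idx_bytes  # row pointer array
    vectors = 2 * nrows * val_bytes   # read x once, write y once
    return vals + colidx + rowptr + vectors

# Example: a 1M x 1M matrix with 10M nonzeros moves roughly 0.14 GB per SpMV.
print(csr_spmv_traffic(nrows=10**6, nnz=10**7) / 1e9, "GB")
```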


Figure 6: L1 misses per flop for the fused outer loops and fully fused versions of multiple matrix-vector multiplies. nvecs is the number of multiplies.
Figure 8: Fully Fused Assembly 
Understanding Memory Effects in the Automated Generation of Optimized Matrix Algebra Kernels

May 2010 · 53 Reads · 2 Citations

Procedia Computer Science

Efficient implementation of matrix algebra is important to the performance of many large and complex physical models. One important tuning technique is loop fusion, which can reduce the amount of data moved between memory and the processor. We have developed the Build to Order (BTO) compiler to automate loop fusion for matrix algebra kernels. In this paper, we present BTO's analytic memory model, which substantially reduces the number of loop fusion options considered by the compiler. We introduce an example that motivates the inclusion of registers in the model. We demonstrate how the model's modular design facilitates the addition of register allocation to the model's set of memory components, improving its accuracy.
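A minimal example of the loop fusion that BTO automates: two matrix-vector products sharing the matrix A. Unfused, A is streamed from memory twice; fused, each row is loaded once and reused for both results. The Python version below is for exposition only (BTO emits low-level kernels), and the explicit loop is deliberately naive.

```python
# Toy illustration of loop fusion for two matrix-vector products that share A.
import numpy as np

def unfused(a, x, w):
    y = a @ x          # first pass over A
    z = a @ w          # second pass over A
    return y, z

def fused(a, x, w):
    n = a.shape[0]
    y = np.empty(n)
    z = np.empty(n)
    for i in range(n):      # one pass over A's rows
        row = a[i]
        y[i] = row @ x      # the row is reused while still in cache/registers
        z[i] = row @ w
    return y, z

rng = np.random.default_rng(4)
a = rng.random((300, 300))
x, w = rng.random(300), rng.random(300)
assert all(np.allclose(u, v) for u, v in zip(unfused(a, x, w), fused(a, x, w)))
```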



Citations (23)


... In future work we plan to develop better tools for analyzing test failures when they do occur. The RUANDA tool (Milroy et al., 2019; Ahn et al., 2021) makes significant progress toward the goal of providing insight into the cause of a test failure, but more work is needed to better enable its use in practice. Deeper investigation into the effects of FMA or precision changes will help further our understanding of how non-scientific model changes affect output distributions. ...

Reference:

The ensemble consistency test: from CESM to MPAS and beyond
Making Root Cause Analysis Feasible for Large Code Bases: A Solution Approach for a Climate Model
  • Citing Conference Paper
  • June 2019

... Keeping the simulations short is motivated by the cost of long ensemble simulations. The works of Baker et al. and Milroy et al., for instance, focus on simulation lengths of 1 year or shorter (Baker et al., 2015; Milroy et al., 2018). ...

Nine time steps: ultra-fast statistical consistency testing of the Community Earth System Model (pyCECT v3.0)

... The annual results for Summit in Table 4 were produced by running sets of 30 yearly simulations of CESM with the same configuration as in [2]. We also ran 30 "ultra-fast" simulations (nine model time steps) on Summit with the same configuration as in [12]. Both the annual and ultra-fast runs were compiled with the Intel 17 and Intel MPI 17 compiler suites. ...

Nine time steps: ultra-fast statistical consistency testing of the Community Earth System Model (pyCECT v3.0)

Geoscientific Model Development Discussions

... In one case, a modern radiation scheme was made roughly 3 times faster by combining a refactoring of the solver with replacing the gas optics module with a NN version (Ukkonen et al., 2020). In another, refactoring the RRTMG radiation scheme also gave a threefold speed-up on targeted Intel hardware (Michalakes et al., 2016). In many legacy codes, the baseline performance may be much worse (Michalakes et al., 2016). ...

Optimizing Weather Model Radiative Transfer Physics for Intel’s Many Integrated Core (MIC) Architecture
  • Citing Article
  • December 2016

Parallel Processing Letters

... While this correlation is helpful, it is not always typical. In scenarios involving dynamically changing frequency, parallelization, and memory-bound workloads, energy consumption can exhibit different trends compared to execution time [19]. Therefore, the relationship between these two factors warrants further investigation. ...

A survey on software methods to improve the energy efficiency of parallel computing
  • Citing Article
  • September 2016

The International Journal of High Performance Computing Applications

... The term "consistency" emphasizes the fact that some statistical form of correctness is still possible without BFB equivalence. A major practical improvement occurred by moving to ultrafast (hence UF-ECT) model runs, as described in Milroy et al. (2018). For UF-ECT, models were run for just 4.5 simulation hours (or nine time steps) compared to the 1-year-long runs in the original ECT work (in this work we will focus exclusively on the UF-ECT method). ...

Towards Characterizing the Variability of Statistically Consistent Community Earth System Model Simulations

Procedia Computer Science

... As pointed out by [18], the tridiagonal problem can be the computational bottleneck for large problems, taking nearly 70–80% of the total time to solve the entire dense problem. As a result, numerous methods exist for the numerical computation of the eigenvalues of a real tridiagonal matrix to high accuracy, see, e.g., [2,10,11]. Finding the eigenvalues of a symmetric tridiagonal matrix typically requires O(N^2) operations [8], although fast algorithms exist which require O(N ln N) [6]. ...

Solving the Symmetric Tridiagonal Eigenvalue Problem on the Hypercube
  • Citing Article
  • March 1990

SIAM Journal on Scientific and Statistical Computing
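As a quick numerical companion to the tridiagonal discussion above, the snippet below computes the eigenvalues of a symmetric tridiagonal matrix with SciPy's specialized solver and checks them against a general dense solver. The test matrix (the 1-D Laplacian stencil) is arbitrary.

```python
# Eigenvalues of a symmetric tridiagonal matrix via a specialized solver,
# cross-checked against a general dense Hermitian eigensolver.
import numpy as np
from scipy.linalg import eigh_tridiagonal, eigh

n = 500
d = np.full(n, 2.0)        # diagonal
e = np.full(n - 1, -1.0)   # off-diagonal (classic 1-D Laplacian)

w_tri = eigh_tridiagonal(d, e, eigvals_only=True)
w_dense = eigh(np.diag(d) + np.diag(e, 1) + np.diag(e, -1), eigvals_only=True)
print(np.max(np.abs(w_tri - w_dense)))   # should agree to roundoff
```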

... As a brief aside, for real matrices, assuming standard matrix multiplication algorithms, the highest-order terms for the operation counts of dense Gaussian elimination and an MRRR-based [34] Hermitian eigensolver are (2/3)n^3 and (10/3)n^3 [35]. But the coefficient for the Hermitian eigensolver is misleadingly small, as the initial phase of a Hermitian eigensolver traditionally involves a unitary reduction to Hermitian tridiagonal form that, due to only modest potential for data reuse, executes significantly less efficiently than traditional dense factorizations. ...

Toward an Efficient Parallel Eigensolver for Dense Symmetric Matrices
  • Citing Article
  • January 1998

SIAM Journal on Scientific Computing

... The development of numerical processes was greatly aided by Newton and Leibniz's invention of calculus, which produced precise mathematical representations of physical reality first, and then of systems in other disciplines such as engineering, healthcare, and commerce. Typically, these mathematical models cannot be solved explicitly; therefore, approximate solutions must be obtained using numerical approaches [3][4][5]. In this paper, Python programming is used to solve systems of linear equations by the Gauss elimination and Gauss-Jordan methods. ...

An Introduction to High-Performance Scientific Computing
  • Citing Article
  • December 1996

Physics Today
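Since the citing text above mentions solving linear systems in Python by Gaussian elimination, here is a minimal Gaussian-elimination-with-partial-pivoting sketch for reference. It is illustrative only and not taken from the cited textbook.

```python
# Minimal Gaussian elimination with partial pivoting, followed by back substitution.
import numpy as np

def gauss_solve(a, b):
    a = a.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    for k in range(n - 1):
        p = k + np.argmax(np.abs(a[k:, k]))   # partial pivoting: largest pivot in column
        if p != k:
            a[[k, p]] = a[[p, k]]
            b[[k, p]] = b[[p, k]]
        for i in range(k + 1, n):
            m = a[i, k] / a[k, k]
            a[i, k:] -= m * a[k, k:]
            b[i] -= m * b[k]
    x = np.empty(n)
    for i in range(n - 1, -1, -1):            # back substitution
        x[i] = (b[i] - a[i, i + 1:] @ x[i + 1:]) / a[i, i]
    return x

a = np.array([[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]])
b = np.array([8.0, -11.0, -3.0])
print(gauss_solve(a, b))                      # expect approximately [2, 3, -1]
```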