# Hofmann Heike · Iowa State University | ISU · Department of Statistics

Hofmann Heike

PhD

## About

126 publications · 42,073 reads · 3,235 citations

Additional affiliations

January 2002 - present

January 2000 - January 2001

## Publications

When wires are cut, the tool produces striations on the cut surface; as in other forms of forensic analysis, these striation marks are used to connect the evidence to the source that created them. Here, we argue that the practice of comparing two wire cut surfaces introduces complexities not present in better-investigated forensic examination of to...

In the past decade, and in response to the recommendations set forth by the National Research Council Committee on Identifying the Needs of the Forensic Sciences Community (2009), scientists have conducted several black-box studies that attempt to estimate the error rates of firearm examiners. Most of these studies have resulted in vanishingly smal...

Defining error rates for firearms evidence ★ Impact of inconclusive decisions on error rates ★ Predictive probabilities and errors

Statistical inference provides the protocols for conducting rigorous science, but data plots provide the opportunity to discover the unexpected. These disparate endeavors are bridged by visual inference, where a lineup protocol can be employed for statistical testing. Human observers are needed to assess the lineups, typically using a crowd‐sourcin...

It has been approximately 100 years since the very first formal experimental evaluations of statistical charts were conducted. In that time, technological changes have impacted both our charts and our testing methods, resulting in a dizzying array of charts, many different taxonomies to classify graphics, and several different philosophical approac...

Recent advances in microscopy have made it possible to collect 3D topographic data, enabling more precise virtual comparisons based on the collected 3D data as a supplement to traditional comparison microscopy and 2D photography. Automatic comparison algorithms have been introduced for various scenarios, such as matching cartridge cases [1,2] or ma...

Land engraved areas (LEAs) provide evidence to address the same source–different source problem in forensic firearms examination. Collecting 3D images of bullet LEAs requires capturing portions of the neighboring groove engraved areas (GEAs). Analyzing LEA and GEA data separately is imperative to accuracy in automated comparison methods such as the...

The 2009 National Academy of Sciences report found pattern‐evidence disciplines to be rife with subjectivity. In the decade since, machine learning methods have been developed to try to address that issue. By Alicia Carriquiry, Heike Hofmann, Xiao Hui Tai and Susan VanderPlas.

The Statistical Atlases published by the Census Bureau in the late 1800s utilized a number of novel methods for displaying data. In this paper, we examine the use of framed spine and mosaic plots used in two plates of the Statistical Atlas of 1870. We use forensic statistics to recreate the data using available census information, and then use that...

Q-Q plots allow us to assess univariate distributional assumptions by comparing a set of quantiles from the empirical and the theoretical distributions in the form of a scatterplot. To aid in the interpretation of Q-Q plots, reference lines and confidence bands are often added. We can also detrend the Q-Q plot so the vertical comparisons of interes...
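The Q-Q construction and the detrending step described in this abstract can be sketched as follows. This is a minimal illustration rather than the paper's implementation: it assumes a standard normal reference distribution, the plotting positions (i - 0.5) / n, and the reference line y = x. The function names are hypothetical.

```python
from statistics import NormalDist


def qq_points(sample):
    """Pair theoretical standard-normal quantiles with the sorted sample."""
    xs = sorted(sample)
    n = len(xs)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n are one common convention (an assumption here).
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, xs))


def detrended_qq_points(sample):
    """Subtract the reference line y = x so deviations are read against zero."""
    return [(t, e - t) for t, e in qq_points(sample)]


if __name__ == "__main__":
    data = [-1.3, -0.4, 0.1, 0.6, 2.2]
    for t, d in detrended_qq_points(data):
        print(f"theoretical={t:+.3f}  deviation={d:+.3f}")
```

In the detrended version, a sample that follows the reference distribution scatters around a horizontal line at zero, which makes the vertical comparison easier to judge than deviations from a diagonal.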

The same‐source problem remains a major challenge in forensic toolmark and firearm examination. Here, we investigate the applicability of the Chumbley method (J Forensic Sci, 2018, 63, 849; J Forensic Sci, 2010, 55, 953), developed for screwdriver markings, for same‐source identification of striations on bullet LEAs. The Hamby datasets 44 a...

In 2009, the National Academy of Sciences published a report questioning the scientific validity of many forensic methods including firearm examination. Firearm examination is a forensic tool used to help the court determine whether two bullets were fired from the same gun barrel. During the firing process, rifling, manufacturing defects, and impur...

Bullet matching is a process used to determine whether two bullets may have been fired from the same gun barrel. Historically, this has been a manual process performed by trained forensic examiners. Recent work, however, has shown that it is possible to add statistical validity and objectivity to the procedure. In this article, we build upon the algorithms e...

Graphics play a crucial role in statistical analysis and data mining. Being able to quantify structure in data that is visible in plots, and how people read the structure from plots is an ongoing challenge. The lineup protocol provides a formal framework for data plots, making inference possible. The data plot is treated like a test statistic, and...

The detection of community structures within network data is a type of graph analysis with increasing interest across a broad range of disciplines. In a network, communities represent clusters of nodes that exhibit strong intra-connections or relationships among nodes in the cluster. Current methodology for community detection often involves an alg...

This paper explores three different approaches to visualize networks by building on the grammar of graphics framework implemented in the ggplot2 package. The goal of each approach is to provide the user with the ability to apply the flexibility of ggplot2 to the visualization of network data, including through the mapping of network attributes to s...

The complexity of linear mixed-effects (LME) models means that traditional diagnostics are rendered less effective. This is due to a breakdown of asymptotic results, boundary issues, and visible patterns in residual plots that are introduced by the model fitting process. Some of these issues are well known and adjustments have been proposed. Workin...

Boxplots (Tukey, 1977) are useful displays that convey rough information about the distribution of a variable. Boxplots were designed to be drawn by hand and work best for small data sets, where detailed estimates of tail behavior beyond the quartiles may not be trustworthy. Larger data sets afford more precise estimates of tail behavior, but boxpl...

This study examines the relationship between managerial gender diversity (MGD) and firm performance. It outlines how extremely low and extremely high levels of MGD can trigger group processes that can impede the attainment of the performance benefits associated with moderate levels of MGD. Findings from longitudinal panel data from financial servic...

Graphics are very effective for communicating numerical information quickly and efficiently, but many of the design choices we make are based on subjective measures, such as personal taste or conventions of the discipline rather than objective criteria. We briefly introduce perceptual principles such as preattentive features and gestalt heuristics,...

In 2009, the National Academy of Sciences published a report questioning the
scientific validity of many forensic methods including firearm examination.
Firearm examination is a forensic tool used to determine whether two bullets
were fired from the same gun barrel. During the firing process, rifling,
manufacturing defects, and impurities in the ba...

A library of common geometric shapes can be used to train our brains for understanding data structure in high-dimensional Euclidean space. This article describes the methods for producing cubes, spheres, simplexes, and tori in multiple dimensions. It also describes new ways to define and generate high-dimensional tori. The algorithms are described,...

Graphical representations have to be true to the data they display. Computational tools ensure this on a technical level. But we also need to take “flaws” of the human perceptual system into account. The sine illusion provides an example where human perception leads to systematic bias in the assessment of the optical stimulus, with a particularly n...

Missing values are common in data, and usually require attention in order to conduct the statistical analysis. One of the first steps is to explore the structure of the missing values, and how missingness relates to the other collected variables. This article describes an R package that provides a graphical user interface (GUI) designed to help ex...

Overall attendance at DinoFun World is characterized in figure 2. The number of moves park-goers make is charted for each minute of the day along a horizontal time axis. We can learn a couple of things from this plot: (1) the Scott Jones show was held from 10 to 11 during all mornings and from 3 to 4 in the afternoons of Friday and Saturday. We can...

Two IDs are notable for their large volume of messages: 1278894 and 839736. These IDs are responsible for almost 80% of the message volume. Both of these IDs are stationary, sending messages from the Entry Corridor only. From the pattern of messages sent and received we are able to identify these IDs as the park's help line (839736) and the Cindysauru...

We encounter hierarchical data structures in a wide range of applications. Regular linear models are extended by random effects to address correlation between observations in the same group. Inference for random effects is sensitive to distributional mis-specifications of the model, making checks for (distributional) assumptions particularly import...

Graphics convey numerical information very efficiently, but rely on a different set of mental processes than tabular displays. Here, we present a study relating demographic characteristics and visual skills to perception of graphical lineups. We conclude that lineups are essentially a classification test in a visual domain, and that performance on...

A prominent issue in statistics education is the sometimes large disparity between the theoretical and the computational coursework. discreteRV is an R package for manipulation of discrete random variables which uses clean and familiar syntax similar to the mathematical notation in introductory probability courses. The package offers functions that...

Visualization can help in model building, diagnosis, and in developing an understanding about how a model summarizes data. This paper proposes three strategies for visualizing statistical models: (i) display the model in the data space, (ii) look at all members of a collection, and (iii) explore the process of model fitting, not just the end result...

Libraries of randomised peptides displayed on phages or viral particles are essential tools in a wide spectrum of applications. However, there is only limited understanding of a library's fundamental dynamics and the influences of encoding schemes and sizes on their quality. Numeric properties of libraries, such as the expected number of different...

Statistical graphics play an important role in exploratory data analysis, model checking and diagnosis. With high dimensional data, this often means plotting low-dimensional projections, for example, in classification tasks projection pursuit is used to find low-dimensional projections that reveal differences between labelled groups. In many contem...

In statistical modeling we strive to specify models that resemble data
collected in studies or observed from processes. Consequently, distributional
specification and parameter estimation are central to parametric models.
Graphical procedures, such as the quantile-quantile (Q-Q) plot, are arguably
the most widely used method of distributional asses...

Linear mixed-effects (LME) models are versatile models that account for dependence structures when data are composed of groups. The additional flexibility of random effects models comes at the cost of complicating model exploration and validation due to the breakdown of asymptotic results, boundary issues, and patterns in diagnostic plots inherent...

Temporal data is information measured in the context of time. This contextual
structure provides components that need to be explored to understand the data
and that can form the basis of interactions applied to the plots. In
multivariate time series we expect to see temporal dependence, long term and
seasonal trends and cross-correlations. In longi...

Three-dimensional orientation data, with observations as rotation matrices, have applications in areas such as computer science, kinematics and materials sciences, where it is often of interest to estimate a central orientation parameter represented by a rotation matrix. A well-known estimator of this parameter is the projected arithmetic mean and,...

Graphics play a crucial role in statistical analysis and data mining. This
paper describes metrics developed to assist the use of lineups for making
inferential statements. Lineups embed the plot of the data among a set of null
plots, and engage a human observer to select the plot that is most different
from the rest. If the data plot is selected i...

Visual statistical inference is a way to determine significance of patterns
found while exploring data. It is dependent on the evaluation of a lineup, of a
data plot among a sample of null plots, by human observers. Each individual is
different in their cognitive psychology and judiciousness, which can affect the
visual inference. The usual way to...

Objective:
This study aims to assess the overall safety and potential endometrium-stimulating effects of soy isoflavone tablets consumed (3 y) by postmenopausal women and to determine endometrial thickness response to treatment among compliant women, taking into account hormone concentrations and other hypothesized modifying factors.
Methods:
We...

In this article we introduce the rotations package which provides users with the ability to simulate, analyze and visualize three-dimensional rotation data. More specifically it includes four commonly used distributions from which to simulate data, four estimators of the central orientation, six confidence region estimation procedures and two appro...

One of the big challenges of developing interactive statistical applications is the management of the data pipeline, which controls transformations from data to plot. The user’s interactions need to be propagated through these modules and reflected in the output representation at a fast pace. Each individual module may be easy to develop and manag...

This paper investigates some of the immediate impacts of the Deepwater Horizon oil spill of 2010 on the environment using graphical means. The exploration focuses on the effects of the oil discharge on wildlife, the chemical pollution in the area following the spill, and salinity levels in the aftermath of the spill. Thousands of animals including...

A first step in exploring population structure in crop plants and other
organisms is to define the number of subpopulations that exist for a given data
set. The genetic marker data sets being generated have become increasingly
large over time and commonly are of the high-dimension, low sample size (HDLSS)
situation. An algorithm for deciding the nu...

Over the last twenty years there have been numerous developments in diagnostic procedures for hierarchical linear models; however, these procedures are not widely implemented in statistical software packages, and those packages that do contain a complete framework for model assessment are not open source. The lack of availability of diagnostic...

Visualizations are great tools of communication: they summarize findings and quickly convey main messages to our audience. As designers of charts we have to make sure that information is shown with a minimum of distortion. We also have to consider illusions and other perceptual limitations of our audience. In this paper we discuss the effect and st...

Data as three-dimensional rotations have application in computer science, kinematics, and materials sciences, among other areas. Estimating the central orientation from a sample of such data is an important problem, which is complicated by the fact that several different approaches exist for this, motivated by various geometrical and decision-theor...

Graphics are good for showing the information in datasets and for complementing modelling. Sometimes graphics show information models miss, sometimes graphics help to make model results more understandable, and sometimes models show whether information from graphics has statistical support or not. It is the interplay of the two approaches that is v...

Hierarchical structures are omnipresent in today's society—this is reflected in the data that we collect on all aspects of this society. Hierarchical linear models allow a representation of structural levels in a statistical modeling framework. Diagnostic tools are used to assess the quality of model estimation and explore features of the data not...

Lineups [4, 28] have been established as tools for visual testing similar to standard statistical inference tests, allowing us to evaluate the validity of graphical findings in an objective manner. In simulation studies [12] lineups have been shown as being efficient: the power of visual tests is comparable to classical tests while being much less...

The multivariate spatio-temporal nature of climate data makes it difficult to draw all of the aspects simultaneously. This paper describes the conceptualization and construction of a type of display, a glyph-map, that can show these multiple aspects. Glyph-maps are a specialization of multivariate glyph plots. Each spatial location is displayed wit...

This paper develops a generalization of the scatterplot matrix based on the recognition that most data sets include both categorical and quantitative information. Traditional grids of scatterplots often obscure important features of the data when one or more variables are categorical but coded as numerical. The generalized pairs plot offers a range...

Statistical graphics play a crucial role in exploratory data analysis, model checking, and diagnosis. The lineup protocol enables statistical significance testing of visual findings, bridging the gulf between exploratory and inferential statistics. In this article, inferential methods for statistical graphics are developed further by refining the t...

We propose a new framework for visualising tables of counts, proportions and probabilities. We call our framework product plots, alluding to the computation of area as a product of height and width, and the statistical concept of generating a joint distribution from the product of conditional and marginal distributions. The framework, with extensio...
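The "area as a product of height and width" idea in this abstract can be made concrete with a small sketch. It lays out a two-way table of counts in the unit square, using the marginal proportion as each tile's width and the conditional proportion as its height, so that the tile area equals the joint proportion. This is an illustrative mosaic-style layout under those assumptions, not the paper's software; `product_plot_rects` and the sample table are hypothetical, and the table is assumed to be non-empty.

```python
def product_plot_rects(table):
    """Lay out tiles for a two-way table of counts in the unit square.

    table maps a row level to a dict of column level -> count.
    Returns tuples (row, col, x, y, width, height).
    """
    total = sum(sum(cols.values()) for cols in table.values())
    rects = []
    x = 0.0
    for row, cols in table.items():
        row_total = sum(cols.values())
        width = row_total / total  # marginal proportion P(row)
        y = 0.0
        for col, count in cols.items():
            # conditional proportion P(col | row)
            height = count / row_total if row_total else 0.0
            rects.append((row, col, x, y, width, height))
            y += height
        x += width
    return rects


if __name__ == "__main__":
    counts = {"a": {"u": 1, "v": 3}, "b": {"u": 2, "v": 2}}
    for row, col, x, y, w, h in product_plot_rects(counts):
        print(row, col, f"x={x:.3f} y={y:.3f} area={w * h:.3f}")
```

Because width times height reproduces the joint proportion, the tile areas sum to one and can be compared directly across cells.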

The short paper describes the major findings of the ISU Statistical Graphics working group on airline traffic in the USA. Flight volumes at major airports are increasing. Delays decreased after structural changes in 2002-2003 but have been increasing again since and delays build up during the day reaching a peak in the early evening hours. There is...

This paper describes an R package which produces tours of multivariate data. The package includes functions for creating different types of tours, including grand, guided, and little tours, which project multivariate data (p-D) down to 1, 2, 3, or, more generally, d (≤ p) dimensions. The projected data can be rendered as densities or histograms, sc...

Soy isoflavones exert inconsistent bone density-preserving effects, but the bone strength-preserving effects in humans are unknown. Our double-blind randomized controlled trial examined 2 soy isoflavone doses (80 or 120 mg/d) vs placebo tablets on volumetric bone mineral density (vBMD) and strength (by means of peripheral quantitative computed tomog...

Most of the scientific journals require published microarray experiments to meet Minimum Information About a Microarray Experiment (MIAME) standards. This ensures that other researchers have the necessary information to interpret the results or reproduce them. Required MIAME information includes raw experimental data, processed data, and data proce...

How do we know if what we see is really there? When visualizing data, how do we avoid falling into the trap of apophenia where we see patterns in random noise? Traditionally, infovis has been concerned with discovering new relationships, and statistics with preventing spurious relationships from being reported. We pull these opposing poles closer w...

We hypothesized that soy isoflavones would attenuate the anticipated increase in androidal fat mass in postmenopausal women during the 36-month treatment, and thereby favorably modify the circulating cardiometabolic risk factors: triacylglycerol, LDL-C, HDL-C, glucose, insulin, uric acid, C-reactive protein, fibrinogen, and homocysteine. We collect...

For the 2006 ASA Data Exposition we created graphics that, in the legacy of John Tukey, tried to “force the unexpected upon
us” (Tukey in Proceedings of the 18th conference on design of experiments in Army research and development I, Washington,
1972). The data were geographic and meteorological measurements taken every month for 6 years on a coarse...

We examined the effect of soy isoflavones on bone density and strength in healthy postmenopausal women (45.8–65.0 y). Peripheral quantitative computed tomography (pQCT) measured 3 y change in cortical bone mineral density (CtBMD), cortical thickness (CtThk), periosteal circumference (PC), endosteal circumference (EC), and strength‐strain index (SSI...

Heterologous gene transfer by viral vector systems is often limited by factors such as preexisting immunity, toxicity, low packaging capacity, or weak immunogenic potential. A novel viral vector system derived from equine herpesvirus type 1 (EHV-1) not only overcomes some of these obstacles but also promotes the robust expression of a delivered tra...

We propose to furnish visual statistical methods with an inferential framework and protocol, modelled on confirmatory statistical testing. In this framework, plots take on the role of test statistics, and human cognition the role of statistical tests. Statistical significance of 'discoveries' is measured by having the human viewer compare the plot...
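A lineup of the kind this abstract describes can be sketched in a few lines. The sketch below hides the observed data among m - 1 null panels and computes a simple chance-based p-value. It rests on labeled assumptions that are not taken from the paper: null panels are generated by permuting one variable, each observer picks the data panel with probability 1/m under the null, and observers are treated as independent. The function names are hypothetical.

```python
import random
from math import comb


def make_lineup(x, y, m=20, seed=None):
    """Return m panels of (x, y) pairs and the index of the real data panel."""
    rng = random.Random(seed)
    position = rng.randrange(m)
    panels = []
    for i in range(m):
        if i == position:
            panels.append(list(zip(x, y)))
        else:
            # Null panel: permuting y breaks any association with x (assumed null).
            y_perm = list(y)
            rng.shuffle(y_perm)
            panels.append(list(zip(x, y_perm)))
    return panels, position


def lineup_p_value(picks, observers, m=20):
    """Binomial tail: chance that at least `picks` of `observers` choose the data panel."""
    p = 1 / m
    return sum(
        comb(observers, k) * p**k * (1 - p) ** (observers - k)
        for k in range(picks, observers + 1)
    )


if __name__ == "__main__":
    panels, position = make_lineup([1, 2, 3, 4], [2, 4, 6, 8], m=20, seed=7)
    print(len(panels), "panels; data panel at index", position)
    print("p-value if 3 of 5 observers pick it:", lineup_p_value(3, 5))
```

With a single observer and m = 20 panels, picking the data panel by chance has probability 1/20, which is why lineups of size 20 correspond to the conventional 0.05 level.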

Overall, technology companies increased in number, sales, products and employees over 1989-2003. Most of the increase can be attributed to a few types of industries, such as telecommunications, and primarily non-technological companies. On an absolute scale the east (New York, Boston) and west coasts (San Francisco, Los Angeles) dominate, but on...

This paper describes progress towards developing a platform for rapid prototyping of interactive data visualizations, using R, GGobi, rggobi and RGtk2. GGobi is a software tool for multivariate interactive graphics. At the core of GGobi is a data pipeline that incrementally transforms data through a series of stages into a plot and maps user intera...

What is a pipeline, and why do we need one for interactive graphics? This conceptual paper attempts to answer these questions,
building on previous work. A pipeline controls the transformation from data to graphical objects on our screens, and we argue
that the pipeline must be present, in some form, in all graphics software. The pipeline is made e...

We discuss methodology for multidimensional scaling (MDS) and its implementation in two software systems, GGvis and XGvis. MDS is a visualization technique for proximity data, that is, data in the form of N × N dissimilarity matrices. MDS constructs maps (“configurations,” “embeddings”) in ℝ^k by interpreting the dissimilarities as distances. Two f...

In this chapter we consider mosaicplots, which were introduced by Hartigan and Kleiner (1981) as a way of visualizing contingency
tables. Named “mosaicplots” due to their resemblance to the art form, they consist of groups of rectangles that represent
the cells in a contingency table. Both the sizes and the positions of the rectangles are relevant...

We develop ways to predict the side chain orientations of residues within a protein structure by using several different statistical machine learning methods. Here side chain orientation of a given residue i is measured by an angle Ω_i between the vector pointing from the center of the protein structure to the C_i^α atom and the vector p...

MetNet (http://metnetdb.org) is an emerging open-source software platform for exploration of disparate experimental data types and regulatory and metabolic networks in the context of Arabidopsis systems biology. The MetNet platform features graph visualization, interactive displays, graph theoretic computations for determining biological distances,...

This paper describes how to explore gene expression data using a combination of graphical and numerical methods. We start from the general methodology for multivariate data visualization, describing heatmaps, parallel coordinate plots and scatterplots. We propose new methods for gene expression data analysis using direct manipulation graphics. Wit...