Jeffrey Heer's research while affiliated with Trinity Washington University and other places

Publications (186)

Preprint
Full-text available
A multiverse analysis evaluates all combinations of "reasonable" analytic decisions to promote robustness and transparency, but can lead to a combinatorial explosion of analyses to compute. Long delays before assessing results prevent users from diagnosing errors and iterating early. We contribute (1) approximation algorithms for estimating multive...
Preprint
Full-text available
The in-context learning capabilities of LLMs like GPT-3 allow annotators to customize an LLM to their specific tasks with a small number of examples. However, users tend to include only the most obvious patterns when crafting examples, resulting in underspecified in-context functions that fall short on unseen cases. Further, it is hard to know when...
Preprint
Full-text available
The many genres of narrative visualization (e.g. data comics, data videos) each offer a unique set of affordances and constraints. To better understand a genre that we call cinematic visualizations-3D visualizations that make highly deliberate use of a camera to convey a narrative-we gathered 50 examples and analyzed their traditional cinematic asp...
Preprint
Full-text available
Narrative visualization is a powerful communicative tool that can take on various formats such as interactive articles, slideshows, and data videos. These formats each have their strengths and weaknesses, but existing authoring tools only support one output target. We conducted a series of formative interviews with seven domain experts to understan...
Article
Full-text available
Large scale analysis of source code, and in particular scientific source code, holds the promise of better understanding the data science process, identifying analytical best practices, and providing insights to the builders of scientific toolkits. However, large corpora have remained unanalyzed in depth, as descriptive labels are absent and requir...
Article
Data analysis requires translating higher level questions and hypotheses into computable statistical models. We present a mixed-methods study aimed at identifying the steps, considerations, and challenges involved in operationalizing hypotheses into statistical models, a process we refer to as hypothesis formalization . In a formative content analy...
Preprint
Full-text available
Proper statistical modeling incorporates domain theory about how concepts relate and details of how data were measured. However, data analysts currently lack tool support for recording and reasoning about domain assumptions, data collection, and modeling choices in an integrated manner, leading to mistakes that can compromise scientific validity. F...
Preprint
Complex animated transitions may be easier to understand when divided into separate, consecutive stages. However, effective staging requires careful attention to both animation semantics and timing parameters. We present Gemini2, a system for creating staged animations from a sequence of chart keyframes. Given only a start state and an end state, G...
Article
Full-text available
In this crowdsourced initiative, independent analysts used the same dataset to test two hypotheses regarding the effects of scientists' gender and professional status on verbosity during group meetings. Not only the analytic approach but also the operationalizations of key variables were left unconstrained and up to individual analysts. For instanc...
Preprint
Full-text available
Data analysis requires translating higher level questions and hypotheses into computable statistical models. We present a mixed-methods study aimed at identifying the steps, considerations, and challenges involved in operationalizing hypotheses into statistical models, a process we refer to as hypothesis formalization. In a formative content analys...
Preprint
Full-text available
Counterfactual examples have been shown to be useful for many applications, including calibrating, evaluating, and explaining model decision boundaries. However, previous methods for generating such counterfactual examples have been tightly tailored to a specific application, used a limited range of linguistic patterns, or are hard to scale. We pro...
Article
Animated transitions help viewers follow changes between related visualizations. Specifying effective animations demands significant effort: authors must select the elements and properties to animate, provide transition parameters, and coordinate the timing of stages. To facilitate this process, we present Gemini, a declarative grammar and recommen...
Article
Multiverse analysis is an approach to data analysis in which all "reasonable" analytic decisions are evaluated in parallel and interpreted collectively, in order to foster robustness and transparency. However, specifying a multiverse is demanding because analysts must manage myriad variants from a cross-product of analytic decisions, and the result...
Preprint
Animated transitions help viewers follow changes between related visualizations. Specifying effective animations demands significant effort: authors must select the elements and properties to animate, provide transition parameters, and coordinate the timing of stages. To facilitate this process, we present Gemini, a declarative grammar and recommen...
Preprint
Large scale analysis of source code, and in particular scientific source code, holds the promise of better understanding the data science process, identifying analytical best practices, and providing insights to the builders of scientific toolkits. However, large corpora have remained unanalyzed in depth, as descriptive labels are absent and requir...
Preprint
Multiverse analysis is an approach to data analysis in which all "reasonable" analytic decisions are evaluated in parallel and interpreted collectively, in order to foster robustness and transparency. However, specifying a multiverse is demanding because analysts must manage myriad variants from a cross-product of analytic decisions, and the result...
Article
Visualization system designers must decide whether and how to aggregate data by default. Aggregating distributional information in a single summary mark like a mean or sum simplifies interpretation, but may lead untrained users to overlook distributional features. We ask, How are the conclusions drawn by untrained visualization users affected by ag...
Article
Gender is one of the central categories organizing children’s social world. Clear patterns of gender development have been well-documented among cisgender children (i.e., children who identify as a gender that is typically associated with their sex assigned at birth). We present a comprehensive study of gender development (e.g., gender identity and...
Preprint
How do analysis goals and context affect exploratory data analysis (EDA)? To investigate this question, we conducted semi-structured interviews with 18 data analysts. We characterize common exploration goals: profiling (assessing data quality) and discovery (gaining new insights). Though the EDA literature primarily emphasizes discovery, we observe...
Preprint
Drawing reliable inferences from data involves many, sometimes arbitrary, decisions across phases of data collection, wrangling, and modeling. As different choices can lead to diverging conclusions, understanding how researchers make analytic decisions is important for supporting robust and replicable analysis. In this study, we pore over nine publ...
Preprint
We contribute user-centered prefetching and indexing methods that provide low-latency interactions across linked visualizations, enabling cold-start exploration of billion-record datasets. We implement our methods in Falcon, a web-based system that makes principled trade-offs between latency and resolution to optimize brushing and view switching ti...
Article
An emerging generation of visualization authoring systems support expressive information visualization without textual programming. As they vary in their visualization models, system architectures, and user interfaces, it is challenging to directly compare these systems using traditional evaluative methods. Recognizing the value of contextualizing...
Preprint
Full-text available
An emerging generation of visualization authoring systems support expressive information visualization without textual programming. As they vary in their visualization models, system architectures, and user interfaces, it is challenging to directly compare these systems using traditional evaluative methods. Recognizing the value of contextualizing...
Article
Tools for Interactive Machine Learning (IML) enable end users to update models in a “rapid, focused, and incremental”—yet local—manner. In this work, we study the question of local decision making in an IML context around feature selection for a sentiment classification task. Specifically, we characterize the utility of interactive feature selectio...
Article
Journalists, educators, and technical writers are increasingly publishing interactive content on the web. However, popular analytics tools provide only coarse information about how readers interact with individual pages, and laboratory studies often fail to capture the variability of a real‐world audience. We contribute extensions to the Idyll mark...
Article
Latent spaces—reduced‐dimensionality vector space embeddings of data, fit via machine learning—have been shown to capture interesting semantic properties and support data analysis and synthesis within a domain. Interpretation of latent spaces is challenging because prior knowledge, sometimes subtle and implicit, is essential to the process. We cont...
Article
Data can be aggregated in many ways before being visualized in charts, profoundly affecting what a chart conveys. Despite this importance, the type of aggregation is often communicated only via axis titles. In this paper, we investigate the use of animation to disambiguate different types of aggregation and communicate the meaning of aggregate oper...
Article
Supporting exploratory visual analysis (EVA) is a central goal of visualization research, and yet our understanding of the process is arguably vague and piecemeal. We contribute a consistent definition of EVA through review of the relevant literature, and an empirical evaluation of existing assumptions regarding how analysts perform EVA using Table...
Conference Paper
We contribute user-centered prefetching and indexing methods that provide low-latency interactions across linked visualizations, enabling cold-start exploration of billion-record datasets. We implement our methods in Falcon, a web-based system that makes principled trade-offs between latency and resolution to optimize brushing and view switching ti...
Article
Much contemporary rhetoric regards the prospects and pitfalls of using artificial intelligence techniques to automate an increasing range of tasks, especially those once considered the purview of people alone. These accounts are often wildly optimistic, understating outstanding challenges while turning a blind eye to the human labor that undergirds...
Preprint
We contribute user-centered prefetching and indexing methods that provide low-latency interactions across linked visualizations, enabling cold-start exploration of billion-record datasets. We implement our methods in Falcon, a web-based system that makes principled trade-offs between latency and resolution to optimize brushing and view switching ti...
Conference Paper
The web has matured as a publishing platform: news outlets regularly publish rich, interactive stories while technical writers use animation and interaction to communicate complex ideas. This style of interactive media has the potential to engage a large audience and more clearly explain concepts, but is expensive and time consuming to produce. Dra...
Article
Full-text available
There exists a gap between visualization design guidelines and their application in visualization tools. While empirical studies can provide design guidance, we lack a formal framework for representing design knowledge, integrating results across studies, and applying this knowledge in automated design tools that promote effective encodings and fac...
Preprint
Full-text available
In this paper, we describe a research agenda for deriving design principles directly from data. We argue that it is time to go beyond manually curated and applied visualization design guidelines. We propose learning models of visualization design from data collected using graphical perception studies and build tools powered by the learned models. T...
Article
In addition to the choice of visual encodings, the effectiveness of a data visualization may vary with the analytical task being performed and the distribution of data values. To better assess these effects and create refined rankings of visual encodings, we conduct an experiment measuring subject performance across task types (e.g., comparing indi...
Article
Constraints enable flexible graph layout by combining the ease of automatic layout with customizations for a particular domain. However, constraint‐based layout often requires many individual constraints defined over specific nodes and node pairs. In addition to the effort of writing and maintaining a large number of similar constraints, such const...
Conference Paper
Full-text available
Understanding uncertainty is critical for many analytical tasks. One common approach is to encode data values and uncertainty values independently, using two visual variables. These resulting bivariate maps can be difficult to interpret, and interference between visual channels can reduce the discriminability of marks. To address this issue, we con...
Conference Paper
Programmers must draw explicit connections between their code and runtime state to properly assess the correctness of their programs. However, debugging tools often decouple the program state from the source code and require explicitly invoked views to bridge the rift between program editing and program understanding. To unobtrusively reveal runtim...
Conference Paper
An essential goal of quantitative color encoding is the accurate mapping of perceptual dimensions of color to the logical structure of data. Prior research identifies weaknesses of 'rainbow' colormaps and advocates for ramping in luminance, while recent work contributes multi-hue colormaps generated using perceptually-uniform color models. We contr...
Article
Visualization designers regularly use color to encode quantitative or categorical data. However, visualizations “in the wild” often violate perceptual color design principles and may only be available as bitmap images. In this work, we contribute a method to semi-automatically extract color encodings from a bitmap visualization image. Given an imag...
Article
Full-text available
We investigate how to automatically recover visual encodings from a chart image, primarily using inferred text elements. We contribute an end-to-end pipeline which takes a bitmap image as input and returns a visual encoding specification as output. We present a text analysis pipeline which detects text elements in a chart, classifies their role (e....
Conference Paper
Visual data analysis involves both open-ended and focused exploration. Manual chart specification tools support question answering, but are often tedious for early-stage exploration where systematic data coverage is needed. Visualization recommenders can encourage broad coverage, but irrelevant suggestions may distract users once they commit to spe...
Conference Paper
Observing trends and predicting future values are common tasks for viewers of bivariate data visualizations. As many charts do not explicitly include trend lines or related statistical summaries, viewers often visually estimate trends directly from a plot. How reliable are the inferences viewers draw when performing such regression by eye? Do parti...
Conference Paper
We present GraphScape, a directed graph model of the vi- sualization design space that supports automated reasoning about visualization similarity and sequencing. Graph nodes represent grammar-based chart specifications and edges rep- resent edits that transform one chart to another. We weight edges with an estimated cost of the difficulty of inter...
Conference Paper
Full-text available
Eye movement based analysis is becoming ever prevalent across domains with the commoditization of eye-tracking hardware. Eye-tracking datasets are, however, often complex and difficult to interpret and map to higher-level visual-cognitive behavior. Practitioners using eye tracking need tools to explore, characterize and quantify patterned structure...
Conference Paper
Creating effective visualizations requires domain familiarity as well as design and analysis expertise, and may impose a tedious specification process. To address these difficulties, many visualization tools complement manual specification with recommendations. However, designing interfaces, ranking metrics, and scalable recommender systems remain...
Article
Interaction is critical to effective visualization, but can be difficult to author and debug due to dependencies among input events, program state, and visual output. Recent advances leverage reactive semantics to support declarative design and avoid the “spaghetti code” of imperative event handlers. While reactive programming improves many aspects...
Conference Paper
Crowdsourcing is a common strategy for collecting the “gold standard” labels required for many natural language applications. Crowdworkers differ in their responses for many reasons, but existing approaches often treat disagreements as "noise" to be removed through filtering or aggregation. In this paper, we introduce the workflow design pattern of...
Article
Thematic maps are commonly used for visualizing the density of events in spatial data. However, these maps can mislead by giving visual prominence to known base rates (such as population densities) or to artifacts of sample size and normalization (such as outliers arising from smaller, and thus more variable, samples). In this work, we adapt Bayesi...
Article
We present Vega-Lite, a high-level grammar that enables rapid specification of interactive data visualizations. Vega-Lite combines a traditional grammar of graphics, providing visual encoding rules and a composition algebra for layered and multi-view displays, with a novel grammar of interaction. Users specify interactive semantics by composing sel...
Article
Full-text available
Models of human perception - including perceptual "laws" - can be valuable tools for deriving visualization design recommendations. However, it is important to assess the explanatory power of such models when using them to inform design. We present a secondary analysis of data previously used to rank the effectiveness of bivariate visualizations fo...
Article
We present Reactive Vega, a system architecture that provides the first robust and comprehensive treatment of declarative visual and interaction design for data visualization. Starting from a single declarative specification, Reactive Vega constructs a dataflow graph in which input data, scene graph elements, and interaction events are all treated...
Article
Full-text available
General visualization tools typically require manual specification of views: analysts must select data variables and then choose which transformations and visual encodings to apply. These decisions often involve both domain and visualization design expertise, and may impose a tedious specification process that impedes exploration. In this paper, we...
Article
The fields of artificial intelligence (AI) and human-computer interaction (HCI) are influencing each other considerably due to the impact of systems, such as Google Translate, Facebook Graph Search, and RelateIQ hide. The intersection between AI and HCI is more visible in natural language processing (NLP). Professional translators use suggestions f...
Article
This paper presents BigDAWG, a reference implementation of a new architecture for "Big Data" applications. Such applications not only call for large-scale analytics, but also for real-time streaming support, smaller analytics at interactive speeds, data visualization, and cross-storage-system queries. Guided by the principle that "one size does not...
Article
The fields of artificial intelligence (AI) and human-computer interaction (HCI) are influencing each other like never before. Widely used systems such as Google Translate, Facebook Graph Search, and RelateIQ hide the complexity of large-scale AI systems behind intuitive interfaces. But relations were not always so auspicious. The two fields emerged...
Article
Distributed database performance is often unpredictable due to issues such as system complexity, network congestion, or imbalanced data distribution. These issues are difficult for users to assess in part due to the opaque mapping between declaratively specified queries and actual physical execution plans. Database developers currently must expend...
Article
Browsing is a fundamental aspect of exploratory information-seeking. Associative browsing represents a common and intuitive set of exploratory strategies in which users step iteratively from familiar to novel bits of information. In this paper, we examine associative browsing as a strategy for bottom-up exploration of large, heterogeneous networks....
Conference Paper
Prescription drug abuse is a pressing public health issue, and people who misuse prescription drugs are turning to online forums for help. Are such forums effective? We analyze the process of opioid withdrawal, recovery and relapse on Forum77, MedHelp.org's online health forum for substance abuse recovery. Applying Prochashka's Transtheoretical Mod...
Conference Paper
Full-text available
Evaluating the effectiveness of the visual design of an interface is an important yet challenging problem. In this paper, we introduce the VERP (Visualization of Eye movements with Recurrence Plots) Explorer, a visual analysis tool for exploring eye movements during visual-cognitive tasks. The VERP Explorer couples conventional visualizations of ey...
Article
To support effective exploration, it is often stated that interactive visualizations should provide rapid response times. However, the effects of interactive latency on the process and outcomes of exploratory visual analysis have not been systematically studied. We present an experiment measuring user behavior and knowledge discovery with interacti...
Conference Paper
Full-text available
Visualization design can benefit from careful consideration of perception, as different assignments of visual encoding variables such as color, shape and size affect how viewers interpret data. In this work, we introduce perceptual kernels: distance matrices derived from aggregate perceptual judgments. Perceptual kernels represent perceptual differ...
Article
Declarative visualization grammars can accelerate development, facilitate retargeting across platforms, and allow language-level optimizations. However, existing declarative visualization languages are primarily concerned with visual encoding, and rely on imperative event handlers for interactive behaviors. In response, we introduce a model of decl...
Article
The standard approach to computer-aided language translation is post-editing: a machine generates a single translation that a human translator corrects. Recent studies have shown this simple technique to be surprisingly effective, yet it underutilizes the complementary strengths of precision-oriented humans and recall-oriented machines. We present...
Article
Objective To reliably extract two entity types, symptoms and conditions (SCs), and drugs and treatments (DTs), from patient-authored text (PAT) by learning lexico-syntactic patterns from data annotated with seed dictionaries. Background and significance Despite the increasing quantity of PAT (eg, online discussion threads), tools for identifying me...
Article
We present Lyra, an interactive environment for designing customized visualizations without writing code. Using drag-and-drop interactions, designers can bind data to the properties of graphical marks to author expressive visualization designs. Marks can be moved, rotated and resized using handles; relatively positioned using connectors; and parame...
Article
Data visualization is now a popular medium for journalistic storytelling. However, current visualization tools either lack support for storytelling or require significant technical expertise. Informed by interviews with journalists, we introduce a model of storytelling abstractions that includes state-based scene structure, dynamic annotations and...
Article
Full-text available
The authors propose visual embedding as a model for automatically generating and evaluating visualizations. A visual embedding is a function from data points to a space of visual primitives that measurably preserves structures in the data (domain) within the mapped perceptual space (range). The authors demonstrate its use with three examples: color...
Article
Thousands of people use the Internet to discuss pain symptoms. While communication between patients and physicians involves both verbal and physical interactions, online discussions of symptoms typically comprise text only. We present BodyDiagrams, an online interface for expressing symptoms via drawings and text. BodyDiagrams augment textual descr...
Article
Sociologists wishing to employ topic models in their research need a helpful guide that describes the variety of topic modeling procedures, their issues, and various means of resolving them so as to convincingly answer sociological questions. We present this overview by recounting a series of our prior collaborative projects that have employed and...
Conference Paper
Full-text available
In vitro selection and evolution is a powerful method for discovering RNA molecules based on their binding and catalysis properties. It has important applications to the study of genetic variation and molecular evolution. However, the resulting RNA sequences form a large, high-dimensional space and biologists lack adequate tools to explore and inte...
Article
We introduce an algorithm for automatic selection of semantically-resonant colors to represent data (e.g., using blue for data about “oceans”, or pink for “love”). Given a set of categorical values and a target color palette, our algorithm matches each data value with a unique color. Values are mapped to colors by collecting representative images,...