Jeffrey Heer’s research while affiliated with University of Washington and other places

Publications (203)


Designing LLM Chains by Adapting Techniques from Crowdsourcing Workflows
  • Article

February 2025 · 6 Reads · 7 Citations
ACM Transactions on Computer-Human Interaction
Madeleine Grunde-McLaughlin · Michelle S. Lam · [...] · Jeffrey Heer

DracoGPT: Extracting Visualization Design Preferences from Large Language Models

September 2024 · 25 Reads · 7 Citations
IEEE Transactions on Visualization and Computer Graphics

Trained on vast corpora, Large Language Models (LLMs) have the potential to encode visualization design knowledge and best practices. However, if they fail to do so, they might provide unreliable visualization recommendations. What visualization design preferences, then, have LLMs learned? We contribute DracoGPT, a method for extracting, modeling, and assessing visualization design preferences from LLMs. To assess varied tasks, we develop two pipelines—DracoGPT-Rank and DracoGPT-Recommend—to model LLMs prompted to either rank or recommend visual encoding specifications. We use Draco as a shared knowledge base in which to represent LLM design preferences and compare them to best practices from empirical research. We demonstrate that DracoGPT can accurately model the preferences expressed by LLMs, enabling analysis in terms of Draco design constraints. Across a suite of backing LLMs, we find that DracoGPT-Rank and DracoGPT-Recommend moderately agree with each other, but both substantially diverge from guidelines drawn from human subjects experiments. Future work can build on our approach to expand Draco's knowledge base to model a richer set of preferences and to provide a robust and cost-effective stand-in for LLMs.
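As a concrete illustration of the pairwise-ranking setup, the Python sketch below shows how an LLM might be prompted to choose between two encoding specifications, with the resulting pairwise labels serving as raw material for fitting weights over Draco-style design constraints. This is a minimal sketch, not the paper's implementation: the query_llm stub, the prompt wording, and the dict-based spec format are all assumptions.

    import itertools

    def query_llm(prompt: str) -> str:
        # Hypothetical stand-in for any chat-completion client.
        raise NotImplementedError("connect a real LLM client to run this sketch")

    def rank_pair(spec_a: dict, spec_b: dict) -> str:
        """Ask the LLM which of two visual encoding specs it prefers ('A' or 'B')."""
        prompt = (
            "You are a visualization design expert. Which encoding spec is "
            f"better for the data?\nA: {spec_a}\nB: {spec_b}\n"
            "Answer with exactly 'A' or 'B'."
        )
        return query_llm(prompt).strip()

    def collect_preferences(specs):
        """Collect pairwise preferences over all spec pairs; a Draco-style
        knowledge base could then be fit to agree with these labels."""
        return {(i, j): rank_pair(specs[i], specs[j])
                for i, j in itertools.combinations(range(len(specs)), 2)}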


Mixing Linters with GUIs: A Color Palette Design Probe

September 2024 · 5 Reads · 2 Citations
IEEE Transactions on Visualization and Computer Graphics

Visualization linters are end-user-facing evaluators that automatically identify potential chart issues. These spell-checker-like systems offer a blend of interpretability and customization that is not found in other forms of automated assistance. However, existing linters do not model context and have primarily targeted users who do not need assistance, resulting in obvious—even annoying—advice. We investigate these issues within the domain of color palette design, which serves as a microcosm of visualization design concerns. We contribute a GUI-based color palette linter as a design probe that covers perception, accessibility, context, and other design criteria, and use it to explore visual explanations, integrated fixes, and user-defined linting rules. Through a formative interview study and theory-driven analysis, we find that linters can be meaningfully integrated into graphical contexts, thereby addressing many of their core issues. We discuss implications for integrating linters into visualization tools, developing improved assertion languages, and supporting end-user tunable advice—all laying the groundwork for more effective visualization linters in any context.
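To make the notion of a palette lint concrete, here is a minimal Python sketch of one discriminability check in the spirit the abstract describes. It is an illustration under stated assumptions, not the paper's system: production linters evaluate colors in a perceptual space such as CIELAB rather than raw RGB, and the distance threshold here is arbitrary.

    import itertools
    import math

    def lint_discriminability(palette, min_dist=60.0):
        """Flag palette color pairs (RGB tuples) that may be hard to tell apart."""
        issues = []
        for a, b in itertools.combinations(palette, 2):
            if math.dist(a, b) < min_dist:  # Euclidean distance in RGB space
                issues.append(f"{a} and {b} may be too similar to distinguish")
        return issues

    # The second color is a near-duplicate of the first and gets flagged.
    print(lint_discriminability([(31, 119, 180), (44, 120, 185), (255, 127, 14)]))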


BLADE: Benchmarking Language Model Agents for Data-Driven Science
  • Preprint
  • File available

August 2024 · 19 Reads

Data-driven scientific discovery requires the iterative integration of scientific domain knowledge, statistical expertise, and an understanding of data semantics to make nuanced analytical decisions, e.g., about which variables, transformations, and statistical models to consider. LM-based agents equipped with planning, memory, and code execution capabilities have the potential to support data-driven science. However, evaluating agents on such open-ended tasks is challenging due to multiple valid approaches, partially correct steps, and different ways to express the same decisions. To address these challenges, we present BLADE, a benchmark to automatically evaluate agents' multifaceted approaches to open-ended research questions. BLADE consists of 12 datasets and research questions drawn from existing scientific literature, with ground truth collected from independent analyses by expert data scientists and researchers. To automatically evaluate agent responses, we developed corresponding computational methods to match different representations of analyses to this ground truth. Though language models possess considerable world knowledge, our evaluation shows that they are often limited to basic analyses. However, agents capable of interacting with the underlying data demonstrate improved, but still non-optimal, diversity in their analytical decision making. Our work enables the evaluation of agents for data-driven science and provides researchers deeper insights into agents' analysis approaches.
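The Python sketch below illustrates the flavor of evaluation BLADE automates: scoring an agent's analysis decisions against expert ground truth, facet by facet. The decision schema and Jaccard scoring are deliberate simplifications for illustration; the benchmark itself uses computational methods that can match semantically equivalent analyses, which literal set overlap cannot.

    def jaccard(a: set, b: set) -> float:
        """Set-overlap score in [0, 1]; defined as 1.0 when both sets are empty."""
        return len(a & b) / len(a | b) if a | b else 1.0

    def score_analysis(agent: dict, truth: dict) -> dict:
        """Score an agent's decisions against ground truth, one score per facet."""
        return {facet: jaccard(set(agent.get(facet, [])), set(gt))
                for facet, gt in truth.items()}

    truth = {"variables": ["age", "dose"],
             "transforms": ["log(dose)"],
             "model": ["linear_regression"]}
    agent = {"variables": ["age", "dose", "weight"],  # one spurious variable
             "transforms": ["log(dose)"],
             "model": ["linear_regression"]}
    print(score_analysis(agent, truth))  # variables: 0.67, transforms: 1.0, model: 1.0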

DracoGPT: Extracting Visualization Design Preferences from Large Language Models

August 2024 · 49 Reads

Abstract identical to the September 2024 IEEE TVCG article of the same title, listed above.


[Figures: Fig. 7: PALETTELINT lints can be configured or created by filling out a form within the COLOR BUDDY UI. Fig. 11: Modifying a lint to check only the first six colors in a palette.]
Mixing Linters with GUIs: A Color Palette Design Probe

July 2024 · 26 Reads

Abstract identical to the September 2024 IEEE TVCG article of the same title, listed above.

Citations (75)


... To alleviate the hallucination problem, retrieval-augmented generation (RAG) [13] improves accuracy and relevance, particularly in fields such as scientific research [35]. Additionally, crowdsourcing-inspired workflows have been proposed to optimize LLM chains for better performance [15]. These works motivated us to investigate large models in simulating human ratings and inspired the design of our agent-based procedure. ...

Reference:

Do Language Model Agents Align with Humans in Rating Visualizations? An Empirical Study
Designing LLM Chains by Adapting Techniques from Crowdsourcing Workflows
  • Citing Article
  • February 2025

ACM Transactions on Computer-Human Interaction

... Previous work has proposed benchmarks that focus on individual steps of such a pipeline, such as code generation given fine-grained instructions [9,10,11,12], text-to-SQL [13,14], or data analytics and modeling [15,16,17]. We provide an overview of the features evaluated by existing benchmarks and by our proposed benchmark in Table 1. ...

BLADE: Benchmarking Language Model Agents for Data-Driven Science
  • Citing Conference Paper
  • January 2024

... With recent advances, AI-centric visualization techniques [31,36] have grown dramatically and spread through most subfields of visualization, e.g., visualization education [11], visualization design/recommendation [9,29,28,6,32,22], compression [17], and super-resolution [8]. Moreover, the visual-understanding capabilities of modern MLLMs have sparked studies evaluating how well these models interpret visualizations [34,5,14,27,33]. ...

DracoGPT: Extracting Visualization Design Preferences from Large Language Models
  • Citing Article
  • September 2024

IEEE Transactions on Visualization and Computer Graphics

... undefined variables), and domain-specific issues (such as certain operations being slow [71]). Recent work has explored generalizing the idea of a linter out of code and into other domains, such as visualization [10], spreadsheets [4], and color palettes [46]. ...

Mixing Linters with GUIs: A Color Palette Design Probe
  • Citing Article
  • September 2024

IEEE Transactions on Visualization and Computer Graphics

... However, future analyses on larger datasets may benefit from fully or partially automating this process. Recent advances provide sophisticated methods for extracting interpretable, high-level concepts from unstructured text (Pacheco et al. 2023; Lam et al. 2024). ...

Concept Induction: Analyzing Unstructured Text with High-Level Concepts Using LLooM
  • Citing Conference Paper
  • May 2024

... They serve multiple possible end users: those who program directly in the languages and those who use the languages as "backends" of interactive systems. Most related to this work are the DSLs that HCI researchers have developed for other data-related tasks, including data visualization (e.g., Vega and Vega-Lite [34]), analytical-hypothesis expression [38], and statistical analysis [22,24,25]. In this paper, we emphasize the development of a grammar of experimental assignment because it forms the basis of the PLanet language. ...

rTisane: Externalizing conceptual models for data analysis prompts reconsideration of domain assumptions and facilitates statistical modeling
  • Citing Conference Paper
  • May 2024

... On the programming side, the biggest impact comes from autocomplete-focused assistance through tools like Copilot [23], which has been extensively studied [7,19,51] and shown to serve both "recognition-over-recall" [49] and epistemic (e.g., "oh, I didn't know about list comprehension in Python!") goals [7]. Other work has explored how LLMs can be used in the service of design processes for creative coding [5], data analysis [24,36,42,47], and even, circularly, for the assessment of LLM outputs themselves [55]. ...

How Do Data Analysts Respond to AI Assistance? A Wizard-of-Oz Study
  • Citing Conference Paper
  • May 2024

... CReBot [42] interactively asks section-level critical thinking questions, catering to both experienced researchers and novices. Highlighttext [49] explores effective text highlighting methods, while Living Papers [16] introduces a grammar for augmenting scientific articles. SCIM [11] enables researchers to quickly skim papers, and Vogel et al. [25] show that highlighting a limited number of keywords improves comprehension. ...

Living Papers: A Language Toolkit for Augmented Scholarly Communication
  • Citing Conference Paper
  • October 2023

... A broad range of systems meet these definitions. Horowitz and Heer [14] note that live and rich tools and systems face challenges with inter-operation; in particular, problems arise when trying to inter-operate with the "tools and environments in the outside [non-live, rich] world." Our system provides a new way to attach these tools to typical codebases. ...

Engraft: An API for Live, Rich, and Composable Programming
  • Citing Conference Paper
  • October 2023

... Achieving this balance requires an effective mechanism for computing and organizing data facts: one that clarifies user intent and generates a focused list of insights based on interactions. User intent behind interactions has been studied across various contexts [YKSJ07, SH24], such as analysts' selection behaviors in scatterplots [GGC*21] and annotations in grouped bar charts [RQSR24]. We extend this understanding by examining intent behind "calling out" visual elements, i.e., "When users highlight data points in a chart, what insights or facts are they likely to incorporate into their narrative or discussion?" ...

DIVI: Dynamically Interactive Visualization
  • Citing Article
  • October 2023

IEEE Transactions on Visualization and Computer Graphics