Danielle Albers Szafir
University of North Carolina at Chapel Hill | UNC
Ph.D. in Computer Sciences
About
95
Publications
10,424
Reads
2,762
Citations
Introduction
My research bridges visualization and perception to drive the design of novel visualization systems for large, complex datasets. I derive quantified insight into the role of perceptual phenomena in interpreting visualizations by gauging how real viewers in natural environments perceive encoded information to design visualizations that overcome scalability and interpretability limitations in existing designs. My work addresses research problems in domains ranging from biology to the humanities.
Additional affiliations
September 2009 - August 2015
Education
May 2011 - May 2015
September 2009 - May 2011
September 2007 - December 2009
Publications (95)
Ensemble coding supports rapid extraction of visual statistics about distributed visual information. Researchers typically study this ability with the goal of drawing conclusions about how such coding extracts information from natural scenes. Here we argue that a second domain can serve as another strong inspiration for understanding ensemble codin...
Color is frequently used to encode values in visualizations. For color encodings to be effective, the mapping between colors and values must preserve important differences in the data. However, most guidelines for effective color choice in visualization are based on either color perceptions measured using large, uniform fields in optimal viewing en...
This work describes a first step towards the creation of an engineering model for the perception of color difference as a function of size. Our approach is to non-uniformly scale CIELAB using data from crowdsourced experiments, such as those run on Amazon Mechanical Turk. In such experiments, the inevitable variations in viewing conditions reflect t...
Many visualization tasks require the viewer to make judgments about aggregate properties of data. Recent work has shown that viewers can perform such tasks effectively, for example to efficiently compare the maximums or means over ranges of data. However, this work also shows that such effectiveness depends on the designs of the displays. In this p...
Physical therapy (PT) plays a crucial role in muscle injury recovery, but people struggle to adhere to and perform PT exercises correctly from home. To address the challenges of in-home PT, augmented reality (AR) holds promise in enhancing patients' engagement and accuracy through immersive interactive visualizations. However, effectively lever...
Shape is commonly used to distinguish between categories in multi-class scatterplots. However, existing guidelines for choosing effective shape palettes rely largely on intuition and do not consider how these needs may change as the number of categories increases. Unlike color, shapes cannot be represented by a numerical space, making it difficult...
Annotations play a vital role in highlighting critical aspects of visualizations, aiding in data externalization and exploration, collaborative sensemaking, and visual storytelling. However, despite their widespread use, we identified a lack of a design space for common practices for annotations. In this paper, we evaluated over 1,800 static annota...
Annotations are an essential part of data analysis and communication in visualizations, focusing a reader's attention on critical visual elements (e.g., an arrow that emphasizes a downward trend in a bar chart). Annotations enhance comprehension, mental organization, memorability, user engagement, and interaction and are crucial for data externali...
Individuals with Intellectual and Developmental Disabilities (IDD) have unique needs and challenges when working with data. While visualization aims to make data more accessible to a broad audience, our understanding of how to design cognitively accessible visualizations remains limited. In this study, we engaged 20 participants with IDD as co-desi...
Shape is commonly used to distinguish between categories in multi-class scatterplots. However, existing guidelines for choosing effective shape palettes rely largely on intuition and do not consider how these needs may change as the number of categories increases. Although shapes can be a finite number compared to colors, they cannot be represente...
In the rapidly evolving field of information visualization, rigorous evaluation is essential for validating new techniques, understanding user interactions, and demonstrating the effectiveness and usability of visualizations. Faithful evaluations provide valuable insights into how users interact with and perceive the system, enabling designers to i...
The increasing ubiquity of data in everyday life has elevated the importance of data literacy and accessible data representations, particularly for individuals with disabilities. While prior research predominantly focuses on the needs of the visually impaired, our survey aims to broaden this scope by investigating accessible data representations ac...
Visual clustering is a common perceptual task in scatterplots that supports diverse analytics tasks (e.g., cluster identification). However, even with the same scatterplot, the ways of perceiving clusters (i.e., conducting visual clustering) can differ due to the differences among individuals and ambiguous cluster boundaries. Although such perceptu...
Interaction is critical for data analysis and sensemaking. However, designing interactive physicalizations is challenging as it requires cross-disciplinary knowledge in visualization, fabrication, and electronics. Interactive physicalizations are typically produced in an unstructured manner, resulting in unique solutions for a specific dataset, pro...
To create effective data visualizations, it helps to represent data using visual features in intuitive ways. When visualization designs match observer expectations, visualizations are easier to interpret. Prior work suggests that several factors influence such expectations. For example, the dark-is-more bias leads observers to infer that darker col...
Annotations are a vital component of data externalization and collaborative analysis, directing readers' attention to important visual elements. Therefore, it is crucial to understand their design space for effectively annotating visualizations. However, despite their widespread use in visualization, we have identified a lack of a design space for...
Some 15 years ago, Visualization Viewpoints published an influential article titled Rainbow Color Map (Still) Considered Harmful (Borland and Taylor, 2007). The paper argued that the "rainbow colormap's characteristics of confusing the viewer, obscuring the data and actively misleading interpretation make it a poor choice for visualization." Subseq...
Scatterplots commonly use color to encode categorical data. However, as datasets increase in size and complexity, the efficacy of these channels may vary. Designers lack insight into how robust different design choices are to variations in category numbers. This paper presents a crowdsourced experiment measuring how the number of categories and cho...
Data is everywhere but may not be accessible to everyone. Conventional data visualization tools and guidelines often do not actively consider the specific needs and abilities of people with Intellectual and Developmental Disabilities (IDD), leaving them excluded from data-driven activities and vulnerable to ethical issues. To understand the needs a...
Fostering data visualization literacy (DVL) as part of childhood education could lead to a more data literate society. However, most work in DVL for children relies on a more formal educational context (i.e., a teacher-led approach) that limits children's engagement with data to classroom-based environments and, consequently, children's ability to...
Interpretive scholars generate knowledge from text corpora by manually sampling documents, applying codes, and refining and collating codes into categories until meaningful themes emerge. Given a large corpus, machine learning could help scale this data sampling and analysis, but prior research shows that experts are generally concerned about algor...
Designing a data physicalization requires a myriad of different considerations. Despite the cross-disciplinary nature of these considerations, research currently lacks a synthesis across the different communities data physicalization sits upon, including their approaches, theories, and even terminologies. To bridge these communities synergistically...
Problem-driven visualization work is rooted in deeply understanding the data, actors, processes, and workflows of a target domain. However, an individual's personality traits and cognitive abilities may also influence visualization use. Diverse user needs and abilities raise natural questions for specificity in visualization design: Could individu...
Scatterplots can encode a third dimension by using additional channels like size or color (e.g. bubble charts). We explore a potential misinterpretation of trivariate scatterplots, which we call the weighted average illusion, where locations of larger and darker points are given more weight toward x- and y-mean estimates. This systematic bias is...
Scatterplots can encode a third dimension by using additional channels like size or color (e.g. bubble charts). We explore a potential misinterpretation of trivariate scatterplots, which we call the weighted average illusion, where locations of larger and darker points are given more weight toward x- and y-mean estimates. This systematic bias is se...
Problem-driven visualization work is rooted in deeply understanding the data, actors, processes, and workflows of a target domain. However, an individual's personality traits and cognitive abilities may also influence visualization use. Diverse user needs and abilities raise natural questions for specificity in visualization design: Could individua...
Immersive Analytics is a quickly evolving field that unites several areas such as visualisation, immersive environments, and human-computer interaction to support human data analysis with emerging technologies. This research has thrived over the past years with multiple workshops, seminars, and a growing body of publications, spanning several confe...
This work examines the problem of fusing human operator observations with probabilistic information extracted by an automated data fusion system, in the context of dynamic multitarget track characterization for large-scale surveillance. This soft data fusion problem is challenging because human operator observation errors are difficult to calibrate...
Our world is a complex ecosystem of interdependent processes. Geoscientists collect individual datasets addressing hyperspecific questions, which seek to probe these deeply intertwined processes. Scientists are beginning to explore how investigating relationships between disciplines can foster richer and more holistic research, but visualization to...
Our world is a complex ecosystem of interdependent processes. Geoscientists collect individual datasets addressing hyperspecific questions which seek to probe these deeply intertwined processes. Scientists are beginning to explore how investigating relationships between disciplines can foster richer and more holistic research, but visualization too...
A growing number of efforts aim to understand what people see when using a visualization. These efforts provide scientific grounding to complement design intuitions, leading to more effective visualization practice. However, published visualization research currently reflects a limited set of available methods for understanding how people process v...
Color mapping is a foundational technique for visualizing scalar data. Prior literature offers guidelines for effective colormap design, such as emphasizing luminance variation while limiting changes in hue. However, empirical studies of color are largely focused on perceptual tasks. This narrow focus inhibits our understanding of how generalizable...
Augmented reality (AR) blends physical and virtual components to create a mixed reality experience. This unique display medium presents new opportunities for application design, as applications can move beyond the desktop and integrate with the physical environment. In order to build effective applications for AR displays, we need to be able to ite...
We investigate how different active learning (AL) query policies coupled with classification uncertainty visualizations affect analyst trust in automated classification systems. A current standard policy for AL is to query the oracle (e.g., the analyst) to refine labels for datapoints where the classifier has the highest uncertainty. This is an opt...
Visualizations often encode numeric data using sequential and diverging color ramps. Effective ramps use colors that are sufficiently discriminable, align well with the data, and are aesthetically pleasing. Designers rely on years of experience to create high-quality color ramps. However, it is challenging for novice visualization developers that l...
Data collection and analysis in the field is critical for operations in domains such as environmental science and public safety. However, field workers currently face data- and platform-oriented issues in efficient data collection and analysis in the field, such as limited connectivity, screen space, and attentional resources. In this paper, we exp...
Visualization research has paid little attention to individuals with intellectual developmental disabilities (IDDs). This lack of attention is problematic due to the fact that the consumption of visualization relies on a significant number of cognitive processes, including the ability to read and process language and retain information, and these p...
Promoting a wider range of contribution types can facilitate healthy growth of the visualization community, while increasing the intellectual diversity of visualization research papers. In this paper, we discuss the importance of contribution types and summarize contribution types that can be meaningful in visualization research. We also propose se...
This work examines Twitter discussion surrounding the 2015 outbreak of Zika, a virus that is most often mild but has been associated with serious birth defects and neurological syndromes. We introduce and analyze a collection of 3.9 million tweets mentioning Zika geolocated to North and South America, where the virus is most prevalent. Using a mult...
Scatterplots commonly use multiple visual channels to encode multivariate datasets. Such visualizations often use size, shape, and color as these dimensions are considered separable--dimensions represented by one channel do not significantly interfere with viewers' abilities to perceive data in another. However, recent work shows the size of marks...
Many real-world datasets are incomplete due to factors such as data collection failures or misalignments between fused datasets. Visualizations of incomplete datasets should allow analysts to draw conclusions from their data while effectively reasoning about the quality of the data and resulting conclusions. We conducted a pair of crowdsourced stud...
Data summarization allows analysts to explore datasets that may be too complex or too large to visualize in detail. Designers face a number of design and implementation choices when using summarization in visual analytics systems. While these choices influence the utility of the resulting system, there are no clear guidelines for the use of these s...
Augmented reality (AR) applications can leverage the full space of an environment to create immersive experiences. However, most empirical studies of interaction in AR focus on interactions with objects close to the user, generally within arm's reach. As objects move farther away, the efficacy and usability of different interaction modalities may ch...
Museum directors are often faced with the challenge of engaging users in the museum experience while preserving the intentions of exhibit content. Designing exhibits for children can heighten the tension between these sometimes competing goals. Working with the University of Colorado Museum of Natural History, we designed and implemented Metamorpho...
Patterns of words used in different text collections can characterize interesting properties of a corpus. However, these patterns are challenging to explore as they often involve complex relationships across many words and collections in a large space of words. In this paper, we propose a configurable colorfield design to aid this exploration. Our...
Color is a common channel for displaying data in surface visualization, but is affected by the shadows and shading used to convey surface depth and shape. Understanding encoded data in the context of surface structure is critical for effective analysis in a variety of domains, such as in molecular biology. In the phy...
In this position paper, we enumerate two approaches to the evaluation of visualizations which are associated with two approaches to knowledge formation in science: reductionism, which holds that the understanding of complex phenomena is based on the understanding of simpler components; and holism, which states that complex phenomena have characteri...
CIELAB is commonly used in design as it provides a simple method for approximating color difference. However, these approximations model color perception under laboratory conditions, with correctly calibrated displays and carefully constrained viewing environments that are not reflective of the complexity of viewing conditions encountered in the real...
This work describes a first step towards the creation of an engineering model for the perception of color difference as a function of size. Our approach is to non-uniformly rescale CIELAB using data from crowdsourced experiments, such as those run on Amazon Mechanical Turk. In such experiments, the inevitable variations in viewing conditions reflec...
When faced with a large collection of objects, our visual system can extract statistical properties of the collection. Most studies of this ability focus on judgments of mean value, and typically focus on a constrained set of dimensions (e.g., size, location, or facial properties). We tested a variety of judgments in addition to means - deviation,...
Many bioinformatics applications construct classifiers that are validated in experiments that compare their results to known ground truth over a corpus. In this paper, we introduce an approach for exploring the results of such classifier validation experiments, focusing on classifiers for regions of molecular surfaces. We provide a tool that allows...