Bum Chul Kwon
IBM Research Cambridge MA · Computational Healthcare

Ph.D.

89
Publications
26,223
Reads
2,155
Citations
Introduction
I am Bum Chul Kwon (“BC”), a researcher of data visualization and visual analytics at the Healthcare Analytics Research Group in IBM Research. My research area includes visual analytics, data visualization, human-computer interaction, healthcare, and machine learning. My research goal is to enhance users’ abilities to derive knowledge from various forms of data through development of interactive visual analytics systems.
Additional affiliations
January 2019 - present
University of California, Berkeley
Position
  • Lecturer
Description
  • The goal of the course is to learn theories, methods, and techniques to read and use data visualization for given problems. The course aims to teach students how to turn data into visual representations.
August 2018 - present
University of Pennsylvania
Position
  • Lecturer
Description
  • The course teaches visual representation methods and techniques for understanding complex data. It also covers the fundamentals of perception, theories of data visualization, and good design practices. I provided hands-on experience in the data communication process, from analysis to crafting visualizations, using R, Tableau, Python, Vega, and d3.js.
May 2017 - present
Columbia University
Position
  • Lecturer
Description
  • The goal of the course is to provide an overview of theories, methods, and techniques of data visualization. The course provides students with the tools for creating effective visual representations of data.

Publications (89)
Article
Full-text available
We present Roslingifier, a data-driven storytelling method for animated scatterplots. Like its namesake, Hans Rosling (1948–2017), a professor of public health and a spellbinding public speaker, Roslingifier turns a sequence of entities changing over time, such as countries and continents with their demographic data, into an engaging narrative t...
Article
Full-text available
Traditional deep learning interpretability methods which are suitable for model users cannot explain network behaviors at the global level and are inflexible at providing fine-grained explanations. As a solution, concept-based explanations are gaining attention due to their human intuitiveness and their flexibility to describe both global and local...
Article
Full-text available
As online news increasingly includes data journalism, there is a corresponding increase in the incorporation of visualization in article thumbnail images. However, little research exists on the design rationale for visualization thumbnails, such as resizing, cropping, simplifying, and embellishing charts that appear within the body of the associated...
Preprint
Full-text available
Pre-trained transformer-based language models are becoming increasingly popular due to their exceptional performance on various benchmarks. However, concerns persist regarding the presence of hidden biases within these models, which can lead to discriminatory outcomes and reinforce harmful stereotypes. To address this issue, we propose Finspector,...
Conference Paper
Full-text available
Causal inference is a statistical paradigm for quantifying causal effects using observational data. It is a complex process, requiring multiple steps, iterations, and collaborations with domain experts. Analysts often rely on visualizations to evaluate the accuracy of each step. However, existing visualization toolkits are not designed to support t...
Preprint
Full-text available
As online news increasingly includes data journalism, there is a corresponding increase in the incorporation of visualization in article thumbnail images. However, little research exists on the design rationale for visualization thumbnails, such as resizing, cropping, simplifying, and embellishing charts that appear within the body of the associated...
Preprint
Full-text available
Large Language Models (LLMs) have gained widespread popularity due to their ability to perform ad-hoc Natural Language Processing (NLP) tasks with a simple natural language prompt. Part of the appeal for LLMs is their approachability to the general public, including individuals with no prior technical experience in NLP techniques. However, natural...
Preprint
Full-text available
Causal inference is a statistical paradigm for quantifying causal effects using observational data. It is a complex process, requiring multiple steps, iterations, and collaborations with domain experts. Analysts often rely on visualizations to evaluate the accuracy of each step. However, existing visualization toolkits are not designed to support t...
Conference Paper
Full-text available
Disease risk models can identify high-risk patients and help clinicians provide more personalized care. However, risk models developed on one dataset may not generalize across diverse subpopulations of patients in different datasets and may have unexpected performance. It is challenging for clinical researchers to inspect risk models across differe...
Article
Full-text available
Our previous data-driven analysis of evolving patterns of islet autoantibodies (IAbs) against insulin (IAA), glutamic acid decarboxylase (GADA) and islet antigen 2 (IA-2A) discovered three trajectories characterized by either multiple IAbs (TR1), IAA (TR2), or GADA (TR3) as the first appearing autoantibodies. Here we examined the evolution of IAb l...
Preprint
Full-text available
Image classification models often learn to predict a class based on irrelevant co-occurrences between input features and an output class in training data. We call the unwanted correlations "data biases," and the visual features causing data biases "bias factors." It is challenging to identify and mitigate biases automatically without human interven...
Preprint
Full-text available
Disease risk models can identify high-risk patients and help clinicians provide more personalized care. However, risk models developed on one dataset may not generalize across diverse subpopulations of patients in different datasets and may have unexpected performance. It is challenging for clinical researchers to inspect risk models across differe...
Article
Full-text available
Prediction models are commonly used to estimate risk for cardiovascular diseases, to inform diagnosis and management. However, performance may vary substantially across relevant subgroups of the population. Here we investigated heterogeneity of accuracy and fairness metrics across a variety of subgroups for risk prediction of two common diseases: a...
Conference Paper
Full-text available
Figure 1: An overview of DASH: (A) Projection View shows the latent space representation of images using t-SNE; (B) Mosaic View summarizes the performance differences between two previously trained classifiers; (C) Trace View shows how the two classifiers predict individual images differently (red: incorrect to correct, blue: correct to incorrect);...
Article
Our previous data-driven analysis from five large-scale prospective studies discovered three trajectories (TR1, TR2, and TR3) composed of latent states for evolving patterns of islet autoantibodies (IAbs): IAA, GADA and IA-2A. Here we examined the evolution of IAb levels within these trajectories for 2145 IAb positive participants, followed from e...
Article
Full-text available
Rapid advances in artificial intelligence (AI) and availability of biological, medical, and healthcare data have enabled the development of a wide variety of models. Significant success has been achieved in a wide range of fields, such as genomics, protein folding, disease diagnosis, imaging, and clinical tasks. Although widely used, the inherent o...
Preprint
Full-text available
Coordinated Multiple views (CMVs) are a visualization technique that simultaneously presents multiple visualizations in separate but linked views. There are many studies that report the advantages (e.g., usefulness for finding hidden relationships) and disadvantages (e.g., cognitive load) of CMVs. But little empirical work exists on the impact of t...
Preprint
Traditional deep learning interpretability methods which are suitable for non-expert users cannot explain network behaviors at the global level and are inflexible at providing fine-grained explanations. As a solution, concept-based explanations are gaining attention due to their human intuitiveness and their flexibility to describe both global and...
Article
Full-text available
Development of islet autoimmunity precedes the onset of type 1 diabetes in children; however, the presence of autoantibodies does not necessarily lead to manifest disease, and the onset of clinical symptoms is hard to predict. Here we show, by longitudinal sampling of islet autoantibodies (IAb) to insulin, glutamic acid decarboxylase and islet antig...
Article
Full-text available
Background Polygenic scores—which quantify inherited risk by integrating information from many common sites of DNA variation—may enable a tailored approach to clinical medicine. However, alongside considerable enthusiasm, we and others have highlighted a lack of standardized approaches for score disclosure. Here, we review the landscape of polygeni...
Preprint
Full-text available
Prediction models are commonly used to estimate risk for cardiovascular diseases; however, performance may vary substantially across relevant subgroups of the population. Here we investigated the variability of performance and fairness across a variety of subgroups for risk prediction of two common diseases, atherosclerotic cardiovascular disease (...
Preprint
Full-text available
Background: Polygenic scores - which quantify inherited risk by integrating information from many common sites of DNA variation - may enable a tailored approach to clinical medicine. However, alongside considerable enthusiasm, we and others have highlighted a lack of systematic approaches for score disclosure. Here, we review the landscape of polyg...
Article
Full-text available
Analyzing disease progression patterns can provide useful insights into the disease processes of many chronic conditions. These analyses may help inform recruitment for prevention trials or the development and personalization of treatments for those affected. We learn disease progression patterns using Hidden Markov Models (HMM) and distill them in...
Article
Full-text available
Background Diet-tracking mobile apps have gained increased interest from both academic and clinical fields. However, quantity-focused diet tracking (eg, calorie counting) can be time-consuming and tedious, leading to unsustained adoption. Diet quality—focusing on high-quality dietary patterns rather than quantifying diet into calories—has shown eff...
Preprint
Full-text available
Analyzing disease progression patterns can provide useful insights into the disease processes of many chronic conditions. These analyses may help inform recruitment for prevention trials or the development and personalization of treatments for those affected. We learn disease progression patterns using Hidden Markov Models (HMM) and distill them in...
Preprint
Full-text available
A goal of clinical researchers is to understand the progression of a disease through a set of biomarkers. Researchers often conduct observational studies, where they collect numerous samples from selected subjects throughout multiple years. Hidden Markov Models (HMMs) can be applied to discover latent states and their transition probabilities over...
Poster
Full-text available
In this paper, we explore the use of interactive visualization to explain a classifier that predicts risk scores of disease related adverse events for patients computed from electronic health records. We discuss challenges and opportunities for explainable AI techniques for such problems and propose interactive visualization methods to involve clin...
Preprint
BACKGROUND Diet-tracking mobile apps have gained increased interest from both academic and clinical fields. However, quantity-focused diet tracking (eg, calorie counting) can be time-consuming and tedious, leading to unsustained adoption. Diet quality—focusing on high-quality dietary patterns rather than quantifying diet into calories—has shown eff...
Article
We investigated evolution of islet autoantibodies (IAs) prior to onset of T1D from 5 large-scale birth cohort studies. Our analysis revealed three distinct IA trajectories leading up to diagnosis of T1D. Of 24673 children from five prospective studies (DAISY, DiPiS, DIPP, DEW-IT, and BABYDIAB), 688 who were diagnosed with T1D and had 3 or more visi...
Article
Full-text available
Clinical researchers use disease progression models to understand patient status and characterize progression patterns from longitudinal health records. One approach for disease progression modeling is to describe patient status using a small number of states that represent distinctive distributions over a set of observed measures. Hidden Markov mo...
Preprint
Full-text available
Diet-tracking mobile apps have been effective in behavior change. At the same time, quantity-focused diet tracking (e.g., calorie counting) can be time-consuming and tedious, leading to unsustained adoption. Diet Quality—focusing on high-quality dietary patterns rather than quantifying diet into calories—has shown effectiveness in improving heart d...
Preprint
Full-text available
Users may face challenges while designing graphical user interfaces, due to a lack of relevant experience and guidance. This paper aims to investigate the issues that users with no experience face during the design process, and how to resolve them. To this end, we conducted semi-structured interviews, based on which we built a GUI prototyping assis...
Book
This two-volume set of LNCS 12509 and 12510 constitutes the refereed proceedings of the 15th International Symposium on Visual Computing, ISVC 2020, which was supposed to be held in San Diego, CA, USA in October 2020 but took place virtually instead due to the COVID-19 pandemic. The 114 full and 4 short papers presented in these volumes were carefully...
Preprint
Full-text available
Biologists often perform clustering analysis to derive meaningful patterns, relationships, and structures from data instances and attributes. Though clustering plays a pivotal role in biologists' data exploration, it takes non-trivial efforts for biologists to find the best grouping in their data using existing tools. Visual cluster analysis is cur...
Preprint
Full-text available
Attention networks, a deep neural network architecture inspired by humans' attention mechanism, have seen significant success in image captioning, machine translation, and many other applications. Recently, they have been further evolved into an advanced approach called multi-head self-attention networks, which can encode a set of input vectors, e....
Preprint
Full-text available
When people browse online news, small thumbnail images accompanying links to articles attract their attention and help them to decide which articles to read. As an increasing proportion of online news can be construed as data journalism, we have witnessed a corresponding increase in the incorporation of visualization in article thumbnails. However,...
Preprint
Full-text available
Clinical researchers use disease progression modeling algorithms to predict future patient status and characterize progression patterns. One approach for disease progression modeling is to describe patient status using a small number of states that represent distinctive distributions over a set of observed measures. Hidden Markov models (HMMs) and...
Article
Full-text available
One of the ultimate goals of studies on visualization literacy is to improve users' visualization literacy through education and training. Even though users' cognitive characteristics may significantly affect learning and responding processes in general, studies have addressed the relationships between users' cognitive characteristics and visualiza...
Patent
Full-text available
The current invention relates to a system for manipulating data sets. More specifically, the current invention relates to an interactive system allowing an expert to seamlessly filter complex databases as a tool to evaluate hypotheses. When longitudinal data are concerned, the system allows the user to sketch a chart illustrating the trend of a var...
Article
We have recently seen many successful applications of recurrent neural networks (RNNs) on electronic medical records (EMRs), which contain histories of patients' diagnoses, medications, and other various events, in order to predict the current and future states of patients. Despite the strong performance of RNNs, it is often challenging for users t...
Preprint
Full-text available
In the past decade, we have seen many successful applications of recurrent neural networks (RNNs) on electronic medical records (EMRs), which contain histories of patients' diagnoses, medications, and other various events, in order to predict the current and future states of patients. Despite the strong performance of RNNs, it is often very challen...
Article
Full-text available
Clustering, the process of grouping together similar items into distinct partitions, is a common type of unsupervised machine learning that can be useful for summarizing and aggregating complex multi-dimensional data. However, data can be clustered in many ways, and there exists a large body of algorithms designed to reveal different patterns. While...
Article
Full-text available
Background While online health social networks (OHSNs) serve as an effective platform for patients to fulfill their various social support needs, predicting the needs of users and providing tailored information remains a challenge. Objective The objective of this study was to discriminate important features for identifying users’ social support ne...
Article
Full-text available
The Information Visualization community has begun to pay attention to visualization literacy; however, researchers still lack instruments for measuring the visualization literacy of users. In order to address this gap, we systematically developed a visualization literacy assessment test (VLAT), especially for non-expert users in data visualization,...
Article
Full-text available
Visual analytics techniques help users explore high-dimensional data. However, it is often challenging for users to express their domain knowledge in order to steer the underlying data model, especially when they have little attribute-level knowledge. Furthermore, users' complex, high-level domain knowledge, compared to low-level attributes, posits...
Article
Full-text available
The ability to scale interactive visual analysis to massive datasets is becoming increasingly important. We make a case for sampling as an essential tool for scalable interactive visual analysis. We first outline prior work by the database community on sampling for visualization of “aggregation queries” and then consider how these results might be...
Conference Paper
Full-text available
Exploring event sequences in big data is challenging. Though many mining algorithms have been developed to derive the most frequently occurring and the most meaningful sequential patterns, it is still difficult to make sense of the results. To tackle the problem, we introduce a visual analytics approach, Peekquence. In this paper, we describe the de...
Conference Paper
Full-text available
Recently, deep learning has gained exceptional popularity due to its outstanding performances in many machine learning and artificial intelligence applications. Among various deep learning models, convolutional neural network (CNN) is one of the representative models that solved various complex tasks in computer vision since AlexNet, a widely-used...
Conference Paper
Full-text available
As visualizations are increasingly used as a storytelling medium for the general public, it becomes important to help people learn how to understand visualizations. Prior studies indicate that interactive multimedia learning environments can increase the effectiveness of learning [11]. To investigate the efficacy of the multimedia learning environm...
Article
Full-text available
Comparative Case Analysis (CCA) is an important tool for criminal investigation and crime theory extraction. It analyzes the commonalities and differences between a collection of crime reports in order to understand crime patterns and identify abnormal cases. A big challenge of CCA is the data processing and exploration. Traditional manual approach...
Article
Full-text available
Through online health communities (OHCs), patients and caregivers exchange their illness experiences and strategies for overcoming the illness, and provide emotional support. To facilitate healthy and lively conversations in these communities, their members should be continuously monitored and nurtured by OHC administrators. The main challenge of O...
Article
Biologists are keen to understand how processes in cells react to environmental changes. Differential gene expression analysis allows biologists to explore functions of genes with data generated from different environments. However, these data and analysis lead to unique challenges since tasks are ill-defined, require implicit domain knowledge, com...
Article
Full-text available
Semi-automatic text analysis involves manual inspection of text. Often, different text annotations (like part-of-speech or named entities) are indicated by using distinctive text highlighting techniques. In typesetting there exist well-known formatting conventions, such as bold typeface, italics, or background coloring, that are useful for highligh...
Article
Full-text available
Visual analytics supports humans in generating knowledge from large and often complex datasets. Evidence is collected, collated and cross-linked with our existing knowledge. In the process, a myriad of analytical and visualisation techniques are employed to generate a visual representation of the data. These often introduce their own uncertainties,...
Conference Paper
Full-text available
The visual exploration of large data spaces often requires zooming and panning operations to obtain details. However, drilling down to see details results in the loss of contextual overview. Existing overview-plus-detail approaches provide context while the user examines details, but typically suffer from distortion or overplotting. This is why, th...
Article
Full-text available
Online consumer reviews have become a substantial component of e-commerce and provide online shoppers with abundant information about products. However, previous studies provided mixed results about whether consumers experience information overload from such a vast volume of reviews. Thus, this study investigates how users perceive products dependi...
Article
Full-text available
As online health communities (OHCs) grow, users find it challenging to properly search, read, and contribute to the community because of its overwhelming content. Our goal is to understand OHC users' needs and requirements for better delivering large-scale OHC content. We interviewed 14 OHC users with interests in diabetes to investigate their atti...
Article
Full-text available
Healthcare information systems (e.g., Bar Code Medication Administration [BCMA] system) have recently been adopted to deliver efficient healthcare services. However, though it is seemingly simple to use (scanning barcodes before medication), users of the BCMA system (e.g., nurses and pharmacists) often show noncompliance behaviors. Therefore, the g...
Conference Paper
Full-text available
When exploring large spatial datasets, zooming and panning interactions often lead to the loss of contextual overview. Existing overview-plus-detail approaches allow users to view context while inspecting details, but they often suffer from distortion or overplotting. In this paper, we present an off-screen visualization method called Ambient Grids...