Shixia Liu

Shixia Liu
Microsoft · Internet Graphics Research Group

About

125
Publications
32,384
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
7,343
Citations

Publications

Publications (125)
Preprint
Sample selection improves the efficiency and effectiveness of machine learning models by providing informative and representative samples. Typically, samples can be modeled as a sample graph, where nodes are samples and edges represent their similarities. Most existing methods are based on local information, such as the training difficulty of sampl...
Article
Assigning discriminable and harmonic colors to samples according to their class labels and spatial distribution can generate attractive visualizations and facilitate data exploration. However, as the number of classes increases, it is challenging to generate a high-quality color assignment result that accommodates all classes simultaneously. A prac...
Preprint
Full-text available
The high performance of tree ensemble classifiers benefits from a large set of rules, which, in turn, makes the models hard to understand. To improve interpretability, existing methods extract a subset of rules for approximation using model reduction techniques. However, by focusing on the reduced rule set, these methods often lose fidelity and ign...
Preprint
Full-text available
Matrix reordering permutes the rows and columns of a matrix to reveal meaningful visual patterns, such as blocks that represent clusters. A comprehensive collection of matrices, along with a scoring method for measuring the quality of visual patterns in these matrices, contributes to building a benchmark. This benchmark is essential for selecting o...
Preprint
Assigning discriminable and harmonic colors to samples according to their class labels and spatial distribution can generate attractive visualizations and facilitate data exploration. However, as the number of classes increases, it is challenging to generate a high-quality color assignment result that accommodates all classes simultaneously. A prac...
Article
Full-text available
Recent studies have indicated that foundation models, such as BERT and GPT, excel at adapting to various downstream tasks. This adaptability has made them a dominant force in building artificial intelligence (AI) systems. Moreover, a new research paradigm has emerged as visualization techniques are incorporated into these models. This study divides...
Article
italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">Temporal action localization aims to identify the boundaries and categories of actions in videos, such as scoring a goal in a football match. Single-frame supervision has emerged as a labor-efficient way to train action localizers as it requires only o...
Article
Existing model evaluation tools mainly focus on evaluating classification models, leaving a gap in evaluating more complex models, such as object detection. In this paper, we develop an open-source visual analysis tool, Uni-Evaluator, to support a unified model evaluation for classification, object detection, and instance segmentation in computer v...
Article
Grid visualizations are widely used in many applications to visually explain a set of data and their proximity relationships. However, existing layout methods face diffculties when dealing with the inherent cluster structures within the data. To address this issue, we propose a cluster-aware grid layout method that aims to better preserve cluster s...
Preprint
Existing model evaluation tools mainly focus on evaluating classification models, leaving a gap in evaluating more complex models, such as object detection. In this paper, we develop an open-source visual analysis tool, Uni-Evaluator, to support a unified model evaluation for classification, object detection, and instance segmentation in computer v...
Preprint
Full-text available
Grid visualizations are widely used in many applications to visually explain a set of data and their proximity relationships. However, existing layout methods face difficulties when dealing with the inherent cluster structures within the data. To address this issue, we propose a cluster-aware grid layout method that aims to better preserve cluster...
Preprint
Full-text available
When exploring time series datasets, analysts often pose "which and when" questions. For example, with world life expectancy data over one hundred years, they may inquire about the top 10 countries in life expectancy and the time period when they achieved this status, or which countries have had longer life expectancy than Ireland and when. This pa...
Article
Recent advances in artificial intelligence largely benefit from better neural network architectures. These architectures are a product of a costly process of trial-and-error. To ease this process, we develop ArchExplorer, a visual analysis method for understanding a neural architecture space and summarizing design principles. The key idea behind ou...
Preprint
Full-text available
Recent advances in artificial intelligence largely benefit from better neural network architectures. These architectures are a product of a costly process of trial-and-error. To ease this process, we develop ArchExplorer, a visual analysis method for understanding a neural architecture space and summarizing design principles. The key idea behind ou...
Article
The rapid development of deep natural language processing (NLP) models for text classification has led to an urgent need for a unified understanding of these models proposed individually. Existing methods cannot meet the need for understanding different models in one framework due to the lack of a unified measure for explaining both low-level (e.g....
Preprint
Full-text available
The rapid development of deep natural language processing (NLP) models for text classification has led to an urgent need for a unified understanding of these models proposed individually. Existing methods cannot meet the need for understanding different models in one framework due to the lack of a unified measure for explaining both low-level (e.g....
Article
The base learners and labeled samples (shots) in an ensemble few-shot classifier greatly affect the model performance. When the performance is not satisfactory, it is usually difficult to understand the underlying causes and make improvements. To tackle this issue, we propose a visual analysis method, FSLDiagnotor. Given a set of base learners and...
Preprint
The base learners and labeled samples (shots) in an ensemble few-shot classifier greatly affect the model performance. When the performance is not satisfactory, it is usually difficult to understand the underlying causes and make improvements. To tackle this issue, we propose a visual analysis method, FSLDiagnotor. Given a set of base learners and...
Article
As training high-performance object detectors requires expensive bounding box annotations, recent methods resort to freeavailable image captions. However, detectors trained on caption supervision perform poorly because captions are usually noisy and cannot provide precise location information. To tackle this issue, we present a visual analysis meth...
Preprint
Dimensionality Reduction (DR) techniques can generate 2D projections and enable visual exploration of cluster structures of high-dimensional datasets. However, different DR techniques would yield various patterns, which significantly affect the performance of visual cluster analysis tasks. We present the results of a user study that investigates th...
Article
Dimensionality Reduction (DR) techniques can generate 2D projections and enable visual exploration of cluster structures of high-dimensional datasets. However, different DR techniques would yield various patterns, which significantly affect the performance of visual cluster analysis tasks. We present the results of a user study that investigates th...
Article
Semi-supervised learning (SSL) provides a way to improve the performance of prediction models (e.g., classifier) via the usage of unlabeled samples. An effective and widely used method is to construct a graph that describes the relationship between labeled and unlabeled samples. Practical experience indicates that graph quality significantly affect...
Article
Full-text available
Visual analytics for machine learning has recently evolved as one of the most exciting areas in the field of visualization. To better identify which research topics are promising and to learn how to apply relevant techniques in visual analytics, we systematically review 259 papers published in the last ten years together with representative works b...
Article
Given a scatterplot with tens of thousands of points or even more, a natural question is which sampling method should be used to create a small but “good” scatterplot for a better abstraction. We present the results of a user study that investigates the influence of different sampling strategies on multi-class scatterplots. The main goal of this st...
Preprint
Hierarchical clustering is an important technique to organize big data for exploratory data analysis. However, existing one-size-fits-all hierarchical clustering methods often fail to meet the diverse needs of different users. To address this challenge, we present an interactive steering method to visually supervise constrained hierarchical cluster...
Article
Full-text available
Sampling is a widely used graph reduction technique to accelerate graph computations and simplify graph visualizations. By comprehensively analyzing the literature on graph sampling, we assume that existing algorithms cannot effectively preserve minority structures that are rare and small in a graph but are very important in graph analysis. In this...
Preprint
Visual analytics for machine learning has recently evolved as one of the most exciting areas in the field of visualization. To better identify which research topics are promising and to learn how to apply relevant techniques in visual analytics, we systematically review 259 papers published in the last ten years together with representative works b...
Preprint
Concept drift is a phenomenon in which the distribution of a data stream changes over time in unforeseen ways, causing prediction models built on historical data to become inaccurate. While a variety of automated methods have been developed to identify when concept drift occurs, there is limited support for analysts who need to understand and corre...
Article
Full-text available
Ensemble feature selection algorithms aggregate the results of multiple feature selection methods in order to select an effective subset of features. However, typically, ensemble algorithms treat each feature selection method equally and do not consider performance differences. Consequently, features selected by a relatively smaller number of metho...
Article
Hierarchical clustering is an important technique to organize big data for exploratory data analysis. However, existing one-size-fits-all hierarchical clustering methods often fail to meet the diverse needs of different users. To address this challenge, we present an interactive steering method to visually supervise constrained hierarchical cluster...
Article
One major cause of performance degradation in predictive models is that the test samples are not well covered by the training data. Such not well-represented samples are called OoD samples. In this paper, we propose OoDAnalyzer, a visual analysis approach for interactively identifying OoD samples and explaining them in context. Our approach integra...
Preprint
One major cause of performance degradation in predictive models is that the test samples are not well covered by the training data. Such not well-represented samples are called OoD samples. In this paper, we propose OoDAnalyzer, a visual analysis approach for interactively identifying OoD samples and explaining them in context. Our approach integra...
Preprint
Adversarial examples, generated by adding small but intentionally imperceptible perturbations to normal examples, can mislead deep neural networks (DNNs) to make incorrect predictions. Although much work has been done on both adversarial attack and defense, a fine-grained understanding of adversarial examples is still lacking. To address this issue...
Article
Adversarial examples, generated by adding small but intentionally imperceptible perturbations to normal examples, can mislead deep neural networks (DNNs) to make incorrect predictions. Although much work has been done on both adversarial attack and defense, a fine-grained understanding of adversarial examples is still lacking. To address this issue...
Article
Full-text available
Data quality management, especially data cleansing, has been extensively studied for many years in the areas of data management and visual analytics. In the paper, we first review and explore the relevant work from the research areas of data management, visual analytics and human-computer interaction. Then for different types of data such as multim...
Article
Interactive machine learning (IML) is an iterative learning process that tightly couples a human with a machine learner, which is widely used by researchers and practitioners to effectively solve a wide variety of real-world application problems. Although recent years have witnessed the proliferation of IML in the field of visual analytics, most re...
Preprint
Interactive Machine Learning (IML) is an iterative learning process that tightly couples a human with a machine learner, which is widely used by researchers and practitioners to effectively solve a wide variety of real-world application problems. Although recent years have witnessed the proliferation of IML in the field of visual analytics, most re...
Preprint
Deep neural networks (DNNs) are vulnerable to maliciously generated adversarial examples. These examples are intentionally designed by making imperceptible perturbations and often mislead a DNN into making an incorrect prediction. This phenomenon means that there is significant risk in applying DNNs to safety-critical applications, such as driverle...
Article
In order to effectively infer correct labels from noisy crowdsourced annotations, learning-from-crowds models have introduced expert validation. However, little research has been done on facilitating the validation procedure. In this paper, we propose an interactive method to assist experts in verifying uncertain instance labels and unreliable work...
Article
Visual text analytics has recently emerged as one of the most prominent topics in both academic research and the commercial world. To provide an overview of the relevant techniques and analysis tasks, as well as the relationships between them, we comprehensively analyzed 263 visualization papers and 4,346 mining papers published between 1992-2017 i...
Article
We present a visual analysis method for interactively recomposing a large number of photos based on example photos with high-quality composition. The recomposition method is formulated as a matching problem between photos. The key to this formulation is a new metric for accurately measuring the composition distance between photos. We have also deve...
Article
Analyzing social streams is important for many applications, such as crisis management. However, the considerable diversity, increasing volume, and high dynamics of social streams of large events continue to be significant challenges that must be overcome to ensure effective exploration. We propose a novel framework by which to handle complex socia...
Article
Full-text available
Tree boosting, which combines weak learners (typically decision trees) to generate a strong learner, is a highly effective and widely used machine learning method. However, the development of a high performance tree boosting model is a time-consuming process that requires numerous trial-and-error experiments. To tackle this issue, we have developed...
Article
Among the many types of deep models, deep generative models (DGMs) provide a solution to the important problem of unsupervised and semi-supervised learning. However, training DGMs requires more skill, experience, and know-how because their training is more complex than other types of deep models such as convolutional neural networks (CNNs). We deve...
Conference Paper
Although several effective learning-from-crowd methods have been developed to infer correct labels from noisy crowdsourced labels, a method for post-processed expert validation is still needed. This paper introduces a semi-supervised learning algorithm that is capable of selecting the most informative instances and maximizing the influence of exper...
Article
Nested Chinese Restaurant Process (nCRP) topic models are powerful nonparametric Bayesian methods to extract a topic hierarchy from a given text corpus, where the hierarchical structure is automatically determined by the data. Hierarchical Latent Dirichlet Allocation (hLDA) is a popular instance of nCRP topic models. However, hLDA has only been eva...
Article
Full-text available
Interactive model analysis, the process of understanding, diagnosing, and refining a machine learning model with the help of interactive visualization, is very important for users to efficiently solve real-world artificial intelligence and data mining problems. Dramatic advances in big data analytics has led to a wide variety of interactive model a...
Article
The digital traces of U.S. members of congress on Twitter enable researchers to observe how these public officials interact with one another in a direct and unobtrusive manner. Using data from Twitter and other sources (e.g., roll-call vote data), this study aims to examine how members of congress connect and communicate with one another on Twitter...
Article
The papers in this special section were presenteda at the 2015 IEEE Pacific Visualization Symposium (PacificVis’15) that was held in Hangzhou from April 14 to 17, 2015.
Article
Due to the huge popularity of microblogging services, microblogs have become important sources of customer opinions. Sentiment analysis systems can provide useful knowledge to decision support systems and decision makers by aggregating and summarizing the opinions in massive microblogs automatically. The most important component of sentiment analys...
Article
Deep convolutional neural networks (CNNs) have achieved breakthrough performance in many pattern recognition tasks such as image classification. However, the development of high-quality deep models typically relies on a substantial amount of trial-and-error, as there is still no clear understanding of when and why a deep model works. In this paper,...
Conference Paper
Dynamic topic models (DTMs) are very effective in discovering topics and capturing their evolution trends in time series data. To do posterior inference of DTMs, existing methods are all batch algorithms that scan the full dataset before each update of the model and make inexact variational approximations with mean-field assumptions. Due to a lack...
Conference Paper
Zoomable user interfaces are widely used in static visualizations and have many benefits. However, they are not well supported in animated visualizations due to problems such as change blindness and information overload. We propose the spot-tracking lens, a new zoomable user interface for animated bubble charts, to tackle these problems. It couples...
Article
Dynamic topic models (DTMs) are very effective in discovering topics and capturing their evolution trends in time series data. To do posterior inference of DTMs, existing methods are all batch algorithms that scan the full dataset before each update of the model and make inexact variational approximations with mean-field assumptions. Due to a lack...
Article
This paper presents a visual analytics approach to analyzing a full picture of relevant topics discussed in multiple sources, such as news, blogs, or micro-blogs. The full picture consists of a number of common topics covered by multiple sources, as well as distinctive topics from each source. Our approach models each textual corpus as a topic grap...
Article
In many applications, ideas that are described by a set of words often flow between different groups. To facilitate users in analyzing the flow, we present a method to model the flow behaviors that aims at identifying the lead-lag relationships between word clusters of different user groups. In particular, an improved Bayesian conditional cointegra...
Article
We present an online visual analytics approach to helping users explore and understand hierarchical topic evolution in high-volume text streams. The key idea behind this approach is to identify representative topics in incoming documents and align them with the existing representative topics that they immediately follow (in time). To this end, we l...
Article
Although there has been a great deal of interest in analyzing customer opinions and breaking news in microblogs, progress has been hampered by the lack of an effective mechanism to discover and retrieve data of interest from microblogs. To address this problem, we have developed an uncertainty-aware visual analytics approach to retrieve salient pos...
Article
This installment of Computer's series highlighting the work published in IEEE Computer Society journals comes from IEEE Transactions on Visualization and Computer Graphics. The Web extra at http://youtu.be/E1PVTitj7h0 is a video demonstration of a novel solution to multivariate data visualization that helps users interactively explore data by combi...
Article
A high-quality label layout is critical for effective information understanding and consumption. Existing labeling methods fail to help users quickly gain an overview of visualized data when the number of labels is large. Visual clutter is a major challenge preventing these methods from being applied to real-world applications. To address this, we...
Article
In this paper, we study a challenging problem of deriving a taxonomy from a set of keyword phrases. A solution can benefit many real-world applications because i) keywords give users the flexibility and ease to characterize a specific domain; and ii) in many applications, such as online advertisements, the domain of interest is already represented...
Article
We present a visual analytics approach to developing a full picture of relevant topics discussed in multiple sources such as news, blogs, or micro-blogs. The full picture consists of a number of common topics among multiple sources as well as distinctive topics. The key idea behind our approach is to jointly match the topics extracted from each sou...
Article
Identifying which text corpus leads in the context of a topic presents a great challenge of considerable interest to researchers. Recent research into lead-lag analysis has mainly focused on estimating the overall leads and lags between two corpora. However, real-world applications have a dire need to understand lead-lag patterns both globally and...
Article
Recursive neural network models have achieved promising results in many natural language processing tasks. The main difference among these models lies in the composition function, i.e., how to obtain the vector representation for a phrase or sentence using the representations of words it contains. This paper introduces a novel Adaptive Multi-Compos...
Article
The huge amount of user log data collected by search engine providers creates new opportunities to understand user loyalty and defection behavior at an unprecedented scale. However, this also poses a great challenge to analyze the behavior and glean insights into the complex, large data. In this paper, we introduce LoyalTracker, a visual analytics...
Article
Using a sequence of topic trees to organize documents is a popular way to represent hierarchical and evolving topics in text corpora. However, following evolving topics in the context of topic trees remains difficult for users. To address this issue, we present an interactive visual text analysis approach to allow users to progressively explore and...
Article
It is important for many different applications such as government and business intelligence to analyze and explore the diffusion of public opinions on social media. However, the rapid propagation and great diversity of public opinions on social media pose great challenges to effective analysis of opinion diffusion. In this paper, we introduce a vi...
Article
Full-text available
Cooperation and competition (jointly called “coopetition”) are two modes of interactions among a set of concurrent topics on social media. How do topics cooperate or compete with each other to gain public attention? Which topics tend to cooperate or compete with one another? Who plays the key role in coopetition-related interactions? We answer thes...
Conference Paper
We present a system to analyze user interests by analyzing their online behaviors from large-scale usage logs. We surmise that user interests can be characterized by a large collection of features we call the behavioral genes that can be deduced from both their explicit and implicit online behaviors. It is the goal of this research to sequence the...
Patent
A layout method in a display area for multiple graph components in a dynamic network. The multiple graph components are sorted according to their importance into a first subset S1 and a second subset S2 according to the order of their importance. The first subset S1 is divided into an upper subset Cp including only the most important graph componen...
Conference Paper
Research into social network analysis has shown that graph metrics, such as degree and closeness, are often used to summarize structural changes in a dynamic graph. However there have been few visual analytics approaches that have been proposed to help analysts study graph evolutions in the context of graph metrics. In this paper, we present a nove...
Article
Information visualization (InfoVis), the study of transforming data, information, and knowledge into interactive visual representations, is very important to users because it provides mental models of information. The boom in big data analytics has triggered broad use of InfoVis in a variety of domains, ranging from finance to sports to politics. I...
Article
We present an evolutionary multi-branch tree clustering method to model hierarchical topics and their evolutionary patterns over time. The method builds evolutionary trees in a Bayesian online filtering framework. The tree construction is formulated as an online posterior estimation problem, which well balances both the fitness of the current tree...
Article
Full-text available
How do various topics compete for public attention when they are spreading on social media? What roles do opinion leaders play in the rise and fall of competitiveness of various topics? In this study, we propose an expanded topic competition model to characterize the competition for public attention on multiple topics promoted by various opinion le...
Article
Storyline visualizations, which are useful in many applications, aim to illustrate the dynamic relationships between entities in a story. However, the growing complexity and scalability of stories pose great challenges for existing approaches. In this paper, we propose an efficient optimization approach to generating an aesthetically appealing stor...
Conference Paper
Correlated topical trend detection is very useful in analyzing public and social media influence. In this paper, we propose an algorithm that can both detect the correlation and discover the corresponding keywords that trigger the correlation. To detect the correlation, we use a projection vector to project two text streams onto the same space, and...
Conference Paper
Understanding topic hierarchies in text streams and their evolution patterns over time is very important in many applications. In this paper, we propose an evolutionary multi-branch tree clustering method for streaming text data. We build evolutionary trees in a Bayesian online filtering framework. The tree construction is formulated as an online p...
Article
In this paper, we propose a novel constrained coclustering method to achieve two goals. First, we combine information-theoretic coclustering and constrained clustering to improve clustering performance. Second, we adopt both supervised and unsupervised constraints to demonstrate the effectiveness of our algorithm. The unsupervised constraints are a...
Patent
Full-text available
A method and an apparatus for animating transition among a dynamic graph series including a preceding graph and a succeeding graph which are temporally sequentially displayed. The method includes: selecting an unchanged node which is the nearest to a removed node and merging through animation the removed node with the unchanged node, thereby animat...
Conference Paper
We are building a topic-based, interactive visual analytic tool that aids users in analyzing large collections of text. To help users quickly discover content evolution and significant content transitions within a topic over time, here we present a novel, constraint-based approach to temporal topic segmentation. Our solution splits a discovered top...