Conference Paper

Arrangement of Low-Dimensional Parallel Coordinate Plots for High-Dimensional Data Visualization


Abstract

Multidimensional data visualization is an important research topic that has been receiving increasing attention. Several techniques that use parallel coordinate plots have been proposed to represent all dimensions of data in a single display space. In addition, several other techniques that apply scatter plot matrices have been proposed to represent multidimensional data as a collection of low-dimensional data visualization spaces. Typically, when using the latter approach, it is easier to understand relations among particular dimensions, but it is often difficult to observe relations between dimensions separated into different visualization spaces. This paper presents a framework for displaying an arrangement of low-dimensional data visualization spaces that are generated from high-dimensional datasets. Our proposed technique first divides the dimensions of the input datasets into groups of lower dimensions based on their correlations or other relationships. If the groups of lower dimensions can be visualized in independent rectangular spaces, our technique packs the set of low-dimensional data visualizations into a single display space. Because our technique places related low-dimensional visualizations closer together in the display space, it is easier to compare them visually. In this paper, we describe in detail how we implement our framework using parallel coordinate plots, and present several results demonstrating its effectiveness.
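The grouping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Pearson correlation as the relationship measure, a hypothetical threshold of 0.7, and connected components of the thresholded correlation graph as the grouping rule. The names `pearson` and `group_dimensions` are illustrative only.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def group_dimensions(columns, threshold=0.7):
    """Group dimension names whose |correlation| reaches the (assumed) threshold.

    `columns` maps a dimension name to its list of values. Groups are the
    connected components of the thresholded correlation graph (union-find).
    """
    names = list(columns)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if abs(pearson(columns[a], columns[b])) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


example = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8.5],
    "c": [4, 3, 2, 1],
    "d": [1, -1, 1, -1],
}
print(group_dimensions(example))  # [['a', 'b', 'c'], ['d']]
```

Each resulting group would then be drawn as its own low-dimensional plot; the packing of those plots into one display space is a separate step that this sketch does not cover.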


... Several techniques [3] [33] have attempted to address this last issue by only displaying PCPs for subsets of the dimensions that are highly correlated. Given the example dataset and situation specified above, these techniques would construct two PCPs: one displaying a, b, and c, and the other displaying b and d. ...
... applied multiple PCPs to represent time-varying multidimensional data [3]. Suematsu et al. [33] also converted high-dimensional datasets into low-dimensional subsets and visualized these subsets using multiple PCPs arranged on display spaces based upon their similarity and correlation. Using similar ideas, Zheng et al. [42] selected SPs based upon the meaningfulness of the dimensions being displayed and adjusted their layout based upon their similarity. ...
Preprint
Parallel coordinate plots (PCPs) are among the most useful techniques for the visualization and exploration of high-dimensional data spaces. They are especially useful for the representation of correlations among the dimensions, which identify relationships and interdependencies between variables. However, within these high-dimensional spaces, PCPs face difficulties in displaying the correlation between combinations of dimensions and generally require additional display space as the number of dimensions increases. In this paper, we present a new technique for high-dimensional data visualization in which a set of low-dimensional PCPs are interactively constructed by sampling user-selected subsets of the high-dimensional data space. In our technique, we first construct a graph visualization of sets of well-correlated dimensions. Users observe this graph and are able to interactively select the dimensions by sampling from its cliques, thereby dynamically specifying the most relevant lower dimensional data to be used for the construction of focused PCPs. Our interactive sampling overcomes the shortcomings of the PCPs by enabling the visualization of the most meaningful dimensions (i.e., the most relevant information) from high-dimensional spaces. We demonstrate the effectiveness of our technique through two case studies, where we show that the proposed interactive low-dimensional space constructions were pivotal for visualizing the high-dimensional data and discovering new patterns.
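The clique-based selection mentioned in this abstract can be sketched as follows. This is an assumption-laden illustration rather than the published method: it takes a list of "well-correlated" dimension pairs as given (how they are determined is not specified here) and enumerates maximal cliques with the basic Bron-Kerbosch recursion. The helper names and the four-dimension example are hypothetical.

```python
def build_adjacency(edges):
    """Turn undirected (u, v) pairs into a dict: vertex -> set of neighbours."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def maximal_cliques(adjacency):
    """Enumerate all maximal cliques (basic Bron-Kerbosch, no pivoting)."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(sorted(current))
            return
        for v in sorted(candidates):
            expand(current | {v}, candidates & adjacency[v], excluded & adjacency[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adjacency), set())
    return sorted(cliques)


# Hypothetical example: a, b, c are mutually well correlated; d only with b.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")]
for clique in maximal_cliques(build_adjacency(edges)):
    print(clique)  # each clique is a candidate axis set for one focused PCP
```

In this example the cliques are `['a', 'b', 'c']` and `['b', 'd']`, so two low-dimensional PCPs would be candidates for display.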
... Previous research has proposed exploratory techniques to enhance the visualization of multidimensional data. Within the last 20 years, research has focused on techniques to reduce the number of polylines or to reduce or reorder the parallel axes [8] [9]. This paper introduces novel techniques for reordering the factors of the data based on correlation coefficient calculations. ...
... Other papers proposed new methods for interpreting the readiness of the parallel coordinate by dividing the dimensions of the datasets input into groups of lower dimensions based on the correlations calculations; the conclusion of this technique can represent various groups of correlated dimensions in high dimensional data space [8]. ...
... Parallel coordinate graph is a visual method for problems with multiple attributes [6]. A row of data in the dataset is represented by a broken line in the parallel coordinate graph. ...
... Claessen et al. [2] visualized high-dimensional datasets by representing a set of low-dimensional subspaces as a combination of PCPs and scatterplots. Suematsu et al. [15] and Zheng et al. [22] also converted high-dimensional datasets into low-dimensional subsets and visualized these subsets using multiple PCPs or scatterplots respectively. These techniques did not provide rich interaction mechanisms to freely select the numbers of dimensions. ...
Preprint
Scatterplot selection is an effective approach to represent essential portions of multidimensional data in a limited display space. Various metrics for the evaluation of scatterplots, such as scagnostics, have been presented and applied to scatterplot selection. This paper presents a new scatterplot selection technique that applies multiple metrics. The technique first calculates scores of scatterplots with multiple metrics and then constructs a graph by connecting similar scatterplots. It then solves a graph coloring problem so that different colors are assigned to similar scatterplots. We can extract a varied set of scatterplots by selecting those that are assigned the same color. This paper introduces visualization examples with a retail dataset containing multidimensional climate and sales values.
Article
Multidimensional data visualization is one of the most active research topics in information visualization, since much of the information in our daily life forms multidimensional datasets. Scatterplot selection is an effective approach to represent essential portions of multidimensional data in a limited display space. Various metrics for evaluating scatterplots, such as scagnostics, have been applied to scatterplot selection. One of the open problems of this research topic is that a varied set of scatterplots cannot be selected if we simply apply one of the metrics. In other words, we may want to apply multiple metrics simultaneously in a balanced manner when we want to select a variety of scatterplots. This paper presents a new scatterplot selection technique that solves this problem. First, the technique calculates the scores of scatterplots with multiple metrics and then constructs a graph by connecting pairs of scatterplots that have similar scores. Next, it uses a graph coloring algorithm to assign different colors to scatterplots that have similar scores. We can extract a varied set of scatterplots by selecting those that are assigned the same color. This paper introduces two case studies: the first uses a retail transaction dataset, and the second uses a design optimization dataset.
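The coloring idea in this abstract can be sketched as below. It is a simplified illustration under stated assumptions, not the published algorithm: each scatterplot is described by a hypothetical vector of metric scores, two plots count as "similar" when the Euclidean distance between their score vectors is at most `eps`, and colors are assigned greedily in name order.

```python
from itertools import combinations
from math import dist


def color_similar_plots(scores, eps):
    """Greedy graph coloring: similar scatterplots receive different colors.

    `scores` maps a plot name to a tuple of metric scores; `eps` is an
    assumed similarity radius in score space.
    """
    names = sorted(scores)
    neighbours = {name: set() for name in names}
    for a, b in combinations(names, 2):
        if dist(scores[a], scores[b]) <= eps:
            neighbours[a].add(b)
            neighbours[b].add(a)

    colors = {}
    for name in names:
        used = {colors[m] for m in neighbours[name] if m in colors}
        color = 0
        while color in used:
            color += 1
        colors[name] = color
    return colors


# Hypothetical scores for four scatterplots under two metrics.
scores = {
    "p1": (0.90, 0.10),
    "p2": (0.88, 0.12),
    "p3": (0.10, 0.90),
    "p4": (0.12, 0.88),
}
colors = color_similar_plots(scores, eps=0.1)
selection = [name for name, color in colors.items() if color == 0]
print(selection)  # ['p1', 'p3']
```

Because similar plots never share a color, picking all plots of one color yields a set in which no two members are close in score space.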
Book
This book is devoted to the emerging field of integrated visual knowledge discovery that combines advances in artificial intelligence/machine learning and visualization/visual analytics. A long-standing challenge of artificial intelligence (AI) and machine learning (ML) is explaining models to humans, especially for life-critical applications like health care. Model explanation is fundamentally a human activity, not only an algorithmic one. As current deep learning studies demonstrate, this makes the paradigm based on visual methods critically important for addressing this challenge. In general, visual approaches are critical for discovering explainable high-dimensional patterns of all types in high-dimensional data, offering "n-D glasses," where preserving high-dimensional data properties and relations in visualizations is a major challenge. The current progress opens a fantastic opportunity in this domain. This book is a collection of 25 extended works of over 70 scholars presented at AI and visual analytics related symposia at the recent International Information Visualization Conferences, with the goal of moving this integration to the next level. The sections of this book cover integrated systems, supervised learning, unsupervised learning, optimization, and evaluation of visualizations. The intended audience for this collection includes those developing and using emerging AI/machine learning and visualization methods. Scientists, practitioners, and students can find multiple examples of the current integration of AI/machine learning and visualization for visual knowledge discovery. The book provides a vision of future directions in this domain. New researchers will find here an inspiration to join the profession and to be involved in its further development. Instructors in AI/ML and visualization classes can use it as a supplementary source in their undergraduate and graduate classes.
Chapter
Strategic foresight, corporate foresight, and technology management enable firms to detect discontinuous changes early and develop future courses for a more sophisticated market positioning. Enhancements in machine learning and artificial intelligence allow more automatic detection of early trends to create future courses and make strategic decisions. Visual Analytics combines automated data analysis through machine learning methods with interactive visualizations. It enables a far better way to gather insights from a vast amount of data in order to make a strategic decision. While Visual Analytics offers various models and approaches to enable strategic decision-making, the analysis of trends is still a matter of research. The forecasting approaches and the involvement of humans in the visual trend analysis process require further investigation that will lead to sophisticated analytical methods. We introduce in this paper a novel model of Visual Analytics for decision-making, particularly for technology management, through early trends from scientific publications. We combine Corporate Foresight and Visual Analytics and propose a machine learning-based Technology Roadmapping based on our previous work. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
Chapter
A variety of multidimensional data visualization techniques have been proposed. Scatterplot matrices and parallel coordinate plots are typical and well-known ones, but they may need a very large screen space when a dataset has a large number of dimensions. We suppose there are two types of solutions for this problem: one is to effectively select interesting sets of dimensions, and the other is to develop drawing techniques that realize comprehensive representations in the limited screen space. Following this direction, we propose a method for selecting important scatterplots from all scatterplots generated from input datasets and for drawing the scatterplots as “outliers” and “regions enclosing non-outlier plots.” The technique helps users determine whether to delete outliers from the datasets and form mathematical models of non-outlier plots. This chapter extends the authors’ conference paper by additionally introducing an example with a design optimization dataset and a user evaluation of this technique.
Article
In a variety of research and application areas, graphs are an important structure for data modeling and analysis. While graph properties can have a crucial influence on the performance of graph algorithms, and thus on the outcome of experiments, often only a basic analysis of the graphs under investigation in an experimental evaluation is performed and a few characteristics are reported in publications. We present Graph Landscape, a concept for the visual analysis of graph set properties. The Graph Landscape aims to support researchers in exploring graphs and graph sets with regard to their properties, so that they can select good experimental test sets, analyze newly generated sets, compare sets, and assess the validity (or range) of experimental results and corresponding conclusions.
Conference Paper
In a variety of research and application areas, graphs are an important structure for data modeling and analysis. While graph properties can have a crucial influence on the performance of graph algorithms, and thus on the outcome of experiments, often only a basic analysis of the graphs under investigation in an experimental evaluation is performed, and a few characteristics are reported in publications. We present Graph Landscape, a concept for the visual analysis of graph set properties. The Graph Landscape aims to support researchers in exploring graphs and graph sets with regard to their properties, so that they can select good experimental test sets, analyze newly generated sets, compare sets, and assess the validity (or range) of experimental results and corresponding conclusions.
Article
Progressive refinement is a methodology that makes it possible to elegantly integrate scalable data compression, access, and presentation into one approach. Specifically, this paper concerns the effective use of progressive parallel coordinates (PPCs), utilized routinely for high-dimensional data visualization. It discusses how the power of the typical stages of progressive data visualization can also be utilized fully for PPCs. Further, different implementations of the underlying methods and potential application domains are described. The paper also presents empirical results concerning the benefits of PPC with regard to efficient data management and improved presentation, indicating that the proposed approach is able to close the gap between data handling and visualization.
Article
The navigation of high-dimensional data spaces remains challenging, making multivariate data exploration difficult. To be effective and appealing for mainstream application, navigation should use paradigms and metaphors that users are already familiar with. One such intuitive navigation paradigm is interactive route planning on a connected network. We have employed such an interface and have paired it with a prominent high-dimensional visualization paradigm showing the N-D data in undistorted raw form: parallel coordinates. In our network interface, the dimensions form nodes that are connected by a network of edges representing the strength of association between dimensions. A user then interactively specifies nodes/edges to visit, and the system computes an optimal route, which can be further edited and manipulated. In our interface, this route is captured by a parallel coordinate data display in which the dimension ordering is configured by the specified route. Our framework serves both as a data exploration environment and as an interactive presentation platform to demonstrate, explain, and justify any identified relationships to others. We demonstrate our interface within a business scenario and other applications.
Conference Paper
Many graphs used in real-world applications consist of nodes belonging to more than one category. We call such graphs "multiple-category graphs". Social networks are typical examples of multiple-category graphs: nodes are persons, links are friendships, and categories are communities that the persons belong to. It is often helpful to visualize both the connectivity and the categories of the graphs simultaneously. In this paper, we present a new visualization technique for multiple-category graphs. The technique first constructs hierarchical clusters of the nodes based on both connectivity and categories. It then places the nodes by a new hybrid space-filling and force-directed layout algorithm to clearly display both connectivity and category information. We show layout results using our hybrid method and compare it with other methods, and present a case study using an active biological network dataset.
Conference Paper
Our ability to accumulate large, complex (multivariate) data sets has far exceeded our ability to effectively process them in search of patterns, anomalies, and other interesting features. Conventional multivariate visualization techniques generally do not scale well with respect to the size of the data set. The focus of this paper is on the interactive visualization of large multivariate data sets based on a number of novel extensions to the parallel coordinates display technique. We develop a multiresolutional view of the data via hierarchical clustering, and use a variation on parallel coordinates to convey aggregation information for the resulting clusters. Users can then navigate the resulting structure until the desired focus region and level of detail is reached, using our suite of navigational and filtering tools. We describe the design and implementation of our hierarchical parallel coordinates system which is based on extending the XmdvTool system. Lastly, we show examples of the tools and techniques applied to large (hundreds of thousands of records) multivariate data sets.
Conference Paper
The field of visualization assists data interpretation in many areas, but some types of data are not manageable by existing visualization techniques. This holds in particular for time-varying multichannel EEG data. No existing technique can simultaneously visualize information from all channels in use and all time steps. To address this problem, a new visualization technique is presented, based on the parallel coordinate method and making use of a tiled organization. This tiled organization employs a two-dimensional row-column representation, rather than a one-dimensional arrangement in columns as used for the classical parallel coordinates. The usefulness of the new method, referred to as tiled parallel coordinates, is demonstrated by one particular type of EEG data. It can be applied to an arbitrary number of time steps, for the maximum number of channels currently in use. The general setup of the method makes it widely applicable to other time-varying multivariate data types.
Article
Scatterplots remain one of the most popular and widely-used visual representations for multidimensional data due to their simplicity, familiarity and visual clarity, even if they lack some of the flexibility and visual expressiveness of newer multidimensional visualization techniques. This paper presents new interactive methods to explore multidimensional data using scatterplots. This exploration is performed using a matrix of scatterplots that gives an overview of the possible configurations, thumbnails of the scatterplots, and support for interactive navigation in the multidimensional space. Transitions between scatterplots are performed as animated rotations in 3D space, somewhat akin to rolling dice. Users can iteratively build queries using bounding volumes in the dataset, sculpting the query from different viewpoints to become more and more refined. Furthermore, the dimensions in the navigation space can be reordered, manually or automatically, to highlight salient correlations and differences among them. An example scenario presents the interaction techniques supporting smooth and effortless visual exploration of multidimensional datasets.
Article
VizRank is a tool that finds interesting two-dimensional projections of class-labeled data. When applied to multi-dimensional functional genomics datasets, VizRank can systematically find relevant biological patterns. Availability: http://www.ailab.si/supp/bi-vizrank Supplementary information: http://www.ailab.si/supp/bi-vizrank Contact: blaz.zupan{at}fri.uni-lj.si
Article
The field of visualization assists data interpretation in many areas, but does not manage all types of data equally well. This holds, in particular, for time-varying multichannel EEG data. No existing method can successfully visualize simultaneous information from all channels in use at all time steps. To address this problem, a new visualization method is presented based on the parallel coordinate method and making use of a tiled organization. This tiled organization employs a two-dimensional row-column representation, rather than a one-dimensional arrangement in columns as used for classical parallel coordinates. The usefulness of the new method, referred to as tiled parallel coordinates (TPC), is demonstrated by a particular type of EEG data. It can be applied to an arbitrary number of time steps, handling the maximum number of channels currently in use. An extensive user evaluation shows that, for a typical EEG assessment task, data evaluation by the TPC method is faster than by an existing clinical EEG visualization method, without loss of information. The generality of the TPC method makes it widely applicable to other time-varying multivariate data types.
Article
In order to gain insight into multivariate data, complex structures must be analysed and understood. Parallel coordinates is an excellent tool for visualizing this type of data but has its limitations. This paper deals with one of its main limitations - how to visualize a large number of data items without hiding the inherent structure they constitute. We solve this problem by constructing clusters and using high precision textures to represent them. We also use transfer functions that operate on the high precision textures in order to highlight different aspects of the cluster characteristics. Providing predefined transfer functions as well as the support to draw customized transfer functions makes it possible to extract different aspects of the data. We also show how feature animation can be used as guidance when simultaneously analysing several clusters. This technique makes it possible to visually represent statistical information about clusters and thus guides the user, making the analysis process more efficient.
Conference Paper
We introduce Tukey and Tukey scagnostics and develop graph-theoretic methods for implementing their procedure on large datasets.
Article
Parallel coordinates have been widely applied to visualize high-dimensional and multivariate data, discerning patterns within the data through visual clustering. However, the effectiveness of this technique on large data is reduced by edge clutter. In this paper, we present a novel framework to reduce edge clutter, consequently improving the effectiveness of visual clustering. We exploit curved edges and optimize the arrangement of these curved edges by minimizing their curvature and maximizing the parallelism of adjacent edges. The overall visual clustering is improved by adjusting the shape of the edges while keeping their relative order. The experiments on several representative datasets demonstrate the effectiveness of our approach.
Conference Paper
Visual clutter denotes a disordered collection of graphical entities in information visualization. Clutter can obscure the structure present in the data. Even in a small dataset, clutter can make it hard for the viewer to find patterns, relationships and structure. In this paper, we define visual clutter as any aspect of the visualization that interferes with the viewer's understanding of the data, and present the concept of clutter-based dimension reordering. Dimension order is an attribute that can significantly affect a visualization's expressiveness. By varying the dimension order in a display, it is possible to reduce clutter without reducing information content or modifying the data in any way. Clutter reduction is a display-dependent task. In this paper, we follow a three-step procedure for four different visualization techniques. For each display technique, first, we determine what constitutes clutter in terms of display properties; then we design a metric to measure visual clutter in this display; finally, we search for an order that minimizes the clutter in a display.
Conference Paper
We present the first truly polynomial algorithm for PAC-learning the structure of bounded-treewidth junction trees, an attractive subclass of probabilistic graphical models that permits both the compact representation of probability distributions and efficient exact inference. For a constant treewidth, our algorithm has polynomial time and sample complexity. If a junction tree with sufficiently strong intra-clique dependencies exists, we provide strong theoretical guarantees in terms of KL divergence of the result from the true distribution. We also present a lazy extension of our approach that leads to very significant speed-ups in practice, and demonstrate the viability of our method empirically, on several real world datasets. One of our key new theoretical insights is a method for bounding the conditional mutual information of arbitrarily large sets of variables with only polynomially many mutual information computations on fixed-size subsets of variables, if the underlying distribution can be approximated by a bounded-treewidth junction tree.
Article
Many visualization techniques involve mapping high-dimensional data spaces to lower-dimensional views. Unfortunately, mapping a high-dimensional data space into a scatterplot involves a loss of information; or, even worse, it can give a misleading picture of valuable structure in higher dimensions. In this paper, we propose class consistency as a measure of the quality of the mapping. Class consistency enforces the constraint that classes of n–D data are shown clearly in 2–D scatterplots. We propose two quantitative measures of class consistency, one based on the distance to the class’s center of gravity, and another based on the entropies of the spatial distributions of classes. We performed an experiment where users choose good views, and show that class consistency has good precision and recall. We also evaluate both consistency measures over a range of data sets and show that these measures are efficient and robust.
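A distance-based consistency score of the kind this abstract mentions can be sketched as follows. The exact definition is not given in the abstract, so this sketch assumes one plausible reading: the score is the fraction of 2-D points whose nearest class centroid (center of gravity) belongs to their own class. The function name and the sample data are hypothetical.

```python
from math import dist


def distance_consistency(points, labels):
    """Fraction of 2-D points whose nearest class centroid is their own class.

    Assumed reading of a centroid-distance consistency measure; 1.0 means
    every point lies closest to the center of gravity of its own class.
    """
    by_class = {}
    for point, label in zip(points, labels):
        by_class.setdefault(label, []).append(point)

    centroids = {
        label: (
            sum(x for x, _ in members) / len(members),
            sum(y for _, y in members) / len(members),
        )
        for label, members in by_class.items()
    }

    hits = 0
    for point, label in zip(points, labels):
        nearest = min(centroids, key=lambda c: dist(point, centroids[c]))
        if nearest == label:
            hits += 1
    return hits / len(points)


# Hypothetical projection: one point of class "A" lands inside class "B".
points = [(0, 0), (1, 0), (10, 0), (11, 0), (9, 0)]
labels = ["A", "A", "B", "B", "A"]
print(distance_consistency(points, labels))  # 0.8
```

Scoring every candidate scatterplot this way and keeping the highest-scoring ones would be one way to rank views, under the assumptions above.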
Conference Paper
The authors address the problem of visualizing a scalar dependent variable which is a function of many independent variables. In particular, cases where the number of independent variables is three or greater are discussed. A new hierarchical method of plotting that allows one to interactively view millions of data points with up to 10 independent variables is presented. The technique is confined to the case where each independent variable is sampled in a regular grid or lattice-like fashion, i.e., in equal increments. The proposed technique can be described in either an active or a passive manner. In the active view, the points of the N-dimensional independent variables lattice are mapped to a single horizontal axis in a hierarchical manner, while in the passive view an observer samples the points of the N-dimensional lattice in a prescribed fashion and notes the values of the dependent variable. In the passive view, a plot of the dependent variable versus a single parametric variable, which is simply the sampling number, forms the multidimensional graph.
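The mapping of lattice points to a single horizontal axis can be illustrated with a mixed-radix index. This is a sketch under assumptions, not the paper's exact scheme: the abstract does not say which variable varies fastest, so the last variable is assumed to be the fastest, and the function name `stacked_position` is hypothetical.

```python
def stacked_position(index, sizes):
    """Map an N-dimensional lattice index to one position on a single axis.

    `index[k]` is the sample number along variable k and `sizes[k]` is the
    number of samples of that variable. The last variable is assumed to
    vary fastest, so earlier variables form the coarser hierarchy levels.
    """
    if len(index) != len(sizes):
        raise ValueError("index and sizes must have the same length")
    position = 0
    for i, n in zip(index, sizes):
        if not 0 <= i < n:
            raise ValueError("lattice index out of range")
        position = position * n + i
    return position


sizes = (2, 3, 4)  # three independent variables with 2, 3 and 4 samples
print(stacked_position((0, 0, 0), sizes))  # 0
print(stacked_position((1, 0, 0), sizes))  # 12
print(stacked_position((1, 2, 3), sizes))  # 23
```

Plotting the dependent variable against this position gives a one-dimensional "sampling number" axis of the kind the passive view describes.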
Article
A technique for visualizing intrusion-detection system log files using hierarchical data based on IP addresses represents the number of incidents for thousands of computers in one display space. Our technique applies a hierarchical data visualization technique that represents leaf nodes as black square icons and branch nodes as rectangular borders enclosing the icons. This representation style visualizes thousands of hierarchical data leaf nodes equally in one display space. We applied the technique to bioactive chemical visualization and job distribution in parallel-computing environments.
Article
n-Vision is a testbed for exploring n-dimensional worlds containing functions of an arbitrary number of variables. Although our interaction devices and display hardware are inherently 3D, we demonstrate how they can be used to support interaction with these higher-dimensional objects. We introduce a new interaction metaphor developed for the system, which we call "worlds within worlds": nested heterogeneous coordinate systems that allow the user to view and manipulate functions. 1 Introduction One common problem in graphical user interface design has been the need to manipulate and view 3D environments using inherently 2D interaction devices and displays. Although graphics researchers have long been developing true 3D interaction and display devices [SUTH65; VICK70; KILP76], it is only over the past decade that high-performance 3D graphics workstations have been coupled with commercially available 3D devices such as polarized liquid crystal shutters for stereo viewing [TEKT87; STER89]...
Article
We present a survey of multidimensional multivariate (mdmv) visualization techniques developed during the last three decades. This subfield of scientific visualization deals with the analysis of data with multiple parameters or factors, and the key relationships among them. The course of development is roughly organized into four stages, within which major milestones are discussed. Recently developed techniques are explored with examples. 1 Introduction Multidimensional multivariate visualization is an important subfield of scientific visualization. It was studied separately by statisticians and psychologists long before computer science was deemed a discipline. The appearance of low-priced personal computers and workstations during the 1980's breathed new life into graphical analysis of mdmv data. This research topic was among one of the short-term goals included in the 1987 National Science Foundation (NSF) sponsored workshop on Visualization in Scientific Computing [MDB87]. Th...
Article
Markov networks are a common class of graphical models used in machine learning. Such models use an undirected graph to capture dependency information among random variables in a joint probability distribution. Once one has chosen to use a Markov network model, one aims to choose the model that "best explains" the data that has been observed; this model can then be used to make predictions about future data. We show that the problem of learning a maximum likelihood Markov network given certain observed data can be reduced to the problem of identifying a maximum weight low-treewidth graph under a given input weight function. We give the first constant factor approximation algorithm for this problem. More precisely, for any fixed treewidth objective k, we find a treewidth-k graph with an f(k) fraction of the maximum possible weight of any treewidth-k graph.
R. Rosenbaum, J. Zhi, B. Hamann, Progressive Parallel Coordinates, IEEE Pacific Visualization Symposium, 25-32, 2012.
P. C. Wong, R. D. Bergeron, 30 Years of Multidimensional Multivariate Visualization, Scientific Visualization: Overviews Methodologies and Techniques, IEEE Computer Society Press, 3-33, 1997.