Visual Reconciliation of Alternative
Similarity Spaces in Climate Modeling
Jorge Poco, Aritra Dasgupta, Yaxing Wei, William Hargrove, Christopher R. Schwalm,
Deborah N. Huntzinger, Robert Cook, Enrico Bertini, and Cláudio T. Silva

J. Poco, A. Dasgupta, E. Bertini, and C. Silva are with New York University. E-mail: {jpocom, adasgupt, enrico.bertini, csilva}@nyu.edu
Y. Wei and R. Cook are with Oak Ridge National Laboratory. E-mail: {weiy, cookrb}@ornl.gov
W. Hargrove is with the USDA Forest Service. E-mail: hnw@geobabble.org
C. Schwalm and D. Huntzinger are with Northern Arizona University. E-mail: {Christopher.Schwalm, deborah.huntzinger}@nau.edu
Fig. 1: Iterative visual reconciliation of groupings based on climate model structure and model output. Visual inspection of similarity coupled with an underlying computational model facilitates iterative refinement of the groups and flexible exploration of the importance of the different parameters.
Abstract— Visual data analysis often requires grouping of data objects based on their similarity. In many application domains researchers use algorithms and techniques like clustering and multidimensional scaling to extract groupings from data. While extracting these groups using a single similarity criterion is relatively straightforward, comparing alternative criteria poses additional challenges. In this paper we define visual reconciliation as the problem of reconciling multiple alternative similarity spaces through visualization and interaction. We derive this problem from our work on model comparison in climate science, where climate modelers are faced with the challenge of making sense of alternative ways to describe their models: one through the output they generate, another through the large set of properties that describe them. Ideally, they want to understand whether groups of models with similar spatio-temporal behaviors share similar sets of criteria or, conversely, whether similar criteria lead to similar behaviors. We propose a visual analytics solution based on linked views that addresses this problem by allowing the user to dynamically create, modify, and observe the interaction among groupings, thereby making the potential explanations apparent. We present case studies that demonstrate the usefulness of our technique in the area of climate science.
1 INTRODUCTION
Grouping of data objects based on similarity criteria is a common analysis task. In different application domains, computational methods such as clustering and dimensionality reduction are used to extract groupings from data. However, in the real world, with the growing variety of collected and available data, group characterization is no longer restricted to a single set of criteria; it usually involves alternative sets. Exploring the inter-relationship between groups defined by
such alternative similarity criteria is a challenging problem. For example, in health care, an emerging area of research is to reconcile patient groups based on their demographics with groups based on their disease history, for targeted drug development [42]. In climate science, an open problem is to analyze how similar outputs from model simulations can be linked with similarity in the model structures, characterized by diverse sets of criteria. Analyzing features of model structures and their impact on model output can shed light on important global climate change indicators [21].
Redescription mining algorithms have been developed for quantifying and exploring relationships among multiple data descriptors [26]. These techniques have focused on mining algorithms for binary data, where objects are characterized by the presence or absence of certain features. Group extraction based on such computational methods is heavily influenced by parameter settings. Also, it usually takes multiple iterations to find an adequate solution, and in most cases only approximate solutions can be found. Domain experts need to be involved in this iterative process, utilizing their expertise to control the parameters. This necessitates a visual analytics approach towards user-driven group extraction and the communication of relationships among groups that are characterized by diverse descriptive parameters.
Fig. 2: Conceptual model of visual reconciliation between binary model structure data and time-varying model output data. Iterative creation of groups and derivation of the relationship between output similarity and the importance of the different model structure criteria. Blue and orange indicate different groups of models.

To achieve this goal, we introduce a novel visual analytics
paradigm: visual reconciliation, an iterative, human-in-the-loop refinement strategy for reconciling alternative similarity spaces. The reconciliation technique involves synergy among computational methods, adaptive visual representations, and a flexible interaction model for communicating the relationships among the similarity spaces. While iterative refinement strategies are not new in visual analytics [30, 33], sense-making across diverse characterizations of data spaces is still an emerging area of research [39]. In this context, we introduce the problem of reconciling the same data with respect to alternative similarity spaces, which in this case comprise boolean and time-varying attributes. The strength of the reconciliation model stems from transparency in presenting and communicating the similarity relationships among diverse data descriptors, with minimal abstraction, and from effective visual guidance through visual cues and direct manipulation of the data. The design and interactions are motivated by domain experts' need for visual representations with high fidelity, and for a simple yet effective interaction mechanism for browsing through the parameters.
Our concept of visual reconciliation is grounded in our experience of collaborating with climate scientists as part of the Multi-Scale Synthesis and Terrestrial Model Inter-comparison Project (MsTMIP). An open problem in climate science research is how to analyze the effect that similarities and differences in climate model structures have on the temporal variance in model outputs. Recent research has shown that model structures can have a significant impact on the variability of outputs [16], and that some of these findings need to be investigated in further detail to explore different hypotheses.
To achieve these goals, we propose an analysis paradigm for reconciling alternative similarity spaces that leverages the high bandwidth of the human perceptual system and exploits the pattern detection and optimization capabilities of computational models [3, 18]. The key contributions of this work stem from a visual reconciliation technique (Fig. 2) that i) helps climate scientists understand the dependency between alternative similarity spaces for climate models, ii) facilitates iterative refinement of groups with the help of a feedback loop, and iii) allows flexible multi-way interaction and exploration of the parameter space for reconciling the importance of the model parameters with the model groupings.
2 MOTIVATION
Why do we need to define a new visual analytics technique? Reconciling alternative similarity spaces is challenging on several counts: i) Data descriptors can comprise different attribute types. From a human cognition point of view, reconciling the similarity of climate models across two different visual representations is challenging. There needs to be explicit encoding of similarity [11] that supports efficient visual comparison and preserves the mental model of similarity, and adaptation of similarity needs to be reflected by dynamic linking between views without causing change blindness. ii) When aligning two different similarity spaces, say computed by two clustering algorithms, we will in most cases get an approximate result, which must be refined with subsequent parameter tuning to achieve higher accuracy. This necessitates iteration, and therefore a human-in-the-loop approach. iii) Domain experts need to trust the methodology working at the back end and interact with parameters to understand their importance. Fully automated methods do not allow that flexibility; therefore, a transparent representation with minimal abstraction is necessary, where parameters in the similarity computation can be influenced by user selections and filters.
As mentioned before, the technique is not restricted to climate models, but to simplify the discussion in this paper we specifically discuss the applicability of the visual reconciliation technique in the climate modeling context.
2.1 Problem Characterization
Climate models, specifically Terrestrial Biosphere Models (TBMs), are now an important tool for understanding land-atmosphere carbon exchange across the globe. TBMs can be used to attribute carbon sources (e.g., fires, farmlands) and sinks (e.g., forests, oceans) to explicit ecosystem processes. Each TBM is defined by the different input parameters characterizing these processes and by outputs that quantify the dependency between the carbon cycle and the ecosystem processes. In the context of this work, each model has a dual representation: a weighted collection of criteria or descriptive parameters, and time series for different outputs in different regions.

Model Structure: Model structure refers to the types of processes considered (e.g., nutrient cycling, lateral transport of carbon), and how these processes are represented through different criteria (e.g., photosynthetic formulation, temperature sensitivity) in the models. A model simulation algorithm can have different implementations of these processes, which differ from each other in the presence or absence of the different criteria that control the specific process. For example, if a model simulates photosynthesis, a group of criteria like simulating carbon pools, influence of soil moisture, and stomatal conductance can be either present or absent. Currently, climate scientists do not have an objective way of choosing one set of criteria over another that can influence the output. A model structure is a function of these criteria: if there are $c$ criteria, there can be $2^c$ combinations of this function. In our data, there are 4 different classes of criteria, for energy, carbon, vegetation, and respiration, with each class comprising roughly 20 to 30 criteria.
Model Output: Model simulation outputs are ecosystem variables that help climate scientists predict the rates of carbon dioxide increases and changes in the atmosphere. For example, Gross Primary Productivity (GPP) is arguably the most important ecosystem variable, indicating the total amount of energy that is fixed from sunlight, before respiration and decomposition. Climate scientists need to understand patterns of GPP in order to predict rates of carbon dioxide increases and changes in atmospheric temperature.
Relationship between model structure and output: In previous work, we developed the SimilarityExplorer tool [28] for analyzing similarities and differences among multifaceted model outputs. Despite the standardized protocol used to derive initial conditions, models show a high degree of variation for GPP, which can be attributed to differences in model structural information [16].
Therefore, one of the open research questions in the TBM domain is how similarities or differences in model output can be correlated with those in model structures. The heterogeneity of model structure and model output data makes it complex to derive one-to-one relationships between them. Currently, in the absence of an effective analysis technique, scientists manually browse through the theoretically exponential number of model structure combinations and analyze their output. This process is inefficient and also ineffective due to the large parameter space, which can easily cause important patterns to be missed.
In the visual reconciliation technique, we provide a conceptual framework that enables scientists to reconcile model structural similarity with output similarity. We focus on using visual analytics methods to address the following high-level analysis questions: i) given that all other factors are constant, analyze how different combinations of parameters within the model structure cause similarities or differences in model output, and ii) by examining time-varying model outputs at different regions, understand which combinations of parameters cause the same clusters or groups in model structure.
2.2 Visual Reconciliation Goals
As illustrated in Fig. 2, the visual reconciliation technique enables climate scientists to: i) analyze model structure and use that as feedback for reconciling similarities or differences in model output, and ii) analyze model output and use that as feedback for comparing similarities or differences in model structure. The reconciliation model focuses on three key goals:

Similarity encoding and linking: To provide guidance on choosing the starting points of analysis, the visual representations of both structure and output encode similarity functions. Subsequently, scientists can use those initial seed points for reconciling structure characteristics with output data or, conversely, for reconciling output data with structure characteristics.

Flexible exploration of parameters: The visual feedback and interaction model adapts to the analysts' workflow. Scientists can choose different combinations of parameters and customize clusters on the model structure and model output sides; accordingly, the visual representations change and different indicators of similarity are highlighted.

Iterative refinement of groups: By incorporating user feedback in conjunction with a computational model, the reconciliation technique allows users to explore different group parameters in both data spaces and iteratively refine the groupings. The key goal here is to understand which criteria in model structures are most important in determining how the outputs are similar or different over time.
3 RELATED WORK
We discuss the related work in the context of the following threads of research: i) automated clustering methods for handling different data descriptors, and visual analytics approaches towards user-driven clustering, ii) integration of user feedback for handling distance functions in the context of high-dimensional data, and iii) visual analytics solutions for similarity analysis of climate models.
3.1 Clustering Methods
Different clustering methods have been proposed for dealing with alternative similarity spaces. Pfitzner et al. proposed a theoretical framework for evaluating the quality of clusterings through pairwise estimation of similarity [27]. The area of multi-view clustering [4] analyzes cases in which data can be split into two independent subsets; either subset is then conditionally independent of the other and can be used for learning. Similarly, authors have proposed approaches for combining multiple clustering results into one clustered output using similarity graphs [23]. Although we are also dealing with multiple similarity functions, our goal is to reconcile one with respect to the other.
In this context, the most relevant research in the data mining community looks into learning the relationship between different sets of data descriptors. The reconciliation idea is similar, in principle, to redescription mining, which looks at binary feature spaces and uses automated algorithms for reconciling those spaces [29, 26]. While redescriptions mostly deal with binary data, we handle both binary data and time-varying data in our technique.
Our work is also inspired by the consensus clustering concept, which attempts to find the consensus among multiple clustering algorithms [24] in the context of gene expression data. Consensus clustering has also been applied in other applications in biology and chemistry [9, 7]. In our case, while we are interested in the consensus between similarity of model structure and model output, we also aim at quantifying and communicating the contribution of the different parameters towards that consensus or the lack thereof.
We adopt a human-in-the-loop approach, as automated methods do not provide adequate transparency with respect to the clustering parameters, and in most cases iteration is necessary to produce reconciliation results. Iterative refinement strategies for user-driven clustering have been proposed for interacting with intermediate clustering results [30], for tuning parameters of the underlying algorithms [33], and for making sense of the dimension space and item space of data [39]. Dealing with diverse similarity functions while providing domain experts with a high-fidelity visual representation that can be interactively refined are the key differentiators of our work. The reconciliation workflow follows an adaptive process in which the groupings on the model output side are used as input to the model structure side for: i) providing guidance to the scientists towards finding similar groups with respect to diverse descriptors or criteria, and ii) understanding the importance of criteria, which is handled by an underlying optimization algorithm.
3.2 User Feedback for Adaptive Distance Functions
Recently, there has been considerable interest in the visual analytics community in investigating how the computation and tuning of distance functions can be steered by user interaction and feedback. Gleicher proposed a system called Explainers that attempts to alleviate the problem of multidimensional projections, where the axes have no semantics, by providing named axes based on experts' input [10]. Brown et al. presented a system that allows an expert to interact directly with a visual representation of the data to define an appropriate distance function, without having to modify different parameters [5]. In our case, the parameter space is of key interest to the user; therefore we create a visual representation of the parameters and allow direct user interaction with them. Our user-feedback-based weighted optimization method is inspired by the work on manipulating distance functions by Hu et al. [14]. However, the interactivity and conceptual implementation are different, since we are working with two different data spaces, without using multidimensional projections. The modification of distance functions has also been used for spatial clustering, where user-selected groups are given as input to the algorithm [25]. Our reconciliation method is similar in principle to this approach: the system suggests a grouping in one data space, based on the grouping in the other space, through a combination of user selection and computation.
3.3 Visual Analytics for Climate Modeling
Similarity analysis of model simulations is an emerging problem in climate science. While visual analysis of simulation models and their spatio-temporal variance has received attention in other domains [1, 22], current visual analytics solutions for climate model analysis [19] mostly focus on addressing the problem at the level of a single model and understanding its spatio-temporal characteristics. For example, Steed et al. introduced EDEN [36], a tool based on visualizing correlations in an interactive parallel coordinates plot, focused on multivariate analysis. Recently, UV-CDAT [40], a provenance-enabled framework for climate data analysis, has been developed. However, like most other tools, UV-CDAT does not support multi-model analysis [32]. To fill this gap, we recently developed SimilarityExplorer [28] for analyzing multi-model similarity with respect to model outputs. In the present work, we are not only comparing multiple models, but also comparing two different data spaces: model structure and model output. Climate scientists have found that different combinations of model structure criteria can potentially shed light on different simulation output behaviors [16]. However, to the best of our knowledge, no visual analytics solution currently exists in climate science to address this problem. In developing a solution, formulating an analysis paradigm precedes tool development because of the complexities involved in handling multiple descriptor spaces. Although there has been some work on hypothesis generation [17] and task characterization [34] for climate science, it is not sufficient for handling the reconciliation problem involving alternative similarity spaces.
4 COORDINATED MULTIPLE VIEWS
An important component of the visual reconciliation technique is the interaction between multiple views [31]. In this case we have binary model structure data and time-varying model output data. As shown in Fig. 2, the goal is to let domain scientists create and visualize groups on both sides, and understand the importance of the different criteria in creating those groups. In this section we provide an overview of the different views and describe the basic interactions between them.

Fig. 3: Matrix view for model structure data: Rows represent models and columns represent criteria. The variation in the average implementation of a criterion across all models is shown by a color gradient from light yellow to red, with red signifying higher implementation. In the default view, all criteria have equal importance or weights, indicated by the heights of the bars. Connectors visually link the columns and bars when they are reordered independently.
Matrix View: To display the model structure data, which is a two-dimensional matrix of 0's and 1's, we use a color-coded matrix (Fig. 3), which serves as a presence/absence representation of the different criteria in the model structure. This is inspired by Bertin's reorderable matrix [2] and the subsequent interactive versions of the matrix [35]. Since the data is binary, we use two color hues: purple for denoting presence and gray for absence. The visual salience of a matrix depends on the order of its rows and columns, and numerous techniques have been developed to date for reordering [6, 41] and seriation [20]. In this case, the main motivation is to let the scientists visually separate the criteria that have high average non-implementation (indicated by 0's) from those with high average implementation. To provide visual cues about potential groups within the data, we reorder the rows and columns based on a function that puts the criteria that are present to the upper left of the matrix, and pushes those that are absent to the bottom right.
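One plausible way to compute such a default ordering is sketched below in Python; this is our illustration of the idea, not the authors' implementation, and the function name reorder is hypothetical.

    # A minimal sketch of the default matrix ordering, assuming a NumPy
    # binary matrix M with rows = models and columns = criteria.
    import numpy as np

    def reorder(M):
        # Sort columns by average implementation, most-implemented first (left).
        col_order = np.argsort(-M.mean(axis=0), kind="stable")
        M = M[:, col_order]
        # Sort rows lexicographically on the reordered columns so that models
        # implementing the leading criteria rise to the top; absences are
        # thereby pushed toward the bottom right of the matrix.
        row_order = np.lexsort(M.T[::-1])[::-1]
        return M[row_order], row_order, col_order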
The colored bars on top of the matrix serve a dual purpose. The heights of the bars indicate the importance, or weight, of each criterion for creating groups in the model structure. The colors of the bars, on a light yellow to red gradient, indicate the average implementation of a criterion; for example, as indicated in Fig. 3, the yellow bar indicates that only three models have implemented that criterion. This gives a quick overview of which criteria are implemented the most, and which the least. The gray connectors preserve the link between bars and columns during reordering, which is important especially when the criteria bars and the data columns in the matrix are reordered independently.
Groups can be created by selecting different criteria. For a single criterion, there can be two groups of models: those that do not implement the criterion and have a value of 0, and those that implement it and have a value of 1. With multiple selections, there can be $2^c$ combinations, where $c$ is the number of selected criteria. In most practical cases, only a subset of these combinations exists in the data.
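As a concrete illustration, this group-creation step can be sketched as follows; the function name create_groups is our hypothetical choice, and M is assumed to be the binary structure matrix from above.

    # A minimal sketch of group creation from selected criteria. Each distinct
    # bit pattern over the selected columns defines one group; at most
    # 2**len(selected) groups are possible, but only patterns that actually
    # occur in the data appear.
    from collections import defaultdict

    def create_groups(M, selected):
        groups = defaultdict(list)
        for model, row in enumerate(M):
            pattern = tuple(row[c] for c in selected)  # e.g. (0, 1)
            groups[pattern].append(model)
        return groups

For example, selecting two criteria yields up to four groups, keyed by the patterns (0,0), (0,1), (1,0), and (1,1).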
Time Series View: The model output data, which comprises a time series for each model, is displayed using a line chart containing multiple time series (Fig. 4a). However, effective visual comparison of similarity among multiple groups is difficult in this view for two reasons. First, due to the similar trajectories of the series, there is a lot of overlap, leading to clutter. Second, we are unable to show the degree of clustering using this approach. To resolve these design problems, we use small multiples. Small multiples [38] have been used extensively in visualization; one problem with them is that when there are a large number of them, it becomes difficult to group them visually without additional cues. To prevent this, we create a small multiple for each group. When there are time series for different regions, a small multiple can also be created for each region to compare groupings across regions.
Interaction: An overview of the steps in the interactive workflows between the matrix view and the time series view is shown in Fig. 2. These actions and operations are described below:

Create Groups: While reconciling model structure with model output, scientists can first observe similarity among the models based on their criteria and accordingly create groups; this is part of the reconciliation workflow described in Section 5.1. In the matrix view, groups are created through interaction. In the time series view, groups are either suggested by the system or selected by the user through direct manipulation; this is part of the reconciliation workflow described in Section 5.2.

Reflect: Creation of groups triggers reflection of the groups in both views. On the matrix side, this is done by grouping the rows; on the time series side, by color coding the lines.

Split: In the time series view, groups can be reflected by splitting the models into small multiples of model groups.

Optimize: While reconciling model output with structure, an optimization step is necessary to handle the variable importance of the criteria. This workflow starts with the scientist selecting groups in the output, which are reflected in the matrix view. Next, they can choose to optimize the importance, or weights, which leads to subsequent iteration. This reconciliation workflow is described in detail in Section 5.2.
5 RECONCILIATION WORKFLOWS
In this section we describe how we instantiate the conceptual model of visual reconciliation described in Fig. 2 by incorporating the coordinated multiple views, user interaction, and an underlying computational model. The following workflows provide a step-by-step account of how the views and interactions can be leveraged by climate scientists to gain insight into structure similarity and output similarity.
5.1 Reconcile Structure Similarity with Output Similarity
In Fig. 4 we show the different steps in the workflow when the starting point of analysis is the model structure. This workflow relies on visual inspection of structure similarity using matrix manipulation, and on observing the corresponding patterns in the output through the creation of small multiples. The steps are described as follows:
Create groups: To reconcile model structure with output, it is necessary to first provide visual cues about which models are more similar with respect to the different criteria. For this, the default layout of the matrix is sorted from left to right, from high to low average implementation of the different criteria. This is indicated in Fig. 4b by the transition of the importance bars from red to yellow, giving the scientists an idea of which criteria create more evenly sized groups of 0's and 1's. The criteria colored dark red and light yellow will create skewed groups: either too many models implement the criterion or too few do. Selecting criteria that are deep yellow or orange gives more balanced clusters, with around 50 percent implementation. The highlighted column indicates the criterion with the highest percentage of implementation.
The selected columns are indicated in Fig. 4c. These two criteria create four groups. To show groups of models within the matrix, we introduce vertical gaps between groups and draw colored borders around each group. Reordering by columns is also allowed for each group independently, as shown in Fig. 4c; in that case, the weighted ordering of the bars is kept fixed. To visually indicate the change in ordering, we link the criteria by lines: parallel lines indicate criteria that have not moved due to reordering and share the same position across groups. Since too many crossing lines can cause clutter, we render the lines with varying opacity, giving higher opacity to the lines of criteria that have moved. To highlight where a certain criterion lies within a group, on selection we color its line red, as shown in the figure.
Fig. 4: Workflow for reconciling model structure with model output: This linear workflow relies on matrix manipulation techniques and
visual inspection of grouping patterns in the matrix view and the small multiple view.
If the columns in each group are reordered independently, the average implementation patterns for each group are clearly visible, but it becomes difficult to compare the implementations of a set of criteria across the different groups. To enable this comparison, the user can select a specific group to be reordered column-wise, and the columns in the other groups will be sorted by that order. This is shown in Fig. 4d, where the first group from the top is reordered based on its columns, and the other groups are aligned relative to it. As observed, this enables more efficient comparison relative to all the implemented and non-implemented criteria in the first group. For example, we can easily see that the rightmost criteria are not implemented by the first group of models, but are implemented by all other groups.
Reflect: The creation of groups in the structure is reflected in the output by the color of the groups. Users can see the names of the models on interaction.
Split: Small multiples can be created for each group (Fig. 4d). The range of variability of the models in each small multiple reflects how similar or different they are. This comparison is difficult to achieve in a single time series chart overloaded with too many lines. It also enables a direct reconciliation of the quality of grouping in model structure with that of the output. For example, as shown in the figure, only the orange group has low variability across models, denoting that the groups based on the criteria in model structure do not, in general, create groups where models produce similar output behavior.
5.2 Reconcile Output Similarity with Structure Similarity
To reconcile output with structure and complete the loop, we need to account for the fact that different criteria can have different weights, or importance, in the creation of groups. One of the goals of the reconciliation model is to enable scientists to explore different combinations of these criteria that can create groups similar to the corresponding model output groups. However, naive visual inspection is too inefficient to analyze all possible combinations without any guidance from the system. For this, we developed a weighted optimization algorithm that complements the human interaction. We describe the algorithm, provide an outline of its validation, and present the corresponding workflow below.
5.2.1 Weighted Optimization
Using the model structure data and the model output data, we can create two distance matrices. The eventual goal is to learn a similarity function from the output distance matrix and to modify the weights of the criteria in the structure distance function so that it adapts to the output similarity matrix. We describe the problem formulation below.
Let $\hat{M}$ be a matrix representing the model output, with size $n \times p$, and $\tilde{M}$ a matrix representing the model structure, with size $n \times q$. Similarity in model output is computed by a function $\hat{d}: \mathbb{R}^p \times \mathbb{R}^p \rightarrow \mathbb{R}$, which can be any specialized distance function such as Euclidean, cosine, etc. For the model structure we use the weighted Euclidean distance $\tilde{d}_w: \mathbb{R}^q \times \mathbb{R}^q \rightarrow \mathbb{R}$, defined as $\tilde{d}_w(x_i, x_j) = \sqrt{\sum_{k=1}^{q} w_k (x_i^k - x_j^k)^2}$, where $w_k$ is the weight assigned to dimension $k$ of $\tilde{M}$.

Using $\hat{d}$ we encode the similarity information of the model output in a distance matrix $\hat{D}$. Our goal is to find the weight vector $w = \{w_1, \ldots, w_q\}$ that creates a distance matrix $\tilde{D}$ for the model structure containing approximately the same similarity information as the model output. This problem can be formulated as the minimization of the squared error between the two distance functions:

$$
\begin{aligned}
\underset{w}{\text{minimize}} \quad & \sum_{i=1}^{n} \sum_{j=1}^{n} \left\| \tilde{d}_w(x_i, x_j)^2 - \hat{d}(y_i, y_j)^2 \right\|^2 \\
\text{subject to} \quad & w_k \ge 0, \quad k = 1, \ldots, q,
\end{aligned}
\qquad (1)
$$

where $\| \cdot \|$ is the $L_2$ norm, $x_i$ denotes the $i$-th row of $\tilde{M}$, and $y_i$ the $i$-th row of $\hat{M}$.
Using this vector $w$ we can determine which criteria in the model structure are important for recreating the similarity information of the model output. Note that the formulation above does not yet take the user's feedback into account. The weight computation step is similar to the one used in the weighted metric multidimensional scaling technique [12].
To incorporate the user's feedback into our formulation, we can multiply the squared errors in Eq. 1 by a coefficient $r_{i,j}$ that represents the importance of each pair of elements in the minimization problem. In our approach, we allow the user to define groups on the model output; then $r_{i,j}$ is set to zero or nearly zero for all pairs of elements $i, j$ within a group. We then minimize:

$$
\begin{aligned}
\underset{w}{\text{minimize}} \quad & \sum_{i=1}^{n} \sum_{j=1}^{n} r_{i,j} \left\| \tilde{d}_w(x_i, x_j)^2 - \hat{d}(y_i, y_j)^2 \right\|^2 \\
\text{subject to} \quad & w_k \ge 0, \quad k = 1, \ldots, q.
\end{aligned}
\qquad (2)
$$
Both formulations above can be converted into quadratic programs and solved using any quadratic programming solver, such as JOptimizer [37] for Java or quadprog in MATLAB.
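Because the squared weighted Euclidean distance $\tilde{d}_w(x_i, x_j)^2 = \sum_k w_k (x_i^k - x_j^k)^2$ is linear in the weights, Eq. 1 and Eq. 2 can equivalently be solved as non-negative least-squares problems. The Python sketch below illustrates this route; it is our illustration under that observation, not the authors' implementation, and the function name reconcile_weights is hypothetical.

    # A minimal sketch of the weighted optimization (Eqs. 1 and 2) posed as
    # non-negative least squares: each model pair (i, j) contributes one row
    # of per-criterion squared differences (x_i^k - x_j^k)^2 and one target
    # d_hat(y_i, y_j)^2, optionally scaled by the pair importance r_ij.
    import numpy as np
    from scipy.optimize import nnls
    from scipy.spatial.distance import pdist, squareform

    def reconcile_weights(structure, output, r=None):
        n, q = structure.shape
        d_hat = squareform(pdist(output))        # output distance matrix
        rows, targets = [], []
        for i in range(n):
            for j in range(i + 1, n):
                coeff = 1.0 if r is None else r[i, j]
                if coeff == 0:                   # pair ignored (Eq. 2)
                    continue
                a = (structure[i] - structure[j]) ** 2
                rows.append(np.sqrt(coeff) * a)
                targets.append(np.sqrt(coeff) * d_hat[i, j] ** 2)
        w, _ = nnls(np.asarray(rows), np.asarray(targets))  # enforces w >= 0
        return w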
Our approach of incorporating user feedback for the computation of the weights is similar to the cognitive feedback model V2PI-MDS [15]. Mathematically the approaches are similar, but conceptually they differ on two counts. First, in their case the projected data space is another representation of the high-dimensional data space, and they attempt to reconcile the two. In our case, however, the underlying data spaces are entirely different. We handle this problem by using interactive visualization as a means to preserve the scientists' mental model of the characteristics of the data. We could also have used multidimensional projections, but as found in previous work, domain scientists tend not to trust the information loss caused by dimensionality reduction and prefer transparent visualizations where the raw data is represented [8].
Second, the user interaction mechanism for providing feedback to the computational model is also different from the V2PI-MDS model. We allow users to define groups within the data, as opposed to direct manipulation and movement of data points in a projection, which is not applicable in our case. Our focus is on the relationship between the weights of the dimensions and the similarity they induce. As a result, we let users explore different groupings by using the sorted weights and modifying the views accordingly. This results in a rich interactive analysis for reconciling the two similarity spaces.
Fig. 5: Synthetic data for validating weighted optimization: (a) model output, (b) model structure. Using the model output data in (a) and the model structure data in (b), we validate the accuracy of the optimization algorithm.
5.2.2 Validation
To validate our optimization, we use two synthetic datasets, one for model output and the other for model structure. The purpose of this validation is to demonstrate the accuracy of the algorithm in the best-case scenario, i.e., when a perfect grouping based on some criteria exists in the data. In most real-world cases, however, the optimization will only create an approximation of the input groups.

Our model output is a two-dimensional dataset, which we visualize with a scatter plot (Fig. 5a). We can see three well-defined groups: {m1, m2, m3, m4}, {m5, m6, m7, m8}, and {m9, m10}. Fig. 5b shows our synthetic model structure data, which contains boolean values. Each row represents a different model (mi) and each column a different criterion. The first two criteria were chosen specifically to split the dataset into the same three groups as the model output; for instance, when criterion1 = 0 and criterion2 = 0 we obtain the group {m1, m2, m3, m4}. The remaining three columns are random values (zero or one).
First, we solve Eq. 1 using our synthetic dataset and the Euclidean distance for the model output, and we get w = {1.00, 0.14, 0.06, 0.08, 0.10}. For visualization purposes we use the classical multidimensional scaling algorithm to project the model structure data using the weighted Euclidean distance. We normalize the weights between zero and one for visualization, but the weighted Euclidean distance uses the unnormalized weights. Fig. 6a shows the resulting two-dimensional projection. Our vector w was able to capture some of the similarity information from the model output: for example, {m1, m2, m3, m4} is a well-defined group. Even though {m5, m6, m7, m8} and {m9, m10} are not mixed, they are not well-defined groups.
Next, we incorporate user feedback and set the coefficient $r_{i,j}$ to zero for all pair combinations within the groups {m1, m2, m3, m4}, {m5, m6, m7, m8}, and {m9, m10}. Solving Eq. 2, we get the vector w = {1.00, 0.77, 0.07, 0.08, 0.10}. Fig. 6b shows the two-dimensional projection of the model structure using the weighted Euclidean distance and w. We notice that the three groups are now well defined. Our algorithm gave the highest weights to the first two criteria (criterion1 = 1.00 and criterion2 = 0.77), which we knew a priori to be the best combination for splitting the model structure into the same groups as the model output.
Fig. 6: Validation of user-feedback-based optimization in the MDS plots: (a) automatic optimization, (b) optimization based on user feedback. As we can observe in (b), optimization based on the user's feedback gives the highest weights to the two criteria that split the models into three groups.

These two experiments show that our formulation accurately gives the highest weights to the most relevant criteria for splitting models into groups, and this will be used to guide the user during the exploration process. In Section 6 we will show how this approach works with real data, where in most cases an approximation of the output group is produced by the algorithm.
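The validation setup above can be re-created along the following lines, assuming the hypothetical reconcile_weights sketch from Section 5.2.1. The specific random values are illustrative assumptions, so the recovered weights will differ from the numbers reported above, though the informative criteria are expected to dominate.

    # Hypothetical re-creation of the synthetic validation: three output
    # groups, two informative binary criteria, and three noise criteria.
    import numpy as np

    rng = np.random.default_rng(0)
    centers = [(-0.5, 0.5)] * 4 + [(0.5, 0.5)] * 4 + [(0.0, -0.5)] * 2
    output = np.array([rng.normal(c, 0.05) for c in centers])

    informative = np.array([[0, 0]] * 4 + [[1, 0]] * 4 + [[1, 1]] * 2)
    structure = np.hstack([informative,
                           rng.integers(0, 2, (10, 3))]).astype(float)

    # r = 0 within user-defined groups, 1 elsewhere, mirroring the text.
    labels = [0] * 4 + [1] * 4 + [2] * 2
    r = np.array([[0.0 if li == lj else 1.0 for lj in labels]
                  for li in labels])

    w1 = reconcile_weights(structure, output)       # Eq. 1, no feedback
    w2 = reconcile_weights(structure, output, r=r)  # Eq. 2, groups as feedback
    print(w1 / w1.max(), w2 / w2.max())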
5.2.3 Workflow
In Fig. 7 we show how the complete loop, starting from output to structure and back, is executed through user interaction and the optimization algorithm described above. This workflow relies on human inspection of structure similarity through manipulation of the matrix view and observation of the corresponding output in the small multiples of time series. The steps are described as follows:
Create groups in output: To suggest groups of similar outputs, the system clusters the time series by Euclidean distance or correlation (Fig. 7a). While other metrics are available for clustering time series, in this case the scientists were only interested in these two. Accordingly, the clusters are updated in the output view.
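The paper does not specify the clustering algorithm behind these suggestions; hierarchical clustering is one plausible choice, sketched below with SciPy (the function name suggest_groups is hypothetical, and the original implementation may differ).

    # A minimal sketch of suggesting k output groups by clustering the
    # models' time series (rows of `series`) with Euclidean or correlation
    # distance; fcluster returns one group label per model.
    from scipy.cluster.hierarchy import fcluster, linkage
    from scipy.spatial.distance import pdist

    def suggest_groups(series, k, metric="euclidean"):
        dists = pdist(series, metric=metric)   # "correlation" = 1 - Pearson r
        tree = linkage(dists, method="average")
        return fcluster(tree, t=k, criterion="maxclust")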
Reflect in structure: These clusters are reflected on the model structure side by reordering the matrix based on the groups (Fig. 7b). All the criteria are given equal weights by default, as indicated by the uniform heights of the bars. The two views are linked by the color of the groups. Users can also select groups through direct manipulation of the time series in the output view.

Optimize weights: Next, on observing the system-defined clusters, one can choose to optimize the weights of the criteria on the structure side. As shown in Fig. 7c, the columns are reordered from left to right based on the weights. These weights serve as hints to the user for creating groups on the structure side. The groups are not immediately created, to prevent change blindness; the system needs the user to intervene and select the criteria based on which the groups are created.
The underlying optimization algorithm, as described earlier, creates an approximate grouping based on the input. In many cases, as shown in the figure, the highest weight may not give a perfect grouping. By perfect grouping we mean that the optimization algorithm is able to create exactly the same groups as the input from the output side. In most cases, weights for an exact solution might not even exist; by using the optimization, what we get is a set of structure clusters that are as closely aligned with the output as possible.
Create groups in structure: Based on the suggested weights, a user can select the two highest-weighted criteria and create groups, as shown in Fig. 7d. There are four possible combinations of these two criteria (with 0's and 1's), and each existing combination is shown in its own group. In many cases not all possible combinations exist.
Reflect/Split in output: The creation of the groups is also reflected on the output side by indicating the group membership of each model through color coding or through the creation of small multiples (Fig. 7e). The output groups created are not perfect, as they do not exactly match the output groups from the previous step. From this, however, the scientists can judge the effect of the two criteria on model output. For example, if the presence or absence of the selected criteria has no impact on the output, that will be reflected in the time series by their spread or by the lack of any significant correlation. To inspect whether combining other criteria can give a better grouping on the structure side that matches the output, scientists continue the iteration and repeat the previous steps.

Fig. 7: Workflow for reconciling output with structure through feedback: This iterative workflow relies on weighted optimization, based on Equations 1 and 2, and human-initiated parameter tuning and selection for reconciling model output with model structure.
6 CASE STUDY
We collaborated with three climate scientists from the Oak Ridge National Laboratory and the United States Forest Service, as part of the Multi-Scale Synthesis and Terrestrial Model Inter-comparison Project (MsTMIP). Each of them has at least ten years of experience in climate modeling and model inter-comparison. MsTMIP is a formal multi-scale synthesis, with prescribed environmental and meteorological drivers shared among model teams, and simulations standardized to facilitate comparison with other model results and observations through an integrated evaluation framework [16]. One key goal of MsTMIP is to understand the sources and sinks of the greenhouse gas carbon dioxide, the evolution of those fluxes over time, and their interaction with climate change. To accomplish these goals, the inter-annual and seasonal variability of models needs to be examined using multiple time series. Early results from MsTMIP have shown that variation in model outputs can be traced to corresponding variation in model structure. Using visual reconciliation, climate scientists wanted to further understand whether similarities or differences in model structure play a role in the inter-annual variability of Gross Primary Productivity (GPP) for different regions. Inclusion of particular combinations of simulated processes may exaggerate GPP or its timing more than any component in isolation, and inclusion of a patently incorrect model structure could dramatically sour model output by itself.
We provided our collaborators with an executable, which they used for a month before reporting their findings back to us, as described below. We then conducted face-to-face interviews about the usage of the technique and received positive feedback on how the technique is a first step towards solving the problem of reconciling model structure with output. We describe two cases in which our collaborators could find relationships between model structure and model output using a prototype implementation of the visual reconciliation technique. The model structure data is segmented into four classes: energy, carbon, vegetation, and respiration. In this case the scientists wanted to understand the relationship between criteria belonging to energy and vegetation, and GPP variability in the Polar and North American Temperate regions. Each of the model structure datasets consists of about 15 models and about 20 to 30 criteria.
6.1 Reconciling seasonal cycle similarity with structural similarity
The seasonal cycle of a climate model is given by the trajectory of the time series and by its peaks and troughs over the different months of a year. Exploring the impact of seasonal cycles for different models with respect to GPP is an important goal in climate science, since the amount and timing of energy fixation provide a baseline for almost all other ecosystem functions, and models must accurately capture this behavior for all regions and conditions before other, more subtle ecosystem processes can be accurately modeled. The motivation for this scenario was to find whether there is any dependency between the regional seasonal cycles of models and the included model structures with respect to this overarching energy criterion.
The scientists started their analysis in the Polar region by selecting the M9 and M10 models, which appeared to be similar with respect to both their GPP values and the timing of their seasonal cycles, as shown in Fig. 8a. Their intent was to observe which energy parameter causes M9 and M10 to behave similarly in one group, and the rest in another. They optimized the matrix view to find the most important criterion, which was found to be Stomatal conductance. They then selected this criterion to split the models into two groups, shown in Fig. 8b and reflected in Fig. 8c. The underlying optimization algorithm thus gave a perfect grouping, with the models that implement Stomatal conductance in the orange group and the rest in the other group. The climate scientists were already able to infer that Stomatal conductance has a strong impact on the seasonal cycles of M9 and M10.
Next the scientists selected the M6 and M7 models in the North American Temperate (NAT) region, which appear to be similar with respect to their seasonal cycles and GPP output (Fig. 8d). This grouping is already intuitive and inspires confidence, because of its consistency with the known genealogical relationship of these two models as siblings. With the same goal as in the previous case, they optimized the matrix view and found that Prognostic change was the most important structural criterion for approximately creating the two groups. This structural criterion provided a near-perfect segmentation, except for the M1 model, which also implements this parameter, as shown in Fig. 8e. In an attempt to get the exact segmentation, they selected the next two most important criteria, prescribed leaf index and RTS2-stream. M6 and M7 implement both of these criteria and are in one output group, while the other, green output group is split into three sub-groups based on their implementation of these three criteria. The implementation of these three criteria thus has a significant effect on the grouping of these two models with respect to their GPP. The scientists could continue in this way to draw more inferences from the implementation or non-implementation of these three structural criteria by further observing their output in small multiples, as shown in Fig. 8f. This shows that the blue group, none of which implement Prognostic change, but all of which implement the other two, shows a greater spread of GPP output values than any other group. In this way, the scientists could reconcile the impact of different energy criteria on the seasonal cycle and regional variability of GPP.

Fig. 8: Reconciling seasonal cycle with model structure using the workflow described in Section 5.1. (a) Initial user selection in the Polar region output; (b) weighted optimization; (c) corresponding output; (d) initial user selection in the North American Temperate region; (e) creating groups based on the first three criteria after optimization; (f) small multiple groups of models.
6.2 Iterative exploration of structure-output dependency
In this case, the scientists started by looking at the model structure data to discover structure criteria that could explain model groups having high and low GPP values across both the Polar and NAT regions. A simple sequential search for criteria is inefficient for reconciliation. To start the analysis, as shown in Fig. 9a, the matrix view is first sorted from left to right by the columns having high numbers of implementations. The sorting enabled the scientists to group using a criterion that would produce balanced clusters, i.e., divide the models into roughly equal groups. In this view, these criteria lie in the center and have an orange or deep yellow color. In the course of this exploration, they found that the canopy/stomatal conductance whole canopy structural criterion splits the group into nearly equal halves. These clusters are represented in the output by green, i.e., not implementing that criterion, and orange, i.e., implementing it. Further, looking at the output as shown in Fig. 9b, the scientists found that the orange group has higher GPP values and the green group lower values. In other words, the models that have implemented stomatal conductance have higher GPP values than the ones that have not. This grouping is consistent for the North American Temperate region, with the exception of the M1 model, as shown in Fig. 9c.
Next, the scientists wanted to verify whether, by performing optimization, they would obtain the same criterion as the most important for the behavior of GPP within the Polar region, which represents a different, extreme combination of ecological conditions. They selected the green group, as shown in Fig. 9d, and then chose to optimize the matrix view. They found the same criterion (canopy/stomatal conductance whole canopy) to have the highest weight, reinforcing the reconciling power of this same group of model structures for explaining differences in GPP across two extreme eco-regions. Thus, the same criterion that they had discovered interactively could be verified algorithmically. Note that, as shown in Fig. 9d, only one of the models is classified in a different group than the user-selected group.
For the NAT region, the scientists wanted to drill down to determine what was causing M1 to behave differently, as found during the initial exploration. They defined two groups, one of them containing only M1, as shown in Fig. 9g. Once they chose to optimize the matrix, they found that no single criterion could produce the same output groups. However, by combining the two most important criteria, vegetation heat and canopy-stomatal sunlit shaded (Fig. 9h), M1 was put in a separate group by itself: it was the only model that implemented both of these criteria. Additionally, the scientists saw that the models in the green group, which did not implement either of these structures, had a larger range of GPP variability than the other model groups (Fig. 9i). They concluded that, by allowing both more-productive sunlit and less-productive shaded canopy leaves, models which implement these differential processes seem to stabilize the production of GPP, even across extremely different eco-regions, possibly accurately reflecting the actual effect of these processes in nature.
7 CONCLUSION AND FUTURE WORK
We have presented a novel visual reconciliation technique with which climate scientists can understand the dependency relationships between model structure similarity and model output similarity.
Fig. 9: Iterative exploration of structure-output dependency using a combination of the two workflows for reconciliation. (a) Initial user creation of groups; (b,c) corresponding groups in regions; (d,e,f) workflow for verifying user-defined groups; (g,h,i) workflow for finding the criteria that can potentially cause M1 to be an outlier, and then looking at the range of variability in small multiple outputs.

Impact: By exploiting visual linking and user-steered optimization, we are able to communicate to the scientists the effects of different groups of criteria on the variability of model output. Using this technique, scientists could form and explore hypotheses about reconciling the two different similarity spaces, which was not possible before, yet is crucial for refining climate models. This is reflected in the following comment by one of our collaborators: "Due to imperfect knowledge, understanding, and modeling, correlations in the climate modeling domain may be weakly exhibited at best. This inherent weakness poses the greatest challenge to recognition and reconciliation of such correlations; yet, it is only through the reconciliation of such correlations upon which progress in improving climate models rests." Regarding the effectiveness of the reconciliation technique, another collaborator observed: "One of the most valuable functions of the technique is to effectively remove from consideration the complications created from model structures, that have little to no effect on outputs, and to effortlessly show and rank the differential effects on output created by seemingly related or unrelated model structures."
Challenges: There are several challenges that remain to be addressed. First, we are using about 15 models and not more than 30 criteria. To make the reconciliation workflow more scalable, we plan to optimize the matrix and the small multiples with respect to the similarity metrics. To extend the matrix to the more general case of continuous data, we would have to use clustering algorithms and row reordering operations [6, 41] for the visual display of the groups. To handle the scalability issue, we also want to focus on dynamic filtering strategies that let users focus on a subset of groups or parameters and drive the optimization process with more flexibility. Second, we currently use a simple time model; the success of our approach will lead us to extend this to more complex models of time, where we will have to use more sophisticated brushing and querying [13]. Finally, we are handling only two types of descriptors. Increasing the diversity of descriptor data will pose challenges for a high-granularity visual representation, and also for reducing the visual complexity of how the views interact. We plan to address these challenges in future research with data from various application domains.
Generalization: As observed before, the visual reconciliation tech-
nique is not restricted to the climate science domain. As a next step,
we will apply this technique in the healthcare domain, where the goal
is to reconcile patient similarity with drug similarity for personalized
medicine development [42]. Another potential application is in the
product design domain. For example, in the automotive market, car models can be characterized by a multitude of features. It would be of interest to automotive companies to reconcile the similarity of car models based on these descriptors with their similarity based on transaction data. In
short, we posit that visual reconciliation can potentially serve as an
important analytics paradigm for making sense of the ever-growing
variety of available data and their diverse similarity criteria.
8 ACKNOWLEDGMENTS
This work was supported by: the DataONE project (NSF Grant
number OCI-0830944), NSF CNS-1229185, NASA ROSES 10-
BIOCLIM10-0067, and DOE Office of Science Biological and En-
vironmental Research (BER). The data was acquired through the
MAST-DC (NASA Grant NNH10AN68I) and MsTMIP (NASA Grant
NNH10AN68I) projects funded by NASA's Terrestrial Ecology Pro-
gram. We extend our gratitude to members of the Scientific Explo-
ration, Visualization, and Analysis working group (EVA) for their
feedback and support.
REFERENCES
[1] N. Andrienko, G. Andrienko, and P. Gatalsky. Tools for visual compari-
son of spatial development scenarios. In Information Visualization, pages
237–244. IEEE, 2003.
[2] J. Bertin. Semiology of Graphics: Diagrams, Networks, Maps. University of Wisconsin Press, 1983.
[3] E. Bertini and D. Lalanne. Surveying the complementary role of auto-
matic data analysis and visualization in knowledge discovery. In Proceed-
ings of the ACM SIGKDD Workshop on Visual Analytics and Knowledge
Discovery, pages 12–20. ACM, 2009.
[4] S. Bickel and T. Scheffer. Multi-view clustering. In ICDM, volume 4,
pages 19–26, 2004.
[5] E. T. Brown, J. Liu, C. E. Brodley, and R. Chang. Dis-function: Learning
distance functions interactively. In IEEE Conference on Visual Analytics
Science and Technology, pages 83–92, 2012.
[6] C.-H. Chen, H.-G. Hwu, W.-J. Jang, C.-H. Kao, Y.-J. Tien, S. Tzeng, and
H.-M. Wu. Matrix visualization and information mining. In Proceedings
in Computational Statistics, pages 85–100. Springer, 2004.
[7] C.-W. Chu, J. D. Holliday, and P. Willett. Combining multiple classifi-
cations of chemical structures using consensus clustering. Bioorganic &
Medicinal Chemistry, 20(18):5366–5371, 2012.
[8] J. Chuang, D. Ramage, C. Manning, and J. Heer. Interpretation and trust:
Designing model-driven visualizations for text analysis. In Proceedings
of the SIGCHI Conference on Human Factors in Computing Systems,
pages 443–452. ACM, 2012.
[9] V. Filkov and S. Skiena. Heterogeneous data integration with the consen-
sus clustering formalism. In Data Integration in the Life Sciences, pages
110–123. Springer, 2004.
[10] M. Gleicher. Explainers: Expert explorations with crafted projec-
tions. IEEE Transactions on Visualization and Computer Graphics,
19(12):2042–2051, 2013.
[11] M. Gleicher, D. Albers, R. Walker, I. Jusufi, C. D. Hansen, and J. C.
Roberts. Visual comparison for information visualization. Information
Visualization, 10(4):289–309, 2011.
[12] M. Greenacre. Weighted metric multidimensional scaling. In New De-
velopments in Classification and Data Analysis, Studies in Classification,
Data Analysis, and Knowledge Organization, pages 141–149. Springer,
2005.
[13] H. Hochheiser and B. Shneiderman. Dynamic query tools for time se-
ries data sets: timebox widgets for interactive exploration. Information
Visualization, 3(1):1–18, 2004.
[14] X. Hu, L. Bradel, D. Maiti, L. House, and C. North. Semantics of di-
rectly manipulating spatializations. IEEE Transactions on Visualization
and Computer Graphics, 19(12):2052–2059, 2013.
[15] X. Hu, L. Bradel, D. Maiti, L. House, C. North, and S. Leman. Semantics
of directly manipulating spatializations. IEEE Transactions on Visualiza-
tion and Computer Graphics, 19(12):2052–2059, 2013.
[16] D. N. Huntzinger, C. Schwalm, A. M. Michalak, K. Schaefer, et al.
The North American Carbon Program multi-scale synthesis and terrestrial model intercomparison project – Part 1: Overview and experimental de-
sign. Geoscientific Model Development Discussions, 6(3):3977–4008,
2013.
[17] J. Kehrer, F. Ladstädter, P. Muigg, H. Doleisch, A. Steiner, and H. Hauser.
Hypothesis generation in climate research with interactive visual data ex-
ploration. IEEE Transactions on Visualization and Computer Graphics,
14(6):1579–1586, 2008.
[18] D. A. Keim, F. Mansmann, and J. Thomas. Visual analytics: how much
visualization and how much analytics? ACM SIGKDD Explorations
Newsletter, 11(2):5–8, 2010.
[19] F. Ladstädter, A. K. Steiner, B. C. Lackner, B. Pirscher, G. Kirchengast,
J. Kehrer, H. Hauser, P. Muigg, and H. Doleisch. Exploration of climate
data using interactive visualization. Journal of Atmospheric and Oceanic
Technology, 27(4):667–679, Apr. 2010.
[20] I. Liiv. Seriation and matrix reordering methods: An historical overview.
Statistical Analysis and Data Mining, 3(2):70–91, 2010.
[21] D. Masson and R. Knutti. Climate model genealogy. Geophysical Re-
search Letters, 38(8), 2011.
[22] K. Matkovic, M. Jelovic, J. Juric, Z. Konyha, and D. Gracanin. Interactive
visual analysis and exploration of injection systems simulations. 2005.
[23] S. Mimaroglu and E. Erdil. Combining multiple clusterings using simi-
larity graph. Pattern Recognition, 44(3):694–703, 2011.
[24] S. Monti, P. Tamayo, J. Mesirov, and T. Golub. Consensus clustering:
a resampling-based method for class discovery and visualization of gene
expression microarray data. Machine Learning, 52(1-2):91–118, 2003.
[25] E. Packer, P. Bak, M. Nikkila, V. Polishchuk, and H. J. Ship. Visual
analytics for spatial clustering: Using a heuristic approach for guided ex-
ploration. IEEE Transactions on Visualization and Computer Graphics,
19(12):2179–2188, 2013.
[26] L. Parida and N. Ramakrishnan. Redescription mining: Structure theory
and algorithms. In AAAI, volume 5, pages 837–844, 2005.
[27] D. Pfitzner, R. Leibbrandt, and D. Powers. Characterization and eval-
uation of similarity measures for pairs of clusterings. Knowledge and
Information Systems, 19(3):361–394, 2009.
[28] J. Poco, A. Dasgupta, Y. Wei, W. Hargrove, C. Schwalm, R. Cook,
E. Bertini, and C. Silva. SimilarityExplorer: A visual intercomparison tool for multifaceted climate data. Computer Graphics Forum, 2014. In publication.
[29] N. Ramakrishnan, D. Kumar, B. Mishra, M. Potts, and R. F. Helm.
Turning cartwheels: an alternating algorithm for mining redescriptions.
In Proceedings of the tenth ACM SIGKDD international conference on
Knowledge discovery and data mining, pages 266–275. ACM, 2004.
[30] S. Rinzivillo, D. Pedreschi, M. Nanni, F. Giannotti, N. Andrienko, and
G. Andrienko. Visually driven analysis of movement data by progressive
clustering. Information Visualization, 7(3-4):225–239, 2008.
[31] J. C. Roberts. State of the art: Coordinated & multiple views in ex-
ploratory visualization. In Proceedings of the Fifth International Con-
ference on Coordinated and Multiple Views in Exploratory Visualization,
CMV ’07, pages 61–71, Washington, DC, USA, 2007. IEEE Computer
Society.
[32] E. Santos, J. Poco, Y. Wei, S. Liu, B. Cook, D. Williams, and C. Silva.
UV-CDAT: Analyzing climate datasets from a user’s perspective. Com-
puting in Science & Engineering, 15(1):94–103, 2013.
[33] T. Schreck, J. Bernard, T. Von Landesberger, and J. Kohlhammer. Visual
cluster analysis of trajectory data with interactive kohonen maps. Infor-
mation Visualization, 8(1):14–29, 2009.
[34] H.-J. Schulz, T. Nocke, M. Heitzler, and H. Schumann. A design space
of visualization tasks. IEEE Transactions on Visualization and Computer
Graphics, 19(12):2366–2375, 2013.
[35] H. Siirtola. Interaction with the reorderable matrix. In Proceedings of the 1999 IEEE International Conference on Information Visualization, pages 272–277, 1999.
[36] C. A. Steed, G. Shipman, P. Thornton, D. Ricciuto, D. Erickson, and
M. Branstetter. Practical application of parallel coordinates for climate
model analysis. Procedia Computer Science, 9:877–886, 2012.
[37] A. Trivellato. JOptimizer. http://www.joptimizer.com/.
[38] E. R. Tufte. The Visual Display of Quantitative Information. Graphics
Press, Cheshire, CT, USA, 1986.
[39] C. Turkay, A. Lundervold, A. J. Lundervold, and H. Hauser. Repre-
sentative factor generation for the interactive visual analysis of high-
dimensional data. IEEE Transactions on Visualization and Computer
Graphics, 18(12):2621–2630, 2012.
[40] D. N. Williams, T. Bremer, C. Doutriaux, J. Patchett, S. Williams,
G. Shipman, R. Miller, D. R. Pugmire, B. Smith, C. Steed, E. W. Bethel,
H. Childs, H. Krishnan, P. Prabhat, M. Wehner, C. T. Silva, E. Santos,
D. Koop, T. Ellqvist, J. Poco, B. Geveci, A. Chaudhary, A. Bauer, A. Plet-
zer, D. Kindig, G. L. Potter, and T. P. Maxwell. Ultrascale visualization
of climate data. Computer, 46(9):68–76, 2013.
[41] H.-M. Wu, Y.-J. Tien, and C.-H. Chen. GAP: A graphical environment
for matrix visualization and cluster analysis. Computational Statistics &
Data Analysis, 54(3):767–778, 2010.
[42] P. Zhang, F. Wang, J. Hu, and R. Sorrentino. Towards personalized
medicine: Leveraging patient similarity and drug similarity analytics.
AMIA Joint Summits on Translational Science, 2014.