Behavior Research Methods, Instruments, & Computers
1995, 27 (2), 264-271

Applications of multivariate visualization to behavioral sciences

CHONG HO YU and JOHN T. BEHRENS
Arizona State University, Tempe, Arizona
The complexity of psychological science often requires the collection and analysis of multidimensional data. Such data bring about a corresponding cognitive load that has led scientists to develop techniques of scientific visualization to ease the burden. This paper provides an introduction to scientific visualization techniques, a framework for understanding those techniques, and an assessment of the suitability of this approach for psychology. The framework employed builds on the notion of balancing noise and smooth in statistical analysis.
Widespread availability of desk-top computing allows psychologists to develop and manipulate complex multivariate data sets. While researchers in the physical and engineering sciences have dealt with increasing data complexity by using scientific visualization, researchers in the behavioral sciences have been slower to adopt these tools (Butler, 1993). To address this discrepancy, this paper defines scientific visualization, presents a theoretical framework for understanding visualization, and reviews a number of multivariate visualization techniques in light of this framework. Because all graphics and animations available to illustrate the concepts discussed here cannot be incorporated in this print version, a hypertext version of this paper containing these illustrations is available through World-Wide Web browsers. The primary document and supporting software can be found in the ASU resources section of the server at http://seamonkey.ed.asu.edu/~Behrens/.
WHAT IS SCIENTIFIC VISUALIZATION?
We define scientific visualization as the process of exploring and displaying data in a manner that builds a visual analogy to the physical world in the service of user insight and learning. This entails finding a balance between the detail of the raw data and the parsimony of statistical summary. Each component of this definition will now be addressed.
Visualization as Data Exploration

Although most statistical training in psychology focuses on confirmatory data analysis (see Aiken, West, Sechrest, & Reno, 1990), there is in statistics a well-established tradition called exploratory data analysis (EDA). Pioneered by the work of John Tukey (1977), this tradition emphasizes the seeking of unexpected structure and the development of rich descriptions through graphic summary, robust statistics, and model fit indicators. Writing in a tone consonant with this tradition, Cleveland (1993) has argued that "visualization is an approach to data analysis that stresses a penetrating look at the structure of data" (p. 5). Such work is deemed especially important in the examination of multidimensional data for which algebraic summaries are often difficult to interpret.

Correspondence should be addressed to Yu, C. H. and J. T. Behrens, Division of Psychology in Education, Program in Measurement, Statistics and Methodological Studies, 325 Payne Hall, Arizona State University, Tempe, AZ 85287-0611 (e-mail: alex.yu@asu.edu).
Visualization as Analogy Making

Visualization of phenomena in the physical sciences is often striking because of the similarity of the computer-generated images to our expectation of how the process being modeled should look. For example, current computing technology allows the three-dimensional (3-D) depiction of molecules and how they interact. Chemists benefit from such visualizations, because they can use their rich knowledge of how everyday objects fit together when they must address how molecules and atoms fit together. The success of such endeavors raises the question of whether similar success can be achieved in fields outside the physical sciences for which isomorphism between data and visual images is not straightforward.

This question is misplaced, however, since it fails to recognize that in all cases of computer visualization, the images are simply graphical analogies to phenomena. For example, in the case of molecular visualization, the color and shape of atoms are portrayed visually even though such particles do not have color and shape in the sense that these terms are commonly used. Indeed, these are artistic embellishments that complement the analogy of physical objects that fit together. At the same time, many physical systems seem to offer no direct analogy that can be successfully visualized. For example, scientists at the Netherlands Research Foundations visualized three-dimensional flows of fluid dynamics as multidimensional phenomena (Hesselink, Post, & Wijk, 1994) for which there was no obvious physical analogy. Even though a single vector can be represented by an arrow, no compelling physical metaphor exists for a field of vectors and a tensor, the product of vectors.
Copyright 1995 Psychonomic Society, Inc. 264

MULTIVARIATE VISUALIZATION 265

Visualization in the psychological sciences differs from that in the physical and engineering sciences in that (1) it will not necessarily have the goal of creating images isomorphic to visual images of phenomena, as is sought in the example of chemistry molecules, and (2) the analogies built into the language of psychological research will need to be examined and exploited. For example, the psychological lexicon includes many spatial analogies, including concepts of measurement space and the corresponding language of "high" or "low" scores or "wide" distributions. Insofar as all visualization is based on analogy and all analogy is incomplete, we hope that the same progress made in the visualization of natural phenomena can be made for visualization of psychological phenomena.

A FRAMEWORK FOR UNDERSTANDING STATISTICAL GRAPHICS

Since we are faced with numerous graphical devices for data analysis, it would be helpful to identify some dimensions across which graphic displays differ. Here we present a framework based on the idea of a balance in depicting detail as opposed to summarizing when one is constructing summaries that are either locally or globally accurate.

In mathematical statistics, much endeavor has been devoted to resolving the tension between smoothed but biased summary and detailed but noisy data (Bowman & Foster, 1993; Härdle, 1991; Nadaraya, 1965; Scott, 1992; Silverman, 1986). In this view, data analysis is a process of reducing large amounts of information to parsimonious summaries while remaining accurate in the description of the total data. This often requires a balancing act between presenting masses of data that may be incomprehensible to the viewer or summaries that average over too many details of the original data. Visualization seeks to meet this challenge by portraying complex data in interpretable ways so that aspects of both the messiness and smoothness of the data can be discerned. To make sense of the many graphics for multivariate visualization, we present a framework for understanding graphics on the basis of the idea of balancing summary with raw data as well as balancing local and global precision. Following statistical terminology, we discuss this as the balance of smooth and noise.

NOISE AND SMOOTH AS A WAY TO UNDERSTAND GRAPHICS

The concepts of noise and smooth are perhaps best understood by using the well-known histogram. The appearance of the histogram is largely controlled by the number of bars used to depict the data. When many bars are used, the pattern of the data may look jittery and noisy, as shown in the bottom histogram of Figure 1. Here the details are great, and the reader may wonder whether a simpler underlying form exists. On the other hand, the use of too few bars may obscure patterns in the data that are important to the viewer, as is illustrated in the top histogram of Figure 1. In this case, the summarization is great, and the reader may wonder whether some important detail has been smoothed over. The central panel of this figure presents an intermediate number of bars. In this view, the balancing of smooth and noise is essentially the balancing of summary and detail. A computer program to dynamically illustrate this concept is available through the World-Wide Web site noted at the beginning of this paper.

Figure 1. Histograms with different bandwidths.

Another factor determining the noise level of a graph is the degree of data structure imposed by the data analyst. For example, a regression line summarizes the relationship between the variables and seeks to minimize residuals, but it assumes homogeneity of variance and linearity of the data. This assumption of structure may be inappropriate in the early phases of data exploration. On the other hand, some procedures may be too flexible, in that they overfit the data and inappropriately suggest structure that is unique to a sample. Just as in the case of balancing noise and smooth, balancing the imposition of structure versus the use of flexibility involves the subjective judgment and expectation of the data analyst.

Building on these ideas, graphical techniques can be conceived as occurring in the 2-D space of smoothness/noise and dimensionality of the data being depicted. Table 1 orders a number of statistical graphics along these dimensions. The horizontal dimension of smoothness/noise is conceived as a continuum; the vertical dimension of variable dimensionality is conceived as discrete. In the following section, we discuss a number of these statistical graphics in light of their noise versus smooth characteristics.
266 YU AND BEHRENS
VARIATIONS IN GRAPHICS FOR VISUALIZATIONS
One-Dimensional Graphs

The histogram is perhaps the most common graphic for displaying the distribution of a single variable. Although constructing a histogram seems to be straightforward, the appearance of the histogram is arbitrarily tied to the size of the interval used, as is shown in Figure 1. As an alternative to a histogram, statisticians have developed several smoothing algorithms to estimate the underlying shape of the data (Härdle, 1991; Nadaraya, 1965). The process can be thought of as the construction of numerous histograms of differing interval widths and the averaging of the heights of the different bars, a sort of average over all possible histograms. Figure 2 presents two density smooths applied to the data depicted in Figure 1. Here the difference between the density shapes is based on the smoothing algorithm used to average across data points. A histogram with a large interval width can be smoother than a density curve with a small interval width. Given that both kinds of graph use the same interval width, histograms and density curves are positioned on the continuum shown in Table 1. Following current practice in the statistical literature, we will use the term bandwidth rather than the more specific interval width, since this term is more appropriate for discussions of continuous data and functions.
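The bandwidth trade-off described above can be sketched numerically. The following fragment is our own minimal NumPy illustration, not code from the paper (the paper's demonstrations used S-Plus and a Web program); the bimodal sample data are invented. It contrasts a coarse histogram, a fine histogram, and a simple Gaussian kernel density estimate:

```python
import numpy as np

rng = np.random.default_rng(0)
# Invented bimodal sample: two normal clusters.
data = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])

# Few bars: heavy smoothing that may hide the two modes.
coarse_counts, _ = np.histogram(data, bins=3)
# Many bars: every sampling fluctuation shows up as noise.
fine_counts, _ = np.histogram(data, bins=60)

def gaussian_kde(grid, data, bandwidth):
    """Average a Gaussian kernel over all data points at each grid value."""
    diffs = (grid[:, None] - data[None, :]) / bandwidth
    kernels = np.exp(-0.5 * diffs**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

grid = np.linspace(-6, 7, 200)
density = gaussian_kde(grid, data, bandwidth=0.5)
```

Varying `bins` and `bandwidth` reproduces the noise-smooth continuum: small values preserve detail, large values impose summary. The density values integrate to approximately 1 over the grid.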
Two-Dimensional Graphs

Bivariate data are usually presented in a scatterplot, which is also subject to the bandwidth problem. If there are thousands of data points, the scatterplot will appear to be a messy cluster of ink. To address this problem, Carr (1991) recommended grouping the data into bivariate intervals and plotting a scatterplot with symbol size varying to indicate the number of data points in an interval.

Another way to simplify a noisy scatterplot is smoothing. Again, bandwidth choice inevitably becomes an issue. When encountering a noisy scatterplot, one can search for a pattern by dividing the data into several portions along the x-dimension, computing the median of y in each portion, and then looking at the trend by connecting the medians (Tukey, 1977). Mihalisin, Timlin, and Schwegler (1991) extended this idea by using the mean rather than the median and introducing bandwidth as a variable. In Figure 3, the relationship between x and y is depicted in this fashion, called mean rendering. The data pattern is clear in the upper right graph, where the bandwidth is wide. The bandwidth of the lower graph is three times smaller and thus gives a noisier appearance.

Besides median smoothing and mean rendering, a regression line is another way of imposing structure on bivariate data. Regression assumes the linear function and is even more forceful than median smoothing and mean rendering, which allow local fluctuations departing from linearity. Moreover, a mean rendering imposes more structure on the data than does a median smoothing, because interpretation of the mean generally assumes normal distributions. The positions of these graphics on the noise-smooth continuum are shown in Table 1.
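The median-smoothing and mean-rendering procedures just described can be sketched as follows. This is our own minimal NumPy illustration with invented data, not the Mihalisin et al. implementation; the function name and data are assumptions for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
# Invented noisy nonlinear relationship between x and y.
x = rng.uniform(0, 10, 500)
y = np.sin(x) + rng.normal(0, 0.3, 500)

def smooth(x, y, n_bins, stat):
    """Divide the x-range into equal intervals and summarize y within each.

    stat=np.median gives Tukey-style median smoothing; stat=np.mean gives
    a mean rendering.  n_bins plays the role of the bandwidth: fewer bins
    mean a wider bandwidth and a smoother connected trace.
    """
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    centers = (edges[:-1] + edges[1:]) / 2
    which = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    summary = np.array([stat(y[which == i]) for i in range(n_bins)])
    return centers, summary

centers, medians = smooth(x, y, n_bins=10, stat=np.median)  # wide bandwidth
_, noisy_means = smooth(x, y, n_bins=30, stat=np.mean)      # narrow bandwidth
```

Connecting the points `(centers, medians)` recovers the underlying sine pattern that the raw scatterplot obscures, while the 30-bin mean rendering retains more local fluctuation.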
DIFFERENT VISUALIZATION METHODS FOR MULTIVARIATE DATA
Multivariate visualization comes to the fore when researchers have difficulty in comprehending many dimensions at one time. In this paper we discuss only the stereo-ray glyphs, volume model, surface plot, contour plot, image plot, coplot, scatterplot matrix brushing, and animated mesh surface that are presented in Table 1. We recommend that the reader consult Keller and Keller (1993) for additional techniques.

Table 1. Noise-Smooth Continuum. Note—There are two dimensions of statistical graphing. The horizontal dimension illustrated here is noise-smooth, running from noise (include more data; little structure imposed) to smooth (present less data; much structure imposed); the vertical dimension is the number of variables, or the dimension of data. The graphics ordered along these dimensions are the stereo-ray glyphs, volume model, surface plot, contour plot, image plot, coplot, and animated mesh surface.

Figure 2. Uniform density smoothing (top) and Gaussian density smoothing (bottom).

Stereo-Ray Glyphs

A 3-D plot with x, y, and z variables on three orthogonal axes (called a spin plot) is a common way to illustrate multivariate data. The user can rotate the plot to get a sense of depth and see data hidden from other perspectives. Building onto the conventional spin plot, Carr and Nicholson (1988) added one more dimension to a 2-D or 3-D plot by using the analogy of a glyph or a meter, which was proposed by Anderson (1960). In a meter, the value increases as the needle moves from the left to the right. In plots prepared by Carr and Nicholson, the change of the third variable is illustrated by attaching a ray glyph, which resembles a meter, to each data point. The angle of the tail of each data point indicates the size of change in the moderating variable. In order to view the 3-D plot with an illusion of depth, Carr and Nicholson placed the same two graphs side by side and recommended that the user look at the graphs with a stereopticon. Stereo-ray glyphs have at least two drawbacks. First, it is inconvenient for researchers to examine the graphs with a stereopticon. Second, stereo-ray glyphs may not work when overplotting occurs and the data pattern is buried by the noisy graph. The overplotting problem can be overcome through the use of a volume model, discussed below.

This addition of a meter is an example of the general strategy of changing the appearance of the symbol that represents an observation. Some statistical packages, such as DataDesk, Xlisp-Stat, and S-Plus, also allow the user to change the shape or color of observations on the basis of the values of a third variable.

Figure 3. Mean rendering of a scatterplot (top) with low- and high-frequency bandwidths (center and bottom, respectively).

Volume Model

The volume model overcomes several problems occurring in other techniques, such as overplotting and perspective limitation. A volume model can be viewed as an enhancement of a 3-D plot. In a conventional 3-D plot, the data points are symbolized by opaque dots. In the volumetric visualization, each data value is denoted by intensity. The higher the data value is, and the more the data that lie along the region, the more opaque the line of sight is. In this way, the researcher can construct a transparent data cloud. At the early stage of exploratory data analysis, a volumetric visualization is beneficial. A volume model shows all data, and thus the researcher can
detect whether there are nonlinear relationships and can locate the clusters of data. In addition, Kaufman, Hohne, Kruger, Rosenblum, and Schroder (1994) have argued that a translucent volume model is perspective independent. Moreover, the user can slice a vertical or a horizontal cross-section to look at the relationships at certain points. The strength of showing all data is also a weakness, however, since it may take some practice to be able to interpret such displays properly.
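The intensity idea behind the volume model can be approximated by counting points in a 3-D grid and treating the counts as opacity. The sketch below is our own illustration with invented data, not the rendering pipeline of any of the cited systems; a real volume renderer would additionally composite these opacities along each line of sight:

```python
import numpy as np

rng = np.random.default_rng(2)
# Two overlapping 3-D clusters that would overplot badly as opaque dots.
cloud = np.vstack([rng.normal(0, 1, (300, 3)),
                   rng.normal(2, 1, (300, 3))])

# Bin the points into a coarse voxel grid; the count in each voxel plays
# the role of intensity (more data in a region = more opaque).
counts, edges = np.histogramdd(cloud, bins=(8, 8, 8))

# Normalizing the counts to [0, 1] yields per-voxel opacities for a
# translucent data cloud.
opacity = counts / counts.max()

# The "slicing" operation described above: one cross-section of the grid.
cross_section = opacity[:, :, 4]
```

Because every point contributes to some voxel, nothing is hidden by overplotting, which is the property the text emphasizes; clusters appear as regions of high opacity.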
Surface Plot

When the completeness of a volume model is not necessary, a surface, a contour, or an image plot may be more desirable. A surface plot is easily confused with a smoothed-mesh surface plot. In the former, the surface of the raw data is depicted, whereas in the latter, a smoothed summary surface is presented. In a surface plot, the data values of x and z are plotted along the two horizontal axes, while the data values of y determine the height of the vertical axis. The appearance of a surface plot is tied to the grid size, just as the shape of a histogram is affected by the bandwidth. Small bandwidths will lead to a surface plot that appears with many spikes, whereas larger bandwidths lead to an appearance of smoother mountains. Because a viewer's perception of the surface plots depends on the viewpoint, they are sometimes called perspective plots. It is desirable for the researcher to vary the grid size and the perspective of the surface plot while doing data exploration.
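The grid-size dependence can be sketched by binning raw (x, z, y) triples at two resolutions. This is our own minimal NumPy illustration with invented data, an assumption-laden stand-in for the surface plots the paper describes:

```python
import numpy as np

rng = np.random.default_rng(3)
# Invented data: a noisy bump centered at the origin.
x = rng.normal(0, 1, 2000)
z = rng.normal(0, 1, 2000)
y = np.exp(-(x**2 + z**2)) + rng.normal(0, 0.1, 2000)

def surface_heights(x, z, y, grid):
    """Average y within each (x, z) cell; the grid size is the bandwidth."""
    counts, xe, ze = np.histogram2d(x, z, bins=grid)
    sums, _, _ = np.histogram2d(x, z, bins=[xe, ze], weights=y)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts          # NaN where a cell holds no data

spiky = surface_heights(x, z, y, grid=40)          # small bandwidth: spikes
smooth_surface = surface_heights(x, z, y, grid=8)  # large bandwidth: smoother
```

Plotting `spiky` and `smooth_surface` as wireframes from several viewpoints corresponds to the exploration the text recommends: varying both the grid size and the perspective.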
Contour Plot

To overcome the viewpoint limitation of a surface plot, a contour plot takes a bird's-eye view. In a contour plot, the y-axis is hidden and the data values in y are represented by connected lines at discrete levels, as is done in topographical maps. Although a contour plot is less viewpoint dependent than a surface plot, it is still not as perspective free as a volume model.

Another shortcoming of a contour plot is that it does not easily show holes in the data. For example, when a data set has a concentrated region of low-value data, this depression in magnitude is represented by concentric rings, the same symbols that are used to show a concentration of high data values. Moreover, the band height of the isolines and smoothing algorithms determine how a contour plot appears. Therefore, researchers should construct contour plots with different band heights and smoothing algorithms in order to avoid being mentally "stuck" in one depiction.

Figure 4. Image plot with a gray scale.

Figure 5. Coplot implemented in S-Plus. The given panel shows CLG (Centered Learning Goal); the dependence panels plot POINTS against CPA (Centered Perceived Ability).

Figure 6. Meshed surface constructed from conditional regression lines.
Image Plot

An image plot, too, is a bird's-eye view of a surface plot. In an image plot, the data values are often represented by different color hues. The advantage of this approach is that the maximum and minimum values are easily highlighted. Choices of color hue, brightness, and saturation need to be made carefully. Bertin (1983) found that if the conventional color spectrum is used, red and blue, which are located at the two ends, are perceived as more similar than different, whereas yellow, the lightest color at the center of the spectrum, looks more outstanding. In a similar vein, Encarnacao, Foley, Bryson, Feiner, and Gershon (1994) have argued that a color scale based on perceived brightness is usually more effective. However, in some software packages, it is difficult, if not impossible, for the user to change the default setup of the color scale. Accordingly, an image plot with a monochrome scale varying in brightness, as in the gray scale shown in Figure 4, makes it easier for the viewer to interpret the data properly (cf. Lewandowsky, Herrmann, Behrens, Li, Pickle, & Jobe, 1993).
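The brightness-based scale recommended above amounts to a monotone mapping from data value to gray level. The following is our own minimal sketch with invented data, not code from any of the cited packages:

```python
import numpy as np

rng = np.random.default_rng(4)
# Invented grid of data values, e.g. cells of a statistical map.
values = rng.normal(0, 1, (20, 20))

# Map each value linearly onto 256 gray levels, so that magnitude
# corresponds directly to brightness (0 = black, 255 = white).
lo, hi = values.min(), values.max()
gray = np.round(255 * (values - lo) / (hi - lo)).astype(np.uint8)

# Maximum and minimum values are easily highlighted, as the text notes:
brightest = np.unravel_index(gray.argmax(), gray.shape)
darkest = np.unravel_index(gray.argmin(), gray.shape)
```

Displaying `gray` with any image viewer yields a monochrome image plot; unlike a red-to-blue spectrum, the ordering of gray levels matches the ordering of the data.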
Coplot

The preceding techniques portray multiple variables by local smoothing. The alternative techniques to be introduced below, on the other hand, are performed within the context of regression, which is a type of global smoothing. Coplot, which is an abbreviation for conditioning plot, helps detect interaction effects among several variables. When one views an interaction, different slopes are apparent between x and y at different levels of z. If z is broken into a series of intervals, the regression of y on x in each z interval can be assessed with an eye for differences in slope across the series of plots.
A coplot as implemented in the S-Plus software (Statistical Sciences, 1993) is presented in Figure 5. The top panel is called the given panel; it shows a series of intervals across a third variable. The panels below are called dependence panels; these show a series of scatterplots of two other variables. In this example, the two variables on the dependence panels are number of points obtained in a mathematics class and a scaled value of perceived ability in mathematics. In the given panel, there is a scale for learning goal orientation and a series of overlapping lines. Each line represents the range of learning goal that is included in a corresponding scatterplot. The first line reflects the range of the learning goal scores included in the first (upper left) scatterplot, with the second line indicating the range of learning goal scores included on the next scatterplot, and so on. The example presented in Figure 5 shows how the slope relating points and perceived ability fluctuates toward zero in the middle of the learning goal dimension while exhibiting positive slope elsewhere. Such a pattern would not be self-evident in the examination of simple marginal distributions or unconditionalized scatterplots.

The reader may note that the intervals overlap, an aspect necessary to maintain the continuous influence of points on the conditional regression slope. Because the degree of overlap reflects the degree of local conditionalization for each scatterplot, this aspect of the plot is modifiable in the S-Plus implementation. The length of the conditioning intervals also varies, because they reflect the density of points in different regions of the multidimensional space.

A coplot is a smoother technique than those discussed above, because the regression lines impose certain structures on the data. On the other hand, the number of the levels of dependence panels can also be viewed as a type of bandwidth.
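The conditioning logic of a coplot can be sketched directly: form overlapping intervals of the given variable and fit a regression of y on x within each. This is our own rough NumPy stand-in for the S-Plus machinery (the equal-count interval function below is an assumption for illustration, not the S-Plus algorithm), with invented data in which the slope genuinely varies with z:

```python
import numpy as np

rng = np.random.default_rng(5)
z = rng.uniform(-1, 1, 300)            # conditioning ("given") variable
x = rng.normal(0, 1, 300)
y = z * x + rng.normal(0, 0.3, 300)    # slope of y on x depends on z

def conditioning_intervals(z, n_intervals, overlap=0.5):
    """Overlapping, roughly equal-count intervals of z, in the spirit of
    a coplot's given panel (a simplified stand-in, not S-Plus's method)."""
    order = np.sort(z)
    size = int(len(z) / (n_intervals * (1 - overlap) + overlap))
    step = int(size * (1 - overlap))
    return [(order[i * step], order[min(i * step + size - 1, len(z) - 1)])
            for i in range(n_intervals)]

slopes = []
for lo, hi in conditioning_intervals(z, n_intervals=4):
    mask = (z >= lo) & (z <= hi)
    slope, intercept = np.polyfit(x[mask], y[mask], 1)  # y on x in this slice
    slopes.append(slope)
```

Scanning `slopes` across the intervals is the numerical analogue of scanning the dependence panels by eye: a changing slope signals an interaction between x and z.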
Animated Mesh Surface

A mesh surface plot is a simplification of a surface plot. The left panel of Figure 6 illustrates how a mesh surface is formed by joining the regression lines of the predictor variable (perceived ability) against the criterion variable (self-regulation) across all levels of the second regressor (learning goal). In this example, three conditional regression lines are drawn, as recommended by Aiken and West (1991). The first one is plotted with the learning goal value one standard deviation above the mean. The second one is plotted with the learning goal value at the mean, and the third, with the learning goal value one standard deviation below the mean. In this example, the procedure is implemented in Mathematica (Wolfram, 1991). The remaining plots illustrate the lines extended across the continuum to produce a surface and the surface being rotated to improve perspective. One merit of this approach is that in the first step it shows the regression lines in the 3-D context of the data. The final plot shows the surface with the data omitted.

By animation, this technique can be easily extended to the visualization of 4-D data. In Figure 7, there are three regressors (perceived ability, extrinsic motivation, and performance goal) and one outcome variable, deep thought processing. In the first box, we connect the conditional regression lines of performance goal against deep thought processing across all levels of extrinsic motivation when the perceived ability is low. The same procedure is repeated as the value of perceived ability increases. As a result, we produce a series of mesh surfaces like a movie. The user can either play the entire movie to get an overall impression or look at the graphs frame by frame. Interesting results may be discovered with this procedure. For instance, the fourth box of Figure 7 shows that near the mean value of perceived ability (0.72) the mesh surface is flat and all main effects are nonsignificant. This occurs despite the fact that the third-order interaction is significant.

Figure 7. Animated meshed surface, with each frame labeled by its level of perceived ability.

The animated mesh surface shown here requires advanced programming skills in Mathematica. To make this technique accessible to more social science researchers, we have developed a HyperCard front-end program to automate the computing process (Behrens & Yu, 1994). This text and software are available in World-Wide Web format from the World-Wide Web site noted at the beginning of the paper.
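The conditional regression lines from which the mesh surface is built follow the Aiken and West (1991) simple-slopes recipe: fit a model with an interaction term and evaluate the slope of y on x at chosen levels of the moderator. The sketch below is our own NumPy version with invented data and variable names (the paper's implementation was in Mathematica):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 400
x = rng.normal(0, 1, n)    # e.g. centered perceived ability (invented)
z = rng.normal(0, 1, n)    # e.g. centered learning goal (invented)
# Invented outcome with a genuine x-by-z interaction.
y = 0.5 * x + 0.2 * z + 0.4 * x * z + rng.normal(0, 0.5, n)

# Ordinary least squares for y = b0 + b1*x + b2*z + b3*x*z.
design = np.column_stack([np.ones(n), x, z, x * z])
b0, b1, b2, b3 = np.linalg.lstsq(design, y, rcond=None)[0]

# The simple slope of y on x at a given level of z is b1 + b3*z.
# Evaluating it at the mean and one standard deviation above and below
# the mean yields the three conditional regression lines of Figure 6.
levels = [z.mean() - z.std(), z.mean(), z.mean() + z.std()]
simple_slopes = [b1 + b3 * level for level in levels]
```

Sweeping `z` continuously instead of at three levels, and joining the resulting lines, produces the mesh surface; adding a fourth variable and repeating the sweep at successive levels of it yields the animated frames of Figure 7.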
CONCLUSION

Visualization techniques are often considered valuable for meeting the demands of multivariate data because of their ability to portray numerous aspects of the data simultaneously. The process of visualization can be viewed as an adjustment of noise and smooth. However, no optimal bandwidth or structure can be applied to most situations. Researchers are encouraged to look at the data in different ways. In this sense, visualization is a creative activity. We hope that the powerful tools demonstrated in this paper will allow psychologists to explore and present multidimensional data graphically, in addition to reporting algebraic expressions such as eigenvalues and slopes for interactions. Often such summaries are too complicated to be interpreted directly, so that the user is simply left with the conclusion that the result is significant or not significant while remaining ignorant of the actual form of the function. Although visualization techniques in physical sciences effectively exploit the physical analogy of their subject with our perception of physical objects, we have argued that all visualization is based on analogy and rules for statistical analogy are already present in psychological research tools such as the histogram and scatterplots. We therefore believe that the visualization techniques applied in other fields can be applied successfully, with appropriate modification, to psychological phenomena. The success of such endeavors will depend on detailed knowledge of psychological systems and statistical computing, as well as energetic creativity.
REFERENCES

AIKEN, L. S., & WEST, S. G. (1991). Multiple regression: Testing and interpreting interactions. Newbury Park, CA: Sage.

AIKEN, L. S., WEST, S. G., SECHREST, L., & RENO, R. R. (1990). Graduate training in statistics, methodology, and measurement in psychology: A survey of Ph.D. programs in North America. American Psychologist, 45, 721-734.

ANDERSON, E. (1960). A semigraphical method for the analysis of complex problems. Technometrics, 2, 387-391.

BEHRENS, J. T., & YU, C. H. (1994, June). The visualization of multi-way interactions and high-order terms in multiple regression. Paper presented at the meeting of the Psychometric Society, Urbana-Champaign, IL.

BERTIN, J. (1983). Semiology of graphics. Madison: University of Wisconsin Press.

BOWMAN, A. W., & FOSTER, P. J. (1993). Adaptive smoothing and density-based tests of multivariate normality. Journal of the American Statistical Association, 88, 529-537.

BUTLER, D. L. (1993). Graphics in psychology: Pictures, data, and especially concepts. Behavior Research Methods, Instruments, & Computers, 25, 81-92.

CARR, D. B. (1991). Looking at large data sets using binned data plots. In A. Buja & P. A. Tukey (Eds.), Computing and graphics in statistics (pp. 5-39). New York: Springer-Verlag.

CARR, D. B., & NICHOLSON, W. L. (1988). Explor4: A program for exploring four-dimensional data using stereo-ray glyphs, dimensional constraints, rotation, and masking. In W. S. Cleveland & M. E. McGill (Eds.), Dynamic graphics for statistics (pp. 309-329). Belmont, CA: Wadsworth.

CLEVELAND, W. S. (1993). Visualizing data. Murray Hill, NJ: AT&T Bell Laboratories.

ENCARNACAO, J., FOLEY, J., BRYSON, S., FEINER, S. K., & GERSHON, N. (1994). Research issues in perception and user interface. IEEE Computer Graphics & Applications, 14, 67-69.

HÄRDLE, W. (1991). Smoothing techniques: With implementation in S. New York: Springer-Verlag.

HESSELINK, L., POST, F. H., & WIJK, J. J. (1994). Research issues in vector and tensor field visualization. IEEE Computer Graphics & Applications, 14, 76-79.

KAUFMAN, A., HOHNE, K. H., KRUGER, W., ROSENBLUM, L., & SCHRODER, P. (1994). Research issues in volume visualization. IEEE Computer Graphics & Applications, 14, 63-66.

KELLER, P. R., & KELLER, M. M. (1993). Visual cues: Practical data visualization. New Jersey: IEEE Press.

LEWANDOWSKY, S., HERRMANN, D. J., BEHRENS, J. T., LI, S. C., PICKLE, L., & JOBE, J. B. (1993). Perceptions of clusters in statistical maps. Applied Cognitive Psychology, 7, 533-551.

MIHALISIN, T., TIMLIN, J., & SCHWEGLER, J. (1991). Visualization and analysis of multivariate data: A technique for all fields. In G. M. Nielsen & L. Rosenblum (Eds.), Proceedings of the 1991 IEEE Visualization Conference (pp. 171-178). Los Alamitos, CA: IEEE.

NADARAYA, E. A. (1965). On non-parametric estimation of density functions and regression curves. Theory of Probability & Its Applications, 10, 186-190.

SCOTT, D. W. (1992). Multivariate density estimation: Theory, practice, and visualization. New York: Wiley.

SILVERMAN, B. W. (1986). Density estimation for statistics and data analysis. London: Chapman & Hall.

STATISTICAL SCIENCES (1993). S-Plus for Windows. Seattle, WA: Author.

TUKEY, J. W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.

WOLFRAM, S. (1991). Mathematica: A system for doing mathematics by computer. Reading, MA: Addison-Wesley.

(Manuscript received November 21, 1994;
revision accepted for publication February 3, 1995.)
... A common data visualization technique for examining the association between a dependent and an independent variable is the scatterplot (see Figure 2). However, overplotting-in which jammed data points appear to be a messy cluster of ink-obscures analysts from seeing any pattern (Yu & Behrens, 1995). In addition, it seems that most students hit a ceiling in terms of SES, and also there are several bivariate outliers. ...
Chapter
It is a common practice for educational researchers to employ multilevel modeling to analyze archival data that were collected by multistage sampling (e.g. Programme for International Student Assessment [PISA], Trends for International Math and Science Study [TIMSS], High School and Beyond [HSB], etc.). It is noteworthy that usually the sample size of this type of international and national studies is extremely large, and thus its ultra-high statistical power is associated with a high Type I error rate. Instead of counting on the p-value alone to make a dichotomous decision (to reject or not to reject the null hypothesis), it is advisable to utilize data visualization for pattern seeking. The objective of this chapter is to illustrate how various data visualization techniques can enable researchers to extract insight from data at each step of multilevel modeling. Specifically, this chapter illustrates techniques including linking and brushing, binning and median smoothing, and usage of a bubble plot, local filter, analysis of mean plot, residual plot, and many others.
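The binning-and-median-smoothing step the chapter mentions can be sketched in a few lines of numpy; the function name, bin count, and simulated data below are illustrative assumptions, not code from the chapter:

```python
import numpy as np

def binned_medians(x, y, n_bins=10):
    """Median smoothing: summarize a noisy scatter by the median of y
    within equal-width bins of x, trading individual points for a smooth."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # digitize assigns each x to a bin; clip so x.max() lands in the last bin
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    centers = (edges[:-1] + edges[1:]) / 2
    medians = np.array([np.median(y[idx == b]) if np.any(idx == b) else np.nan
                        for b in range(n_bins)])
    return centers, medians

# 10,000 points overplot badly as raw ink, but 10 binned medians
# recover the underlying trend (here y rises at about 2 units per x).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 10_000)
y = 2 * x + rng.normal(0, 5, 10_000)
centers, medians = binned_medians(x, y)
```

Plotting `medians` against `centers` (or drawing a box plot per bin) gives the summarized display described above, with the overplotted cloud reduced to a handful of robust summaries.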
... Sagiv and Schwartz (2000) published a co-plot map of the dimensional positions of the 57 countries that were included in an extended data collection phase. A co-plot is shorthand for a conditioning plot, a form of data display that enables the researcher to show three or more variables at the same time and to see whether there are any interaction effects among them (Yu and Behrens, 1995). Schwartz's co-plot map is presented below, with the arrows pointing toward the 7 orientations from a center. ...
Thesis
Full-text available
Cultural value orientation studies (Kluckhohn and Strodtbeck, 1961; Hofstede, 1980; Trompenaars and Hampden-Turner, 1998; Hall, 1973 and 1976; etc.) examine how different cultures value different things in life, and how these values result in different patterns of accepted behavior in a society. The present doctoral dissertation focuses on cultural value orientation studies (CVOS) and their results from the perspective of their connection to and possible implications for foreign language education (FLE) in Hungary. The dissertation aims to assist language learners to overcome culturally loaded situations where high linguistic competence does not seem to be enough to repair communication breakdown. It proposes a cultural value orientation profile for Hungary based on triangulating the analysis of the existing literature on CVOS, more than 50 curricula vitae and motivational letters of Hungarian learners of English, and 14 interviews conducted with Hungarians working with foreigners on a regular basis, and 14 interviews with foreigners working with Hungarians on a regular basis. The results show that Hungarians tend to accept the unequal distribution of power in the society, have a rather mild individualistic tendency, a not too characteristic tendency towards masculinity, a mild preference for keeping the private and public life separate, a strong tendency towards achievement and high context dependence. In addition, Hungarians seem to strongly favor relationships over rules, accept emotional displays, although they are mostly negative ones, strongly doubt there is opportunity for them to take control and change what is not good for them, and have a medium tendency for long-term orientation. Finally and importantly, they strongly try to avoid uncertainty, and have a polychronic attitude towards and cyclical view of time.
The implications of this doctoral research include raising awareness of possible misunderstandings between cultures with the help of representing cultural profiles in polar diagrams, and pinpointing cultural differences where possible clashes may occur for Hungarian speakers of English. This, in turn, is used to identify ways for Hungarian English language teachers to maintain or even boost the motivation of their learners by adapting foreign language methodology to their needs as future language users and intercultural communicators, who need to reach beyond their culturally coded learning and communication styles. Suggestions for further research are also made along with pedagogical implications. Keywords: cultural value orientation, language pedagogy, intercultural speaker, genre of curriculum vitae and motivational letters, foreign language methodology
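The conditioning-plot logic described in the excerpt above, displaying three or more variables at once to expose interactions, can be reduced to numbers: fit the x-y relationship separately within each level of a conditioning variable. This numpy sketch uses invented data and names purely for illustration:

```python
import numpy as np

def panel_slopes(x, y, z):
    """Conditioning-plot logic in numeric form: fit y ~ x separately within
    each level of a third variable z. An interaction appears as panels
    (levels of z) with clearly different slopes."""
    slopes = {}
    for level in np.unique(z):
        mask = z == level
        slope, _intercept = np.polyfit(x[mask], y[mask], 1)  # per-panel fit
        slopes[level] = slope
    return slopes

# Simulated interaction: the x-y slope is +1 in group "a" and -1 in group "b",
# which a pooled scatterplot would average away to roughly zero.
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 2000)
z = np.repeat(np.array(["a", "b"]), 1000)
y = np.where(z == "a", x, -x) + rng.normal(0, 0.1, 2000)
slopes = panel_slopes(x, y, z)
```

A graphical conditioning plot simply draws one scatter panel per level of `z`; the per-panel slopes are the numeric trace of what those panels would show.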
... Cobb et al., 2003; Moritz, 2000; Shaughnessy & Noll, 2006; Wild & Pfannkuch, 1999) and sampling distributions (e.g., delMas et al., 1999; Lipson, 2003; Yu & Behrens, 1995) are also widely reported as sources of difficulty with inference. Studies show that even when students are competent in the arithmetic (such as computing with formulas), they still struggle to interpret these concepts (Bright & Friel, 1998; Groth, 2003; Callingham, 1997). ...
Article
Full-text available
Difficulties in learning (and thus teaching) statistical inference are well reported in the literature. We argue the problem emanates not only from the way in which statistical inference is taught but also from what exactly is taught as statistical inference. What makes statistical inference difficult to understand is that it contains two logics that operate in opposite directions. There is a certain logic in the construction of the inference framework, and there is another in its application. The logic of construction commences from the population, reaches the sample through some steps and then comes back to the population by building and using the sampling distribution. The logic of application, on the other hand, starts from the sample and reaches the population by making use of the sampling distribution. The main problem in teaching statistical inference in our view is that students are taught the logic of application while the fundamental steps in the direction of construction are often overlooked. In this study, we examine and compare these two logics and argue that introductory statistical courses would benefit from using the direction of construction, which ensures that students internalize the way in which inference framework makes sense, rather than that of application.
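The "direction of construction" this abstract argues for, from population to samples to sampling distribution, is easy to make concrete by simulation. The following Python sketch is an illustration of that logic, with all names and numbers invented for the example:

```python
import numpy as np

def sampling_distribution(population, n, n_samples, rng):
    """Direction of construction: start from the population, draw many
    samples of size n, and keep each sample mean; the collection of
    means is the (empirical) sampling distribution."""
    return np.array([rng.choice(population, size=n, replace=True).mean()
                     for _ in range(n_samples)])

rng = np.random.default_rng(2)
# A deliberately skewed population: exponential with mean 2
population = rng.exponential(scale=2.0, size=100_000)
means = sampling_distribution(population, n=50, n_samples=5000, rng=rng)
# The means center on the population mean, and their spread shrinks
# toward sigma / sqrt(n), the standard error, even though the
# population itself is far from normal.
```

A histogram of `means` set beside a histogram of `population` is the classroom display this construction supports: students watch the sampling distribution emerge rather than being handed it as a formula.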
... The remedy to this problem is 'median smoothing', that is, changing the display of individual data points to summarized box plots (Tukey, 1977; Yu & Behrens, 1995; Yu, 2014). Figure 14 illustrates this technique, showing the relationship between Q1 and the total without Q1. ...
Article
This paper aims to illustrate how data visualization could be utilized to identify errors prior to modeling, using an example with multi-dimensional item response theory (MIRT). MIRT combines item response theory and factor analysis to identify a psychometric model that investigates two or more latent traits. While it may seem convenient to accomplish two tasks by employing one procedure, users should be cautious of problematic items that affect both factor analysis and IRT. When sample sizes are extremely large, reliability analyses can misidentify even random numbers as meaningful patterns. Data visualization, such as median smoothing, can be used to identify problematic items in preliminary data cleaning.
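A minimal numeric version of the box-plot remedy cited in the excerpt: instead of plotting every point, compute the five-number summary of y within each bin of x, which is exactly what a binned box-plot display would draw. The function and data here are illustrative assumptions, not the paper's code:

```python
import numpy as np

def boxplot_stats_by_bin(x, y, n_bins=5):
    """Median smoothing via binned box plots: reduce an overplotted
    scatter to one five-number summary (min, Q1, median, Q3, max)
    of y per equal-width bin of x."""
    edges = np.linspace(np.min(x), np.max(x), n_bins + 1)
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    return np.array([np.percentile(y[idx == b], [0, 25, 50, 75, 100])
                     for b in range(n_bins)])

# 5,000 points collapse to a 5 x 5 table; the column of medians
# (index 2) carries the smooth, the quartile columns carry the spread.
rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 5000)
y = x + rng.normal(0, 0.05, 5000)
stats = boxplot_stats_by_bin(x, y)
```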
... Various distinct objects are displayed within the lattice by mapping the high-dimensional data onto some geometric feature or attribute of these objects. The most common are glyphs [112] (Figure 1.7), icons [116], Chernoff faces [110], data jacks, and m-arm glyphs [113]. ...
Article
The topic of this report is information visualization. This is currently a very active and vital area in teaching, research, and technological development. The basic idea is to use computer-generated images as a means of gaining greater understanding and apprehension of the information present in the data (geometry) and its relationships (topology). It is a simple yet powerful concept that has had an enormous impact in several areas of engineering and science. The report is divided into two parts. The first part addresses the visualization problem precisely with respect to the subtle correlation between the techniques (and their metaphors), the user, and the data. The second part analyzes some information visualization applications, projects, tools, and systems. To categorize them, seven basic underlying data types are considered: one-dimensional, two-dimensional, three-dimensional, multidimensional, temporal, hierarchical, network, and workspace.
Book
The term “scientific visualization” was coined by a panel of the Association for Computing Machinery (ACM) organized by the National Science Foundation’s Division of Advanced Scientific Computing. At first the term referred exclusively to visualization in scientific and engineering computing, such as computer modeling and simulation. Later, visualization practice came to include data sources from other disciplines, and this movement eventually merged with the “information visualization” movement, which also started in the early 1990s. According to Few, “data visualization” is the umbrella term encompassing both scientific visualization and information visualization. It is important to point out that this book is not a manual for creating statistical graphs. Today software companies release new products or new versions of their products rapidly, so the best way to learn an interface is to visit the vendor’s website or view a tutorial on YouTube. Rather, the author intends to focus on the principles of data visualization, which tend to be stable over time and can be applied across software packages. For a data analyst, what should be done and why are more important than how it is done.
Article
Visualization tools are said to help researchers unveil hidden patterns and relationships among variables, and to help teachers present abstract statistical concepts and complicated data structures in a concrete manner. However, higher-dimension visualization techniques can be confusing and even misleading, especially when human-instrument interface and cognitive issues receive too little attention. In this article, the efficacy of function-based, data-driven, spatial-oriented, and temporal-oriented visualization techniques is discussed based upon an extensive review. Readers can find practical implications for both research and instructional practices. For research purposes, spatial-based graphs, such as Trellis displays in S-Plus, are preferable to temporal-based displays, such as the 3D animated plot in SAS/Insight. For teaching purposes, temporal-based displays, such as the 3D animation plot in Maple, seem to have advantages over spatial-based graphs, such as the 3D triangular coordinate plot in SyStat.
Book
Full-text available
Increased attention is being paid to the need for statistically educated citizens: statistics is now included in the K-12 mathematics curriculum, increasing numbers of students are taking courses in high school, and introductory statistics courses are required in college. However, increasing the amount of instruction is not sufficient to prepare statistically literate citizens. A major change is needed in how statistics is taught. To bring about this change, three dimensions of teacher knowledge need to be addressed: their knowledge of statistical content, their pedagogical knowledge, and their statistical-pedagogical knowledge, i.e., their specific knowledge about how to teach statistics. This book is written for mathematics and statistics educators and researchers. It summarizes the research and highlights the important concepts for teachers to emphasize, and shows the interrelationships among concepts. It makes specific suggestions regarding how to build classroom activities, integrate technological tools, and assess students' learning. This is a unique book. While providing a wealth of examples through lessons and data sets, it is also the best attempt by members of our profession to integrate suggestions from research findings with statistics concepts and pedagogy. The book's message about the importance of listening to research is loud and clear, as is its message about alternative ways of teaching statistics. This book will impact instructors, giving them pause to consider: "Is what I'm doing now really the best thing for my students? What could I do better?" J. Michael Shaughnessy, Professor, Dept of Mathematical Sciences, Portland State University, USA This is a much-needed text for linking research and practice in teaching statistics. The authors have provided a comprehensive overview of the current state-of-the-art in statistics education research. 
The insights they have gleaned from the literature should be tremendously helpful for those involved in teaching and researching introductory courses. Randall E. Groth, Assistant Professor of Mathematics Education, Salisbury University, USA.
Conference Paper
The creation of compelling visualisation paradigms is a craft often dominated by intuition and issues of aesthetics, with relatively few models to support good design. The majority of problem cases are approached by simply applying a previously evaluated visualisation technique. A large body of work exists covering the individual aspects of visualisation design, such as human cognition aspects, visualisation methods for specific problem areas, psychology studies, and so forth, yet most frameworks regarding visualisation are applied after the fact as an evaluation measure. We present an extensible framework for visualisation aimed at structuring the design process, increasing decision traceability, and delineating the notions of function, aesthetics, and usability. The framework can be used to derive a set of requirements for good visualisation design and to evaluate existing visualisations, presenting possible improvements. Our framework achieves this by being both broad and general, built on top of existing works, with hooks for extensions and customisations. This paper shows how existing theories of information visualisation fit into the scheme, presents our experience in the application of this framework on several designs, and offers our evaluation of the framework and the designs studied.
Article
Full-text available
Methods of adaptive smoothing of density estimates, where the amount of smoothing applied varies according to local features of the underlying density, are investigated. The difficulties of applying Taylor series arguments in this context are explored. Simple properties of the estimates are investigated by numerical integration and compared with the fixed kernel approach. Optimal smoothing strategies, based on the multivariate Normal distribution, are derived. As an application of these techniques, two tests of multivariate Normality—one based on integrated squared error and one on entropy—are developed, and some power calculations are carried out.