Content uploaded by Martin A. Koschat

Author content

All content in this area was uploaded by Martin A. Koschat on Nov 04, 2015

Content may be subject to copyright.

A Case for Simple Tables

Martin A. KOSCHAT

The task of devising and interpreting tables is an integral part of

statistical practice. Yet tables receive little attention as a topic of

statistical research and statistical education. This neglect seems

to be reﬂected in the quality of the tables that accompany sci-

entiﬁc and nonscientiﬁc presentations. This article argues that

in statistics tables are important tools for communicating infor-

mation, and hence should receive more attention in statistical

research, education, and practice. I discuss basic tables, graph-

ically enhanced tables and, in the context of OLAP, dynamic

tables. Examples supporting the case should be of interest to

statistical practitioners and educators alike.

KEY WORDS: Data analysis; OLAP; Statistical graphics.

1. INTRODUCTION

A statistical analysis starts with exploration and ends with

presentation. Exploration includes gaining an understanding of

the data sources, inspecting the data for data integrity as well

as unusual features that can guide the development of a formal

analysis, and staging the data for further analysis. Presentation

includes communicating the ﬁndings of the analysis to its in-

tended audience.

Tables and graphs ﬁgure prominently in both instances.

George Box’s (1988) dictum still holds true: “A ﬁrst analysis of

experimental results should, I believe, invariably be conducted

using ﬂexible data analytical techniques—looking at graphs and

simple statistics—that so far as possible allow the data to ‘speak

for themselves’. The unexpected phenomena that such an ap-

proach often uncovers can be of the greatest importance in shap-

ing and sometimes redirecting the course of an ongoing inves-

tigation.” This is indeed the course of action initially taken by

most statistical practitioners. In the end, academic as well as

nonacademic articles and reports that communicate the ﬁndings

of statistical analyses usually include tables and graphs.

The beneﬁts of the graphical display of information are widely

acknowledged, and statistical graphics has been an area of ac-

tive and lively research for many years. A substantial bibli-

ography of academic research articles, a dedicated major re-

search journal—namely the Journal of Graphical and Computa-

tional Statistics—and books (Tukey 1977; Mosteller and Tukey

1977; Chambers, Cleveland, Kleiner, and Tukey 1983; Cleve-

land 1994) that by now are seminal texts in the ﬁeld of statistics,

Martin Koschat is Executive Vice President, Information Management, Time/

Warner Retail Sales and Marketing, Sports Illustrated Building, Seventh Floor,

135 West 50th Street, New York, NY 10020-1201 (E-mail: martin

koschat@

timeinc.com). It is a pleasure to acknowledge the thoughtful comments and

helpful suggestions received from Andreas Buja, Anita Hussey, Naomi Robbins,

Raja Velu, the editor, associate editor, and two anonymous reviewers.

attest to a robust history and the continued vitality of statistical

graphics as an area of research.

In the statistical community, the beneﬁts of tables as a format

for displaying information are acknowledged less enthusiasti-

cally. It would seem that the value of tables as a format for

displaying information is recognized mostly indirectly through

the frequency of use. Tables are indeed widely used, and in many

statistical reports and research papers more space is devoted to

tables than graphs. Yet this obvious prominence is barely re-

ﬂected in the broader statistical discourse. Tables as displays of

information are rarely a topic in statistical education, are rarely

a point of discussion in statistical practice and, within the ﬁeld

of statistics, do not appear to receive much, if any, attention from

researchers.

An early and noteworthy contribution on the design of sta-

tistical tables to the statistical literature is Walker and Durost’s

(1936) text, considered by many to be a minor masterpiece.

Since then it would appear that scholarly articles on this topic

have been published mostly by a small handful of researchers,

notably Ehrenberg (1981, 1986), Wainer (1992, 1993, 1997a,

1998), and Tufte (2003). The latter two authors also acknowl-

edge tables as a valuable format for communicating information

in their books (Tufte1983, 1990, 1997; Wainer 1997b), but in the

statistical community these texts are mostly referred to for their

consideration of statistical graphics. Outside statistics, books on

document design occasionally have sections on the construction

of tables (Bigwood, Spore, and Seely 2003; Harris 2000; Shriver

1997).

The purpose of this article is to address the status quo and

contribute to a broader discussion on the value of tables and the

attention that tables should receive in education, practice, and

research. I perceive there to be a need, as well as an opportunity,

and will elaborate on both aspects in the next few sections. This is

obviously an ambitious objective fraught with some uncertainty

as to its outcome. A a more modest objective is the hope that

statistical practitioners and educators will ﬁnd useful ideas in

the examples presented.

2. THE CASE

The prescription for the design of a table is straightforward.

Arrange numbers—and it is usually numbers—in parallel rows

and perpendicular columns.Atableisasimple structure for

arranging numbers. These two deﬁning attributes—the well-

known structure and the use of numbers—also provide the major

rationale for using tables.

As to the structure of tables, it is probably moot to speculate

whether tables’ simple ingenuity is responsible for their wide

use or whether it is tables’ wide availability and use that have

created familiarity and wide acceptance. The fact remains that

even users untrained in any quantitative discipline readily com-

prehend that there is an implied commonality to all the numbers

in the same row, often pertaining to a “case,” and all the num-

bers in the same column, often with the meaning of a “variable.”

© 2005 American Statistical Association DOI: 10.1198/000313005X21429 The American Statistician, February 2005, Vol. 59, No. 1 31

Figure 1. Estimated revenue (in $U.S. ,000) in total (Total) and by major source, advertising (Ad) and circulation (Circ), for the 20 leading U.S.

consumer magazines for 2000 and 2001 as well as year-over-year (YOY) changes. Source: Advertising Age (www.adage.com). Generating Tool:

Microsoft Excel.

This universal understanding of a table’s structure, shared by few

other statistical constructs, is a simple but powerful argument for

using tables.

The act of displaying numbers often gets a bad rap. On the

one hand, there is the notion that the sensibilities of the quanti-

tatively challenged need to be protected; this is often a concern

in business research. On the other hand, analysts believe that

an analysis that starts with basic numbers necessarily needs to

yield something more profound than numbers; this is a notion

that is often encountered in the sciences. There are, however,

good arguments for using numbers in displays, and we mention

three.

A numerical display often presents data in their original form

and subjects them to minimal intervention from an analyst pur-

suing a directed investigation. An analysis of data is, of course,

nothing but a directed investigation, and a thorough analysis may

well require the transformation of original data through graph-

ical constructs or the application of a sophisticated mathemati-

cal model. Although such transformations have signiﬁcant and

well-known beneﬁts, the appreciation for these beneﬁts needs

to be balanced by a consideration for the bias that such trans-

formations may introduce and that is usually transparent to the

end-user of an analysis. The outcome of an analysis is as much

driven by the analytical assumptions and choices as by the data.

At a minimum, there is a distinct beneﬁt to complementing the

presentation of a formal analysis by an informative presentation

of the underlying data in their original form or a meaningful and

unambiguous numerical summary of such data.

Data presented in numerical form can also be easily manip-

ulated and transformed. The reader can easily take the data as

input for a graph or a formal model of her choosing. Numerical

displays support and encourage such a ﬂow. It is easy to translate

numbers into graphs or parameters of a model. It is much harder,

if not impossible, to take the reverse route.

Finally, numbers often—not always—require less of an ex-

planation than glyphs or modeled constructs. In many areas of

statistical application, such as business, analysts and analyti-

cal clients can easily put the number presented to them into

a context where the number has some immediate meaning. In

business, it helps, of course, that the raw numbers encountered

usually fall into a few standard and well-understood categories.

Further, business people often “live” their numbers, particularly

if these happen to be sales numbers. Not only do these people

fully comprehend the meaning of a sales number, they want to

see it!

I do not wish to suggest that the presentation of numerical

information is uniformly better than a graphical display or the

presentation of a formal analysis. I do want to emphasize, how-

ever, that there are often good reasons for displaying numerical

information in a simply structured format even if only to com-

plement graphs or to support the communication of model-based

results.

3. SOME CONSIDERATIONS FOR CONSTRUCTING

TABLES

In many computer programs for statistical analysis, raw and

derived data tend to be organized in rows and columns, thus nat-

urally forming tables. Analysts often succumb to the temptation

to grab such tables in their original form and to drop them into a

document or a presentation without further concern for making

appropriate adjustments. The results of this approach are often

painful to behold as Figure 1, a table taken from an actual pre-

32 General

Figure 2. The same three numbers are arranged as a row and as a

column. The column presentation is better suited to rank-order the three

numbers.

sentation to the senior management of a major U.S. publishing

house, shows.

This table attempts to compare changes in the revenue break-

down of the 20 leading consumer magazines in the U.S. for two

consecutive years. The table fails on several basic dimensions.

For the table to be an effective display of data, its entries must be

easy to compare. Also, entries and comparisons that are of spe-

cial interest should be prominently displayed and easy to ﬁnd.

Neither is the case in Figure 1. This simple observation leads

to a few basic principles useful in the construction of informa-

tive tables. These principles pertain to the choice of rows and

columns, the arrangement of rows and columns, the presenta-

tion of numbers, and the use of simple graphical elements for

structuring the display.

An application of these principles resulted in the table in Fig-

ure 3. This table now effectively shows the gestalt of the market

comprised of the 20 magazines. It quickly communicates the rel-

ative size of each magazine and the relative contribution of each

source to total revenues. Most importantly, it clearly commu-

nicates that 2001 was a difﬁcult year for magazine publishing,

with the difﬁculty mostly due to a softening of the advertising

market.

We use Figure 3 to brieﬂy discuss each of the guiding princi-

ples.

3.1 Choice of Columns and Rows

A ﬁrst step in the construction of a table requires a decision

on which entries to arrange in rows and which entries to arrange

in columns. In general, numerical comparisons are easier made

within columns than within rows. Figure 2 illustrates this point.

The same set of three numbers is arranged as a row and also as a

column showing that numbers are easier to compare and to rank

order in the column presentation.

Apart from supporting the case, the example also indicates

why this should be so. The ranking of two numbers is entirely

determined by the left-most digit in which the two numbers

differ. The assessment of which is larger is then reduced to the

inspection of a single digit. The determination of the position of

this distinguishing digit and the comparison is easily done in the

column presentation even for fairly large numbers, provided, of

course, all digits have been properly aligned.

In the consumer magazine example, one may surmise that

the comparison of different magazines is probably of greater

interest than the comparison of revenue sources. Hence I left the

column/row breakdown as it was in the original table.

3.2 Arrangement of Rows and Columns

Of equal importance, albeit for different reasons, are the

relative arrangement of rows and the relative arrangement of

columns. Rows whose entries one wishes to compare should

ideally be displayed close together, and the same holds true for

the arrangement of columns. Often the data themselves suggest

such a grouping. For example, it may be useful and informative

to arrange rows such that the entries in the principal column of

interest appear sorted. Also, because we tend to read from left

to right and top to bottom, a table’s left upper quadrant is likely

to receive most of a reader’s initial attention. Hence one should

consider arranging rows and columns such that the entries of

greatest interest fall into the left upper quadrant.

In the redesign of the table in Figure1Ilefttheoriginalorder

of the rows intact because it was determined by each magazine’s

total revenue in 2001, resulting in a display with the largest and

presumably most important magazines in the leading rows. I

rearranged the columns so that now all columns pertaining to a

particular revenuestream are grouped together, with the columns

related to total revenue, arguably the most important revenue

ﬁgure, moved farthest to the left.

3.3 Presentation of Numbers

It is usually not possible to arrange all entries that one should,

or might want to, compare in adjacent and vertically aligned

positions. In such instances the reader has to commit, however

brieﬂy, at least one of the numbers to be compared to memory.

The number of distinct digits that most people retain easily after

a single pass is more or less limited to seven (Miller 1956). It is

therefore good practice to transform the data such that ﬁve digits

or fewer represent each table entry, if possible. Often this can be

accomplished by adjusting the scale and, by rounding, limiting

the number of digits retained. Thus, rounding is an important

step with an additional beneﬁt. Usually the left-most digits of a

number are more important than the digits to the right. Retaining

too many digits hinders the reader from paying attention to the

more important digits.

Of course, readers perform some mental rounding on their

own, thus focusing on the relevant digits and retaining a com-

pact mental image of numbers. This task is made easier by the

addition of commas for everythree digits displayed. Generally, it

is the responsibility of the table’s designer to make this process-

ing as easy as possible by displaying numbers concisely. This

means creating a numerical display that retains as few digits as

possible but as many as necessary and that adds useful structure

such as commas to the digits retained.

In the examplehere, I expressedall revenueﬁgures in $ mill.—

rounded to the nearest million—added commas where appro-

priate, and took care to assure that corresponding digits were

aligned within each column.

3.4 Simple Graphical Elements

A table entry is characterized not only by its numerical value

but also by its position within the table. The effectiveness of a

table presentation depends in part on how easy it is to determine

an entry’s positions within the table and to link the entry to its

row and column labels. Two simple graphical elements, lining

and shading, help as the redesigned table in Figure 3 illustrates.

The American Statistician, February 2005, Vol. 59, No. 1 33

Figure 3. Estimated revenue (in $U.S. 000,000) in total and by source for the 20 leading U.S. consumer magazines for 2000 and 2001 as well

as year-over-year changes. Source: Advertising Age (www.adage.com). Generating tool: Microsoft Excel.

Lining refers to the judicious and parsimonious addition of

lines to the basic rectangular data display. The emphasis is on

“judicious” and “parsimonious” because, as in Figure 1, the

problem is often not that there are too few but that there are too

many lines. Separating each pair of adjacent rows and columns

by a line results in a useless grid that does nothing to help the

reader orient herself. On the other hand, the horizontal lines

added to the table in Figure 3 deﬁne horizontal bands of ﬁve

rows each, with two complementary beneﬁts. On the one hand,

the bands are sufﬁciently wide and distinct to be easily trace-

able as the reader’s glance moves from left to right. On the other

hand, each band is sufﬁciently narrow to let each row’s posi-

tion be easily determined within the band. If the sole purpose

of lining is to help the reader orient herself, the number of rows

within each band should be between three and ﬁve.

Lines can also be used to group rows with common themes.

Assuming the number of rows within each group is sufﬁciently

small, the actual number of rows within each band will then

be determined by the subject matter context that deﬁnes the

groupings, and it may well vary by band.

Shading refers to changing the background color for selected

rows and columns. The table in Figure 3 has shaded columns

containing revenue changes. In this example, the beneﬁts of

shading are two-fold. First, similar to lining, the shading of se-

lected columns creates bands that help determine the column

positions of individual entries. Second, shading creates groups

of rows or columns that the reader might wish to compare. This

is particularly useful in instances where the rows or columns one

wishes to compare are not adjacent to one another.

Shading is a frequently used typographical option (Wheildon

1990). It is a delicate addition to a table, and it has to be used

cautiously and judiciously. In particular, it is well known that the

readability of text deteriorates as the background tint gets too

dark. In general, lightly hued background colors are preferable

to gray. If color is not an option, careful attention needs to be

paid to choosing a gray scale that is dark enough to serve the

purpose of structuring the table and that is also light enough so

that numbers can be read easily.

There are other typographical elements one can consider with

beneﬁts perhaps similar to or complementary to lining and shad-

ing. Adherents of the “minimal use of ink” school of thought

might argue that in Figure 3 an effect similar to lining could

have been achieved simply by increasing the line spacing every

ﬁve rows. Arguably, an effect similar to shading could have been

achieved by, instead of shading selected cells, choosing a font

distinct in type, style, or size for the entries in these cells. Figure

6 includes an example of such alternatives.

I would like to emphasize that the implementation of these

suggestions are well within the reach of most statistical ana-

lysts. Although one could argue that Microsoft Excel is single-

handedly responsible for creating most of the abominations

looking like the table in Figure 1, one also needs to acknowl-

edge that this spreadsheet program has the ﬂexibility to help an

analyst prepare tables like the one in Figure 3 with a modest

amount of effort.

34 General

Figure 4. Number of data displays in three arbitrarily chosen issues published in 2003 by major business publications sold in the U.S. Within

each row—but not across rows!—circles are proportional in area to frequency counts. Generating tool: S-Plus 6 (2001).

The suggestions of this section are not intended as a compre-

hensive style guide for the design of tables. Some of the ref-

erences mentioned in the introduction present interesting and,

in speciﬁc applications, useful design ideas that have not been

covered here. For example, Tufte (2003) shows a table on can-

cer survival rates that breaks the standard rectangular layout

of a simple table to good effect. Also, our suggestions do not

provide automatic rules that absolve the analyst from reconcil-

ing often-conﬂicting considerations. However, in my experience

these guidelines together with a modest amount of effort sig-

niﬁcantly improve on the design of the default tables copiously

generated by spreadsheet and other data manipulation programs.

I hope that these suggestions be used as part of a broader effort

Figure 5. Breakdown of annual sell-through efﬁciencies

(sales/inventory) for a monthly magazine for the 111 stores of a small

Midwestern supermarket chain. Generating tool: S-Plus 6 (2001).

that is informed by a variant of the Socratic Principle: The un-

examined table is not worth showing!

4. ADDING GRAPHICAL INFORMATION TO

TABLES

A well-crafted table provides a concise presentation of data.

If done well, it guides the viewer in the exploration of numerical

information, in the process revealing valuable structural insights.

I have already noted the value of simple graphical additions for

structuring tables. A table’s ability to communicate the gestalt

of data can be further enhanced by the addition of graphical ele-

ments that themselves contain information. Such enhancements

may be additions to table entries or additions to the table as a

whole. A few examples will illustrate the point. (I do not lay any

claim to originality and vaguely—hence no references—recall

having seen variants of the graphics used in these examples else-

where.)

Figure 4 includes a table that records for each of four popular

business publications sold in the U.S. the frequency of several

data displays used by the magazines. Within each row—but not

across rows!—a circle proportional in area to its numerical value

surrounds each entry. In conjunction with horizontal shading, the

circles induce a comparison of the share of each display within

each magazine. The graphical enhancement quickly communi-

cates the relative prominence of each display in each publication.

Also, note the angled column labels. Such angled labels can be

fairly long, but are still readable without a need for turning the

table.

The second example is motivated by common statistical prac-

tice. In order to capture the variation in a magazine’s sell-through

efﬁciency—calculated as the ratio of sales and inventory—

across the stores of a small supermarket chain, we broke up the

efﬁciency range into contiguous intervals of equal length and

counted the number of stores falling into each interval. Figure 5

shows the resulting table that has been enhanced by horizontal

bars proportional in length to the respective frequency counts.

The resulting bar chart is, of course, nothing but the rotated

mirror image of a standard histogram.

The American Statistician, February 2005, Vol. 59, No. 1 35

Figure 6. Numerical summaries of p185

HER−2

concentration for paired partitions of subgroups of patients and control subjects (source: Holzer

et al. 2005). Also displayed are simultaneous conﬁdence intervals based on the Bonferroni principle (Bickel and Doksum 2000). Generating tool:

S-Plus 6 (2001).

The tabulation made explicit in Figure 5 is an essential step in

the construction of a histogram. This frequency tabulation, when

ﬁrst proposed more than 300 years ago by John Graunt (see, e.g.,

Tapia and Thompson 1978), was duly acknowledged as an im-

portant and original scientiﬁc accomplishment. In a standard

histogram this step is implicit with the result that, in my expe-

rience, most readers—even those who have received statistical

training in the form of an introductory statistics course—have

difﬁcultiesinterpreting them. On the other hand, fewpeople have

difﬁculties interpreting the table in Figure 5 and subsequently

interpreting the accompanying bar chart.

The ﬁnal example shows how tables that present statistical

results can be graphically enhanced. Summaries resulting from

statistical models are typically presented as tables containing

parameter estimates and associated standard errors. The intent

of such a presentation is often to invite a comparison of selected

parameters to determine whether there is evidence that they are

different.

A simple and robust method for the comparison of pairs of

parameters is based on an inspection of the corresponding conﬁ-

dence intervals. If these intervals do not overlap it may be taken

as evidence that the underlying parameters are indeed different.

Hence there is value in adding conﬁdence intervals to the dis-

play. Adding the conﬁdence intervals as two additional numeric

columns, however, results in an imperfect display. Because a

comparison of conﬁdence intervals requires a comparison of the

upper endpoint of one with the lower endpoint of the other in-

terval, a comparison of numbers in different columns would be

required. A graphical addition to the base table provides a useful

enhancement of the numerical display.

Figure 6 shows a table that includes the number of cases, the

means and the standard errors for the blood serum levels of a

cancer marker for subgroups of a sample comprised of patients

and control individuals. In this table I used different font size

and different line spacing to structure the row presentation. In

the presentation of the numerical table entries I also broke the

horizontal alignment. The vertical offset of the numbers within

each row improves item identiﬁcation by making it easier to

determine where one number ends and the next number starts.

The distinguishing feature of this table is the addition of si-

multaneous conﬁdence intervals based on the Bonferroni prin-

ciple. The joint coverage probability of these intervals exceeds

99%. As can be seen, these intervals all overlap and therefore

do not provide evidence that these means differ from each other.

(Denote by U

k

,k =1,...,Kconﬁdence intervals whose prob-

ability of jointly covering corresponding parameters p

k

equals

or exceeds 1 − α. Note, that if the individual coverage probabil-

ity of each interval is at least 1 − α/K, the Bonferroni principle

guarantees that the joint coverage probability equals or exceeds

1 − α. The decision rule “Do not reject H

0

: p

1

= ··· = p

K

if the U

k

contain at least one common point; otherwise reject”

deﬁnes a level α test.)

Here the table and the graphical addition effectively comple-

ment each other. On the one hand, the graphical display helps

make the desired comparison. On the other hand, the table read-

ily displays all the information a reader would need should she

choose to change the conﬁdence level and with it the conﬁdence

intervals.

5. DYNAMIC TABLES

As the volume and the complexity of data increase, a single

simple table will usually no longer do the data justice. One could

attempt to deal with the problem by devising some appropriately

complex multiway layout. Such an approach may, however, de-

feat its intended purpose by resulting in a data display that lacks

36 General

Figure 7. Example of an OLAP cube with three dimensions and two variables for each dimension. The contents of the cells may be measurements

such as inventory, sales or revenues.

the self-explanatory immediacy of a simple table. An alternative

tactic consists of providing an environment that permits an an-

alyst to create and organize a multitude of simple tables in real

time. Through selection and modiﬁcation, the analyst can then

guide herself to a small set of tables that capture and highlight

interesting structure in the data.

Such a proposal leads to interesting questions for statistical

methodological research. Fortunately the implementation of this

proposal does not have to await the outcome of such research be-

cause systems that are viable for analytical practice are already

available. Within decision-support software, there are programs

developed around the concept of OLAP (on-line application pro-

gramming). From the user’s point of view the three essential el-

ements of OLAP are its structuring of data, a user interface that

exploits this structure, and the speed with which the data can be

accessed and manipulated.

In OLAP, data—usually summary measurements—are orga-

nized in a multiway layout along dimensions where each dimen-

sion consists of hierarchically structured or nested categorical

variables, each with a distinct set of levels (Shoshani 2003).

Figure 7 provides an example, again from magazine publishing,

Figure 8a. Commercial OLAP interface (Powerplay by Cognos 2003)

to a cube that captures magazine unit sales broken down by the variables

associated with the four dimensions “Class of Trade,” “Time,” “Publisher,”

and “Edit.”

that illustrates the concept. The measurement could be inventory,

unit sales, revenues, or proﬁts for a particular magazine during

a ﬁscal quarter in a particular retail chain. These measurements

are organized along the dimensions “Class of Trade,” “Time,”

and “Editorial Category.” In our example, each dimension has

two variables. In general, there can be more than two variables

per dimension, and there can be more than three dimensions. In

OLAP, data organized in this manner are, for obvious reasons,

referred to as a “cube.”

Through a dedicated interface, the analyst interacts with a

cube by selecting subranges and summarizing over subranges

to generate subcubes. The analytical objective is to generate

informative subcubes of low dimensionality. Preferred low-

dimensional subcubes are tables.

In an application like the one described in Figure 7 the objec-

tive might be to track changes in magazine sales over a particular

time period for a particular class of trade and, if changes occur,

to identify the sources of these changes. The analyst often starts

with a high level summary view and, by selecting and ﬁltering,

“drills down” to interesting and revealing views.

The analyst interacts with the data with the help of a user

interface that is designed around the data structure. Figure 8(a)

shows an example of a Web-enabled user interface from a com-

mercial OLAP tool (PowerPlay by Cognos, 2003). A table is

created using three control buttons for choosing the measure

one wishes to display, as well as the horizontal and vertical vari-

ables by which the measure is broken down. By clicking on one

of these control buttons the user is presented with a menu from

which to choose the dimension, as well as the variable of each

dimension. In addition, each dimension has a control button that

allows ﬁltering by the levels of each associated variable.

Working off a cube with a structure similar to the one shown

in Figure 7 and starting with the view shown in Figure 8(a) I

decided to investigate what over the course of 2002 happened in

bookstores. I ﬁltered on the “Time” dimension to select the year

2002 and the “Class of Trade” dimension to select Bookstores.

I chose editorial categories and quarters as the horizontal and

vertical variables to produce the sale table shown in Figure 8(b).

The American Statistician, February 2005, Vol. 59, No. 1 37

Figure 8b. Unit sales in 2002 in Bookstores broken down by editorial

categories and quarters.

It is noteworthy to emphasize the speed with which this activity

could be performed; it took less than a minute.

A quick inspection of this table reveals a marked drop in unit

sales during the second quarter. At this point one may wish to

investigate whether this sales fall-off can be traced to speciﬁc

sources such as particular editorial categories. This is easily done

with some minor adjustments to the sale table. Figure 8(c) shows

the sale table with the rows ranked by annual sales volume and

the raw sales numbers replaced by each column’s row percent-

age, which is nothing other than each editorial category’s market

share. Because these shares did not change much across these

four quarters, we can conclude that all editorial categories ex-

perienced comparable sales losses during the second quarter.

The analysts may now wish to investigate whether the ob-

served sales drop was a singular event or a reﬂection of inherent

seasonality. Displaying quarterly data for more than one year

helps address this question. Alternatively, the analyst might want

to investigate whether the sales fall-off can be traced to speciﬁc

stores. If the underlying cube contains store-level detail she can

easily drill down to perform the required store level analysis.

Again, with a suitable OLAP interface such as the one shown

in these examples the necessary comparisons can be easily per-

formed in real-time.

The OLAP interface example shown here is just one commer-

cially available example. Several of the established statistical

software vendors now offer OLAP add-on functionality to their

statistical offerings. Also, the ubiquitous Microsoft Excel with

its Pivot Table provides OLAP functionality.

The criteria for designing a good OLAP interface are similar

to the ones laid out by Koschat and Swayne (1996) in a slightly

different context. The interface should include a visual represen-

Figure 8c. Market share for each editorial category by quarter.

tation of every dimension and every variable so that variables

and levels are easily available for selection, and so that feedback

is provided to the analyst about the current presentation of data.

On the other hand, the interface should not be cluttered with

irrelevant information. Additionally, if analysts go down ana-

lytical dead ends, it should always be easy for them to retrace

their steps (Norman 1988). Overall, the interface design should

strive for the principles of “direct manipulation” (Schneiderman

1998), which give the analyst a sensation of interacting directly

with the data rather than with a computer or a system.

6. DISCUSSION

At the outset, I do not wish to suggest that the mere display of

data is a substitute for a comprehensive statistical analysis, which

includes statistical inference. Also, I do not claim that a table

is always the preferred display of data. I would like to argue,

however, that tables should receive more attention within the

statistical discipline. In particular, tables can and should receive

more thoughtful consideration in statistical practice, statistical

education, and statistical research.

It would appear that not all is well in the practice of displaying

information. Figure 4 (p. 35) provides an interesting commen-

tary on the type of data displays that the readership of a broad

cross-section of business publications is exposed to. These read-

ers tend to be well educated and should comprise a fairly sophis-

ticated audience. Starting with graphs, their seeming abundance

is misleading because most graphs tend to be time series plots

and time series bar charts. This is a rather modest subset of the

options available for a meaningful graphical display of informa-

tion. About the only positive in Figure 4 is the observation that

nowadays the number of pie charts used by all publications is

mercifully small.

Even though the count of tables is uniformly smaller than

that of graphs, the amount of space devoted to the display of

tables is larger than that devoted to the display of graphs. The

quality of the table displays varies. Not surprisingly, tables that

appear as a regular feature in most issues of a magazine such as

stock tables are usually carefully laid out while one-of-a-kind

tables are often hard to parse with plenty of room for improve-

ment. Overall, it seems that the everyday practice of data display

lags the lofty expectations established by a quarter of a century

worth of systematic research on displaying and communicating

information.

Information displays devised for what might be expected to

be a more knowledgeable audience—namely statisticians—do

not necessarily fare better either. Gelman, Pasorica, and Dodhia

(2002) point out that the principal means of displaying scientiﬁc

ﬁndings in statistical research journals are often-perfunctory ta-

bles, and the authors suggest graphical displays as an alternative.

It is tempting to blame statistics education for this less-than-

perfect state of affairs, and it is hard to see how one can resist this

temptation. I recently had the opportunity to review 11 popular

texts used in introductory business statistics courses for their

treatment of the display of information. These texts spent be-

tween 1.2% and 8.9% of their pages on this topic. Most texts

simply conﬁned themselves to introducing the standard graphi-

cal displays. Only two of the texts actually discussed principles

of good graphical design. These two texts also explicitly ac-

knowledged tables as a means of communicating information

38 General

but did not provide any guidance on how to construct infor-

mative tables. A third text had a short but serviceable section

on Pivot Tables. This state of affairs, in my opinion, calls for

improvement.

It is argued that graphs and certainly tables are simple con-

structs akin to hammers and chisels that do not warrant the same

amount of time as the more complicated tools of inference which

are, after all, also part of the curriculum. It is certainly true that

the principle underlying the design of graphs and tables (like that

of hammers and chisels) is quickly communicated. The proper

use of these simple tools on the other hand may well require

dedicated and sustained training. Signiﬁcant beneﬁts could be

reaped if in introductory statistics courses more time were spent

on the proper design and use of graphs and tables. It is too much

to hope that such a focus would yield the next John Tukey, Ed-

ward Tufte, or Howard Wainer (or Leonardo, for that matter) but

it stands to reason that it would improve the quality of data dis-

plays that are routinely generated in statistical practice, and that

it would beneﬁcially raise expectations regarding a meaningful

display of information.

I suspect that, for obvious reasons, academic educators grav-

itate in their choice of course topics toward those that readily

tie into active research. It therefore needs to be emphasized that

dynamic tables are the focus of active research. Though OLAP

is not the outgrowth of traditional statistical research, it is a sub-

ject of active research that readily connects with the statistical

research tradition. This connection is the result of OLAP’s phi-

losophy of data access, as well as the structure of the underlying

data.

Regarding the philosophy of data access and presentation,

there is a well-established and successful precedent in statistical

graphics of representing complex data by using simple displays

that can be easily manipulated. The philosophy driving this re-

search is captured in the visualization programs XGobi (Swayne,

Buja, and Cook 1998) and GGobi (Swayne, Temple Lang, Buja,

and Cook 2003). In these programs the user generates through

a user-friendly and intuitive GUI in real time multiple simple

views, such as dot plots, scatterplots and time series plots, that

represent different views of complex, high-dimensional data.

The information contained in these simple views is enhanced by

functions such as linking and painting, which allow the simple

displays to jointly represent a more complex whole. In addition,

the programs include search utilities based on concepts such as

the Grand Tour (Asimov 1985) and Projection Pursuit (Cook,

Buja, Cabrera, and Hurley 1995) which help steer the analyst

towards views that could be interesting in the speciﬁc analytical

context.

While these programs are concerned with graphs, an OLAP

tool deals with tables. Other than that, there is a remarkable

commonality between these utilities. It would seem obvious that

many of the thoughts and insights that informed the development

of dynamic graphics could be beneﬁcially applied to the develop-

ment of OLAP. For example, it should be an interesting research

proposal to devise search utilities similar to the Grand Tour or

Projection Pursuit that help an analyst steer toward interesting

and meaningful tables.

OLAP has a second point of contact with statistical research.

Linear models (e.g., Rao and Toutenburg 1999) provided a rich

platform for inference in the analysis of multiway layouts, the

underlying data structure of OLAP. This holds out the prospect

for utilities that, bundled with OLAP, may help reconcile an

analyst’s belief that there is structure in a particular view of the

data with formal statistical inference. This reconciliation has

traditionally been a challenge for statistical graphical analysis.

Thus OLAP provides not only an interface to data but it has the

potential of supporting a broad array of tools for interfacing with

information. This ability to interface with information quickly

and efﬁciently and to produce views of the data in a format that

is easily understood will be critical in the years to come. In most

organizations the amount of data generated grows at a rapid clip

while the number of analysts dealing with this data ﬂow tends

to stay constant. Analysts and statisticians will need to become

more efﬁcient. Concepts such as OLAP will be critical in this

effort and it stands to reason that OLAP should therefore be an

integral part of a statistics curriculum.

[Received March 2004. Revised October 2004.]

REFERENCES

Asimov, D. (1985), “The Grand Tour: A Tool for Viewing Multidimensional

Data,” SIAM Journal of Scientiﬁc and Statistical Computing, 6, 128–143.

Bickel, P. J., and Doksum K. A. (2000), Mathematical Statistics: Basic Ideas

and Selected Topics I, Upper Saddle River, NJ: Pearson Education.

Bigwood, S., Spore, M., and Seely, J. (2003), Presenting Numbers, Tables and

Charts, Oxford, UK: Oxford University Press.

Box, G. E. P. (1988), “Signal-to-Noise-Ratios, Performance Criteria, and Trans-

formations,” Technometrics, 30, 1–17.

Chambers, J., Cleveland, W. S., Kleiner, B., and Tukey, P. (1983), Graphical

Methods for Data Analysis, Belmont, CA: Wadsworth.

Cleveland W. S. (1994), The Elements of Graphing Data, Summit, NJ: Hobart

Press.

Cook, D., Buja, A., Cabrera, J., and Hurley, C. (1995), “Grand Tour and Projec-

tion Pursuit,” Journal of Computational and Graphical Statistics, 4, 155–172.

Ehrenberg, A. S. C. (1981), “The Problem of Numeracy,” The American Statis-

tician, 35, 67–71.

(1986), “Reading a Table: An Example,” Applied Statistics, 35, 237–

244.

Gelman, A., Pasarica, C., and Dodhia, R. (2002), “Let’s Practice What We

Preach: Turning Tables into Graphs,” The American Statistician, 56, 121–

131.

Harris, R. L. (2000), Information Graphics: A Comprehensive Illustrated Ref-

erence, Oxford, UK: Oxford University Press.

Holzer, G., Pfandlsteiner, T., Koschat, M. A., Blahovec, H., Trieb, K., and Kotz,

R. (2005), “Soluble p185

Her−2

in Malignant Bone Tumors,” Pediatric Blood

and Cancer, 44, 163–166.

Koschat, M. A., and Swayne, D.F. (1996), “Interactive Graphical Methods in

the Analysis of Customer Panel Data” (with discussion), Journal of Business

and Economic Statistics, 14, 113–132.

Miller, G. A. (1956), “The Magical Number Seven, Plus or Minus Two: Some

Limits on Our Capacity for Processing Information,” The Psychological Re-

view, 62, 81–97.

Mosteller, F., and Tukey J. W. (1977), Data Analysis and Regression, Reading,

MA: Addison-Wesley.

Norman, D. A. (1988), The Psychology of Everyday Things, New York, NY:

Basic Books.

Powerplay by Cognos (2003), Palo Alto, CA: Hewlett-Packard Corporation.

Rao, C. R., and Toutenburg, H. (1999), Linear Models: Least Squares and Al-

ternatives, New York: Springer.

S-Plus 6.0 (1988–2001), Seattle, WA: Insightful Corporation.

Schneiderman, B. (1998), Designing the User Interface: Strategies for Effective

Human-Computer Interaction (3rd ed.), Boston, MA: Addison-Wesley.

Shoshani, A. (2003), “Multidimensionality in Statistical, OLAP and Scientiﬁc

Databases,” in Multidimensional Databases: Problems and Solutions, ed. M.

Rafanelli, Hershey, PA: Idea Group Publishing.

The American Statistician, February 2005, Vol. 59, No. 1 39

Shriver, K. A. (1997), Dynamics in Document Design: Creating Texts for Read-

ers, New York: Wiley.

Swayne, D. F., Cook, D., Buja, A. (1998), “XGobi: Interactive Dynamic Graph-

ics in the X WindowSystem,” JournalofComputationaland Graphical Statis-

tics, 7, 113–130.

Swayne, D. F., Temple Lang, D., Buja, A., and Cook, D. (2003), “GGobi: Evolv-

ing from XGobi into an Extensible Framework for Interactive Data Visual-

ization,” Computational Statistics and Data Analysis, 43, 423–444.

Tapia, R. A., and Thompson, J. R. (1978), Nonparametric Probability Density

Estimation, Baltimore: The Johns Hopkins University Press.

Tufte, E. R. (1983), The Visual Display of Quantitative Information, Cheshire,

CT: Graphics Press.

(1990), Envisioning Information, Cheshire, CT: Graphics Press.

(1997), Visual Explanations, Cheshire, CT: Graphics Press.

(2003), The Cognitive Style of PowerPoint, Cheshire, CT: Graphics

Press.

Tukey, J. W. (1977), Exploratory Data Analysis, Reading, MA: Addison-Wesley.

Walker, H. M., and Durost, W. N. (1936), Statistical Tables: Their Structure and

Use, New York, NY: Bureau of Publications, Teachers College, Columbia

University.

Wainer, H. (1992), “Understanding Graphs and Tables,” Educational Re-

searcher, 21, 14–23.

(1993), “Tabular Presentation,” Chance, 6, 52–56.

(1997a), “Improving Tabular Displays, with NAEP Tables as Examples

and Inspirations,” Journal of Educational and Behavioral Statistics, 22, 1–30.

(1997b), Visual Revelations: Graphical Tales of Fate and Deception

from Napoleon Bonaparte to Ross Perot, New York: Springer.

(1998), “Rounding Tables,” Chance, 11, 46–50.

Wheildon, C. (1990), Communicating or Just Making Pretty Shapes—A Study

of the Validity—or Otherwise—of Some Elements of Typographic Design (3rd

ed.), North Sydney, Australia: Newspaper Advertising Bureau of Australia.

40 General