Osborne, J. W., & Overbay, A. (2004). The power of outliers (and why researchers should ALWAYS check for them).

Practical Assessment, Research, and Evaluation, 9(6). Online at http://pareonline.net/getvn.asp?v=9&n=6

Copyright © 2004. All rights reserved.

The Power of Outliers (and Why Researchers Should

Always Check for Them)

Jason W. Osborne and Amy Overbay

North Carolina State University

There has been much debate in the literature regarding what to do with extreme or influential data points.

The goal of this paper is to summarize the various potential causes of extreme scores in a data set (e.g., data

recording or entry errors, motivated mis-reporting, sampling errors, and legitimate sampling), how to detect

them, and whether they should be removed or not. Another goal of this paper is to explore how

significantly a small proportion of outliers can affect even simple analyses. The examples show a strong

beneficial effect of removal of extreme scores. Accuracy tended to increase significantly and substantially,

and errors of inference tended to drop significantly and substantially once extreme scores were removed.

The presence of outliers can lead to inflated error

rates and substantial distortions of parameter and

statistic estimates when using either parametric or

nonparametric tests (e.g., Zimmerman, 1994, 1995,

1998). Casual observation of the literature suggests

that researchers rarely report checking for outliers of

any sort. This inference is supported empirically by

Osborne, Christiansen, and Gunter (2001), who found

that authors reported testing assumptions of the

statistical procedure(s) used in their studies--including

checking for the presence of outliers--only 8% of the

time. Given what we know of the importance of

assumptions to accuracy of estimates and error rates,

this in itself is alarming. There is no reason to believe

that the situation is different in other social science

disciplines.

What are Outliers and Fringeliers and why

do we care about them?

Although definitions vary, an outlier is generally

considered to be a data point that is far outside the norm

for a variable or population (e.g., Jarrell, 1994;

Rasmussen, 1988; Stevens, 1984). Hawkins described

an outlier as an observation that “deviates so much

from other observations as to arouse suspicions that it

was generated by a different mechanism” (Hawkins,

1980, p. 1). Outliers have also been defined as values

that are “dubious in the eyes of the researcher” (Dixon,

1950, p. 488) and contaminants (Wainer, 1976).

Author notes: We would like to thank Amy Holt and Shannon

Wildman for their assistance with various aspects of data analysis.

Parts of this manuscript were developed while the first author was on

the faculty of the University of Oklahoma. Correspondence relating

to this manuscript should be addressed to Jason W. Osborne via

email at jason_osborne@ncsu.edu.

Wainer (1976) also introduced the concept of the

“fringelier,” referring to “unusual events which occur

more often than seldom” (p. 286). These points lie near

three standard deviations from the mean and hence may

have a disproportionately strong influence on parameter

estimates, yet are not as obvious or easily identified as

ordinary outliers due to their relative proximity to the

distribution center. As fringeliers are a special case of

outlier, for much of the rest of the paper we will use the

generic term “outlier” to refer to any single data point

of dubious origin or disproportionate influence.

Outliers can have deleterious effects on statistical

analyses. First, they generally serve to increase error

variance and reduce the power of statistical tests.

Second, if non-randomly distributed they can decrease

normality (and in multivariate analyses, violate

assumptions of sphericity and multivariate normality),

altering the odds of making both Type I and Type II

errors. Third, they can seriously bias or influence

estimates that may be of substantive interest (for more

information on these issues, see Rasmussen, 1988;

Schwager & Margolin, 1982; Zimmerman, 1994).

Screening data for univariate, bivariate, and

multivariate outliers is simple in these days of

ubiquitous computing. The consequences of not doing

so can be substantial.

What causes outliers and what should we do

about them?

Outliers can arise from several different

mechanisms or causes. Anscombe (1960) sorts outliers

into two major categories: those arising from errors in

the data, and those arising from the inherent variability

of the data. Not all outliers are illegitimate

contaminants, and not all illegitimate scores show up as

outliers (Barnett & Lewis, 1994). It is therefore

important to consider the range of causes that may be

responsible for outliers in a given data set. What

should be done about an outlying data point is at least

partly a function of the inferred cause.

Outliers from data errors. Outliers are often

caused by human error, such as errors in data

collection, recording, or entry. Data from an interview

can be recorded incorrectly, or miskeyed upon data

entry. One survey the first author was involved with

(reported in Brewer, Nauenberg, & Osborne, 1998)

gathered data on nurses’ hourly wages, which at that

time averaged about $12.00 per hour with a standard

deviation of about $2.00. In our data set one nurse had

reported an hourly wage of $42,000.00. This figure

represented a data collection error (specifically, a

failure of the respondent to read the question

carefully). Errors of this nature can often be corrected

by returning to the original documents--or even the

subjects if necessary and possible--and entering the

correct value. In cases like that of the nurse who made

$42,000 per hour, another option is available--

recalculation or re-estimation of the correct answer.

We had used anonymous surveys, but because the

nature of the error was obvious, we were able to

convert this nurse’s salary to an hourly wage because

we knew how many hours per week she worked and

how many weeks per year she worked. Thus, if

sufficient information is available, recalculation is a

method of saving important data and eliminating an

obvious outlier. If outliers of this nature cannot be

corrected they should be eliminated as they do not

represent valid population data points.

Outliers from intentional or motivated mis-

reporting. There are times when participants

purposefully report incorrect data to experimenters or

surveyors. A participant may make a conscious effort

to sabotage the research (Huck, 2000), or may be

acting from other motives. Social desirability and self-

presentation motives can be powerful. This can also

happen for obvious reasons when data are sensitive

(e.g., teenagers under-reporting drug or alcohol use,

mis-reporting of sexual behavior). If all but a few

teens under-report a behavior (for example, the

frequency of sexual fantasies teenage males

experience…), the few honest responses might appear

to be outliers when in fact they are legitimate and valid

scores. Motivated over-reporting can occur when the

variable in question is socially desirable (e.g., income,

educational attainment, grades, study time, church

attendance, sexual experience).

Environmental conditions can motivate over-

reporting or mis-reporting, such as if an attractive

female researcher is interviewing male undergraduates

about attitudes on gender equality in marriage.

Depending on the details of the research, one of two

things can happen: inflation of all estimates, or

production of outliers. If all subjects respond the same

way, the distribution will shift upward, not generally

causing outliers. However, if only a small subsample of

the group responds this way to the experimenter, or if

multiple researchers conduct interviews, then outliers

can be created.

Outliers from sampling error. Another cause of

outliers or fringeliers is sampling. It is possible that a

few members of a sample were inadvertently drawn

from a different population than the rest of the sample.

For example, in the previously described survey of

nurse salaries, RNs who had moved into hospital

administration were included in the database we

sampled from, although we were particularly interested

in floor nurses. In education, inadvertently sampling

academically gifted or mentally retarded students is a

possibility, and (depending on the goal of the study)

might provide undesirable outliers. These cases should

be removed as they do not reflect the target population.

Outliers from standardization failure. Outliers

can be caused by research methodology, particularly if

something anomalous happened during a particular

subject’s experience. One might argue that a study of

stress levels in schoolchildren around the country

might have found some significant outliers if it had

been conducted during the fall of 2001 and included

New York City schools. Researchers experience such

challenges all the time. Unusual phenomena such as

construction noise outside a research lab or an

experimenter feeling particularly grouchy, or even

events outside the context of the research lab, such as a

student protest, a rape or murder on campus,

observations in a classroom the day before a big

holiday recess, and so on can produce outliers. Faulty

or non-calibrated equipment is another common cause

of outliers. These data can be legitimately discarded if

the researchers are not interested in studying the

particular phenomenon in question (e.g., if I were not

interested in studying my subjects’ reactions to

construction noise outside the lab).

Outliers from faulty distributional assumptions.

Incorrect assumptions about the distribution of the data

can also lead to the presence of suspected outliers (e.g.,

Iglewicz & Hoaglin, 1993). Blood sugar levels,

disciplinary referrals, scores on classroom tests where

students are well-prepared, and self-reports of low-

frequency behaviors (e.g., number of times a student

has been suspended or held back a grade) may give rise

to bimodal, skewed, asymptotic, or flat distributions,

depending upon the sampling design. Similarly, the

data may have a different structure than the researcher

originally assumed, and long or short-term trends may

affect the data in unanticipated ways. For example, a

study of college library usage rates during the month of

September may find outlying values at the beginning

and end of the month, with exceptionally low rates at

the beginning of the month when students have just

returned to campus or are on break for Labor Day

weekend (in the USA), and exceptionally high rates at

the end of the month, when mid-term exams have

begun. Depending upon the goal of the research, these

extreme values may or may not represent an aspect of

the inherent variability of the data, and may have a

legitimate place in the data set.

Outliers as legitimate cases sampled from the

correct population. Finally, it is possible that an

outlier can come from the population being sampled

legitimately through random chance. It is important to

note that sample size plays a role in the probability of

outlying values. Within a normally distributed

population, it is more probable that a given data point

will be drawn from the most densely concentrated area

of the distribution, rather than one of the tails (Evans,

1999; Sachs, 1982). As a researcher casts a wider net

and the data set becomes larger, the more the sample

resembles the population from which it was drawn, and

thus the likelihood of outlying values becomes greater.

In other words, there is only about a 0.3% chance

that a given data point from a normally distributed

population will fall 3 or more standard deviations from

the mean; this means that, on average, about 0.3% of

your subjects should be that extreme.
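
The role of sample size can be made concrete with a short calculation. The following is a minimal sketch, assuming a perfectly normal population and a cutoff of z = 3 in either direction; the function names are illustrative and not part of any analysis reported in this paper.

```python
import math

def two_tailed_prob(z):
    """P(|Z| >= z) for a standard normal variable Z."""
    return math.erfc(z / math.sqrt(2))

def prob_at_least_one(n, z=3.0):
    """Chance that a random sample of n cases holds at least one score beyond +/- z."""
    return 1 - (1 - two_tailed_prob(z)) ** n

p3 = two_tailed_prob(3.0)
print(f"P(|z| >= 3) = {p3:.4f}")

# Sample sizes borrowed from the simulations reported later in the paper.
for n in (52, 104, 416):
    print(f"N = {n}: expected extreme cases = {n * p3:.2f}, "
          f"P(at least one) = {prob_at_least_one(n):.2f}")
```

Under these assumptions the two-tailed probability at z = 3 is roughly 0.27% (a two-tailed probability of 1% corresponds to a cutoff near z = 2.58), and the chance of observing at least one such score rises steadily with N.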

In the case that outliers occur as a function of the

inherent variability of the data, opinions differ widely

on what to do. Due to the deleterious effects on power,

accuracy, and error rates that outliers and fringeliers

can have, it might be desirable to use a transformation

or recoding/truncation strategy to both keep the

individual in the data set and at the same time

minimize the harm to statistical inference (for more on

transformations, see Osborne, 2002).

Outliers as potential focus of inquiry. We all

know that interesting research is often as much a matter

of serendipity as planning and inspiration. Outliers can

represent a nuisance, error, or legitimate data. They

can also be inspiration for inquiry. When researchers

in Africa discovered that some women were living with

HIV just fine for years and years, untreated, those rare

cases were outliers compared to most untreated women,

who die fairly rapidly. They could have been

discarded as noise or error, but instead they serve as

inspiration for inquiry: what makes these women

different or unique, and what can we learn from them?

In a study the first author was involved with, a teenager

reported 100 close friends. Is it possible? Yes. Is it

likely? Not generally, given any reasonable definition

of “close friends.” So this data point could represent

either motivated mis-reporting, an error of data

recording or entry (it wasn’t), a protocol error

reflecting a misunderstanding of the question, or

something more interesting. This extreme score might

shed light on an important principle or issue. Before

discarding outliers, researchers need to consider

whether those data contain valuable information that

may not necessarily relate to the intended study, but

has importance in a more global sense.

Identification of Outliers

There is as much controversy over what

constitutes an outlier as whether to remove them or not.

Simple rules of thumb (e.g., data points three or more

standard deviations from the mean) are good starting

points. Some researchers prefer visual inspection of

the data. Others (e.g., Lorenz, 1987) argue that outlier

detection is merely a special case of the examination of

data for influential data points.

Rules such as z = 3 are simple and relatively

effective, although Miller (1991) and Van Selst and

Jolicoeur (1994) demonstrated that this procedure

(nonrecursive elimination of extreme scores) can

produce problems with certain distributions (e.g.,

highly skewed distributions characteristic of response

latency variables) particularly when the sample is

relatively small. To help researchers deal with this

issue, Van Selst and Jolicoeur (1994) present a table of

suggested cutoff scores for researchers to use with

varying sample sizes that will minimize these issues

with extremely non-normal distributions. We tend to

use a z = 3 guideline as an initial screening tool, and

depending on the results of that screening, examine the

data more closely and modify the outlier detection

strategy accordingly.
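
As an illustration of the initial screening step, the z = 3 guideline can be applied in a few lines. This is a minimal sketch, assuming the sample standard deviation (n - 1 denominator) and a single nonrecursive pass; the function name and the data are hypothetical.

```python
import statistics

def flag_outliers(values, cutoff=3.0):
    """Return the indices of values whose |z| meets or exceeds the cutoff."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 denominator
    if sd == 0:
        return []  # no variability, so nothing can be flagged
    return [i for i, v in enumerate(values) if abs(v - mean) / sd >= cutoff]

# Hypothetical data: 29 typical scores followed by one extreme score.
scores = [10, 11, 12, 13, 14] * 5 + [11, 12, 12, 13] + [60]
flagged = flag_outliers(scores)
print(flagged)
```

Because the extreme score itself inflates the mean and standard deviation, a single pass of this kind can miss cases in small or skewed samples, which is the concern raised by the authors cited above.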

Bivariate and multivariate outliers are typically

measured using an index of influence, leverage, or

distance. Popular indices include Mahalanobis’ distance

and Cook’s D, both of which are frequently used to

gauge the leverage that specific cases may exert on the

predicted value of the regression line (Newton &

Rudestam, 1999).

Standardized or studentized residuals in regression can

also be useful, and often the z=3 rule works well for

residuals as well.
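
For the simple regression case, several of these indices can be computed by hand. The sketch below is a minimal illustration, assuming one predictor and ordinary least squares; it computes leverage (hat values), internally studentized residuals, and Cook's D, but not Mahalanobis' distance. The data and the function name are hypothetical.

```python
import math

def regression_diagnostics(x, y):
    """Return (leverage, studentized residual, Cook's D) per case for y = a + b*x."""
    n = len(x)
    p = 2  # parameters estimated: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)

    rows = []
    for xi, e in zip(x, residuals):
        h = 1 / n + (xi - x_bar) ** 2 / sxx   # leverage (hat value)
        r = e / math.sqrt(mse * (1 - h))      # internally studentized residual
        d = (r ** 2 / p) * (h / (1 - h))      # Cook's D
        rows.append((h, r, d))
    return rows

# Hypothetical data: y is close to 2x except for the last case.
x = list(range(1, 11))
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.2, 17.9, 30.0]
diag = regression_diagnostics(x, y)
most_influential = max(range(len(x)), key=lambda i: diag[i][2])
print(most_influential, diag[most_influential])
```

In this hypothetical data set the last case combines high leverage with a large residual, so it has the largest Cook's D.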

For ANOVA-type paradigms, most modern

statistical software will produce a range of statistics,

including standardized residuals. In ANOVA the

biggest issue after screening for univariate outliers is

the issue of within-cell outliers, or the distance of an

individual from the subgroup. Standardized residuals

represent the distance from the sub-group, and thus are

effective in assisting analysts in examining data for

multivariate outliers. Tabachnick and Fidell (2000)

discuss data cleaning in the context of other analyses.
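
The idea of a within-cell outlier can be illustrated with a small calculation. This is a minimal sketch, assuming a one-way design and residuals standardized by the pooled within-cell standard deviation (the square root of the error mean square); the group labels, data, and function name are hypothetical.

```python
import math

def within_cell_z(groups):
    """Standardize each score's distance from its own cell mean by the pooled within-cell SD."""
    n_total = sum(len(g) for g in groups.values())
    means = {k: sum(g) / len(g) for k, g in groups.items()}
    ss_within = sum((v - means[k]) ** 2 for k, g in groups.items() for v in g)
    pooled_sd = math.sqrt(ss_within / (n_total - len(groups)))
    return {k: [(v - means[k]) / pooled_sd for v in g] for k, g in groups.items()}

# Hypothetical cells: the final score in cell "b" is unremarkable for the
# sample as a whole but far from the rest of its own cell.
groups = {
    "a": [9, 10, 11] * 5,
    "b": [19, 20, 21] * 5 + [10],
}
z = within_cell_z(groups)
flagged = [(k, i) for k, zs in z.items() for i, v in enumerate(zs) if abs(v) >= 3]
print(flagged)
```

A univariate screen on the combined sample would not flag this case, since a score of 10 sits well inside the overall range; it stands out only relative to its own cell.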

How to deal with outliers

There is a great deal of debate as to what to do

with identified outliers. A thorough review of the

various arguments is not possible here. We argue that

what to do depends in large part on why an outlier is in

the data in the first place. Where outliers are

illegitimately included in the data, it is only common

sense that those data points should be removed (see

also Barnett & Lewis, 1994). Few should disagree

with that statement.

When the outlier is either a legitimate part of the

data or the cause is unclear, the issue becomes murkier.

Judd and McClelland (1989) make several strong

points for removal even in these cases in order to get

the most honest estimate of population parameters

possible (see also Barnett & Lewis, 1994). However,

not all researchers feel that way (see Orr, Sackett, &

DuBois, 1991). This is a case where researchers must

use their training, intuition, reasoned argument, and

thoughtful consideration in making decisions.

Keeping legitimate outliers and still not violating

your assumptions. One means of accommodating

outliers is the use of transformations (for a more

thorough discussion of best practices in using data

transformations, see Osborne, 2002). By using

transformations, extreme scores can be kept in the data

set, and the relative ranking of scores remains, yet the

skew and error variance present in the variable(s) can

be reduced (Hamilton, 1992).

However, transformations may not be appropriate

for the model being tested, or may affect its

interpretation in undesirable ways. Taking the log of a

variable makes a distribution less skewed, but it also

alters the relationship between the original variables in

the model. For example, if the raw scores originally

related to a meaningful scale, the transformed scores

can be difficult to interpret (Newton & Rudestam,

1999; Osborne, 2002). Also problematic is the fact that

many commonly used transformations require non-

negative data, which limits their applications. For this

reason, many researchers turn to other methods to

accommodate outlying values.
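
The effect of a transformation on skew, and its preservation of rank order, can be illustrated briefly. This is a minimal sketch, assuming a base-10 log transformation applied to hypothetical, strictly positive scores; the skewness function uses the simple moment-based formula and is illustrative only.

```python
import math

def skewness(values):
    """Moment-based skewness: third central moment over the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical, positively skewed scores, listed in ascending order.
raw = [1, 2, 2, 3, 3, 3, 4, 5, 8, 40]

# The log is undefined for zero or negative values, so this step
# assumes every score is strictly positive.
logged = [math.log10(v) for v in raw]

print(f"skew before: {skewness(raw):.2f}, after: {skewness(logged):.2f}")
```

The extreme score remains the highest score after transformation, but it sits much closer to the rest of the distribution; the cost is that the values are now in log units rather than on the original scale.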

One alternative to transformation is truncation,

wherein extreme scores are recoded to the highest (or

lowest) reasonable score. For example, a researcher

might decide that in reality, it is impossible for a

teenager to have more than 15 close friends. Thus, all

teens reporting more than this value (even 100) would

be re-coded to 15. Through truncation the relative

ordering of the data is maintained, and the highest or

lowest scores remain the highest or lowest scores, yet

the distributional problems are reduced.
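
The close-friends example can be written out directly. This is a minimal sketch, assuming the ceiling of 15 from the example above; the function name and the reported values other than 100 are hypothetical.

```python
def truncate(values, low=None, high=None):
    """Recode scores beyond the given floor or ceiling to that floor or ceiling."""
    recoded = []
    for v in values:
        if high is not None and v > high:
            v = high
        elif low is not None and v < low:
            v = low
        recoded.append(v)
    return recoded

# Hypothetical reports of the number of close friends.
friends = [2, 4, 5, 7, 12, 18, 100]
recoded = truncate(friends, high=15)
print(recoded)
```

Note that the ordering is preserved only weakly: every score above the ceiling becomes tied at the ceiling, so distinctions among the most extreme cases are lost.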

Robust methods. Instead of transformations or

truncation, researchers sometimes use various “robust”

procedures to protect their data from being distorted by

the presence of outliers. These techniques

“accommodate the outliers at no serious

inconvenience—or are robust against the presence of

outliers” (Barnett & Lewis, 1994, p. 35). Certain

parameter estimates, especially the mean and Least

Squares estimations, are particularly vulnerable to

outliers, or have “low breakdown” values. For this

reason, researchers turn to robust or “high breakdown”

methods to provide alternative estimates for these

important aspects of the data.

A common robust estimation method for

univariate distributions involves the use of a trimmed

mean, which is calculated by temporarily eliminating

extreme observations at both ends of the sample

(Anscombe, 1960). Alternatively, researchers may

choose to compute a Winsorized mean, for which the

highest and lowest observations are temporarily

censored, and replaced with adjacent values from the

remaining data (Barnett & Lewis, 1994).
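
The two estimators can be compared on a small example. This is a minimal sketch, assuming symmetric treatment of the k lowest and k highest observations; the function names and data are hypothetical.

```python
def trimmed_mean(values, k):
    """Mean after dropping the k lowest and k highest observations."""
    ordered = sorted(values)
    if 2 * k >= len(ordered):
        raise ValueError("k is too large for this sample")
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

def winsorized_mean(values, k):
    """Mean after replacing the k lowest and k highest observations
    with the nearest remaining values."""
    ordered = sorted(values)
    if 2 * k >= len(ordered):
        raise ValueError("k is too large for this sample")
    kept = ordered[k:len(ordered) - k]
    replaced = [kept[0]] * k + kept + [kept[-1]] * k
    return sum(replaced) / len(replaced)

# Hypothetical sample with one extreme score.
data = [1, 4, 5, 6, 7, 8, 9, 10, 12, 98]
print(sum(data) / len(data))      # ordinary mean
print(trimmed_mean(data, 1))
print(winsorized_mean(data, 1))
```

Both estimates fall near the bulk of the data, whereas the ordinary mean is pulled well above nine of the ten scores. The trimmed mean reduces the effective sample size; the Winsorized mean keeps it intact.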

Assuming that the distribution of prediction errors

is close to normal, several common robust regression

techniques can help reduce the influence of outlying

data points. The least trimmed squares (LTS) and the

least median of squares (LMS) estimators are

conceptually similar to the trimmed mean, helping to

minimize the scatter of the prediction errors by

eliminating a specific percentage of the largest positive

and negative outliers (Rousseeuw & Leroy, 1987),

while Winsorized regression smooths the Y-data by

replacing extreme residuals with the next closest value

in the dataset (Lane, 2002).

Many options exist for analysis of non-ideal

variables. In addition to the above-mentioned options,

analysts can choose from non-parametric analyses, as

these types of analyses have few if any distributional

assumptions, although research by Zimmerman and

others (e.g., Zimmerman, 1995) does point out that even

non-parametric analyses suffer from outlier cases.

The effects of outlier removal

The rest of this paper is devoted to a

demonstration of the effects of outliers and fringeliers

on the accuracy of parameter estimates, and Type I and

Type II error rates.

Table 1

The effects of outliers on correlations

Population r   N     Average     Average     t          % more     % errors   % errors   t
                     initial r   cleaned r              accurate   before     after
                                                                   cleaning   cleaning
r = -.06       52    .01         -.08        2.5**      95%        78%        8%         13.40***
               104   -.54        -.06        75.44***   100%       100%       6%         39.38***
               416   0           -.06        16.09***   70%        0%         21%        5.13***
r = .46        52    .27         .52         8.1***     89%        53%        0%         10.57***
               104   .15         .50         26.78***   90%        73%        0%         16.36***
               416   .30         .50         54.77***   95%        0%         0%         --

Note: 100 samples were drawn for each row. Outliers were actual members of the population who scored at least z = 3 on the relevant variable.

With N = 52, a correlation of .274 is significant at p < .05. With N = 104, a correlation of .196 is significant at p < .05. With N = 416, a

correlation of .098 is significant at p < .05, two-tailed. ** p < .01, *** p < .001.

In order to simulate a real study where a researcher

samples from a particular population, we defined our

population as the 23,396 subjects in the data file from

the National Education Longitudinal Study of 1988

produced by the National Center for Educational

Statistics with complete data on all variables of

interest. For the purposes of the analyses reported

below, this population was sorted into two groups:

“normal” individuals whose scores on relevant

variables were between z = -3 and z = 3, and “outliers,”

who scored at least z = 3 on one of the relevant

variables.

In order to simulate the normal process of

sampling from a population, but standardize the

proportion of outliers in each sample, one hundred

samples of N=50, N=100, and N=400 each were

randomly sampled (with replacement between each

sampling) from the population of “normal” subjects.

Then an additional 4% were randomly selected from

the separate pool of outliers bringing each sample to

N=52, N=104, or N=416, respectively. This

procedure produced samples that could easily have

been drawn at random from the full population.
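
The sampling procedure can be sketched as follows. Because the NELS data are not reproduced here, the sketch assumes a synthetic stand-in population of standard normal z-scores, with outliers defined as |z| >= 3; the function name, the seed, and the pool construction are illustrative assumptions rather than the procedure actually used.

```python
import random

def draw_sample(normal_pool, outlier_pool, n_normal, rng, outlier_rate=0.04):
    """Draw n_normal cases from the normal pool, then add 4% more from the outlier pool."""
    n_outliers = round(n_normal * outlier_rate)
    sample = rng.sample(normal_pool, n_normal)      # no repeats within a sample
    sample += rng.sample(outlier_pool, n_outliers)
    return sample

rng = random.Random(0)  # arbitrary seed, for reproducibility only

# Synthetic stand-in for the population (same size as the NELS file described above).
population = [rng.gauss(0, 1) for _ in range(23396)]
normal_pool = [z for z in population if abs(z) < 3]
outlier_pool = [z for z in population if abs(z) >= 3]

# Each draw leaves the pools unchanged, so cases are available again
# for the next sample (replacement between samplings).
samples = {n: draw_sample(normal_pool, outlier_pool, n, rng) for n in (50, 100, 400)}
sizes = [len(s) for s in samples.values()]
print(sizes)
```

Fixing the proportion of outliers at 4% in every sample is what standardizes the comparison across sample sizes.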

The following variables were calculated for each

of the analyses below:

Accuracy was assessed by checking whether the

original or cleaned correlation was closer to the

population correlation. In these calculations the

absolute difference was examined.

Error rates were calculated by comparing the

outcome from a sample to the outcome from the

population. If a particular sample yielded a different

conclusion than was warranted by the population,

that was considered an error of inference.
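
These two outcome variables reduce to simple comparisons. The following is a minimal sketch, assuming a two-tailed test at p < .05 and using the critical correlations given in the note to Table 1; the function names are illustrative, and the example values are the row averages for N = 104 and population r = -.06 from Table 1, used here only for illustration.

```python
# Critical |r| at p < .05 for each sample size, as given in the note to Table 1.
CRITICAL_R = {52: 0.274, 104: 0.196, 416: 0.098}

def is_significant(r, n):
    """True if a sample correlation reaches the critical value for sample size n."""
    return abs(r) >= CRITICAL_R[n]

def cleaned_is_more_accurate(initial_r, cleaned_r, population_r):
    """True if the cleaned correlation is closer (in absolute terms) to the population value."""
    return abs(cleaned_r - population_r) < abs(initial_r - population_r)

def is_inference_error(sample_r, n, population_significant):
    """True if the sample leads to a different conclusion than the population warrants."""
    return is_significant(sample_r, n) != population_significant

population_r, n = -0.06, 104
initial_r, cleaned_r = -0.54, -0.06

print(cleaned_is_more_accurate(initial_r, cleaned_r, population_r))
print(is_inference_error(initial_r, n, population_significant=False))
print(is_inference_error(cleaned_r, n, population_significant=False))
```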

The effect of outliers on correlations. The first

example looks at simple zero-order correlations. The

goal was to see the effect of outliers on two different

types of correlations: correlations close to zero (to

demonstrate the effects of outliers on Type I error

rates), and correlations that were moderately strong

(to demonstrate the effects of outliers on Type II

error rates). Toward this end, two different

correlations were identified for study in the NELS

data set: the correlation between locus of control and

family size (r = -.06), and the correlation between

composite achievement test scores and

socioeconomic status (r = .46). Variable distributions

were examined and found to be reasonably normal.

Correlations were then calculated in each

sample, both before removal of outliers and after.

For our purposes, r = -.06 was not significant at any

of the sample sizes, and r = .46 was significant at all

sample sizes. Thus, if a sample correlation led to a

decision that deviated from the “correct” state of

affairs, it was considered an error of inference.

As Table 1 demonstrates, outliers had adverse

effects upon correlations. In all cases, removal of the

outliers had significant effects upon the magnitude of

the correlations, and the cleaned correlations were

more accurate (i.e., closer to the known population

correlation) 70 - 100% of the time. Further, in most

cases the incidence of errors of inference was lower

with cleaned than uncleaned data.

The effect of outliers on t-tests and ANOVAs.

The second example deals with analyses that look at

group mean differences, such as t-tests and ANOVA.

For the purpose of simplicity, these analyses are

simple t-tests, but these results would generalize to

any ANOVA. For these analyses two different

conditions were examined: when there were no

significant differences between the groups in the

population (sex differences in socioeconomic status

(SES) produced a mean group difference of 0.0007

with a SD of 0.80 and with 24501 df produced a t of

0.29), and when there were significant group

differences in the population (sex differences in

mathematics achievement test scores produced a

mean difference of 4.06 and SD of 9.75 and 24501 df

produced a t of 10.69, p < .0001). For both variables

the effects of having outliers in only one cell as

compared to both cells were examined. Distributions

for both dependent variables were examined and

found to be reasonably normal.

For these analyses, t-tests were calculated in

each sample, both before removal of outliers, and

after. For our purposes, t-tests looking at SES should

not produce significant group differences, whereas t-

tests looking at mathematics achievement test scores

should. Two different issues were examined: mean

group differences and the magnitude of the t. If an

analysis from a sample led to a different conclusion it

was considered an error.

The results in Table 2 illustrate the effects of

outliers on t-tests and ANOVAs. Removal of outliers

produced a significant change in the mean differences

between the two groups when the groups were equal

in the population, but tended not to when there were

strong group differences. Removal of outliers

produced significant change in the t statistics

primarily when there were strong group differences.

In both cases the tendency was for both group

differences and t statistics to become more accurate

in a majority of the samples. Interestingly, there was

little evidence that outliers produced Type I errors

when group means were equal, and thus removal had

little discernable effect. But when there were strong

group differences, outlier removal tended to have a

significant beneficial effect on error rates, although

not as substantial an effect as seen in the correlation

analyses.

The presence of outliers in one or both cells,

surprisingly, failed to produce any differential

effects. The expectation had been that the presence

of outliers in a single cell would increase the

incidence of Type I errors.

Why this effect was not shown could have to do

with the type of outliers in these analyses, or other

factors, such as the absolute equality of the two

groups on SES, which may not reflect the situation

most researchers face.

To remove, or not to remove?

Although some authors argue that removal of

extreme scores produces undesirable outcomes, they

are in the minority, especially when the outliers are

illegitimate. When the data points are suspected of

being legitimate, some authors (e.g., Orr, Sackett, &

DuBois, 1991) argue that data are more likely to be

representative of the population as a whole if outliers

are not removed.

Conceptually, there are strong arguments for

removal or alteration of outliers. The analyses

reported in this paper also empirically demonstrate

the benefits of outlier removal. Both correlations and

t-tests tended to show significant changes in statistics

as a function of removal of outliers, and in the

overwhelming majority of analyses accuracy of

estimates was enhanced. In most cases errors of

inference were significantly reduced, a prime

argument for screening and removal of outliers.

Although these were two fairly simple statistical

procedures, it is straightforward to argue that the

benefits of data cleaning extend to simple and

multiple regression, and to different types of

ANOVA procedures. There are other procedures

outside these, but the majority of social science

research utilizes one of these procedures. Other

research (e.g., Zimmerman, 1995) has dealt with the

effects of extreme scores in less commonly-used

procedures, such as nonparametric analyses.

References

Anscombe, F.J. (1960). Rejection of outliers.

Technometrics, 2, 123-147.

Barnett, V., & Lewis, T. (1994). Outliers in statistical data

(3rd ed.). New York: Wiley.

Brewer, C. S., Nauenberg, E., & Osborne, J. W. (1998,

June). Differences among hospital and non-hospital

RNs participation, satisfaction, and organizational

commitment in western New York. Paper presented at

the National meeting of the Association for Health

Service Research, Washington DC.

Dixon, W. J. (1950). Analysis of extreme values. Annals of

Mathematical Statistics, 21, 488-506.

Evans, V.P. (1999). Strategies for detecting outliers in

regression analysis: An introductory primer. In B.

Thompson (Ed.), Advances in social science

methodology: (Vol. 5, pp. 213-233). Stamford, CT.:

JAI Press.

Hamilton, L.C. (1992). Regressions with graphics: A

second course in applied statistics. Monterey, CA.:

Brooks/Cole.

Hawkins, D.M. (1980). Identification of outliers. London:

Chapman and Hall.

Huck, S.W. (2000). Reading statistics and research (3rd

ed.). New York: Longman.

Iglewicz, B., & Hoaglin, D.C. (1993). How to detect and

handle outliers. Milwaukee, WI.: ASQC Quality

Press.

Jarrell, M. G. (1994). A comparison of two procedures, the

Mahalanobis Distance and the Andrews-Pregibon

Statistic, for identifying multivariate outliers.

Research in the schools, 1, 49-58.

Judd, C. M., & McClelland, G. H. (1989). Data analysis:

A model comparison approach. San Diego, CA.:

Harcourt Brace Jovanovich.

Lane, K. (2002, February). What is robust regression and

how do you do it? Paper presented at the Annual

Meeting of the Southwest Educational Research

Association, Austin, TX.

Lorenz, F. O. (1987). Teaching about influence in simple

regression. Teaching Sociology, 15(2), 173-177.

Miller, J. (1991). Reaction time analysis with outlier

exclusion: Bias varies with sample size. The Quarterly

Journal of Experimental Psychology, 43(4), 907-912.

Newton, R.R., & Rudestam, K.E. (1999). Your statistical

consultant: Answers to your data analysis questions.

Thousand Oaks, CA.: Sage.

Orr, J. M., Sackett, P. R., & DuBois, C. L. Z. (1991).

Outlier detection and treatment in I/O Psychology: A

survey of researcher beliefs and an empirical

illustration. Personnel Psychology, 44, 473-486.

Osborne, J. W. (2002). Notes on the use of data

transformations. Practical Assessment, Research, and

Evaluation, 8(6). Available online at

http://ericae.net/pare/getvn.asp?v=8&n=6.

Osborne, J. W., Christiansen, W. R. I., & Gunter, J. S.

(2001). Educational psychology from a statistician's

perspective: A review of the quantitative quality of

our field. Paper presented at the Annual Meeting of

the American Educational Research Association,

Seattle, WA.

Rasmussen, J. L. (1988). Evaluating outlier identification

tests: Mahalanobis D Squared and Comrey D.

Multivariate Behavioral Research, 23(2), 189-202.

Rousseeuw, P., & Leroy, A. (1987). Robust regression and

outlier detection. New York: Wiley.

Sachs, L. (1982). Applied statistics: A handbook of

techniques (2nd ed.). New York: Springer-Verlag.

Schwager, S. J., & Margolin, B. H. (1982). Detection of

multivariate outliers. The Annals of Statistics, 10, 943-954.

Stevens, J. P. (1984). Outliers and influential data points in

regression analysis. Psychological Bulletin, 95, 334-344.

Tabachnick, B.G., & Fidell, L. S. (2000). Using

multivariate statistics (4th ed.). Pearson Allyn & Bacon.

Van Selst, M., & Jolicoeur, P. (1994). A solution to the

effect of sample size on outlier elimination. The

Quarterly Journal of Experimental Psychology, 47(3),

631-650.

Wainer, H. (1976). Robust statistics: A survey and some

prescriptions. Journal of Educational Statistics, 1(4),

285-312.

Zimmerman, D. W. (1994). A note on the influence of

outliers on parametric and nonparametric tests.

Journal of General Psychology, 121(4), 391-401.

Zimmerman, D. W. (1995). Increasing the power of

nonparametric tests by detecting and downweighting

outliers. Journal of Experimental Education, 64(1),

71-78.

Zimmerman, D. W. (1998). Invalidation of parametric and

nonparametric statistical tests by concurrent violation

of two assumptions. Journal of Experimental

Education, 67(1), 55-68.

Table 2

The effects of outliers on t-tests

N     Initial     Cleaned     t         % more      Avg      Avg      t          % Type I or II  % Type I or II
      mean        mean                  accurate    initial  cleaned             errors before   errors after
      difference  difference            mean diff.  t        t                   cleaning        cleaning

Equal group means, outliers in one cell
52    0.34        0.18        3.70***   66.0%       -0.20    -0.12    1.02       2.0%            1.0
104   0.22        0.14        5.36***   67.0%       0.05     -0.08    1.27       3.0%            3.0
416   0.09        0.06        4.15***   61.0%       0.14     0.05     0.98       2.0%            3.0

Equal group means, outliers in both cells
52    0.27        0.19        3.21***   53.0%       0.08     -0.02    1.15       2.0%            4.0
104   0.20        0.14        3.98***   54.0%       0.02     -0.07    0.93       3.0%            3.0
416   0.15        0.11        2.28*     68.0%       0.26     0.09     2.14*      3.0%            2.0

Unequal group means, outliers in one cell
52    4.72        4.25        1.64      52.0%       0.99     1.44     -4.70***   82.0%           72
104   4.11        4.03        0.42      57.0%       1.61     2.06     -2.78**    68.0%           45
416   4.11        4.21        -0.30     62.0%       2.98     3.91     -12.97***  16.0%           0.0

Unequal group means, outliers in both cells
52    4.51        4.09        1.67      56.0%       1.01     1.36     -4.57***   81.0%           75
104   4.15        4.08        0.36      51.0%       1.43     2.01     -7.44***   71.0%           47
416   4.17        4.07        1.16      61.0%       3.06     4.12     -17.55***  10.0%           0.0

[The final column is cut off at the page margin in this copy; its entries are shown only as far as they are legible.]

Note: 100 samples were drawn for each row. Outliers were actual members of the population who scored at least z = 3

on the relevant variable.

* p < .05, ** p < .01, *** p < .001
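
The resampling logic summarized in the note can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the simulated populations, the reference value passed as true_diff, the within-sample |z| >= 3 rule, and the reading of "% more accurate" as the share of samples whose cleaned mean difference lies closer to that reference value are all assumptions.

```python
import random
import statistics


def remove_extreme(scores, cutoff=3.0):
    """Drop scores lying at least `cutoff` sample SDs from the sample mean."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    return [s for s in scores if abs(s - mean) / sd < cutoff]


def percent_more_accurate(pop_a, pop_b, n_per_group, true_diff,
                          n_samples=100, seed=0):
    """Percent of samples whose cleaned mean difference is closer to true_diff."""
    rng = random.Random(seed)
    closer = 0
    for _ in range(n_samples):
        a = rng.sample(pop_a, n_per_group)
        b = rng.sample(pop_b, n_per_group)
        initial = statistics.fmean(a) - statistics.fmean(b)
        cleaned = (statistics.fmean(remove_extreme(a))
                   - statistics.fmean(remove_extreme(b)))
        if abs(cleaned - true_diff) < abs(initial - true_diff):
            closer += 1
    return 100.0 * closer / n_samples


# Hypothetical populations: equal means in the bulk, with about 1% extreme
# scores mixed into pop_a only ("outliers in one cell").
build = random.Random(42)
pop_a = ([build.gauss(50, 10) for _ in range(5000)]
         + [build.gauss(120, 5) for _ in range(50)])
pop_b = [build.gauss(50, 10) for _ in range(5050)]

# 26 cases per group corresponds to a total N of 52.
pct = percent_more_accurate(pop_a, pop_b, n_per_group=26, true_diff=0.0)
print(f"cleaned estimate closer to the reference value in {pct:.0f}% of samples")
```

Samples in which no score reaches the cutoff are unchanged by cleaning and therefore count as "not closer", so the percentage depends heavily on how often an extreme score is drawn.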
