Page 1

A New ‘‘Logicle’’ Display Method Avoids

Deceptive Effects of Logarithmic Scaling for Low

Signals and Compensated Data

David R. Parks,1* Mario Roederer,2and Wayne A. Moore1

1Department of Genetics, Stanford University, Stanford, California 94305

2Vaccine Research Center, National Institutes of Health, Bethesda, Maryland 20892

Received 26 September 2005; Revision Received 18 January 2006; Accepted 20 January 2006

Background: In immunofluorescence

and most other flow cytometry applications, fluores-

cence signals of interest can range down to essentially

zero. After fluorescence compensation, some cell popu-

lations will have low means and include events with neg-

ative data values. Logarithmic presentation has been

very useful in providing informative displays of wide-ran-

ging flow cytometry data, but it fails to adequately dis-

play cell populations with low means and high variances

and, in particular, offers no way to include negative data

values. This has led to a great deal of difficulty in inter-

preting and understanding flow cytometry data, has

often resulted in incorrect delineation of cell popula-

tions, and has led many people to question the correct-

ness of compensation computations that were, in fact,

correct.

Results: We identified a set of criteria for creating data

visualization methods that accommodate the scaling diffi-

culties presented by flow cytometry data. On the basis of

these, we developed a new data visualization method that

provides important advantages over linear or logarithmic

scaling for display of flow cytometry data, a scaling we

measurements

refer to as ‘‘Logicle’’ scaling. Logicle functions represent a

particular generalization of the hyperbolic sine function

with one more adjustable parameter than linear or loga-

rithmic functions. Finally, we developed methods for

objectively and automatically selecting an appropriate

value for this parameter.

Conclusions: The Logicle display method provides more

complete, appropriate, and readily interpretable represen-

tations of data that includes populations with low-to-zero

means, including distributions resulting from fluorescence

compensation procedures, than can be produced using ei-

ther logarithmic or linear displays. The method includes a

specific algorithm for evaluating actual data distributions

and deriving parameters of the Logicle scaling function

appropriate for optimal display of that data. It is critical to

note that Logicle visualization does not change the data

values or the descriptive statistics computed from them.

q 2006 International Society for Analytical Cytology

Key terms: data display; flow cytometry; fluorescence

compensation; data scaling; data transformation

Practical experience has demonstrated that marker dis-

tributions measured by flow cytometry are often more-or-

less log-normal or are composed of mixtures of log-normal

distributions. Logarithmic data scales, which show log-

normal distributions as symmetrical peaks, are widely

used and accepted as those facilitating analysis of fluores-

cence measurements in biological systems (1).

On the other hand, cell populations with low mean,

high variance, and approximately normally distributed flu-

orescence values occur commonly in various kinds of flow

cytometry data. In particular, data values for cell popula-

tions that are essentially unstained or are negative for a

particular dye, after fluorescence compensation, should

be distributed more-or-less normally around a low value

representing the autofluorescence of the cells in that data

dimension. Data sets resulting from computed compensa-

tion commonly (and properly) include populations whose

distributions extend below zero. (When analog compensa-

tion is used, such distributions should also appear, but the

electronic implementations distort or truncate the distri-

butions, so that negative values are suppressed.)

Logarithmic displays, however, cannot accommodate

zero or negative values and often show a peak above the

actual mean or median of the population with a pileup of

events on the baseline (see Fig. 1). This effect has been

the source of considerable confusion and has been com-

monly referred to as the ‘‘log artifact.’’ Linear scaling is

more appropriate and more easily interpreted for display

*Correspondence to: David R. Parks, Stanford University, Beckman Cen-

ter B007, Stanford, CA 94305-5318, USA.

E-mail: drparks@stanford.edu

Published online7April 2006

interscience.wiley.com).

DOI: 10.1002/cyto.20258

inWiley InterScience(www.

q 2006 International Society for Analytical CytologyCytometry Part A 69A:541–551 (2006)

Page 2

of fluorescence compensated data on cell populations that

are low to negative for a particular dye.

Thus, there is a need for display scales that combine the

desirable attributes of the log scale for large real signals with

those of the linear scale for unstained and near-background

signals. The Logicle method presented here solves this pro-

blem by plotting data on axes that are asymptotically linear

in the region, around a data value of zero and asymptoti-

cally logarithmic at higher (positive and negative) values.

Figure 1 illustrates the utility of Logicle scaling in facili-

tating accurate interpretation of flow cytometry data. The

Logicle displays in the left panels show well-defined cell

populations in the gated regions. There are very few

events (well under 1%) on the baselines, and the medians

of events in each region (marked with crosses) are appro-

priately central to the visual data. In contrast, logarithmic

presentation of the same data sets (center and right

panels) makes the actually compact cell populations look

split into above-baseline and on-baseline ‘‘populations.’’ In

each data set, about 45% of the data events are on the

baselines (and in the dot displays almost invisible). The

medians of the populations are nowhere near their visual

centers. Logarithmic scaling, therefore, produces unintui-

tive data displays, and can lead to incorrect data evalua-

tions and attempts to define separate populations that are

not in reality separate. Additional benefits of Logicle data

display are discussed later in the Results section.

Background on Multicolor Fluorescence and

Compensation

In a flow cytometer, each fluorescence detector accepts

light from a particular laser excitation and in a particular

range of emission wavelengths optimized to detect a parti-

cular dye. However, each dye whose excitation is nonzero

at that laser wavelength and whose emission is not zero in

the detector’s emission band will contribute signal on that

detector. Therefore, although fluorescent dye combina-

tions used in flow cytometry are selected to minimize

spectral overlaps in multicolor measurements, each dye

will typically contribute signal on several detectors, and

each detector will receive some signal from several dyes.

For each cell in a biological analysis, we generally want

to separate the signal contributions from the different

dyes, so that an estimate of the amount of each fluorescent

FIG. 1. Two samples of mouse spleen cells stained for CD5 and IgD and gated in light scatter and other fluorescent markers for viable lymphocytes (upper

panels) or viable non-T-cells (lower panels). The measurements were made on a FACSAria and compensated and analyzed in FlowJo. The arrow in the

upper–center logarithmic display panel points out a data artifact resulting from applying the compensation calculation to log data, in which low or negative

data values have been truncated. This does not occur when the full data is retained.

542

PARKS ET AL.

Cytometry Part A DOI 10.1002/cyto.a

Page 3

reagent is obtained. The process of converting from fluo-

rescence color measurements to dye estimates is com-

monly called fluorescence compensation. Although the

technique was originally developed for analysis of two-

color single laser measurements (2), it is particularly criti-

cal in multicolor work. By evaluating the response of each

of the detectors to a series of compensation control sam-

ples, each of which is labeled with only one dye, we con-

struct a matrix of relative spectral overlaps. For each cell,

we multiply a set of detector color measurements by the

inverse of the spectral overlap matrix to obtain the corre-

sponding set of dye estimates for the cell. This calculation

is based on simple linear algebra, so any particular set of

color measurement values yields a specific set of dye esti-

mates. The estimated dye amounts are exactly those

whose total signal on each detector would yield the color

measurements actually observed.

Statistical Uncertainties in Dye Estimates

As is so often the case, this algebraic analysis is not com-

plete in the real world. The fundamental deviation comes

from the quantum nature of light and the finite amount of

light detected. Thus, the detected signal is subject to what

is commonly called counting statistics, governed by the

Poisson distribution. In practice, the limiting step is the

number of photoelectrons emitted at the cathode of the

photomultiplier tube. The standard deviation of actual

measurements in relation to their theoretical expectation

scales with the square root of the number of photoelec-

trons detected.

For cells with just autofluorescence or very low dye

levels, the effects of photon statistics, possible electronic

noise, and real differences in low-level fluorescence among

cells in a particular population often result in signal distri-

butions with low means and high relative variances. This

problem becomes magnified after fluorescence compensa-

tion, since the compensated value is subject to error contri-

butions from multiple measurements. This can readily lead

to a standard deviation of the dye estimate which is greater

than the mean for that estimate. This phenomenon has

been discussed and illustrated by Roederer (3). The end

result is that distributions of compensated dye estimates for

cells that are unstained by a given dye are often nearly nor-

mal and centered near zero, and may have large variances

compared to the corresponding distributions for totally

unstained cells. In particular, this process can properly

result in negative dye estimates for some cells even though,

of course, negative dye amounts are not possible. These

negative values must not be disregarded, since truncating

them will deform the data distributions and result in incor-

rect computation of signal means.

The overall result is that cell samples measured by flow

cytometry often contain cell populations whose signal dis-

tributions are appropriately represented in logarithmic

displays along with populations whose distributions can-

not be properly shown in a logarithmic display. Logicle

functions and methods were developed to provide unified

displays in which these different populations can all be

represented in a clear and intuitive way.

MATERIALS AND METHODS

Test Particles and Cell Samples

Spherotech Rainbow multidye particles (Spherotech,

Libertyville, IL) were used for the data in Figure 5. Reagent

capture beads carrying a monoclonal rat-anti-mouse j anti-

body and matched blank beads (both from BD Biosciences,

San Jose, CA), were used to produce the data in Figure 6.

All of these particles are about 3 lm in diameter.

Cell samples, used to generate the illustrations, are de-

scribed in the figure captions.

Instrumentation

Illustrative data were obtained using a FACSAria (BD

Biosciences, San Jose, CA), which employs linear digital

data acquisition with 14 bit sampling at 10 MHz rate. Area

signals are produced as sums, over a range of ?50–100

samples, and are presented as 18-bit linear values. Since

background subtraction is included in the evaluation, zero

and negative data values can occur. These are preserved in

the floating point FCS files.

Data Analysis and Modeling

Analysis of data from the flow cytometer was carried

out in FlowJo (Tree Star, Ashland, OR) version 4.3 or later.

This package includes the ability to import floating point

FCS files into Logicle scaling, so that negative data values

are retained. FlowJo provides support for computed fluo-

rescence compensation, including automatic selection of

appropriate Logicle scale functions for each compensated

data dimension. In producing Figure 2, the spectral matri-

ces used in computing fluorescence compensation were

edited externally and imported into FlowJo to generate il-

lustrative data distributions.

Modeling and plotting of Logicle functions and various

other functions for comparisons were carried out in

Microsoft Excel.

RESULTS

Criteria for a New Data Display Method

We developed the following criteria for defining a new

scaling function that would yield better displays for much

flow cytometry data than can be produced using tradi-

tional logarithmic or linear scaling.

? The display formula supports a family of functions

that can be optimized for viewing different data sets.

? The function becomes logarithmic for large data

values, to ensure a wide dynamic range and to pro-

vide good visualizations of the often log-normal distri-

butions, at high fluorescence intensities.

? The function becomes linear near zero, and extends to

negative data values and is symmetrical around zero,

providing near-linear visualization, appropriate for lin-

ear–normal distributions at low fluorescence intensities.

? The transition between the linear to logarithmic

regions is as smooth as possible, to avoid introducing

artifacts in the display.

543

‘‘LOGICLE’’ DATA DISPLAY METHOD

Cytometry Part A DOI 10.1002/cyto.a

Page 4

? As the linearization strength is increased to accommo-

date a wider range of linearized data values, the reason-

ably linear region of the data values grows faster than

the size of the linearized region in the display. Thus,

the user has a visual indication that a greater degree of

linearization is in use, but the display space is balanced

between more linear and more logarithmic regions.

Specification of Logicle Functions

By considering these criteria and examining the behavior

of a number of functions, we concluded that particular gen-

eralizations of the hyperbolic sine function (sinh), which

we came to call Logicle functions, can best meet the crite-

ria. The hyperbolic sine function itself has the desirable

properties of being essentially linear near zero, becoming

exponential for large values (leading to a logarithmic display

scale there), and making a very smooth transition between

these regions (i.e., it is continuous in all derivatives), but it

does not provide enough flexibility to meet the display

needs encountered in flow cytometry.1

The hyperbolic sine function itself is given as follows:

sinhðxÞ ¼ ðex? e?xÞ=2

ð1Þ

This can be generalized to what we call biexponential

functions,

Sðx;a;b;c;d;fÞ ¼ aebx? ce?dxþ f

ð2Þ

Interpreting the condition of maximal linearity around

data value zero to mean that the second derivative of the

function should be zero therein, we identified a subset of

biexponential functions with this property and call them

Logicle scaling functions.

Besides the constraint just specified, there are four fur-

ther choices that need to be made to fix the five parame-

ters in Eq. (2) (a, b, c, d, and f), and thereby define a speci-

fic display. How these choices appear in an actual Logicle

display is illustrated in Figure 2. The parameters described

later and in Figure 2 are not simply a, b, c, d, and f, but,

once specified, they uniquely determine the function in

Eq. (2). The first choice is the maximum data value in the

displayed scale (T). The second is the range of the display

in relation to the width of high data value decades (M or

m in decade or natural log formulations, respectively). If

this is held constant among plots optimized to different

data sets, the nearly logarithmic area at the upper end of

each display will be essentially the same, while the region

near data zero is adjusted to optimize for different data

sets. We have found that a total plot width of 4.5 ‘‘dec-

ades’’ is usually a good choice for displaying flow cytome-

try data.

The third choice is the strength and range of lineariza-

tion around zero (Wor w). The linear slope at zero (in, for

example, data units per pixel or data units per mm in a

printout) and the range of data values in the nearly linear

zone are determined by this selection. In displaying a par-

ticular data set, the linearized range must be adequate to

cover broad population distributions that do not display

well on log scales. This, in particular, is the selection that

is critical in matching displays to particular data sets and

in ensuring that the linearized zone covers the range of

statistical spread in the data. If the transition toward log

behavior occurs in too low data values, the artifacts seen

in logarithmic displays will not be suppressed.

The fourth choice is to specify the range of negative

values to be included in the display (which also defines

the position of the data zero in the plot). This range must

be great enough to avoid truncating populations of inter-

est. In practice, as shown in Figure 2, we find that it is de-

sirable to link the third and fourth choices as a single

1NOTE: In cytometry, we normally label ‘‘logarithmic’’ axes with values

from the corresponding exponential function, rather than with the loga-

rithm itself, e.g., decade labels like 10, 100, 1,000 and not 1, 2, and 3. The

Logicle functions defined in the equations given later are data value func-

tions. Their inverses provide Logicle display functions in the same way

that exponential scaling functions provide logarithmic data displays.

FIG. 2. How the Logicle parameters relate to the resulting Logicle scale and data display. M and W are expressed in decades, i.e., base 10 log units. Their

natural log forms are m 5 M ln(10) and w 5 W ln(10). The data curves are only for illustration here but are used and described in Figure 5. [Color figure

can be viewed in the online issue, which is available at www.interscience.wiley.com.]

544

PARKS ET AL.

Cytometry Part A DOI 10.1002/cyto.a

Page 5

value. This assures that the lowest negative data values in

view correspond to the approximate edge of the linear-

ized zone. As discussed earlier under Statistical Uncertain-

ties, negative values should occur only as a result of statis-

tical spreading, and, therefore, they should be displayed

within the near-linear zone.

Assuming that the top-of-scale value and the nominal

‘‘decade’’ width of the display have been selected, linking

the third and fourth choices results in a family of functions

with only one parameter to be adjusted to match the parti-

cular data set being displayed.

Using natural log units, an expression for the Logicle

scaling function that embodies all of the constraints and

choices described earlier is given as follows:

S(x;w) ¼ Te?ðm?wÞðex?w? p2e?ðx?wÞ=pþ p2? 1Þ

for x ? w

ð3Þ

In Eq. (3), T is the top of scale data value (e.g., 10,000 for

common 4 decade data or 262,144 for an 18 bit data

range).

w 5 2p ln(p)/(p 1 1) is the width of the negative data

range and the range of linearized data in natural log units.

p is introduced for compactness in presenting the Logicle

function, but p and w together represent a single adjusta-

ble parameter.

m is the breadth of the display in natural log units. For a

4.5 decade, display range m 5 4.5 ln(10) 5 10.36.

The display is defined for x in the range from 0 to m.

Negative data values appear in the space from x 5 0 to

x 5 w, and positive data values are plotted between x 5

w and x 5 m (where the top data value T occurs). The

form shown as Eq. (3) is for the positive data zone, where

x ? w. For the negative zone where x < w, we enforce

symmetry by computing the Logicle function for the cor-

responding positive value (w 2 x) and changing the sign.

The data zero at x 5 w is where the second derivative is

zero, i.e., the most linear area.

To select an appropriate value for w to generate a good

display for a particular data set, we obtain a reference value

marking the low end of the distribution to be displayed. As

described later, we typically select the data value at the fifth

percentile of all events that are below zero as this reference

value. Designating this (negative) value as ‘‘r,’’ and using its

absolute value abs(r), w is computed as follows:

w ¼ ðm ? lnðT=absðrÞÞÞ=2

ð4Þ

Equations (3) and (4) can be rewritten using base 10 rep-

resentation in order to express the parameters in terms of

‘‘decades’’ of signal level or display:

SðX;WÞ ¼ T ? 10?ðM?WÞð10X?W?p2? 10?ðX?WÞ=pþp2?1Þ

for X ? W

ð5Þ

In Eq. (5), W 5 2p log(p)/(p 1 1) is the width of the nega-

tive data range and the range of linearized data in ‘‘dec-

ades’’ and M is the breadth of the display in ‘‘decades.’’ For

a 4.5 decade, display range M 5 4.5.

We obtain W from the negative range reference value

‘‘r’’ as follows:

W ¼ ðM ? logðT=absðrÞÞÞ=2

ð6Þ

Figure 2 illustrates the relationship between these parame-

ters and the resulting Logicle display.

Specifying a logarithmic display requires two values cor-

responding to T and M, and the scaling near the upper

end of a Logicle plot approximates that of a logarithmic

display with the same values of T and M. The additional

linearization width, W, adapts the Logicle scale to the char-

acteristics of different data sets.

Logicle functions with different values of W are plotted

in Figure 3 along with the linear and exponential func-

tions that match them around data zero and at high data

values, respectively. Use of an exponential function for

scaling is what results in a logarithmic scale. Note that

each Logicle curve closely follows its matched linear func-

tion at low signal values, confirming good linearity in the

region around data zero. At middle signal values that vary

depending on the value of W, the Logicle functions depart

from linearity and move smoothly toward the exponential

line. At high signal levels, the Logicle curves become indis-

tinguishable from the exponential line.

Figure 4 shows a Logicle curve for W 5 1.0 and its

matched linear and exponential curves displayed with a lin-

ear signal level scale. The signal level scale is expanded

(top of scale is 300 rather than 10,000) to show in detail

the matching of the Logicle and linear curves at low signal

levels and the divergence of the Logicle curve at higher

levels and the beginning of its approach to the exponential

curve.

Strategy for Selecting the Width Parameter

As we have discussed, proper estimates of dye signals

using measurements on individual cells may be negative,

but actual negative dye amounts are impossible. There-

fore, any negative values present in the compensated

data must be due to purely statistical effects. This is true

despite the presence of essentially arbitrary positive stain-

ing distributions. Thus, for a population with near zero

mean and significant statistical spread, the most negative

values indicate the necessary range of the negative part

of the scale, and they also indicate the range of lineariza-

tion needed to ensure that the population will be dis-

played in a compact and unimodal form. The positive

part of the population is less helpful, since it may overlap

with other populations in the data set and may not pro-

vide a clear upper end with which to define a suitable

range for linearization.

A simple strategy of choosing the fifth percentile of the

negative data values to set this scale seems to work well

and combines adequate sensitivity to extreme values with

reasonable sampling stability. Using this strategy (the one

currently implemented in FlowJo and illustrated in Figure 2

545

‘‘LOGICLE’’ DATA DISPLAY METHOD

Cytometry Part A DOI 10.1002/cyto.a

Page 6

based on the leftmost of the four data distributions), the

visible negative data range extends somewhat below the

fifth percentile of negatives reference data value, so that

almost all the negative data (out to roughly 1.5 times the

negative reference data value) is actually seen in the plot.

In cases where no negative data values occur or the neg-

ative values are all close to zero, our experience indicates

that a minimal Logicle scale sufficient to linearize data in

the range of cell autofluorescence provides a more readily

interpreted view of the data than does a purely logarith-

mic scale.

In some data sets, there are few negative data values,

but some aberrant events yielding extreme negative values

also occur. In such cases, the fifth percentile of negatives

value may lead to a value of W, too high for optimal display

of the main data set. Gating out the unrepresentative nega-

tive data points and reapplying the automatic scale selec-

tion to the gated data cures this problem.

FIG. 3. Logicle, linear, and expo-

nential scaling functions. The Logi-

cle functions are plotted for W 5 0,

W 5 0.5, W 5 1.0, and W 5 1.5. The

display range covers 4.5 ‘‘decades,’’

and the signal level scale is logarith-

mic, so only the positive data values

can be represented. The black diago-

nal line in each panel is a pure expo-

nential, i.e., the scaling function for

a standard logarithmic display. The

green broken lines are pure linear

functions with zero crossings, and

slopes matched to the correspond-

ing Logicle curves (red). [Color fig-

ure can be viewed in the online

issue, which is available at www.

interscience.wiley.com.]

546

PARKS ET AL.

Cytometry Part A DOI 10.1002/cyto.a

Page 7

To achieve consistency in data display when analyzing

experiments that include a number of samples to be com-

pared, it is appropriate to fix the Logicle scale (for each

dimension) based on the most extreme sample present

(usually one with the maximum number of labels in use)

and use these fixed scales to analyze all similarly stained

samples in the experiment. The current implementation

in FlowJo bases the scale selection on a single user-speci-

fied (gated) data set. A simple and probably desirable vari-

ant of this method which has not yet been implemented

in user software would operate on a group of data sets

designated to be analyzed together. The Logicle width pa-

rameter would be evaluated for each dimension in each

data set, and the largest resulting width in each dimension

would be selected for the common displays. In general,

when there are multiple populations in a single sample or

multiple samples to be viewed on the same display scale,

the population or sample with the greatest negative

extent should drive the selection of W.

The method we have chosen for defining the negative

end of the display scale in relation to the linearization

width makes it possible to evaluate the appropriateness of

a particular scaling for a specific data set, by examining

the negative data region. If a substantial fraction of the

negative data values pile up at the low end of the scale,

the value of W is too low to properly display this data, and

a higher value of W should be used. If there is a lot of

empty negative data space below the lowest population of

interest, the linearized region around zero is more com-

pressed than necessary. The population will be properly

compact and unimodal, but it would be advantageous to

lower Wand obtain a more expanded view.

The Effective Dynamic Range of a Logicle Display

We can give a precise expression for the range of varia-

tion in scale across a Logicle plot in a form analogous to

the ‘‘dynamic range’’ of a logarithmic plot. An ordinary log-

arithmic scale is often characterized by the number of

‘‘decades’’, i.e., by the common logarithm of the ratio of

the maximum to the minimum data values. Clearly, with

Logicle scales that extend through zero, such a formula

cannot work. However, if we consider the variation in the

number of data units corresponding to a given width on

the display, we get a relevant and useful ratio correspond-

ing to the range of expansion or compression of the data

across the plot. Mathematically, this is the ratio of the

highest and lowest values of the slope or derivative of the

scale function within the plot. For an ordinary logarithmic

scale, this method yields exactly the same results as the

usual procedure, i.e., the common logarithm of this ratio

of slopes is the same as the number of decades, as defined

earlier. For a Logicle scale, the ratio of maximum to mini-

mum derivatives (at the top of scale and data zero, respec-

tively) varies as a function of the linearization width W.

Working from the expression in Eq. (3), the derivative is

given as follows:

S0ðX;WÞ ¼ Te?ðm?wÞðex?wþ pe?ðx?wÞ=pÞ

for x ? w

ð7Þ

The effective dynamic range discussed earlier is S0(m;w)/

S0(w;w), i.e., the ratio of derivatives at x 5 m and x 5 w.

For the Logicle curves illustrated in Figure 3 with M 5

4.5 decades, the effective dynamic ranges are 4.2, 3.5, 2.8,

and 2.1 decades for width values W 5 0.0, 0.5, 1.0, and

1.5, respectively. (The dynamic range of the logarithmic

plot with comparable scaling in the upper range would be

4.5 decades.)

Illustrations and Interpretation of Logicle Displays

Figure 5 shows a comparison of logarithmic and Logicle

displays of four signal level distributions that have differ-

ent means but the same real width of about 2,000 signal

level units. Note that the two higher level curves look

essentially the same in the two displays, since they occur

at signal levels where the Logicle scale is nearly logarith-

mic. However, the lowest curve is shown very differently

in the two graphs. In the Logicle plot, the mean data value

occurs at the visual center of the peak and very few data

events (less than 1%) fall at the low edge of the scale. In

contrast, the logarithmic display for this data set fails to

convey an accurate view of the data, in that the mean of

the data appears in a highly counter-intuitive location far

from the apparent peak of the plot. Also, of course, 49%

of very low and negative data values are piled up in an

uninterpretable spike at the left edge of the display. This

kind of behavior constitutes what may be referred to as a

‘‘log artifact’’ or, more colorfully, the ‘‘valley of death.’’ The

second curve from the bottom is intermediate in that it is

well represented in the Logicle display, but shows a mod-

erate amount of ‘‘log artifact’’ in the logarithmic display.

Figure 6 illustrates the value of Logicle displays for intui-

tive and accurate interpretation of fluorescence compen-

sated data and their particular value in the analysis of data

acquired in high resolution linear data systems. A mixture

FIG. 4. Logicle, linear, and exponential scaling functions. The same

Logicle function, shown in the W 5 1.0 panel of Figure, 3 is presented

with a linear signal level scale. To visualize the relationships between the

different functions clearly, the signal level is shown only from 2100 to

1300, and the display range is shown only from 0 to 3. [Color figure can

be viewed in the online issue, which is available at www.interscience.

wiley.com.]

547

‘‘LOGICLE’’ DATA DISPLAY METHOD

Cytometry Part A DOI 10.1002/cyto.a

Page 8

of unlabeled microspheres and antibody capture micro-

spheres (BD Biosciences) loaded with FITC antibody were

analyzed on a FACSAria cytometer (BD Biosciences),

which produces floating point data with values up to 218

or 262,144 and may include (background subtracted) data

values below zero. In the upper panels, uncompensated

data is shown in Logicle, 4-decade log, 5.5 decade log

pseudocolor dot display, and 5.5 decade log contour dis-

play. Computed compensation based partly on this sample

itself leads to the matching compensated data set shown

in the lower panels. In the uncompensated Logicle dis-

play, the population of unlabeled particles forms a com-

pact two-dimensional peak centered near zero and in-

cludes some negative values for events whose measured

signal was below the average background. The 4-decade

log display piles up all data values below 26 (5 262,144/

10,000) at 26, and the 5.5-decade displays pile up zero

and negative data values at 1. The arrow in the 5.5 decade

log pseudocolor dot display points out the distracting but

otherwise harmless ‘‘picket fencing’’ in the low region,

where display pixels are denser than actual data values.

In Logicle displays of compensation control samples, it

is easy to confirm that compensation is correct. The com-

pensated Logicle display (lower left) shows clearly that, as

expected for an FITC compensation control, the centers

of the distributions for unlabeled particles and FITC-

labeled particles match at a value near zero in the <PE-A>

dimension. It is obvious that the FITC high population has

greater spread in the <PE-A> dimension (as would be ex-

pected from the discussion earlier under Statistical Uncer-

tainties in Dye Estimates) and that the threshold amount

of real PE needed for identification of PE positive events

would be greater on the FITC high population than on the

FITC negative population. In the logarithmic displays of

the compensated data, the apparent center of the FITC

labeled population looks higher in the <PE-A> dimension

than does the center of the unlabeled population. This is

another manifestation of the ‘‘log artifact.’’ In fact, the PE

dimension medians of the two populations are equal.

Adjusting compensation by eye using logarithmic displays

is unlikely to lead to correct compensation.

An appropriate selection of the Logicle width parameter

assures that almost all data events will be displayed on

scale. Less than 1% of the events in the compensated Logi-

cle display in Figure 6 fall on the baselines. In the logarith-

mic displays, 45–80% of the events in the two populations

fall on the baselines where their frequencies and actual

measurement values cannot be interpreted visually. The

events piled up on the low margins of the logarithmic

color dot plots are almost invisible, while the pileup con-

tours on the margins of the logarithmic contour plot make

it look like that there may be separate populations there.

DISCUSSION

Additional Benefits of Logicle Methods

Full range Logicle displays of fluorescence compensated

data are useful in detecting errors, in avoiding erroneous

interpretations and in monitoring for quality control pur-

poses. Since negative data values should be generated

FIG. 5. Logarithmic and Logicle presentations of four data distributions whose means are different but whose real widths are the same. The distributions

were generated by applying different ‘‘compensation’’ amounts to a distribution for single test particles. The highest curve at about 30,000 units represents

zero compensation. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]

548

PARKS ET AL.

Cytometry Part A DOI 10.1002/cyto.a

Page 9

purely by statistical processes producing more-or-less nor-

mal distributions, data distributions in the negative zone

should reflect this and not include peaks or other addi-

tional structure. Any such structure points to a problem in

the data itself or in the data processing which should be

corrected before proceeding with the analysis. In particu-

lar, errors in defining the compensation matrix or apply-

ing the wrong matrix for the data will frequently produce

clear visual artifacts in the negative data range.

Logicle coding could provide a compact way to store

and transfer high dynamic range data of the types appro-

priate for Logicle display while retaining appropriate reso-

lution over the whole data range. For example, recent

instruments from BD Biosciences produce data values

from 218down through zero to negative values, presented

as 32 bit real numbers. Logicle coding at 10–12 bits could

retain all the relevant resolution in most data acquired on

such instruments.

Alternative Approaches and Proposals

As described later, several possible data display methods

and functions occurred to us or have been suggested by

the work of others. In the course of investigating these

and evaluating their limitations, we framed the list of crite-

ria presented at the beginning of the Results section that

led us to define Logicle functions. None of the other meth-

ods fulfills these criteria, and no proposal for alternative

displays that we are aware of has adequately addressed

the issue of how to choose the scale parameter(s) opti-

mally to match particular data. Bagwell (4) discusses the

factors involved in making adequate choices among his

Hyperlog functions, but recommends generic scale choice

rather than optimization to particular data.

We considered the method of adding a constant to all

data values, thus making all or nearly all of the negative

values positive and then taking the logarithm, but found

that, while it mitigates the distortions of populations with

high variance and small mean that occur in logarithmic

displays, it still produces the ‘‘log artifact.’’ It also does not

have good linearity in the near zero region.

Another fairly obvious approach is to simply pick a tran-

sition point and use the logarithm for higher data values

and a linear scale for smaller values. If the splice is made

so that the resulting function is smooth, i.e., continuous

in the first derivative so as to minimize distortion of distri-

butions at this boundary, then the function is completely

determined by the choice of splice point. We found that

the derivative matching requirement in a linear-log splice

leads to functions with too little flexibility or adjustability

to meet the criteria we set. Splice functions that do not

match at least the first derivative at the splice would tend

to generate significant artifacts in the display.

One application area where functions close to linear

around zero and logarithmic for high data values have

been developed is in coding and compression of audio sig-

nals where the process is called ‘‘companding’’ (5). Such

audio is, of course, bipolar, so negative values must be

FIG. 6. Comparison of Logicle, 4-decade log, and 5.5-decade log displays for uncompensated and compensated versions of a single stain compensation control

sample. The sample consists of a mixture of unlabeled microspheres and reagent capture microspheres (BD Biosciences) loaded with FITC antibody. The arrow

in the third upper panel points out ‘‘picket fencing’’ in the 5.5-decade log display. The 80% and 64% on baselines designation includes events on the lower part

of either the horizontal or vertical axis. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]

549

‘‘LOGICLE’’ DATA DISPLAY METHOD

Cytometry Part A DOI 10.1002/cyto.a

Page 10

handled, and human hearing has a more-or-less logarith-

mic response to high signal values, so recording such

values to high resolution is not important. There are two

versions in use. The American one is the same as the offset

log described earlier. The European version uses the log-

linear splice approach, which is also discussed earlier.

These techniques, as defined, are not flexible enough to

deal adequately with flow cytometry data.

Hyperbolic sine functions with zero offset and scale

adjustment, but without the generalization and constraints

that yield the Logicle formulation, have been used to pro-

vide a variance stabilizing transformation for microarray

expression level data (6,7,8). This transformation (called

‘‘glog’’ by Munson) was found to be useful in making valid

tests for significance of gene expression changes, but it

was not investigated as a method for data display and vis-

ual interpretation. When used for data display, this trans-

formation behaves similarly to the log-linear splice, and

does not provide the kind of adjustment needed to opti-

mize visual interpretability.

We also investigated and rejected an approach that com-

bines the linear and logarithmic properties by adding to-

gether a linear function, an exponential function and a

constant, and then using the inverse function as a scale.

This functional form has subsequently been promoted by

Bagwell using the name Hyperlog (4). In regions where

the exponential term is large, the linear term is essentially

irrelevant and, conversely, when the exponential term is

small, the linear term dominates. This turns out to closely

approximate the behavior of the Logicle functions, and a

version similar to any given Logicle function can be

obtained by replacing the e2xterm in the Logicle function

in Eq. (3) with a truncated power series expansion. The

expansion is given as follows:

e?x¼ 1 ? x þ x2=2! ? x3=3! þ ...;

using just the 1 2 x terms, we replace e2(x2w)/pwith 1 2

(x 2 w)/p in Eq. 3 and obtain

S1ðx;wÞ ¼ Te?ðm?wÞ

3 ðex?w? p2ð1 ? ðx ? wÞ=pÞ þ p2? 1Þ

S1ðx;wÞ ¼ Te?ðm?wÞðex?wþ pðx ? wÞ ? 1Þ

for x ? w

or

ð8Þ

In Bagwell’s presentation (4), the corresponding equation

(unnumbered) uses b for p and y for x 2 w. Figure 7 com-

pares this function with the corresponding exponential,

linear, and Logicle functions. At x 5 w, it has the same

data value of zero and the same slope as the correspond-

ing Logicle function. However, it does not fulfill our criter-

ion that the second derivative should be zero at the data

zero, so that near zero it departs from linearity more

quickly than does the corresponding Logicle function.

Also, at the high end, it approaches true log more slowly

than the corresponding Logicle function. Therefore, we

did not consider it further.

CONCLUSIONS

We have reached several conclusions regarding the

effects of Logicle display on the quality of data interpreta-

tion and accuracy of statistical results:

1. Logicle display per se has no effect on statistical re-

sults, since these are computed on the underlying

data—not on the position of displayed events in

plots.

2. Similarly, use of Logicle displays cannot change the

overlap(orlackthereof)ofdifferentcellpopulations.

3. In many cases, use of Logicle transformation will

improve the validity of statistical results compared

to data analysis software which truncates low and

negative values outside displayed log or linear scale

ranges and, therefore, cannot compute correct sta-

tistics for populations including such data values.

4. Logicle displays help to confirm correct compensa-

tion in that the visual centers of positive and nega-

tive populations in single stain compensation con-

trols line up when compensation is correct. This is

not true in logarithmic displays.

5. Logicle displays may lead to better selection of

population boundaries (gates), and therefore im-

prove validity of results. Logarithmic displays dis-

tort broad, low-mean populations, to give a peak

above the true center of the distribution and pileup

of low to negative events at the scale minimum

(baseline). This can lead to improper, or at least

suboptimal, gate boundary selection. Logicle dis-

FIG. 7. Detailed comparison of Logicle and exponential-plus-linear dis-

play functions. The Logicle, exponential, and linear curves in this plot are

drawn from the same data shown in Figure 4, but only the display scale

range from 1 to 3 is shown here. The exponential 1 linear function was

chosen to match the Logicle function as well as possible, near zero and at

high values. The scale expansion allows the small differences between

the Logicle and Exp 1 Lin curves to be seen clearly. Note that both the

Logicle and exponential 1 linear functions have the same slope as the lin-

ear line at display scale 5 1 (signal level zero) and that the Logicle curve

stays somewhat closer to both the linear and log curves. Bagwell’s hyper-

log display function (4) is equivalent to the exponential 1 linear formula-

tion illustrated here. [Color figure can be viewed in the online issue,

which is available at www.interscience.wiley.com.]

550

PARKS ET AL.

Cytometry Part A DOI 10.1002/cyto.a

Page 11

plays avoid this tendency by being nearly linear in

the region near zero.

6. Since Logicle scales go smoothly from linear to loga-

rithmic, they do not introduce artifacts that might

obscure real distinctions between populations or

give the impression of population distinctions that

are not real.

7. Logical transformed data is quite likely to be more

suitable than plain log or linear scaling for auto-

mated analysis, such as peak finding and cluster

analysis, since local distortions and edge pileups are

avoided or at least minimized.

8. The methods described here for automatically

selecting the Logicle width parameter to match par-

ticular data generally work well, but further work is

needed in this area to provide more flexible user

control of the transformation.

The Logicle scaling functions and Logicle display meth-

ods provide visualizations of flow cytometric data that are

readily interpreted by viewers and convey full and accu-

rate information regarding the underlying distributions of

the data and patterns of expression.

ACKNOWLEDGMENTS

The authors acknowledge the help of Adam Treister of

Tree Star Inc., Richard R. Hardy of the Fox Chase Cancer

Center, Martin Bigos of the Gladstone Institute for Virol-

ogy at the University of California, San Francisco and

Joseph Trotter of BD Biosciences. They also thank John

Mantovani and Leonore A. Herzenberg of Stanford Univer-

sity for help in producing the figures and the manuscript.

LITERATURE CITED

1. Parks DR, Bigos M. Analysis of flow cytometry data. In: Herzenberg L,

Blackwell C, Weir D, editors. The Handbook of Experimental Immunol-

ogy, 5th ed. Boston: Blackwell; 1996.

2. Loken MR, Parks DR, Herzenberg LA. Two-color immunofluorescence

using a fluorescence-activated cell sorter. J Histochem Cytochem 1977;

25:899–907.

3. Roederer M. Spectral compensation for flow cytometry: Visualiza-

tion artifacts, limitations, and caveats. Cytometry 2001;45:194–

205.

4. Bagwell CB. Hyperlog—a flexible log-like transform for negative, zero,

and positive valued data. Cytometry A, 2005;64:34–42.

5. Smith SW. The Scientist and Engineer’s Guide to Digital Signal Process-

ing, 2nd ed. San Diego, CA: California Technical Publishing; 1999. Chap-

ter 22, p 362–364.

6. Durbin BP, Hardin JS, Hawkins DM, Rocke DM. A variance-stabilizing

transformation for gene-expression microarray data. Bioinformatics 2002;

18 (Suppl. 1):S105–S110.

7. Huber W, von Heydebreck A, Sultmann H, Poustka A, Vingron M. Var-

iance stabilization applied to microarray data calibration and to the quan-

tification of differential expression. Bioinformatics 2002;18 (Suppl. 1):

S96–S104.

8. Munson P. J. ‘‘Consistency’’ Test for Determining the Significance of

Gene Expression Changes on Replicate Samples and Two Convenient

Variance-Stabilizing Transformations. Genelogic Workshop on Low

Level Analysis of Affymetrix Genechip data. Bethesda, MD; 2001.

551

‘‘LOGICLE’’ DATA DISPLAY METHOD

Cytometry Part A DOI 10.1002/cyto.a