
A New ‘‘Logicle’’ Display Method Avoids

Deceptive Effects of Logarithmic Scaling for Low

Signals and Compensated Data

David R. Parks,1* Mario Roederer,2 and Wayne A. Moore1

1Department of Genetics, Stanford University, Stanford, California 94305

2Vaccine Research Center, National Institutes of Health, Bethesda, Maryland 20892

Received 26 September 2005; Revision Received 18 January 2006; Accepted 20 January 2006

Background: In immunofluorescence measurements and most other flow cytometry applications, fluorescence signals of interest can range down to essentially zero. After fluorescence compensation, some cell populations will have low means and include events with negative data values. Logarithmic presentation has been very useful in providing informative displays of wide-ranging flow cytometry data, but it fails to adequately display cell populations with low means and high variances and, in particular, offers no way to include negative data values. This has led to a great deal of difficulty in interpreting and understanding flow cytometry data, has often resulted in incorrect delineation of cell populations, and has led many people to question the correctness of compensation computations that were, in fact, correct.

Results: We identified a set of criteria for creating data visualization methods that accommodate the scaling difficulties presented by flow cytometry data. On the basis of these, we developed a new data visualization method that provides important advantages over linear or logarithmic scaling for display of flow cytometry data, a scaling we refer to as ‘‘Logicle’’ scaling. Logicle functions represent a particular generalization of the hyperbolic sine function with one more adjustable parameter than linear or logarithmic functions. Finally, we developed methods for objectively and automatically selecting an appropriate value for this parameter.

Conclusions: The Logicle display method provides more

complete, appropriate, and readily interpretable represen-

tations of data that includes populations with low-to-zero

means, including distributions resulting from fluorescence

compensation procedures, than can be produced using ei-

ther logarithmic or linear displays. The method includes a

specific algorithm for evaluating actual data distributions

and deriving parameters of the Logicle scaling function

appropriate for optimal display of that data. It is critical to

note that Logicle visualization does not change the data

values or the descriptive statistics computed from them.

© 2006 International Society for Analytical Cytology

Key terms: data display; flow cytometry; fluorescence

compensation; data scaling; data transformation

Practical experience has demonstrated that marker dis-

tributions measured by flow cytometry are often more-or-

less log-normal or are composed of mixtures of log-normal

distributions. Logarithmic data scales, which show log-

normal distributions as symmetrical peaks, are widely

used and accepted as those facilitating analysis of fluores-

cence measurements in biological systems (1).

On the other hand, cell populations with low mean,

high variance, and approximately normally distributed flu-

orescence values occur commonly in various kinds of flow

cytometry data. In particular, data values for cell popula-

tions that are essentially unstained or are negative for a

particular dye, after fluorescence compensation, should

be distributed more-or-less normally around a low value

representing the autofluorescence of the cells in that data

dimension. Data sets resulting from computed compensa-

tion commonly (and properly) include populations whose

distributions extend below zero. (When analog compensa-

tion is used, such distributions should also appear, but the

electronic implementations distort or truncate the distri-

butions, so that negative values are suppressed.)

Logarithmic displays, however, cannot accommodate

zero or negative values and often show a peak above the

actual mean or median of the population with a pileup of

events on the baseline (see Fig. 1). This effect has been

the source of considerable confusion and has been com-

monly referred to as the ‘‘log artifact.’’ Linear scaling is

more appropriate and more easily interpreted for display

*Correspondence to: David R. Parks, Stanford University, Beckman Center B007, Stanford, CA 94305-5318, USA.
E-mail: drparks@stanford.edu
Published online 7 April 2006 in Wiley InterScience (www.interscience.wiley.com).
DOI: 10.1002/cyto.20258
© 2006 International Society for Analytical Cytology. Cytometry Part A 69A:541–551 (2006)


of fluorescence compensated data on cell populations that

are low to negative for a particular dye.

Thus, there is a need for display scales that combine the

desirable attributes of the log scale for large real signals with

those of the linear scale for unstained and near-background

signals. The Logicle method presented here solves this problem by plotting data on axes that are asymptotically linear in the region around a data value of zero and asymptotically logarithmic at higher (positive and negative) values.

Figure 1 illustrates the utility of Logicle scaling in facili-

tating accurate interpretation of flow cytometry data. The

Logicle displays in the left panels show well-defined cell

populations in the gated regions. There are very few

events (well under 1%) on the baselines, and the medians

of events in each region (marked with crosses) are appro-

priately central to the visual data. In contrast, logarithmic

presentation of the same data sets (center and right

panels) makes the actually compact cell populations look

split into above-baseline and on-baseline ‘‘populations.’’ In

each data set, about 45% of the data events are on the

baselines (and in the dot displays almost invisible). The

medians of the populations are nowhere near their visual

centers. Logarithmic scaling, therefore, produces unintui-

tive data displays, and can lead to incorrect data evalua-

tions and attempts to define separate populations that are

not in reality separate. Additional benefits of Logicle data

display are discussed later in the Results section.

Background on Multicolor Fluorescence and

Compensation

In a flow cytometer, each fluorescence detector accepts

light from a particular laser excitation and in a particular

range of emission wavelengths optimized to detect a parti-

cular dye. However, each dye whose excitation is nonzero

at that laser wavelength and whose emission is not zero in

the detector’s emission band will contribute signal on that

detector. Therefore, although fluorescent dye combina-

tions used in flow cytometry are selected to minimize

spectral overlaps in multicolor measurements, each dye

will typically contribute signal on several detectors, and

each detector will receive some signal from several dyes.

For each cell in a biological analysis, we generally want

to separate the signal contributions from the different

dyes, so that an estimate of the amount of each fluorescent

FIG. 1. Two samples of mouse spleen cells stained for CD5 and IgD and gated in light scatter and other fluorescent markers for viable lymphocytes (upper

panels) or viable non-T-cells (lower panels). The measurements were made on a FACSAria and compensated and analyzed in FlowJo. The arrow in the

upper–center logarithmic display panel points out a data artifact resulting from applying the compensation calculation to log data, in which low or negative

data values have been truncated. This does not occur when the full data is retained.


PARKS ET AL.

Cytometry Part A DOI 10.1002/cyto.a


reagent is obtained. The process of converting from fluo-

rescence color measurements to dye estimates is com-

monly called fluorescence compensation. Although the

technique was originally developed for analysis of two-

color single laser measurements (2), it is particularly criti-

cal in multicolor work. By evaluating the response of each

of the detectors to a series of compensation control sam-

ples, each of which is labeled with only one dye, we con-

struct a matrix of relative spectral overlaps. For each cell,

we multiply a set of detector color measurements by the

inverse of the spectral overlap matrix to obtain the corre-

sponding set of dye estimates for the cell. This calculation

is based on simple linear algebra, so any particular set of

color measurement values yields a specific set of dye esti-

mates. The estimated dye amounts are exactly those

whose total signal on each detector would yield the color

measurements actually observed.
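The compensation calculation described above can be sketched for a hypothetical two-dye, two-detector case. The overlap fractions, signal values, and function names below are assumptions chosen for illustration; they are not taken from the measurements in this paper.

```python
# Hypothetical two-dye, two-detector example of fluorescence compensation.
# All numbers are illustrative assumptions, not measured values.

def invert_2x2(m):
    """Return the inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("spectral overlap matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_vec(m, v):
    """Multiply a matrix (nested lists) by a column vector (list)."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

# overlap[i][j]: signal on detector i per unit of dye j (assumed values).
overlap = [[1.00, 0.10],
           [0.25, 1.00]]

def compensate(detector_signals):
    """Convert detector color measurements into dye estimates."""
    return mat_vec(invert_2x2(overlap), detector_signals)

true_dyes = [1000.0, 0.0]                # cell stained with dye 0 only
measured = mat_vec(overlap, true_dyes)   # noise-free detector signals
estimates = compensate(measured)         # recovers the dye amounts

# If the second detector reads slightly low (for example, from counting
# noise), the estimate for the absent dye comes out negative.
noisy = compensate([1000.0, 240.0])
print(estimates, noisy)
```

With noise-free inputs the dye amounts are recovered; the second call shows how a small measurement deviation on one detector yields a negative estimate for a dye that is not present.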

Statistical Uncertainties in Dye Estimates

As is so often the case, this algebraic analysis is not com-

plete in the real world. The fundamental deviation comes

from the quantum nature of light and the finite amount of

light detected. Thus, the detected signal is subject to what

is commonly called counting statistics, governed by the

Poisson distribution. In practice, the limiting step is the

number of photoelectrons emitted at the cathode of the

photomultiplier tube. The standard deviation of actual

measurements in relation to their theoretical expectation

scales with the square root of the number of photoelec-

trons detected.

For cells with just autofluorescence or very low dye

levels, the effects of photon statistics, possible electronic

noise, and real differences in low-level fluorescence among

cells in a particular population often result in signal distri-

butions with low means and high relative variances. This

problem becomes magnified after fluorescence compensa-

tion, since the compensated value is subject to error contri-

butions from multiple measurements. This can readily lead

to a standard deviation of the dye estimate which is greater

than the mean for that estimate. This phenomenon has

been discussed and illustrated by Roederer (3). The end

result is that distributions of compensated dye estimates for

cells that are unstained by a given dye are often nearly nor-

mal and centered near zero, and may have large variances

compared to the corresponding distributions for totally

unstained cells. In particular, this process can properly

result in negative dye estimates for some cells even though,

of course, negative dye amounts are not possible. These

negative values must not be disregarded, since truncating

them will deform the data distributions and result in incor-

rect computation of signal means.
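The effect of discarding negative values can be illustrated with a small simulation. This is a sketch under stated assumptions: the normal distribution, its mean and standard deviation, the event count, and the truncation floor of 1.0 are all invented for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Assumed distribution of compensated dye estimates for unstained cells:
# roughly normal, low mean, large spread (values are illustrative only).
estimates = [random.gauss(20.0, 100.0) for _ in range(20000)]

full_mean = statistics.fmean(estimates)

# Truncation as a log-only display or file format might impose it:
# values at or below the floor are piled onto the floor value.
floor = 1.0
truncated = [max(v, floor) for v in estimates]
truncated_mean = statistics.fmean(truncated)

fraction_on_baseline = sum(v <= floor for v in estimates) / len(estimates)

print(f"mean with negatives kept: {full_mean:.1f}")
print(f"mean after truncation:    {truncated_mean:.1f}")
print(f"fraction on baseline:     {fraction_on_baseline:.2f}")
```

Under these assumptions the truncated mean comes out well above the mean of the full data, and a large fraction of events sits on the baseline.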

The overall result is that cell samples measured by flow

cytometry often contain cell populations whose signal dis-

tributions are appropriately represented in logarithmic

displays along with populations whose distributions can-

not be properly shown in a logarithmic display. Logicle

functions and methods were developed to provide unified

displays in which these different populations can all be

represented in a clear and intuitive way.

MATERIALS AND METHODS

Test Particles and Cell Samples

Spherotech Rainbow multidye particles (Spherotech, Libertyville, IL) were used for the data in Figure 5. Reagent capture beads carrying a monoclonal rat-anti-mouse κ antibody and matched blank beads (both from BD Biosciences, San Jose, CA) were used to produce the data in Figure 6. All of these particles are about 3 μm in diameter.

Cell samples used to generate the illustrations are described in the figure captions.

Instrumentation

Illustrative data were obtained using a FACSAria (BD Biosciences, San Jose, CA), which employs linear digital data acquisition with 14-bit sampling at a 10 MHz rate. Area signals are produced as sums over a range of ~50–100 samples and are presented as 18-bit linear values. Since background subtraction is included in the evaluation, zero and negative data values can occur. These are preserved in the floating point FCS files.

Data Analysis and Modeling

Analysis of data from the flow cytometer was carried

out in FlowJo (Tree Star, Ashland, OR) version 4.3 or later.

This package includes the ability to import floating point

FCS files into Logicle scaling, so that negative data values

are retained. FlowJo provides support for computed fluo-

rescence compensation, including automatic selection of

appropriate Logicle scale functions for each compensated

data dimension. In producing Figure 2, the spectral matri-

ces used in computing fluorescence compensation were

edited externally and imported into FlowJo to generate il-

lustrative data distributions.

Modeling and plotting of Logicle functions and various

other functions for comparisons were carried out in

Microsoft Excel.

RESULTS

Criteria for a New Data Display Method

We developed the following criteria for defining a new

scaling function that would yield better displays for much

flow cytometry data than can be produced using tradi-

tional logarithmic or linear scaling.

• The display formula supports a family of functions that can be optimized for viewing different data sets.
• The function becomes logarithmic for large data values, to ensure a wide dynamic range and to provide good visualizations of the often log-normal distributions at high fluorescence intensities.
• The function becomes linear near zero, extends to negative data values, and is symmetrical around zero, providing near-linear visualization appropriate for linear-normal distributions at low fluorescence intensities.
• The transition between the linear and logarithmic regions is as smooth as possible, to avoid introducing artifacts in the display.


‘‘LOGICLE’’ DATA DISPLAY METHOD


• As the linearization strength is increased to accommodate a wider range of linearized data values, the reasonably linear region of the data values grows faster than the size of the linearized region in the display. Thus, the user has a visual indication that a greater degree of linearization is in use, but the display space is balanced between more linear and more logarithmic regions.

Specification of Logicle Functions

By considering these criteria and examining the behavior

of a number of functions, we concluded that particular gen-

eralizations of the hyperbolic sine function (sinh), which

we came to call Logicle functions, can best meet the crite-

ria. The hyperbolic sine function itself has the desirable

properties of being essentially linear near zero, becoming

exponential for large values (leading to a logarithmic display

scale there), and making a very smooth transition between

these regions (i.e., it is continuous in all derivatives), but it

does not provide enough flexibility to meet the display

needs encountered in flow cytometry.1

The hyperbolic sine function itself is given as follows:

$$\sinh(x) = \frac{e^{x} - e^{-x}}{2} \tag{1}$$

This can be generalized to what we call biexponential

functions,

$$S(x; a, b, c, d, f) = a e^{bx} - c e^{-dx} + f \tag{2}$$

Interpreting the condition of maximal linearity around data value zero to mean that the second derivative of the function should be zero there, we identified a subset of biexponential functions with this property and call them Logicle scaling functions.
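A minimal sketch of the biexponential form in Eq. (2) and the zero-curvature condition follows. For simplicity it places the data zero at x = 0, which is an assumption of this sketch; the parameter values are chosen only for illustration.

```python
import math

def biexponential(x, a, b, c, d, f):
    """Eq. (2): S(x; a, b, c, d, f) = a*exp(b*x) - c*exp(-d*x) + f."""
    return a * math.exp(b * x) - c * math.exp(-d * x) + f

def second_derivative(x, a, b, c, d):
    """Analytic second derivative of Eq. (2); the constant f drops out."""
    return a * b * b * math.exp(b * x) - c * d * d * math.exp(-d * x)

# sinh is the special case a = c = 1/2, b = d = 1, f = 0.
for x in (-2.0, 0.0, 0.5, 3.0):
    difference = biexponential(x, 0.5, 1.0, 0.5, 1.0, 0.0) - math.sinh(x)
    print(f"x = {x:5.2f}  biexponential - sinh = {difference:.2e}")

# Zero curvature at x = 0 requires a*b**2 == c*d**2.
a, b, d = 1.0, 1.0, 0.25      # assumed values
c = a * b * b / (d * d)       # chosen to satisfy the constraint
f = c - a                     # makes S(0) = 0, so data zero sits at x = 0

print(second_derivative(0.0, a, b, c, d))   # curvature at the data zero
print(biexponential(0.0, a, b, c, d, f))    # data value at x = 0
```

The constraint removes one degree of freedom from Eq. (2), which is why four further choices suffice to fix the five parameters.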

Besides the constraint just specified, there are four fur-

ther choices that need to be made to fix the five parame-

ters in Eq. (2) (a, b, c, d, and f), and thereby define a speci-

fic display. How these choices appear in an actual Logicle

display is illustrated in Figure 2. The parameters described

later and in Figure 2 are not simply a, b, c, d, and f, but,

once specified, they uniquely determine the function in

Eq. (2). The first choice is the maximum data value in the

displayed scale (T). The second is the range of the display

in relation to the width of high data value decades (M or

m in decade or natural log formulations, respectively). If

this is held constant among plots optimized to different

data sets, the nearly logarithmic area at the upper end of

each display will be essentially the same, while the region

near data zero is adjusted to optimize for different data

sets. We have found that a total plot width of 4.5 ‘‘dec-

ades’’ is usually a good choice for displaying flow cytome-

try data.

The third choice is the strength and range of linearization around zero (W or w). The linear slope at zero (in, for

example, data units per pixel or data units per mm in a

printout) and the range of data values in the nearly linear

zone are determined by this selection. In displaying a par-

ticular data set, the linearized range must be adequate to

cover broad population distributions that do not display

well on log scales. This, in particular, is the selection that

is critical in matching displays to particular data sets and

in ensuring that the linearized zone covers the range of

statistical spread in the data. If the transition toward log behavior occurs at data values that are too low, the artifacts seen in logarithmic displays will not be suppressed.

The fourth choice is to specify the range of negative

values to be included in the display (which also defines

the position of the data zero in the plot). This range must

be great enough to avoid truncating populations of inter-

est. In practice, as shown in Figure 2, we find that it is de-

sirable to link the third and fourth choices as a single

1NOTE: In cytometry, we normally label ‘‘logarithmic’’ axes with values

from the corresponding exponential function, rather than with the loga-

rithm itself, e.g., decade labels like 10, 100, 1,000 and not 1, 2, and 3. The

Logicle functions defined in the equations given later are data value func-

tions. Their inverses provide Logicle display functions in the same way

that exponential scaling functions provide logarithmic data displays.

FIG. 2. How the Logicle parameters relate to the resulting Logicle scale and data display. M and W are expressed in decades, i.e., base 10 log units. Their natural log forms are m = M ln(10) and w = W ln(10). The data curves are only for illustration here but are used and described in Figure 5. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]


value. This assures that the lowest negative data values in

view correspond to the approximate edge of the linear-

ized zone. As discussed earlier under Statistical Uncertain-

ties, negative values should occur only as a result of statis-

tical spreading, and, therefore, they should be displayed

within the near-linear zone.

Assuming that the top-of-scale value and the nominal

‘‘decade’’ width of the display have been selected, linking

the third and fourth choices results in a family of functions

with only one parameter to be adjusted to match the parti-

cular data set being displayed.

Using natural log units, an expression for the Logicle

scaling function that embodies all of the constraints and

choices described earlier is given as follows:

$$S(x; w) = T e^{-(m-w)} \left( e^{x-w} - p^{2} e^{-(x-w)/p} + p^{2} - 1 \right) \quad \text{for } x \ge w \tag{3}$$

In Eq. (3), T is the top of scale data value (e.g., 10,000 for

common 4 decade data or 262,144 for an 18 bit data

range).

w = 2p ln(p)/(p + 1) is the width of the negative data range and the range of linearized data in natural log units. p is introduced for compactness in presenting the Logicle function, but p and w together represent a single adjustable parameter.

m is the breadth of the display in natural log units. For a 4.5 decade display range, m = 4.5 ln(10) = 10.36.

The display is defined for x in the range from 0 to m. Negative data values appear in the space from x = 0 to x = w, and positive data values are plotted between x = w and x = m (where the top data value T occurs). The form shown as Eq. (3) is for the positive data zone, where x ≥ w. For the negative zone where x < w, we enforce symmetry by computing the Logicle function for the corresponding positive value (w − x) and changing the sign. The data zero at x = w is where the second derivative is zero, i.e., the most linear area.
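Eq. (3) and the symmetry rule can be sketched in code as follows. This is an illustrative reading, not the FlowJo implementation: p is obtained from w by bisection (a choice made for this sketch), the negative zone is handled by mirroring the display position about x = w, and the function names and the example width of half a decade are assumptions.

```python
import math

def solve_p(w):
    """Solve w = 2*p*ln(p)/(p + 1) for p >= 1 by bisection."""
    if w < 0:
        raise ValueError("w must be non-negative")

    def g(p):
        return 2.0 * p * math.log(p) / (p + 1.0)

    lo, hi = 1.0, 2.0
    while g(hi) < w:          # g is increasing for p >= 1
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < w:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def logicle_value(x, T, m, w):
    """Data value at display position x (natural log units), per Eq. (3)."""
    if x < w:
        # Negative zone: mirror about the data zero at x = w, flip the sign.
        return -logicle_value(2.0 * w - x, T, m, w)
    p = solve_p(w)
    u = x - w
    return T * math.exp(-(m - w)) * (
        math.exp(u) - p * p * math.exp(-u / p) + p * p - 1.0)

T = 262144.0                    # top of scale for an 18-bit data range
m = 4.5 * math.log(10.0)        # 4.5 decade display
w = 0.5 * math.log(10.0)        # assumed width of half a decade

for x in (0.0, w, 0.5 * (w + m), m):
    print(f"x = {x:6.3f}  ->  data value {logicle_value(x, T, m, w):12.2f}")
```

In this sketch the data value is zero at x = w and comes out close to T at x = m.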

To select an appropriate value for w to generate a good

display for a particular data set, we obtain a reference value

marking the low end of the distribution to be displayed. As

described later, we typically select the data value at the fifth

percentile of all events that are below zero as this reference

value. Designating this (negative) value as ‘‘r,’’ and using its

absolute value abs(r), w is computed as follows:

$$w = \frac{m - \ln\left(T / \mathrm{abs}(r)\right)}{2} \tag{4}$$
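The selection of the reference value and Eq. (4) can be sketched as below. The nearest-rank percentile convention, the handling of data with no negative events, and the clamping of w at zero are assumptions of this sketch rather than details given in the text; the sample data are invented.

```python
import math

def fifth_percentile_of_negatives(values):
    """Reference value r: fifth percentile of the events below zero."""
    negatives = sorted(v for v in values if v < 0)
    if not negatives:
        return None                      # assumed: no negative events
    # Nearest-rank percentile using integer arithmetic: ceil(5% of count).
    rank = max(1, (5 * len(negatives) + 99) // 100)
    return negatives[rank - 1]

def width_natural(T, m, r):
    """Eq. (4): w = (m - ln(T / abs(r))) / 2, in natural log units."""
    if r is None:
        return 0.0                       # assumed fallback: no linearization
    return max(0.0, (m - math.log(T / abs(r))) / 2.0)

# Invented data: 100 negative events (-100 .. -1), zero, and 100 positives.
values = list(range(-100, 101))
T = 262144.0
m = 4.5 * math.log(10.0)

r = fifth_percentile_of_negatives(values)
w = width_natural(T, m, r)
print(r, w)
```

Using only the negative events keeps the estimate independent of whatever positive staining is present in the same data dimension.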

Equations (3) and (4) can be rewritten using base 10 rep-

resentation in order to express the parameters in terms of

‘‘decades’’ of signal level or display:

$$S(X; W) = T \cdot 10^{-(M-W)} \left( 10^{X-W} - p^{2} \cdot 10^{-(X-W)/p} + p^{2} - 1 \right) \quad \text{for } X \ge W \tag{5}$$

In Eq. (5), W = 2p log(p)/(p + 1) is the width of the negative data range and the range of linearized data in ‘‘decades’’ and M is the breadth of the display in ‘‘decades.’’ For a 4.5 decade display range, M = 4.5.

We obtain W from the negative range reference value

‘‘r’’ as follows:

$$W = \frac{M - \log\left(T / \mathrm{abs}(r)\right)}{2} \tag{6}$$
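As a worked example of Eqs. (4) and (6), the following sketch computes the width in both unit systems. The reference value r = −100 is an assumed number for illustration; T and M are the 18-bit top of scale and 4.5 decade breadth mentioned in the text.

```python
import math

def width_decades(T, M, r):
    """Eq. (6): W = (M - log10(T / abs(r))) / 2, in decades."""
    return (M - math.log10(T / abs(r))) / 2.0

def width_natural(T, m, r):
    """Eq. (4): w = (m - ln(T / abs(r))) / 2, in natural log units."""
    return (m - math.log(T / abs(r))) / 2.0

T = 262144.0      # top of scale for an 18-bit data range
M = 4.5           # display breadth in decades
r = -100.0        # assumed reference value, for illustration only

W = width_decades(T, M, r)
w = width_natural(T, M * math.log(10.0), r)

# The two forms differ only by the factor ln(10): w = W * ln(10).
print(f"W = {W:.3f} decades, w = {w:.3f} natural log units")
```

For these inputs W is about 0.54 decades, so a little over half a decade of the 4.5 decade display is given to negative data values.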

Figure 2 illustrates the relationship between these parame-

ters and the resulting Logicle display.

Specifying a logarithmic display requires two values cor-

responding to T and M, and the scaling near the upper

end of a Logicle plot approximates that of a logarithmic

display with the same values of T and M. The additional

linearization width, W, adapts the Logicle scale to the char-

acteristics of different data sets.

Logicle functions with different values of W are plotted

in Figure 3 along with the linear and exponential func-

tions that match them around data zero and at high data

values, respectively. Use of an exponential function for

scaling is what results in a logarithmic scale. Note that

each Logicle curve closely follows its matched linear func-

tion at low signal values, confirming good linearity in the

region around data zero. At middle signal values that vary

depending on the value of W, the Logicle functions depart

from linearity and move smoothly toward the exponential

line. At high signal levels, the Logicle curves become indis-

tinguishable from the exponential line.
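The behavior described for Figure 3 can be checked numerically. In this sketch the matched linear function is taken to be the tangent of Eq. (5) at X = W and the matched exponential is taken to be T·10^(X−M); both forms are derived here from Eq. (5) and are assumptions about how the plotted comparison curves were defined. Solving for p by bisection is likewise a choice made for this sketch.

```python
import math

def solve_p(W):
    """Solve W = 2*p*log10(p)/(p + 1) for p >= 1 by bisection."""
    def g(p):
        return 2.0 * p * math.log10(p) / (p + 1.0)

    lo, hi = 1.0, 2.0
    while g(hi) < W:          # g is increasing for p >= 1
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < W:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def logicle(X, T, M, W, p):
    """Eq. (5) for X >= W (positive data zone), in decades."""
    u = X - W
    return T * 10.0 ** (-(M - W)) * (
        10.0 ** u - p * p * 10.0 ** (-u / p) + p * p - 1.0)

def matched_linear(X, T, M, W, p):
    """Tangent line at the data zero X = W (slope derived from Eq. (5))."""
    slope = T * 10.0 ** (-(M - W)) * math.log(10.0) * (1.0 + p)
    return slope * (X - W)

def matched_exponential(X, T, M):
    """Pure exponential (log-scale) function with the same T and M."""
    return T * 10.0 ** (X - M)

T, M, W = 10000.0, 4.5, 1.0
p = solve_p(W)
for X in (W + 0.05, W + 0.5, M - 1.0, M):
    print(f"X = {X:4.2f}  logicle = {logicle(X, T, M, W, p):10.2f}  "
          f"linear = {matched_linear(X, T, M, W, p):10.2f}  "
          f"exponential = {matched_exponential(X, T, M):10.2f}")
```

Near X = W the Logicle and linear values agree closely, and near X = M the Logicle and exponential values agree closely, with the departure from each occurring in between.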

Figure 4 shows a Logicle curve for W = 1.0 and its

matched linear and exponential curves displayed with a lin-

ear signal level scale. The signal level scale is expanded

(top of scale is 300 rather than 10,000) to show in detail

the matching of the Logicle and linear curves at low signal

levels and the divergence of the Logicle curve at higher

levels and the beginning of its approach to the exponential

curve.

Strategy for Selecting the Width Parameter

As we have discussed, proper estimates of dye signals

using measurements on individual cells may be negative,

but actual negative dye amounts are impossible. There-

fore, any negative values present in the compensated

data must be due to purely statistical effects. This is true

despite the presence of essentially arbitrary positive stain-

ing distributions. Thus, for a population with near zero

mean and significant statistical spread, the most negative

values indicate the necessary range of the negative part

of the scale, and they also indicate the range of lineariza-

tion needed to ensure that the population will be dis-

played in a compact and unimodal form. The positive

part of the population is less helpful, since it may overlap

with other populations in the data set and may not pro-

vide a clear upper end with which to define a suitable

range for linearization.

A simple strategy of choosing the fifth percentile of the

negative data values to set this scale seems to work well

and combines adequate sensitivity to extreme values with

reasonable sampling stability. Using this strategy (the one

currently implemented in FlowJo and illustrated in Figure 2
