Visual analysis of mass cytometry data by hierarchical stochastic neighbour embedding reveals rare cell types

November 2017
Nature Communications 8(1)

DOI:10.1038/s41467-017-01689-9

License
CC BY 4.0

Authors:

Vincent van Unen

Leiden University Medical Centre

Nicola Pezzotti

Delft University of Technology

Na Li

Leiden University Medical Centre

Mass cytometry allows high-resolution dissection of the cellular composition of the immune system. However, the high dimensionality, large size, and non-linear structure of the data pose considerable challenges for the data analysis. In particular, dimensionality reduction-based techniques like t-SNE offer single-cell resolution but are limited in the number of cells that can be analyzed. Here we introduce Hierarchical Stochastic Neighbor Embedding (HSNE) for the analysis of mass cytometry data sets. HSNE constructs a hierarchy of non-linear similarities that can be interactively explored with a stepwise increase in detail up to the single-cell level. We apply HSNE to a study on gastrointestinal disorders and three other available mass cytometry data sets. We find that HSNE efficiently replicates previous observations and identifies rare cell populations that were previously missed due to downsampling. Thus, HSNE removes the scalability limit of conventional t-SNE analysis, a feature that makes it highly suitable for the analysis of massive high-dimensional data sets.

Fig. 4 CD127⁺ ILC and ILC-like subsets identified by Cytosplore+HSNE. Table showing cluster number, distinguishing phenotypic marker expression profiles and biological annotation for the clusters identified in Fig. 3e. Black color indicates clusters described in previous reports and red color additional unknown clusters. Hierarchical clustering of clusters based on marker expression profile shown in the heatmap depicted in Fig. 3f

Fig. 5 Analysis of the CD4⁺ T-cell compartment in inflammatory intestinal diseases. a Third HSNE level embedding of the CD4⁺ T cells (1.4 × 10⁶ cells, selected in Fig. 3). Color and size of landmarks as described in Fig. 3. Right panel shows density features for the level 3 embedding. Blue encirclement indicates selection of landmarks representing CD28⁻CD4⁺ T cells. b Embedding of the CD28⁻CD4⁺ T cells (2.6 × 10⁴ cells) at single-cell resolution. Bottom-left panel shows yellow and black dashed encirclements based on CD56⁻ and CD56⁺ expression, respectively. Three bottom-right panels show cells colored according to: (left) disease status of the subjects (CeD, Crohn, EATLII, RCDII, and controls), (middle) sampling status (annotated subset, discarded by ACCENSE and downsampled) and (right) tissue-of-origin (blood and intestine)

Vincent van Unen¹, Thomas Höllt²,³, Nicola Pezzotti², Na Li¹, Marcel J.T. Reinders⁴, Elmar Eisemann², Frits Koning¹, Anna Vilanova² & Boudewijn P.F. Lelieveldt⁴,⁵

¹Department of Immunohematology and Blood Transfusion, Leiden University Medical Center, Albinusdreef 2, 2333 ZA Leiden, The Netherlands. ²Computer Graphics and Visualization Group, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands. ³Computational Biology Center, Leiden University Medical Center, Albinusdreef 2, 2333 ZA Leiden, The Netherlands. ⁴Pattern Recognition and Bioinformatics Group, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands. ⁵Division of Image Processing, Department of Radiology, Leiden University Medical Center, Albinusdreef 2, 2333 ZA Leiden, The Netherlands. Vincent van Unen, Thomas Höllt and Nicola Pezzotti contributed equally to this work. Frits Koning, Anna Vilanova and Boudewijn P.F. Lelieveldt jointly supervised this work. Correspondence and requests for materials should be addressed to V.v.U. (email: V.van_unen@lumc.nl) or to B.P.F.L. (email: B.P.F.Lelieveldt@lumc.nl)

NATURE COMMUNICATIONS | 8: 1740 | DOI: 10.1038/s41467-017-01689-9 | www.nature.com/naturecommunications

Content courtesy of Springer Nature, terms of use apply. Rights reserved

Mass cytometry (cytometry by time-of-flight; CyTOF) allows the simultaneous analysis of multiple cellular markers (>30) present on biological samples consisting of millions of cells. Computational tools for the analysis of such data sets can be divided into clustering-based and dimensionality reduction-based techniques¹, each having distinctive advantages and disadvantages. The clustering-based techniques, including SPADE², FlowMaps³, Phenograph⁴, VorteX⁵ and Scaffold maps⁶, allow the analysis of data sets consisting of millions of cells but only provide aggregate information on generated cell clusters at the expense of local data structure (i.e., single-cell resolution). Dimensionality reduction-based techniques, such as PCA⁷, t-SNE⁸ (implemented in viSNE⁹), and Diffusion maps¹⁰, do allow analysis at the single-cell level. However, the linear nature of PCA renders it unsuitable to dissect the non-linear relationships in the mass cytometry data, while the non-linear methods (t-SNE⁸ and Diffusion maps¹⁰) do retain local data structure, but are limited by the number of cells that can be analyzed. This limit is imposed by a computational burden but, more importantly, by local neighborhoods becoming too crowded in the high-dimensional space, resulting in overplotting and presenting misleading information in the visualization. In cytometry studies, this poses a problem, as a significant number of cells needs to be removed by random downsampling to make dimensionality reduction computationally feasible and reliable. Future increases in acquisition rate and dimensionality in mass- and flow cytometry are expected to amplify this problem significantly¹¹,¹².

Here we adapted Hierarchical stochastic neighbor embedding (HSNE)¹³, which was recently introduced for the analysis of hyperspectral satellite imaging data, to the analysis of mass cytometry data sets to visually explore millions of cells while avoiding downsampling. HSNE builds a hierarchical representation of the complete data that preserves the non-linear high-dimensional relationships between cells. We implemented HSNE in an integrated single-cell analysis framework called Cytosplore+HSNE. This framework allows interactive exploration of the hierarchy through a set of embeddings: two-dimensional scatter plots in which cells are positioned based on the similarity of all marker expressions simultaneously. The embeddings are also used for subsequent analysis, such as clustering of cells at different levels of the hierarchy. We found that Cytosplore+HSNE replicates the previously identified hierarchy in the immune-system-wide single-cell data⁴,⁵,¹⁴, i.e., we can immediately identify major lineages at the highest overview level, while acquiring more information by dissecting the immune system at the deeper levels of the hierarchy on demand. Additionally, Cytosplore+HSNE does so in a fraction of the time required by other analysis tools. Furthermore, we identified rare cell populations specifically associated with diseases in both the innate and adaptive immune compartments that were previously missed due to downsampling. We highlight scalability and generalizability of Cytosplore+HSNE using three other data sets, consisting of up to 15 million cells. Thus, Cytosplore+HSNE combines the scalability of clustering-based methods with the local single-cell detail preservation of non-linear dimensionality reduction-based methods. Finally, Cytosplore+HSNE is not only applicable to mass cytometry data sets, but can be used for other high-dimensional data such as single-cell transcriptomic data sets.

Fig. 1 Schematic overview of Cytosplore+HSNE for exploring the mass cytometry data. By creating a multi-level hierarchy of an illustrative 3D data set (a), we achieve a clear separation of different cell groups in an overview embedding (left panel b) that conserves non-linear relationships (i.e., follows the distance indicated by the dashed line in a, instead of the grey arrow) and more detail within the separate groups on the data level (right panel b). c Construction and exploration of the hierarchy. The hierarchy is constructed starting with the data level (left two columns). On the basis of the high-dimensional expression patterns of the cells, a weighted kNN graph is constructed, which is used to find representative cells used as landmarks in the next coarser level. By keeping track of the area of influence (AoI) of the landmarks, cells/landmarks can be aggregated without losing the global structure of the underlying data or creating shortcuts. The exploration of the hierarchy is shown in the two rightmost columns. At the bottom, we see the overview level (in this example the 3rd level in the hierarchy), which shows that a group of landmarks has low expression in marker c (bottom-right panel). Selecting this group of landmarks for further exploration results in a look-up of the landmarks in the preceding level (neighborhood graph, intermediate level) that are in the AoI, with which a new embedding can be created at the 2nd level of the hierarchy (middle-right panel). Marker b shows a strong separation between the upper and lower landmarks at this level. Zooming-in on the landmarks with low expression of marker b reveals further separation in marker a at the lowest level, the full data level (top-right panel)

Fig. 2 Gain of information by analyzing the mass cytometry data at full resolution with Cytosplore+HSNE. a Pie chart showing cellular composition of the mass cytometry data set. Color represents the subsets (N = 142), as identified in our previous study¹⁴. Black represents the cells discarded by stochastic downsampling and grey represents the cells discarded by ACCENSE clustering. b Embeddings of the 1.1 million cells annotated in ref. ¹⁴ showing the top three levels of the HSNE-hierarchy (five levels in total). Color represents annotations as in a. Size of the landmarks is proportional to the number of cells in the AoI that each landmark represents. Bottom map shows density features depicting the local probability density of cells for the level 3 embedding, where black dots indicate the centroids of identified cluster partitions using GMS clustering. c Embeddings of all 5.2 million cells, again showing only the top three levels of the hierarchy (five levels in total). Colors as in a. Right panels visualize landmarks representing cells discarded by stochastic downsampling (black) and the cells discarded by ACCENSE (grey). Bottom map shows density features for the level 3 embedding as described in (b). d Frequency of annotated cells for 145 clusters identified by Cytosplore+HSNE at the third hierarchical level using GMS clustering in c. Color coding as in a

Results

Hierarchical exploration of massive single-cell data. For a given high-dimensional data set such as the three-dimensional illustrative example in Fig. 1a, HSNE¹³ builds a hierarchy of local neighborhoods in this high-dimensional space, starting with the raw data, which is subsequently aggregated at more abstract hierarchical levels. The hierarchy is then explored in reverse order, by embedding the neighborhoods using the similarity-based embedding technique Barnes–Hut (BH)-SNE¹⁵. To allow for more detail and faster computation, each level can be partitioned in part or completely, by manual gating or unsupervised clustering, and partitions are embedded separately on the next, more detailed level (compare Fig. 1b). HSNE works particularly well for the analysis of the mass cytometry data because the local neighborhood information of the data level is propagated through the complete hierarchy. Groups of cells that are close in the Euclidean sense (Fig. 1a, grey arrow), but not on the non-linear manifold (Fig. 1a, dashed black line), are well separated even at higher aggregation levels (Fig. 1b). The power of HSNE lies in its scalability to tens of millions of cells, while the possibility to continuously explore the hierarchy allows the identification of rare cell populations at the more detailed levels. Next follows a general description of how the hierarchy is built and explored through embeddings. More details can be found in the Methods section.

The left panels of Fig. 1c give an overview of the HSNE-hierarchy construction. We show the hierarchy from the fine-grained data level to an overview level from the top to bottom panels. The number of levels is defined by the user and depends mostly on the input-data size. While the data aggregation is completely data-driven, for a typical mass cytometry data set, every additional level reduces the number of landmarks by roughly one order of magnitude. Therefore, we recommend using log₁₀(N/100) levels, with N being the number of cells: this generally results in at most a few thousand landmarks at the highest level of the hierarchy. The foundation of the hierarchy is constructed using the original input data. Each dot represents a single cell (Fig. 1c, data level). Similarities between cells on the data level are defined by building an approximated, weighted k-nearest neighbor (kNN) graph¹⁶ using the Euclidean distances based on the complete marker expression (Fig. 1c, top-center panel). The weights of this graph can directly be used as input to embed the data into a two-dimensional space (Fig. 1c, top-right panel). With BH-SNE, the two-dimensional embedding is generated such that the layout of the points indicates similarities between the cells in the high-dimensional space according to the neighborhood graph.
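
As a worked example of the log10(N/100) rule of thumb, the following Python sketch evaluates it for the data-set sizes mentioned in this article. The function name `recommended_levels` is hypothetical, and rounding the result up to a whole number is an assumption made for illustration: the text does not state how a fractional value should be rounded or whether the data level itself is counted.

```python
import math


def recommended_levels(n_cells: int) -> float:
    """Raw value of the log10(N/100) rule of thumb (not yet rounded)."""
    if n_cells < 100:
        raise ValueError("the rule of thumb assumes at least 100 cells")
    return math.log10(n_cells / 100)


for n in (1_100_000, 5_200_000, 15_000_000):
    raw = recommended_levels(n)
    # Rounding up is an assumption; the article does not specify it.
    print(f"N = {n:>10,}: log10(N/100) = {raw:.2f}, rounded up = {math.ceil(raw)}")
```

For 5.2 million cells the raw value is about 4.7, so a hierarchy of roughly five levels would be a natural choice under this reading of the rule.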

To aggregate the data into the next level (Fig. 1c, intermediate levels), we identify representative cells to use as landmarks (Fig. 1c, white circles). For that, the weighted kNN graph is interpreted as a finite Markov chain and the most influential (i.e., best-connected) nodes are chosen as landmarks, using a Monte Carlo process. The landmarks are then embedded into a two-dimensional space based on their similarities. However, simply repeating the kNN construction with Euclidean distances for the selected landmarks in the high-dimensional space would eventually eliminate non-linear structures by creating undesired “shortcuts” in the graph (a problem reported by Setty et al.¹⁷ in a different setting). Instead, we define the area of influence (AoI) of each landmark, indicated by the grey hulls (Fig. 1c, left panels), as the cells that are well-represented by the landmark according to the kNN graph. Different landmarks can have overlapping regions of locally-similar cells. Therefore, we define the similarity of two landmarks as the overlap of their respective AoIs. Furthermore, we construct a neighborhood graph based on these similarities. Here, two nodes are connected if they have overlapping AoIs. The strength of the connection is defined by the number of data points within the overlapping region. This graph replaces the kNN graph as input for levels subsequent to the data level. In this way, we effectively maintain the non-linear structure of the data to the top of the hierarchy and avoid shortcuts (Fig. 1c, bottom panels). We show that the preservation of non-linear neighborhoods by HSNE indeed conserves structure that is otherwise lost by random downsampling (Supplementary Note 1, “Cytosplore+HSNE is reproducible and robust”, and Supplementary Fig. 1).
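
The aggregation step can be illustrated with a small NumPy sketch. This is a simplified toy under stated assumptions, not the Cytosplore+HSNE implementation: the kNN search is brute force rather than approximated, the edge-weight kernel is assumed, landmarks are taken as the most frequently reached end points of fixed-length random walks, and a cell is counted as part of a landmark's AoI when at least one random walk started from that cell reaches the landmark first. All function names and parameter values are hypothetical.

```python
import numpy as np


def knn_transition(X, k=5):
    """Neighbor indices and row-stochastic weights of a brute-force kNN graph."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                         # exclude self-neighbors
    nbrs = np.argsort(d2, axis=1)[:, :k]
    nd2 = np.take_along_axis(d2, nbrs, axis=1)
    w = np.exp(-nd2 / nd2.mean(axis=1, keepdims=True))   # assumed Gaussian-like kernel
    return nbrs, w / w.sum(axis=1, keepdims=True)


def step(state, nbrs, probs, rng):
    """Advance every random walker in `state` by one step of the Markov chain."""
    cum = probs[state].cumsum(axis=1)
    choice = (rng.random(len(state))[:, None] > cum).sum(axis=1)
    choice = np.minimum(choice, nbrs.shape[1] - 1)       # guard against round-off
    return nbrs[state, choice]


def select_landmarks(nbrs, probs, rng, n_landmarks, walks=50, length=20):
    """Monte Carlo estimate of the best-connected nodes (top-m simplification)."""
    n = len(nbrs)
    state = np.repeat(np.arange(n), walks)
    for _ in range(length):
        state = step(state, nbrs, probs, rng)
    hits = np.bincount(state, minlength=n)               # walk end points per cell
    return np.argsort(hits)[-n_landmarks:]


def area_of_influence(nbrs, probs, landmarks, rng, walks=50, max_steps=200):
    """influence[i, j]: fraction of walks from cell i that first reach landmark j."""
    n = len(nbrs)
    col = np.full(n, -1)
    col[landmarks] = np.arange(len(landmarks))
    influence = np.zeros((n, len(landmarks)))
    start = np.repeat(np.arange(n), walks)
    state = start.copy()
    active = np.ones(len(state), dtype=bool)
    for _ in range(max_steps):
        done = active & (col[state] >= 0)
        np.add.at(influence, (start[done], col[state[done]]), 1.0 / walks)
        active &= ~done
        if not active.any():
            break
        state[active] = step(state[active], nbrs, probs, rng)
    return influence  # walks that never reach a landmark are simply dropped


rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))           # toy stand-in for 300 cells x 10 markers
nbrs, probs = knn_transition(X, k=5)
landmarks = select_landmarks(nbrs, probs, rng, n_landmarks=30)
influence = area_of_influence(nbrs, probs, landmarks, rng)

member = (influence > 0).astype(int)     # binary AoI membership
overlap = member.T @ member              # number of cells shared by two AoIs
np.fill_diagonal(overlap, 0)
print(len(landmarks), "landmarks;", int((overlap > 0).sum() // 2), "connected pairs")
```

The `overlap` matrix plays the role of the neighborhood graph for the next level: two landmarks are connected when their AoIs share cells, and no Euclidean distance between landmarks is ever computed, which is how shortcuts are avoided.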

The data exploration in Cytosplore+HSNE starts with the visualization of the embedding at the highest level, the overview level (Fig. 1c, bottom-right panel). Similar to other embedding techniques for visualizing the single-cell data⁴,⁹, the layout of the landmarks indicates similarity in the high-dimensional space according to the level’s neighborhood graph. Color is used to represent additional traits, such as marker expressions. The landmark size reflects its AoI. While it is possible to continuously select all landmarks and compute a complete embedding of the next, more detailed level, this strategy would eventually embed all the data and suffer from the same scalability problems as a t-SNE embedding, i.e., overcrowding (Supplementary Note 2, “Millions of cells cause performance issues and overcrowding in t-SNE”, and Supplementary Fig. 2) and slow performance. Instead, we envision that the user selects a group of landmarks, by manual gating based on visual cues such as patterns found in marker expression, or by performing unsupervised Gaussian mean shift (GMS) clustering¹⁸ of the landmarks based on the density representation of the embedding (Fig. 1c, right panels). Then, the user can zoom into this selection by means of a more detailed embedding. This means that all landmarks/cells in the combined AoI on the preceding level are retrieved from the neighborhood graph (Fig. 1c, blue encirclements), embedded, and visualized in a new view. Moreover, interactively linked heatmap visualizations of clusters (Fig. 1c, right panels) and descriptive statistics of markers within a selection can be used to guide the exploration. For example, these tools allow inspection of the heterogeneity of cells within individual clusters, including the cells associated with individual landmarks. Importantly, all of the described tools are available at every level of the hierarchy and linked interactively. Selections in the embedding and heatmap at one level of the hierarchy can thus be highlighted in the embeddings of other levels (Supplementary Fig. 3). All these aspects are further demonstrated using a typical exploration workflow with Cytosplore+HSNE in the Supplementary Movie 1. With this strategy, tens of millions of cells can be explored, providing visualizations that range from a global overview down to single-cell resolution, while preserving non-linear relationships between landmarks/cells at all levels of the hierarchy.
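
To make the unsupervised selection step more concrete, the sketch below applies a plain Gaussian mean shift to a toy two-dimensional point set. It is an illustration only: it works directly on the points rather than on the density representation used in Cytosplore+HSNE, and the function name `gaussian_mean_shift`, the bandwidth and the merge radius are assumptions, not values taken from this article.

```python
import numpy as np


def gaussian_mean_shift(points, bandwidth, n_iter=100, tol=1e-6):
    """Move every point uphill on a Gaussian kernel density estimate,
    then group points whose modes coincide."""
    modes = points.astype(float).copy()
    for _ in range(n_iter):
        d2 = ((modes[:, None, :] - points[None, :, :]) ** 2).sum(-1)
        k = np.exp(-0.5 * d2 / bandwidth**2)              # Gaussian kernel weights
        new = (k @ points) / k.sum(axis=1, keepdims=True)  # weighted mean of neighbors
        shift = np.abs(new - modes).max()
        modes = new
        if shift < tol:
            break
    # Merge modes that ended up closer than half a bandwidth (assumed radius).
    centers = []
    labels = np.empty(len(points), dtype=int)
    for i, m in enumerate(modes):
        for j, c in enumerate(centers):
            if np.linalg.norm(m - c) < 0.5 * bandwidth:
                labels[i] = j
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return np.array(centers), labels


rng = np.random.default_rng(1)
# Toy embedding: two well-separated groups of 100 landmarks each.
pts = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.3, size=(100, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.3, size=(100, 2)),
])
centers, labels = gaussian_mean_shift(pts, bandwidth=1.0)
print("clusters found:", len(centers))
```

The cluster centers returned here correspond to density maxima, comparable in spirit to the black centroid dots shown on the density maps in Figs. 2 and 3. The bandwidth controls how many partitions are found, so in practice it would be tuned to the embedding at hand.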

Fig. 3 Analysis of the CD7⁺CD3⁻ innate lymphocyte compartment in inflammatory intestinal diseases. a First HSNE level embedding of 5.2 million cells. Color represents arcsin5-transformed marker expression as indicated. Size of the landmarks represents AoI. Blue encirclement indicates selection of landmarks representing CD7⁺CD3⁻ innate lymphocytes and CD4⁺ T cells further discussed in Fig. 5. b The major immune lineages, annotated on the basis of lineage marker expression. c Third HSNE level embedding of the CD7⁺CD3⁻ innate lymphocytes (5.0 × 10⁵ cells). Color represents arcsin5-transformed marker expression in top panels, and tissue-origin and clinical features in bottom panels. Blue encirclement indicates selection of landmarks representing CD127⁺ ILC and ILC-like cells. d Third HSNE level embedding shows density features depicting the local probability density of cells, where black dots indicate the centroids of identified cluster partitions using GMS clustering. e Embedding of the CD127⁺ ILC and ILC-like cells (6.0 × 10⁴ cells) at single-cell resolution. Arrows indicate ILC1 (blue), ILC2 (orange) and ILC3 (green). Bottom-right panel shows corresponding cluster partitions using GMS clustering based on density features (top-right panel). f A heatmap summary of median expression values (same color coding as for the embeddings) of cell markers expressed by CD127⁺ ILC and ILC-like clusters identified in b and hierarchical clustering thereof. g Composition of cells for each cluster is represented graphically by a horizontal bar in which segment lengths represent the proportion of cells with: (left) tissue-of-origin, (middle) disease status and (right) sampling status

HSNE eliminates the need for downsampling. In a previous study¹⁴, a mass cytometry data set of 5.2 million cells derived from intestinal biopsies and paired blood samples was analyzed using a SPADE-t-SNE-ACCENSE pipeline. Due to t-SNE limitations, the data set had to be downsampled by 57.7% (Fig. 2a). The numbers of cells from blood and intestinal samples were equalized for a balanced comparison, which led to the exclusion of more cells from the blood samples. Moreover, ACCENSE clustered only 50% of the t-SNE-embedded data into subsets (Fig. 2a). Together, this excluded 78.8% of the cells from the analysis. The remaining 1.1 million cells were annotated into 142 phenotypically distinct immune subsets¹⁴ (Fig. 2a).
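
The exclusion figures quoted above can be checked with a few lines of arithmetic. The variable names below are illustrative, and the 50% ACCENSE share is taken at face value even though it is presumably a rounded number.

```python
total_cells = 5.2e6
downsampled_fraction = 0.577   # removed before t-SNE
accense_clustered = 0.50       # share of embedded cells assigned to subsets

embedded = 1.0 - downsampled_fraction       # fraction of all cells that reached t-SNE
annotated = embedded * accense_clustered    # fraction of all cells that were annotated
excluded = 1.0 - annotated                  # fraction lost to both steps combined

print(f"excluded: {excluded:.2%}")
print(f"annotated cells: {annotated * total_cells:,.0f}")
```

The two losses compound to roughly 78.85% of all cells, which matches the reported 78.8% up to rounding of the inputs, and leaves about 1.1 million annotated cells.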

To determine whether Cytosplore+HSNE could identify similar subsets, we embedded the 1.1 million annotated cells (Fig. 2b). Computation time was on the order of minutes and the analysis was finished within an hour, compared to 8 weeks of computation in the original study. Color coding shows the grouping of subsets at all hierarchical levels. GMS clustering at the third level embedding (Fig. 2b, bottom panel) reveals that 75.5% of cells were assigned to a single subset by both methods (Supplementary Fig. 4). Hence, to reach similar results it was not necessary to explore the data at lower (more detailed) levels.

Next, we utilized Cytosplore+HSNE to analyze the complete data set of 5.2 million cells, thus including the cells that were discarded in the SPADE-t-SNE-ACCENSE pipeline. The embeddings show by color coding that subsets of the same immune lineage clustered at all three levels (Fig. 2c). More interestingly, the cells removed during downsampling (shown in black) and cells ignored during the ACCENSE clustering (shown in grey) were positioned throughout the entire map (Fig. 2c). We selected 145 clusters using GMS clustering at the third level and observed that the identified clusters contained variable numbers of downsampled and non-classified cells (Fig. 2d). These findings indicate that both the non-uniform downsampling and the cell losses during the ACCENSE clustering introduce a potential bias in observed heterogeneity in the immune system. Cytosplore+HSNE overcomes this problem as it analyzes all cells and does so efficiently.

HSNE identifies rare subsets in the ILC compartment. We illustrate an exploration workflow with Cytosplore+HSNE using the data set of 5.2 million cells¹⁴ (Fig. 3). At the overview level, 4090 landmarks depict the general composition of the immune system (Fig. 3a) and color coding is applied to reveal CD-marker expression patterns on the basis of which the major immune lineages are identified (Fig. 3b). Next, the CD7⁺CD3⁻ cell clusters were selected as indicated and a new higher resolution embedding was generated at level 3 of the hierarchy (Fig. 3c). Here, coloring of the landmarks based on marker expression (Fig. 3c, top panels) and a density plot of the embedding (Fig. 3d) are shown alongside the clinical features of the subjects from which the samples were obtained and the tissue-origin of the landmarks (Fig. 3c, bottom panels). This reveals a cluster of cells abundantly present in the intestine of patients with refractory celiac disease (RCDII). In addition, a large cluster of CD45RA⁺CD56⁺ NK cells and three distinct innate lymphoid cell (ILC) clusters with a characteristic lineage⁻CD7⁺CD161⁺CD127⁺ marker expression profile¹⁹,²⁰ are visualized. Strikingly, a distinct population of CD7⁺CD127⁻CD45RA⁻ and partly CD56⁺ cells is found in between the NK, RCDII and ILC cell clusters.

To uncover the phenotypes of these ILC-related clusters, we next embedded the ILC and ILC-like clusters (Fig. 3c, selection) at the full single-cell data level (59,775 cells; 1.2% of total) (Fig. 3e). The marker expression overlays revealed that the majority of cells are CD7+ and display variable expression levels of CD127, CD45RA, and CD56 (Fig. 3e). In addition, and in line with previous reports (refs. 21, 22), (co-)expression of CD127 with CD27, CRTH2, and c-KIT revealed the phenotypes corresponding to helper-like ILC type 1, 2 and 3, respectively (indicated by arrows in Fig. 3e). Moreover, visualizing the tissue of origin in the Cytosplore+HSNE embedding made the tissue-specific location of ILC and ILC-related phenotypes evident (Fig. 3e).

Subset  Phenotype                         Annotation
16      CD127+CD161+CD25+CD122–CRTH2+     ILC2
15      CD127+CD161+CD25+CD122–CRTH2–     ILC2-like
4       CD56+NKp46+CD127–CD161–c-KIT–     NK-like
17      CD56+NKp46+CD127+CD161–c-KIT–     ILC1-like
9       CD56+NKp46+CD127+CD161–c-KIT–     ILC1-like
11      CD56+NKp46+CD127+CD161–c-KIT–     ILC1-like
10      CD56+NKp46+CD127–CD161–c-KIT–     NK-like
1       CD7–CD127+CD161+c-KIT+            ILC3-like
5       CD7+CD127+CD161+c-KIT+            ILC3
12      CD56+CD127+CD161+c-KIT–CD27–      ILC1-like
19      CD56–CD127–NKp46–CD161dim         Lin- cells
13      CD56–CD127–NKp46–CD161dim         Lin- cells
18      CD56–CD127–NKp46+CD161–           Lin- cells
14      CD56–CD127–NKp46+CD161–           Lin- cells
6       CD56–CD127–NKp46+CD161+           Lin- cells
8       CD56–CD127–NKp46+CD161+           Lin- cells
7       CD56+CD127–CD45RA–CD161–          NK-like
2       CD56+CD127–CD45RA–CD161+          NK-like
3       CD56+CD127–CD45RA–CD161+          NK-like

Fig. 4 CD127+ ILC and ILC-like subsets identified by Cytosplore+HSNE. Table showing cluster number, distinguishing phenotypic marker expression profiles and biological annotation for the clusters identified in Fig. 3e. Black color indicates clusters described in previous reports and red color indicates additional unknown clusters. Hierarchical clustering of the clusters is based on the marker expression profiles shown in the heatmap in Fig. 3f


Next, we performed GMS clustering on the full data level embedding, which resulted in 19 phenotypically distinct clusters (Fig. 3e, right plots) based on marker expression profiles (Fig. 3f). The cell surface phenotypes of 8 of the 19 clusters (Fig. 3f) matched previously described biological annotations (ref. 21) (Fig. 4, black annotations), including the CRTH2+ ILC2 (cluster 16), the c-KIT+ ILC3 (cluster 5) and the CD56−CD127− lineage− IELs (clusters 19, 13, 18, 14, 6, and 8), the latter representing innate-type lymphocytes with dual T-cell precursor and NK/ILC traits (refs. 23–25). Remarkably, the remaining 11 clusters strongly resembled distinct ILC types but did not fulfil the complete phenotypic requirements of the established nomenclature (ref. 21) (Fig. 4, red annotations). For example, cluster 15 is highly similar to ILC2 (cluster 16) based on the expression of CD7, CD127, CD161, and CD25, but lacks the ILC2-defining marker CRTH2. Also, clusters 17, 9 and 11 closely resemble ILC1 based on their CD7+CD127+c-KIT− marker expression profile, but lack the ILC-defining marker CD161. Finally, cluster 1 is very similar to ILC3 (cluster 5) based on CD127, CD161 and c-KIT positivity, but lacks the lymphoid marker CD7. Interestingly, the ILC3 (cluster 5) and ILC3-like (cluster 1) populations resided mainly in intestinal biopsies of patients with Crohn's disease (Fig. 3f) and may be related. Cluster 4 was mainly present in the peripheral blood of patients with RCDII, suggesting a possible association with this pre-malignant disease state. Importantly, three clusters (4, 17, and 19) (Fig. 3f) were essentially missed in our previous study (ref. 14) due to the downsampling. Finally, all identified cell clusters consist to a variable extent of cells that were downsampled in the original analysis (Fig. 3g). Thus, the analysis of the full data set provides increased detail and confidence in establishing the phenotypes of these low-abundance innate cell subsets.

HSNE identifies rare CD4+ T-cell subsets in blood. Next, we selected the CD4+ T-cell lineage (Fig. 3a) and show the distribution of the landmarks at the third level, revealing several clusters within the CD4+ T-cell compartment (Fig. 5a), including a small CD28−CD4+ memory T-cell population (25,398 cells; 0.5% of total), most likely representing terminally differentiated cells (ref. 26). Subsequent analysis at the single-cell level (Fig. 5b) identified a CD56+ population within the CD28−CD4+ T cells that is enriched in the blood of patients with Crohn's disease (Fig. 5b, bottom panels, dashed black circle), as well as a CD56− population of CD28−CD4+ T cells (Fig. 5b, bottom panels, dashed yellow circle) present in blood samples of both patients and controls. Importantly, this latter cell population was not identified in our previous publication due to the non-uniform downsampling of cells (Fig. 5b).

Together, these findings emphasize that Cytosplore+HSNE is highly efficient in the unbiased analysis of both abundant and rare cell populations in health and disease by permitting full single-cell

[Fig. 5 panel labels: CD28− memory (selection); Level 3, 1.9 × 10^6 cells (36.9%); Data level, 2.5 × 10^4 cells (0.5%); marker expression overlays (CCR7, CD45RA, CD28, CD27, CD127, CD38, CD161, CD7, CD56); density (low to high); color keys: disease status (RCDII, EATLII, Crohn, CeD, Ctrl), sampling status (Downsampled, Discarded, Subset), tissue (Intestine, Blood)]

Fig. 5 Analysis of the CD4+ T-cell compartment in inflammatory intestinal diseases. a Third HSNE level embedding of the CD4+ T cells (1.4 × 10^6 cells, selected in Fig. 3). Color and size of landmarks as described in Fig. 3. Right panel shows density features for the level 3 embedding. Blue encirclement indicates selection of landmarks representing CD28−CD4+ T cells. b Embedding of the CD28−CD4+ T cells (2.6 × 10^4 cells) at single-cell resolution. Bottom-left panel shows yellow and black dashed encirclements based on CD56− and CD56+ expression, respectively. Three bottom-right panels show cells colored according to: (left) disease status of the subjects (CeD, Crohn, EATLII, RCDII, and controls), (middle) sampling status (annotated subset, discarded by ACCENSE and downsampled) and (right) tissue of origin (blood and intestine)


resolution. It enables the simultaneous identification and visualization of known cell subsets and provides evidence for additional heterogeneity in the immune system, as it reveals the presence of cell clusters that were missed in a previous analysis due to downsampling of the input data. These currently unspecified cell clusters might represent intermediate stages of differentiation or novel rare cell types with presently unknown function.

HSNE is robust and outperforms current single-cell methods. While the exploration of the hierarchy requires analysis at multiple levels, the workflow is robust and reproducible, as shown in Supplementary Fig. 5. In this exemplary analysis, we obtained the same Cytosplore+HSNE clusters at the single-cell level upon reconstructing the hierarchy and embeddings in a matter of minutes (Methods section). In addition, we tested the applicability of Cytosplore+HSNE to three different public mass cytometry data sets. First, we analyzed a well-characterized bone marrow data set (ref. 27) containing 81,747 cells as a benchmark case (Supplementary Fig. 6) and demonstrated that the landmarks in the overview level (2632; 3.2% of total) that were selected by the HSNE algorithm were distributed across almost all of the manually gated cell types (Supplementary Fig. 6a), indicating that the global data heterogeneity was accurately preserved. Also, GMS clustering resulted in HSNE clusters that were phenotypically similar to the manually gated cell types and displayed additional diversity within those subsets (Supplementary Fig. 6b). However, as the power of Cytosplore+HSNE lies in its scalability to data sets exceeding millions of cells, we also tested its versatility by comparing it to other state-of-the-art scalable single-cell analysis methods on their accompanying large data sets (Supplementary Note 3, "Cytosplore+HSNE offers advantages over current scalable single-cell analysis methods"; Supplementary Figs. 7 and 8). Here, Cytosplore+HSNE computed the analysis of the VorteX data set (ref. 5), containing 0.8 million cells, in 4 min, compared to 22 h using the publicly available VorteX implementation on the same computer. Similarly, the analysis of the Phenograph data set (ref. 4), containing 15 million cells, was computed in 3.5 h, compared to 40 h using the publicly available Phenograph implementation on the same computer. Both analyses show that Cytosplore+HSNE reproduces the main findings presented in the original publications. More importantly, Cytosplore+HSNE provides the distinct advantage of visualizing all cells and intracluster heterogeneity at subsequent levels of detail up to the single-cell level, even for the 15-million-cell data set, without a need for downsampling. Also, VorteX failed to finish clustering the 5.2-million-cell gastrointestinal data set within 3 days (regardless of whether Euclidean or angular distance was used), whereas Cytosplore+HSNE accomplished this within 29 min. Moreover, while Phenograph did identify rare clusters that largely consisted of CD56+ cells within the CD28−CD4+ memory T cells (Fig. 5b), these clusters did not accurately correspond to the total number of CD56+ cells, obscuring the association with Crohn's disease. This further highlights the advantages of Cytosplore+HSNE over these other computational tools.

Finally, we investigated whether density-based downsampling, as implemented for instance by SPADE (ref. 2), could provide better results than random downsampling. However, solely applying density-based downsampling does not allow for quantitative analysis of the resulting sample, as different types of cells will be reduced by different amounts. To mitigate this problem, SPADE implements an elaborate pipeline of downsampling, clustering and subsequent upsampling to enable such a comparison, whereas this is an inherent part of HSNE. Therefore, we made a direct comparison between the density-based downsampling used in the SPADE pipeline (ref. 2) and HSNE on the same 5.2-million-cell gastrointestinal data set. On the basis of the expression of major lineage markers (Fig. 3a), HSNE created six large clusters (Fig. 3b) in the two-dimensional space at the overview level, where similar landmark cells group closely: all cells of one cluster are laid out very close to the other cells of the same cluster, but distant from the cells of the other clusters. The SPADE analysis of the same data (Supplementary Fig. 9) created a dendrogram in which cells of one cluster are close to cells of other clusters, even though they could be dissimilar and far apart in high-dimensional space. Importantly, we compared the ability of the SPADE analysis to preserve rare cellular subsets with that of HSNE. Despite density-based downsampling, several of the SPADE nodes that were created displayed a mixture of different phenotypes (underclustering), as revealed by the single-cell resolution of a linked t-SNE analysis that we show for the CD56+CD4+ T-cell node as an example (Supplementary Fig. 9b, node #1), while other SPADE nodes contained cells with overlapping phenotypes (overclustering), such as several myeloid cell populations (Supplementary Fig. 9c, nodes #2–5). In addition, rare subsets such as the CD28− subpopulations of CD4+ memory T cells (Supplementary Fig. 9d) or the ILC-like clusters (Supplementary Fig. 9e) that we could identify with HSNE (Figs. 3 and 5) were indistinguishable in the resulting SPADE tree from other CD4+ T cells or innate lymphocytes, respectively (shown by the overlapping distributions of cells from different nodes); this indicates that SPADE is less suitable for rare cell analysis. A similar problem was reported by Amir et al., where leukemic cells were not separated from healthy cells in the SPADE tree (ref. 9). Thus, combining single-cell resolution with enhanced scalability may be critical for the success of HSNE in preserving rare cells.

Discussion

Mass cytometry data sets generally consist of millions of cells. Current tools can either extract global information without single-cell resolution or provide single-cell resolution at the expense of the number of cells that can be analyzed. Consequently, when single-cell resolution is of interest, most current tools require downsampling of the data sets. However, reducing the number of cells included in the analysis pipeline may hamper the identification of rare subsets.

To overcome this problem, we introduce Cytosplore+HSNE. On the basis of a novel hierarchical embedding of the data (HSNE), Cytosplore+HSNE enables the analysis of tens of millions of cells using the whole data set in a fraction of the time required by currently available tools. The power of the hierarchical embedding strategy is that Cytosplore+HSNE provides visualizations of the data at different levels of resolution while preserving the non-linear phenotypic similarities of the single cells at each level. Cytosplore+HSNE enables the user to interactively select groups of data points at each resolution level, either hand-picked or guided by density-based clustering, to further zoom in on the underlying data points in the hierarchy, up to single-cell resolution. Using a data set of 5.2 million cells, we demonstrate that Cytosplore+HSNE allows a rapid analysis of the composition of the cells in the data set, that the representation of these cells preserves phenotypic relationships at all levels of the hierarchy, and that one can zoom in on rare cell populations that were missed with other analysis tools. The identification of such rare immune subsets offers opportunities to determine cellular parameters that correlate with disease.

There is an ongoing scientific debate on the validity of clustering in t-SNE maps versus direct clustering in the high-dimensional space. However, it has been shown that stochastic neighbor embedding (SNE) preserves and separates clusters in the high-dimensional space (ref. 28). While clustering the data points on highly non-linear manifolds is possible with complex models, we argue that the presented approach simplifies clustering


considerably. We show that HSNE, like other SNE approaches, efficiently unfolds the non-linearity in the high-dimensional data, and therefore simpler clustering methods based on locality in the map suffice to partition the data faithfully (e.g., the density-based GMS clustering implemented in Cytosplore+HSNE). Especially when combined with an interactive quality control mechanism to visually inspect the residual variance within each cluster, the kernel size can be selected such that within-cluster variance is minimized, which supports the validity of the clusters with respect to potential underclustering. This is confirmed by comparisons to other scalable tools (i.e., Phenograph and VorteX), showing that Cytosplore+HSNE provides a superior discriminatory ability to identify and visualize rare, phenotypically distinct cell clusters in large data sets in a very short time span. However, depending on user preference, Cytosplore+HSNE can be used in conjunction with such direct clustering approaches. This allows the user to identify additional heterogeneity that is potentially missed by direct clustering, and provides the tools for an informed merging and splitting of clusters as the user deems appropriate. The recent application of mass cytometry and other high-dimensional single-cell analysis techniques has greatly increased the number of phenotypically distinct cell clusters identified within the immune system. This raises obvious questions about the true distinctiveness and function of such cell clusters in health and disease, an issue that is beyond the scope of the present study but needs to be addressed in future studies.

In conclusion, Cytosplore+HSNE allows an interactive and fast analysis of large high-dimensional mass cytometry data sets, from a global overview to the single-cell level, and is coupled to patient-specific features. This may provide crucial information for the identification of disease-associated changes in the adaptive and innate immune system, which may aid in the development of disease- and patient-specific treatment protocols. Finally, the applicability of Cytosplore+HSNE goes beyond mass cytometry data sets, as it is able to analyze any high-dimensional single-cell data set.

Methods

HSNE algorithm. HSNE builds a hierarchy of local and non-linear similarities of high-dimensional data points (ref. 13), where landmarks on a coarser level of the hierarchy represent a set of similar points or landmarks of the preceding, more detailed level. To represent the non-linear structures of the data, the similarity of these landmarks is not described by Euclidean distance, but by the concept of the area of influence (AoI) on landmarks of the preceding level. The similarities described in every level of the hierarchy are then used as input for an adapted version of the similarity-based embedding technique BH-SNE (ref. 15) for visualization.

The algorithm works as follows. First, a weighted k-nearest neighbor (kNN) graph is computed from the raw input data. For optimal performance and scalability, the neighborhoods are approximated as described in ref. 16. The weight of a link between two data points in the kNN graph describes the similarity of the connected data points.
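The first step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses an exact brute-force neighbor search instead of the approximated search of ref. 16, and the Gaussian kernel with a single `bandwidth` parameter is a stand-in for the similarity measure, which is not specified in this section. The function name `knn_similarity_graph` and its parameters are hypothetical.

```python
import numpy as np

def knn_similarity_graph(data, k=3, bandwidth=1.0):
    """Return (neighbors, weights) of an exact weighted kNN graph.

    data: (n_cells, n_markers) array of transformed expression values.
    neighbors[i] lists the k nearest cells of cell i; weights[i] holds
    the matching similarities, normalised so that each row sums to 1.
    Assumption: a Gaussian kernel stands in for the real similarity.
    """
    data = np.asarray(data, dtype=float)
    # Pairwise squared Euclidean distances (only feasible for small n).
    diff = data[:, None, :] - data[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    np.fill_diagonal(sq_dist, np.inf)  # a cell is not its own neighbor

    neighbors = np.argsort(sq_dist, axis=1)[:, :k]
    nn_sq_dist = np.take_along_axis(sq_dist, neighbors, axis=1)

    # Subtract the row minimum before exponentiating for numerical
    # stability; the normalised weights are unchanged by this shift.
    shifted = nn_sq_dist - nn_sq_dist[:, :1]
    weights = np.exp(-shifted / (2.0 * bandwidth ** 2))
    weights /= weights.sum(axis=1, keepdims=True)
    return neighbors, weights
```

Each row of `weights` can be read directly as the jump probabilities of a random walk leaving that cell, which is how the graph is used in the next steps.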

In the subsequent steps, the hierarchy is built based on the similarities of the data level. To this end, a number of random walks of predefined length is carried out starting from every node in the kNN graph, using the similarities as the probability for the next jump; nodes similar to the current node are more likely to be the target of the next jump. Nodes in the graph that are reached more often are considered more important and are selected as landmarks for the next coarser level. The number of landmarks is selected in a data-driven manner, based on this importance. The AoI of a landmark is defined by a second set of random walks started from all nodes (data points or landmarks on the preceding level). Here, the length is not predefined. Rather, once a landmark is reached, the random walk terminates. The influence on the node is then defined for every reached landmark as the fraction of walks that terminated in that landmark. Inversely, the AoI of each landmark is defined as the set of all nodes that reached this landmark at least once in this second set of random walks. Consequently, since multiple random walks initiated at the same node can end in different landmarks, the AoIs of different landmarks can overlap.
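The two sets of random walks can be illustrated with the following Python sketch. It is not the C++ implementation of ref. 13. Several choices are assumptions made for illustration: a node counts as "reached" when a walk ends on it, the data-driven landmark criterion is replaced by a simple hypothetical cutoff (`threshold_factor` times the number of walks per node), and `max_steps` is a safety cap that the text does not mention.

```python
import numpy as np

def select_landmarks(neighbors, weights, n_walks=100, walk_length=15,
                     threshold_factor=1.5, seed=0):
    """Run fixed-length walks from every node and count the end points."""
    rng = np.random.default_rng(seed)
    n = len(neighbors)
    visits = np.zeros(n, dtype=int)
    for start in range(n):
        for _ in range(n_walks):
            node = start
            for _ in range(walk_length):
                # Jump to a neighbor with probability equal to similarity.
                node = rng.choice(neighbors[node], p=weights[node])
            visits[node] += 1
    # Hypothetical cutoff: keep nodes reached more often than average.
    landmarks = np.flatnonzero(visits >= threshold_factor * n_walks)
    if landmarks.size == 0:  # fall back to the most visited node
        landmarks = np.array([int(np.argmax(visits))])
    return landmarks, visits

def compute_influence(neighbors, weights, landmarks, n_walks=15,
                      max_steps=1000, seed=0):
    """influence[i, j]: fraction of walks from node i ending in landmark j."""
    rng = np.random.default_rng(seed)
    n = len(neighbors)
    is_landmark = np.zeros(n, dtype=bool)
    is_landmark[landmarks] = True
    column = {int(node): j for j, node in enumerate(landmarks)}
    influence = np.zeros((n, len(landmarks)))
    for start in range(n):
        finished = 0
        for _ in range(n_walks):
            node, steps = start, 0
            # Walk until a landmark is hit (length is not predefined).
            while not is_landmark[node] and steps < max_steps:
                node = rng.choice(neighbors[node], p=weights[node])
                steps += 1
            if is_landmark[node]:
                influence[start, column[int(node)]] += 1
                finished += 1
        if finished:
            influence[start] /= finished
    return influence

def area_of_influence(influence, landmark_column):
    """Nodes that reached the given landmark at least once."""
    return np.flatnonzero(influence[:, landmark_column] > 0)
```

The defaults `n_walks=100`, `walk_length=15` and `n_walks=15` mirror the standard parameters reported in the Cytosplore+HSNE analysis section; everything else is illustrative.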

We use this overlap to define a new neighborhood graph at the levels above the data level. Here, two nodes in the graph corresponding to landmarks at this level are connected if they have overlapping AoIs, where the link between the nodes is weighted by the number of data points in the overlapping area. This process is carried out iteratively until a predefined number of hierarchical levels has been constructed. For the full technical details, we refer to our previous work (ref. 13).
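As a minimal sketch of the overlap rule described above, the link weights between landmarks can be obtained from an influence matrix by counting shared nodes. The function name and the toy matrix are hypothetical, and the sketch follows the wording of this section (a count of nodes in the overlap) rather than the full formulation of ref. 13.

```python
import numpy as np

def landmark_overlap_graph(influence):
    """Weight landmark links by the number of nodes shared by their AoIs.

    influence: (n_nodes, n_landmarks) array; entry (i, j) > 0 means that
    node i lies in the area of influence of landmark j.
    Returns an (n_landmarks, n_landmarks) matrix of overlap counts with a
    zero diagonal; a zero entry means the two landmarks are not connected.
    """
    in_aoi = (np.asarray(influence) > 0).astype(int)
    overlap = in_aoi.T @ in_aoi  # entry (a, b) counts nodes in both AoIs
    np.fill_diagonal(overlap, 0)
    return overlap

# Toy example: 5 nodes, 3 landmarks. Node 1 is shared by landmarks 0 and 1,
# node 3 is shared by landmarks 1 and 2.
toy_influence = np.array([
    [1.0, 0.0, 0.0],
    [0.6, 0.4, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
])
toy_overlap = landmark_overlap_graph(toy_influence)
```

In the toy example, landmarks 0 and 2 share no node and therefore stay unconnected, which is how the construction avoids creating shortcuts between distant regions.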

HSNE implementation in Cytosplore+HSNE. We implemented our integrated analysis tool Cytosplore+HSNE using a combination of C++, JavaScript and OpenGL. All computationally demanding parts are implemented in C++ and make use of parallelization where possible. The density estimation and GMS clustering make use of the graphics processing unit (GPU) if possible, as described in our original publication on Cytosplore (ref. 29), allowing clustering of millions of points in less than a second. We implemented the visualizations of the embedding in OpenGL on the GPU for optimal performance, and less computationally demanding visualizations, such as the heatmap, in JavaScript. We implemented the HSNE algorithm in C++, as presented in ref. 13. Since we use sparse data structures, memory consumption strongly depends on the data complexity. Maximum memory consumption during the construction of a four-level hierarchy plus overview embedding of the 841,644-cell VorteX data set was 1,684 MB; construction of a five-level hierarchy of our human inflammatory intestinal diseases data set, consisting of 5,220,347 cells, required a maximum of 9,357 MB of main memory; and finally, the 15,299,616-cell Phenograph data set required a maximum of 24.3 GB of memory during the computation of a five-level hierarchy plus the overview embedding. Computation times for the described hierarchies plus the first-level embedding after 1,000 iterations were 4 min, 29 min, and 3 h 37 min, respectively, on an HP Z440 workstation with a single Intel Xeon E5-1620 v3 CPU (4 cores) clocked at 3.5 GHz, 64 GB of main memory and an NVIDIA GeForce GTX 980 GPU with 4 GB of memory, running Windows 7.

Human gastrointestinal disorders mass cytometry data set. A detailed description of the mass cytometry data set on human gastrointestinal disorders can be found in our previous work (ref. 14). In brief, samples (N=102) were collected from patients who were undergoing routine diagnostic endoscopies. The cells from the epithelium and lamina propria were isolated from two or three intestinal biopsies by treatment with EDTA followed by a collagenase mix under rotation at 37 °C. We analyzed single-cell suspensions from biological samples including duodenum biopsies (N=36), rectum biopsies (N=13), perianal fistulas (N=6), and PBMC from control individuals (N=15) and from patients with inflammatory intestinal diseases (celiac disease (CeD), N=13; RCD type II (RCDII), N=5; enteropathy-associated T-cell lymphoma type II (EATLII), N=1; and Crohn's disease (Crohn), N=10). A CyTOF panel of 32 metal isotope-tagged monoclonal antibodies was designed to obtain a global overview of the heterogeneity of the innate and adaptive immune system. Primary antibody metal-conjugates were either purchased or conjugated in-house. Procedures for mass cytometry antibody staining and data acquisition were carried out as previously described (ref. 27). CyTOF data were acquired and analyzed on the fly, using dual-count mode with noise reduction on. All other settings were either default settings or optimized with a tuning solution. After data acquisition, the mass bead signal was used to normalize the short-term signal fluctuations with the reference EQ passport P13H2302 during the course of each experiment, and the bead events were removed (ref. 30).

Processing of mass cytometry data. We transformed data from the human inflammatory intestinal diseases data set using the hyperbolic arcsin (arcsinh) with a cofactor of 5 directly within Cytosplore+HSNE. We discriminated live, single CD45+ immune cells using DNA stains and event length for the human inflammatory intestinal diseases study. We analyzed the other data (Phenograph and VorteX data sets) as available, apart from applying the hyperbolic arcsin transformation with a cofactor of 5.
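For readers who preprocess data outside Cytosplore+HSNE, the transformation amounts to dividing the raw ion counts by the cofactor and applying the inverse hyperbolic sine. The sketch below uses NumPy; the function name `arcsinh_transform` is hypothetical and not part of any Cytosplore interface.

```python
import numpy as np

def arcsinh_transform(counts, cofactor=5.0):
    """Apply arcsinh(x / cofactor) element-wise to raw marker counts."""
    return np.arcsinh(np.asarray(counts, dtype=float) / cofactor)

# A count equal to the cofactor maps to arcsinh(1), roughly 0.88; small
# counts behave almost linearly and large counts almost logarithmically.
example = arcsinh_transform([0.0, 5.0, 500.0])
```

The transform is invertible, so raw counts can be recovered with `cofactor * np.sinh(transformed)` if needed.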

Cytosplore+HSNE analysis. Cytosplore+HSNE facilitates the complete exploration pipeline in an integrated manner (see Supplementary Movie 1). All presented tools are available for every step of the exploration and every level of the hierarchy. Data analysis in Cytosplore+HSNE included the following steps. We applied the hyperbolic arcsin transform with a cofactor of five upon loading the data sets. After that, we started a new HSNE analysis and defined the markers that should be used for the similarity computation. We used markers CD3, CD4, CD7, CD8a, CD8b, CD11b, CD11c, CD14, CD19, CD25, CD27, CD28, CD34, CD38, CD45, CD45RA, CD56, CD103, CD122, CD123, CD127, CD161, CCR6, CCR7, c-KIT, CRTH2, IL-15Ra, IL-21R, NKp46, PD-1, TCRab, and TCRgd for the human inflammatory intestinal diseases data set; all available markers for the bone marrow benchmark data set; surface markers CD3, CD7, CD11b, CD15, CD19, CD33, CD34, CD38, CD41, CD44, CD45, CD47, CD64, CD117, CD123 and HLA-DR for the Phenograph data set; and markers CD3, CD4, CD5, CD8, CD11b, CD11c, CD16/32, CD19, CD23, CD25, CD27, CD34, CD43, CD44, CD45.2, CD49b, CD64, CD103, CD115, CD138, CD150, 120g8, B220, CCR7, c-KIT, F4/80, FceR1a, Foxp3, IgD, IgM, Ly6C, Ly6G, MHCII, NKp46, Sca1, SiglecF, TCRb, TCRgd and Ter119 to construct the hierarchy for the VorteX data set. We used the standard parameters for the hierarchy construction: number of random walks for landmark selection, N=100; random walk length, L=15; number of random walks for influence computation, N=15. For any clustering that occurred, the GMS grid size was set to S=256 ref. 2. The reduction factor from one level in the hierarchy to the next coarser level is completely data-driven. In our experiments with mass cytometry data, the number of landmarks was consistently reduced by roughly one order of magnitude from one level to the next. Embeddings consisting of only a few hundred points usually provide little insight. Therefore, we defined the number of levels such that the overview level could be expected to consist of on the order of 1,000 landmarks,


meaning N=5 for the human inflammatory intestinal diseases data set and the Phenograph data set, N=3 for the bone marrow benchmark data set, and N=4 for the VorteX data set. Building the hierarchy automatically creates a visualization of the overview level using BH-SNE. Cytosplore+HSNE enables color coding of the landmarks using the expression of any provided marker (e.g., Fig. 3a) or by sample. For example, we created the clinical feature (e.g., Fig. 3c, bottom-left panel) and blood/intestine (e.g., Fig. 3c, bottom-right panel) color schemes based on samples for the human inflammatory intestinal diseases data set within Cytosplore+HSNE, and for the Phenograph data set, we created a color scheme that represented the sample coloring as provided in ref. 4 (Supplementary Fig. 7). For zooming into the data, we generally selected cells based on visible clusters, either using manual selection or by selecting clusters derived using the GMS clustering. For the VorteX data set, we clustered the third-level embedding (Supplementary Fig. 8). We specified a kernel size of 0.18 of the embedding size to approximate the 48 clusters created by the X-shift clustering described in ref. 5, resulting in 50 clusters.
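The choice of the number of levels can be illustrated with a rule of thumb. The sketch below is an assumption made for illustration, not the procedure used by the authors: it takes the reported reduction of roughly one order of magnitude per level and the target of about 1,000 overview landmarks, and rounds the required number of reductions to the nearest integer. The function name `suggest_num_levels` is hypothetical.

```python
import math

def suggest_num_levels(n_cells, target_landmarks=1000, reduction=10):
    """Estimate the hierarchy depth (data level included).

    Assumes every coarser level keeps about 1/reduction of the landmarks
    and that the overview level should hold about target_landmarks points.
    """
    if n_cells <= target_landmarks:
        return 1  # the data level alone is already small enough
    n_reductions = round(math.log(n_cells / target_landmarks, reduction))
    return 1 + max(0, n_reductions)

# Cell counts as reported in the text for the four data sets.
levels = {
    "gastrointestinal": suggest_num_levels(5_220_347),
    "Phenograph": suggest_num_levels(15_299_616),
    "bone marrow": suggest_num_levels(81_747),
    "VorteX": suggest_num_levels(841_644),
}
```

With these inputs the heuristic gives 5, 5, 3 and 4 levels, matching the values of N listed above. Because the actual reduction factor is data-driven, the overview size still has to be checked after construction (for example, 4090 landmarks for the gastrointestinal data set).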

For subset classification, we first cluster the embedding at a given level using the GMS clustering. Next, we inspect the clustering by using the integrated descriptive marker statistics and heatmap visualization. If there is still meaningful variation of the marker expression within clusters, we zoom further into these clusters. If clusters are phenotypically homogeneous, the corresponding cell types are defined by inspecting the full marker expression profile in the heatmap, and the cluster can then be exported from any level in the hierarchy.
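To make the clustering step concrete, the following is a plain CPU sketch of Gaussian mean shift on a two-dimensional embedding. It is not the GPU-based, grid-accelerated GMS implementation used in Cytosplore+HSNE (ref. 29). The function name, the convergence tolerance and the rule for merging nearby modes (half a kernel size) are assumptions, and `kernel_size` is given in embedding coordinates rather than as a fraction of the embedding size.

```python
import numpy as np

def gaussian_mean_shift(points, kernel_size, n_iter=100, tol=1e-5):
    """Cluster 2D points by moving each one uphill to a density mode."""
    points = np.asarray(points, dtype=float)
    modes = points.copy()
    for _ in range(n_iter):
        # Gaussian weights between every current mode and every point.
        diff = modes[:, None, :] - points[None, :, :]
        w = np.exp(-np.sum(diff ** 2, axis=2) / (2.0 * kernel_size ** 2))
        new_modes = (w @ points) / w.sum(axis=1, keepdims=True)
        shift = np.max(np.linalg.norm(new_modes - modes, axis=1))
        modes = new_modes
        if shift < tol:
            break
    # Points whose modes ended up close together share a cluster label.
    labels = np.full(len(points), -1, dtype=int)
    centers = []
    for i, mode in enumerate(modes):
        for c, center in enumerate(centers):
            if np.linalg.norm(mode - center) < 0.5 * kernel_size:
                labels[i] = c
                break
        else:
            centers.append(mode)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)
```

A larger kernel merges neighboring density peaks into fewer clusters, while a smaller kernel splits them. This is the trade-off that the interactive inspection of within-cluster variation is meant to guide.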

Data availability. The gastrointestinal mass cytometry data set that supports the findings of this study is publicly available on Cytobank, experiment no. 60564 (https://community.cytobank.org/cytobank/experiments/60564). The source code of the HSNE library, written in C++, is available at https://github.com/Nicola17/High-Dimensional-Inspector. Furthermore, we provide a Cytosplore+HSNE installer for Windows, allowing exploration of several million cells, for academic use at https://www.cytosplore.org.

Received: 16 June 2017 Accepted: 25 September 2017

References

1. Saeys, Y., Gassen, S. V. & Lambrecht, B. N. Computational ﬂow cytometry:

helping to make sense of high-dimensional immunology data. Nat. Rev.

Immunol. 16, 449–462 (2016).

2. Qiu, P. et al. Extracting a cellular hierarchy from high-dimensional cytometry

data with SPADE. Nat. Biotechnol. 29, 886–891 (2011).

3. Zunder, E. R., Lujan, E., Goltsev, Y., Wernig, M. & Nolan, G. P. A continuous

molecular roadmap to iPSC reprogramming through progression analysis of

single-cell mass cytometry. Cell Stem Cell 16, 323–337 (2015).

4. Levine, J. H. et al. Data-Driven phenotypic dissection of AML reveals

progenitor-like cells that correlate with prognosis. Cell 162, 184–197 (2015).

5. Samusik, N., Good, Z., Spitzer, M. H., Davis, K. L. & Nolan, G. P. Automated

mapping of phenotype space with single-cell data. Nat. Methods 13, 493–496

(2016).

6. Spitzer, M. H. et al. An interactive reference framework for

modeling a dynamic immune system. Science 349, 1259425 (2015).

7. Hotelling, H. Analysis of a complex of statistical variables into principal

components. J. Ed. Psychol. 24, 417–441 (1933).

8. van der Maaten, L. J. P. & Hinton, G. E. Visualizing high-dimensional data

using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).

9. Amir, E.-A. D. et al. viSNE enables visualization of high dimensional single-cell

data and reveals phenotypic heterogeneity of leukemia. Nat. Biotechnol. 31,

545–552 (2013).

10. Haghverdi, L., Buettner, F. & Theis, F. J. Diffusion maps for high-dimensional

single-cell analysis of differentiation data. Bioinformatics 31, 2989–2998 (2015).

11. Bendall, S. C., Nolan, G. P., Roederer, M. & Chattopadhyay, P. K. A deep

proﬁler’s guide to cytometry. Trends Immunol. 33, 323–332 (2012).

12. Chattopadhyay, P. K., Gierahn, T. M., Roederer, M. & Love, J. C. Single-cell

technologies for monitoring immune systems. Nat. Immunol. 15, 128–135

(2014).

13. Pezzotti, N., Höllt, T., Lelieveldt, B., Eisemann, E. & Vilanova, A. Hierarchical

Stochastic Neighbor Embedding. Comput. Graph. Forum 35, 21–30 (2016).

14. van Unen, V. et al. Mass cytometry of the human mucosal immune system

identiﬁes tissue- and disease-associated immune subsets. Immunity 44,

1227–1239 (2016).

15. van der Maaten, L. Accelerating t-SNE using tree-based algorithms. J. Mach.

Learn. Res. 15, 3221–3245 (2014).

16. Pezzotti, N. et al. Approximated and user steerable tSNE for progressive visual

analytics. IEEE. Trans. Vis. Comput. Graph. 23, 1739–1752 (2016).

17. Setty, M. et al. Wishbone identiﬁes bifurcating developmental trajectories from

single-cell data. Nat. Biotechnol. 34, 637–645 (2016).

18. Comaniciu, D. & Meer, P. Mean shift: a robust approach toward feature space

analysis. IEEE. Trans. Pattern Anal. Mach. Intell. 24, 603–619 (2002).

19. Spits, H. & Cupedo, T. Innate lymphoid cells: emerging insights in

development, lineage relationships, and function. Annu. Rev. Immunol. 30,

647–675 (2012).

20. McKenzie, A. N. J., Spits, H. & Eberl, G. Innate lymphoid cells in inﬂammation

and immunity. Immunity 41, 366–374 (2014).

21. Spits, H. et al. Innate lymphoid cells--a proposal for uniform nomenclature.

Nat. Rev. Immunol. 13, 145–149 (2013).

22. Robinette, M. L. et al. Transcriptional programs deﬁne molecular characteristics

of innate lymphoid cell classes and subsets. Nat. Immunol. 16, 306–317

(2015).

23. Schmitz, F. et al. Identiﬁcation of a potential physiological precursor of aberrant

cells in refractory coeliac disease type II. Gut. 62, 509–519 (2013).

24. Schmitz, F. et al. The composition and differentiation potential of the duodenal

intraepithelial innate lymphocyte compartment is altered in coeliac disease.

Gut. 65, 1269–1278 (2016).

25. Ettersperger, J. et al. Interleukin-15-dependent T-cell-like innate intraepithelial

lymphocytes develop in the intestine and transform into lymphomas in celiac

disease. Immunity 45, 610–625 (2016).

26. Mou, D., Espinosa, J., Lo, D. J. & Kirk, A. D. CD28 negative T cells: is their loss

our gain? Am. J. Transplant. 14, 2460–2466 (2014).

27. Bendall, S. C. et al. Single-cell mass cytometry of differential immune and drug

responses across a human hematopoietic continuum. Science 332, 687–696

(2011).

28. Shaham, U. & Steinerberger, S. Stochastic neighbor embedding separates well-

separated clusters. arXiv:1702.02670 [stat.ML] (2017).

29. Höllt, T. et al. Cytosplore: Interactive immune cell phenotyping for large single-

cell datasets. Comput. Graph. Forum 35, 171–180 (2016).

30. Finck, R. et al. Normalization of mass cytometry data with bead standards.

Cytometry A 83, 483–494 (2013).

Acknowledgements

The research leading to these results has received funding from Leiden University Medical Center, the Netherlands Organization for Scientific Research (ZonMW grant 91112008) and the Technology Foundation STW, the Netherlands (VAnPIRe; grant 12720, and Genes in Space; grant 12721). We thank Drs M.W. Schilham, M. Yazdanbakhsh, J. Goeman, K. Schepers, J. van Bergen and S.E. de Jong for critical review of the manuscript and B. van Lew for narrating the Supplementary Movie 1.

Author contributions

V.v.U., T.H., N.P., F.K., A.V. and B.P.F.L.: Conceived the study. T.H., N.P., A.V. and B.P.F.L.: Developed the HSNE method and implementation in Cytosplore+HSNE. V.v.U. and F.K.: Performed the biological analysis and interpretation. T.H.: Performed the t-SNE scalability analysis and comparison. V.v.U.: Performed the hierarchy robustness analysis. V.v.U. and T.H.: Performed the comparison with other methods. N.L., M.J.T.R. and E.E.: Provided conceptual input. V.v.U., T.H., N.P., F.K., A.V. and B.L.: Wrote the manuscript. All authors discussed the results and commented on the manuscript.

Additional information

Supplementary Information accompanies this paper at doi:10.1038/s41467-017-01689-9.

Competing interests: The authors declare no competing financial interests.

Reprints and permission information is available online at http://npg.nature.com/reprintsandpermissions/

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
