
Explainable multiview framework for dissecting spatial relationships from highly multiplexed data

Authors:
  • Joint Research Centre of Computational Biomedicine

Abstract and Figures

The advancement of highly multiplexed spatial technologies requires scalable methods that can leverage spatial information. We present MISTy, a flexible, scalable, and explainable machine learning framework for extracting relationships from any spatial omics data, from dozens to thousands of measured markers. MISTy builds multiple views focusing on different spatial or functional contexts to dissect different effects. We evaluated MISTy on in silico and breast cancer datasets measured by imaging mass cytometry and spatial transcriptomics. We estimated structural and functional interactions coming from different spatial contexts in breast cancer and demonstrated how to relate MISTy’s results to clinical features.
Evaluating MISTy on mechanistic in silico data. A MISTy was evaluated on the task of reconstruction of simulated interaction networks. Models of intra- and intercellular interactions of four different cell types (cell type specific intracellular networks are shown in Sup. Fig. 2), arranged on a grid representing a tissue, were used to simulate measurements of 29 molecular species. We considered two pipelines, (1) in which cell type information is available and (2) where cell types are not considered. B Increase in explained variance by adding the paraview contribution to the intraview model. Only variables with positive paraview contribution are shown. C Contribution of each view to the prediction of the marker expressions in the meta-model. The stacked barplot represents normalized values of the fusion coefficients of the respective views for each marker. D Receiver operating characteristic (ROC) depicting the aggregate performance of MISTy on the samples for the intraview and paraview, for the two cases with and without cell type information. The dashed lines represent the expected performance of an uninformed classifier, the gray iso-lines represent points in ROC space with informedness (Youden’s J statistic) equal to 0.1, 0.2, 0.5, and 0.8. E Predicted importance of the interactions for the intraview and paraview models for the case with cell type information (for cell type 1) together with the direct interactions from the in silico model (red crosses). Some targets had very low variance and therefore filtered out
R² signature and permutation analysis of IMC data from 46 breast cancer samples. A Imaging mass cytometry example image from a breast cancer sample (HH3 in blue, CD68 in gray, E. cadherin in red and Vimentin in green) and improvement in the predictive performance (variance explained) for all samples when considering multiple views in contrast to a single, intraview (in absolute percentage points). B The relative contribution of each view to the prediction of the expression of the markers. C Distribution of improvement in variance explained when considering multiple views in contrast to a single intraview across all markers and samples with original cell locations and 10 random permutations. The p-value is calculated by a one-sided Wilcoxon rank-sum test. D Distribution of the relative contribution of the intraview, juxtaview, and the paraview to the prediction of the markers across all markers and samples with original cell locations and 10 random permutations. The p-values are calculated by a one-sided Wilcoxon rank-sum test. E First two principal components of the R² signature of the samples colored by grade and clinical subtype, and the importance of the variables of the signature in the principal component analysis. The naming of the variables is in the form Marker_Measure. The measures taken into account are variance explained by the intraview only (intra. R2), total variance explained by the multiview model (multi. R2), and the gain in variance explained (gain. R2)
Jovan Tanevski1,2, Ricardo Omar Ramirez Flores1, Attila Gabor1, Denis Schapiro1,3,4,5 and
Julio Saez‑Rodriguez1,6*
Background
Highly multiplexed, spatially resolved data is becoming available at an increasing pace
thanks to recent and ongoing technical developments. In contrast to dissociated sin-
gle-cell data, this data informs us on the cell-to-cell heterogeneity in tissue slices while
conserving the arrangement of cells [1]. Therefore, each cell can be studied in its
microenvironment. We can observe the spatial distribution of the expression of markers of
interest, their interactions within the local cellular niche and at the level of tissue struc-
ture. All these aspects provide an excellent platform to gain better insight into multicel-
lular processes, in particular cell-cell communication.
The proliferation of spatial technologies leads to the generation of large amounts of
data. Different technologies allow for measuring different types of molecules with vary-
ing resolution, capturing different areas of tissue with diverse numbers of readouts.
Immunofluorescence-based methods allow detection of the expression of tens to hundreds
of proteins at subcellular resolution [2–4] and hundreds to potentially thousands
Keywords: Spatial omics, Multiplexed data, Machine learning, Intercellular signaling
Open Access
© The Author(s) 2022. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits
use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original
author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third
party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the
material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or
exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit
http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
METHOD
Tanevskietal. Genome Biology (2022) 23:97
https://doi.org/10.1186/s13059-022-02663-5
*Correspondence:
pub.saez@uni‑heidelberg.de
1 Institute for Computational
Biomedicine, Faculty
of Medicine, Heidelberg
University and Heidelberg
University Hospital,
Heidelberg, Germany
Full list of author information
is available at the end of the
article
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
of RNA species at single-cell resolution [5]. Mass spectrometry-assisted methods ena-
ble detection of the expression of a high number of proteins at the resolution of tissue
patches [6, 7], tens of markers at subcellular resolution [8, 9], and over a hundred
metabolites at cellular and subcellular resolutions [10, 11]. Finally, barcoding-based
approaches [12] facilitate the measurement of genome-wide expression at a resolution
of hundreds of microns, i.e., several cells, and are being further developed to increase
the resolution to below ten microns [13, 14]. Complementally, we are also witnessing
the rapid development of methods for spatial localization that combine limited amounts
of spatially resolved data with richer, but dissociated single-cell data [15–19], which can
alleviate the various shortcomings of the technologies. Therefore, there is a need for
methods to analyze large amounts of rich and spatially resolved data in order to discover
patterns of expression, interaction, and cell functions. In fact, this has been identified as
one of the grand challenges in single-cell data science [20]. These methods should ideally
be able to handle the variety of produced data and scale well with future technology
improvements.
Currently, there is a limited number of computational methods available for the analy-
sis of high-resolution spatially resolved data [21]. One group of methods focuses on the
analysis of the significant patterns and the variability of expression of individual markers
[22–25] to describe the landscape of expression within a tissue. Another group of methods
considers, more broadly, the analysis of the interactions between the markers within
different spatial contexts, that is, the expression in the directly neighboring cells or the
effect of the expression of a marker in the broader tissue structure. The methods within
the latter group focus mainly on identifying interactions in the local cellular niche, by
establishing the statistical significance of the distribution of automatically identified
cell types in the neighborhood of each cell [26–31]. These methods assume a fixed form
of nonlinear relationship between markers or have a predefined set of spatial contexts
that can be explored. Spatial variance component analysis (SVCA) [32], for example,
goes a step further by examining contributions of different spatial contexts to the expression
of markers by decomposing the source of variation into three fixed spatial contexts:
intrinsic, environmental, and intercellular effects.
We introduce here a Multiview Intercellular SpaTial modeling framework (MISTy),
an explainable machine learning framework for knowledge extraction and analysis of
highly multiplexed, spatially resolved data. MISTy facilitates an in-depth understanding
of marker interactions by profiling the intra- and intercellular relationships. MISTy is a
flexible framework able to build models to describe the different spatial contexts, that is,
the types of relationship among the observed expressions of the markers, such as intra-
cellular regulation or paracrine regulation. For each of these contexts, MISTy builds a
component in the model, called a view. MISTy allows for a hypothesis-driven and flexible
definition and composition of views that fit the application of interest. The views can
also capture functional relationships, such as pathway activities and crosstalk, cell-type-
specific relationships, or focus on relations between different anatomical regions. Each
MISTy view is considered as a potential source of variability in the measured marker
expressions. Each view is then analyzed for its contribution to the total expression of
each marker. e measured contribution points to the relevance of a potential source
of interactions coming from the different spatial contexts and is estimated from the
view-specific models. Our approach is modular, easily parallelizable, and thus scalable to
samples with millions of cells and thousands of measured markers.
While inspired by other approaches [22–25, 26–28] to explicitly model the spatial
component of the data, MISTy’s approach is unique: First, it models the complete meas-
ured expression profile and interactions instead of analyzing spatial patterns of single
markers. Second, it is not limited to fixed predefined sources of variation, aggregation,
or representation of the data, but allows for the flexible construction of models to analyze
spatial data. Third, it does not require annotation of the cell type, state, or any other
feature of the spatial unit (cell or spot). We show a more detailed comparison of MISTy
with related methods in Additional file 1: Table S1.
Therefore, MISTy is not directly comparable to existing related methods. MISTy does
not consider the expression of markers or their patterns individually. MISTy takes into
account simultaneously the entire expression profile coming from different spatial or
functional contexts assumed to explain the overall expression, as described by the mod-
eled views. In principle, MISTy does not require annotation of cells or any other exter-
nal information to describe the influence of the local niche (immediate neighborhood)
or the broader tissue structure. Instead, it is agnostic to potential sources of bias and
operates at the level of the available expression profile. MISTy does not assume linear
or other fixed types of relationship between individual markers. Instead, it constructs
a nonparametric and nonlinear model of the expression of each available marker as a
function of the expression of all other markers at the same time (intrinsically) or the
expression of other markers or features captured in the available views. Finally, unlike
related approaches, MISTy is able to not only estimate the contribution of the available
views, but also infer the importance of relations that can explain their contribution.
We validated MISTy on in silico data generated by a custom algorithm. We further
applied our framework on two different imaging mass cytometry (IMC) datasets con-
sisting of 46 and 720 breast cancer biopsies respectively. On these data sets, we dem-
onstrated how MISTy outperforms available methods by recapitulating previous results
and at the same time adding interpretation and new insights. This enabled us to discover
intra- and intercellular features in triple negative breast cancer that are associated
with clinical outcomes. To our knowledge, this is the first method available to connect
spatially resolved single-cell measurements to the clinical outcomes without the use of
cell type annotation. Finally, MISTy can extract knowledge about the interactions among
signaling pathways and ligands expressed in the microenvironment from different spatial
views. We demonstrate this on spatial transcriptomics data of breast cancer. These
case studies illustrate the flexibility of MISTy as a framework to define exploratory and
hypothesis-driven workflows for the analysis of diverse types of spatial omics data in
basic and translational research.
Results
MISTy: Multiview intercellular spatial modeling framework
MISTy is a late fusion multiview framework for the construction of a domain-specific,
explainable model of the expression of markers (Fig. 1, Additional file 1: Fig. S1). For each
marker of interest in a sample, we can model cell-cell interactions coming from different
spatial contexts as different views. For example, the first and main view, containing all
markers of interest, is the intraview, where we relate the expression of other markers to
a specific marker of interest within the same location. To capture the local cellular niche,
we can create a view that relates the expression from the immediate neighborhood of
a cell to the observed expression within that cell; we call this view a juxtaview. To cap-
ture the effect of the tissue structure, we can create a view that relates the expression
of markers measured in cells within a radius around a given cell, and we call this view
a paraview (see “Methods”). Importantly, MISTy is not limited to the abovementioned
views. Other views can be added to the workflow that can offer insight about relations
coming not only as a function of space. For example, views can focus on interactions
between different cell types, interactions within specific regions of interest within a sam-
ple, or a higher-level functional organization.
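The difference between the two spatial views can be illustrated with a small example. This is a minimal sketch under assumptions that are not stated in this section: neighbors are selected by a plain Euclidean distance threshold, the juxtaview sums the expression of those neighbors, and the paraview weights all other cells with a Gaussian function of distance. The function names and parameters are hypothetical and do not correspond to the MISTy API.

```python
import math

def juxtaview(coords, expr, threshold):
    """Sum the expression of all other cells closer than `threshold` (assumed rule)."""
    out = []
    for i, ci in enumerate(coords):
        total = [0.0] * len(expr[0])
        for j, cj in enumerate(coords):
            if i != j and math.dist(ci, cj) <= threshold:
                total = [t + e for t, e in zip(total, expr[j])]
        out.append(total)
    return out

def paraview(coords, expr, radius):
    """Distance-weighted sum over all other cells (Gaussian weight is an assumption)."""
    out = []
    for i, ci in enumerate(coords):
        total = [0.0] * len(expr[0])
        for j, cj in enumerate(coords):
            if i == j:
                continue
            w = math.exp(-math.dist(ci, cj) ** 2 / radius ** 2)
            total = [t + w * e for t, e in zip(total, expr[j])]
        out.append(total)
    return out

# Three cells on a line, two markers per cell (toy values).
coords = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
expr = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
jux = juxtaview(coords, expr, threshold=1.5)
para = paraview(coords, expr, radius=2.0)
```

In this toy tissue, the isolated third cell has an empty juxtaview but still receives a small paraview contribution from the distant cells, which is the intended contrast between the local niche and the broader tissue structure.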
Formally, we consider a matrix $[Y]_{u,i}$ where each column represents a marker ($i = 1, \dots, n$)
and each row is a spatial location ($u = 1, \dots, L$). $Y_{\cdot,i}$ is the vector made by all observations
of the marker $i$. MISTy models its expression as

$$Y_{\cdot,i} = \alpha_I + \alpha_0 F_0(\tilde{Y}) + \sum_{v} \alpha_v F_v\big(G_v(X, \tilde{Y}, T)\big), \qquad (1)$$

where $\tilde{Y} = Y_{\cdot,k \neq i}$, i.e., all markers except the target marker. $F_v$ are models
constructed by a machine learning algorithm for each view $v$. $G$ are domain-specific
functions that transform the data to generate informative variables (features) from the
expression $Y$ at the corresponding spatial localization $X$. Optionally, $G$ can depend
Fig. 1 MISTy: An explainable multiview framework for modeling intercellular interactions from highly
multiplexed spatial data. MISTy models marker relationships coming from different spatial views: intrinsic
(intraview), local niche view (juxtaview), the broader, tissue view (paraview), or others, based directly on
marker expressions or derived typology or functional characterizations of the data. At output, A MISTy
extracts information about the contribution of different views to the expression of markers in each
spatial unit. B MISTy also estimates the markers’ interactions coming from each view that explains those
contributions. C These results can be described qualitatively as communities of interacting markers for each
view
on other specific properties T, such as prior knowledge expressed as annotated functions,
regions, or cell types. The G functions can be used to generate alternative views
that can be inputs to the model function F. Finally, α are the late fusion parameters of
the meta-model that balances the contribution of each view to the prediction.
MISTy always models a fixed intraview $F_0(\tilde{Y})$ as a baseline view that is independent
of the spatial localization of the cell. Recall that the intraview is modeling the
expression of a target marker as a function of the expression of all other markers in
the same location. It is biologically expected that this intraview will be able to capture
most of the variance of the expression of the measurements: the effects on the meas-
ured markers from outside of the cell are normally lower than the effects of the inter-
actions and regulation coming from within the cell itself [33]. By design, our focus is
to distinguish the non-intrinsic effects from the intrinsic baseline and estimate
important interactions that supplement the explanation of the overall expression. To
this end, other intercellular views are then added to $F_0(\tilde{Y})$. The user can add a number
v of additional intercellular views and separate the effect of each view for each
marker on the improvement in the predictive performance of the multiview model.
We use the improvement in the predictive performance of the models as a proxy to
estimate their potential as sources of interactions that can be further explored by
extracting feature importances, as outlined in the following. The contribution of each
view is captured by the late fusion parameters α of the meta-model. The intercept, on
the other hand, captures (implicitly) the environmental effects on the mean expression
of the targets, specific to the analyzed slide. For determining the contribution of the
views, the fusion parameters (except for the intercept) are normalized such that they
sum up to one:

$$\hat{\alpha}_v = \frac{\alpha_v}{\sum_{i=0}^{V} \alpha_i}.$$
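The normalization of the fusion parameters can be written in a few lines. The sketch below is illustrative only: the view names and coefficient values are invented, and it assumes that the coefficients have a nonzero sum.

```python
def normalize_fusion(alphas):
    """Scale view coefficients (intercept excluded) so that they sum to one."""
    total = sum(alphas.values())
    return {view: a / total for view, a in alphas.items()}

# Hypothetical fusion coefficients for one target marker.
alphas = {"intra": 1.2, "juxta": 0.6, "para": 0.2}
contribution = normalize_fusion(alphas)
```

After normalization, the values can be read as the relative contribution of each view to the prediction of that marker.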
The above model is trained in two steps. First, the models for each view are trained
independently. Second, we estimate the α parameters of the meta-model
by regularized linear regression (ridge regression),
to address potential issues of multicollinearity of the view-specific model predictions.
The regularization parameter is determined automatically [34]. The performance of the
meta-model is estimated by a 10-fold cross validation.
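The second training step can be sketched as a ridge regression of the target on the predictions of the view-specific models. This is a minimal sketch, not the implementation used in the paper: the view predictions are given as toy lists rather than produced by trained models, the regularization strength `lam` is fixed by hand instead of being determined automatically, the intercept is left unpenalized, and all function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_ridge(view_preds, y, lam):
    """Fit y ~ intercept + sum_v alpha_v * view_preds[v]; returns (intercept, alphas)."""
    n, k = len(y), len(view_preds)
    means = [sum(p) / n for p in view_preds]
    y_mean = sum(y) / n
    # Center predictors and target so that the intercept is not penalized.
    xc = [[p[u] - means[v] for u in range(n)] for v, p in enumerate(view_preds)]
    yc = [t - y_mean for t in y]
    gram = [[sum(xc[p][u] * xc[q][u] for u in range(n)) + (lam if p == q else 0.0)
             for q in range(k)] for p in range(k)]
    rhs = [sum(xc[p][u] * yc[u] for u in range(n)) for p in range(k)]
    alphas = solve(gram, rhs)
    intercept = y_mean - sum(al * mu for al, mu in zip(alphas, means))
    return intercept, alphas

# Toy predictions of two view-specific models for five spatial units.
intra = [0.0, 1.0, 2.0, 3.0, 4.0]
para = [1.0, 0.0, 2.0, 1.0, 3.0]
y = [1.0 + 0.7 * a + 0.3 * b for a, b in zip(intra, para)]
intercept, alphas = fit_ridge([intra, para], y, lam=0.1)
```

With `lam` set to zero the fit recovers the coefficients used to generate the toy target; increasing `lam` shrinks them, which is what stabilizes the estimate when the view predictions are strongly correlated.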
In terms of the choice of algorithm for training models, MISTy is a general framework
and can construct models for the functions F with any algorithm that fulfills two require-
ments. First, the algorithm should construct ensemble models, with constituents trained
on a bootstrap sample (bag) from the data. Second, they should be or consist of explainable
models. The first criterion guarantees the unbiased use of the measurements in
both steps of model training. The predictions of the constituents of an ensemble model
can be made on portions of the data (out-of-bag) that were not used for their training.
The second criterion means that a global explanation of the model or the feature importances
can be obtained post hoc from the trained models. As proof of principle, in the
implementation used in this manuscript, we consider F to be Random Forest [35] with
100 full, unpruned decision trees with (rounded) square root of the number of variables
selected at every split. Random forests are well known, robust, and flexible models fit-
ting the two criteria outlined above and have been shown to achieve good performance
in various application areas. The feature importances for Random Forest models can be
explained by the total reduction of variance achieved as the result of splitting by each of
the variables in all constituent trees.
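To make this notion of importance concrete, the following minimal sketch computes the reduction of variance achieved by a single split on one variable. In a Random Forest, such reductions would be accumulated over all splits on that variable in all constituent trees. The function names, the thresholds, and the toy data are hypothetical and are not part of the MISTy implementation.

```python
def variance(values):
    """Population variance of a non-empty list."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def variance_reduction(x, y, threshold):
    """Weighted decrease in the variance of y when splitting on x <= threshold."""
    left = [t for v, t in zip(x, y) if v <= threshold]
    right = [t for v, t in zip(x, y) if v > threshold]
    if not left or not right:
        return 0.0
    n = len(y)
    return (variance(y)
            - (len(left) / n) * variance(left)
            - (len(right) / n) * variance(right))

target = [1.0, 1.0, 5.0, 5.0]
informative = [0.1, 0.2, 0.8, 0.9]   # separates low from high target values
noise = [0.5, 0.1, 0.4, 0.2]         # unrelated to the target

gain_informative = variance_reduction(informative, target, 0.5)
gain_noise = variance_reduction(noise, target, 0.3)
```

A predictor that separates the target values well removes most of the variance with one split, whereas an unrelated predictor removes little or none, so it accumulates a low importance.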
At the first level, the meta-model can be interpreted in two different ways. First, it
answers the question of how much the intercellular views improve the prediction of the
expression in addition to the intracellular view. This can be achieved by comparing the
predictive performance of a single intracellular view vs. all views combined in a meta-model.
Second, by comparing the values of the fusion parameters, we can investigate
how much the individual views contribute to explaining the marker expression that led
to the aforementioned improvement in predictive performance (Fig. 1A).
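The first interpretation amounts to comparing the variance explained by the intraview alone with the variance explained by the multiview model. A minimal sketch of that comparison follows; the predictions are invented toy values and the variable names are hypothetical.

```python
def r2(y_true, y_pred):
    """Coefficient of determination (fraction of variance explained)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y = [1.0, 2.0, 3.0, 4.0]
intra_pred = [1.5, 1.5, 3.5, 3.5]   # toy predictions, intraview only
multi_pred = [1.0, 2.0, 3.5, 3.5]   # toy predictions, all views combined

r2_intra = r2(y, intra_pred)
r2_multi = r2(y, multi_pred)
gain_pp = 100.0 * (r2_multi - r2_intra)   # gain in percentage points
```

Note that a model that always predicts the mean of the target has an R² of zero under this definition, so the same function also covers comparisons against an intercept-only baseline.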
At a second level, given this information, we can further analyze the feature importances.
For each target marker, we can inspect each view-specific model and analyze how
important the contribution of each marker in that view is to the prediction of the expression
of the target marker (Fig. 1B). Thus, we estimate the interactions among the markers
from the individual marker and view-specific models. However, for every marker, the
statistical significance of the contribution of the view-specific models in the meta-model
is explicitly taken into account when calculating the importances (see “Importance
weighting and result aggregation”). These importances correspond to a potential relationship
between the predictor and the target marker in the specific spatial or other context
modeled by the corresponding view. MISTy outputs the estimated importances of significant
marker relations. Since these relations are based on the importance of a marker in
predicting the target, they cannot be assumed to be directly causal or directional. The
relations between markers may occur through a network of intermediate interactions
in the specific biological context, which can be further explored by enrichment of these
relationships using curated databases of intra- and intercellular interactions (Fig. 1C).
Finally, if multiple samples are available during the analysis, the relationships from indi-
vidual samples are aggregated to produce robust results (see “Methods”). By aggregation,
we accentuate consistently inferred interactions from individual samples and reduce
the number of false positive interactions. We show a more detailed visual overview of
MISTy in Additional file1: Fig. S1.
e interpretation of the estimated relationships (interactions) is dependent on the
view composition and the biological contexts of the available markers. As we show in
this paper, MISTy can capture (i) a structural relationship, such as the spatial organiza-
tion of cell types based on cell type identities or the expression of cells based on cell type
markers, and (ii) a functional relationship between the markers, such as aspects of regu-
latory programs or communication-driven interactions. MISTy is designed as a method
for efficient data exploration and robust hypothesis generation.
In silico performance
Recovering structural relationships in in silico generated tissues
We first assess the ability of MISTy to recover purely structural relationships decoupled
from the influence of functional relationships. To this end, we generated three in silico
tissues with specified spatial interactions between four cell types [36]. The number of
cells belonging to each of the cell types is approximately equal, to remove the potential
confounding effect of abundance. Cells in the generated in silico tissues are arranged in
space such that the different cell types exhibit different patterns (Fig. 2A). In Tissue 1,
the cell types do not show any preference to any other cell type, i.e., they are arranged
randomly. is is a control tissue, where MISTy is expected to find no relationships from
the spatial context. In Tissue 2, cell type 1 (ct1) exhibits preference to co-localize with
itself (self-preference), while the other cell types do not have any preferences. In Tissue
3, cell type 1 and cell type 3 show mutual preference, while the other cell types have no
preferences.
For each spatial pattern generated, we simulated 100 gene expression markers to cre-
ate a synthetic dataset (Methods). Of the 100 markers, the distributions of 75 markers
were distinguishable between cell types (“informative markers”). The simulated expression
of the remaining 25 markers did not differ between cell types (“uninformative
markers”). To our knowledge, this represents the most comprehensive in silico tissue model
to simulate spatial interactions with continuous cell type markers.
Fig. 2 Recovery of structural relationships in the in silico tissues. A Voronoi diagram representation of the
generated in silico tissue structures: Tissue 1—random structure, Tissue 2—structure with self‑preference of
a single cell type and Tissue 3—structure with mutual preferences of two cell types. B Amount of variance
explained (percentage points) of the identity of the cell type when taking into account the information about
the distribution of cell types in the immediate neighborhood. C Estimated importance of the relationships
from the cell type distribution in the immediate neighborhood. Tissue 1 is missing as there was no
information captured about the structure of the random tissue. Some of the target cell types are missing as
the heatmaps contain only those targets with gain of variance explained of more than 5% and importances
larger than 1 (one standard deviation above the mean importance of all predictors for that target). D
Estimated importances of the expression of genes in Tissue 3 coming from the immediate neighborhood
as predictors of the expression of the target gene. Shown are target genes with variance explained above
4%; highlighted are the interactions with estimated importance of 2 and higher. The genes shown in bold
are either top 10 markers of ct0 or ct2. There is only one gene (gene_33) that is in the top 10 markers of the
other two cell types. E Distributions of the estimated importance of the predictor‑target relationship from
the juxtaview of the top 10 markers of the individual cell types to the markers of cell types 1 (above) and 3
(below) in Tissue 3
This model only considers cell type preferences in the immediate neighborhood.
Accordingly, we created view compositions in the MISTy workflows consisting of intraview
and juxtaview only. The juxtaview threshold was set to the 75th percentile of all
Euclidean distances between first neighbors to simulate potential errors of determining
the correct threshold when applied to real data.
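One way to derive such a threshold is sketched below. The sketch reads "first neighbor" as the single nearest cell, which is an assumption, and uses a linear-interpolation percentile; the helper names and coordinates are hypothetical.

```python
import math

def nearest_neighbor_distances(coords):
    """Euclidean distance from each cell to its closest other cell."""
    return [min(math.dist(c, o) for j, o in enumerate(coords) if j != i)
            for i, c in enumerate(coords)]

def percentile(values, q):
    """Percentile with linear interpolation between order statistics, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

# Five cells placed on a line at increasing spacing (toy coordinates).
coords = [(0, 0), (1, 0), (3, 0), (6, 0), (10, 0)]
nn = nearest_neighbor_distances(coords)
threshold = percentile(nn, 75)
```

Because the threshold sits below the largest first-neighbor distances, some cells end up with no neighbors in the juxtaview, which mimics an imperfectly chosen threshold on real data.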
We considered two types of workflows: (i) The first workflow uses only information
about the cell type identity and focuses on reconstructing directly the cell type composition
of the tissue. (ii) The second workflow uses only information about the expression of
the cell type markers and focuses on reconstructing the composition of the tissue based
on the interaction of gene markers without the information about the actual cell types.
For the first workflow, the identity of the cells is captured in the intraview by one-hot
encoding. In particular, each cell is described by a vector of length of the total number of
cell types (4), where all values are equal to zero except for the value of the variable representing
the type that the cell belongs to, which is set to one. The juxtaview then captures
the distribution (total number) of the different cell types in the immediate neighborhood
of each cell. For the second workflow, the intraview for each cell is represented by the
expression of the 100 marker genes. The juxtaview in this workflow captures the total
expression of all marker genes in the immediate neighborhood of each cell.
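The view composition of the first workflow can be sketched as follows: a one-hot encoded intraview and a juxtaview that counts the cell types among the neighbors of each cell. Neighbors are again selected by a simple distance threshold, which is an assumption of this sketch, and the function names and toy tissue are hypothetical.

```python
import math

def one_hot(cell_types, n_types):
    """Encode each cell type label as a 0/1 vector of length n_types."""
    return [[1 if t == k else 0 for k in range(n_types)] for t in cell_types]

def neighbor_type_counts(coords, cell_types, n_types, threshold):
    """Count, for every cell, the cell types of all other cells within `threshold`."""
    encoded = one_hot(cell_types, n_types)
    counts = []
    for i, ci in enumerate(coords):
        row = [0] * n_types
        for j, cj in enumerate(coords):
            if i != j and math.dist(ci, cj) <= threshold:
                row = [r + e for r, e in zip(row, encoded[j])]
        counts.append(row)
    return counts

# Four cells with type labels 0..3 (toy tissue).
coords = [(0, 0), (1, 0), (0, 1), (5, 5)]
cell_types = [0, 1, 1, 3]
intra = one_hot(cell_types, 4)
juxta = neighbor_type_counts(coords, cell_types, 4, threshold=1.1)
```

Replacing the one-hot vectors with expression vectors of the marker genes in the same summation gives the juxtaview of the second workflow.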
By modeling the interactions coming from the intraview, we would capture trivial
results of prediction of identity by exclusion or co-occurrence of a large number of
unique cell-type markers. While the target expressions to be modeled remain in the
intraview, to avoid the aforementioned issues and to allow for modeling of self-prefer-
ences, we excluded the intraview-specific model from the meta-model in these work-
flows. As a result, the baseline to compare the multiview model is a model with an
intercept only, i.e., a model that always predicts the mean value of the target variable.
When we applied the first workflow, since the structure of Tissue 1 is random, MISTy
did not capture any information (Fig. 2B). For Tissue 2 and Tissue 3, we observed a noticeable
increase of variance explained for ct1 and the pair ct1 and ct3, respectively. The
estimated interactions for Tissues 2 and 3 (Fig. 2C) uncover the true preferences in the
tissue structure.
When we applied the second workflow to Tissue 3, we obtained the estimated interac-
tions of the markers with high importance in the immediate neighborhood as captured
by the juxtaview (Fig.2D). For each cell type, we first ordered the gene markers by the
absolute value of the difference in the mean of expression to other cell types (differen-
tially expressed markers). We took the top 10 markers for each cell type as representative
markers of that cell type. We compared the distribution of the importances of the rep-
resentative markers per cell type as predictors of the representative markers of a target
cell type. e mutual preference of cell types 1 and 3 is captured unambiguously by the
distribution of the importances of their respective markers as predictors with significant
importance (Fig.2E).
In summary, both workflows converge on the same results—interaction between cell
types 1 and 3—yet the second workflow provides a much more detailed view of the indi-
vidual markers involved with the caveat of added complexity. With the two workflows,
we demonstrated that MISTy is able to reconstruct the structural relationships based on
annotated cell types and by the expression of cell type marker genes independently.
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Page 9 of 31
Tanevskietal. Genome Biology (2022) 23:97
Of note, in a workflow similar to the first one, several related approaches, for example
univariate spatial pattern identification and immediate neighborhood analysis methods,
would be able to capture the self-preference structure of Tissue 2. Only the latter group
of methods would be able to capture the true structure of Tissue 3. In a workflow simi-
lar to the second one, only immediate neighborhood analysis methods would be able
to detect marker interactions. Some of them would require preprocessing of the data
(such as clustering) before they can be applied. In the following, we introduce an in silico
experiment that these groups of methods cannot be applied to.
Recovering functional relationships from in silico generated mechanistic interaction networks
We next assessed the performance of MISTy to reconstruct functional relationships
in in silico intra- and intercellular interaction networks, decoupled from the influence
of structural relationships. To estimate the robustness of MISTy to infer mechanistic
molecular interactions, we created a tissue simulator that can mimic the interactions of
different cell types through ligand binding and subsequent signaling events (Fig. 3A; see
“Methods”) and simulated two tissue samples. The dynamic model simulates the production,
diffusion, degradation, and interactions of 29 molecular species including 5
ligands, 5 receptors, and 19 intracellular signaling proteins (see “Methods”). The model
considers four cell types (Additional file 1: Fig. S2) arranged on a two-dimensional lattice,
where ligands diffuse and activate cells. The simulated values for every molecular
species (except ligands) at every location are recorded and these images are passed as
input to MISTy (Additional file 1: Fig. S3).
We compared two scenarios: one with no information on the cell types, in which all
the measured cells are treated equivalently, and another in which cell type information
is considered and a MISTy model is built for each cell type. The MISTy
workflow consists of two views, the intracellular view (intraview) and the broader tissue
structure view (paraview). The intraview for each cell is represented by the expression
of the molecular species. The paraview captures the weighted expression of all molecular
species in the broader tissue structure with a radius of significance of 10.
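A paraview of this kind can be sketched as a distance-weighted sum of the expression of all other cells. The sketch below is an assumption-laden illustration: it uses a Gaussian-type weight exp(-d^2 / l^2) with l as the radius of significance, which is one plausible choice and is not confirmed by the text above; the function name and the `zoi` parameter (a cutoff below which cells are ignored) are hypothetical.

```python
import numpy as np

def paraview(coords, expr, l=10.0, zoi=0.0):
    """Distance-weighted sum of the expression of all other cells.

    Assumptions: weights follow exp(-d^2 / l^2); cells closer than or at
    distance `zoi` (and the cell itself) contribute nothing.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    weights = np.exp(-(dist ** 2) / (l ** 2))
    weights[dist <= zoi] = 0.0
    np.fill_diagonal(weights, 0.0)
    return weights @ expr

# Hypothetical toy data: two cells at distance 10, one measured species.
coords = np.array([[0.0, 0.0], [10.0, 0.0]])
expr = np.array([[1.0], [2.0]])
para = paraview(coords, expr, l=10.0)
```

With these toy values each cell receives the expression of the other cell scaled by exp(-1); setting `zoi=10.0` removes that contribution entirely.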
Overall, the intraview alone (Additional file 1: Fig. S4) explains a large amount of variance
of the nodes that appear only in the intracellular space and that are expressed and
regulated by other intracellular nodes. The paraview module increases the model accuracy
mostly for receptors, which are activated by diffusing molecules in the intercellular
space (Fig. 3B). For example, the increase in explained variance was the largest for R3,
R4, and R5, which are the receptors that are expressed in cell type 1 (Sup. Fig. 2). We
obtained similar results for all other cell types (Sup. Fig. 3 and Sup. Fig. 4). When we
compare the predictive performance of this model to a model with a single intraview, we
see the highest improvement in predictive performance for the expressed receptors (R3,
R4, R5) in cell type 1 (Fig. 3C). Markers that were not affected by environmental interactions
showed, as expected, no improvement in the paraview. It is also clear that when cell
type information is considered, the model explains more variance of the targets (Fig. 3B)
and the paraview contributions are generally higher (Fig. 3C).
MISTy derives an importance score for each pair of markers (see “Methods”). Using
this score, we can infer intracellular and intercellular molecular interactions. To test
this, we evaluated the performance of MISTy to recover interactions among markers.
Fig. 3 Evaluating MISTy on mechanistic in silico data. A MISTy was evaluated on the task of reconstruction
of simulated interaction networks. Models of intra‑ and intercellular interactions of four different cell types
(cell type specific intracellular networks are shown in Sup. Fig. 2), arranged on a grid representing a tissue,
were used to simulate measurements of 29 molecular species. We considered two pipelines, (1) in which cell
type information is available and (2) where cell types are not considered. B Increase in explained variance by
adding the paraview contribution to the intraview model. Only variables with positive paraview contribution
are shown. C Contribution of each view to the prediction of the marker expressions in the meta‑model.
The stacked barplot represents normalized values of the fusion coefficients of the respective views for each
marker. D Receiver operating characteristic (ROC) depicting the aggregate performance of MISTy on the
samples for the intraview and paraview, for the two cases with and without cell type information. The dashed
lines represent the expected performance of an uninformed classifier, the gray iso‑lines represent points in
ROC space with informedness (Youden’s J statistic) equal to 0.1, 0.2, 0.5, and 0.8. E Predicted importance of
the interactions for the intraview and paraview models for the case with cell type information (for cell type
1) together with the direct interactions from the in silico model (red crosses). Some targets had very low
variance and were therefore filtered out
First, we defined the ground truth interactions from the in silico model. The 24 measured
molecular species in the model (all species except the 5 ligands) give rise to
24 × 23 = 552 potentially interacting pairs. We considered an
intracellular interaction correct if there is a direct interaction between the markers in
the in silico model’s networks (Sup. Fig. 2). Further, an intercellular interaction between
two markers is correct if one marker is directly responsible for the production of a ligand
and the other marker is activated by the same ligand. For example, X14 produces ligand
L1 and L1 activates receptor R1; thus, X14 -> R1 is considered a real intercellular
interaction. Between the two samples, we observed small variance in MISTy’s performance,
in both the area under the receiver operating characteristic curve (AUROC)
and the area under the precision-recall curve (AUPRC) (Additional file 1: Fig. S5). We
aggregated the results from both samples (see “Methods” section for details) and calculated
the performance for cell type 1 (Fig. 3D) and all other cell types (Additional
file 1: Fig. S5). The average AUROCs across the four cell types are 0.851 and 0.715 for
the intraview and paraview, respectively, which substantially exceed the performance of a
random classifier (AUROC_random of 0.5). Further, the method also outperformed a random classifier
with respect to the AUPRC: the obtained AUPRC for the four cell types ranged between
0.581 and 0.737 for the intraview (number of true interactions 34–40; AUPRC_intra,random
0.062–0.065) and between 0.022 and 0.053 for the paraview (number of true interactions
10–16; AUPRC_para,random 0.018–0.025). The low AUPRC baseline is due to the sparsity of
true intercellular interactions in contrast to the total number of interactions between the
cells. The sparsity of these interactions, which is also inherent in real biological systems,
adds high complexity to the task of reconstruction of the direct connections. In summary,
MISTy is able to reliably extract interactions in the in silico case study.
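The random baselines quoted above follow from the sparsity of true interactions: the expected AUPRC of an uninformed classifier equals the fraction of candidate pairs that are true, while its expected AUROC is 0.5. The sketch below illustrates this with hypothetical helper names and invented toy scores; it is not the evaluation code used in the study.

```python
import numpy as np

def auroc(scores, labels):
    """Probability that a true pair is scored above a false pair (ties count 1/2)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def random_auprc(n_true, n_candidates):
    """Expected AUPRC of an uninformed classifier: the prevalence of true pairs."""
    return n_true / n_candidates

n_species = 24                          # measured species (ligands excluded)
n_pairs = n_species * (n_species - 1)   # ordered predictor-target pairs

# Hypothetical toy importances for four candidate pairs.
toy_scores = np.array([0.9, 0.8, 0.3, 0.1])
toy_labels = np.array([1, 0, 1, 0])
toy_auroc = auroc(toy_scores, toy_labels)
```

For example, 34 true intraview interactions among 552 pairs give a baseline of about 0.062, and 10 true paraview interactions give about 0.018.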
In particular, MISTy accurately captured the downstream intracellular signaling cascades
of receptors (Fig. 3E, left). Most of the false positive interactions are the results of
inferring indirect or higher-order interactions, while false negative hits are likely because
of the lack of perturbation: for example, node X10 in cell type 1 has no incoming edge
(Sup. Fig. 2), which results in a slowly decaying value in simulation. Finding the interaction
partner of these types of nodes would be challenging or rather impossible for any
data-driven inference method. Finding the mechanistic intercellular interactions is particularly
challenging because we are looking for 10–16 real interactions among the 552
possible interactions. Most of the interactions found by MISTy correctly involve receptors
(Fig. 3E, right); however, we found higher false positive rates.
With these workflows, we demonstrated the extent to which MISTy is able to reconstruct
mechanistic relationships, both in general and focused on a cell type of interest, in a
complex and fully observable system. These results also outline the limitations of the
approach, such as those caused by the presence of confounders and indirect interactions, whose effect
becomes more prominent when reconstructing mechanistic intercellular relationships.
Unfortunately, the performance cannot be directly compared to that of other related
approaches, because they cannot infer interactions without additional sources of
information. The comparison with the most related approach, SVCA, is limited and at
best only qualitative at the level of estimated contributions of the fixed views provided
by SVCA. SVCA does not provide information about the potential interactions that
explain the estimated contributions. Note also that the computational resources needed
to construct models even on the in silico data by SVCA are orders of magnitude larger
than those needed by MISTy. For a single layout with 4000 locations, on a standard configuration
using 4 processor cores, SVCA took 24 h of computation time and 2.5 GB of
memory, while MISTy took 18 s of computation time (4800-fold decrease) and 900 MB
of memory (2.5-fold decrease).
Application toimaging mass cytometry breast cancer datasets
Analyzing theimportance ofthetissue structure
As a first real-data case study, we applied MISTy to an Imaging Mass Cytometry data-
set consisting of 46 samples of breast cancer across three tumor grades coming from
26 patients, with measurements of 26 protein markers [28]. We processed each sam-
ple independently, with 944 cells on average per sample, or 43,434 single cells in total.
We designed the exploratory MISTy workflow for this task to include three different
views capturing different spatial contexts and providing a foundation for comparison
with SVCA: In addition to the intraview, we considered creating views by aggregating
the available spatial and expression information in two ways. We created a view that
describes the local cellular niche (juxtaview) and a view that describes the broader tissue
structure (paraview). In order to avoid ambiguity, we set the zone of indifference for the
paraview to the cutoff threshold for the juxtaview. In this way, there is no overlap in the
information captured by the juxtaview and the paraview. The following results illustrate
the importance of the various sources of spatial information and how MISTy can recapitulate
previous findings without the need for single-cell clustering and cell type annotation
using prior knowledge [28].
We aggregated the MISTy results from all samples and we found that the multiview
model resulted in significant improvements in the absolute value of variance explained
of up to 20.1% over using the intraview alone, which accounted for an average of 31.8%
of overall variance explained across all markers (Additional file 1: Fig. S6A). This is consistent
with results obtained with SVCA on the same data [32]. The highest improvement
was detected for the markers pS6 (4.63% ± 4.99), CREB (4.07% ± 3.08), and SMA
(3.83% ± 3.19) (Fig. 4A). This is expected since these three markers have distinct spatial
distributions: pS6 represents “active” stroma present in distinct regions of the
tumor microenvironment; SMA represents smooth muscle actin, which is expressed
in ductal structures and blood vessels; and CREB is a transcription factor commonly
overexpressed and activated in tumor regions. The highest change in variance explained
(20.1%) in a single sample was observed for Erk12. All top ranked markers by improvement
found by MISTy are consistent with the highest improvement due to environmental
effect in the results of SVCA.
We next analyzed the contribution of each view to the prediction of the multiview
model (Fig.4B). With MISTy, unlike SVCA, we were able to dissect the effect of the jux-
taview and paraview. We find that a significant contribution (higher value of the fusion
parameter in the meta-model) comes from the paraview compared to the juxtaview.
is suggests a stronger effect from the broader tissue structure than from the immedi-
ate neighbors. e mean fraction of contribution to the prediction of the intraview was
69.5%, of the juxtaview 5.3%, and of the paraview 25.1%.
To investigate the importance of tissue structure for the modeling of spatially
resolved single cell data, we performed a spatial permutation-based analysis and
Fig. 4 R2 signature and permutation analysis of IMC data from 46 breast cancer samples. A Imaging mass
cytometry example image from a breast cancer sample (HH3 in blue, CD68 in gray, E. cadherin in red and
Vimentin in green) and improvement in the predictive performance (variance explained) for all samples
when considering multiple views in contrast to a single, intraview (in absolute percentage points). B The
relative contribution of each view to the prediction of the expression of the markers. C Distribution of
improvement in variance explained when considering multiple views in contrast to a single intraview across
all markers and samples with original cell locations and 10 random permutations. The p‑value is calculated
by a one‑sided Wilcoxon rank‑sum test. D Distribution of the relative contribution of the intraview, juxtaview,
and the paraview to the prediction of the markers across all markers and samples with original cell locations
and 10 random permutations. The p‑values are calculated by a one‑sided Wilcoxon rank‑sum test. E First
two principal components of the R2 signature of the samples colored by grade and clinical subtype, and the
importance of the variables of the signature in the principal component analysis. The naming of the variables
is in the form Marker_Measure. The measures taken into account are variance explained by the intraview only
(intra. R2), total variance explained by the multiview model (multi. R2), and the gain in variance explained
(gain. R2)
compared the results obtained by MISTy. The coordinates of each cell in each sample
were permuted 10 times. Subsequently, we ran the aforementioned MISTy workflow
on the resulting 10 new datasets. The mean gain in variance explained for the
permuted data across all samples and markers was 0.5% with 49.8% of values of the
gain of variance explained less than or equal to 0 (Fig. 4C). The availability of true tissue
structure improves the performance of the model significantly. The estimated contribution
of the juxtaview and paraview for the permuted datasets was much lower than
for the original dataset, and often nearly absent. In addition, there were significantly
higher contributions of the intraview than for the original dataset (Fig. 4D, Additional
file 1: Fig. S6B). For the permuted dataset, the mean baseline variance explained over
all samples and markers by using only the intraview was found to be consistent, i.e.,
remained the same as for the original dataset (31.8%).
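The permutation control can be sketched as a reshuffling of the cell coordinates within a sample while the expression of each cell is left untouched. The helper name, the seed handling, and the toy coordinates below are hypothetical.

```python
import numpy as np

def permute_coordinates(coords, n_permutations=10, seed=0):
    """Return `n_permutations` copies of `coords` with rows randomly reassigned.

    The expression matrix keeps its row order, so each cell retains its own
    expression profile but receives the location of another cell.
    """
    rng = np.random.default_rng(seed)
    return [coords[rng.permutation(len(coords))] for _ in range(n_permutations)]

# Hypothetical toy sample with four cell locations.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 4.0]])
permuted = permute_coordinates(coords)
```

Because only the locations are shuffled, the intraview is unchanged by construction, which matches the observation that the intraview-only variance explained stays the same for the permuted data.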
Subsequently, we analyzed our results by the spatial variance signature (R2 signa-
ture) of each sample. We defined the R2 signature of the MISTy results for each sam-
ple by concatenating the estimated values of the variance explained using only the
intraview, the variance explained by the multiview model, and the gain in variance
explained for each marker. Note that the signature relates only to the results produced
by MISTy for each sample and can capture different aspects of them. In the
case of the R2 signature, it captures the performance achieved by MISTy per target for each
sample. These signatures are not related to a signature composed of biological markers for
the samples and thus do not provide any insights into specific marker relationships.
Here, we use the R2 signature representation to group the samples by similarity of the
results. The use of the R2 signature allows us to compare the results of MISTy to those of SVCA as
reported in the manuscript describing SVCA.
The signature vector for each sample in this dataset has at most 78 dimensions
(26 markers × 3 measures), when using the information for all markers.
From our signature vectors and in the following analyses, we removed the results
for the markers that have less than 2% of gain in variance explained.
This resulted in signature vectors of length 27 (9 markers × 3 measures).
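The assembly of the R2 signature and the subsequent PCA can be sketched as follows. This is an illustration under stated assumptions: the 2% filter is applied here to the mean gain of each marker across samples (the text does not specify the exact aggregation), PCA is computed through a singular value decomposition of the centered signatures, and all names and toy values are hypothetical.

```python
import numpy as np

def r2_signatures(intra_r2, multi_r2, min_gain=2.0):
    """Concatenate intraview R2, multiview R2, and gain for the retained markers.

    intra_r2, multi_r2: (n_samples, n_markers) arrays in percent.
    Assumption: a marker is kept if its mean gain across samples >= min_gain.
    """
    gain = multi_r2 - intra_r2
    keep = gain.mean(axis=0) >= min_gain
    signature = np.concatenate(
        [intra_r2[:, keep], multi_r2[:, keep], gain[:, keep]], axis=1)
    return signature, keep

def explained_variance_ratio(signatures):
    """Fraction of sample variance captured by each principal component."""
    centered = signatures - signatures.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2
    return variance / variance.sum()

# Hypothetical toy results: 2 samples x 3 markers (percent variance explained).
intra = np.array([[30.0, 40.0, 50.0], [32.0, 38.0, 50.0]])
multi = np.array([[35.0, 41.0, 50.0], [36.0, 39.0, 51.0]])
signature, kept = r2_signatures(intra, multi)
ratio = explained_variance_ratio(signature)
```

With 26 markers and 3 measures the unfiltered signature has 26 × 3 = 78 entries; keeping 9 markers gives 9 × 3 = 27.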
Using the first two components of the principal component analysis (PCA) of the
R2 signature, we identified a weak but visible structure in the samples driven by the
tumor-grade and clinical subtypes, which is consistent with the findings of SVCA
(Fig.4E). e two first principal components of the R2 signatures of MISTy captured
50.8% of the variance of the samples compared to 30% with the spatial variance signa-
ture of SVCA. Inspecting the importance of the R2 signature components for the PCA
analysis (Fig.4E), we observed that the structure of the results can be explained by the
gain in variance explained, which points again to the relevance of the spatial compo-
nent of the data. In particular, the gain for markers CD68, ki67, and SMA were found
to be the highest, suggesting that proliferation, presence, or absence of CD68 and
changes in vascularization in different grades and clinical subtypes are significantly
affected by the change in regulation as a result of intercellular interactions. Collec-
tively, these results support the importance of the tissue structure for the expression
of proteins at the single-cell level and overall overlap with results from SVCA and the
initial performed single-cell analysis.
Since the output of SVCA only includes fractional contributions of fixed views, com-
parison beyond this point is not possible. As shown in the following, MISTy significantly
extends the scope of possible analyses to be performed on the data.
While the R2 signature allows us to analyze the differences between the samples
based on predictive performance only, more insights into the relationships between
the markers can be obtained by a more detailed signature at the level of the estimated
importances. e importance signature is generated by concatenating the estimated
importance for each predictor-target marker pair from all views. e aggregated impor-
tances are weighted by the estimated relevance of the results (see “Methods”). In this
case, we created a 26-marker × 26-marker vector for 3 views (2028 dimensions). Same
as before, we removed the results of the importance of the target markers that have less
than 2% of gain in variance explained. is resulted in signature vectors of length 702 (9
target markers × 26 predictor markers × 3 views). e signature vector for each sample
is, therefore, still large but more informative and focused on interactions. e structure
in the results, driven by the tumor grade, can also be observed when visualizing the first
two principal components of the importance signature (Fig.5A). Due to richer informa-
tion, they account for less (16%) of the variance of the samples compared to the R2 signa-
ture. By inspection of the importance of the signature components, we observed that in
the two first principal components the most significant interactions that can account for
the observed structure and differences among the samples come from the broader tissue
view.
To confirm that the observed structure of the samples results from accounting for
the spatial component of the data and is not simply a result of the
intrinsic expression of the markers, we performed PCA on the samples as represented by
the mean expression of the markers across all cells. While the separation of the samples
by grade is observable when visualizing the first two principal components (accounting
for 63.9% of the sample variance), the importance of the markers that account for
this separation is more uniform and different from the components of the R2 signature
(Additional file 1: Fig. S6C). More importantly, with the R2 and importance signatures,
we were able to identify a clearer and more informative relationship between the availability
of information coming from the different spatial contexts and tumor progression.
This information can then be used to focus on exploratory and comparative analysis of
more homogeneous groups, with lower variance of performance among samples.
Highlighting intergroup differences
By grouping the samples by tumor grade, we further analyzed the robust intercellular
features of tumor samples. Since only a small number of samples came from grade 2
tumors, we considered only grade 1 and grade 3 tumor samples. In grade 1 samples,
we observed the highest gain of variance explained for markers Cytokeratin 7 (4.92% ±
4.43), SMA (3.96% ± 3.09), and CREB (3.8% ± 2.31). In grade 3 samples, we observed
the highest gain of variance explained for pS6 (6.95% ± 5.09), SMA (4.43% ± 3.39), and
CREB (4.26% ± 3.52).
We further compared the aggregated results by contrasting the important interactions
between the same views intragroup and intergroup. Due to the higher overall contribution
of the paraview compared to the juxtaview (Fig. 4B), we analyzed the important
interactions that can be extracted from the paraview model and that were not found to be
significant for the intraview model, i.e., interactions that come only from the context
of the broader tissue structure. In grade 1 samples, we observed predictor markers
with high importance for many target markers: the transcription factor markers CREB
Fig. 5 Importance signature and contrasts of IMC data from 46 breast cancer samples. A First two principal
components of the importance signature of the samples colored by grade and clinical subtype, and
importance of the variables of the signature in the principal component analysis (10 variables with the
highest square cosine shown). The naming of the variables is in the form View_Predictor_Target, representing
the estimated importance of the interaction between the predictor and target markers for the specific spatial
context (view). B Intragroup contrast of importances of marker expression as predictors of the expression
of each target marker between the intraview and paraview for grade 1 samples and between the intraview
and paraview for grade 3 samples. C Change of the total number of estimated important interactions per
grade (Importance 0.5). D Intergroup contrast of importances of marker expression as predictors of the
expression of each target marker for the intraview and for the paraview between grade 1 and grade 3 samples
and GATA3, the immune cell marker CD68, and the myoepithelial marker SMA. In
grade 3 samples, we observed a pronounced decline of the number of important interactions
coming from the paraview compared to grade 1 samples, with the most important
predictor markers being Cytokeratin 7 and SMA (Fig. 5B). This likely represents the
loss of luminal cell types (Cytokeratin 7) interacting with myoepithelial cells (SMA) in
ducts and alveoli, leading to the loss of normal tissue architecture in grade 3 samples. In
other words, normal tissues are highly structured, and the underlying tissue structure is
critical to perform tissue relevant functions. Advanced tumors create tissue structures
that are dominated by tumor cells and thus are not as dependent on cellular crosstalk
and organization.
The loss of signaling during tumor progression is apparent when comparing the results
view by view for the different tumor grades. The intergroup and view-focused contrasts
outline the interactions that were estimated as important in grade 1 samples and not
important in grade 3 samples. While we observed a loss of a number of important interactions
from the intraview, the loss of important interactions from the paraview is higher
(Fig. 5C, D).
Linking estimated interactions to clinical features
To highlight the ability to associate MISTy results with clinically relevant features, we
analyzed a breast cancer imaging mass cytometry dataset with outcome data, based on
415 samples from 352 patients (see “Methods” section for sample selection) [37]. As with
the previous dataset, we processed each sample with MISTy independently and used the
exploratory MISTy workflow with three views capturing different spatial contexts: intra-
view, juxtaview, and paraview.
The number of important interactions, as with the previous breast cancer dataset,
decreased with tumor progression based on grading (Fig. 6A) across all three views. The
highest median improvements were detected for markers similar to those in the previously
described breast cancer dataset, showing reproducibility across multiple sample
cohorts (Additional file 1: Fig. S7A and B). Visualization of the network communities
based on the estimated importance of the predictor-target pairs from the juxtaview for
grades 1 and 3 highlights the rewiring of the tumor microenvironment during breast
cancer progression. While CK14 and CK5 (green pair; top right corner) consistently
interact with each other, representing the basal and luminal cell compartment, immune
cells seem to increase their interaction with other immune cells (e.g., B cells (CD20+))
and with cells potentially undergoing epithelial-mesenchymal transition (EMT) (e.g.,
Twist+) (Fig. 6B). Next, we plotted the first two components of the PCA of the results
represented by their importance signatures to visualize how tumor grade (Fig. 6C) and
clinical subtypes (Fig. 6D) are distributed.
We decided to focus our further analysis on grade 3 tumors only, since grade 1
and 2 samples are mostly annotated with the HR+HER2- clinical subtype (96% and
81%, respectively) and the distribution of clinical subtypes for grade 3 samples is
more balanced (47.6% HR-, 52.4% HR+ (out of 185), with the HR- group containing
mostly the triple negative subtype (79% out of 88 samples), and in the HR+ group
58.7% out of 97 samples are HR+HER2-). We next asked whether there are specific
predictor-target interactions with high importance that could be linked to survival
overall and for the different clinical subtypes. The performance MISTy achieved
and the view contributions for the grade 3 samples specifically are shown in Additional
file 1: Fig. S7C and D. The predictor-target interactions that were estimated
as important and are specific to the juxtaview and paraview are shown in Additional
Fig. 6 MISTy signatures can uncover clinically relevant features in IMC data from 415 breast cancer samples.
A Change of the total number of estimated important interactions per grade (Importance 0.5). B Changes
in the tumor microenvironment can be visualized by network community plots representing the juxtaview
for tumor grades 1 and 3. For example, the green cluster represents a constant link in the juxtaview between
luminal‑ (CK8/18+) and basal‑like (CK5+) cell types across all tumor grades, while the yellow cell cluster
shows an increased interaction with tumor progression of immune cells (CD68+/CD45+), B cells (CD20+)
and T cells (CD3+) with cells potentially undergoing EMT (Twist+). C Importance signatures visualized as the
first two components of a PCA highlight the separation of grade and D clinical subtype. Kaplan‑Meier curves
and p‑values of a log rank test based on stratification by estimated importance of MISTy predictor‑target
interactions that were found to be correlated with the patient outcome: E CC3.cPARP and EGFR in the
intraview; F SMA and pHH3 in the juxtaview, and G Vimentin and EGFR in the paraview
file1: Fig. S7E and F respectively. Finally, the first two principal components of the
importance signature of the samples, where we did not see further strong grouping
according to clinical subtype.
To associate MISTy results with clinical outcome, we calculated the Spearman rank
correlation coefficients between the estimated importance of predictor-target pairs and
the overall survival in months, selecting only pairs for which at least 30% of the
importance values are positive. Samples from patients with multiple samples were treated
as independent. Next, we accounted for clinical subtypes by running the analysis on the
samples of each subtype independently. In the group of HR+HER2+ samples (n = 8), the
estimated importance of 24 predictor-target pairs (6 intraview, 12 juxtaview, 6 paraview)
is significantly correlated with overall survival (p < 0.05). In the group of HR+HER2−
samples (n = 20), the estimated importance of 51 predictor-target pairs (16 intraview,
14 juxtaview, 21 paraview) is significantly correlated with overall survival, and in the
group of HR−HER2+ samples (n = 11), the estimated importance of 39 predictor-target pairs
(7 intraview, 17 juxtaview, 15 paraview) is significantly correlated with overall
survival. Importantly, we recover, without the need for single-cell annotation, many
interactions that were previously shown to be linked to poor prognosis. For example, the
interaction between pan-cytokeratin and ER is among the strongest correlations.
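The filtering and correlation step described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function names, the example importances, and the survival times are hypothetical, and the significance test on the coefficient is omitted.

```python
def rank(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho: Pearson correlation of the rank-transformed values."""
    rx, ry = rank(x), rank(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def passes_filter(importances, min_positive_fraction=0.3):
    """Keep a pair only if enough of its importance values are positive."""
    positive = sum(1 for v in importances if v > 0)
    return positive / len(importances) >= min_positive_fraction


# Hypothetical importances of one predictor-target pair across six samples,
# with the overall survival (months) of the corresponding patients.
importance = [1.2, -0.3, 0.8, 2.1, -0.5, 0.1]
survival_months = [40, 90, 55, 20, 100, 70]

if passes_filter(importance):
    rho = spearman(importance, survival_months)
    print(f"Spearman rho = {rho:.2f}")
```

In this toy example, higher importance goes with shorter survival, so the coefficient is strongly negative.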
For our analysis, we focused specifically on the triple negative samples (n = 26),
since no biomarkers that could be linked to outcome are currently available for this
subtype. Here, the estimated importance of 64 predictor-target pairs (26 intraview,
18 juxtaview, 20 paraview) is significantly correlated with overall survival. As examples
for further analysis, we picked from the top predictor-target pairs correlated with
overall survival for each view, but we provide all results for further experimental
validation (Additional file 1: Table S2). We grouped the samples by the estimated
importance of the selected predictor-target interaction. If the estimated importance for
that interaction in a sample is larger than 0.5, we consider the sample to be in the
positive group; otherwise, we consider it to be in the negative group. We then plotted
the Kaplan-Meier curves and performed a log rank test to estimate the significance of the
difference in overall survival between the two groups. We found that cleaved caspase 3
and cPARP, which are both markers of cell death, when estimated to interact
(predictor-target) with EGFR in the intraview, are linked to worse overall survival
(Fig. 6E). In the juxtaview, we found that the absence of interaction between cells
expressing the myoepithelial marker SMA and pHH3, which marks cells in the cell cycle
(mitosis), is linked to worse overall survival (Fig. 6F). This could hint at the
importance of the distance to a blood vessel. As the last example, we found that
estimated interactions between stromal cells (Vimentin+) and cells with active RTK
signaling (EGFR) are linked to better overall survival (Fig. 6G).
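The stratification at an importance of 0.5 followed by a log rank test can be illustrated with the sketch below. It is a plain two-group log rank test written for illustration only; the cohort values are invented, and the function name and data layout are assumptions rather than the code used for Fig. 6.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log rank test; returns (chi-square statistic, p-value).

    times_*  : follow-up time of each sample
    events_* : 1 if death was observed, 0 if the sample was censored
    """
    samples = [(t, e, 0) for t, e in zip(times_a, events_a)]
    samples += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in samples if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk = [s for s in samples if s[0] >= t]
        n = len(at_risk)
        n_a = sum(1 for s in at_risk if s[2] == 0)
        deaths = sum(1 for s in at_risk if s[0] == t and s[1] == 1)
        deaths_a = sum(1 for s in at_risk
                       if s[0] == t and s[1] == 1 and s[2] == 0)
        observed_a += deaths_a
        expected_a += deaths * n_a / n
        if n > 1:
            variance += (deaths * (n_a / n) * (1 - n_a / n)
                         * (n - deaths) / (n - 1))

    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical cohort: (importance, survival in months, death observed).
cohort = [(0.9, 12, 1), (1.4, 18, 1), (0.7, 25, 1), (2.0, 30, 0),
          (0.1, 60, 1), (-0.4, 75, 0), (0.3, 82, 1), (-1.1, 90, 0)]
threshold = 0.5  # importance cutoff stated in the text
positive = [(t, e) for imp, t, e in cohort if imp > threshold]
negative = [(t, e) for imp, t, e in cohort if imp <= threshold]

chi2, p = logrank_test([t for t, _ in positive], [e for _, e in positive],
                       [t for t, _ in negative], [e for _, e in negative])
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

With so few samples the asymptotic p-value is only indicative; the sketch is meant to show the grouping and the test statistic, not to replace a survival analysis library.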
In summary, we could successfully link MISTy results and signatures to clini-
cal features and survival outcomes. The provided list of features can be used as a
resource for future experimental validations, and with an increasing amount of pub-
lished spatial omics datasets linked to clinical data, we expect similar studies across
various disease types and experimental technologies.
Application toaspatial transcriptomics breast cancer dataset
Key features of MISTy are that it is technology agnostic and flexible enough to analyze
different types of spatially resolved data. Moreover, the properties of the data obtained
from different technologies can be leveraged to create different explanatory views.
To illustrate this, we analyzed the spatial gene expression profiles of two sections of a
sample of invasive ductal carcinoma in breast tissue profiled with 10x Visium [38]. The
10x Visium slides contain 4992 total spots of 55 μm in diameter per captured area that
enable the profiling of up to 10 cells per spot. With this technology, thousands of spa-
tially resolved genes can be profiled simultaneously within a sample, allowing for the
characterization of molecular processes.
Previously, we have shown the utility of the footprint-based method PROGENy to
robustly estimate the activity of signaling pathways, in both bulk and single-cell
transcriptomics [39–41]. PROGENy estimates the pathways’ activity by looking at the
expression changes in downstream target genes, rather than on the genes that constitute
the pathway itself. Due to the resolution and the gene coverage of 10x Visium slides, the
same approach can be applied to spatial transcriptomics datasets to enhance the func-
tional view of the data. We estimated pathway activities for two reasons: (1) to reduce
the dimensions of the data of each spot into interpretable and functionally relevant fea-
tures, while still using the information of as many genes as possible, and (2) to provide a
set of features that are more stable than the sparse expression of marker genes.
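The footprint idea described above, scoring a pathway from the expression of its downstream target genes, can be sketched as a weighted sum. This is only an illustration of the principle: the gene names, weights, and the function `pathway_activities` are hypothetical and do not reproduce PROGENy's model matrix or its scaling.

```python
def pathway_activities(expression, footprint_weights):
    """Score each pathway as the weighted sum of its footprint genes.

    expression        : dict gene -> normalized expression in one spot
    footprint_weights : dict pathway -> dict gene -> weight
    Genes missing from the spot contribute zero.
    """
    scores = {}
    for pathway, weights in footprint_weights.items():
        scores[pathway] = sum(w * expression.get(gene, 0.0)
                              for gene, w in weights.items())
    return scores


# Hypothetical spot and footprint weights (illustrative values only).
spot_expression = {"GENE_A": 2.0, "GENE_B": 0.5, "GENE_C": 1.0}
weights = {
    "pathway_1": {"GENE_A": 1.5, "GENE_B": -2.0},
    "pathway_2": {"GENE_C": 0.5, "GENE_D": 3.0},
}
activities = pathway_activities(spot_expression, weights)
print(activities)
```

Because each score pools many target genes, a dropout in a single gene changes the score only slightly, which is the stability argument made in point (2).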
For each sample section, we estimated the activities of 14 cancer relevant signaling
pathways of each spot using PROGENy [39, 41] (Fig. 7A). While pathway crosstalk
mechanisms are expected within a spot, we hypothesized that the local pathway activity
could also be regulated by neighboring cells in other spots to coordinate cellular
processes. Therefore, we identified a set of 377 genes expressed in both sections and
annotated as ligands (Fig. 7A) in the meta-resource OmniPath [42] (see “Methods”) and
designed a MISTy pipeline to model pathway activities using three different views: an
intraview of pathway activities and two functional paraviews, one using the estimated
pathway activities at each patch and one using the measured expression of a set of
ligands. Improvement in the prediction of pathway activities
by this multiview model would provide evidence of the relevance of spatial relation-
ships in the regulation and maintenance of the functional state of a spot. Moreover, the
traceable importances of each view may suggest possible mechanisms of intercellular
communication.
The multiview model significantly improved the variance explained for 12 of the
14 pathway activities (t-test on cross-validation folds, mean adjusted p-value < 0.1),
with improvements of up to 24% compared to the intraview model in the case of the
estrogen and hypoxia pathways (Fig. 7B). We found a mean contribution of 55% of
the intraview, 24% of the ligand expression paraview, and 21% of the pathway activity
paraview to the prediction of pathway activities in the multiview model (Fig. 7B).
We compared these results to the model performance in five iterations of slides with
permuted layouts to provide further evidence of the importance of spatial information
in the prediction of pathway activities (see “Methods,” Additional file 1: Fig. S9A, B).
As expected, in these random slides, we observed no improvements in variance explained
when fitting models with spatially contextualized views
(Additional file1: Fig. S9A). Moreover, when we compared the view’s contributions
of the models fitted to random and original slides, lower contributions of the para-
views were recovered for the random models (Additional file1: Fig. S9B, Wilcoxon
test log10p < 10). ese results confirmed that MISTy models are able to extract
Fig. 7 Application of MISTy to a spatial transcriptomics dataset. A Schematic of the MISTy pipeline used in
Visium 10x slides. Each visium spot profiles the gene expression of up to 10 cells. Pathway activities were
estimated with PROGENy and a MISTy model was built to predict them using two spatially contextualized
views. B Changes in R2 observed in each predicted pathway after using the multiview model, reflecting the
importance of the spatial context (upper panel). Contribution of each view to the prediction of the pathway
activities in the meta‑model. The stacked barplot represents normalized values of the fusion coefficients of
the respective views for each pathway (lower panel). C Variable importances for the intraview. D Intrinsic
associations of pathway activity scores of NFkB and TNFa, and p53 and MAPK. Spatial distribution of pathway
activities from the first section. Circled areas exemplify niches where coordinated activities were observed.
Scatterplots show the within‑spot relationship between each pair of pathways. E Variable importances for the
pathway paraview. F Spatial distribution of estrogen pathway activities and scaled gene expression of STC1
(top predictor of Estrogen in the ligand paraview) and TNF
informative spatial relationships between markers in tissue samples where spatial
organization is expected.
The importances of the features used as predictors in each view are consistent with
biological processes. In the intraview (Fig. 7C), we recovered, among others,
associations between NFkB and TNFa, and p53 and MAPK that have been reported
previously [39]. These results capture pathway crosstalk within a spot, as illustrated
in the spatial distribution of pathway activities shown in Fig. 7D. Predictor importances
in the pathway paraview captured similar associations as the ones captured by the
pathway intraview (Fig. 7E). The paraview importances, however, reflect patterns of tissue
organization in which multiple neighboring spots share similar cellular states in larger
areas. If a relationship between two pathways A and B is observed within a spot and a
coordinated local activity of these pathways is happening, then the activity of pathway
A of the neighbors of a given spot indirectly explains its pathway B activity. For
example, the obtained paraview relationships between NFkB and TNFa, and p53 and MAPK
(Fig. 7E) explained the regions where a collection of spots showed coordinated higher or
lower activities of these pathways (Fig. 7D, circled areas). Additionally, new associations
between pathways became relevant when taking into account the functional state of the
neighbors of each spot (Fig. 7E). For hypoxia, where the contribution of the pathway
paraview to the multiview model was 35%, estrogen, PI3K, p53, and WNT pathway activities
had the highest importances, besides EGFR and TGFb, which were also recovered from
the intraview importances (Additional file 1: Fig. S9C). The local expression of putative
ligands contributed most to the prediction of estrogen, WNT, and hypoxia (ligand
paraview contribution 34%, Fig. 7B). We annotated each ligand-pathway interaction
using OmniPath. We recovered the potential target receptors of all predictor ligands and
assigned them to one of the 14 pathways in PROGENy based on the whole collection of
annotations stored in OmniPath. Additionally, we annotated each predictor ligand as a
direct byproduct of a pathway if it belonged to one of the transcriptional footprints
in PROGENy. Of the 195 most important ligand-pathway interactions (importance ≥ 2),
130 could be annotated as described above. The 65 unannotated interactions could
represent novel context-dependent intercellular processes and show how MISTy could
be used as a hypothesis generation tool. Among the top annotated interactions observed
between the pathway activities and ligands (Additional file 1: Fig. S9D), we recovered the
relationship between STC1 and estrogen pathway activities (Fig. 7F). STC1 is a
glycoprotein hormone that is secreted into the extracellular matrix and has been discussed
in the literature as a promising molecular marker in breast cancer [43]. TNF, STC1’s
reported receptor, showed similar spatial patterns in the slide (Fig. 7F), suggesting a
potential intercellular mechanism that mediates estrogen pathway activity. High importances
for predicting estrogen pathway activity were observed for other estrogen-receptor-dependent
genes such as EFNA1 and EDN1, as well as for the estrogen-responsive gene TFF1
(Additional file 1: Fig. S9E). Interestingly, we observed that ligand importances clustered
pathways that shared pathway paraview interactions, such as p53, MAPK, and TGFb. Altogether,
our results showed that MISTy was able to improve the prediction of pathway activities
by incorporating their spatial context. Moreover, we were able to identify known and
novel spatial dependencies between pathway activities and ligands that reflect the
functional organization of the tissue.
Discussion
Here we present MISTy, an explainable framework for the analysis of highly multiplexed
spatial data without the need for cell type annotation. It is scalable and technology
agnostic, enabling the analysis of increasingly complex data generated by recent
and upcoming technologies. MISTy complements other methods that leverage spatial
information to explore intercellular interactions. Current approaches focus mainly
on the local cellular niche, i.e., the expressions measured in the immediate
neighborhood of each cell [26–29, 31]. Other methods that consider the broader tissue
structure are relatively inflexible [30, 32]. They consider a fixed form of nonlinear
relationship between markers at predefined spatial contexts (e.g., fixed distance), and they
do not scale well due to their high computational complexity. In contrast, MISTy offers
a flexible range of spatial analyses in a scalable framework. We present a selected set of
workflows for the analysis of spatial data, using not only the marker expressions but also
derived features, such as pathway activities.
We established a performance baseline for MISTy on in silico data before applying
MISTy to real-world data. We showed that MISTy achieves high performance on the
task of reconstructing the intra- and intercellular networks of interactions.
We then applied MISTy to three real-world spatial omics data sets from breast cancer
samples. We applied MISTy to imaging mass cytometry data, capturing dozens of protein
markers at (sub)cellular resolution. The results show that we were able not only to
recapitulate results from the literature without prior-knowledge-based cell type
annotation, but also to generate new hypotheses. Our results show that the information
that is available from the expression of markers in the broader tissue structure is often
more important than their expression in the local cellular niche. Of note, this result,
which is biologically intuitive, could not be found with previous methods that do not
distinguish between para- and juxtaview. This highlights that not only cellular niches but also
the tissue structure has a direct impact on cellular states and should be included in the
“microenvironment” definition. Furthermore, we show how MISTy finds interactions
that are associated with clinical features.
Finally, we applied MISTy on a spatial transcriptomics data set measured with 10x
Visium. Here, thousands of transcripts are measured in spots containing several cells.
Given the richness of the data, we were able to go a step further and consider the analy-
sis of functional features, in the form of pathway activities that were inferred from the
data. In particular, we showed the crosstalk between pathways and the ligand-pathway
interactions in the context of the broader tissue structure in breast cancer. Our results
showed that MISTy in combination with functional transcriptomics tools and prior
knowledge can be used in spatial transcriptomics to uncover coordinated functions that
are maintained in niches of the tissue. Moreover, the explanatory component of the mul-
tiview model provides relevant predictors that could become the base of mechanistic
models.
Although the interactions extracted by MISTy cannot be considered directly as causal,
they can facilitate the downstream analysis of biological systems at the tissue level in
several directions: (i) to predict the behavior of systems under perturbations, by using
the MISTy model to generate marker expressions based on the new conditions; (ii) to
guide the reconstruction of multicellular causal signaling networks, using databases to
identify mechanisms giving rise to the extracted interactions; and subsequently (iii) to
construct mechanistic models of the dynamical behavior of the system constrained by
the extracted explanations.
The work we presented lays the foundation for further exploration of MISTy in several
directions. One direction is to address the scalability of MISTy to millions of cells and
thousands of markers per sample, which is beyond what the available technologies can
offer, but is likely to come in the near future. To do this, we are exploring approximate
but accurate methods to replace the computationally expensive step of generating views
where the pairwise distances between all cells need to be calculated. Another direction is
the exploration of the performance that can be achieved by MISTy with different ensem-
ble approaches using various types of explainable constituent models. Furthermore,
MISTy can be used to generate more specific views. In particular, views that capture the
spatial expression of specific cell types, so that we can dissect the spatial interactions
between different cell types, or views that focus on regions of the tissue, for example,
healthy vs pathological, where we would model the interactions between the functionally
different regions. Of special interest is also the specialization of MISTy workflows that
focus on the analysis of ligand-receptor interactions while taking into account the spa-
tial context. To this end, we look towards combining MISTy with complementary tools,
such as GCGN [30], MESSI [31], cell2cell [44] and Tensor-cell2cell [45]. In particular, we
plan to explore the integration of databases of intercellular signaling as modeling bias
as in GCGN, focusing workflows on the communication between pairs of cell types as
in MESSI. In another direction, cell2cell results can be used to inform ligand-receptor
analysis with MISTy, or MISTy’s importance signatures can be used as the input communication
score matrices for Tensor-cell2cell. Finally, MISTy generates a model for each marker of
interest that can be readily used to make predictions of marker expressions under dif-
ferent conditions. For example, we can increase or reduce the expression of a certain
marker in silico and explore the effects of the new condition.
Conclusions
In summary, we believe that MISTy is a valuable tool to analyze spatially resolved data,
adaptable to multiple data modalities and biological contexts, that will also evolve as
experimental techniques improve. An implementation of MISTy as an R package named
mistyR (https://saezlab.github.io/mistyR/) is fully documented and freely available from
GitHub, Bioconductor, and as a Docker image.
Methods
In silico tissue structure
We simulated the data distribution for each cell type by sampling from a multivariate
normal distribution, where each marker had a randomly chosen mean expression with a
narrow variance. To create informative markers between cell types, we randomly adjust
the mean of a marker for each cell type, such that the distributions of a given marker
expression for each cell type are likely to be non-overlapping. For uninformative mark-
ers, mean expression is the same between cell types. After choosing marker-wise mean
expression and adjusting the means for informative markers, a synthetic dataset is gen-
erated by sampling from this distribution. Cells of specific types are matched with their
corresponding spatial location from the in silico tissue generation to construct a syn-
thetic spatial dataset.
In silico mechanistic model
The mechanistic in silico model is a two-dimensional cellular automata model that
focuses on signaling events; therefore, cell growth, division, motility, and death are
neglected. First, we created two random layouts. To account for cellular heterogeneity
in the tissue, we assigned one of four different cell types, CT1, …, CT4, to each spot of
the layout or left it empty (intercellular space). Each of these cell types has a distinct set
of receptors expressed and distinct intracellular wiring (Sup. Fig. 2). To keep the model
simple, we considered 29 biological species S = {ligands: L1-L5; receptors: R1-R5;
intracellular proteins: X10-X29}. The intracellular processes involve the ligand activation
of receptors and downstream signaling nodes, and ligand production/secretion (Fig. 3A).
The model simulates the production, diffusion, degradation, and interactions of these 29
molecular species on a 100-by-100 grid. Ligands are produced in each cell type based on
the activity level of their production nodes and then freely diffuse, degrade, or interact
with other cells on the grid. Other molecular species involved in signaling are localized
in the intracellular space and their activity depends on ligand binding and intracellular
wiring.
The model is formally stated by the following partial differential equation for each
species:

$$\frac{\partial c_s(x, y, t)}{\partial t} = d_s \nabla^2 c_s(x, y, t) + P_s(x, y, t) - D_s(x, y, t) \tag{2}$$

This equation describes the diffusion, the production/activation, and the degradation
of the species. We made the following assumptions: $c_s(x, y, t)$ is the concentration of
species $s \in S$ at the grid point $(x, y)$ at time $t$. The diffusion is homogeneous across
the image, and the diffusion coefficient of species $s$ is $d_s$. Only ligands diffuse; other
intracellular molecules cannot leave the cell.
The production term includes the generation of ligands and the activation of intracellular
proteins and receptors. Production depends on the cell type and the activity of
the production node: the ligand production depends linearly on the nodes above them
(Sup. Fig. 2), $P_i(x, y, t) = \alpha_{i,ct} X_i(x, y, t)$ for $i \in \{L1, L2, L3, L4, L5\}$ and
$ct \in CT$, where the coefficient $\alpha_{i,ct}$ defines which cell type produces which ligands
and how strongly the production depends on the activity of the production node.
Ligands are specific and activate only the corresponding receptors, e.g., L1 activates
R1, L2 activates R2, etc. The activation of the receptor depends on the concentration of
the ligand at the location of the cell.
For intracellular proteins, the protein activity depends on the activity of upstream
nodes. An interaction $X_i \to X_j$ is translated to the equation

$$P_j(x, y, t) = \beta_{j,i}^{ct} \, c_i(x, y, t),$$

where $\beta_{j,i}^{ct}$ encodes the strength of interactions between the nodes in cell type $ct$.
Degradation is proportional to the concentration of ligands, intracellular proteins, and
ECM, $D_s(x, y, t) = \gamma_s c_s(x, y, t)$, where $\gamma_s$ is a constant degradation coefficient.
The above model was simulated from a randomized initial condition, and the activity
distribution (Additional file 1: Fig. S3) was achieved. We considered all markers
except the ligands as available measurements for the MISTy workflow. In contrast to the
included measurements, we assumed that the expression of ligands would be more
difficult to capture in a real experiment and therefore excluded them.
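One update of the diffusion, production, and degradation balance in Eq. (2) can be sketched with an explicit Euler step on a grid. The discretization (five-point Laplacian, unit grid spacing, zero-flux boundaries, time step `dt`) is an assumption made for this illustration; the source does not state which numerical scheme was used.

```python
def reaction_diffusion_step(c, d, production, gamma, dt=0.1):
    """One explicit Euler step of dc/dt = d * laplacian(c) + P - gamma * c.

    c, production : 2D lists of equal shape (concentration, production term)
    d             : diffusion coefficient (0 for non-diffusing species)
    gamma         : degradation coefficient
    Zero-flux boundaries are assumed: a missing neighbor mirrors the cell.
    """
    rows, cols = len(c), len(c[0])
    new = [[0.0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            center = c[x][y]
            up = c[x - 1][y] if x > 0 else center
            down = c[x + 1][y] if x < rows - 1 else center
            left = c[x][y - 1] if y > 0 else center
            right = c[x][y + 1] if y < cols - 1 else center
            laplacian = up + down + left + right - 4 * center
            new[x][y] = center + dt * (
                d * laplacian + production[x][y] - gamma * center
            )
    return new


# A single ligand source in the middle of a small hypothetical grid.
grid = [[0.0] * 5 for _ in range(5)]
grid[2][2] = 1.0
no_production = [[0.0] * 5 for _ in range(5)]

after = reaction_diffusion_step(grid, d=1.0, production=no_production,
                                gamma=0.0, dt=0.1)
print(after[2][2], after[1][2])
```

With this scheme the step is stable only for `d * dt <= 0.25`; setting `d = 0` reproduces the assumption that intracellular species do not leave the cell.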
We aggregated the interactions from the mechanistic model for the different cell types
in joint binary matrices of directed ground truth interactions for the different views. To
compare the matrices to the importance matrices from the output of MISTy, we transformed
the joint matrices into undirected matrices $A_u = \mathrm{sgn}(A + A^T)$. We then
quantified the performance of MISTy for the task of reconstruction of intra- and intercellular
networks from the true and the extracted interaction matrices.
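The symmetrization and a comparison against an importance matrix can be sketched as follows. The use of the area under the ROC curve as the score, the helper names, and the example matrices are assumptions for illustration; this passage does not specify which metric was computed.

```python
def symmetrize_sign(a):
    """A_u = sgn(A + A^T) for a square matrix given as nested lists."""
    n = len(a)

    def sgn(v):
        return (v > 0) - (v < 0)

    return [[sgn(a[i][j] + a[j][i]) for j in range(n)] for i in range(n)]


def auroc(scores, labels):
    """Probability that a true interaction outscores a false one (ties 0.5)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical directed ground truth and MISTy-like importance matrix.
truth = [[0, 1, 0],
         [0, 0, 0],
         [0, 1, 0]]
importance = [[0.0, 2.1, -0.3],
              [1.7, 0.0, 0.1],
              [0.2, 1.5, 0.0]]

undirected = symmetrize_sign(truth)
n = len(importance)
scores = [importance[i][j] for i in range(n) for j in range(n) if i != j]
labels = [undirected[i][j] for i in range(n) for j in range(n) if i != j]
auc = auroc(scores, labels)
print(undirected, auc)
```

Diagonal entries are skipped here because a marker is not used as a predictor of itself in the same view.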
Data acquisition andprocessing
Imaging mass cytometry
The first imaging mass cytometry dataset consists of 46 samples from 26 breast cancer
patients with varying disease grades [28]. The original data consisted of 50 samples, from
which we removed samples coming from normal tissue. The raw data was segmented
and single-cell features were extracted with histoCAT. The samples contain between 267
and 1455 cells with measured expression of 26 proteins/protein modifications. The
cell-level data was preprocessed as defined in Arnol et al. [32] in order to ensure the
validity of direct comparison of results.
The second imaging mass cytometry dataset consists of 720 samples from 352 breast
cancer patients from two cohorts, with long-term survival data available for 281 of those
patients [37]. The samples contain measurements of 37 proteins/protein modifications.
The raw data was segmented, and single-cell features were extracted with histoCAT. The
cell-level data was preprocessed as in the original study. To ensure robustness of the
results, we filtered out samples containing fewer than 1000 cells, samples coming from
normal or control tissue, and samples without annotated tumor grade or clinical subtype,
resulting in a total of 415 samples for our analysis.
Spatial transcriptomics
The data and sample information were obtained from 10x Genomics [38]. The data
consists of spatial transcriptomic measurements of two sections of a sample analyzed
with 10x Genomics Visium. The sections come from tissue from a patient with grade 2
ER+, PR−, HER2+, annotated with ductal carcinoma in situ, lobular carcinoma in situ,
and invasive carcinoma. The mean sequencing depths were reported to be 149,800 and
137,262 reads per spot for a total of 3813 and 4015 spots per section, respectively. The
median UMI counts per spot were reported as 17,531 and 16,501, and the median genes
per spot as 5394 and 5100, respectively. The raw data was preprocessed and count
matrices were generated with spaceranger-1.0.0. Individual count matrices were normalized
with sctransform implemented in Seurat 3.1.2 [46]. For each spot, we estimated signaling
pathway activities with PROGENy’s model matrix using the top 1000 genes of each
transcriptional footprint. We retrieved from OmniPath [42] all proteins labeled as ligands
and, in each dataset, retained all ligands whose expression was captured in at least 5%
of the spots.
View generation
We consider a dataset $D = [X_{n \times s} \; Y_{n \times k}]$, represented as a matrix of
dimensions $n \times (s + k)$ of spatially resolved highly multiplexed measurements of a
sample, where $n$ is the number of measured units (pixels, cells, patches) available in the
sample, $s$ is the number of spatial dimensions in the geometry matrix $X$, and $k$ is the
number of measured markers in the expression matrix $Y$.
The juxtaview was generated by summing the expressions of the direct neighboring
cells of each cell, i.e., $G_{c,\cdot} = \sum_{j \in N_c} Y_{j,\cdot}$, where $N_c$ represents
the set of neighboring cells of cell $c$. The neighboring cells for each cell can be
determined either during image segmentation, for example by setting a threshold of
membrane-to-membrane distance, or, as in the case for the application of MISTy on IMC data,
by post hoc neighborhood estimation. For the application of MISTy on IMC data, the
neighborhood of each cell in a sample was estimated by constructing a cell graph by 2D
Delaunay triangulation followed by removal of edges with length larger than the 25th
percentile of all pairwise cell distances across all samples, which corresponded to
11.236 microns from the cell centroid.
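The juxtaview sum over direct neighbors can be sketched as below. The candidate edges are assumed to come from a Delaunay triangulation computed elsewhere (this sketch takes them as input), and the function name, coordinates, and expression values are hypothetical.

```python
import math


def juxtaview(expression, coords, edges, max_distance):
    """Sum the marker expression of each cell's direct neighbors.

    expression   : list of per-cell marker lists (rows of Y)
    coords       : list of (x, y) cell centroids (rows of X)
    edges        : candidate neighbor pairs (i, j), e.g., Delaunay edges
    max_distance : edges longer than this threshold are discarded
    """
    n_markers = len(expression[0])
    view = [[0.0] * n_markers for _ in expression]
    for i, j in edges:
        if math.dist(coords[i], coords[j]) > max_distance:
            continue  # too far apart to count as direct neighbors
        for m in range(n_markers):
            view[i][m] += expression[j][m]
            view[j][m] += expression[i][m]
    return view


# Three hypothetical cells on a line with two markers each.
coords = [(0.0, 0.0), (5.0, 0.0), (30.0, 0.0)]
expression = [[1.0, 0.0], [2.0, 1.0], [4.0, 4.0]]
edges = [(0, 1), (1, 2)]

view = juxtaview(expression, coords, edges, max_distance=11.236)
print(view)
```

The third cell ends up with an all-zero juxtaview because its only candidate edge exceeds the distance threshold.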
The paraview was generated by weighted aggregation of the expressions of all cells
(patches) from the sample,
$G_{c,\cdot} = \sum_{j=1}^{n} \mathbb{1}_{d_{cj} \ge z} \, w(d_{cj}, l) \, Y_{j,\cdot}$,
where $w$ is a weighting function, $d_{cj}$ is the Euclidean distance between cells $c$ and
$j$, calculated from matrix $X$, $l$ is a parameter controlling the shape of the weighting
function $w$, and $z$ is a parameter controlling the zone of indifference. A juxtaview and a
paraview with no zone of indifference will both contain the expression of the markers in
the neighboring cells. Considering a zone of indifference larger than the immediate
neighborhood would ensure that only the juxtaview will capture the immediate neighborhood,
while the paraview will capture only the broader tissue structure excluding the immediate
neighborhood.
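A paraview with a radial basis weighting and a zone of indifference can be sketched as follows. The function name and example data are hypothetical, and skipping the cell itself in the sum is an assumption of this sketch (the cell's own expression belongs to the intraview).

```python
import math


def paraview(expression, coords, l, zone_of_indifference=0.0):
    """Distance-weighted sum of marker expression over the other cells.

    Weights follow w = exp(-d^2 / l^2); cells closer than the zone of
    indifference are ignored, and so is the cell itself (assumption).
    """
    n_markers = len(expression[0])
    view = []
    for c, pos_c in enumerate(coords):
        row = [0.0] * n_markers
        for j, pos_j in enumerate(coords):
            d = math.dist(pos_c, pos_j)
            if j == c or d < zone_of_indifference:
                continue
            w = math.exp(-(d ** 2) / (l ** 2))
            for m in range(n_markers):
                row[m] += w * expression[j][m]
        view.append(row)
    return view


# Three hypothetical cells with a single marker.
coords = [(0.0, 0.0), (3.0, 4.0), (0.0, 10.0)]
expression = [[1.0], [2.0], [3.0]]

full = paraview(expression, coords, l=5.0)
distal_only = paraview(expression, coords, l=5.0, zone_of_indifference=6.0)
print(full[0], distal_only[0])
```

Raising the zone of indifference removes the nearby cell (distance 5) from the first cell's paraview, leaving only the contribution of the distant one.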
We can assume that cells that are closer affect the expression within the cell more than
cells that are farther away, to various degrees. The weighting function $w$ controls the
contribution of the expression coming from the broader tissue structure as a function of
the distance and a parameter $l$ that captures the radius around the cell where we
consider the cells in the broader tissue structure to be significantly contributing to
explaining the expression in the cell. Examples of weighting functions in MISTy are the
families of radial basis $w = e^{-d^2 / l^2}$, exponential $w = e^{-d / l}$, linear
$w = 1 - d / l$, and constant functions $w = \mathbb{1}(d \le l)$.
For the application of MISTy to both IMC and spatial transcriptomics data, we considered
the family of radial basis functions for weighting and optimized the value for
the parameter $l$. For each IMC sample, we constructed models for each marker, with
parameter $l \in \{25, 50, 100, 200, 400\}$. This corresponds to an effective radius of
influence of 25 to 400 pixels or micrometers. The mean values of the parameter $l$ across
all samples for all markers are shown in Additional file 1: Fig. S8. Given the resolution
of 10x Visium, MISTy models for spatial transcriptomics were built for each pathway activity
considering in the paraview the family of radial basis functions for weighting with values
of parameter $l \in \{2, 5, 10\}$, corresponding to a radius of influence of up to 10 spots.
Then, for each marker, we selected the value for $l$ such that the estimated improvement in
predictive performance by using the multiview model in contrast to the intraview model is
maximized. For each model (Random Forest), we estimate the predictive performance
by measuring the variance explained on out-of-bag samples.
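The weighting families and the selection of $l$ described above can be sketched as follows. The family names, the clipping of the linear weight at zero, and the example gains are assumptions made for this illustration, not values from the study.

```python
import math


def weight(d, l, family="gaussian"):
    """Weight of a cell at distance d for shape parameter l."""
    if family == "gaussian":      # radial basis function
        return math.exp(-(d ** 2) / (l ** 2))
    if family == "exponential":
        return math.exp(-d / l)
    if family == "linear":        # clipped at zero (assumption)
        return max(0.0, 1.0 - d / l)
    if family == "constant":
        return 1.0 if d <= l else 0.0
    raise ValueError(f"unknown family: {family}")


def select_l(gain_by_l):
    """Pick the l with the largest gain of the multiview over the intraview."""
    return max(gain_by_l, key=gain_by_l.get)


# Hypothetical gains in variance explained (percentage points) for one marker.
gains = {25: 0.8, 50: 2.4, 100: 3.1, 200: 2.9, 400: 1.0}
best_l = select_l(gains)
print(best_l, weight(50.0, best_l))
```

In practice each gain would come from fitting the intraview and multiview models for that value of $l$ and comparing their out-of-bag variance explained.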
Importance weighting andresult aggregation
To calculate the interaction importances from a sample, we used information from
the two layers of interpretability and explainability: the values of the fusion
parameters ($\alpha_v$ in Eq. (1)) in the meta-model with their respective p-values
$p_k^{(v)}$ for each target marker $k$, and the importances $I_{kj}^{(v)}$ of features $j$
for the prediction of each target marker, extracted from the predictive model for view $v$.
Combining them yields the MISTy interaction importance $M_{kj}^{(v)}$ (Eq. (3)).
Since the importances $I_{kj}^{(v)}$ extracted from a Random Forest model (used for the
current instance of MISTy) represent the amount of variance reduction in the target
expression, the MISTy interaction importances correspond to the standardized value
(by mean $\bar{I}_k^{(v)}$ and variance $\sigma^2_{I_k^{(v)}}$) of the variance reduction,
weighted by the quantile $1 - p_k^{(v)}$ of the statistic under the null hypothesis of zero
contribution of the fusion coefficient for view $v$ for target $k$ in the linear meta-model.
MISTy is conceived to be a framework applicable to any type of omics data. The
performance measures and the estimated relationships are independent from the properties
of the variables used to describe the data. The machine learning models (Random
Forest) trained on the specific views are invariant to the scale of the predictor or the
target variables. The measure of the performance of the model (variance explained) is
also chosen such that issues with scale are avoided during training and interpretation.
MISTy infers the interactions between the variables by the proxy task of predicting
the expression, abundance, activity, or any other quantity of the target variables and
estimating the importance of each of the predictor variables for this task. The
estimated relationships/importances are related to the amount of reduction of variance
and not to absolute values. The importance derived from variance reduction can be
generalized to any measure of impurity or values extracted by other feature importance
estimation methods, given the model constituents of MISTy. Since the MISTy
importances are standardized, importances from multiple samples can then be
aggregated by simple averaging, while their interpretation remains the same.
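The standardization, weighting, and averaging described above can be sketched as follows. The sketch follows Eq. (3) as stated in the text, dividing by the variance of the raw importances; the use of the sample variance (n − 1 denominator), the function names, and the example values are assumptions made for illustration.

```python
def misty_importance(raw_importances, p_value):
    """Standardize raw importances for one target and view, weight by 1 - p.

    raw_importances : variance-reduction importances of all predictors
    p_value         : p-value of the view's fusion coefficient for the target
    """
    n = len(raw_importances)
    mean = sum(raw_importances) / n
    # Sample variance (n - 1 denominator), an assumption of this sketch.
    variance = sum((v - mean) ** 2 for v in raw_importances) / (n - 1)
    return [(v - mean) / variance * (1 - p_value) for v in raw_importances]


def aggregate(per_sample_importances):
    """Average the standardized importances element-wise over samples."""
    n = len(per_sample_importances)
    return [sum(values) / n for values in zip(*per_sample_importances)]


# Hypothetical raw importances of four predictors for one target in one view.
sample_1 = misty_importance([1.0, 2.0, 3.0, 6.0], p_value=0.02)
sample_2 = misty_importance([2.0, 2.0, 3.0, 5.0], p_value=0.40)
mean_importance = aggregate([sample_1, sample_2])
print(sample_1, mean_importance)
```

A view whose fusion coefficient is not supported by the meta-model (large p-value) has all of its importances shrunk toward zero before averaging.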
For views that contain the same set of predictors as targets, we also identified the
communities of interactions from the estimated importances. For this, we transformed
the square matrix $A$ of estimated predictor-target interactions to an undirected
graph adjacency matrix as $A_p = A + A^T$. We then extracted the community
structure from the graph using the Louvain algorithm [47], a commonly used algo-
rithm for community detection by grouping nodes, such that the modularity of the
graph is maximized.
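The symmetrization step and the objective that Louvain optimizes can be illustrated with a small Python sketch. It does not implement the Louvain algorithm itself; it only builds $A_p = A + A^{T}$ and evaluates the modularity of candidate partitions. The toy matrix, the function names, and the restriction to non-negative weights are assumptions made for the example.

```python
def symmetrize(a):
    """Return A_p = A + A^T for a square matrix given as a list of lists."""
    n = len(a)
    return [[a[i][j] + a[j][i] for j in range(n)] for i in range(n)]


def modularity(adj, communities):
    """Newman modularity Q of a partition of a weighted undirected graph.

    adj: symmetric adjacency matrix with non-negative weights
    communities: list of lists of node indices covering all nodes
    """
    two_m = sum(sum(row) for row in adj)  # total weight, each edge counted twice
    degree = [sum(row) for row in adj]
    q = 0.0
    for members in communities:
        for i in members:
            for j in members:
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


# Hypothetical predictor-target importance matrix for four markers.
A = [
    [0, 2, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 3, 0],
]
A_p = symmetrize(A)
q_blocks = modularity(A_p, [[0, 1], [2, 3]])  # markers grouped by interaction
q_mixed = modularity(A_p, [[0, 2], [1, 3]])   # markers grouped across blocks
```

In this toy example the partition that follows the interaction blocks has the higher modularity, which is the kind of grouping a modularity-maximizing method is designed to find. In practice, an established Louvain implementation would be used rather than an exhaustive comparison of partitions.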
Permutation oftheslides
To evaluate the performance of MISTy models in samples with no spatial organiza-
tion, we generated random samples for both the IMC and spatial transcriptomics data
and ran the same pipeline as for the original data. We permuted the coordinates of
(3)
M
kj(v)=Ikj(v)
¯
Ik(v)
σ
Ik
(v)2
1pk(v)
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Page 29 of 31
Tanevskietal. Genome Biology (2022) 23:97
each spatial unit (cell or spot) for each slide ten times for the IMC data and five times
for the spatial transcriptomics data. Results were grouped and compared to the ones
obtained in the slides with the original spot layout.
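A minimal sketch of this permutation scheme is given below. The coordinates, the function name, and the seeding are assumptions for illustration; the measurements of each spatial unit are left untouched and only the assignment of coordinates to units is shuffled.

```python
import random


def permute_coordinates(coords, n_permutations, seed=0):
    """Return n_permutations shuffled copies of the coordinate list.

    The i-th entry of each copy is the new position of spatial unit i, so the
    measured markers of every unit are kept while the spatial layout is randomized.
    """
    rng = random.Random(seed)
    permuted = []
    for _ in range(n_permutations):
        shuffled = list(coords)  # copy, so the original layout is preserved
        rng.shuffle(shuffled)
        permuted.append(shuffled)
    return permuted


# Hypothetical slide with five spatial units.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
random_slides = permute_coordinates(coords, n_permutations=10)
```

Each randomized slide can then be passed through the same pipeline as the original slide, and the resulting performance estimates compared with those from the original layout.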
Annotation ofpredictive ligands fromthespatial transcriptomics pipeline
To facilitate the interpretation of the paraview ligand importances observed in the spa-
tial transcriptomics models, we assigned each ligand to a PROGENy pathway in two
ways: (1) as a byproduct of pathway activation or (2) as a potential activator of a path-
way. A ligand was considered as a byproduct of a PROGENy pathway if it was part of
its top 1000 footprint genes. A ligand was considered as a putative activator of the path-
way it predicted if at least one of its possible receptors could be assigned to it. We used
OmnipathR to annotate each receptor using the import_omnipath_annotations function
and regular expressions to filter annotations associated with the PROGENy pathway of
interest. Only ligands with a paraview importance 2 were considered in this annota-
tion (Additional file1: Fig. S9D).
Supplementary Information
The online version contains supplementary material available at https://doi.org/10.1186/s13059-022-02663-5.
Additional file 1. Supplementary Figures S1 to S9 and supplementary Tables S1 and S2.
Additional file 2. Review history.
Acknowledgements
We would like to thank Nicolàs Palacio‑Escat for the Fig. 1 design, Olga Ivanova for providing feedback on the first version
of the manuscript, and Ethan Baker for helping with the generation of the in silico tissue structures.
Review history
Review history is available as Additional file 2.
Peer review information
Barbara Cheifet and Stephanie McClelland were the primary editors of this article and managed its editorial process and
peer review in collaboration with the rest of the editorial team.
Authors’ contributions
Conceptualization - J.T. and J.S.R.; Formal analysis - J.T.; Investigation - J.T. and D.S.; Methodology - J.T.; Software - J.T., A.G.,
and R.O.R.F.; Supervision - J.S.R.; Visualization - J.T., R.O.R.F., and A.G.; Writing - original draft - J.T., A.G., and R.O.R.F.; Writing
- review and editing - D.S. and J.S.R. The authors read and approved the final manuscript. All authors consent to publication
of this article.
Author’s information
Twitter handle: @saezlab (Julio Saez‑Rodriguez)
Funding
Open Access funding enabled and organized by Projekt DEAL. J.T. acknowledges the financial support from the European
Union and the Slovenian Ministry of Education, Science and Sport (agreement No. C3330-17-529021). D.S. was
supported by an Early Postdoc Mobility fellowship (no. P2ZHP3_181475) and was a Damon Runyon Fellow supported by
the Damon Runyon Cancer Research Foundation (DRQ-03-20). D.S. is currently supported by the German
Federal Ministry of Education and Research (BMBF 01ZZ2004).
Availability of data and materials
The source code of mistyR is publicly available from Bioconductor [48]. The exact version (1.3.5) of the source code used
for the manuscript is available from Zenodo [49].
The source code for the analysis of the data is available from a public repository [50].
In silico tissue generation code is available from a public repository [51]. The generated mechanistic in silico data is
available from a public repository [50, 51]. The Imaging Mass Cytometry data is publicly available from Schapiro et al. [28] and
Jackson, Fischer et al. [37]. The Visium spatial transcriptomics data is publicly available from 10x Genomics [38].
Declarations
Ethics approval and consent to participate
Not applicable.
Competing interests
J.S.R. receives funding from GSK and Sanofi and consultant fees from Travere Therapeutics and Astex Pharmaceutical. D.S.
is a consultant for Roche Glycart AG.
Author details
1 Institute for Computational Biomedicine, Faculty of Medicine, Heidelberg University and Heidelberg University Hospital,
Heidelberg, Germany. 2 Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia. 3 Laboratory
of Systems Pharmacology, Harvard Medical School, Boston, MA, USA. 4 Klarman Cell Observatory, Broad Institute of MIT
and Harvard, Cambridge, MA, USA. 5 Institute of Pathology, Faculty of Medicine, Heidelberg University and Heidelberg
University Hospital, Heidelberg, Germany. 6 Joint Research Centre for Computational Biomedicine (JRC‑COMBINE), Faculty
of Medicine, RWTH Aachen University, Aachen, Germany.
Received: 29 October 2021 Accepted: 1 April 2022
References
1. Chen X, Teichmann SA, Meyer KB. From tissues to cell types and back: single-cell gene expression analysis of tissue architecture. Ann Rev Biomed Data Sci. 2018;1:29–51. Available from: https://doi.org/10.1146/annurev-biodatasci-080917-013452.
2. Gut G, Herrmann MD, Pelkmans L. Multiplexed protein maps link subcellular organization to cellular states. Science. 2018;361(6401). Available from: https://doi.org/10.1126/science.aar7042.
3. Lin J-R, Izar B, Wang S, Yapp C, Mei S, Shah PM, et al. Highly multiplexed immunofluorescence imaging of human tissues and tumors using t-CyCIF and conventional optical microscopes. Elife. 2018;11:7. Available from: https://doi.org/10.7554/eLife.31657.
4. Saka SK, Wang Y, Kishi JY, Zhu A, Zeng Y, Xie W, et al. Immuno-SABER enables highly multiplexed and amplified protein imaging in tissues. Nat Biotechnol. 2019;37(9):1080–90.
5. Zhuang X. Spatially resolved single-cell genomics and transcriptomics by imaging. Nat Methods. 2021;18(1):18–22.
6. Aichler M, Walch A. MALDI Imaging mass spectrometry: current frontiers and perspectives in pathology research and practice. Lab Invest. 2015;95(4):422–31.
7. Butler HJ, Ashton L, Bird B, Cinque G, Curtis K, Dorney J, et al. Using Raman spectroscopy to characterize biological materials. Nat Protoc. 2016;11:664–87. Available from: https://doi.org/10.1038/nprot.2016.036.
8. Giesen C, Wang HAO, Schapiro D, Zivanovic N, Jacobs A, Hattendorf B, et al. Highly multiplexed imaging of tumor tissues with subcellular resolution by mass cytometry. Nat Methods. 2014;11(4):417–22.
9. Angelo M, Bendall SC, Finck R, Hale MB, Hitzman C, Borowsky AD, et al. Multiplexed ion beam imaging of human breast tumors. Nat Med. 2014;20(4):436–42.
10. Passarelli MK, Pirkl A, Moellers R, Grinfeld D, Kollmer F, Havelund R, et al. The 3D OrbiSIMS—label‑free metabolic
imaging with subcellular lateral resolution and high mass-resolving power. Nat Methods. 2017;14:1175–83. Available from: https://doi.org/10.1038/nmeth.4504.
11. Rappez L, Stadler M, Triana S, Gathungu RM, Ovchinnikova K, Phapale P, et al. SpaceM reveals metabolic states of single cells. Nat Methods. 2021;18(7):799–805.
12. Larsson L, Frisén J, Lundeberg J. Spatially resolved transcriptomics adds a new dimension to genomics. Nat Methods. 2021;18(1):15–8.
13. Vickovic S, Eraslan G, Salmén F, Klughammer J, Stenbeck L, Schapiro D, et al. High-definition spatial transcriptomics for in situ tissue profiling. Nat Methods. 2019;16(10):987–90.
14. Stickels RR, Murray E, Kumar P, Li J, Marshall JL, Di Bella D, et al. Sensitive spatial genome wide expression profiling at cellular resolution. bioRxiv. 2020:2020.03.12.989806. Available from: https://www.biorxiv.org/content/10.1101/2020.03.12.989806v1.abstract.
15. Bageritz J, Willnow P, Valentini E, Leible S, Boutros M, Teleman AA. Gene expression atlas of a developing tissue by single cell expression correlation analysis. Nat Methods. 2019;16:750–6. Available from: https://doi.org/10.1038/s41592-019-0492-x.
16. Nitzan M, Karaiskos N, Friedman N, Rajewsky N. Gene expression cartography. Nature. 2019;576(7785):132–7.
17. Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, Mauck WM 3rd, et al. Comprehensive integration of single-cell data. Cell. 2019;177(7):1888–902.e21.
18. Tanevski J, Nguyen T, Truong B, Karaiskos N, Ahsen ME, Zhang X, et al. Gene selection for optimal prediction of cell position in tissues from single-cell transcriptomics data. Life Sci Alliance. 2020;3(11). Available from: https://doi.org/10.26508/lsa.202000867.
19. Cang Z, Nie Q. Inferring spatial and signaling relationships between cells from single cell transcriptomic data. Nat Commun. 2020;11(1):2084.
20. Lähnemann D, Köster J, Szczurek E, McCarthy DJ, Hicks SC, Robinson MD, et al. Eleven grand challenges in single-cell data science. Genome Biol. 2020;21(1):31.
21. Moses L, Pachter L. Museum of spatial transcriptomics. Available from: https://doi.org/10.1101/2021.05.11.443152
22. Sun S, Zhu J, Zhou X. Statistical analysis of spatial expression patterns for spatially resolved transcriptomic studies. Nat Methods. 2020;348:aaa6090.
23. Edsgärd D, Johnsson P, Sandberg R. Identification of spatial expression trends in single-cell gene expression data. Nat Methods. 2018;15(5):339–42.
24. Svensson V, Teichmann SA, Stegle O. SpatialDE: identification of spatially variable genes. Nat Methods. 2018;15(5):343–6.
25. Ghazanfar S, Lin Y, Su X, Lin DM, Patrick E, Han Z-G, et al. Investigating higher-order interactions in single-cell data with scHOT. Nat Methods. 2020;17(8):799–806.
26. Keren L, Bosse M, Marquez D, Angoshtari R, Jain S, Varma S, et al. A structured tumor-immune microenvironment in triple negative breast cancer revealed by multiplexed ion beam imaging. Cell. 2018;174(6):1373–87.e19.
27. Goltsev Y, Samusik N, Kennedy-Darling J, Bhate S, Hale M, Vazquez G, et al. Deep profiling of mouse splenic architecture with CODEX multiplexed imaging. Cell. 2018;174(4):968–81.e15.
28. Schapiro D, Jackson HW, Raghuraman S, Fischer JR, Zanotelli VRT, Schulz D, et al. histoCAT: analysis of cell phenotypes and interactions in multiplex image cytometry data. Nat Methods. 2017;14(9):873–6.
29. Dries R, Zhu Q, Dong R, Eng C-HL, Li H, Liu K, et al. Giotto: a toolbox for integrative analysis and visualization of spatial expression data. Genome Biol. 2021;22(1):78.
30. Yuan Y, Bar-Joseph Z. GCNG: graph convolutional networks for inferring gene interaction from spatial transcriptomics data. Genome Biol. 2020;21(1):300.
31. Li D, Ding J, Bar-Joseph Z. Identifying signaling genes in spatial single-cell expression data. Bioinformatics. 2021;37(7):968–75.
32. Arnol D, Schapiro D, Bodenmiller B, Saez-Rodriguez J, Stegle O. Modeling cell-cell interactions from spatial molecular data with spatial variance component analysis. Cell Rep. 2019;29(1):202–11.e6.
33. Kramer BA, Pelkmans L. Cellular state determines the multimodal signaling response of single cells. Cold Spring Harbor Lab. 2019;12(18):880930. Available from: https://www.biorxiv.org/content/10.1101/2019.12.18.880930v1.abstract.
34. Cule E, De Iorio M. Ridge regression in prediction problems: automatic choice of the ridge parameter. Genet Epidemiol. 2013;37(7):704–14.
35. Breiman L. Random Forests. Mach Learn. 2001;45(1):5–32.
36. Baker EAG, Schapiro D, Dumitrascu B, Vickovic S, Regev A. Power analysis for spatial omics. Available from: https://doi.org/10.1101/2022.01.26.477748
37. Jackson HW, Fischer JR, Zanotelli VRT, Ali HR, Mechera R, Soysal SD, et al. The single-cell pathology landscape of breast cancer. Nature. 2020;578(7796):615–20.
38. Datasets - Spatial Gene Expression - Official 10x Genomics Support. Available from: https://support.10xgenomics.com/spatial-gene-expression/datasets
39. Schubert M, Klinger B, Klünemann M, Sieber A, Uhlitz F, Sauer S, et al. Perturbation-response genes reveal signaling footprints in cancer gene expression. Nat Commun. 2018;9(1):20.
40. Holland CH, Tanevski J, Perales-Patón J, Gleixner J, Kumar MP, Mereu E, et al. Robustness and applicability of transcription factor and pathway analysis tools on single-cell RNA-seq data. Genome Biol. 2020;21(1):36.
41. Holland CH, Szalai B, Saez-Rodriguez J. Transfer of regulatory knowledge from human to mouse for functional genomics analysis. Biochim Biophys Acta Gene Regul Mech. 2020;1863:194431. Available from: https://doi.org/10.1016/j.bbagrm.2019.194431.
42. Türei D, Korcsmáros T, Saez-Rodriguez J. OmniPath: guidelines and gateway for literature-curated signaling pathway resources. Nat Methods. 2016;13(12):966–7.
43. Chen F, Zhang Z, Pu F. Role of stanniocalcin-1 in breast cancer. Oncol Lett. 2019;18(4):3946.
44. Armingol E, Ghaddar A, Joshi CJ, Baghdassarian H, Shamie I, Chan J, et al. Inferring a spatial code of cell-cell interactions across a whole animal body. bioRxiv. 2020. Available from: https://doi.org/10.1101/2020.11.22.392217.
45. Armingol E, Baghdassarian HM, Martino C, Perez-Lopez A, Knight R, Lewis NE. Context-aware deconvolution of cell-cell communication with Tensor-cell2cell. bioRxiv. 2021. Available from: https://doi.org/10.1101/2021.09.20.461129.
46. Hafemeister C, Satija R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol. 2019;20(1):1–15.
47. Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. J Stat Mechan. 2008;2008:P10008. Available from: https://doi.org/10.1088/1742-5468/2008/10/p10008.
48. Tanevski J, Ramirez Flores RO. Misty R. Bioconductor; 2021. Available from: https://bioconductor.org/packages/mistyR
49. Tanevski J. saezlab/mistyR: mistyR 1.3.5 (devel). Zenodo; 2022. Available from: https://zenodo.org/record/6035762
50. Tanevski J, Ramirez Flores RO, Gabor A. GitHub - saezlab/misty_pipelines: MISTy pipelines used to generate results for the paper. GitHub. Available from: https://github.com/saezlab/misty_pipelines
51. Baker E. generate_markers.py. GitHub Gist. Available from: https://gist.github.com/ethanagb/c8080dc20b3d060b9b44f153d1f8bf9e
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
... We demonstrated the application of Tensor-cell2cell in cases where samples correspond to distinct patients, but it can be applied to many other contexts. For instance, our strategy can be readily applied to time series data by considering time points as the contexts, and to spatial transcriptomic datasets, by previously defining cellular niches or neighborhoods as the contexts, given their spatial signatures 60 . We have included Tensor-cell2cell as a part of our previously developed tool cell2cell 61 , enabling previous functionalities such as employing any list of LR pairs (including protein complexes), multiple visualization options, and personalizing the communication scores to account for other signaling effects such as the (in)activation of downstream genes in a signaling pathway 55,62,63 . ...
Article
Full-text available
Cell interactions determine phenotypes, and intercellular communication is shaped by cellular contexts such as disease state, organismal life stage, and tissue microenvironment. Single-cell technologies measure the molecules mediating cell–cell communication, and emerging computational tools can exploit these data to decipher intercellular communication. However, current methods either disregard cellular context or rely on simple pairwise comparisons between samples, thus limiting the ability to decipher complex cell–cell communication across multiple time points, levels of disease severity, or spatial contexts. Here we present Tensor-cell2cell, an unsupervised method using tensor decomposition, which deciphers context-driven intercellular communication by simultaneously accounting for multiple stages, states, or locations of the cells. To do so, Tensor-cell2cell uncovers context-driven patterns of communication associated with different phenotypic states and determined by unique combinations of cell types and ligand-receptor pairs. As such, Tensor-cell2cell robustly improves upon and extends the analytical capabilities of existing tools. We show Tensor-cell2cell can identify multiple modules associated with distinct communication processes (e.g., participating cell–cell and ligand-receptor pairs) linked to severities of Coronavirus Disease 2019 and to Autism Spectrum Disorder. Thus, we introduce an effective and easy-to-use strategy for understanding complex communication patterns across diverse conditions. Cellular contexts such as disease state, organismal life stage and tissue microenvironment, shape intercellular communication, and ultimately affect an organism’s phenotypes. Here, the authors present Tensor-cell2cell, an unsupervised method for deciphering context-driven intercellular communication.
Article
Full-text available
Myocardial infarction is a leading cause of death worldwide¹. Although advances have been made in acute treatment, an incomplete understanding of remodelling processes has limited the effectiveness of therapies to reduce late-stage mortality². Here we generate an integrative high-resolution map of human cardiac remodelling after myocardial infarction using single-cell gene expression, chromatin accessibility and spatial transcriptomic profiling of multiple physiological zones at distinct time points in myocardium from patients with myocardial infarction and controls. Multi-modal data integration enabled us to evaluate cardiac cell-type compositions at increased resolution, yielding insights into changes of the cardiac transcriptome and epigenome through the identification of distinct tissue structures of injury, repair and remodelling. We identified and validated disease-specific cardiac cell states of major cell types and analysed them in their spatial context, evaluating their dependency on other cell types. Our data elucidate the molecular principles of human myocardial tissue organization, recapitulating a gradual cardiomyocyte and myeloid continuum following ischaemic injury. In sum, our study provides an integrative molecular map of human myocardial infarction, represents an essential reference for the field and paves the way for advanced mechanistic and therapeutic studies of cardiac disease.
Article
Full-text available
Spatial transcriptomics (ST) has advanced significantly in the last few years. Such advancement comes with the urgent need for novel computational methods to handle the unique challenges of ST data analysis. Many artificial intelligence (AI) methods have been developed to utilize various machine learning and deep learning techniques for computational ST analysis. This review provides a comprehensive and up-to-date survey of current AI methods for ST analysis.
Article
Tumor heterogeneity has emerged as a fundamental property of most human cancers, with broad implications for diagnosis and treatment. Recently, spatial omics have enabled spatial tumor profiling, however computational resources that exploit the measurements to quantify tumor heterogeneity in a spatially-aware manner are largely missing. We present ATHENA, a computational framework that facilitates the visualization, processing and analysis of tumor heterogeneity from spatial omics measurements. ATHENA employs graph representations of tumors and bundles together a large collection of established and novel heterogeneity scores that quantify different aspects of the complexity of tumor ecosystems. Availability and implementation: ATHENA is available as a Python package under an open-source licence at: https://github.com/AI4SCR/ATHENA. Detailed documentation and step-by-step tutorials with example datasets are also available at: https://ai4scr.github.io/ATHENA/. Supplementary information: Supplementary data are available at Bioinformatics online.
Article
Full-text available
Tissue development and homeostasis require coordinated cell–cell communication. Recent advances in single-cell sequencing technologies have emerged as a revolutionary method to reveal cellular heterogeneity with unprecedented resolution. This offers a great opportunity to explore cell–cell communication in tissues systematically and comprehensively, and to further identify signaling mechanisms driving cell fate decisions and shaping tissue phenotypes. Using gene expression information from single-cell transcriptomics, several computational tools have been developed for inferring cell–cell communication, greatly facilitating analysis and interpretation. However, in single-cell transcriptomics, spatial information of cells is inherently lost. Given that most cell signaling events occur within a limited distance in tissues, incorporating spatial information into cell–cell communication analysis is critical for understanding tissue organization and function. Spatial transcriptomics provides spatial location of cell subsets along with their gene expression, leading to new directions for leveraging spatial information to develop computational approaches for cell–cell communication inference and analysis. These computational approaches have been successfully applied to uncover previously unrecognized mechanisms of intercellular communication within various contexts and across organ systems, including the skin, a formidable model to study mechanisms of cell–cell communication due to the complex interactions between the different cell populations that comprise it. Here, we review emergent cell–cell communication inference tools using single-cell transcriptomics and spatial transcriptomics, and highlight the biological insights gained by applying these computational tools to exploring cellular communication in skin development, homeostasis, disease and aging, as well as discuss future potential research avenues.
Preprint
Full-text available
Cell interactions determine phenotypes, and intercellular communication is shaped by cellular contexts such as disease state, organismal life stage, and tissue microenvironment. Single-cell technologies measure the molecules mediating cell-cell communication, and emerging computational tools can exploit these data to decipher intercellular communication. However, current methods either disregard cellular context or rely on simple pairwise comparisons between samples, thus limiting the ability to decipher complex cell-cell communication across multiple time points, levels of disease severity, or spatial contexts. Here we present Tensor-cell2cell, an unsupervised method using tensor decomposition, which is the first strategy to decipher context-driven intercellular communication by simultaneously accounting for multiple stages, states, or locations of the cells. To do so, Tensor-cell2cell uncovers context-driven patterns of communication associated with different phenotypic states and determined by unique combinations of cell types and ligand-receptor pairs. We show Tensor-cell2cell can identify multiple modules associated with distinct communication processes (e.g., participating cell-cell and ligand receptor pairs) linked to COVID-19 severities. Thus, we introduce an effective and easy-to-use strategy for understanding complex communication patterns across diverse conditions.
Article
Full-text available
A growing appreciation of the importance of cellular metabolism and revelations concerning the extent of cell–cell heterogeneity demand metabolic characterization of individual cells. We present SpaceM, an open-source method for in situ single-cell metabolomics that detects >100 metabolites from >1,000 individual cells per hour, together with a fluorescence-based readout and retention of morpho-spatial features. We validated SpaceM by predicting the cell types of cocultured human epithelial cells and mouse fibroblasts. We used SpaceM to show that stimulating human hepatocytes with fatty acids leads to the emergence of two coexisting subpopulations outlined by distinct cellular metabolic states. Inducing inflammation with the cytokine interleukin-17A perturbs the balance of these states in a process dependent on NF-κB signaling. The metabolic state markers were reproduced in a murine model of nonalcoholic steatohepatitis. We anticipate SpaceM to be broadly applicable for investigations of diverse cellular models and to democratize single-cell metabolomics.
Preprint
Full-text available
The function of many biological systems, such as embryos, liver lobules, intestinal villi, and tumors depends on the spatial organization of their cells. In the past decade high-throughput technologies have been developed to quantify gene expression in space, and computational methods have been developed that leverage spatial gene expression data to identify genes with spatial patterns and to delineate neighborhoods within tissues. To assess the ability and potential of spatial gene expression technologies to drive biological discovery, we present a curated database of literature on spatial transcriptomics dating back to 1987, along with a thorough analysis of trends in the field such as usage of experimental techniques, species, tissues studied and computational approaches used. Our analysis places current methods in historical context, and we derive insights about the field that can guide current research strategies. A companion supplement offers a more detailed look at the technologies and methods analyzed: https://pachterlab.github.io/LP_2021/.
Article
Full-text available
Spatial transcriptomic and proteomic technologies have provided new opportunities to investigate cells in their native microenvironment. Here we present Giotto, a comprehensive and open-source toolbox for spatial data analysis and visualization. The analysis module provides end-to-end analysis by implementing a wide range of algorithms for characterizing tissue composition, spatial expression patterns, and cellular interactions. Furthermore, single-cell RNAseq data can be integrated for spatial cell-type enrichment analysis. The visualization module allows users to interactively visualize analysis outputs and imaging features. To demonstrate its general applicability, we apply Giotto to a wide range of datasets encompassing diverse technologies and platforms.
Article
Full-text available
Most methods for inferring gene-gene interactions from expression data focus on intracellular interactions. The availability of high-throughput spatial expression data opens the door to methods that can infer such interactions both within and between cells. To achieve this, we developed Graph Convolutional Neural networks for Genes (GCNG). GCNG encodes the spatial information as a graph and combines it with expression data using supervised training. GCNG improves upon prior methods used to analyze spatial transcriptomics data and can propose novel pairs of extracellular interacting genes. The output of GCNG can also be used for downstream analysis including functional gene assignment. Supporting website with software and data: https://github.com/xiaoyeye/GCNG.
Preprint
Cell-cell interactions are crucial for multicellular organisms as they shape cellular function and ultimately organismal phenotype. However, the spatial code embedded in the molecular interactions that drive and sustain spatial organization, and in the organization that in turn drives intercellular interactions across a living animal, remains to be elucidated. Here we use the expression of ligand-receptor pairs obtained from a whole-body single-cell transcriptome of Caenorhabditis elegans larvae to compute the potential for intercellular interactions through a Bray-Curtis-like metric. Leveraging a 3D atlas of C. elegans' cells, we implement a genetic algorithm to select the ligand-receptor pairs most informative of the spatial organization of cells. Validating the strategy, the selected ligand-receptor pairs are involved in known cell-migration and morphogenesis processes, and we confirm a negative correlation between cell-cell distances and interactions. Thus, our computational framework helps identify cell-cell interactions and their relationship with intercellular distances, and decipher the molecular bases encoding spatial information in a whole animal. Furthermore, it can also be used to elucidate associations with any other intercellular phenotype and applied to other multicellular organisms.
Article
Single-cell RNA-sequencing (scRNAseq) technologies are rapidly evolving. Although standard scRNAseq experiments are very informative, they lose the spatial organization of the cells in the tissue of origin. Conversely, spatial RNA-seq technologies designed to maintain cell localization have limited throughput and gene coverage. Mapping scRNAseq to genes with spatial information increases coverage while providing spatial location. However, methods to perform such mapping have not yet been benchmarked. To fill this gap, we organized the DREAM Single-Cell Transcriptomics challenge focused on the spatial reconstruction of cells from the Drosophila embryo from scRNAseq data, leveraging, as a silver standard, genes with in situ hybridization data from the Berkeley Drosophila Transcription Network Project reference atlas. The 34 participating teams used diverse algorithms for gene selection and location prediction, while being able to correctly localize clusters of cells. Selection of predictor genes was essential for this task. Predictor genes showed a relatively high expression entropy and high spatial clustering, and included prominent developmental genes such as gap and pair-rule genes and tissue markers. Application of the top 10 methods to a zebrafish embryo dataset yielded performance and statistical properties of the selected genes similar to those in the Drosophila data. This suggests that methods developed in this challenge are able to extract generalizable properties of genes that are useful to accurately reconstruct the spatial arrangement of cells in tissues.
Preprint
As spatially resolved multiplex profiling of RNA and proteins becomes more prominent, it is increasingly important to understand the statistical power available to test specific hypotheses when designing and interpreting such experiments. Ideally, it would be possible to create an oracle that predicts sampling requirements for generalized spatial experiments. However, the unknown number of relevant spatial features and the complexity of spatial data analysis make this challenging. Here, we enumerate multiple parameters of interest that should be considered in the design of a properly powered spatial omics experiment. We introduce a method for tunable in silico tissue generation, and use it with spatial profiling datasets to construct an exploratory computational framework for single cell spatial power analysis. Finally, we demonstrate that our framework can be applied across diverse spatial data modalities and tissues of interest.
Article
As single-cell omics continue to advance, the field of spatially resolved transcriptomics has emerged with a set of experimental and computational methods to map out the positions of cells and their gene expression profiles in space. Here we summarize current transcriptome-wide and sequencing-based methodologies and their applications in genomics research.
Article
The recent advent of genome-scale imaging has enabled single-cell omics analysis in a spatially resolved manner in intact cells and tissues. These advances allow gene expression profiling of individual cells, and hence in situ identification and spatial mapping of cell types, in complex tissues. The high spatial resolution of these approaches further allows determination of the spatial organizations of the genome and transcriptome inside cells, both of which are key regulatory mechanisms for gene expression.