Content available from Nature Communications
Article https://doi.org/10.1038/s41467-025-55847-5
Integration of clinical, pathological, radiological, and transcriptomic data improves prediction for first-line immunotherapy outcome in metastatic non-small cell lung cancer
Nicolas Captier1,2, Marvin Lerousseau2,3, Fanny Orlhac1, Narinée Hovhannisyan-Baghdasarian1, Marie Luporsi1,4, Erwin Woff1,5, Sarah Lagha6, Paulette Salamoun Feghali6, Christine Lonjou2, Clément Beaulaton7, Andrei Zinovyev8, Hélène Salmon9, Thomas Walter2,3,10, Irène Buvat1,10, Nicolas Girard6,10 & Emmanuel Barillot2,10
Immunotherapy is improving the survival of patients with metastatic non-
small cell lung cancer (NSCLC), yet reliable biomarkers are needed to identify
responders prospectively and optimize patient care. In this study, we explore
the benefits of multimodal approaches to predict immunotherapy outcome
using multiple machine learning algorithms and integration strategies. We
analyze baseline multimodal data from a cohort of 317 metastatic NSCLC
patients treated with first-line immunotherapy, including positron emission
tomography images, digitized pathological slides, bulk transcriptomic
profiles, and clinical information. Most of the integration strategies we
test yield multimodal models surpassing both the best unimodal models and
established univariate biomarkers, such as PD-L1 expression. Additionally,
several multimodal combinations demonstrate improved patient risk stratifi-
cation compared to models built with routine clinical features only. Our study
thus provides evidence of the superiority of multimodal over unimodal
approaches, advocating for the collection of large multimodal NSCLC datasets
to develop and validate robust and powerful immunotherapy biomarkers.
Anti-PD-1/PD-L1 immunotherapy with or without chemotherapy is the
current standard first-line therapy for metastatic non-small cell lung
cancer (NSCLC) without actionable oncogene alterations and without
contraindications to PD-1/PD-L1 inhibitors1. Several clinical trials have
indeed demonstrated significantly improved Overall Survival (OS) and
Progression-Free Survival (PFS) with immunotherapy in comparison to
chemotherapy alone2–6. Nevertheless, half of the patients do not pre-
sent a radiological response to immunotherapy, and the duration of
response remains highly variable from one patient to another
(ranging from 1.1 to 18 months for patients treated with first-line
immunotherapy + chemotherapy)3. Ultimately, the number of patients
with long-term survival is limited. There is thus a critical need for
Received: 9 February 2024
Accepted: 31 December 2024
A full list of affiliations appears at the end of the paper. e-mail: nicolas.captier@polytechnique.org; emmanuel.barillot@curie.fr
Nature Communications | (2025) 16:614 1
Content courtesy of Springer Nature, terms of use apply. Rights reserved
biomarkers that can predict treatment response accurately. These
biomarkers will pave the way to better personalize the treatment
strategy (immunotherapy as a single agent for patients with predicted
prolonged survival, combination with chemotherapy or other agents
for patients with predicted poor response and survival), to customize
the follow-up, and to adequately assess treatment sequences.
Machine learning approaches have recently shown their potential
to leverage data collected before treatment initiation, including
clinical7,8, radiological9,10, anatomopathological11,12, or transcriptomic
information13,14, and develop robust prognostic and predictive models
that could outperform approved univariate biomarkers such as PD-L1
expression15. Promising results have fostered the exploration of mul-
timodal approaches to combine all the diverse aspects of the disease
that these different modalities probe. Yet, evidence of the superiority
of multimodal over unimodal biomarkers16 remains limited, possibly
due to challenges in gathering comprehensive multimodal cohorts.
Therefore, there is a pressing need for new studies involving large and
homogeneous NSCLC multimodal cohorts to fully explore the benefits
of multimodality and design strategies to address the challenges
associated with integrating multimodal data.
In this study, we conduct a thorough comparison of unimodal and
multimodal approaches for predicting the outcome of metastatic
NSCLC patients undergoing first-line immunotherapy. Using a new
multimodal cohort of 317 patients—including clinical data, PET/CT
scans, digitized pathological slides, and bulk RNA-seq data—we
demonstrate the superiority of multimodal approaches across the
majority of the explored predictive algorithms and integration stra-
tegies. Mapping each modality to a set of interpretable features, we
also identify the most influential factors for immunotherapy outcomes
and explore their complementarity. These results could guide future
research, fostering efforts to collect and analyze large multimodal
cohorts, ultimately leading to the development and validation of a new
generation of multimodal biomarkers that could transform NSCLC
patient care.
Results
Clinical characteristics of patients with metastatic NSCLC
We identified 317 NSCLC patients treated at Institut Curie, who met the
inclusion criteria: patients with histologically proven advanced NSCLC
who received anti-PD-(L)1 immunotherapy, specifically pem-
brolizumab, as their first-line treatment. Immunotherapy was admi-
nistered either as a standalone treatment for patients with a PD-L1
expression greater than 50% or in combination with chemotherapy,
regardless of the PD-L1 expression, as per clinical practice guidelines1.
PD-L1 expression was evaluated by immunohistochemistry (SP263 and
QR1 assays), with the Tumor Proportion Score (TPS) representing the
percentage of tumor cells exhibiting membrane PD-L1 staining. The
patients who received pembrolizumab as monotherapy were treated
between October 2017 and January 2023 while those who received
pembrolizumab combined with chemotherapy were treated between
July 2019 and January 2023. The clinical characteristics of the multi-
modal cohort are detailed in Table 1.
Median OS and PFS were respectively 723 days (95% CI [446–987])
and 301 days (95% CI [145–598]) for the patients treated with immu-
notherapy alone, and 763 days (95% CI [576-NR]) and 290 days (95% CI
[241–372]) for the patients treated with a combination of immu-
notherapy and chemotherapy (Fig. 1A). Interestingly, no significant
difference was observed between the two treatment groups for OS
(log-rank p-value = 0.44, Fig. 1B), even for the patients with PD-L1
expression greater than 50% only (Supplementary Fig. s1). We
observed that for PFS, the immunotherapy + chemotherapy group had
fewer early progressors, although this was compensated by an increase
in late progressors compared to the immunotherapy-only
group (Fig. 1A).
Standard univariate biomarkers show limited predictive power
PD-L1 expression was able to stratify patients, with significant
differences in PFS and OS between patients with negative expression
(<1%) and those with positive expression (≥1%) (Fig. 1B and Supplementary
Fig. s2). However, it yielded only mild performance as a univariate
biomarker for patient survival (C-index OS = 0.54, bootstrap 95% CI
[0.51–0.57], permutation p-value = 0.014, n = 298). Moreover, no
significant performance was observed when using PD-L1 expression as a
continuous score, where negative expressions were replaced by 0%,
and the score was calculated as 100 - TPS (C-index OS = 0.53, bootstrap
95% CI [0.48–0.58], permutation p-value = 0.104, n = 295). Other
standard clinical biomarkers, such as the Tumor Mutational Burden (TMB)
or Tumor Infiltrating Lymphocytes (TILs), with TILs being
semi-quantitatively assessed on routine pathological sections without any
cutoff, did not exhibit significant association with patient outcome
(Fig. 1C and Supplementary Fig. s3).
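The evaluation of a univariate biomarker described above (a concordance index with a bootstrap confidence interval and a permutation p-value) can be sketched as follows. This is a minimal pure-Python illustration and not the authors' code: Harrell's estimator is assumed for the C-index, and the helper names, the toy data, and the numbers of resamples are all hypothetical.

```python
import random


def harrell_c_index(times, events, scores):
    """Harrell's C-index: share of comparable pairs in which the patient with
    the higher risk score has the shorter observed survival time.
    events[i] is 1 if death was observed and 0 if the patient was censored."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i has an observed event
            # strictly before the follow-up time of patient j.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tied scores count for one half
    return concordant / comparable if comparable else float("nan")


def bootstrap_ci(times, events, scores, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the C-index (resampling patients)."""
    rng = random.Random(seed)
    n = len(times)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        c = harrell_c_index([times[k] for k in idx],
                            [events[k] for k in idx],
                            [scores[k] for k in idx])
        if c == c:  # skip resamples without any comparable pair (NaN)
            stats.append(c)
    stats.sort()
    lower = stats[int((alpha / 2) * (len(stats) - 1))]
    upper = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lower, upper


def permutation_p_value(times, events, scores, n_perm=200, seed=0):
    """One-sided p-value: how often shuffled scores reach the observed C-index."""
    rng = random.Random(seed)
    observed = harrell_c_index(times, events, scores)
    shuffled = list(scores)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if harrell_c_index(times, events, shuffled) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)


# Toy data (hypothetical): TPS in percent, survival in days, event indicator.
tps = [0, 0, 5, 10, 60, 80, 90, 100]
times = [120, 200, 300, 250, 700, 650, 900, 800]
events = [1, 1, 1, 1, 1, 0, 0, 1]
risk = [100 - t for t in tps]  # continuous risk score, as in the text

c_index = harrell_c_index(times, events, risk)
ci_low, ci_high = bootstrap_ci(times, events, risk)
p_value = permutation_p_value(times, events, risk)
```

With a real cohort, the number of bootstrap and permutation resamples would typically be much larger than the 200 used here to keep the sketch fast.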
Collection of multiple baseline modalities to predict immunotherapy outcome
Clinical information from routine care, [18F]FDG-PET/CT scans, digi-
tized pathological slides from the initial diagnosis, and bulk RNA-seq
profiles from solid biopsies were collected at baseline. For each data
modality, we first selected and computed several hand-crafted fea-
tures to serve as input for both unimodal and multimodal predictive
models, including 30 clinical features, 30 radiomic features, 134
pathomic features, and 34 transcriptomic features. We then leveraged
this multimodal dataset to conduct an extensive comparison of the
performance of unimodal and multimodal approaches using a 10-fold
cross-validation scheme applied to the entire cohort and repeated 100
times (Supplementary Fig. s4).
237 out of the 317 patients had at least one missing modality
(Fig. 1D). To ensure a fair comparison of all possible modality combi-
nations, we, therefore, restricted the evaluation of prediction perfor-
mance to the 80 patients with a complete multimodal profile (i.e.,
collecting the predictions of these 80 patients only, from the test sets
of the cross-validation scheme applied to the whole cohort; Supple-
mentary Fig. s4). Log-rank tests indicated no significant differences
between the survival distributions of patients with missing modalities
and those with available modalities (Supplementary Fig. s5).
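The evaluation scheme described above, in which models are cross-validated on the whole cohort but only the test predictions of patients with a complete profile are kept, can be sketched as follows. This is an illustrative sketch under stated assumptions: the folds are plain (not stratified), `fit_predict` is a hypothetical callback standing in for any unimodal or multimodal model, and the toy cohort and the small number of repeats are placeholders.

```python
import random


def repeated_kfold_indices(n_samples, n_splits=10, n_repeats=100, seed=0):
    """Yield (repeat, train_idx, test_idx) for a repeated K-fold scheme."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        # Interleaved slices give n_splits disjoint folds of near-equal size.
        folds = [order[k::n_splits] for k in range(n_splits)]
        for k in range(n_splits):
            test_idx = folds[k]
            train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield repeat, train_idx, test_idx


def collect_complete_predictions(n_samples, is_complete, fit_predict,
                                 n_splits=10, n_repeats=100, seed=0):
    """Return one dict per repeat mapping patient index -> test prediction,
    keeping only patients flagged as having every modality available."""
    per_repeat = [dict() for _ in range(n_repeats)]
    for repeat, train_idx, test_idx in repeated_kfold_indices(
            n_samples, n_splits, n_repeats, seed):
        # The model is trained on all training patients, complete or not.
        predictions = fit_predict(train_idx, test_idx)
        for i, p in zip(test_idx, predictions):
            if is_complete[i]:
                per_repeat[repeat][i] = p
    return per_repeat


# Toy cohort (hypothetical): 317 patients, the first 80 with a complete profile.
n_patients = 317
is_complete = [i < 80 for i in range(n_patients)]
labels = [i % 2 for i in range(n_patients)]


def dummy_fit_predict(train_idx, test_idx):
    # Placeholder model: predicts the training-set event rate for everyone.
    rate = sum(labels[i] for i in train_idx) / len(train_idx)
    return [rate for _ in test_idx]


predictions = collect_complete_predictions(
    n_patients, is_complete, dummy_fit_predict, n_repeats=3)
```

Each repeat then yields exactly one test prediction per complete-profile patient, so a performance metric can be computed once per repeat and summarized as a mean and standard deviation across repeats.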
Comparison of unimodal performances across multiple
prediction tasks
We first evaluated the predictive value of each modality individually.
This evaluation involved predicting risk scores for time-to-event out-
comes (OS and PFS) and classifying patients into two groups: those
who would die within one year of treatment and those who would not
(1-year death), or those who would experience disease progression
before 6 months of treatment and those who would not (6-month
progression). We focused on two standard Machine Learning
approaches that are well-suited for datasets with modest numbers of
samples17: linear methods (logistic regression and Cox regression with
elastic net penalties) and tree ensemble methods (Random Survival
Forest18 and gradient-boosted tree19 algorithms). The four modalities
exhibited varying degrees of predictive power for patient outcome,
with the RNA modality standing out for the prediction of 1-year death
(AUC = 0.75 ± 0.04 (± 1 std); Table 2 and Supplementary Table s1). PFS
and 6-month progression predictions were more challenging than OS
and 1-year death predictions. Except for pathological data, all mod-
alities yielded greater performance (using either linear or tree
ensemble algorithms) in predicting OS and 1-year death compared to
PFS and 6-month progression. Across all modalities and models, the
highest scores achieved were a C-index of 0.59 (± 0.02) for RNA
modality in predicting PFS and an AUC of 0.61 (± 0.03) for clinical,
pathomic, or RNA modality in predicting 6-month progression.
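The two binary targets can be derived from time-to-event data as sketched below, together with a rank-based AUC. The handling of censoring is an assumption made for illustration: patients censored before the horizon have an unknown label and are dropped here, which would be consistent with the smaller sample sizes reported for the classification tasks in Table 2, but this section does not state the exact rule used. The helper names, the 365-day horizon, and the toy data are hypothetical.

```python
def binary_label(time_days, event, horizon_days):
    """Return 1 if the event was observed by the horizon, 0 if the patient was
    followed at least until the horizon without the event, and None if the
    patient was censored earlier (label unknown)."""
    if event == 1 and time_days <= horizon_days:
        return 1
    if time_days >= horizon_days:
        return 0
    return None


def roc_auc(labels, scores):
    """AUC as the probability that a positive case scores above a negative one
    (tied scores count for one half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): (follow-up in days, death observed, predicted risk).
patients = [(200, 1, 0.8), (400, 1, 0.3), (500, 0, 0.2), (100, 0, 0.6), (90, 1, 0.9)]

ONE_YEAR = 365  # assumed horizon in days for the 1-year death task
labelled = [(binary_label(t, e, ONE_YEAR), s) for t, e, s in patients]
# Drop patients whose 1-year status is unknown because of early censoring.
kept = [(y, s) for y, s in labelled if y is not None]
auc_1y = roc_auc([y for y, _ in kept], [s for _, s in kept])
```

The same helper applies to the 6-month progression task by passing the progression time, the progression indicator, and a horizon of about 183 days.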
Feature importance analyses highlight relevant clinical and
transcriptomic features
We first investigated feature importance for the prediction of OS and
1-year death, providing insights into the information learned by each
unimodal model (see Methods). Notably, it revealed that clinical
models consistently learned that patients with a low level of serum
albumin, a negative PD-L1 status (i.e., TPS < 1%), or abundant circulat-
ing neutrophils were more likely to have a poor prognosis (Fig. 2A).
This analysis also highlighted the transcriptomic features that the RNA
models used to predict OS and 1-year death (Fig. 2B). RNA models
consistently associated an abundance of dendritic cells (DC), as scored
by the MCP-counter method20, or a high expression of NTRK1 gene
with a good prognosis while they associated high expression of NRAS
and KRAS genes with a poor prognosis. Interestingly, among the 13
consensus transcriptomic features identified with feature importance
analysis (see Methods), only 3 exhibited significantly different values
between biopsy sites in the 84 patients for whom this information was
available (Supplementary Fig. s6), suggesting that the other 10 may be
used independently of the biopsy location. Radiomic models primarily
focused on information related to the Total Metabolic Tumor Volume
(TMTV) as well as to the total metabolic volume of extra-thoracic
metastases (Supplementary Fig. s7). Lastly, the interpretation of
pathomic models unveiled features that encoded the proportion of
inflammatory cells within the biopsy sections as well as their spatial
organization (Supplementary Fig. s8).
We then turned to feature importance analysis of unimodal
models for the prediction of PFS and 6-month progression. Notably, it
confirmed that a high expression of the NTRK1 gene and an abundance
of dendritic cells were associated with a favorable prognosis since they
were also ranked among the top ten most important transcriptomic
features for multivariate predictions and showed significant univariate
association with both PFS and 6-month progression (Supplementary
Fig. s9A). Similarly, it highlighted the favorable influence of positive
PD-L1 expression or a high level of serum albumin on the predictions of
multivariate models and the negative influence of a high
neutrophil-to-lymphocyte ratio (Supplementary Fig. s9B). Finally, this analysis
showed that, similarly to OS and 1-year death prediction, radiomic
models were predominantly driven by the TMTV (Supplementary
Fig. s10A).
The consensus important features (see Methods) identified for OS
and 1-year death predictions, along with those for PFS and 6-month
progression predictions exhibited mild to low inter-modal correlations
(Supplementary Fig. s11), suggesting that the different collected
modalities may capture distinct aspects of each patient’s condition
and response to therapy.
Late fusion of unimodal predictors improves the prediction of
immunotherapy outcome
We then developed multimodal predictors with the hypothesis that
multimodal data would provide richer and more comprehensive
information. We first applied late fusion as a baseline strategy for
integrating all the modalities into multimodal predictors of OS, 1-year
death, PFS, and 6-month progression (Supplementary Fig. s4). Late
fusion consists of averaging the predictions of each individual unim-
odal predictor. We tested every possible combination of two to four
modalities for each predictive task using both linear and tree ensemble
algorithms. The late fusion of tree ensemble models improved the
prediction of patient outcomes across both classification and survival
tasks (Fig. 3 and Supplementary Fig. s12). Specifically, for 1-year death,
the combination of predictions from clinical, RNA, and radiomic
Table 1 | Clinical characteristics of the multimodal cohort and the subset of patients with a complete multimodal profile

| Clinical characteristics | Category | Multimodal cohort (n = 317) | Immuno + chemo (n = 196) | Immuno alone (n = 121) | Subset with all modalities (n = 80) | Statistical comparison (80 vs 237) |
|---|---|---|---|---|---|---|
| Age, median (range) | | 66 (33-92) | 64 (33-84) | 69 (40-92) | 64 (37-82) | p_val^1 = 4.9e-3 |
| Sex, n (%) | Men | 189 (60) | 113 (58) | 76 (63) | 46 (57) | p_val^2 = 6.0e-1 |
| | Women | 128 (40) | 83 (42) | 45 (37) | 34 (43) | |
| 1st line therapy, n (%) | Pembrolizumab + chemotherapy | 196 (62) | 196 (100) | – | 55 (69) | p_val^2 = 9.2e-2 |
| | Pembrolizumab | 121 (38) | – | 121 (100) | 25 (31) | |
| Histology, n (%) | Adenocarcinomas | 232 (73) | 152 (77) | 80 (66) | 54 (68) | p_val^2 = 2.8e-2 |
| | Squamous cell carcinomas | 44 (14) | 17 (9) | 27 (22) | 13 (16) | |
| | Other subtypes/not available | 41 (13) | 27 (14) | 14 (12) | 13 (16) | |
| PD-L1 expression, n (%) | ≥50% | 163 (51) | 49 (25) | 114 (94) | 42 (52) | p_val^2 = 6.2e-1 |
| | 1–49% | 82 (26) | 78 (40) | 4 (3) | 23 (29) | |
| | Negative | 56 (18) | 56 (29) | 0 (0) | 11 (14) | |
| | Not available | 16 (5) | 13 (6) | 3 (3) | 4 (5) | |
| Smoking status, n (%) | Current/former | 287 (91) | 180 (92) | 107 (88) | 71 (89) | p_val^2 = 6.9e-1 |
| | Never | 29 (9) | 15 (8) | 14 (12) | 8 (10) | |
| | Not available | 1 (<1) | 1 (<1) | 0 (0) | 1 (1) | |
| Performance status, n (%) | ECOG 0/1 | 244 (77) | 158 (81) | 86 (71) | 71 (89) | p_val^2 = 1.5e-3 |
| | ECOG ≥2 | 36 (11) | 14 (7) | 22 (18) | 1 (1) | |
| | Not available | 37 (12) | 24 (12) | 13 (11) | 8 (10) | |
| TILs, n (%) | Positive | 159 (50) | 82 (42) | 77 (64) | 49 (61) | p_val^2 = 2.8e-2 |
| | Negative | 18 (6) | 9 (5) | 9 (7) | 3 (4) | |
| | Not available | 140 (44) | 105 (53) | 35 (29) | 28 (35) | |
| Median Overall Survival, days (95% CI) | | 756 (592–910) | 723 (446–987) | 763 (576-NR) | 846 (650-NR) | p_val^3 = 2.7e-1 |
| Median Progression Free Survival, days (95% CI) | | 296 (241–372) | 301 (145–598) | 290 (241–372) | 386 (275–711) | p_val^3 = 1.6e-2 |

ECOG Eastern Cooperative Oncology Group, TILs Tumor-Infiltrating Lymphocytes, NR Not Reached, p_val^1 Welch’s t test p-value, p_val^2 Chi-squared test p-value, p_val^3 Log-rank test p-value.
The results of the statistical comparison between the subset of patients with a complete profile and the rest of the cohort, using two-sided Welch’s t tests, one-way Chi-squared tests, and Log-rank tests, are presented in the last column.
models demonstrated the highest performance (AUC = 0.81 ± 0.03)
while, for OS, the combination of predictions from clinical and RNA
models performed best (C-index = 0.75 ± 0.01). For both OS and 1-year
death, paired-permutation tests confirmed the significantly higher
AUC and C-index for the combined model compared to clinical-only,
radiomics-only, and pathomics-only models (Supplementary Fig. s13).
For PFS, the combination of predictions from clinical, RNA, pathomic,
and radiomic models yielded the best performance while, for 6-month
progression, it was the combination of predictions of clinical, RNA, and
pathomic models. However, according to paired-permutation tests, the
performance of these two combinations was not significantly different
from that of unimodal models. The late fusion of linear models
outperformed that of tree ensemble models for the prediction
of 6-month progression only, with a combination of clinical,
pathomic, and RNA predictions yielding an AUC of 0.67 (± 0.03)
(Supplementary Fig. s14). This can be explained by the greater
[Fig. 1, panels A–D. Panel D Venn diagram labels: Clinical (317), Pathological (236), RNA (134), Radiological (201); 80 patients have all four modalities.]
performance of unimodal models with linear approaches for 6-month
progression prediction, underscoring that the performance of late
fusion combinations strongly depends on the performance of their
unimodal components.
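Late fusion as defined above, that is, averaging the predictions of the unimodal predictors over every combination of two to four modalities, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the unimodal outputs are on a comparable scale (such as predicted probabilities), and the function names and toy predictions are hypothetical.

```python
from itertools import combinations

MODALITIES = ("clinical", "radiomic", "pathomic", "RNA")


def modality_combinations(modalities=MODALITIES, min_size=2):
    """List every combination of at least min_size modalities."""
    return [combo
            for size in range(min_size, len(modalities) + 1)
            for combo in combinations(modalities, size)]


def late_fusion(unimodal_predictions, combo):
    """Average, patient by patient, the predictions of the selected models."""
    n_patients = len(unimodal_predictions[combo[0]])
    return [sum(unimodal_predictions[m][i] for m in combo) / len(combo)
            for i in range(n_patients)]


# Toy test-set predictions (hypothetical) for three patients.
unimodal_predictions = {
    "clinical": [0.2, 0.7, 0.9],
    "radiomic": [0.4, 0.5, 0.6],
    "pathomic": [0.5, 0.5, 0.5],
    "RNA": [0.1, 0.9, 0.7],
}

fused_by_combo = {combo: late_fusion(unimodal_predictions, combo)
                  for combo in modality_combinations()}
```

Because the fused score is a plain average, a weak unimodal component pulls the combination toward chance, which matches the observation that late fusion performance depends on its unimodal components.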
To further compare late fusion multimodal models with unimodal
ones, we computed the marginal contribution of each modality to the
final multimodal prediction for each patient. We focused on the best-
performing model that combined clinical, radiomic, and RNA tree
ensemble models for 1-year death prediction (Fig. 3). For several
patients, the different modalities did not influence the multimodal
prediction in the same direction (Fig. 4A). Notably, in 26% of the cases
(20/77), the RNA modality’s contribution was discordant with the final
multimodal prediction. Among these discordant cases, 30% (6/20)
were correctly influenced by the RNA modality but misclassified by
the multimodal model, while 70% (14/20) were negatively influenced
by the RNA modality but correctly classified by the multimodal
model, with the radiomic and clinical modalities guiding the prediction
towards the correct outcome (clusters 1&2). Analyzing the feature
importance for the 14 patients where the multimodal prediction was
correct despite the negative RNA contribution revealed that features
from different modalities provided opposing information, balancing
each other to guide the multimodal prediction in the correct direction
(Fig. 4B). For instance, in some cases, high expression of NRAS gene
negatively influenced the prediction, but incorporating clinical and
radiomic information—such as elevated serum albumin level or high
spleen metabolism—helped achieve a correct prediction. Overall, the
three fused modalities exhibited diverse behaviors, with weak corre-
lations between their unimodal predictions (Fig. 4C). Averaging their
decisions impacted the predicted outcomes for several patients—not
just isolated cases—and improved overall performance.
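The marginal contribution of each modality to a late fusion prediction can be illustrated with exact Shapley values. The sketch below makes assumptions that this section does not spell out: the value of a coalition of modalities is taken to be the average of their unimodal predictions, and the empty coalition is assigned the random baseline of 0.5. The function names and the toy patient are hypothetical.

```python
from itertools import permutations
from math import factorial


def coalition_value(predictions, coalition, baseline=0.5):
    """Late fusion prediction restricted to a coalition of modalities."""
    if not coalition:
        return baseline  # assumed value when no modality is used
    return sum(predictions[m] for m in coalition) / len(coalition)


def shapley_contributions(predictions, baseline=0.5):
    """Exact Shapley values: average marginal gain of each modality over all
    orders in which the modalities can be added."""
    names = list(predictions)
    contributions = {m: 0.0 for m in names}
    for order in permutations(names):
        included = []
        for m in order:
            before = coalition_value(predictions, included, baseline)
            included.append(m)
            after = coalition_value(predictions, included, baseline)
            contributions[m] += after - before
    n_orders = factorial(len(names))
    return {m: total / n_orders for m, total in contributions.items()}


# Toy patient (hypothetical): predicted probabilities of death within 1 year.
patient = {"clinical": 0.7, "radiomic": 0.6, "RNA": 0.3}
fused = coalition_value(patient, list(patient))
contributions = shapley_contributions(patient)
```

In this toy case the fused prediction is above 0.5 while the RNA contribution is negative, which is the kind of discordance discussed above; the contributions always sum to the fused prediction minus the baseline.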
Benchmark of integration strategies reveals a consistent benefit
of multimodal approaches
We compared the late fusion approach with early fusion (Supple-
mentary Fig. s4). The baseline early fusion approach consists of con-
catenating the features from the different modalities and using these
concatenated vectors as input to a single predictor. For binary classi-
fication tasks, we also re-implemented and tested an attention-based
fusion approach known as DyAM16, which was recently applied to
NSCLC multimodal data. Early fusion and DyAM models were trained
both without and with prior univariate feature selection to balance the
dimensions of the different modalities (see Methods). The comparison
of these different integration strategies for predicting OS, 1-year death,
PFS, and 6-month progression did not identify a single best strategy
(Fig. 5). The late fusion of tree ensemble models yielded the best
performance for the prediction of OS and 1-year death, while the early
fusion of tree ensemble models and the DyAM model, both with prior
univariate feature selection, outperformed the other strategies for PFS
and 6-month progression prediction, respectively. This comparison
demonstrated the potential of multimodal approaches to enhance
unimodal predictions, as for each prediction task the majority of
integration strategies resulted in multimodal combinations that out-
performed the best unimodal models. Furthermore, this comparison
highlighted modalities that were consistently involved in the best
multimodal combinations across the different integration strategies,
particularly for 1-year death and 6-month progression prediction. For
1-year death, the integration strategies that outperformed the best
unimodal model with their optimal combination combined clinical (5/7
strategies), RNA (7/7 strategies), and radiomic (3/7 strategies) mod-
alities. For 6-month progression, they combined clinical (7/8 strate-
gies), pathomic (7/8 strategies), and RNA (8/8 strategies) modalities.
The combination of clinical and RNA modalities also performed best
for OS prediction, while for PFS prediction, it was the combination of
clinical, pathomic, and RNA. Lastly, the superiority of multimodal
approaches was confirmed when comparing the average performance
of the different integration strategies across all possible combinations
of one, two, three, and four modalities (Fig. 6 and Supplementary
Fig. s15). Indeed, the average performance at a fixed number of mod-
alities increased with the number of integrated modalities for every
strategy and every prediction task (except for the early fusion with a
linear model and no prior feature selection). Paired-sample t tests
showed that multimodal combinations (involving two, three, or four
Fig. 1 | Survival of NSCLC patients and Venn diagram summarizing the multimodal
cohort. A OS and PFS Kaplan-Meier survival curves (solid lines) for the whole
NSCLC cohort (n = 311 for OS and n = 316 for PFS) with a 95% confidence interval
(shaded areas). Patients are stratified with respect to their first-line therapy, either
pembrolizumab alone or pembrolizumab + chemotherapy. Log-rank p-values are
reported to characterize the separation of the survival curves. B OS and PFS
Kaplan-Meier survival curves (solid lines) with 95% confidence interval (shaded areas) and
log-rank p-values for the patients with available PD-L1 expression (n = 295 for OS
and n = 300 for PFS). Patients are stratified with respect to their PD-L1 status
(positive vs negative). C OS Kaplan-Meier survival curves (solid lines) with 95%
confidence interval (shaded areas) and log-rank p-values for the 43 patients with
available TMB and the 174 patients with available TILs status. For the TMB, patients
are stratified with a threshold of 15 mutations per megabase (see Methods). For
TILs, patients are stratified with respect to their positive vs negative TILs status.
D Overview of the multimodal cohort with a Venn diagram. The four data
modalities and their intersections are represented (i.e., PET/CT images, clinical data,
pathological slides, and bulk RNA-seq profiles). Source data are provided as a
Source Data file.
Table 2 | Unimodal performance for the prediction of OS, 1-year death, PFS, and 6-month progression

| Modality | Algorithm | OS (n = 79), C-index | 1-year death (n = 77), AUC | PFS (n = 80), C-index | 6-month progression (n = 75), AUC |
|---|---|---|---|---|---|
| Clinical | Tree ensembles | 0.67 ± 0.01* | 0.59 ± 0.05 | 0.56 ± 0.02 | 0.58 ± 0.04 |
| Clinical | Linear | 0.60 ± 0.02* | 0.73 ± 0.02* | 0.53 ± 0.03 | 0.61 ± 0.03* |
| Radiomics | Tree ensembles | 0.61 ± 0.02* | 0.62 ± 0.04 | 0.57 ± 0.01 | 0.56 ± 0.05 |
| Radiomics | Linear | 0.61 ± 0.02* | 0.47 ± 0.03 | 0.55 ± 0.02 | 0.48 ± 0.04 |
| Pathomics | Tree ensembles | 0.59 ± 0.02 | 0.54 ± 0.05 | 0.56 ± 0.02 | 0.58 ± 0.06* |
| Pathomics | Linear | 0.58 ± 0.02 | 0.56 ± 0.03 | 0.51 ± 0.02 | 0.61 ± 0.03* |
| RNA | Tree ensembles | 0.69 ± 0.02* | 0.75 ± 0.04* | 0.57 ± 0.02 | 0.60 ± 0.04* |
| RNA | Linear | 0.58 ± 0.02 | 0.65 ± 0.03 | 0.59 ± 0.02* | 0.61 ± 0.03 |

*one-sided permutation p-value ≤ 0.05 (exact p-values are reported in Supplementary Table s1).
Unimodal performance of each data modality for the prediction of OS, 1-year death, PFS, and 6-month progression with linear and tree ensemble algorithms (mean ± std over the 100 cross-validation
schemes). The best performances for each column are highlighted in bold. Source data are provided as a Source Data file.
modalities) consistently led to performance improvements compared
to unimodal models. The performances of all the multimodal models
are detailed in the supplementary materials (Supplementary
Figs. s14, s16–23).
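The baseline early fusion strategy described above, concatenating the features of the different modalities with an optional univariate selection step to balance their dimensions, can be sketched as follows. The selection criterion is an assumption made for illustration (top-k features per modality ranked by absolute Pearson correlation with the target); this section does not specify the criterion actually used. All names and toy values are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation, returning 0.0 for constant inputs."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def select_top_k(rows, target, k):
    """Indices of the k columns most associated with the target.
    In a cross-validation setting this must be fitted on training data only."""
    n_cols = len(rows[0])
    scores = [abs(pearson([row[j] for row in rows], target)) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def early_fusion(blocks, selected=None):
    """Concatenate, patient by patient, the (optionally selected) features of
    every modality into a single input vector."""
    names = list(blocks)
    n_patients = len(blocks[names[0]])
    fused = []
    for i in range(n_patients):
        row = []
        for name in names:
            cols = selected[name] if selected else range(len(blocks[name][i]))
            row.extend(blocks[name][i][j] for j in cols)
        fused.append(row)
    return fused


# Toy data (hypothetical): 4 patients, 2 clinical and 4 pathomic features.
target = [0, 0, 1, 1]
blocks = {
    "clinical": [[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 6.0]],
    "pathomic": [[0.1, 9, 1, 0], [0.2, 1, 1, 0], [0.8, 5, 1, 1], [0.9, 4, 1, 1]],
}

fused_all = early_fusion(blocks)
selected = {name: select_top_k(rows, target, k=2) for name, rows in blocks.items()}
fused_selected = early_fusion(blocks, selected)
```

After selection each modality contributes the same number of columns, so a high-dimensional modality no longer dominates the concatenated vector passed to the single predictor.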
We also explored whether the observed multimodal benefit
depended on our initial selection of features within each modality. We
focused on the transcriptomic modality, which demonstrated the
highest unimodal performance (Table 2), and assessed whether our
multimodal models could outperform any transcriptomic signature,
not just the one derived from the initially selected features. For each
predictive task, we applied the same cross-validation schemes as
before and compared the predictive performance of the best multi-
modal model with 36 transcriptomic signatures previously associated
with immunotherapy in the literature (see Methods, Supplemen-
tary Table s2). The best multimodal model outperformed all tran-
scriptomic signatures, except for the prediction of 6-month
progression, where it outperformed 33 out of 36 signatures (Fig. 7).
Interestingly, for OS and 1-year death, our best unimodal model ranked
among the top two transcriptomic signatures, whereas for PFS and
6-month progression, it did not rank within the top ten.
Multimodal predictions demonstrate improved patient
stratification for OS
Kaplan-Meier analysis showed that the predictions of multimodal
models, integrating clinical data with other modalities when available,
effectively stratified patients’ OS. After adjusting the log-rank p-values,
93% of all the combinations across the different prediction tasks (i.e.,
328/352) exhibited significant differences between the survival dis-
tributions of their low-risk and high-risk groups (Fig. 8A and Supple-
mentary Fig. s24). Notably, 74% of the combinations (i.e., 260/352)
yielded a lower p-value than the binary PD-L1 status (log-rank
p-value = 0.0025, n= 265). For each model, low-risk and high-risk
[Fig. 2, panels A and B: consensus feature importance rankings, with columns marking significant univariate associations (*) for 1-year death (1y-D) and OS.]
Fig. 2 | Feature importance ranking for the prediction of overall survival, for
clinical and transcriptomic modalities. Feature importance ranking was obtained
by aggregating the SHAP values collected from both tasks (OS and 1-year death) and
both approaches (linear and tree ensemble methods) (see Methods). Features that
were significantly associated with 1-year death (one-sided permutation test with
univariate AUCs) after Benjamini-Hochberg (BH) correction (α = 0.05) are shown
with a * on the left side, while features that were significantly associated with OS
(one-sided permutation test with univariate C-index) after BH correction are
annotated with a * on the right side. * corresponds to an adjusted p-value below
0.05. A Consensus feature importance ranking for the clinical data modality (left)
and heatmap of correlations between consensus clinical features (right).
Correlations were evaluated by Spearman correlation coefficients (for continuous feature
vs continuous feature), AUCs rescaled to [−1, 1] (for continuous feature vs binary
categorical feature), or Matthews correlation coefficient (for binary categorical
feature vs binary categorical feature). B Consensus feature importance ranking for
the RNA data modality (left) and heatmap of Spearman correlations between
consensus RNA features (right). Source data are provided as a Source Data file.
[Fig. 3 panels. A: 1-year death (AUC) and OS (C-index). B: 6-month progression (AUC) and PFS (C-index).]
Fig. 3 | Performance of all the possible multimodal combinations, with a late
fusion strategy and tree ensemble methods. The bar height corresponds to the
performance metric (either ROC AUC or C-index) averaged across the 100
cross-validation schemes, and the error bar corresponds to ± 1 standard deviation,
estimated across the 100 cross-validation schemes. A ROC AUCs associated with the
prediction of 1-year death with XGBoost algorithms (top) and estimated with n = 77
patients. C-indexes associated with the prediction of OS with Random Survival
Forest algorithms (bottom) and estimated with n = 79 patients. B ROC AUCs
associated with the prediction of 6-month progression with XGBoost algorithms
(top) and estimated with n = 75 patients. C-indexes associated with the prediction
of PFS with Random Survival Forest algorithms (bottom) and estimated with n = 80
patients. * C: clinical, R: radiomic, P: pathomic, RNA: transcriptomic. Source data are
provided as a Source Data file.
group membership was defined by optimizing the cutoff on the
training set of each cross-validation fold to maximize the log-rank test
statistic and then applying that cutoff to the corresponding test set. In
the case of classification tasks, a 0.5 cutoff on the predicted probabilities
was also considered (see Methods). The clinical modality alone
effectively separated patients into two risk groups, with the predic-
tions of a linear model trained to predict 1-year death yielding the best
p-value (log-rank p-value = 1.26e-06, Fig. 8B). For all prediction tasks, a
range of one to seven multimodal models, out of the 56 possible
models (i.e., integration strategy + multimodal combination), demon-
strated superior risk stratification (as measured by the log-rank test
statistic) compared to the clinical models with the lowest log-rank
p-values (Fig. 8A and Supplementary Fig. s24). Specifically, a combi-
nation of clinical, pathomic, and RNA modalities, trained to predict
1-year death with a tree ensemble algorithm, yielded the best p-value
(log-rank p-value = 3.51e-09, Fig. 8B). We further compared the multi-
modal score resulting from this combination (i.e., the test predictions
averaged across the different cross-validation schemes) with the
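[Fig. 4 graphic. Panel A: heatmap of the marginal contribution of the clinical, RNA, and radiomic modalities to the multimodal predictions (C + R + RNA), with patients grouped as true negatives, false positives, true positives, and false negatives, and with clusters 1 and 2 marked. Panel B: feature values (low to high) for patients in clusters 1 and 2, with markers that increase or decrease the predicted risk. Panel C: pairwise unimodal predictions (clinical vs. radiomic, clinical vs. RNA, radiomic vs. RNA).]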
Fig. 4 | Marginal contribution of each modality to the multimodal predictions
for late fusion strategy and XGBoost classifiers. A Heatmap of the marginal
contribution (i.e., Shapley value) of each modality to the 1-year death prediction
using the C + R + RNA late fusion model with XGBoost classifiers. Marginal
contributions indicate how each modality influences the prediction relative to a
random baseline of 0.5. Patients are stratified based on the multimodal model’s final
prediction (with a 0.5 threshold), where the positive class corresponds to those
who died within 1 year, and the negative class corresponds to those who survived.
B For each modality and patient in clusters 1 and 2 (see A), represented by vertical
lines, this plot shows the feature with the highest SHAP value that aligns with the
modality’s marginal contribution. The size of each triangle indicates the absolute
SHAP value, while its orientation corresponds to its sign (up for positive values that
increase the predicted probability of death within 1 year and down for negative
values that decrease it). The color scale represents the associated feature value
relative to the whole patient cohort. C Relationship between the unimodal
predictions from clinical, radiomic, and RNA modalities (i.e., unimodal tree ensemble
models). Each dot is colored according to the patient’s true label. * In these plots, all
marginal contributions, SHAP values, and predictions were obtained for the 77
patients with complete multimodal profiles and available 1-year death labels across
the cross-validation test sets. They were collected for each of the 100
cross-validation schemes (see Methods) and subsequently averaged for each patient.
Source data are provided as a Source Data file.
[Fig. 5 graphic. Panels A and B each show two bar plots: "Best multimodal combination for different integration strategies" (late, early, and DyAM in panel A; late and early in panel B) and "Best unimodal models".]
Fig. 5 | Best unimodal and multimodal performances across all the possible
combinations of modalities and predictive algorithms. The top barplot displays
the performance of the best multimodal combination for each integration strategy,
while the bottom barplot shows the performance of the best unimodal algorithm
for each data modality. Bar heights and error bars correspond to the mean metric
(AUC or C-index) and ± 1 standard deviation, respectively, estimated across the 100
cross-validation schemes (except for the dyam_optim models for which only 10
cross-validation schemes were used, due to computational constraints). A Best
performance (AUC) for the prediction of 1-year death and 6-month progression
(n = 77 for 1-year death and n = 75 for 6-month progression). B Best performance
(C-index) for the prediction of OS and PFS (n = 79 for OS and n = 80 for PFS). Source
data are provided as a Source Data file.
[Fig. 6 graphic. Two panels, "1-year death (AUC)" and "OS (C-index)", with significance annotations (** and ***) between numbers of integrated modalities.]
Fig. 6 | Average performance across all models with 1, 2, 3, and 4 modalities for
1-year death and OS. Markers and error bars correspond to the mean average
performance and ± 1 standard deviation respectively, estimated across the 100
cross-validation schemes. The box-and-whisker plots show the three quartiles and
the minimum and maximum as whiskers up to 1.5 × IQR (25–75%). Mean increases
are represented with dashed lines and bold annotations. Red annotations
correspond to two-sided paired-sample t-test p-values to compare the different numbers
of integrated modalities (e.g., 1 modality vs 2 modalities), with n_models = 8 for 1-year
death and n_models = 6 for OS. *: 1e-2 < pval ≤ 5e-2, **: 1e-4 < pval ≤ 1e-3, ***: pval ≤ 1e-4.
Source data are provided as a Source Data file.
[Fig. 7 graphic. Panels A and B show bar plots for 1-year death (AUC), OS (C-index), 6-month progression (AUC), and PFS (C-index). Model labels shown: late_XGBoost (C + R + RNA), late_RF (C + RNA), dyam_select (C + P + RNA), early_select_RF (C + P + RNA).]
Fig. 7 | Comparison of the performance of transcriptomic signatures with our
best transcriptomic and multimodal models. Comparison of the performance of
36 transcriptomic signatures previously associated with immunotherapy (from the
literature) against the best unimodal transcriptomic model and the best
multimodal model from our analysis for each prediction task. The bar height
corresponds to the performance metric (either ROC AUC or C-index), averaged across
100 cross-validation schemes and estimated for the 80 patients with a complete
multimodal profile. The error bar indicates ± 1 standard deviation (for signatures
without a training step, this standard deviation is zero). Performance metrics were
transformed using max(x, 1-x) to account for signatures with a performance below
0.5. Blue bars represent performances below 0.5 (higher signature values are
associated with better prognosis), while red bars represent performances above 0.5
(higher signature values are associated with worse prognosis). A Comparison for
1-year death prediction (n = 77 patients with a complete profile and available 1-year
death label) and OS prediction (n = 79 patients with complete profile and available
OS information). B Comparison for 6-month progression prediction (n = 75
patients with complete profile and available 6-month progression label) and PFS
prediction (n = 80 patients with complete profile and available PFS information).
Source data are provided as a Source Data file.
clinical score derived from the linear model described above (Fig. 8B).
To do so, we divided the cohort into quartiles based on these two
scores and performed Kaplan-Meier analysis for OS within the first year
of therapy (Fig. 8C). Both scores yielded a lowest quartile group (low
risk) with a 14% death rate (9/66 patients) within the first year of
therapy. However, the multimodal score identified a highest quartile
group (high risk) with a 52% death rate (35/67 patients, 20 treated with
immunotherapy + chemotherapy and 15 treated with immunotherapy
only), whereas the clinical score yielded a highest quartile group with a
40% death rate (27/67 patients, 13 treated with immunotherapy +
chemotherapy and 14 treated with immunotherapy only).
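The quartile comparison above can be illustrated with a minimal sketch. It assumes one averaged score per patient and a binary flag for death within the first year; the function names and the rank-based assignment of patients to quartiles are illustrative choices, not the procedure used in the study.

```python
def quartile_groups(scores):
    """Assign each patient to a quartile (0 = lowest scores, 3 = highest)
    based on the rank of the score within the cohort."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    groups = [0] * len(scores)
    for rank, patient in enumerate(order):
        groups[patient] = rank * 4 // len(scores)
    return groups


def death_rate_by_group(groups, died_within_1y):
    """Raw proportion of deaths within the first year for each quartile."""
    rates = {}
    for g in sorted(set(groups)):
        flags = [d for grp, d in zip(groups, died_within_1y) if grp == g]
        rates[g] = sum(flags) / len(flags)
    return rates


# Toy cohort of 8 patients (hypothetical values).
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
died = [0, 0, 0, 1, 0, 1, 1, 1]
groups = quartile_groups(scores)
rates = death_rate_by_group(groups, died)
```

Comparing the rate of the highest quartile between two scores, as done in the text, then amounts to calling these helpers once per score.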
Finally, the multimodal score resulting from the combination of
clinical, pathomic, and RNA modalities demonstrated a significant
association with OS when integrated into a multivariate Cox model
along with the clinical features (Fig. 8D). Likelihood-ratio tests indi-
cated a significant effect of this multimodal score compared to a Cox
model fitted only with clinical information collected from routine care
(p-value = 1.09e-05). Five clinical variables were also significant,
[Fig. 8 graphic. Recoverable labels: "Minimum log-rank adjusted p-value for different prediction tasks" and "Distribution of log-rank adjusted p-values for different prediction tasks", each split into binary classifiers and survival models, with markers for the perceptron and early XGBoost (prior selection) models; OS curves annotated with pval = 1.8e-04 and pval = 1.3e-07; patient counts labeled "From low-risk group to high-risk group" and "From high-risk group to low-risk group" (+29, +48, +65, +27 patients); Cox models for OS.]
including sex, ecog_1, braf, errb2, and pdl1 (see Supplementary Meth-
ods). When comparing the multimodal score with the best unimodal
scores (i.e., derived from the top-performing models for 1-year death
prediction, Fig. 5A), only the multimodal score (hazard ratio (HR) =
1.35, 95% CI [1.07-1.69], p-value = 1.11e-02) and the clinical score (HR =
1.25, 95% CI [1.02-1.54], p-value = 3.17e-02) demonstrated a significant
effect (Fig. 8D).
Discussion
Multimodal approaches for developing accurate biomarkers for the
outcome of metastatic NSCLC patients treated by immunotherapy are
highly promising but have been rarely explored so far. In this study, we
built a new multimodal NSCLC cohort to investigate the benefit of
integrative strategies. We extracted interpretable features from clin-
ical data, PET/CT images, digitized pathological slides, and bulk RNA-
seq profiles and compared the performance of unimodal and multi-
modal machine learning models to accurately predict patient out-
comes. We conducted an extensive exploration of different
algorithms, integration strategies, and outcome encodings (i.e., binary
vs. continuous targets) to highlight consistent trends that remain
robust regardless of the specific choices made within the analysis
pipeline.
We trained several unimodal models capable of predicting OS,
1-year death, PFS, and 6-month progression using pre-treatment clin-
ical and transcriptomic data. Our design choice to base the entire
workflow on interpretable features enabled us to conduct a thorough
feature importance analysis. It revealed that clinical models integrated
previously established biomarkers into efficient multivariate pre-
dictive models7, while RNA models used signatures of the Tumor
Micro-Environment (TME)20 as well as the expression of specific
oncogenes, which were robust to the biopsy location. Notably, our
analysis highlighted the positive impact of a high abundance of den-
dritic cells (DC), as scored by MCP counter20, on patient survival. These
findings thus provide further evidence of the potential of RNA-seq data
to predict immunotherapy response. They corroborate recent studies
that have demonstrated the enhanced predictive power of RNA-seq
data compared to conventional modalities used in clinical practice,
such as mutational data or immunohistochemistry21,22. Radiomics and
pathomics, used as standalone predictors, exhibited limited predictive
ability in our analysis. Radiomic models predominantly relied on the
TMTV to predict OS and 1-year death, confirming its strong predictive
value23. Nonetheless, the other aggregated features that we investi-
gated did not increase performance, highlighting the need to explore
and design additional radiomic features that could effectively com-
plement TMTV.
The importance of DCs for the prediction of patient outcomes is
in line with previous pre-clinical mechanistic studies that highlighted
their central role in shaping anti-tumor immunity24,25. In particular,
type 1 DCs (DC1) present antigens to CD8 + T cells, and their abun-
dance in the TME has been linked to increased survival and improved
response to immunotherapy in both animal models26 and human
cancer lesions27. Tertiary lymphoid structures (TLS) are ectopic for-
mations containing high densities of B cells, T cells, and dendritic cells,
at sites of persistent inflammatory stimulation, including tumors28.
Given the accumulating evidence linking the presence of tumor TLS
and good prognosis in cancer patients29, we investigated a possible
association between DCs and TLS. However, visualization of H&E sections
and the poor correlation between B cells and DCs in our data did
not support this hypothesis. The link between DCs and prognosis/
survival observed in the present study can most likely be explained by
DCs’ ability to capture tumor antigen in the tumor lesion and present it
to T cells in draining lymph nodes.
Our study provided further evidence supporting the superiority
of multimodal over unimodal approaches to build accurate bio-
markers for the outcome of metastatic NSCLC patients treated with
immunotherapy. In all prediction tasks related to OS, 1-year death, PFS,
and 6-month progression, we identified multiple multimodal combinations
that outperformed the unimodal models, including the models
relying on standard clinical data. Furthermore, several combinations
demonstrated enhanced patient risk stratification for OS, out-
performing the best clinical model across the whole cohort. Multi-
variate Cox models, combined with Kaplan-Meier analyses,
underscored the enhanced prognostic value provided by multimodal
predictions beyond routine clinical biomarkers. They further high-
lighted that multimodal scores could help better identify patients with
the most severe prognosis, thereby guiding tailored treatment stra-
tegies such as intensified follow-up care or considering chemotherapy
even in cases with high PD-L1 expression. No single integration strategy
outperformed others for all prediction tasks, but most of them effec-
tively built multimodal predictors that outperformed the best unim-
odal predictors. While we confirmed the potential of the DyAM
method16, especially with prior feature selection, we found that a much
simpler model based on late fusion frequently compared favorably to
more complex integration strategies. We assume that the robust per-
formance of the simple late fusion approach is due to its ability to
handle missing modalities. Although it is not ruled out that more
complex methods might ultimately yield better results on ideal and
large datasets, it should be considered that such ideal scenarios will be
quite rare in clinical reality. Overall, our comprehensive benchmark of
multiple algorithms and integration strategies highlighted a consistent
benefit of combining multiple modalities to predict immunotherapy
outcomes, irrespective of specific settings or methodologies.
Our study has several limitations. First, we dealt with a relatively
modest number of patients, many of whom had missing data, which
limited the statistical power of our analyses, particularly when testing
the association between patient survival under immunotherapy and
standard clinical biomarkers (Fig. 1C and Supplementary Fig. S3).
Besides, we did not have access to an external validation cohort to
assess the reproducibility and robustness of our results. Due to missing
modalities, our evaluation was conducted on a subset of 80 patients
with a complete multimodal profile to ensure a fair comparison
between multimodal combinations. Therefore, the absolute
Fig. 8 | Risk stratification and survival analysis for OS with the predicted
multimodal scores. A Comparison of the stratification of the patients into
high-risk and low-risk groups for OS, for different predictive tasks, with log-rank p-values
(n = 265 patients with the 4 targets available for a fair comparison). Only the
combinations including the clinical modality are compared (see Methods). On the
left, clinical and multimodal models are compared by showing the lowest log-rank
adjusted p-values from all clinical (left) and multimodal (right) models for each
prediction task. On the right, the box-and-whisker plots show the three quartiles,
with whiskers extending up to 1.5 × IQR (25–75%) to show the range of adjusted
p-values. B Kaplan-Meier survival curves (solid lines) with 95% confidence interval
(shaded areas) for the high-risk and low-risk OS groups defined by PDL1-status
(left), the clinical model with the lowest log-rank p-value (middle), and the
multimodal model with lowest log-rank p-value (right). Unlike in A, unadjusted p-values
are displayed here. C Kaplan-Meier survival curves (solid lines) with 95% confidence
interval (shaded areas) for OS within the first year of therapy. The cohort is
stratified into quartiles based on either the clinical score derived from the clinical
perceptron predictions (top) or the multimodal score derived from the predictions
of the clinical + pathomics + RNA model (bottom). D Log hazard ratios (points) with
95% confidence intervals (error bars) and likelihood-ratio test p-values associated
with multivariate Cox models trained to predict patient’s OS (n = 265). Cox model
with the clinical + pathomics + RNA score as well as the clinical features collected in
this study (left). Cox model with the clinical + pathomics + RNA score, as well as the
best unimodal scores derived from the top performing unimodal models for 1-year
death prediction (right), with the best clinical score corresponding to the clinical
perceptron identified in panels (B and C). Source data are provided as a Source
Data file.
performance scores highlighted in this study should be interpreted
cautiously. Nevertheless, the relative comparison of these scores
between different combinations of modalities, as well as in comparison
with random predictors and standard clinical biomarkers, all trained
and evaluated with the same methodology, remains valuable. It con-
firms the superiority of multimodal approaches over unimodal models
and thus may motivate the collection of new large and multi-centric
multimodal NSCLC cohorts. These multi-centric cohorts are needed to
address challenges related to missing modalities, particularly RNA, and
further validate our findings. Furthermore, our study did not include
advanced NSCLC patients who did not receive immunotherapy, which
prevented us from distinguishing between predictive and prognostic
biomarkers30. It is likely that some features highlighted in our analyses,
particularly clinical and radiomic ones, are more prognostic than
predictive, as they have already been associated with patient outcomes
for other treatments31,32. Nonetheless, our study identified promising
features, especially transcriptomic ones, as well as several combinations
of features, whether multimodal or not, that foster further
research to evaluate their predictive value and their association with
immunotherapy response. In addition, our data collection pipeline
involved time-consuming processes, such as the manual segmentation
and annotation of PET/CT scans by nuclear medicine physicians, which
could deter the collection of new external cohorts and limit their size.
However, deep learning methods for automatically segmenting lesions
on PET scans33 or computing surrogate radiomic features34 have
recently shown very promising results and may soon be incorporated
into the multimodal pipeline to overcome this bottleneck. Finally,
despite being one of the most powerful modalities in our analysis, the
RNA modality is not yet routinely available in clinical practice in many
places, unlike clinical information, PET/CT images, or pathological
slides. Its collection involves additional costs and is often affected by
the low quality of the remaining tissue samples from the biopsy.
However, it could be used as an initial step to identify prognostic
mechanisms, which could subsequently be assessed with more cost-
effective technologies.
Our multimodal cohort allowed us to demonstrate the ability of
clinical, radiological, pathological, and transcriptomic data to inform
powerful multimodal predictors for the outcome of patients treated
with immunotherapy in metastatic NSCLC. It provided several pro-
mising predictors that outperformed both established biomarkers
(e.g., PD-L1 expression) and unimodal predictors. They now require
refinement and validation in multicentric studies. These results foster
further efforts to gather large multimodal cohorts and explore multi-
modal biomarkers that could efficiently guide therapeutic decisions.
Methods
This study was approved by the institutional review board at Institut
Curie (DATA200053) and informed consent from all patients was
obtained through institutional processes. Data were de-identified,
collected, and stored in compliance with GDPR.
Clinical data
Baseline clinical data for all 317 patients were collected from the
Electronic Medical Record of Institut Curie Hospital using a predefined
case report form based on the ESME-AMLC database.
Each patient’s response to immunotherapy was assessed through
OS and PFS. OS was defined as the duration from the initiation of
first-line immunotherapy (with or without chemotherapy) to the
patient’s death or last available status update. PFS was defined as the
duration from the initiation of first-line immunotherapy to the occur-
rence of the first progression event or last available status update,
including the emergence of new lesions or the progression of pre-
existing ones. We also considered binary outcomes, specifically 1-year
death (0 for patients who were still alive after one year of immu-
notherapy and 1 otherwise) and 6-month progression (0 for patients
whose disease did not progress after six months of immunotherapy
and 1 otherwise). Patients whose OS or PFS was censored before one
year or six months, respectively, were excluded from the analysis with
binary outcomes.
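As an illustration of the binary encoding described above, the following minimal sketch derives a binary label from a follow-up time and an event indicator. The function name, the use of months as the time unit, and the handling of an event occurring exactly at the horizon are assumptions made for the example.

```python
from typing import Optional


def binary_outcome(time_months: float, event: bool,
                   horizon_months: float) -> Optional[int]:
    """Encode a survival record as a binary label at a fixed horizon.

    Returns 1 if the event (death or progression) occurred by the horizon,
    0 if the patient was still event-free at the horizon, and None if the
    patient was censored before the horizon (excluded from the analysis).
    """
    if event and time_months <= horizon_months:
        return 1
    if time_months >= horizon_months:
        # Event after the horizon, or censoring after the horizon.
        return 0
    # Censored before the horizon: the label cannot be determined.
    return None


# 1-year death uses a 12-month horizon; 6-month progression a 6-month one.
labels = [
    binary_outcome(8.0, True, 12.0),    # died at 8 months
    binary_outcome(20.0, True, 12.0),   # died at 20 months
    binary_outcome(14.0, False, 12.0),  # censored at 14 months
    binary_outcome(5.0, False, 12.0),   # censored at 5 months
]
```

Patients mapped to None are the ones excluded from the analyses with binary outcomes, which explains why the binary tasks have slightly fewer patients than the survival tasks.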
We selected 30 baseline clinical features for our predictive models.
A detailed list of these features and their definitions is provided in
Supplementary Methods. We applied one-hot encoding to all catego-
rical features.
Radiomic data
Baseline [18F]FDG-PET/CT scans were collected for 201 patients. Two
experienced nuclear medicine physicians delineated all tumor foci in
all PET scans using LIFEx software v.7.3 (https://www.lifexsoft.org/)35.
In addition, they annotated the location of each lesion using an ana-
tomical partition inspired by TNM staging36. For instance, ipsilateral
and contralateral lung metastases were distinguished since they are
not associated with the same TNM stage. Subsequently, all images
were resampled to a fixed 2 × 2 × 2 mm³ voxel size, and the segmented
tumor regions were processed by applying a fixed threshold of
2.5 standardized uptake value (SUV) units to exclude voxels with SUV
values below this threshold. The SUVmax value, the volume, and the
centroid of each resulting tumor region were extracted with the IBSI-compliant
PyRadiomics Python package37,38. The SUVmean values from
spherical ROIs manually delineated in the healthy regions of the liver
and spleen were also extracted for each patient, without prior 2.5 SUV
thresholding (liver ROIs mean volume = 24 cm³, standard deviation
18 cm³ – spleen ROIs mean volume = 10 cm³, standard deviation 8 cm³).
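The lesion-level processing can be sketched in plain Python as follows. This is only an illustration of the thresholding logic and does not reproduce the PyRadiomics extraction used in the study; the input layout (a list of voxel indices with SUV values), the function name, and the unweighted geometric centroid are assumptions.

```python
VOXEL_SIZE_MM = 2.0                           # images resampled to 2 x 2 x 2 mm
VOXEL_VOLUME_ML = VOXEL_SIZE_MM ** 3 / 1000.0  # 8 mm^3 = 0.008 mL


def lesion_features(voxels, suv_threshold=2.5):
    """Summarize one delineated lesion.

    voxels: list of ((i, j, k), suv) tuples for voxels inside the contour.
    Voxels with SUV below the threshold are excluded before extraction.
    Returns None if no voxel survives the threshold.
    """
    kept = [(idx, suv) for idx, suv in voxels if suv >= suv_threshold]
    if not kept:
        return None
    centroid_mm = tuple(
        VOXEL_SIZE_MM * sum(idx[axis] for idx, _ in kept) / len(kept)
        for axis in range(3)
    )
    return {
        "suv_max": max(suv for _, suv in kept),
        "volume_ml": len(kept) * VOXEL_VOLUME_ML,
        "centroid_mm": centroid_mm,
    }


# Toy lesion with three voxels, one of which falls below 2.5 SUV.
toy = [((0, 0, 0), 3.0), ((2, 0, 0), 5.0), ((4, 0, 0), 1.0)]
features = lesion_features(toy)
```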
All these extracted data were then aggregated into 30 baseline
whole-body radiomic features to capture the spread of the metastatic
disease as well as its metabolic activity. A detailed list of these features
and their definitions is provided in Supplementary Methods. We considered
the SUVmean values of healthy ROIs in the spleen and the liver,
the TMTV23, and the number of invaded organs visible on the PET scan,
including the lungs, sub- and supra-diaphragmatic lymph nodes, the
pleura, the liver, the bones, the adrenal gland, and a final category for
other regions. In addition, using the centroids of the processed tumor
regions, we calculated the standardized Dmax39 (the largest distance
between two lesions normalized by the body surface area) and the
quartile dispersion of the distances between each tumor region’s
centroid and the global centroid. Finally, for each TNM stage, we took
into account all the tumors located in associated regions and com-
puted the TMTV as well as the mean, standard deviation, and max-
imum value of all the SUVmax values. For instance, for the T stage, we
considered the primary lung tumor as well as all the ipsilateral lung
metastases. In cases where no tumor was present in these regions, the
feature values were set to zero. Except for features associated with
SUVmax, we excluded lesions corresponding to lymphangitic spread,
diffuse pleural metastases, diffuse myocardial metastases, and diffuse
subdiaphragmatic metastases because their accurate segmentation
was questionable.
All the features associated with metabolic volume were log-transformed
(i.e., $\log(x + 1)$) to deal with right-skewed distributions.
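Two of the whole-body features described in this section, the TMTV and the standardized Dmax, can be sketched as below. The lesion dictionaries, the function names, the natural logarithm, and the convention of returning zero when fewer than two lesions are present are assumptions made for the example; the body surface area is taken as an input rather than computed.

```python
import math
from itertools import combinations


def log_tmtv(lesions):
    """Total metabolic tumor volume (sum of lesion volumes), log(x + 1)."""
    tmtv = sum(lesion["volume_ml"] for lesion in lesions)
    return math.log1p(tmtv)


def standardized_dmax(lesions, body_surface_area_m2):
    """Largest distance between two lesion centroids, divided by the
    body surface area. Returns 0.0 with fewer than two lesions."""
    centroids = [lesion["centroid_mm"] for lesion in lesions]
    if len(centroids) < 2:
        return 0.0
    dmax = max(math.dist(a, b) for a, b in combinations(centroids, 2))
    return dmax / body_surface_area_m2


# Two toy lesions whose centroids are 50 mm apart.
lesions = [
    {"volume_ml": 10.0, "centroid_mm": (0.0, 0.0, 0.0)},
    {"volume_ml": 5.0, "centroid_mm": (30.0, 40.0, 0.0)},
]
sdmax = standardized_dmax(lesions, body_surface_area_m2=2.0)
volume_feature = log_tmtv(lesions)
```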
Pathomic data
Baseline pathological slides stained with Hematoxylin-Erythrosine-
Saffron (HES) were collected from the FFPE biopsy blocks of 236
patients and subsequently scanned by the Experimental Pathology
Platform at Institut Curie. From each slide, we segmented all nuclei
with a custom automatic pipeline derived from Lerousseau et al.40,
trained in a weakly supervised setting with publicly available data from
TCGA as well as data sets from Institut Curie. The slides from the
current study were not used to train the segmentation pipeline.
The cell nuclei of six cell types were annotated on each slide,
including stromal, epithelial, dead, tumor, connective, and inflammatory
cells. These annotations were then used in a pathomic approach41
to extract 134 relevant features, characterizing the density, the relative
proportion, or the spatial organization of these different cell types
within the scanned tissue.
Transcriptomic data
Residual FFPE biopsy specimens containing sufficient RNA were col-
lected for 134 patients and RNA sequencing was performed at the
Sequencing Core facility of Institut Curie with the Illumina TruSeq RNA
Access technology. The RNA-seq data were then processed with the
Institut Curie RNA-seq pipeline v4.0.042. The raw bulk RNA-seq read
counts were normalized with TPM (Transcripts Per Million) and log-transformed
(i.e., $\log(x + 1)$).
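For readers unfamiliar with this normalization, a minimal sketch of TPM followed by the log(x + 1) transform is given below. It does not reproduce the Institut Curie RNA-seq pipeline; the function names, the gene lengths in kilobases, and the use of the natural logarithm are assumptions.

```python
import math


def tpm(counts, lengths_kb):
    """Transcripts Per Million for one sample.

    counts: raw read counts per gene; lengths_kb: gene lengths in kilobases.
    Counts are first divided by gene length, then rescaled to sum to 1e6.
    """
    rates = [count / length for count, length in zip(counts, lengths_kb)]
    total = sum(rates)
    return [rate / total * 1e6 for rate in rates]


def log_tpm(counts, lengths_kb):
    """TPM values followed by the log(x + 1) transform."""
    return [math.log1p(value) for value in tpm(counts, lengths_kb)]


# Toy sample with three genes whose length-normalized counts are equal.
values = tpm([10, 20, 30], [1.0, 2.0, 3.0])
features = log_tpm([10, 20, 30], [1.0, 2.0, 3.0])
```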
The abundance of 8 immune cells and 2 stromal cells in the Tumor
Micro-Environment was estimated using the MCP-counter method20.
In addition, log expressions of 22 oncogenes associated with lung
cancer were used as features (KRAS, NRAS, EGFR, MET, BRAF, ROS1,
ALK, ERBB2, ERBB4, FGFR1, FGFR2, FGFR3, NTRK1, NTRK2, NTRK3, LTK,
RET, RIT1, MAP2K1, DDR2, ALK, and CD274). The biopsy site was also
considered as a categorical feature and one-hot encoded, distin-
guishing between lungs, pleura, lymph nodes, bones, liver, adrenal
gland, and brain. This information was available for 84 patients. Finally,
we used a custom pipeline derived from Jessen et al.43 to estimate the
Tumor Mutational Burden (TMB) from the RNA-seq reads mapped to
the reference genome hg19 with STAR aligner (1-pass). VarDict tool44
was used for variant calling and several filters were applied to detect
somatic variants. This feature, named TMB_RNA, was available for 110
patients.
Genomic data
TMB was estimated for 43 patients using a custom NGS panel of 571
genes called DRAGON (Detection of Relevant Alterations in Genes
involved in Oncogenetics by NGS) and marketed by Agilent under the
name of SureSelect CD Curie CGP. Only non-synonymous alterations
(excluding splice site) were considered, with a threshold of 15 mutations
per megabase used to distinguish between patients with high and
low TMB.
Unimodal analyses - Tree ensemble methods
The Extreme Gradient-Boosted Trees algorithm19 implemented in
XGBoost v.1.7.6 Python package was used to solve 1-year death and
6-month progression classification tasks. We used default parameter
values except for scale_pos_weight which controls the balance between
positive and negative weights and was set to the proportion of nega-
tive over positive labels estimated with the training set. Each XGBoost
classifier was calibrated with Platt’s logistic model45: predictions were
collected with a 10-fold stratified cross-validation scheme on the
training set and then used as input for a univariate logistic regression
model with balanced class weights. To generate the final calibrated
predictions, this logistic model was applied to the raw predictions of
the XGBoost classifier.
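The class weighting and the calibration step can be sketched without any dependency as follows. This is a simplified stand-in, not the XGBoost and scikit-learn code used in the study: the univariate logistic model is fitted here by plain gradient descent, and the function names, learning rate, and iteration count are assumptions. The scores passed to the fitting function are assumed to be the out-of-fold predictions collected on the training set.

```python
import math


def scale_pos_weight(labels):
    """Ratio of negative to positive labels in the training set."""
    n_pos = sum(labels)
    return (len(labels) - n_pos) / n_pos


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_platt(scores, labels, lr=0.1, n_iter=2000):
    """Fit p = sigmoid(a * score + b) with balanced class weights."""
    n, n_pos = len(labels), sum(labels)
    w_pos = n / (2.0 * n_pos)
    w_neg = n / (2.0 * (n - n_pos))
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            weight = w_pos if y == 1 else w_neg
            error = weight * (sigmoid(a * s + b) - y)
            grad_a += error * s
            grad_b += error
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Toy out-of-fold scores and labels (hypothetical values).
oof_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
oof_labels = [0, 0, 1, 0, 0, 1, 1, 1]
a, b = fit_platt(oof_scores, oof_labels)
calibrated = [sigmoid(a * s + b) for s in oof_scores]
```

At prediction time, the fitted pair (a, b) would be applied to the raw output of the classifier trained on the whole training set.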
The Random Survival Forest algorithm18 implemented in scikit-
survival v.0.21.0 Python package was used to solve survival tasks for OS
and PFS. We used default parameter values except for max_depth
which controls the size of the survival trees and was set to 6 to mitigate
the risk of overfitting. Contrary to XGBoost, Random Survival Forest
algorithm does not handle missing values automatically. Therefore, we
applied median imputation for continuous features and most-frequent
imputation for categorical features, both fitted to the training set.
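The point that imputation values are learned on the training set only, then reused on the test set, can be illustrated with this minimal sketch. Missing values are represented by None, and the function names are hypothetical.

```python
from statistics import median, mode


def fit_imputer(train_column, categorical):
    """Learn the fill value from the training set only: the most frequent
    value for categorical features, the median for continuous ones."""
    observed = [v for v in train_column if v is not None]
    return mode(observed) if categorical else median(observed)


def apply_imputer(column, fill_value):
    """Replace missing entries (None) with the value learned on training data."""
    return [fill_value if v is None else v for v in column]


# Continuous feature: the median of the observed training values is 3.0.
fill_cont = fit_imputer([1.0, None, 3.0, 10.0], categorical=False)
test_cont = apply_imputer([None, 2.0], fill_cont)

# Categorical feature: the most frequent training value is "b".
fill_cat = fit_imputer(["a", "b", "b", None], categorical=True)
test_cat = apply_imputer(["a", None], fill_cat)
```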
Unimodal analyses - linear methods
The Logistic Regression algorithm with elastic net penalty imple-
mented in scikit-learn v.1.2.2 Python package was used to solve 1-year
death and 6-month progression classification tasks. We used the saga
optimizer, a regularization parameter C= 0.1, an L1 ratio of 0.5, a
maximum number of iterations of 2500, and balanced class weights.
The Cox proportional hazards algorithm with elastic net penalty
implemented in scikit-survival v.0.21.0 Python package was used to
solve survival tasks for OS and PFS. We used default parameter values
except for alpha_min_ratio which controls the regularization strength
and was set to 0.01.
In both algorithms, we preprocessed the data by first applying
robust scaling, followed by median imputation for continuous features
and the most frequent imputation for categorical features. All these
operations were fitted to the training set.
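The order of these operations for a continuous feature can be sketched as follows, assuming that robust scaling centers on the median and divides by the interquartile range (the usual definition, e.g. in scikit-learn), and that missing values are ignored when the scaler is fitted. The function names are hypothetical.

```python
from statistics import median, quantiles


def fit_preprocessor(train_column):
    """Fit robust scaling, then the median used for imputation, on the
    observed training values of one continuous feature."""
    observed = [v for v in train_column if v is not None]
    center = median(observed)
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    scale = (q3 - q1) or 1.0  # guard against a zero interquartile range
    scaled = [(v - center) / scale for v in observed]
    fill = median(scaled)  # zero by construction after median centering
    return center, scale, fill


def apply_preprocessor(column, center, scale, fill):
    """Scale observed values and impute missing ones with the fitted median."""
    return [fill if v is None else (v - center) / scale for v in column]


center, scale, fill = fit_preprocessor([1, 2, None, 3, 4, 5])
processed = apply_preprocessor([None, 7.0], center, scale, fill)
```

Because scaling comes first, a missing continuous value ends up at zero, which is the center of the scaled training distribution.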
Multimodal analyses - late fusion
We used a late fusion strategy to combine every possible subset of
modalities. This analysis was limited to fusions of the same predictive
algorithms with the same parameter values. For instance, for classifi-
cation tasks, we separately explored the fusion of penalized logistic
regression models and the fusion of XGBoost models.
First, we restricted the training set to patients with at least one of
the modalities of the combination of interest available. Then, we
independently trained each unimodal model using the subset of
patients in the training set for whom the associated modality was
available. Finally, for each patient in the test set, we computed the
multimodal prediction by averaging the unimodal predictions for the
available modalities. For survival models, the unimodal predictions
were standardized before averaging, using the mean and standard
deviation values estimated for each modality based on predictions
obtained from a 10-fold cross-validation scheme applied to the train-
ing set (stratified with respect to the censorship rate). XGBoost late
fusion models were also calibrated using Platt’s logistic model45, following
the strategy described previously.
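The averaging step of the late fusion strategy, including its handling of missing modalities, can be sketched as follows. The dictionary-based interface and the function names are assumptions; the per-modality means and standard deviations are assumed to come from the cross-validated training predictions described above.

```python
from statistics import mean


def late_fusion(unimodal_preds):
    """Average the unimodal predictions of the modalities available for a
    patient. unimodal_preds maps modality name -> prediction or None."""
    available = [p for p in unimodal_preds.values() if p is not None]
    if not available:
        raise ValueError("at least one modality must be available")
    return mean(available)


def late_fusion_survival(unimodal_preds, train_stats):
    """Same averaging for survival models, after standardizing each
    unimodal risk score with (mean, std) estimated on the training set."""
    standardized = {
        name: None if pred is None
        else (pred - train_stats[name][0]) / train_stats[name][1]
        for name, pred in unimodal_preds.items()
    }
    return late_fusion(standardized)


# Patient with a missing radiomic modality (hypothetical predictions).
proba = late_fusion({"clinical": 0.8, "radiomic": None, "rna": 0.4})
risk = late_fusion_survival(
    {"clinical": 2.0, "rna": None},
    {"clinical": (1.0, 0.5), "rna": (0.0, 1.0)},
)
```

Standardizing before averaging keeps a modality whose risk scores live on a larger scale from dominating the fused survival prediction.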
Multimodal analyses - early fusion
We used an early fusion strategy to combine every possible subset of
data modalities and compared the results with those obtained with the
late fusion strategy. The training set was once again limited to patients
with at least one available modality. We first pre-processed each data
modality separately, considering the subset of patients within the
training set for whom that modality was available. Subsequently, we
concatenated the processed features from all the modalities to form
the input for the predictive model. We used the same models and the
same parameter values as in the unimodal analyses (i.e., XGBoost,
Logistic Regression, Random Survival Forest, and Cox model).
XGBoost early fusion models were calibrated using Platt’s logistic
model45.
For linear models, missing modalities were handled by replacing
them with zero values. XGBoost did not require special handling for
missing modalities, while for Random Survival Forest, a double-coding
strategy was applied, inspired by Engemann et al.46. This approach
involved duplicating features, assigning either very high values or very
low values to patients with missing modalities. This allowed the sur-
vival tree to decide on which side of the decision split to place patients
with missing modalities.
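As an illustration of the double-coding idea, the sketch below duplicates each feature and fills missing entries with a large positive value in one copy and a large negative value in the other. The function name double_code and the magnitude 1e6 are assumptions made for the example; the text only specifies "very high" and "very low" values.

```python
def double_code(rows, big=1e6):
    """Duplicate every feature; missing values (None) become +big in one copy
    and -big in the other, so a tree can send them to either side of a split.

    rows: list of feature lists. Returns a list of rows twice as wide.
    """
    coded = []
    for row in rows:
        high = [big if v is None else v for v in row]
        low = [-big if v is None else v for v in row]
        coded.append(high + low)
    return coded

# The second patient has no data for this (two-feature) modality.
X = [[0.2, 1.5], [None, None], [0.7, 0.9]]
X_coded = double_code(X)
```

For patients with observed values the two copies are identical, so the duplication only adds information where a modality is missing.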
We also explored early fusion with a preliminary feature selection
step to maintain a consistent number of features across different
multimodal combinations. We first calculated a univariate score for
each feature using $|m - 0.5|$, where $m$ corresponds to either the AUC
or the C-index computed with the training set. We then ranked all the
features from all the modalities accordingly. To reduce redundancy,
we filtered out highly correlated features by iterating through the
ranked feature list from top to bottom and removing subsequent
features with a Pearson correlation exceeding $\rho = 0.7$. Finally, for each
modality, we selected the top $\lfloor n_{total}/n_{modas} \rfloor$ features from the filtered,
ranked list that belonged to that modality, where $n_{total}$ corresponds to
the number of features to keep in the multimodal model (in this analysis,
we used $n_{total} = 40$) and $n_{modas}$ corresponds to the number of
modalities in the multimodal combination of interest. For unimodal
combinations, this feature selection step was ignored.
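The three selection steps (univariate ranking, correlation filtering, per-modality quota) can be sketched as follows. This is a simplified illustration under stated assumptions: the names select_features and pearson are hypothetical, the univariate metrics are supplied as precomputed inputs, and the correlation filter is applied to the absolute Pearson coefficient, which the text does not state explicitly.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if one is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def select_features(columns, metric, modality_of, n_total=40, rho=0.7):
    """columns: name -> training values; metric: name -> univariate AUC or C-index;
    modality_of: name -> modality label. Returns the selected feature names."""
    # 1. Rank by distance of the univariate metric from chance level (0.5).
    ranked = sorted(columns, key=lambda f: abs(metric[f] - 0.5), reverse=True)
    # 2. Walk down the ranking, dropping features too correlated with a kept one.
    kept = []
    for f in ranked:
        if all(abs(pearson(columns[f], columns[g])) <= rho for g in kept):
            kept.append(f)
    # 3. Keep the top floor(n_total / n_modas) features of each modality.
    modalities = sorted(set(modality_of.values()))
    quota = n_total // len(modalities)
    selected = []
    for m in modalities:
        selected += [f for f in kept if modality_of[f] == m][:quota]
    return selected

# Toy data: "a" is perfectly correlated with the better-ranked "b" and is dropped.
columns = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [1, -1, -1, 1], "d": [1, -1, 1, -1]}
metric = {"a": 0.70, "b": 0.80, "c": 0.35, "d": 0.55}
modality_of = {"a": "clinical", "b": "clinical", "c": "rna", "d": "rna"}
selected = select_features(columns, metric, modality_of, n_total=2)
```

Note that a feature with an AUC of 0.35 ranks above one with 0.55, since the score measures distance from chance in either direction.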
Multimodal analyses - DyAM model
We used our own implementation of the DyAM model, with PyTorch
v.2.0.1 Python package, to combine every possible subset of data
modalities and compared the results with those obtained with late
fusion and early fusion strategies. We adopted the exact same archi-
tecture as described in Vanguri et al.16, which used single-layer feed-
forward neural networks with a tanh activation function for unimodal
predictions and single-layer feedforward neural networks with a soft-
plus activation function for unimodal attention weights. Furthermore,
similar to ref. 16, we trained our models with a binary cross-entropy
loss with balanced class weights, a learning rate of 0.01, 125 training
epochs, an L2 regularization strength of 0.001, and the Adam
optimizer.
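To make the architecture concrete, the following dependency-free sketch shows a forward pass for one patient: a tanh unit per modality for the prediction and a softplus unit per modality for the attention weight. How the two are combined is an assumption of this example (an attention-weighted average over the available modalities, followed by a sigmoid); the parameter layout and the name dyam_forward are also hypothetical. Training with PyTorch, as done in the study, is not shown.

```python
from math import exp, log1p, tanh

def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + log1p(exp(-abs(z)))

def linear(weights, bias, x):
    return sum(w * v for w, v in zip(weights, x)) + bias

def dyam_forward(inputs, params):
    """inputs: modality -> feature list, or None if the modality is missing.
    params: modality -> dict with 'w_pred', 'b_pred', 'w_att', 'b_att'."""
    num, den = 0.0, 0.0
    for modality, x in inputs.items():
        if x is None:  # a missing modality contributes nothing
            continue
        p = params[modality]
        score = tanh(linear(p["w_pred"], p["b_pred"], x))     # unimodal prediction
        weight = softplus(linear(p["w_att"], p["b_att"], x))  # unimodal attention (>= 0)
        num += weight * score
        den += weight
    if den == 0.0:
        raise ValueError("no modality available")
    fused = num / den                 # assumed: attention-weighted average
    return 1.0 / (1.0 + exp(-fused))  # assumed: sigmoid to obtain a probability

params = {
    "clinical": {"w_pred": [1.0, -1.0], "b_pred": 0.0, "w_att": [0.0, 0.0], "b_att": 0.0},
    "rna": {"w_pred": [0.5], "b_pred": 0.0, "w_att": [0.0], "b_att": 0.0},
}
prob_clinical_only = dyam_forward({"clinical": [1.0, 1.0], "rna": None}, params)
prob_both = dyam_forward({"clinical": [1.0, 1.0], "rna": [2.0]}, params)
```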
Data were pre-processed with robust scaling as well as median
imputation for continuous features and most-frequent imputation for
categorical features. For pathomic data, we also applied a Principal
Component Analysis (PCA) step with 40 components to reduce the
size of the neural networks. We also applied a preliminary feature
selection step as described previously.
We implemented a nested cross-validation scheme with inner 10-
fold stratified cross-validation and a grid-search strategy to optimize
the learning rate and the L2 regularization strength for each combi-
nation. Due to computational constraints, we limited the number of
repetitions to 10 in these cases.
Statistical analysis - performance evaluation
All the predictive models were trained and tested using a 10-fold cross-
validation scheme applied to the entire cohort. The folds were strati-
fied based on class proportion for classification tasks and censorship
rate for survival tasks. In each fold and for each modality combination,
patients with all modalities missing were excluded from the training
set. Pre-processing operations, including missing value imputation,
scaling, or univariate feature selection, were fitted to each training set
and then applied to the corresponding test set to prevent any data
leakage (Supplementary Fig. s4). This process was repeated 100 times,
with data shuffling in each iteration. We used the same repeats across
all the experiments.
The performance of each model was evaluated using Uno’s Concordance
Index (C-index)47 for survival tasks and the Area Under the
ROC Curve (AUC) for classification tasks. These metrics were computed
for each cross-validation scheme, considering only the predictions in
the test sets of the 80 patients with a complete multimodal profile to
ensure a fair comparison among the different combinations (Supple-
mentary Fig. s4). Subsequently, the metrics were averaged over all 100
repetitions, and their standard deviation was calculated to measure the
variability resulting from the random partition of the data into 10 folds.
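The evaluation logic for a classification task can be sketched as below: one AUC per repeated cross-validation scheme, computed only on patients with a complete multimodal profile, then summarized by its mean and standard deviation. The function names and toy values are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev

def auc(labels, scores):
    """Probability that a positive is scored above a negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def summarize(labels, preds_per_repeat, complete):
    """labels: patient -> 0/1; preds_per_repeat: one dict per repeat mapping
    patient -> out-of-fold prediction; complete: patients with every modality."""
    ids = sorted(complete)
    y = [labels[i] for i in ids]
    aucs = [auc(y, [preds[i] for i in ids]) for preds in preds_per_repeat]
    return mean(aucs), stdev(aucs)

labels = {"p1": 1, "p2": 0, "p3": 1, "p4": 0, "p5": 1}
complete = {"p1", "p2", "p3", "p4"}  # p5 lacks at least one modality
repeats = [
    {"p1": 0.9, "p2": 0.1, "p3": 0.8, "p4": 0.3, "p5": 0.2},
    {"p1": 0.6, "p2": 0.7, "p3": 0.8, "p4": 0.3, "p5": 0.9},
]
m, sd = summarize(labels, repeats, complete)
```

Patient p5 still receives predictions in every repeat but does not enter the metric, which keeps the evaluation set identical across modality combinations.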
Statistical analyses - permutation tests
The significance of the results was assessed with one-sided permuta-
tion tests, running the pipeline described above 100 times with ran-
domly shuffled outcomes. Permutation tests were also used for
univariate predictors, along with 10,000-repeated bootstrap sampling
for computing 95% confidence intervals.
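For a univariate predictor, a one-sided permutation test of this kind can be sketched as follows. The p-value convention (k + 1) / (n + 1) and the function names are assumptions of this example, not details given in the text; for full models, the study reran the whole pipeline on shuffled outcomes rather than shuffling labels against fixed scores.

```python
import random

def auc(labels, scores):
    """Probability that a positive is scored above a negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def permutation_pvalue(labels, scores, n_perm=1000, seed=0):
    """One-sided test: how often does a shuffled labelling reach the observed AUC?"""
    rng = random.Random(seed)
    observed = auc(labels, scores)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if auc(shuffled, scores) >= observed:
            exceed += 1
    # Assumed convention: count the observed statistic as one of the permutations.
    return (exceed + 1) / (n_perm + 1)

labels = [1] * 5 + [0] * 5
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1, 0.05]
pval = permutation_pvalue(labels, scores)
```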
To compare the performance of the different multimodal models
and test for statistically significant differences, we applied a two-step
procedure. First, for each pair of combinations $(i, j)$ and each cross-validation
scheme $s$, the superiority of $j$ over $i$ was assessed with a one-sided
paired permutation test48, resulting in 100 p-values $\{p^{s}_{ij}\}_{s=1}^{100}$.
Subsequently, for each pair of combinations $(i, j)$, these 100 p-values
were adjusted using the Benjamini-Hochberg procedure (FDR controlled
at level $\alpha = 0.05$), and the frequency of statistically significant
tests across the 100 tests was computed. The performance of the
different predictive models was estimated on the subset of patients
with a complete multimodal profile. Although the paired permutation
test described in Bandos et al.48 was originally designed to compare
two AUCs, we extended it to the comparison of two C-indexes since
both metrics are C statistics and the C-index can be seen as a gen-
eralization of the AUC for censored survival data.
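The second step of this procedure (Benjamini-Hochberg adjustment of the p-values obtained for one pair of combinations, then the fraction of significant tests) can be sketched as below. The function names are hypothetical, and four toy p-values stand in for the 100 obtained in the study; the paired permutation test itself is not reproduced here.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda k: pvalues[k])
    adjusted = [0.0] * n
    running_min = 1.0
    # Go from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        k = order[rank - 1]
        running_min = min(running_min, pvalues[k] * n / rank)
        adjusted[k] = running_min
    return adjusted

def significance_frequency(pvalues, alpha=0.05):
    """Fraction of tests that remain significant after adjustment."""
    adjusted = benjamini_hochberg(pvalues)
    return sum(p <= alpha for p in adjusted) / len(adjusted)

pvals = [0.01, 0.04, 0.03, 0.20]  # one p-value per cross-validation scheme
adj = benjamini_hochberg(pvals)
freq = significance_frequency(pvals)
```

A frequency close to 1 indicates that the superiority of one combination over the other holds across most random partitions of the data, not only on average.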
Survival analyses
For overall survival, we evaluated the ability of each predictive model
to stratify patients into two distinct risk groups, including the models
trained to predict PFS or 6-month progression. First, for each fold of
each cross-validation scheme, we explored a range of thresholds going
from the 30th to the 70th percentiles of the training predictions and
selected the one that minimized the log-rank p-value on the training
set. For PFS-related models, we focused on the stratification of PFS to
find the best threshold, mimicking scenarios where OS is not available
during the training process. For classification tasks, the 0.5 threshold
was also considered. The learned threshold was then applied to the
corresponding test set, assigning patients to a low-risk group or a high-
risk group for overall survival. Risk group membership was thus collected
for each patient across the test sets of the cross-validation
scheme. This resulted in 100 group memberships for each patient,
corresponding to the 100 cross-validation schemes. Finally, these 100
assignments were aggregated by calculating the frequency of low-risk
and high-risk group assignments for each patient. Patients with a fre-
quency of high-risk group greater than 50% were assigned to the final
high-risk group, while those with a frequency strictly lower than 50%
were assigned to the final low-risk group. We compared the survival
distributions of the final low- and high-risk groups for each predictive
model with Kaplan-Meier curves and a log-rank test. The Benjamini-
Hochberg procedure was used to control for multiple testing (FDR
controlled at the level $\alpha = 0.05$). We focused on the subset of multimodal
combinations which included clinical data to work with a sufficiently
large cohort (i.e., 265 patients with the 4 targets available for
fair comparisons). This analysis thus assessed the risk stratification
ability of multimodal models that incorporated multiple modalities
alongside clinical data, whenever they were available.
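The final aggregation of the 100 per-repeat assignments reduces to a majority vote per patient, sketched below. The function name is hypothetical. The text defines the outcome for frequencies strictly above and strictly below 50% only, so this sketch returns None for an exact tie rather than guessing a rule.

```python
def final_risk_group(assignments):
    """assignments: list of 'high'/'low' labels, one per cross-validation repeat."""
    freq_high = sum(a == "high" for a in assignments) / len(assignments)
    if freq_high > 0.5:
        return "high"
    if freq_high < 0.5:
        return "low"
    return None  # exactly 50%: not covered by the rule stated in the text

mostly_high = final_risk_group(["high"] * 60 + ["low"] * 40)
mostly_low = final_risk_group(["high"] * 40 + ["low"] * 60)
tie = final_risk_group(["high"] * 50 + ["low"] * 50)
```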
Finally, we derived a score from each multimodal predictive
model by collecting its predictions from the test sets of each cross-
validation scheme and averaging them over 100 repeats. This score
was used as input to a multivariate Cox model to predict patients’ OS,
along with clinical features or unimodal scores. The Cox model was
then fitted on the 265 patients with the 4 targets available (i.e., OS,
1-year death, PFS, and 6-month progression) for fair comparison
between the models. All the input variables were first standardized to
ensure comparable hazard ratios. To address missing clinical values,
we used median imputation for continuous clinical features and the
most frequent imputation for categorical clinical features. For unim-
odal scores, missing values were replaced by 0.5 for classification
models and 0 for survival models, since all the models were calibrated
with nested cross-validation. We used lifelines v.0.27.4 Python package
to fit the Cox models. Likelihood-ratio tests were computed manually
with the difference between the log-likelihood of the two compared
models and a chi-squared test.
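A manual likelihood-ratio test of this kind can be sketched with the standard library alone. The statistic is twice the difference in log-likelihood between the larger and the smaller nested model, compared to a chi-squared distribution whose degrees of freedom equal the number of added parameters. The function name and the log-likelihood values are hypothetical, and only 1 or 2 degrees of freedom are handled here, where the chi-squared survival function has a simple closed form.

```python
from math import erfc, exp, sqrt

def likelihood_ratio_test(loglik_reduced, loglik_full, df=1):
    """Return (statistic, p-value) for two nested models."""
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    if df == 1:
        pvalue = erfc(sqrt(stat / 2.0))  # chi-squared survival function, 1 d.o.f.
    elif df == 2:
        pvalue = exp(-stat / 2.0)        # chi-squared survival function, 2 d.o.f.
    else:
        raise NotImplementedError("use a statistics library for df > 2")
    return stat, pvalue

# Toy example: adding one score to a Cox model raises the log-likelihood by 2.
stat, pvalue = likelihood_ratio_test(-100.0, -98.0, df=1)
```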
Feature importance analysis
For each algorithm and modality, we used the permutation explainer
provided by SHAP v.0.42.1 Python package49 to compute the SHAP
values for each feature and each patient. SHAP values were computed
only when the patient was in a test set of the cross-validation scheme.
This resulted in $n_{patients} \times n_{features}$ SHAP values for each cross-validation
scheme, where $n_{patients}$ corresponds to the number of
patients with the modality of interest available and $n_{features}$ corresponds
to the number of features extracted for this modality. All these
values were subsequently averaged across the 100 cross-validation
schemes to produce the final set of $n_{patients} \times n_{features}$ mean SHAP
values.
A positive SHAP value for a patient $p$ and a feature $f$ means that
considering the feature $f$ in the predictive model of interest increases
the patient $p$’s probability of death or progression for classification
tasks and the patient $p$’s risk of death or progression for survival tasks.
Conversely, a negative SHAP value means that $f$ decreases the patient
$p$’s probability or risk of death or progression.
For each data modality, we applied a three-step procedure to
combine the SHAP values from both related tasks (i.e., OS and 1-year
death, or PFS and 6-month progression) and both approaches (i.e.,
linear and tree ensemble methods) and obtain a consensus ranking of
features with respect to their importance in the predictive models for
overall survival or progression-free survival. First, we filtered out non-
robust features whose impact on the predictions was not consistent
across the four predictive models (for radiomics we did not consider
the Logistic Regression model since its AUC was lower than 0.5). To do
so, we computed, for each feature and each model, the Spearman
correlation between the SHAP values and the values of the feature of
interest and then filtered out the features whose correlation sign was
not consistent across the four models. Then we ranked the remaining
robust features for each of the four models with respect to their
absolute SHAP values averaged across all the patients. For each model
$i \in \{1, \ldots, 4\}$ and each feature $f$ we thus obtained a rank
$r^{i}_{f} \in \{1, \ldots, n_F\}$ (with $n_F$ the number of robust features), with 1
corresponding to the least important feature and $n_F$ to the most important
one. Finally, we aggregated all these ranks across the four models to
obtain a consensus ranking, taking into account the performance of
each model. The consensus rank of the feature $f$ was defined as:

$$r^{cons}_{f} = \frac{1}{s_1 + s_2 + s_3 + s_4} \sum_{i=1}^{4} s_i \, r^{i}_{f} \qquad (1)$$

where $s_i$ corresponds to the score of the model $i$ and is equal to
$\max(0, score_i - 0.5)$ (the score is either the AUC or the C-index). The
remaining robust features were also tested with univariate permuta-
tion tests both for OS and 1-year death (or PFS and 6-month
progression): 1000 univariate AUCs or C-indexes were generated with
permuted labels and then compared to the original AUC or C-index.
Features that remained statistically significant after the Benjamini-
Hochberg correction (FDR controlled at the level α= 0.05) were
reported in the consensus ranking.
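The consensus rank of Eq. (1) is a performance-weighted average of the per-model ranks, where a model at or below chance level receives zero weight. A minimal sketch, with hypothetical function name, feature names, and toy values:

```python
def consensus_ranks(ranks, scores):
    """ranks: one dict per model mapping feature -> rank (1 = least important);
    scores: model performances (AUC or C-index), in the same order."""
    weights = [max(0.0, s - 0.5) for s in scores]
    total = sum(weights)
    if total == 0:
        raise ValueError("no model performs above chance level")
    features = ranks[0].keys()
    return {f: sum(w * r[f] for w, r in zip(weights, ranks)) / total
            for f in features}

ranks = [
    {"x": 3, "y": 2, "z": 1},  # model 1
    {"x": 2, "y": 3, "z": 1},  # model 2
    {"x": 1, "y": 2, "z": 3},  # model 3 (at chance level, weight 0)
    {"x": 1, "y": 3, "z": 2},  # model 4 (below chance level, weight 0)
]
scores = [0.7, 0.6, 0.5, 0.4]
cons = consensus_ranks(ranks, scores)
```

In this toy example only the first two models contribute, with model 1 weighted twice as much as model 2, so the feature "x" keeps the highest consensus rank.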
The consensus ranks were normalized with respect to the total
number of consensus features. Each rank was also assigned a sign that
corresponded to the sign of the Spearman correlation coefficient
between the SHAP values and the values of the associated feature. A
positive sign means that the effect of the feature on the predicted risk/
probability of the event increases with the feature value, while a
negative sign means that the effect decreases with the feature value. In
this context, the term effect is linked to the SHAP value and can be
either positive (increasing predicted risk) or negative (decreasing
predicted risk).
Benchmark of transcriptomic signatures
We identified 36 transcriptomic biomarkers associated with immu-
notherapy response from a systematic literature search and curation50,
encompassing various cancer types and immune checkpoint inhibitors
(Supplementary Table s2). They were categorized into three groups:
marker genes biomarkers that focused on a subset of genes to com-
pute an overall score for each patient (22 biomarkers), GSEA bio-
markers that applied single-sample gene set enrichment analysis
(ssGSEA) to compare sets of marker genes with non-marker genes (10
biomarkers), and deconvolution biomarkers that used deconvolution
methods to estimate the abundance of different cell populations in
each sample (e.g., CD8 T cells) and combined these estimates into a
score (4 biomarkers). We implemented all these biomarkers using
Python.
For each of the four prediction tasks (i.e., OS, PFS, 1-year death,
and 6-month progression), we evaluated the performance of all tran-
scriptomic signatures using the C-index for survival tasks and the AUC
for classification tasks. We applied the same 100 cross-validation
schemes as in previous experiments, focusing on the subset of 80
patients with a complete multimodal profile. For signatures that
included pre-processing steps, such as standardization or PCA, these
were first trained on the training set of each fold and then applied to
the corresponding test set. We then compared their performance to
that of the best multimodal model and the best transcriptomic model
previously obtained for each task.
Reporting summary
Further information on research design is available in the Nature
Portfolio Reporting Summary linked to this article.
Data availability
The raw data generated in this study, including PET/CT scans, digitized
pathological slides, and RNA-seq profiles, are not publicly available due
to patient privacy requirements. Curated clinical data are available
under restricted access due to patient privacy requirements; access
can be obtained upon request to Emmanuel Barillot and Nicolas Girard.
Derived transcriptomic, radiomic, and pathomic features, as well as
clinical outcomes (OS, PFS, and best observed RECIST response), are
available at https://doi.org/10.5281/zenodo.14293431. The results from
the experiments performed in this study are provided as a Source Data
file. Source data are provided with this paper.
Code availability
We have made all our code available in GitHub repositories with
associated documentation allowing for the reproduction of our multimodal
analyses with external data, additional modalities, or different
features (https://github.com/sysbio-curie/multipit, https://github.com/sysbio-curie/deep-multipit).
The Python code to compute the 36 transcriptomic signatures and
reproduce the benchmarks presented in Fig. 7 is also available on GitHub
(https://github.com/sysbio-curie/tipit_benchmark_RNA). The Python code
to reproduce the figures is provided in the Source Data file.
References
1. Hendriks, L. E. et al. Non-oncogene-addicted metastatic non-small-
cell lung cancer: ESMO Clinical Practice Guideline for diagnosis,
treatment and follow-up. Ann. Oncol. 34, 358–376 (2023).
2. Reck, M. et al. Updated analysis of KEYNOTE-024: Pembrolizumab
versus platinum-based chemotherapy for advanced non-small-cell
lung cancer with PD-L1 tumor proportion score of 50% or greater. J.
Clin. Oncol. 37, 537–546 (2019).
3. Gandhi, L. et al. Pembrolizumab plus chemotherapy in metastatic
non-small-cell lung cancer. N. Engl. J. Med. 378, 2078–2092 (2018).
4. Paz-Ares, L. et al. A randomized, placebo-controlled trial of pem-
brolizumab plus chemotherapy in patients with metastatic squa-
mous NSCLC: Protocol-specified final analysis of KEYNOTE-407. J.
Thorac. Oncol. 15, 1657–1669 (2020).
5. Herbst, R. S. et al. Atezolizumab for first-line treatment of PD-L1-
selected patients with NSCLC. N. Engl. J. Med. 383, 1328–1339
(2020).
6. Hellmann, M. D. et al. Nivolumab plus Ipilimumab in advanced non-
small-cell lung cancer. N. Engl. J. Med. 381, 2020–2031 (2019).
7. Joshi, I. et al. Impact of baseline clinical biomarkers on treatment
outcomes in patients with advanced NSCLC receiving first-line
pembrolizumab-based therapy. Clin. Lung Cancer 23, 438–445
(2022).
8. Ahn, B. C. et al. Clinical decision support algorithm based on
machine learning to assess the clinical response to anti-
programmed death-1 therapy in patients with non-small-cell lung
cancer. Eur. J. Cancer 153, 179–189 (2021).
9. Trebeschi, S., Drago, S. G., Birkbak, N. J., Kurilova, I., Cǎlin, A. M.,
Delli Pizzi, A. et al. Predicting response to cancer immunotherapy
using noninvasive radiomic biomarkers. Ann. Oncol. 30, 998–1004
(2019).
10. Mu, W. et al. Radiomics of 18F-FDG PET/CT images predicts clinical
benefit of advanced NSCLC patients to checkpoint blockade
immunotherapy. Eur. J. Nucl. Med. Mol. Imaging 47, 1168–1182
(2020).
11. Deng, J. et al. Genopathomic profiling identifies signatures for
immunotherapy response of lung adenocarcinoma via confounder-
aware representation learning. iScience 25, https://doi.org/10.1016/j.isci.2022.105382
(2022).
12. Barrera, C. et al. Deep computational image analysis of immune cell
niches reveals treatment-specific outcome associations in lung
cancer. NPJ Precis. Oncol. 7, 52 (2023).
13. Patil, N. S. et al. Intratumoral plasma cells predict outcomes to PD-
L1 blockade in non-small cell lung cancer. Cancer Cell 40,
289–300 (2022).
14. Ravi, A. et al. Genomic and transcriptomic analysis of checkpoint
blockade response in advanced non-small cell lung cancer. Nat.
Genet. 55, 807–819 (2023).
15. Zhao, Q., Xie, R., Lin, S., You, X. & Weng, X. Anti-PD-1/PD-L1 antibody
therapy for pretreated advanced or metastatic nonsmall cell lung
carcinomas and the correlation between PD-L1 expression and
treatment effectiveness: An update meta-analysis of randomized
clinical trials. Biomed. Res. Int. 2018, 3820956 (2018).
16. Vanguri, R. S. et al. Multimodal integration of radiology, pathology
and genomics for prediction of response to PD-(L)1 blockade in
patients with non-small cell lung cancer. Nat. Cancer 3, 1151–1164
(2022).
17. Grinsztajn, L., Oyallon, E. & Varoquaux, G. Why do tree-based
models still outperform deep learning on typical tabular data? Adv.
Neural Inf. Process. Syst. 35, 507–520 (2022).
18. Ishwaran, H., Kogalur, U., Blackstone, E. & Lauer, M. Random survival
forests. Ann. Appl. Stat. 2, 841–860 (2008).
19. Chen, T. & Guestrin, C. XGBoost: A Scalable Tree Boosting System. In
Proceedings of the 22nd ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining (2016).
20. Becht, E. et al. Estimating the population abundance of tissue-
infiltrating immune and stromal cell populations using gene
expression. Genome Biol. 17, 218 (2016).
21. Sun, D. et al. Classification of tumor immune microenvironment
according to programmed death-ligand 1 expression and immune
infiltration predicts response to immunotherapy plus chemother-
apy in advanced patients with NSCLC. J. Thorac. Oncol. 18,
869–881 (2023).
22. Li, A. et al. STK11/LKB1-Deficient phenotype rather than mutation
diminishes immunotherapy efficacy and represents STING/type I
interferon/CD8+ T-cell dysfunction in NSCLC. J. Thorac. Oncol. 18,
1714–1730 (2023).
23. Seban, R. D. et al. Baseline metabolic tumor burden on FDG PET/CT
scans predicts outcome in advanced NSCLC patients treated with
immune checkpoint inhibitors. Eur. J. Nucl. Med. Mol. Imaging 47,
1147–1157 (2020).
24. Broz, M. L. et al. Dissecting the tumor myeloid compartment reveals
rare activating antigen-presenting cells critical for T cell immunity.
Cancer Cell 26, 638–652 (2014).
25. Sánchez-Paulete, A. R. et al. Cancer immunotherapy with immu-
nomodulatory anti-CD137 and anti-PD-1 monoclonal antibodies
requires BATF3-dependent dendritic cells. Cancer Discov. 6,
71–79 (2016).
26. Salmon, H. et al. Expansion and activation of CD103(+) dendritic cell
progenitors at the tumor site enhances tumor responses to therapeutic
PD-L1 and BRAF inhibition. Immunity 44, 924–938 (2016).
27. Barry, K. C. et al. A natural killer-dendritic cell axis defines check-
point therapy-responsive tumor microenvironments. Nat. Med. 24,
1178–1191 (2018).
28. Goc, J. et al. Dendritic cells in tumor-associated tertiary lymphoid
structures signal a Th1 cytotoxic immune contexture and license
the positive prognostic value of infiltrating CD8+ T cells. Cancer
Res. 74, 705–715 (2014).
29. Cabrita, R. et al. Tertiary lymphoid structures improve immunotherapy
and survival in melanoma. Nature 577, 561–565 (2020).
30. Ballman, K. Biomarker: Predictive or prognostic? J. Clin. Oncol. 33,
3968–3971 (2015).
31. Stares, M. et al. Hypoalbuminaemia as a prognostic biomarker of
first-line treatment resistance in metastatic non-small cell lung
cancer. Front. Nutr. 8, 734735 (2021).
32. Teramukai, S. et al. Pretreatment neutrophil count as an indepen-
dent prognostic factor in advanced non-small-cell lung cancer: an
analysis of Japan Multinational Trial Organisation LC00-03. Eur. J.
Cancer 45, 1950–1958 (2009).
33. Park, J. et al. Automatic lung cancer segmentation in [18F]FDG PET/
CT using a two-stage deep learning approach. Nucl. Med. Mol.
Imaging 57, 86–93 (2023).
34. Girum, K. B. et al. 18F-FDG PET maximum-intensity projections
and artificial intelligence: A Win-Win combination to easily measure
prognostic biomarkers in DLBCL patients. J. Nucl. Med. 63,
1925–1932 (2022).
35. Nioche, C. et al. LIFEx: A freeware for radiomic feature calculation in
multimodality imaging to accelerate advances in the characterization
of tumor heterogeneity. Cancer Res. 78, 4786–4789 (2018).
36. Goldstraw, P. et al. The IASLC lung cancer staging project: Propo-
sals for revision of the TNM stage groupings in the forthcoming
(Eighth) edition of the TNM classification for lung cancer. J. Thorac.
Oncol. 11, 39–51 (2016).
37. Zwanenburg, A., Vallières, M., Abdalah, M. A. et al. The Image
Biomarker Standardization Initiative: Standardized Quantitative
Radiomics for High-Throughput Image-based Phenotyping. Radiology
295, 328–338 (2020).
38. van Griethuysen, J. J. M. et al. Computational radiomics system to
decode the radiographic phenotype. Cancer Res. 77, e104–e107
(2017).
39. Cottereau, A. S. et al. Risk stratification in diffuse large B-cell lymphoma
using lesion dissemination and metabolic tumor burden
calculated from baseline PET/CT†. Ann. Oncol. 32, 404–411 (2021).
40. Lerousseau, M. et al. Weakly supervised multiple instance learning
histopathological tumor segmentation. Medical Image Computing
and Computer Assisted Intervention (2020).
41. Bülow, R. D., Hölscher, D. L., Costa, I. G. & Boor, P. Extending the
landscape of omics technologies by pathomics. NPJ Syst. Biol. Appl.
9, 38 (2023).
42. Servant, N., La Rosa, P. & Phupe, A. F. bioinfo-pf-curie/RNA-seq:
v4.0.0 (v4.0.0). Zenodo. https://doi.org/10.5281/zenodo.
7837455 (2023).
43. Jessen, E., Liu, Y., Davila, J., Kocher, J. P. & Wang, C. Determining
mutational burden and signature using RNA-seq from tumor-only
samples. BMC Med. Genomics. 14, 65 (2021).
44. Lai, Z. et al. VarDict: a novel and versatile variant caller for next-
generation sequencing in cancer research. Nucleic Acids Res. 44,
e108 (2016).
45. Niculescu-Mizil, A. & Caruana, R. Predicting good probabilities with
supervised learning. Proceedings of the 22nd International Con-
ference on Machine Learning (2005).
46. Engemann, D. A. et al. Combining magnetoencephalography with
magnetic resonance imaging enhances learning of surrogate-
biomarkers. Elife 9, e54055 (2020).
47. Uno, H., Cai, T., Pencina, M. J., D’Agostino, R. B. & Wei, L. J. On the
C-statistics for evaluating overall adequacy of risk prediction procedures
with censored survival data. Stat. Med. 30, 1105–1117 (2011).
48. Bandos, A. I., Rockette, H. E. & Gur, D. A permutation test sensitive to
differences in areas for comparing ROC curves from a paired
design. Stat. Med. 24, 2873–2893 (2005).
49. Lundberg, S. M. & Lee, S. I. A unified approach to interpreting model
predictions. Advances in Neural Information Processing Systems.
30 (2017).
50. Kang, H. et al. A Comprehensive benchmark of transcriptomic
biomarkers for immune checkpoint blockades. Cancers 15, 4094
(2023).
Acknowledgements
We would like to thank the following collaborators at Institut Curie for
their valuable support in data management and processing: A. Nicolas,
R. Goudefroye, C. Martinat, M. Bouvet, and A. Vincent-Salomon from the
experimental pathology platform, L. Chanas, and M. Milder from Institut
Curie’s Data Office, T. Ramtohul, and H. Brisse from the Department of
Radiology, S. Baulande from the Next-Generation Sequencing platform,
I. Bièche, and C. Callens from the Diagnostic and Theranostic Medicine
Division, C. Reyes, A. Rapinat, and D. Gentien from the Genomics plat-
form, Eugénie Genestant from the Computational Systems Biology of
Cancer team, and C. Kamoun, N. Servant, and P. Hupé from the bioin-
formatics core facility. We also thank M. Lefevre and S. Lefranc from
Institut Mutualiste Montsouris. This work was part of the TIPIT project
(Towards an Integrative Approach for Precision ImmunoTherapy) funded
by Fondation ARC call «SIGN’IT 2020—Signatures in Immunotherapy».
The present study was also supported by the French government under
the management of Agence Nationale de la Recherche as part of the
‘Investissements d’avenir’ program, reference ANR-19-P3IA-0001
(PRAIRIE 3IA Institute).
Author contributions
N.C. processed the collected data, performed the analyses, imple-
mented the computational tools, and wrote the manuscript. M.Le.
designed and developed the feature extraction pipeline for pathological
data and provided input on machine learning analysis. F.O. supervised
the collection of PET scans and clinical data and provided input on
radiomic and machine learning analysis. N.H.-B. processed segmented
PET scans and provided input on radiomic analysis. M.Lu. and E.W.
segmented and annotated PET scans and provided input on radiomic
analysis. S.L. managed data collection and patient recruitment. P.S.F.
collected and curated clinical data. C.L. estimated the TMB from the
RNA-seq data and provided input on omics analysis. C.B. collected
pathological slides and provided input on pathological analysis. A.Z.
provided input on omics and machine learning analysis. H.S. analyzed
pathological slides and provided input on the biological interpretation of
predictive models. T.W. supervised the collection of pathological slides,
pathological analysis, and machine learning analysis. I.B. supervised the
collection of PET scans, radiomic analysis, and machine learning ana-
lysis. N.G. supervised data collection, patient recruitment, and data
analysis. E.B. supervised the collection of omics data, omics analysis,
and machine learning analysis. F.O., A.Z., T.W., I.B., N.G., and E.B.
designed the study. N.G. and E.B. led the project. M.Le., F.O., C.L., H.S.,
T.W., I.B., N.G., and E.B. revised and edited the manuscript. All authors
approved the manuscript.
Competing interests
Nicolas Girard has a consulting or advisory role for the following com-
panies: Abbvie, AMGEN, AstraZeneca, BeiGene, Bristol-Myers Squibb,
Daiichi Sankyo/Astra Zeneca, Gilead Sciences, Ipsen, Janssen, LEO
Pharma, Lilly, MSD, Novartis, Pfizer, Roche, Sanofi, Takeda. The other
authors declare no competing interests.
Additional information
Supplementary information The online version contains
supplementary material available at
https://doi.org/10.1038/s41467-025-55847-5.
Correspondence and requests for materials should be addressed to
Nicolas Captier or Emmanuel Barillot.
Peer review information Nature Communications thanks Sanguk Kim
and the other anonymous reviewer(s) for their contribution to the peer
review of this work. A peer review file is available.
Reprints and permissions information is available at
http://www.nature.com/reprints
Publisher’s note Springer Nature remains neutral with regard to jur-
isdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons
Attribution 4.0 International License, which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as
long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons licence, and indicate if
changes were made. The images or other third party material in this
article are included in the article’s Creative Commons licence, unless
indicated otherwise in a credit line to the material. If material is not
included in the article’s Creative Commons licence and your intended
use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright
holder. To view a copy of this licence, visit http://creativecommons.org/
licenses/by/4.0/.
© The Author(s) 2025
1 Laboratoire d’Imagerie Translationnelle en Oncologie, Institut Curie, Inserm U1288, PSL Research University, Orsay, France.
2 Bioinformatics and computational systems biology of cancer, Institut Curie, Inserm U900, PSL Research University, Paris, France.
3 CBIO-center for Computational Biology, MINES ParisTech, PSL Research University, Paris, France.
4 Department of medical imaging, Institut Curie, Paris, France.
5 Department of Nuclear Medicine/PET-scan, Institut Jules Bordet, Université Libre de Bruxelles, Brussels, Belgium.
6 Institut du Thorax Curie-Montsouris, Institut Curie, Paris, France.
7 Department of pathology, Institut Curie, Paris, France.
8 In silico R&D, Evotec, Toulouse, France.
9 Immunity and cancer, Institut Curie, Inserm U932, PSL Research University, Paris, France.
10 These authors contributed equally: Thomas Walter, Irène Buvat, Nicolas Girard, Emmanuel Barillot.
e-mail: nicolas.captier@polytechnique.org; emmanuel.barillot@curie.fr
Article https://doi.org/10.1038/s41467-025-55847-5
Nature Communications | (2025) 16:614 19
Available via license: CC BY 4.0