
Integration of clinical, pathological, radiological, and transcriptomic data improves prediction for first-line immunotherapy outcome in metastatic non-small cell lung cancer

Springer Nature
Nature Communications
Authors:
  • Institut Curie - Inserm

Abstract and Figures

Immunotherapy is improving the survival of patients with metastatic non-small cell lung cancer (NSCLC), yet reliable biomarkers are needed to identify responders prospectively and optimize patient care. In this study, we explore the benefits of multimodal approaches to predict immunotherapy outcome using multiple machine learning algorithms and integration strategies. We analyze baseline multimodal data from a cohort of 317 metastatic NSCLC patients treated with first-line immunotherapy, including positron emission tomography images, digitized pathological slides, bulk transcriptomic profiles, and clinical information. Among the multiple integration strategies tested, most yield multimodal models surpassing both the best unimodal models and established univariate biomarkers, such as PD-L1 expression. Additionally, several multimodal combinations demonstrate improved patient risk stratification compared to models built with routine clinical features only. Our study thus provides evidence of the superiority of multimodal over unimodal approaches, advocating for the collection of large multimodal NSCLC datasets to develop and validate robust and powerful immunotherapy biomarkers.
Feature importance ranking for the prediction of overall survival, for clinical and transcriptomic modalities. Feature importance ranking was obtained by aggregating the SHAP values collected from both tasks (OS and 1-year death) and both approaches (linear and tree ensemble methods) (see Methods). Features that were significantly associated with 1-year death (one-sided permutation test with univariate AUCs) after Benjamini-Hochberg (BH) correction (α = 0.05) are shown with a * on the left side, while features that were significantly associated with OS (one-sided permutation test with univariate C-index) after BH correction are annotated with a * on the right side. * corresponds to an adjusted p-value below 0.05. A Consensus feature importance ranking for the clinical data modality (left) and heatmap of correlations between consensus clinical features (right). Correlations were evaluated by Spearman correlation coefficients (for continuous feature vs continuous feature), AUCs rescaled to [−1, 1] (for continuous feature vs binary categorical feature), or Matthews correlation coefficient (for binary categorical feature vs binary categorical feature). B Consensus feature importance ranking for the RNA data modality (left) and heatmap of Spearman correlations between consensus RNA features (right). Source data are provided as a Source Data file.
Marginal contribution of each modality to the multimodal predictions for late fusion strategy and XGBoost classifiers. A Heatmap of the marginal contribution (i.e., Shapley value) of each modality to the 1-year death prediction using the C + R + RNA late fusion model with XGBoost classifiers. Marginal contributions indicate how each modality influences the prediction relative to a random baseline of 0.5. Patients are stratified based on the multimodal model's final prediction (with a 0.5 threshold), where the positive class corresponds to those who died within 1 year, and the negative class corresponds to those who survived. B For each modality and patient in clusters 1 and 2 (see A), represented by vertical lines, this plot shows the feature with the highest SHAP value that aligns with the modality's marginal contribution. The size of each triangle indicates the absolute SHAP value, while its orientation corresponds to its sign (up for positive values that increase the predicted probability of death within 1 year and down for negative values that decrease it). The color scale represents the associated feature value relative to the whole patient cohort. C Relationship between the unimodal predictions from clinical, radiomic, and RNA modalities (i.e., unimodal tree ensemble models). Each dot is colored according to the patient's true label. In these plots, all marginal contributions, SHAP values, and predictions were obtained for the 77 patients with complete multimodal profiles and available 1-year death labels across the cross-validation test sets. They were collected for each of the 100 cross-validation schemes (see Methods) and subsequently averaged for each patient. Source data are provided as a Source Data file.
Article https://doi.org/10.1038/s41467-025-55847-5
Nicolas Captier (1,2), Marvin Lerousseau (2,3), Fanny Orlhac (1), Narinée Hovhannisyan-Baghdasarian (1), Marie Luporsi (1,4), Erwin Woff (1,5), Sarah Lagha (6), Paulette Salamoun Feghali (6), Christine Lonjou (2), Clément Beaulaton (7), Andrei Zinovyev (8), Hélène Salmon (9), Thomas Walter (2,3,10), Irène Buvat (1,10), Nicolas Girard (6,10) & Emmanuel Barillot (2,10)
Received: 9 February 2024
Accepted: 31 December 2024
A full list of affiliations appears at the end of the paper. e-mail: nicolas.captier@polytechnique.org; emmanuel.barillot@curie.fr
Nature Communications | (2025) 16:614
Content courtesy of Springer Nature, terms of use apply. Rights reserved

Anti-PD-1/PD-L1 immunotherapy with or without chemotherapy is the current standard first-line therapy for metastatic non-small cell lung cancer (NSCLC) without actionable oncogene alterations and without contraindications to PD-1/PD-L1 inhibitors¹. Several clinical trials have indeed demonstrated significantly improved Overall Survival (OS) and Progression-Free Survival (PFS) with immunotherapy in comparison to chemotherapy alone²⁻⁶. Nevertheless, half of the patients do not present a radiological response to immunotherapy, and the duration of response remains highly variable from one patient to another (ranging from 1.1 to 18 months for patients treated with first-line immunotherapy + chemotherapy)³. Ultimately, the number of patients with long-term survival is limited. There is thus a critical need for biomarkers that can predict treatment response accurately. These biomarkers will pave the way to better personalize the treatment strategy (immunotherapy as a single agent for patients with predicted prolonged survival, or in combination with chemotherapy or other agents for patients with predicted poor response and survival), to customize the follow-up, and to adequately assess treatment sequences.
Machine learning approaches have recently shown their potential to leverage data collected before treatment initiation, including clinical⁷,⁸, radiological⁹,¹⁰, anatomopathological¹¹,¹², or transcriptomic information¹³,¹⁴, and to develop robust prognostic and predictive models that could outperform approved univariate biomarkers such as PD-L1 expression¹⁵. Promising results have fostered the exploration of multimodal approaches that combine the diverse aspects of the disease probed by these different modalities. Yet, evidence of the superiority of multimodal over unimodal biomarkers¹⁶ remains limited, possibly due to challenges in gathering comprehensive multimodal cohorts. Therefore, there is a pressing need for new studies involving large and homogeneous NSCLC multimodal cohorts to fully explore the benefits of multimodality and to design strategies that address the challenges of integrating multimodal data.
In this study, we conduct a thorough comparison of unimodal and multimodal approaches for predicting the outcome of metastatic NSCLC patients undergoing first-line immunotherapy. Using a new multimodal cohort of 317 patients (including clinical data, PET/CT scans, digitized pathological slides, and bulk RNA-seq data), we demonstrate the superiority of multimodal approaches across the majority of the explored predictive algorithms and integration strategies. Mapping each modality to a set of interpretable features, we also identify the most influential factors for immunotherapy outcomes and explore their complementarity. These results could guide future research, fostering efforts to collect and analyze large multimodal cohorts, ultimately leading to the development and validation of a new generation of multimodal biomarkers that could transform NSCLC patient care.
Results
Clinical characteristics of patients with metastatic NSCLC
We identified 317 NSCLC patients treated at Institut Curie who met the inclusion criteria: patients with histologically proven advanced NSCLC who received anti-PD-(L)1 immunotherapy, specifically pembrolizumab, as their first-line treatment. Immunotherapy was administered either as a standalone treatment for patients with a PD-L1 expression greater than 50% or in combination with chemotherapy, regardless of the PD-L1 expression, as per clinical practice guidelines¹. PD-L1 expression was evaluated by immunohistochemistry (SP263 and QR1 assays), with the Tumor Proportion Score (TPS) representing the percentage of tumor cells exhibiting membrane PD-L1 staining. The patients who received pembrolizumab as monotherapy were treated between October 2017 and January 2023, while those who received pembrolizumab combined with chemotherapy were treated between July 2019 and January 2023. The clinical characteristics of the multimodal cohort are detailed in Table 1.
Median OS and PFS were, respectively, 723 days (95% CI [446–987]) and 301 days (95% CI [145–598]) for the patients treated with immunotherapy alone, and 763 days (95% CI [576–NR]) and 290 days (95% CI [241–372]) for the patients treated with a combination of immunotherapy and chemotherapy (Fig. 1A). Interestingly, no significant difference was observed between the two treatment groups for OS (log-rank p-value = 0.44, Fig. 1A), even when restricting the comparison to patients with PD-L1 expression greater than 50% (Supplementary Fig. s1). For PFS, the immunotherapy + chemotherapy group had fewer early progressors than the immunotherapy-only group, although this was offset by more late progressors (Fig. 1A).
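The log-rank p-values quoted in this section compare two survival curves. As an illustration only (this section does not show the authors' code or name the library they used), a minimal two-group log-rank test for right-censored data can be written in pure Python. It assumes `event = 1` marks an observed death or progression and `event = 0` a censored follow-up; the function name is hypothetical.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test for right-censored survival data.

    events_* hold 1 for an observed event and 0 for a censored follow-up.
    Returns (chi-squared statistic, p-value) with 1 degree of freedom.
    """
    data = ([(t, e, 0) for t, e in zip(times_a, events_a)]
            + [(t, e, 1) for t, e in zip(times_b, events_b)])
    event_times = sorted({t for t, e, _ in data if e == 1})
    o_minus_e = 0.0  # observed minus expected events in group A
    variance = 0.0
    for t in event_times:
        # Patients still at risk just before t, per group
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        n = n_a + n_b
        # Events observed exactly at t (both groups, then group A only)
        d = sum(1 for ti, e, _ in data if ti == t and e == 1)
        d_a = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the group A event count at t
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / variance if variance > 0 else 0.0
    # Chi-squared (1 dof) survival function: P(X > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

Each distinct event time contributes the difference between observed and expected events in one group plus a variance term; the pooled statistic is referred to a chi-squared distribution with one degree of freedom. For real analyses, a maintained survival-analysis library would normally be preferred over this sketch.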
Standard univariate biomarkers show limited predictive power
PD-L1 expression was able to stratify patients, with significant differences in PFS and OS between patients with negative expression (< 1%) and those with positive expression (≥ 1%) (Fig. 1B and Supplementary Fig. s2). However, it yielded only modest performance as a univariate biomarker for patient survival (C-index OS = 0.54, bootstrap 95% CI [0.51–0.57], permutation p-value = 0.014, n = 298). Besides, no significant performance was observed when using PD-L1 expression as a continuous score, where negative expressions were replaced by 0% and the score was calculated as 100 − TPS (C-index OS = 0.53, bootstrap 95% CI [0.48–0.58], permutation p-value = 0.104, n = 295). Other standard clinical biomarkers, such as the Tumor Mutational Burden (TMB) or Tumor Infiltrating Lymphocytes (TILs), with TILs being semi-quantitatively assessed on routine pathological sections without any cutoff, did not exhibit a significant association with patient outcome (Fig. 1C and Supplementary Fig. s3).
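The C-index used throughout these results measures how well a risk score orders patients by survival time (0.5 for a random ordering, 1.0 for a perfect one). The sketch below is illustrative, not the authors' implementation. It assumes that a higher score means a higher risk, which is why the continuous PD-L1 score is taken as 100 − TPS, and it pairs a plain Harrell estimator with a simple one-sided permutation test. All function names are hypothetical.

```python
import random


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i has the event and a strictly
    shorter time than patient j. It is concordant when patient i also has
    the higher risk score; tied risk scores count as one half.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def pdl1_risk_score(tps_percent):
    """Continuous PD-L1 risk score: higher TPS means lower predicted risk."""
    return 100.0 - tps_percent


def permutation_p_value(times, events, risk_scores, n_permutations=1000, seed=0):
    """One-sided p-value: how often shuffled scores reach the observed C-index."""
    rng = random.Random(seed)
    observed = harrell_c_index(times, events, risk_scores)
    shuffled = list(risk_scores)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if harrell_c_index(times, events, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)
```

The "+1" in the permutation p-value keeps it strictly positive. The bootstrap confidence intervals reported above would require resampling patients with replacement and are not shown here.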
Collection of multiple baseline modalities to predict immunotherapy outcome
Clinical information from routine care, [18F]FDG-PET/CT scans, digitized pathological slides from the initial diagnosis, and bulk RNA-seq profiles from solid biopsies were collected at baseline. For each data modality, we first selected and computed several hand-crafted features to serve as input for both unimodal and multimodal predictive models, including 30 clinical features, 30 radiomic features, 134 pathomic features, and 34 transcriptomic features. We then leveraged this multimodal dataset to conduct an extensive comparison of the performance of unimodal and multimodal approaches using a 10-fold cross-validation scheme applied to the entire cohort and repeated 100 times (Supplementary Fig. s4).
Of the 317 patients, 237 had at least one missing modality (Fig. 1D). To ensure a fair comparison of all possible modality combinations, we therefore restricted the evaluation of prediction performance to the 80 patients with a complete multimodal profile (i.e., collecting the predictions of these 80 patients only, from the test sets of the cross-validation scheme applied to the whole cohort; Supplementary Fig. s4). Log-rank tests indicated no significant differences between the survival distributions of patients with missing modalities and those with available modalities (Supplementary Fig. s5).
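The evaluation protocol described in the two paragraphs above can be outlined in code. In each fold, models are trained on the whole cohort, but performance is computed only from the test-set predictions of the patients with all four modalities and then summarized as mean ± std over the repeats, as in Table 2. This is a hypothetical outline. `fit_predict` and `metric` stand for whatever model and score are being evaluated, and the use of plain (unstratified) shuffled folds is an assumption.

```python
import random
import statistics


def repeated_kfold(ids, n_splits=10, n_repeats=100, seed=0):
    """Yield (repeat_index, train_ids, test_ids) for repeated shuffled K-fold CV."""
    rng = random.Random(seed)
    ids = list(ids)
    for repeat in range(n_repeats):
        shuffled = ids[:]
        rng.shuffle(shuffled)
        # Fold k takes every n_splits-th patient, so fold sizes differ by at most one
        folds = [shuffled[k::n_splits] for k in range(n_splits)]
        for test in folds:
            test_set = set(test)
            train = [pid for pid in shuffled if pid not in test_set]
            yield repeat, train, test


def evaluate_on_subset(ids, subset, fit_predict, metric,
                       n_splits=10, n_repeats=100, seed=0):
    """Train on the whole cohort, but score only the out-of-fold predictions
    of patients in `subset` (e.g. those with a complete multimodal profile).

    fit_predict(train_ids, test_ids) -> dict {patient_id: prediction}
    metric(predictions_dict) -> float
    Returns (mean, standard deviation) of the metric over the repeats.
    """
    subset = set(subset)
    per_repeat = [{} for _ in range(n_repeats)]
    for repeat, train, test in repeated_kfold(ids, n_splits, n_repeats, seed):
        predictions = fit_predict(train, test)
        for pid in test:
            if pid in subset:
                per_repeat[repeat][pid] = predictions[pid]
    scores = [metric(predictions) for predictions in per_repeat]
    return statistics.mean(scores), statistics.stdev(scores)
```

Because every patient lands in exactly one test fold per repeat, each repeat yields one out-of-fold prediction per complete-profile patient, and the metric is computed on that pooled set.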
Comparison of unimodal performances across multiple prediction tasks
We first evaluated the predictive value of each modality individually. This evaluation involved predicting risk scores for time-to-event outcomes (OS and PFS) and classifying patients into two groups: those who would die within one year of treatment and those who would not (1-year death), or those who would experience disease progression before 6 months of treatment and those who would not (6-month progression). We focused on two standard machine learning approaches that are well suited to datasets with modest numbers of samples¹⁷: linear methods (logistic regression and Cox regression with elastic net penalties) and tree ensemble methods (Random Survival Forest¹⁸ and gradient-boosted tree¹⁹ algorithms). The four modalities exhibited varying degrees of predictive power for patient outcome, with the RNA modality standing out for the prediction of 1-year death (AUC = 0.75 ± 0.04, mean ± 1 std; Table 2 and Supplementary Table s1). PFS and 6-month progression predictions were more challenging than OS and 1-year death predictions. Except for pathological data, all modalities yielded greater performance (using either linear or tree ensemble algorithms) in predicting OS and 1-year death than in predicting PFS and 6-month progression. Across all modalities and models, the highest scores achieved were a C-index of 0.59 (± 0.02) for the RNA modality in predicting PFS and an AUC of 0.61 (± 0.03) for the clinical, pathomic, or RNA modality in predicting 6-month progression.
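The two binary tasks can be derived from the same time-to-event data, and their AUC can be computed in its rank (Mann–Whitney) form. This section does not state how patients censored before the horizon are handled, so this sketch assumes they are left unlabeled. The function names and the example follow-up values are hypothetical.

```python
def horizon_label(time_days, event, horizon_days):
    """Binary label for a fixed horizon (e.g. 365 days for 1-year death).

    Returns 1 if the event occurred by the horizon, 0 if the patient was
    followed beyond it, and None if censored before it (assumed unlabeled).
    """
    if event and time_days <= horizon_days:
        return 1
    if time_days > horizon_days:
        return 0
    return None


def roc_auc(labels, scores):
    """ROC AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Illustrative (days, event) pairs, not study data
follow_up = [(120, 1), (400, 0), (250, 0), (900, 1)]
labels = [horizon_label(t, e, 365) for t, e in follow_up]  # [1, 0, None, 0]
```

The same helper with `horizon_days` set to about 182 would give 6-month progression labels from PFS data.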
Feature importance analyses highlight relevant clinical and transcriptomic features
We first investigated feature importance for the prediction of OS and 1-year death, providing insights into the information learned by each unimodal model (see Methods). Notably, it revealed that clinical models consistently learned that patients with a low level of serum albumin, a negative PD-L1 status (i.e., TPS < 1%), or abundant circulating neutrophils were more likely to have a poor prognosis (Fig. 2A). This analysis also highlighted the transcriptomic features that the RNA models used to predict OS and 1-year death (Fig. 2B). RNA models consistently associated an abundance of dendritic cells (DC), as scored by the MCP-counter method²⁰, or a high expression of the NTRK1 gene with a good prognosis, while they associated high expression of the NRAS and KRAS genes with a poor prognosis. Interestingly, among the 13 consensus transcriptomic features identified with feature importance analysis (see Methods), only 3 exhibited significantly different values between biopsy sites in the 84 patients for whom this information was available (Supplementary Fig. s6), suggesting that the other 10 may be used independently of the biopsy location. Radiomic models primarily focused on information related to the Total Metabolic Tumor Volume (TMTV) as well as to the total metabolic volume of extra-thoracic metastases (Supplementary Fig. s7). Lastly, the interpretation of pathomic models unveiled features that encoded the proportion of inflammatory cells within the biopsy sections as well as their spatial organization (Supplementary Fig. s8).
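The Methods defining the consensus ranking are outside this excerpt. Purely as an illustration of how SHAP values from several models (two tasks times two model families) could be merged into a single ranking, the sketch below normalizes each model's mean absolute SHAP values so that they sum to one and then averages them across models. This normalization and averaging scheme is an assumption, not the authors' documented procedure. SHAP values are assumed to be precomputed, and the function names and toy values are hypothetical.

```python
def mean_abs_shap(shap_matrix):
    """Mean |SHAP| per feature; shap_matrix is a list of per-patient rows."""
    n_patients = len(shap_matrix)
    n_features = len(shap_matrix[0])
    return [sum(abs(row[f]) for row in shap_matrix) / n_patients
            for f in range(n_features)]


def consensus_ranking(shap_by_model, feature_names):
    """Average normalized mean |SHAP| over several models (tasks x approaches).

    shap_by_model: dict model_name -> SHAP matrix (patients x features),
    with columns ordered like feature_names.
    Returns [(feature_name, consensus_score)] sorted by decreasing score.
    """
    scores = [0.0] * len(feature_names)
    for matrix in shap_by_model.values():
        importance = mean_abs_shap(matrix)
        total = sum(importance) or 1.0  # guard against an all-zero model
        for f, value in enumerate(importance):
            scores[f] += value / total / len(shap_by_model)
    order = sorted(range(len(feature_names)), key=lambda f: scores[f], reverse=True)
    return [(feature_names[f], scores[f]) for f in order]


# Toy SHAP values for two hypothetical models
ranking = consensus_ranking(
    {"os_linear": [[0.5, -0.2, 0.0], [-0.3, 0.1, 0.1]],
     "death1y_tree": [[0.2, 0.4, 0.0]]},
    ["albumin", "neutrophils", "age"])
```

Normalizing per model keeps a model with large raw SHAP magnitudes (for example, a survival model on a log-hazard scale) from dominating models whose outputs are probabilities.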
We then turned to feature importance analysis of unimodal models for the prediction of PFS and 6-month progression. Notably, it confirmed that a high expression of the NTRK1 gene and an abundance of dendritic cells were associated with a favorable prognosis, since they were also ranked among the top ten most important transcriptomic features for multivariate predictions and showed significant univariate association with both PFS and 6-month progression (Supplementary Fig. s9A). Similarly, it highlighted the favorable influence of positive PD-L1 expression or a high level of serum albumin on the prognosis predicted by multivariate models, and the unfavorable influence of a high neutrophil-to-lymphocyte ratio (Supplementary Fig. s9B). Finally, this analysis showed that, as for OS and 1-year death prediction, radiomic models were predominantly driven by the TMTV (Supplementary Fig. s10A).
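The univariate associations mentioned above, like the asterisks in Fig. 2, rely on Benjamini-Hochberg (BH) correction of permutation p-values. A minimal sketch of the standard BH step-up adjustment follows; the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order.

    The i-th smallest p-value is scaled by m / i, then a running minimum is
    taken from the largest p-value downwards so adjusted values stay monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted p-values are capped at 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.2])
significant = [q < 0.05 for q in adjusted]
```

A feature is then declared significant when its adjusted p-value falls below α = 0.05, which controls the expected false discovery rate across all tested features.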
The consensus important features (see Methods) identified for OS and 1-year death predictions, along with those for PFS and 6-month progression predictions, exhibited mild to low inter-modal correlations (Supplementary Fig. s11), suggesting that the different collected modalities may capture distinct aspects of each patient's condition and response to therapy.
Late fusion of unimodal predictors improves the prediction of immunotherapy outcome
We then developed multimodal predictors with the hypothesis that multimodal data would provide richer and more comprehensive information. We first applied late fusion as a baseline strategy for integrating all the modalities into multimodal predictors of OS, 1-year death, PFS, and 6-month progression (Supplementary Fig. s4). Late fusion consists of averaging the predictions of each individual unimodal predictor. We tested every possible combination of two to four modalities for each predictive task using both linear and tree ensemble algorithms. The late fusion of tree ensemble models improved the prediction of patient outcomes across both classification and survival tasks (Fig. 3 and Supplementary Fig. s12). Specifically, for 1-year death, the combination of predictions from clinical, RNA, and radiomic
models demonstrated the highest performance (AUC = 0.81 ± 0.03), while for OS, the combination of predictions from clinical and RNA models performed best (C-index = 0.75 ± 0.01). For both OS and 1-year death, paired permutation tests confirmed the significantly higher AUC and C-index of the combined model compared to the clinical-only, radiomics-only, and pathomics-only models (Supplementary Fig. s13). For PFS, the combination of predictions from clinical, RNA, pathomic, and radiomic models yielded the best performance, while for 6-month progression, it was the combination of predictions from clinical, RNA, and pathomic models. However, paired permutation tests showed that the performance of these two combinations was not significantly different from that of the unimodal models. The late fusion of linear models performed better than that of tree ensemble models for the prediction of 6-month progression only, with a combination of clinical, pathomic, and RNA predictions yielding an AUC of 0.67 (± 0.03) (Supplementary Fig. s14). This can be explained by the greater performance of unimodal linear models for 6-month progression prediction, underscoring that the performance of late fusion combinations strongly depends on the performance of their unimodal components.

Table 1 | Clinical characteristics of the multimodal cohort and the subset of patients with a complete multimodal profile
Clinical characteristics | Multimodal cohort (n = 317) | Immuno + chemo (n = 196) | Immuno alone (n = 121) | Subset with all modalities (n = 80) | Statistical comparison (80 vs 237)
Age, median (range) | 66 (33–92) | 64 (33–84) | 69 (40–92) | 64 (37–82) | p1-val = 4.9e-3
Sex, n (%): Men | 189 (60) | 113 (58) | 76 (63) | 46 (57) | p2-val = 6.0e-1
Sex, n (%): Women | 128 (40) | 83 (42) | 45 (37) | 34 (43) |
1st-line therapy, n (%): Pembrolizumab + chemotherapy | 196 (62) | 196 (100) | – | 55 (69) | p2-val = 9.2e-2
1st-line therapy, n (%): Pembrolizumab | 121 (38) | – | 121 (100) | 25 (31) |
Histology, n (%): Adenocarcinomas | 232 (73) | 152 (77) | 80 (66) | 54 (68) | p2-val = 2.8e-2
Histology, n (%): Squamous cell carcinomas | 44 (14) | 17 (9) | 27 (22) | 13 (16) |
Histology, n (%): Other subtypes/not available | 41 (13) | 27 (14) | 14 (12) | 13 (16) |
PD-L1 expression, n (%): ≥ 50% | 163 (51) | 49 (25) | 114 (94) | 42 (52) | p2-val = 6.2e-1
PD-L1 expression, n (%): 1–49% | 82 (26) | 78 (40) | 4 (3) | 23 (29) |
PD-L1 expression, n (%): Negative | 56 (18) | 56 (29) | 0 (0) | 11 (14) |
PD-L1 expression, n (%): Not available | 16 (5) | 13 (6) | 3 (3) | 4 (5) |
Smoking status, n (%): Current/former | 287 (91) | 180 (92) | 107 (88) | 71 (89) | p2-val = 6.9e-1
Smoking status, n (%): Never | 29 (9) | 15 (8) | 14 (12) | 8 (10) |
Smoking status, n (%): Not available | 1 (< 1) | 1 (< 1) | 0 (0) | 1 (1) |
Performance status, n (%): ECOG 0/1 | 244 (77) | 158 (81) | 86 (71) | 71 (89) | p2-val = 1.5e-3
Performance status, n (%): ECOG ≥ 2 | 36 (11) | 14 (7) | 22 (18) | 1 (1) |
Performance status, n (%): Not available | 37 (12) | 24 (12) | 13 (11) | 8 (10) |
TILs, n (%): Positive | 159 (50) | 82 (42) | 77 (64) | 49 (61) | p2-val = 2.8e-2
TILs, n (%): Negative | 18 (6) | 9 (5) | 9 (7) | 3 (4) |
TILs, n (%): Not available | 140 (44) | 105 (53) | 35 (29) | 28 (35) |
Median Overall Survival, days (95% CI) | 756 (592–910) | 723 (446–987) | 763 (576–NR) | 846 (650–NR) | p3-val = 2.7e-1
Median Progression-Free Survival, days (95% CI) | 296 (241–372) | 301 (145–598) | 290 (241–372) | 386 (275–711) | p3-val = 1.6e-2
ECOG: Eastern Cooperative Oncology Group; TILs: Tumor-Infiltrating Lymphocytes; NR: Not Reached; p1-val: Welch's t-test p-value; p2-val: Chi-squared test p-value; p3-val: log-rank test p-value.
The results of the statistical comparison between the subset of patients with a complete profile and the rest of the cohort, using two-sided Welch's t-tests, one-way Chi-squared tests, and log-rank tests, are presented in the last column.

[Fig. 1 panels A–D; panel D: Venn diagram of the four data modalities, Clinical (317), Pathological (236), RNA (134), and Radiological (201), with 80 patients in the four-way intersection.]
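Late fusion, as described earlier in this subsection, reduces to averaging unimodal outputs over a chosen combination of modalities. The sketch below assumes that each unimodal model returns a prediction on a common scale (for example, a predicted probability of 1-year death) and that a modality missing for a patient is simply dropped from the average. This section does not state how missing modalities are handled, so that rule, like the function names, is an assumption. The sketch also enumerates the combinations of two to four modalities that were tested.

```python
from itertools import combinations
from statistics import mean

MODALITIES = ("clinical", "radiomics", "pathomics", "rna")


def modality_combinations(modalities=MODALITIES, min_size=2):
    """All combinations of two to four modalities (11 for four modalities)."""
    return [combo for size in range(min_size, len(modalities) + 1)
            for combo in combinations(modalities, size)]


def late_fusion(unimodal_predictions, combination):
    """Average the unimodal predictions available for one patient.

    unimodal_predictions: dict modality -> prediction, or None if missing.
    Missing modalities are skipped (an assumption of this sketch).
    """
    available = [unimodal_predictions[m] for m in combination
                 if unimodal_predictions.get(m) is not None]
    if not available:
        raise ValueError("no modality of this combination is available")
    return mean(available)


patient = {"clinical": 0.8, "rna": 0.6, "radiomics": None}
fused = late_fusion(patient, ("clinical", "radiomics", "rna"))  # about 0.7
```

For the survival tasks, risk scores from different model families may be on different scales, so some normalization (for example, converting them to ranks) would likely be needed before averaging. That choice is not specified here.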
To further compare late fusion multimodal models with unimodal ones, we computed the marginal contribution of each modality to the final multimodal prediction for each patient. We focused on the best-performing model, which combined clinical, radiomic, and RNA tree ensemble models for 1-year death prediction (Fig. 3). For several patients, the different modalities did not influence the multimodal prediction in the same direction (Fig. 4A). Notably, in 26% of the cases (20/77), the RNA modality's contribution was discordant with the final multimodal prediction. Among these discordant cases, about one-third (6/20) were correctly influenced by the RNA modality but misclassified by the multimodal model, while about two-thirds (14/20) were incorrectly influenced by the RNA modality but correctly classified by the multimodal model, with the radiomic and clinical modalities guiding the prediction towards the correct outcome (clusters 1 and 2). Analyzing the feature importance for the 14 patients whose multimodal prediction was correct despite the misleading RNA contribution revealed that features from different modalities provided opposing information, balancing each other to guide the multimodal prediction in the correct direction (Fig. 4B). For instance, in some cases, high expression of the NRAS gene negatively influenced the prediction, but incorporating clinical and radiomic information, such as an elevated serum albumin level or high spleen metabolism, helped achieve a correct prediction. Overall, the three fused modalities exhibited diverse behaviors, with weak correlations between their unimodal predictions (Fig. 4C). Averaging their decisions changed the predicted outcomes for several patients, not just isolated cases, and improved overall performance.
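For late fusion, the per-modality marginal contributions discussed above can be computed exactly, because the number of modalities is small. This sketch assumes that a coalition of modalities predicts the average of its members' unimodal probabilities and that the empty coalition predicts the random baseline of 0.5 used for the marginal contributions in Fig. 4. The authors' exact value function is not given in this section, and the function names and example probabilities are hypothetical. By construction, the contributions sum to the fused prediction minus 0.5.

```python
from itertools import combinations
from math import factorial


def coalition_value(predictions, coalition, baseline=0.5):
    """Late-fusion output of a subset of modalities; the empty set gives the baseline."""
    if not coalition:
        return baseline
    return sum(predictions[m] for m in coalition) / len(coalition)


def modality_shapley_values(predictions, baseline=0.5):
    """Exact Shapley value of each modality for one patient's fused prediction.

    predictions: dict modality -> unimodal predicted probability.
    """
    players = list(predictions)
    n = len(players)
    contributions = {}
    for player in players:
        others = [p for p in players if p != player]
        total = 0.0
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size that exclude `player`
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                gain = (coalition_value(predictions, coalition + (player,), baseline)
                        - coalition_value(predictions, coalition, baseline))
                total += weight * gain
        contributions[player] = total
    return contributions


# Hypothetical patient: clinical and radiomics push toward death, RNA away from it
phi = modality_shapley_values({"clinical": 0.8, "radiomics": 0.7, "rna": 0.3})
```

In this toy case, the fused prediction is 0.6 (above the 0.5 threshold) while the RNA contribution is negative, which mirrors the discordant cases described above.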
Benchmark of integration strategies reveals a consistent benefit of multimodal approaches
We compared the late fusion approach with early fusion (Supplementary Fig. s4). The baseline early fusion approach consists of concatenating the features from the different modalities and using these concatenated vectors as input to a single predictor. For binary classification tasks, we also re-implemented and tested an attention-based fusion approach known as DyAM¹⁶, which was recently applied to NSCLC multimodal data. Early fusion and DyAM models were trained both without and with prior univariate feature selection to balance the dimensions of the different modalities (see Methods). The comparison of these different integration strategies for predicting OS, 1-year death, PFS, and 6-month progression did not identify a single best strategy (Fig. 5). The late fusion of tree ensemble models yielded the best performance for the prediction of OS and 1-year death, while the early fusion of tree ensemble models and the DyAM model, both with prior univariate feature selection, outperformed the other strategies for PFS and 6-month progression prediction, respectively. This comparison demonstrated the potential of multimodal approaches to enhance unimodal predictions, as for each prediction task the majority of integration strategies resulted in multimodal combinations that outperformed the best unimodal models. Furthermore, this comparison highlighted modalities that were consistently involved in the best multimodal combinations across the different integration strategies, particularly for 1-year death and 6-month progression prediction. For 1-year death, the optimal combinations of the integration strategies that outperformed the best unimodal model included the clinical (5/7 strategies), RNA (7/7 strategies), and radiomic (3/7 strategies) modalities. For 6-month progression, they included the clinical (7/8 strategies), pathomic (7/8 strategies), and RNA (8/8 strategies) modalities. The combination of clinical and RNA modalities also performed best for OS prediction, while for PFS prediction, it was the combination of clinical, pathomic, and RNA. Lastly, the superiority of multimodal approaches was confirmed when comparing the average performance of the different integration strategies across all possible combinations of one, two, three, and four modalities (Fig. 6 and Supplementary Fig. s15). Indeed, the average performance over combinations with a fixed number of modalities increased with the number of integrated modalities for every strategy and every prediction task (except for the early fusion with a
linear model and no prior feature selection). Paired-sample t-tests showed that multimodal combinations (involving two, three, or four modalities) consistently led to performance improvements compared to unimodal models. The performances of all the multimodal models are detailed in the supplementary materials (Supplementary Figs. s14, s16–s23).

Fig. 1 | Survival of NSCLC patients and Venn diagram summarizing the multimodal cohort. A OS and PFS Kaplan-Meier survival curves (solid lines) for the whole NSCLC cohort (n = 311 for OS and n = 316 for PFS) with a 95% confidence interval (shaded areas). Patients are stratified with respect to their first-line therapy, either pembrolizumab alone or pembrolizumab + chemotherapy. Log-rank p-values are reported to characterize the separation of the survival curves. B OS and PFS Kaplan-Meier survival curves (solid lines) with 95% confidence intervals (shaded areas) and log-rank p-values for the patients with available PD-L1 expression (n = 295 for OS and n = 300 for PFS). Patients are stratified with respect to their PD-L1 status (positive vs negative). C OS Kaplan-Meier survival curves (solid lines) with 95% confidence intervals (shaded areas) and log-rank p-values for the 43 patients with available TMB and the 174 patients with available TILs status. For the TMB, patients are stratified with a threshold of 15 mutations per megabase (see Methods). For TILs, patients are stratified with respect to their positive vs negative TILs status. D Overview of the multimodal cohort with a Venn diagram. The four data modalities and their intersections are represented (i.e., PET/CT images, clinical data, pathological slides, and bulk RNA-seq profiles). Source data are provided as a Source Data file.

Table 2 | Unimodal performance for the prediction of OS, 1-year death, PFS, and 6-month progression
Modality | Algorithm | OS (n = 79), C-index | 1-year death (n = 77), AUC | PFS (n = 80), C-index | 6-month progression (n = 75), AUC
Clinical | Tree ensembles | 0.67 ± 0.01* | 0.59 ± 0.05 | 0.56 ± 0.02 | 0.58 ± 0.04
Clinical | Linear | 0.60 ± 0.02* | 0.73 ± 0.02* | 0.53 ± 0.03 | 0.61 ± 0.03*
Radiomics | Tree ensembles | 0.61 ± 0.02* | 0.62 ± 0.04 | 0.57 ± 0.01 | 0.56 ± 0.05
Radiomics | Linear | 0.61 ± 0.02* | 0.47 ± 0.03 | 0.55 ± 0.02 | 0.48 ± 0.04
Pathomics | Tree ensembles | 0.59 ± 0.02 | 0.54 ± 0.05 | 0.56 ± 0.02 | 0.58 ± 0.06*
Pathomics | Linear | 0.58 ± 0.02 | 0.56 ± 0.03 | 0.51 ± 0.02 | 0.61 ± 0.03*
RNA | Tree ensembles | 0.69 ± 0.02* | 0.75 ± 0.04* | 0.57 ± 0.02 | 0.60 ± 0.04*
RNA | Linear | 0.58 ± 0.02 | 0.65 ± 0.03 | 0.59 ± 0.02* | 0.61 ± 0.03
*One-sided permutation p-value ≤ 0.05 (exact p-values are reported in Supplementary Table s1).
Unimodal performance of each data modality for the prediction of OS, 1-year death, PFS, and 6-month progression with linear and tree ensemble algorithms (mean ± std over the 100 cross-validation schemes). The best performances for each column are highlighted in bold. Source data are provided as a Source Data file.
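Early fusion with prior univariate feature selection, as compared in this benchmark, can be sketched as follows. The selection rule used here is an assumption for illustration: within each modality, and on the training data only, keep the k features whose univariate AUC is farthest from 0.5, so that every modality contributes the same number of features. The actual criterion is described in the paper's Methods, which are not part of this excerpt. Function names and the toy data are hypothetical, and this AUC-based rule only applies to the binary tasks (a C-index would play the same role for survival).

```python
def roc_auc(labels, scores):
    """Rank-based AUC (probability a positive outranks a negative; ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def select_top_k(features, labels, k):
    """Keep the k features whose univariate AUC is farthest from 0.5."""
    ranked = sorted(features,
                    key=lambda name: abs(2 * roc_auc(labels, features[name]) - 1),
                    reverse=True)
    return ranked[:k]


def early_fusion(modalities, labels, k=None):
    """Concatenate (optionally pre-selected) features from all modalities.

    modalities: dict modality -> dict feature_name -> values (training patients).
    k: features kept per modality, or None to keep all of them.
    Returns the selected (modality, feature) pairs and one row per patient.
    """
    selected = []
    for modality, features in modalities.items():
        names = select_top_k(features, labels, k) if k else list(features)
        selected.extend((modality, name) for name in names)
    rows = [[modalities[m][f][i] for m, f in selected] for i in range(len(labels))]
    return selected, rows


data = {"clinical": {"albumin": [1, 2, 3, 4], "age": [3, 1, 4, 2]},
        "rna": {"g1": [0, 0, 1, 1], "g2": [5, 5, 5, 5]}}
selected, rows = early_fusion(data, [0, 0, 1, 1], k=1)
```

Running the selection inside each training fold, rather than once on the full cohort, avoids leaking test-set information into the fused model.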
We also explored whether the observed multimodal benefit depended on our initial selection of features within each modality. We focused on the transcriptomic modality, which demonstrated the highest unimodal performance (Table 2), and assessed whether our multimodal models could outperform any transcriptomic signature, not just the one derived from the initially selected features. For each predictive task, we applied the same cross-validation schemes as before and compared the predictive performance of the best multimodal model with 36 transcriptomic signatures previously associated with immunotherapy in the literature (see Methods, Supplementary Table s2). The best multimodal model outperformed all transcriptomic signatures, except for the prediction of 6-month progression, where it outperformed 33 of the 36 signatures (Fig. 7). Interestingly, for OS and 1-year death, our best unimodal transcriptomic model ranked among the top two transcriptomic signatures, whereas for PFS and 6-month progression, it did not rank within the top ten.
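The ranking described above can be illustrated with a short sketch. It assumes each signature and each model is summarized by one mean metric (AUC or C-index), and it applies the max(x, 1 − x) orientation described in the Fig. 7 caption, so a signature whose higher values indicate a better prognosis is not penalized for scoring below 0.5. The function names and the scores in the usage lines are hypothetical and are not values from the study.

```python
from typing import Dict


def oriented(score: float) -> float:
    """Fold a metric around 0.5 with max(x, 1 - x), as in the Fig. 7 caption."""
    return max(score, 1.0 - score)


def n_outperformed(model_score: float, signature_scores: Dict[str, float]) -> int:
    """Count the signatures whose oriented score is strictly below the model's."""
    m = oriented(model_score)
    return sum(1 for s in signature_scores.values() if oriented(s) < m)


def rank_among_signatures(model_score: float, signature_scores: Dict[str, float]) -> int:
    """1-based rank of the model among the signatures (1 = best)."""
    m = oriented(model_score)
    return 1 + sum(1 for s in signature_scores.values() if oriented(s) > m)


if __name__ == "__main__":
    # Hypothetical mean AUCs, for illustration only.
    signatures = {"sig_a": 0.62, "sig_b": 0.35, "sig_c": 0.58}
    print(n_outperformed(0.64, signatures), rank_among_signatures(0.64, signatures))
```

In this toy example, sig_b scores 0.35, which becomes 0.65 after orientation. It therefore ranks above a model scoring 0.64.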
Multimodal predictions demonstrate improved patient stratification for OS
Kaplan-Meier analysis showed that the predictions of multimodal models, integrating clinical data with other modalities when available, effectively stratified patients in terms of OS. After adjusting the log-rank p-values, 93% of all the combinations across the different prediction tasks (i.e., 328/352) exhibited significant differences between the survival distributions of their low-risk and high-risk groups (Fig. 8A and Supplementary Fig. s24). Notably, 74% of the combinations (i.e., 260/352) yielded a lower p-value than the binary PD-L1 status (log-rank p-value = 0.0025, n = 265). For each model, low-risk and high-risk
[Fig. 2 image: panels A and B, with univariate-association markers for 1-year death (1y-D) and OS]
Fig. 2 | Feature importance ranking for the prediction of overall survival, for clinical and transcriptomic modalities. Feature importance ranking was obtained by aggregating the SHAP values collected from both tasks (OS and 1-year death) and both approaches (linear and tree ensemble methods) (see Methods). Features that were significantly associated with 1-year death (one-sided permutation test with univariate AUCs) after Benjamini-Hochberg (BH) correction (α = 0.05) are shown with a * on the left side, while features that were significantly associated with OS (one-sided permutation test with univariate C-index) after BH correction are annotated with a * on the right side. * corresponds to an adjusted p-value below 0.05. A Consensus feature importance ranking for the clinical data modality (left) and heatmap of correlations between consensus clinical features (right). Correlations were evaluated by Spearman correlation coefficients (for continuous feature vs continuous feature), AUCs rescaled to [−1, 1] (for continuous feature vs binary categorical feature), or Matthews correlation coefficient (for binary categorical feature vs binary categorical feature). B Consensus feature importance ranking for the RNA data modality (left) and heatmap of Spearman correlations between consensus RNA features (right). Source data are provided as a Source Data file.
[Fig. 3 image: panel A, 1-year death (AUC) and OS (C-index); panel B, 6-month progression (AUC) and PFS (C-index)]
Fig. 3 | Performance of all the possible multimodal combinations, with a late fusion strategy and tree ensemble methods. The bar height corresponds to the performance metric (either ROC AUC or C-index) averaged across the 100 cross-validation schemes, and the error bar corresponds to ±1 standard deviation, estimated across the 100 cross-validation schemes. A ROC AUCs associated with the prediction of 1-year death with XGBoost algorithms (top), estimated with n = 77 patients, and C-indexes associated with the prediction of OS with Random Survival Forest algorithms (bottom), estimated with n = 79 patients. B ROC AUCs associated with the prediction of 6-month progression with XGBoost algorithms (top), estimated with n = 75 patients, and C-indexes associated with the prediction of PFS with Random Survival Forest algorithms (bottom), estimated with n = 80 patients. *C: clinical, R: radiomic, P: pathomic, RNA: transcriptomic. Source data are provided as a Source Data file.
group membership was defined by optimizing the cutoff on the training set of each cross-validation fold to maximize the log-rank test statistic and then applying that cutoff to the corresponding test set. For classification tasks, a 0.5 cutoff on the predicted probabilities was also considered (see Methods). The clinical modality alone effectively separated patients into two risk groups, with the predictions of a linear model trained to predict 1-year death yielding the best p-value (log-rank p-value = 1.26e-06, Fig. 8B). For all prediction tasks, between one and seven of the 56 possible multimodal models (i.e., integration strategy + multimodal combination) demonstrated superior risk stratification (as measured by the log-rank test statistic) compared to the clinical models with the lowest log-rank p-values (Fig. 8A and Supplementary Fig. s24). Specifically, a combination of clinical, pathomic, and RNA modalities, trained to predict 1-year death with a tree ensemble algorithm, yielded the best p-value (log-rank p-value = 3.51e-09, Fig. 8B). We further compared the multimodal score resulting from this combination (i.e., the test predictions averaged across the different cross-validation schemes) with the
[Fig. 4 image: A, heatmap of the marginal contributions of the clinical, radiomic, and RNA modalities to the multimodal (C + R + RNA) predictions, grouped into true negatives, false positives, true positives, and false negatives, with clusters 1 and 2; B, per-patient top features (feature value from low to high; increase or decrease in predicted risk); C, pairwise unimodal predictions (clinical vs radiomic, clinical vs RNA, radiomic vs RNA)]
Fig. 4 | Marginal contribution of each modality to the multimodal predictions for late fusion strategy and XGBoost classifiers. A Heatmap of the marginal contribution (i.e., Shapley value) of each modality to the 1-year death prediction using the C + R + RNA late fusion model with XGBoost classifiers. Marginal contributions indicate how each modality influences the prediction relative to a random baseline of 0.5. Patients are stratified based on the multimodal model's final prediction (with a 0.5 threshold), where the positive class corresponds to those who died within 1 year, and the negative class corresponds to those who survived. B For each modality and patient in clusters 1 and 2 (see A), represented by vertical lines, this plot shows the feature with the highest SHAP value that aligns with the modality's marginal contribution. The size of each triangle indicates the absolute SHAP value, while its orientation corresponds to its sign (up for positive values that increase the predicted probability of death within 1 year and down for negative values that decrease it). The color scale represents the associated feature value relative to the whole patient cohort. C Relationship between the unimodal predictions from clinical, radiomic, and RNA modalities (i.e., unimodal tree ensemble models). Each dot is colored according to the patient's true label. *In these plots, all marginal contributions, SHAP values, and predictions were obtained for the 77 patients with complete multimodal profiles and available 1-year death labels across the cross-validation test sets. They were collected for each of the 100 cross-validation schemes (see Methods) and subsequently averaged for each patient. Source data are provided as a Source Data file.
[Fig. 5 image: in each panel, top, best multimodal combination for different integration strategies (A: Early, Late, and DyAM; B: Early and Late); bottom, best unimodal models]
Fig. 5 | Best unimodal and multimodal performances across all the possible combinations of modalities and predictive algorithms. The top barplot displays the performance of the best multimodal combination for each integration strategy, while the bottom barplot shows the performance of the best unimodal algorithm for each data modality. Bar heights and error bars correspond to the mean metric (AUC or C-index) and ±1 standard deviation, respectively, estimated across the 100 cross-validation schemes (except for the dyam_optim models, for which only 10 cross-validation schemes were used due to computational constraints). A Best performance (AUC) for the prediction of 1-year death and 6-month progression (n = 77 for 1-year death and n = 75 for 6-month progression). B Best performance (C-index) for the prediction of OS and PFS (n = 79 for OS and n = 80 for PFS). Source data are provided as a Source Data file.
[Fig. 6 image: panels for 1-year death (AUC) and OS (C-index), with significance annotations]
Fig. 6 | Average performance across all models with 1, 2, 3, and 4 modalities for 1-year death and OS. Markers and error bars correspond to the mean average performance and ±1 standard deviation, respectively, estimated across the 100 cross-validation schemes. The box-and-whisker plots show the three quartiles, with the minimum and maximum as whiskers up to 1.5 × IQR (25–75%). Mean increases are represented with dashed lines and bold annotations. Red annotations correspond to two-sided paired-sample t-test p-values comparing the different numbers of integrated modalities (e.g., 1 modality vs 2 modalities), with n_models = 8 for 1-year death and n_models = 6 for OS. *: 1e-2 < pval ≤ 5e-2, **: 1e-4 < pval ≤ 1e-3, ***: pval ≤ 1e-4. Source data are provided as a Source Data file.
[Fig. 7 image: panel A, 1-year death (AUC) and OS (C-index); panel B, 6-month progression (AUC) and PFS (C-index); best multimodal models labeled late_XGBoost (C + R + RNA), late_RF (C + RNA), dyam_select (C + P + RNA), and early_select_RF (C + P + RNA)]
Fig. 7 | Comparison of the performance of transcriptomic signatures with our best transcriptomic and multimodal models. Comparison of the performance of 36 transcriptomic signatures previously associated with immunotherapy (from the literature) against the best unimodal transcriptomic model and the best multimodal model from our analysis for each prediction task. The bar height corresponds to the performance metric (either ROC AUC or C-index), averaged across 100 cross-validation schemes and estimated for the 80 patients with a complete multimodal profile. The error bar indicates ±1 standard deviation (for signatures without a training step, this standard deviation is zero). Performance metrics were transformed using max(x, 1 − x) to account for signatures with a performance below 0.5. Blue bars represent performances below 0.5 (higher signature values are associated with better prognosis), while red bars represent performances above 0.5 (higher signature values are associated with worse prognosis). A Comparison for 1-year death prediction (n = 77 patients with a complete profile and available 1-year death label) and OS prediction (n = 79 patients with a complete profile and available OS information). B Comparison for 6-month progression prediction (n = 75 patients with a complete profile and available 6-month progression label) and PFS prediction (n = 80 patients with a complete profile and available PFS information). Source data are provided as a Source Data file.
clinical score derived from the linear model described above (Fig. 8B). To do so, we divided the cohort into quartiles based on these two scores and performed Kaplan-Meier analysis for OS within the first year of therapy (Fig. 8C). Both scores yielded a lowest-quartile group (low risk) with a 14% death rate (9/66 patients) within the first year of therapy. However, the multimodal score identified a highest-quartile group (high risk) with a 52% death rate (35/67 patients; 20 treated with immunotherapy + chemotherapy and 15 treated with immunotherapy only), whereas the clinical score yielded a highest-quartile group with a 40% death rate (27/67 patients; 13 treated with immunotherapy + chemotherapy and 14 treated with immunotherapy only).
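The risk-group rule used in this section can be sketched in plain Python. The rule picks a cutoff on each training fold that maximizes the log-rank statistic and then applies that cutoff unchanged to the test fold. This is an illustrative reimplementation based on a textbook two-group log-rank statistic, not the authors' code. The function names and toy data are hypothetical, and in practice a survival library would normally supply the test. Candidate cutoffs are the distinct training scores, excluding the largest so that both groups remain non-empty.

```python
from typing import List, Sequence


def logrank_statistic(times: Sequence[float], events: Sequence[bool],
                      high_risk: Sequence[bool]) -> float:
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    obs_minus_exp, variance = 0.0, 0.0
    for t in event_times:
        at_risk = [g for ti, g in zip(times, high_risk) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)  # at risk overall / in the high-risk group
        d = sum(1 for ti, e in zip(times, events) if e and ti == t)
        d1 = sum(1 for ti, e, g in zip(times, events, high_risk) if e and g and ti == t)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:  # hypergeometric variance with ties correction
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def fit_cutoff(scores: Sequence[float], times: Sequence[float],
               events: Sequence[bool]) -> float:
    """Return the training-set cutoff that maximizes the log-rank statistic."""
    candidates = sorted(set(scores))[:-1]  # keep both groups non-empty
    if not candidates:
        raise ValueError("need at least two distinct scores")
    return max(candidates,
               key=lambda c: logrank_statistic(times, events, [s > c for s in scores]))


def assign_groups(scores: Sequence[float], cutoff: float) -> List[str]:
    """Apply a cutoff learned on the training fold to test-fold scores."""
    return ["high" if s > cutoff else "low" for s in scores]


if __name__ == "__main__":
    # Hypothetical training fold: risk scores, months of follow-up, all deaths observed.
    train_scores = [0.10, 0.20, 0.30, 0.80, 0.90, 0.95]
    train_times = [30, 28, 25, 5, 4, 3]
    train_events = [True] * 6
    cut = fit_cutoff(train_scores, train_times, train_events)
    print(cut, assign_groups([0.50, 0.85, 0.99], cut))
```

The cutoff that maximizes the statistic need not coincide with the visually obvious split. This is one reason the cutoff is learned on training data only, before the test fold is scored.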
Finally, the multimodal score resulting from the combination of clinical, pathomic, and RNA modalities demonstrated a significant association with OS when integrated into a multivariate Cox model along with the clinical features (Fig. 8D). Likelihood-ratio tests indicated a significant effect of this multimodal score compared to a Cox model fitted only with clinical information collected from routine care (p-value = 1.09e-05). Five clinical variables were also significant,
[Fig. 8 image: panels A–D. Recoverable labels: "Minimum log-rank adjusted pvalue for different prediction tasks" and "Distribution of log-rank adjusted pvalues for different prediction tasks" (binary classifiers and survival models); Perceptron and Early XGBoost (prior selection); Cox - OS; transitions "From low-risk group to high-risk group" and "From high-risk group to low-risk group" (+29, +48, +65, and +27 patients); pval = 1.8e-04 and pval = 1.3e-07]
including sex, ecog_1, braf, errb2, and pdl1 (see Supplementary Methods). When comparing the multimodal score with the best unimodal scores (i.e., derived from the top-performing models for 1-year death prediction, Fig. 5A), only the multimodal score (hazard ratio (HR) = 1.35, 95% CI [1.07–1.69], p-value = 1.11e-02) and the clinical score (HR = 1.25, 95% CI [1.02–1.54], p-value = 3.17e-02) demonstrated a significant effect (Fig. 8D).
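The likelihood-ratio comparison above can be reproduced from the partial log-likelihoods of two nested Cox models. The reduced model contains only the routine clinical covariates, and the full model adds the multimodal score. The sketch below assumes both log-likelihoods are already available, for example from the `log_likelihood_` attribute of a fitted lifelines `CoxPHFitter`; the study does not state that it used this library. Because the full model adds a single covariate, the statistic is compared with a chi-square distribution with one degree of freedom, whose survival function has the closed form erfc(√(x/2)). The numbers in the usage line are hypothetical.

```python
import math
from typing import Tuple


def lr_test_one_extra_covariate(loglik_reduced: float,
                                loglik_full: float) -> Tuple[float, float]:
    """Likelihood-ratio test for nested models differing by one parameter."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    if statistic < 0:
        raise ValueError("the full model should not fit worse than the nested one")
    # A chi-square variable with 1 df is the square of a standard normal, so
    # P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical partial log-likelihoods, for illustration only.
    print(lr_test_one_extra_covariate(-812.4, -802.6))
```

For tests with more than one added parameter, a general chi-square survival function (for example from SciPy) would replace the erfc shortcut.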
Discussion
Multimodal approaches for developing accurate biomarkers for the outcome of metastatic NSCLC patients treated with immunotherapy are highly promising but have rarely been explored so far. In this study, we built a new multimodal NSCLC cohort to investigate the benefit of integrative strategies. We extracted interpretable features from clinical data, PET/CT images, digitized pathological slides, and bulk RNA-seq profiles and compared the performance of unimodal and multimodal machine learning models in predicting patient outcomes. We conducted an extensive exploration of different algorithms, integration strategies, and outcome encodings (i.e., binary vs. continuous targets) to highlight consistent trends that remain robust regardless of the specific choices made within the analysis pipeline.
We trained several unimodal models capable of predicting OS, 1-year death, PFS, and 6-month progression using pre-treatment clinical and transcriptomic data. Our design choice to base the entire workflow on interpretable features enabled us to conduct a thorough feature importance analysis. It revealed that clinical models integrated previously established biomarkers into efficient multivariate predictive models7, while RNA models used signatures of the Tumor Micro-Environment (TME)20 as well as the expression of specific oncogenes, which were robust to the biopsy location. Notably, our analysis highlighted the positive impact of a high abundance of dendritic cells (DC), as scored by MCP-counter20, on patient survival. These findings thus provide further evidence of the potential of RNA-seq data to predict immunotherapy response. They corroborate recent studies that have demonstrated the enhanced predictive power of RNA-seq data compared to conventional modalities used in clinical practice, such as mutational data or immunohistochemistry21,22. Radiomics and pathomics, used as standalone predictors, exhibited limited predictive ability in our analysis. Radiomic models predominantly relied on the TMTV to predict OS and 1-year death, confirming its strong predictive value23. Nonetheless, the other aggregated features that we investigated did not increase performance, highlighting the need to explore and design additional radiomic features that could effectively complement TMTV.
The importance of DCs for the prediction of patient outcomes is in line with previous pre-clinical mechanistic studies that highlighted their central role in shaping anti-tumor immunity24,25. In particular, type 1 DCs (DC1) present antigens to CD8+ T cells, and their abundance in the TME has been linked to increased survival and improved response to immunotherapy in both animal models26 and human cancer lesions27. Tertiary lymphoid structures (TLS) are ectopic formations containing high densities of B cells, T cells, and dendritic cells that arise at sites of persistent inflammatory stimulation, including tumors28. Given the accumulating evidence linking the presence of tumor TLS and good prognosis in cancer patients29, we investigated a possible association between DCs and TLS. However, visual inspection of H&E sections and the poor correlation between B cells and DCs in our data did not support this hypothesis. The link between DCs and prognosis/survival observed in the present study can most likely be explained by DCs' ability to capture tumor antigen in the tumor lesion and present it to T cells in draining lymph nodes.
Our study provided further evidence supporting the superiority of multimodal over unimodal approaches to build accurate biomarkers for the outcome of metastatic NSCLC patients treated with immunotherapy. In all prediction tasks related to OS, 1-year death, PFS, and 6-month progression, we identified multiple multimodal combinations that outperformed the unimodal models, including the models relying on standard clinical data. Furthermore, several combinations demonstrated enhanced patient risk stratification for OS, outperforming the best clinical model across the whole cohort. Multivariate Cox models, combined with Kaplan-Meier analyses, underscored the prognostic value that multimodal predictions add beyond routine clinical biomarkers. They further highlighted that multimodal scores could help better identify patients with the most severe prognosis, thereby guiding tailored treatment strategies such as intensified follow-up care or considering chemotherapy even in cases with high PD-L1 expression. No single integration strategy outperformed the others for all prediction tasks, but most of them effectively built multimodal predictors that outperformed the best unimodal predictors. While we confirmed the potential of the DyAM method16, especially with prior feature selection, we found that a much simpler model based on late fusion frequently compared favorably to more complex integration strategies. We hypothesize that the robust performance of the simple late fusion approach is due to its ability to handle missing modalities. Although more complex methods might ultimately yield better results on ideal, large datasets, such ideal scenarios will be quite rare in clinical reality. Overall, our comprehensive benchmark of multiple algorithms and integration strategies highlighted a consistent benefit of combining multiple modalities to predict immunotherapy outcomes, irrespective of specific settings or methodologies.
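The missing-modality argument can be made concrete with a minimal sketch of one simple late fusion rule: an unweighted average of whichever unimodal predictions are available. The sketch pairs this rule with an exact Shapley decomposition over modalities, in the spirit of the marginal contributions shown in Fig. 4, with the empty coalition set to the 0.5 random baseline. The averaging rule, the function names, and the example probabilities are assumptions for illustration. They may differ from the fusion rule actually used in the study.

```python
from itertools import permutations
from math import factorial
from typing import Dict, Optional

BASELINE = 0.5  # prediction of a random classifier, used for the empty coalition


def late_fusion(preds: Dict[str, Optional[float]]) -> float:
    """Average the available unimodal probabilities; None marks a missing modality."""
    available = [p for p in preds.values() if p is not None]
    return sum(available) / len(available) if available else BASELINE


def modality_shapley(preds: Dict[str, float]) -> Dict[str, float]:
    """Exact Shapley value of each modality, enumerating all orderings."""
    names = list(preds)
    totals = {name: 0.0 for name in names}
    for order in permutations(names):
        coalition: Dict[str, Optional[float]] = {name: None for name in names}
        previous = late_fusion(coalition)  # empty coalition -> BASELINE
        for name in order:
            coalition[name] = preds[name]
            current = late_fusion(coalition)
            totals[name] += current - previous
            previous = current
    n_orders = factorial(len(names))
    return {name: total / n_orders for name, total in totals.items()}


if __name__ == "__main__":
    # Hypothetical unimodal probabilities of death within 1 year for one patient.
    unimodal = {"clinical": 0.8, "radiomic": 0.6, "rna": 0.7}
    print(late_fusion(unimodal), modality_shapley(unimodal))
    # A patient without radiomics is still scored from the remaining modalities.
    print(late_fusion({"clinical": 0.8, "radiomic": None, "rna": 0.7}))
```

By the efficiency property of Shapley values, the per-modality contributions sum to the fused prediction minus 0.5. Enumerating all orderings is cheap here because there are at most four modalities.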
Our study has several limitations. First, we dealt with a relatively modest number of patients, many of whom had missing data, which limited the statistical power of our analyses, particularly when testing the association between patient survival under immunotherapy and standard clinical biomarkers (Fig. 1C and Supplementary Fig. s3). Besides, we did not have access to an external validation cohort to assess the reproducibility and robustness of our results. Due to missing modalities, our evaluation was conducted on a subset of 80 patients with a complete multimodal profile to ensure a fair comparison between multimodal combinations. Therefore, the absolute
Fig. 8 | Risk stratification and survival analysis for OS with the predicted multimodal scores. A Comparison of the stratification of the patients into high-risk and low-risk groups for OS, for different predictive tasks, with log-rank p-values (n = 265 patients with the 4 targets available for a fair comparison). Only combinations that include the clinical modality are compared (see Methods). On the left, clinical and multimodal models are compared by showing the lowest log-rank adjusted p-values from all clinical (left) and multimodal (right) models for each prediction task. On the right, the box-and-whisker plots show the three quartiles, with whiskers extending up to 1.5 × IQR (25–75%) to show the range of adjusted p-values. B Kaplan-Meier survival curves (solid lines) with 95% confidence intervals (shaded areas) for the high-risk and low-risk OS groups defined by PD-L1 status (left), the clinical model with the lowest log-rank p-value (middle), and the multimodal model with the lowest log-rank p-value (right). Unlike in A, unadjusted p-values are displayed here. C Kaplan-Meier survival curves (solid lines) with 95% confidence intervals (shaded areas) for OS within the first year of therapy. The cohort is stratified into quartiles based on either the clinical score derived from the clinical perceptron predictions (top) or the multimodal score derived from the predictions of the clinical + pathomics + RNA model (bottom). D Log hazard ratios (points) with 95% confidence intervals (error bars) and likelihood-ratio test p-values associated with multivariate Cox models trained to predict patients' OS (n = 265). Left: Cox model with the clinical + pathomics + RNA score as well as the clinical features collected in this study. Right: Cox model with the clinical + pathomics + RNA score, as well as the best unimodal scores derived from the top-performing unimodal models for 1-year death prediction, with the best clinical score corresponding to the clinical perceptron identified in panels B and C. Source data are provided as a Source Data file.
performance scores highlighted in this study should be interpreted cautiously. Nevertheless, the relative comparison of these scores between different combinations of modalities, as well as in comparison with random predictors and standard clinical biomarkers, all trained and evaluated with the same methodology, remains valuable. It confirms the superiority of multimodal approaches over unimodal models and thus may motivate the collection of new large and multi-centric multimodal NSCLC cohorts. These multi-centric cohorts are needed to address challenges related to missing modalities, particularly RNA, and to further validate our findings. Furthermore, our study did not include advanced NSCLC patients who did not receive immunotherapy, which prevented us from distinguishing between predictive and prognostic biomarkers30. It is likely that some features highlighted in our analyses, particularly clinical and radiomic ones, are more prognostic than predictive, as they have already been associated with patient outcomes for other treatments31,32. Nonetheless, our study identified promising features, especially transcriptomic ones, as well as several combinations of features, whether multimodal or not, that warrant further research to evaluate their predictive value and their association with immunotherapy response. In addition, our data collection pipeline involved time-consuming processes, such as the manual segmentation and annotation of PET/CT scans by nuclear medicine physicians, which could deter the collection of new external cohorts and limit their size. However, deep learning methods for automatically segmenting lesions on PET scans33 or computing surrogate radiomic features34 have recently shown very promising results and may soon be incorporated into the multimodal pipeline to overcome this bottleneck. Finally, despite being one of the most powerful modalities in our analysis, the RNA modality is not yet routinely available in clinical practice in many places, unlike clinical information, PET/CT images, or pathological slides. Its collection involves additional costs and is often affected by the low quality of the remaining tissue samples from the biopsy. However, it could be used as an initial step to identify prognostic mechanisms, which could subsequently be assessed with more cost-effective technologies.
Our multimodal cohort allowed us to demonstrate the ability of clinical, radiological, pathological, and transcriptomic data to inform powerful multimodal predictors for the outcome of patients treated with immunotherapy in metastatic NSCLC. It provided several promising predictors that outperformed both established biomarkers (e.g., PD-L1 expression) and unimodal predictors. They now require refinement and validation in multicentric studies. These results encourage further efforts to gather large multimodal cohorts and explore multimodal biomarkers that could efficiently guide therapeutic decisions.
Methods
This study was approved by the institutional review board at Institut Curie (DATA200053), and informed consent was obtained from all patients through institutional processes. Data were de-identified, collected, and stored in compliance with GDPR.
Clinical data
Baseline clinical data for all 317 patients were collected from the Electronic Medical Record of Institut Curie Hospital using a predefined case report form based on the ESME-AMLC database.
Each patient's response to immunotherapy was assessed through OS and PFS. OS was defined as the duration from the initiation of first-line immunotherapy (with or without chemotherapy) to the patient's death or last available status update. PFS was defined as the duration from the initiation of first-line immunotherapy to the occurrence of the first progression event (including the emergence of new lesions or the progression of pre-existing ones) or last available status update. We also considered binary outcomes, specifically 1-year death (0 for patients who were still alive after one year of immunotherapy and 1 otherwise) and 6-month progression (0 for patients whose disease did not progress after six months of immunotherapy and 1 otherwise). Patients whose OS or PFS was censored before one year or six months, respectively, were excluded from the analyses with binary outcomes.
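The binary encodings above, including the exclusion of patients censored before the horizon, can be written as a small labeling function. This sketch assumes follow-up times in months. It also treats an event occurring exactly at the horizon as an event within the window. Both conventions are assumptions, because the text does not specify them. The function name is hypothetical.

```python
from typing import Optional


def binary_outcome(time_months: float, event: bool, horizon_months: float) -> Optional[int]:
    """Encode a time-to-event outcome as a binary label at a fixed horizon.

    Returns 1 if the event (death or progression) occurred by the horizon,
    0 if the patient was followed event-free up to the horizon, and None if
    the patient was censored before the horizon (excluded from the analysis).
    """
    if event and time_months <= horizon_months:
        return 1
    if time_months >= horizon_months:
        return 0
    return None  # censored before the horizon: the label is unknown


if __name__ == "__main__":
    # 1-year death uses a 12-month horizon; 6-month progression uses 6 months.
    print(binary_outcome(8.0, True, 12), binary_outcome(5.0, False, 12),
          binary_outcome(9.0, True, 6))
```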
We selected 30 baseline clinical features for our predictive models. A detailed list of these features and their definitions is provided in the Supplementary Methods. We applied one-hot encoding to all categorical features.
Radiomic data
Baseline [18F]FDG-PET/CT scans were collected for 201 patients. Two experienced nuclear medicine physicians delineated all tumor foci in all PET scans using LIFEx software v.7.3 (https://www.lifexsoft.org/)35. In addition, they annotated the location of each lesion using an anatomical partition inspired by TNM staging36. For instance, ipsilateral and contralateral lung metastases were distinguished since they are not associated with the same TNM stage. Subsequently, all images were resampled to a fixed 2 × 2 × 2 mm³ voxel size, and the segmented tumor regions were processed by applying a fixed threshold of 2.5 standardized uptake value (SUV) units to exclude voxels with SUV values below this threshold. The SUVmax value, the volume, and the centroid of each resulting tumor region were extracted with the IBSI-compliant PyRadiomics Python package37,38. The SUVmean values from spherical ROIs manually delineated in the healthy regions of the liver and spleen were also extracted for each patient, without prior 2.5 SUV thresholding (liver ROIs: mean volume = 24 cm³, standard deviation = 18 cm³; spleen ROIs: mean volume = 10 cm³, standard deviation = 8 cm³).
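The voxel-level arithmetic implied here can be shown in a short sketch: keep voxels at or above 2.5 SUV, then convert the remaining voxel count to a volume on the fixed 2 × 2 × 2 mm³ grid. It is a plain-Python illustration, not the PyRadiomics extraction used in the study. Several simplifications are assumptions: each lesion is a flat list of voxel SUVs, voxels exactly equal to 2.5 are kept, and the total metabolic tumor volume (TMTV) is the sum of the lesion volumes. The liver and spleen reference ROIs would bypass this thresholding.

```python
from typing import List, Optional, Sequence, Tuple

VOXEL_VOLUME_MM3 = 2.0 * 2.0 * 2.0  # isotropic 2 mm voxels after resampling
SUV_THRESHOLD = 2.5


def lesion_volume_and_suvmax(voxel_suvs: Sequence[float]) -> Tuple[float, Optional[float]]:
    """Volume (mL) and SUVmax of one segmented lesion after SUV thresholding."""
    kept = [suv for suv in voxel_suvs if suv >= SUV_THRESHOLD]
    volume_ml = len(kept) * VOXEL_VOLUME_MM3 / 1000.0  # 1 mL = 1 cm^3 = 1000 mm^3
    return volume_ml, (max(kept) if kept else None)


def total_metabolic_tumor_volume(lesions: List[Sequence[float]]) -> float:
    """Sum of thresholded lesion volumes (mL) over all segmented lesions."""
    return sum(lesion_volume_and_suvmax(lesion)[0] for lesion in lesions)


if __name__ == "__main__":
    # Hypothetical voxel SUVs for two lesions.
    print(lesion_volume_and_suvmax([1.0, 2.5, 3.0, 7.2]))
    print(total_metabolic_tumor_volume([[3.0] * 125, [1.0] * 10]))
```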
All these extracted data were then aggregated into 30 baseline whole-body radiomic features to capture the spread of the metastatic disease as well as its metabolic activity. A detailed list of these features and their definitions is provided in the Supplementary Methods. We considered the SUVmean values of healthy ROIs in the spleen and the liver, the TMTV23, and the number of invaded organs visible on the PET scan, including the lungs, sub- and supra-diaphragmatic lymph nodes, the pleura, the liver, the bones, the adrenal gland, and a final category for other regions. In addition, using the centroids of the processed tumor regions, we calculated the standardized Dmax39 (the largest distance between two lesions, normalized by the body surface area) and the quartile dispersion of the distances between each tumor region's centroid and the global centroid. Finally, for each TNM stage, we took into account all the tumors located in the associated regions and computed the TMTV as well as the mean, standard deviation, and maximum of their SUVmax values. For instance, for the T stage, we considered the primary lung tumor as well as all the ipsilateral lung metastases. In cases where no tumor was present in these regions, the feature values were set to zero. Except for features associated with SUVmax, we excluded lesions corresponding to lymphangitic spread, diffuse pleural metastases, diffuse myocardial metastases, and diffuse subdiaphragmatic metastases because their accurate segmentation was questionable.
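The two centroid-based spread features can be sketched as follows. The Dmax part follows the definition given above: the largest inter-lesion centroid distance divided by body surface area. The text gives no formula for the dispersion, so this sketch assumes the quartile coefficient of dispersion, (Q3 − Q1)/(Q3 + Q1), computed on the distances to an unweighted global centroid. The Supplementary Methods remain the authoritative definition. Coordinates are assumed to be in millimetres and body surface area in m², and all names are hypothetical.

```python
import math
from itertools import combinations
from statistics import quantiles
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def standardized_dmax(centroids: Sequence[Point], bsa_m2: float) -> float:
    """Largest distance between two lesion centroids divided by body surface area."""
    if len(centroids) < 2:
        return 0.0  # assumption: a single lesion has no spread
    dmax = max(math.dist(a, b) for a, b in combinations(centroids, 2))
    return dmax / bsa_m2


def quartile_dispersion(centroids: Sequence[Point]) -> float:
    """Assumed quartile coefficient of dispersion of distances to the global centroid."""
    if len(centroids) < 2:
        return 0.0
    global_centroid = tuple(sum(c[i] for c in centroids) / len(centroids) for i in range(3))
    distances = [math.dist(c, global_centroid) for c in centroids]
    q1, _, q3 = quantiles(distances, n=4)
    return (q3 - q1) / (q3 + q1) if q3 + q1 > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical lesion centroids (mm) and body surface area (m^2).
    lesions = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 12.0)]
    print(standardized_dmax(lesions, 2.0), quartile_dispersion(lesions))
```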
All the features associated with metabolic volume were log-transformed (i.e., log(x + 1)) to deal with right-skewed distributions.
Pathomic data
Baseline pathological slides stained with Hematoxylin-Erythrosine-
Saffron (HES) were collected from the FFPE biopsy blocks of 236
patients and subsequently scanned by the Experimental Pathology
Platform at Institut Curie. From each slide, we segmented all nuclei
with a custom automatic pipeline derived from Lerousseau et al.40,
trained in a weakly supervised setting with publicly available data from
TCGA as well as data sets from Institut Curie. The slides from the
current study were not used to train the segmentation pipeline.
The cell nuclei of six cell types were annotated on each slide: stromal, epithelial, dead, tumor, connective, and inflammatory cells. These annotations were then used in a pathomic approach41
to extract 134 relevant features, characterizing the density, the relative
proportion, or the spatial organization of these different cell types
within the scanned tissue.
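Of the three feature families listed here, density and relative proportion reduce to simple counting once each nucleus carries a cell-type label. The sketch below illustrates only that counting step. Several details are hypothetical: the feature names, the input format, and the use of a single tissue area in mm² as the density denominator. The spatial-organization features of the pathomic approach are not reproduced.

```python
from collections import Counter
from typing import Dict, Iterable

CELL_TYPES = ("stromal", "epithelial", "dead", "tumor", "connective", "inflammatory")


def density_and_proportion(cell_labels: Iterable[str],
                           tissue_area_mm2: float) -> Dict[str, float]:
    """Per-type nucleus density (per mm^2) and relative proportion on one slide."""
    if tissue_area_mm2 <= 0:
        raise ValueError("tissue area must be positive")
    counts = Counter(cell_labels)
    total = sum(counts[t] for t in CELL_TYPES)
    features: Dict[str, float] = {}
    for t in CELL_TYPES:
        features[f"density_{t}"] = counts[t] / tissue_area_mm2
        features[f"proportion_{t}"] = counts[t] / total if total else 0.0
    return features


if __name__ == "__main__":
    # Hypothetical nucleus labels on 2 mm^2 of tissue.
    labels = ["tumor"] * 30 + ["inflammatory"] * 10
    print(density_and_proportion(labels, 2.0))
```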
Transcriptomic data
Residual FFPE biopsy specimens containing sufficient RNA were collected for 134 patients, and RNA sequencing was performed at the Sequencing Core facility of Institut Curie with the Illumina TruSeq RNA Access technology. The RNA-seq data were then processed with the Institut Curie RNA-seq pipeline v4.0.042. The raw bulk RNA-seq read counts were normalized to TPM (Transcripts Per Million) and log-transformed (i.e., log(x + 1)).
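TPM normalization divides each gene's read count by its length in kilobases. It then rescales the values so that they sum to one million within a sample, after which log(x + 1) is applied. A minimal sketch for one sample follows. Several choices are assumptions: the gene lengths, the dictionary-based input, and the natural logarithm (the text does not state the base, and log2 is also common). The Institut Curie pipeline may differ in details such as the use of effective gene lengths.

```python
import math
from typing import Dict


def tpm(counts: Dict[str, float], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Transcripts Per Million for one sample."""
    rate = {g: counts[g] / (lengths_bp[g] / 1000.0) for g in counts}  # reads per kilobase
    per_million = sum(rate.values()) / 1e6
    return {g: r / per_million for g, r in rate.items()}


def log_tpm(counts: Dict[str, float], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """log(TPM + 1) with the natural logarithm (the base is an assumption)."""
    return {g: math.log1p(v) for g, v in tpm(counts, lengths_bp).items()}


if __name__ == "__main__":
    # Hypothetical raw counts and gene lengths (bp) for a two-gene sample.
    print(tpm({"A": 100, "B": 300}, {"A": 1000, "B": 3000}))
```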
The abundance of 8 immune cell populations and 2 stromal cell populations in the Tumor Micro-Environment was estimated using the MCP-counter method20. In addition, log expressions of 22 oncogenes associated with lung cancer were used as features (KRAS, NRAS, EGFR, MET, BRAF, ROS1, ALK, ERBB2, ERBB4, FGFR1, FGFR2, FGFR3, NTRK1, NTRK2, NTRK3, LTK, RET, RIT1, MAP2K1, DDR2, ALK, and CD274). The biopsy site was also considered as a categorical feature and one-hot encoded, distinguishing between lungs, pleura, lymph nodes, bones, liver, adrenal gland, and brain. This information was available for 84 patients. Finally, we used a custom pipeline derived from Jessen et al.43 to estimate the Tumor Mutational Burden (TMB) from the RNA-seq reads mapped to the reference genome hg19 with the STAR aligner (1-pass). The VarDict tool44 was used for variant calling, and several filters were applied to detect somatic variants. This feature, named TMB_RNA, was available for 110 patients.
Genomic data
TMB was estimated for 43 patients using a custom NGS panel of 571
genes called DRAGON (Detection of Relevant Alterations in Genes
involved in Oncogenetics by NGS) and marketed by Agilent under the
name of SureSelect CD Curie CGP. Only non-synonymous alterations (excluding splice site) were considered, with a threshold of 15 mutations per megabase used to distinguish between patients with high and low TMB.
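The TMB dichotomization amounts to a ratio and a threshold. In this sketch, `panel_size_mb` (the panel's sequenced footprint in megabases) is a hypothetical input, since the text does not report it, and treating a value of exactly 15 mutations/Mb as high is an assumption.

```python
def tmb_status(n_nonsynonymous, panel_size_mb, threshold=15.0):
    """Return (TMB in mutations per megabase, "high" or "low").

    n_nonsynonymous: non-synonymous alterations called (splice sites excluded)
    panel_size_mb: sequenced footprint of the panel in megabases (hypothetical)
    """
    tmb = n_nonsynonymous / panel_size_mb
    # Whether exactly 15 mut/Mb counts as "high" is not stated; >= is an assumption.
    return tmb, ("high" if tmb >= threshold else "low")


print(tmb_status(40, 2.0))  # (20.0, 'high')
```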
Unimodal analyses - Tree ensemble methods
The Extreme Gradient-Boosted Trees algorithm19 implemented in the XGBoost v.1.7.6 Python package was used to solve the 1-year death and 6-month progression classification tasks. We used default parameter values except for scale_pos_weight, which controls the balance between positive and negative weights and was set to the ratio of negative to positive labels in the training set. Each XGBoost classifier was calibrated with Platt's logistic model45: predictions were collected with a 10-fold stratified cross-validation scheme on the training set and then used as input for a univariate logistic regression model with balanced class weights. To generate the final calibrated predictions, this logistic model was applied to the raw predictions of the XGBoost classifier.
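The class-balance setting and the Platt calibration step can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: `fit_platt` fits an unpenalized two-parameter logistic model with Newton-Raphson and balanced class weights (scikit-learn's `LogisticRegression`, if that is what was used, applies an L2 penalty by default), and the raw scores below stand in for the out-of-fold XGBoost predictions collected on the training set.

```python
import math


def scale_pos_weight(y_train):
    """Ratio of negative to positive labels in the training set."""
    n_pos = sum(y_train)
    return (len(y_train) - n_pos) / n_pos


def fit_platt(raw_scores, y, n_iter=50):
    """Fit p = sigmoid(a * s + b) with balanced class weights (Newton-Raphson).

    raw_scores: out-of-fold raw classifier outputs collected on the training set
    y: binary labels (0/1)
    """
    n, n_pos = len(y), sum(y)
    # Balanced weights n / (2 * n_class), as in scikit-learn's "balanced" mode.
    w = [n / (2 * n_pos) if yi == 1 else n / (2 * (n - n_pos)) for yi in y]
    a = b = 0.0
    for _ in range(n_iter):
        ga = gb = haa = hab = hbb = 0.0
        for s, yi, wi in zip(raw_scores, y, w):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            r = wi * (yi - p)        # gradient contribution
            h = wi * p * (1.0 - p)   # curvature contribution
            ga += r * s
            gb += r
            haa += h * s * s
            hab += h * s
            hbb += h
        det = haa * hbb - hab * hab
        # Newton step: solve the 2x2 system H @ delta = gradient.
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return a, b


def calibrate(raw_score, a, b):
    """Apply the fitted Platt model to a raw prediction."""
    return 1.0 / (1.0 + math.exp(-(a * raw_score + b)))


scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
labels = [0, 0, 1, 0, 1, 0, 1, 1]
a, b = fit_platt(scores, labels)
probs = [calibrate(s, a, b) for s in scores]
```

With balanced classes, as in this toy example, the weights are all 1 and the calibrated probabilities sum to the number of positives at convergence.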
The Random Survival Forest algorithm18 implemented in the scikit-survival v.0.21.0 Python package was used to solve the survival tasks for OS and PFS. We used default parameter values except for max_depth, which controls the size of the survival trees and was set to 6 to mitigate the risk of overfitting. Unlike XGBoost, the Random Survival Forest algorithm does not handle missing values automatically. Therefore, we applied median imputation for continuous features and most-frequent imputation for categorical features, both fitted to the training set.
Unimodal analyses - linear methods
The Logistic Regression algorithm with elastic net penalty implemented in the scikit-learn v.1.2.2 Python package was used to solve the 1-year death and 6-month progression classification tasks. We used the saga optimizer, a regularization parameter C = 0.1, an L1 ratio of 0.5, a maximum number of iterations of 2500, and balanced class weights.
The Cox proportional hazards algorithm with elastic net penalty implemented in the scikit-survival v.0.21.0 Python package was used to solve the survival tasks for OS and PFS. We used default parameter values except for alpha_min_ratio, which sets the smallest regularization strength of the elastic net path as a fraction of the largest one and was set to 0.01.
In both algorithms, we preprocessed the data by first applying robust scaling, followed by median imputation for continuous features and most-frequent imputation for categorical features. All these operations were fitted to the training set.
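The order of operations described above (robust scaling, then imputation, all fitted on the training set) can be illustrated on single columns. This sketch uses hypothetical helper names, represents missing values as `None`, and computes quartiles with linear interpolation, which is an assumption. One consequence worth noting: after robust scaling, the median fill value for a continuous feature is 0. For the Random Survival Forest, only the imputation part applies.

```python
import statistics
from collections import Counter


def fit_transform_continuous(train_col, test_col):
    """Robust scaling, then median imputation, both fitted on training data.

    Missing values are None. Quartiles use linear interpolation (assumption).
    """
    observed = sorted(v for v in train_col if v is not None)
    q1, median, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    iqr = (q3 - q1) or 1.0  # guard against a zero interquartile range
    train_s = [None if v is None else (v - median) / iqr for v in train_col]
    test_s = [None if v is None else (v - median) / iqr for v in test_col]
    # Imputation value learned on the scaled training data (0 by construction).
    fill = statistics.median(v for v in train_s if v is not None)
    return ([fill if v is None else v for v in train_s],
            [fill if v is None else v for v in test_s])


def fit_transform_categorical(train_col, test_col):
    """Most-frequent imputation fitted on the training column."""
    mode = Counter(v for v in train_col if v is not None).most_common(1)[0][0]
    return ([mode if v is None else v for v in train_col],
            [mode if v is None else v for v in test_col])


train_x = [1.0, 2.0, None, 3.0, 4.0, 100.0]
test_x = [None, 5.0]
train_out, test_out = fit_transform_continuous(train_x, test_x)
sites_train, sites_test = fit_transform_categorical(["lung", "liver", None, "lung"], [None])
```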
Multimodal analyses - late fusion
We used a late fusion strategy to combine every possible subset of
modalities. This analysis was limited to fusions of the same predictive
algorithms with the same parameter values. For instance, for classification tasks, we separately explored the fusion of penalized logistic regression models and the fusion of XGBoost models.
First, we restricted the training set to patients with at least one of
the modalities of the combination of interest available. Then, we
independently trained each unimodal model using the subset of
patients in the training set for whom the associated modality was
available. Finally, for each patient in the test set, we computed the
multimodal prediction by averaging the unimodal predictions for the
available modalities. For survival models, the unimodal predictions
were standardized before averaging, using the mean and standard
deviation values estimated for each modality based on predictions
obtained from a 10-fold cross-validation scheme applied to the training set (stratified with respect to the censorship rate). XGBoost late fusion models were also calibrated using Platt's logistic model45, following the strategy described previously.
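The late fusion rule (average of the available unimodal predictions, with per-modality standardization for survival models) can be sketched as below. The helper names are hypothetical, and using the population standard deviation of the out-of-fold training predictions is an assumption. Patients with no available modality are excluded upstream, so the average is never taken over an empty list.

```python
import statistics


def modality_stats(oof_predictions):
    """Mean and SD of one modality's out-of-fold training predictions."""
    return statistics.fmean(oof_predictions), statistics.pstdev(oof_predictions)


def late_fusion(unimodal_preds, stats=None):
    """Average the unimodal predictions available for one test patient.

    unimodal_preds: dict modality -> prediction, or None if the modality is missing
    stats: survival models only; dict modality -> (mean, sd) from modality_stats
    """
    values = []
    for modality, pred in unimodal_preds.items():
        if pred is None:
            continue  # a missing modality is simply left out of the average
        if stats is not None:
            mean, sd = stats[modality]
            pred = (pred - mean) / sd  # put survival risks on a common scale
        values.append(pred)
    return statistics.fmean(values)


# Classification: average of calibrated probabilities (RNA missing here).
p = late_fusion({"clinical": 0.7, "RNA": None, "PET": 0.5})
# Survival: standardize each modality's risk score before averaging.
stats = {"clinical": modality_stats([0.0, 2.0]), "PET": modality_stats([-0.5, 0.5])}
risk = late_fusion({"clinical": 3.0, "PET": 0.5}, stats)
```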
Multimodal analyses - early fusion
We used an early fusion strategy to combine every possible subset of
data modalities and compared the results with those obtained with the
late fusion strategy. The training set was once again limited to patients
with at least one available modality. We first pre-processed each data
modality separately, considering the subset of patients within the
training set for whom that modality was available. Subsequently, we
concatenated the processed features from all the modalities to form
the input for the predictive model. We used the same models and the
same parameter values as in the unimodal analyses (i.e., XGBoost,
Logistic Regression, Random Survival Forest, and Cox model).
XGBoost early fusion models were calibrated using Platt's logistic model45.
For linear models, missing modalities were handled by replacing
them with zero values. XGBoost did not require special handling for
missing modalities, while for Random Survival Forest, a double-coding strategy was applied, inspired by Engemann et al.46. This approach involved duplicating features and, for patients with missing modalities, assigning very high values to one copy and very low values to the other. This allowed the survival trees to decide on which side of the decision split to place patients with missing modalities.
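The two missing-modality strategies can be contrasted on a single patient row. This sketch assumes features are already pre-processed per modality (so that 0 is a neutral value for linear models after scaling), and the sentinel values `-1e6` and `1e6` used for double coding are hypothetical choices; the paper does not report them.

```python
def fill_missing_linear(features, n_features):
    """Linear models: a missing modality becomes a block of zeros."""
    return [0.0] * n_features if features is None else list(features)


def double_code(features, n_features, low=-1e6, high=1e6):
    """Random Survival Forest: each feature appears twice (double coding).

    Observed values are copied into both blocks. For a missing modality, the
    first block gets a very low value and the second a very high one, so a
    split on either copy can route those patients to either side.
    The sentinel values are hypothetical choices.
    """
    if features is None:
        return [low] * n_features + [high] * n_features
    return list(features) + list(features)


def early_fusion_row(patient, sizes, encode):
    """Concatenate the per-modality feature blocks of one patient."""
    row = []
    for modality, n_features in sizes.items():
        row.extend(encode(patient.get(modality), n_features))
    return row


sizes = {"clinical": 2, "RNA": 3}
patient = {"clinical": [0.5, -1.0]}  # RNA profile missing
linear_row = early_fusion_row(patient, sizes, fill_missing_linear)
forest_row = early_fusion_row(patient, sizes, double_code)
```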
We also explored early fusion with a preliminary feature selection step to maintain a consistent number of features across different multimodal combinations. We first calculated a univariate score for each feature using $|m - 0.5|$, where $m$ corresponds to either the AUC or the C-index computed with the training set. We then ranked all the features from all the modalities accordingly. To reduce redundancy, we filtered out highly correlated features by iterating through the ranked feature list from top to bottom and removing subsequent features with a Pearson correlation exceeding $\rho = 0.7$. Finally, for each modality, we selected the top $\lfloor n_{\mathrm{total}} / n_{\mathrm{modas}} \rfloor$ features from the filtered, ranked list that belonged to that modality, where $n_{\mathrm{total}}$ corresponds to the number of features to keep in the multimodal model (in this analysis, we used $n_{\mathrm{total}} = 40$) and $n_{\mathrm{modas}}$ corresponds to the number of modalities in the multimodal combination of interest. For unimodal combinations, this feature selection step was ignored.
Article https://doi.org/10.1038/s41467-025-55847-5
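The three-stage feature selection can be sketched as follows. Assumptions not stated in the text: the correlation filter uses the absolute Pearson coefficient, features are compared on complete training values for the same patients, and ties in the univariate score keep their input order. The names `select_features` and `pearson` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (non-constant inputs)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_features(metric, modality_of, values, modalities, n_total=40, rho=0.7):
    """Univariate ranking, redundancy filter, then a per-modality quota.

    metric: feature -> training-set AUC or C-index
    modality_of: feature -> modality name
    values: feature -> training values (same patients, no missing values)
    modalities: modalities in the combination of interest
    """
    if len(modalities) == 1:
        # Unimodal combinations skip the selection step entirely.
        return [f for f in metric if modality_of[f] == modalities[0]]
    # 1) Rank all features by |m - 0.5|, i.e. distance from a random predictor.
    ranked = sorted(metric, key=lambda f: abs(metric[f] - 0.5), reverse=True)
    # 2) Walk the list top-down, dropping features too correlated with a kept one.
    kept = []
    for f in ranked:
        if all(abs(pearson(values[f], values[k])) <= rho for k in kept):
            kept.append(f)
    # 3) Keep the top floor(n_total / n_modas) surviving features per modality.
    quota = n_total // len(modalities)
    selected = []
    for modality in modalities:
        selected += [f for f in kept if modality_of[f] == modality][:quota]
    return selected


metric = {"a": 0.80, "b": 0.78, "c": 0.40, "d": 0.55}
modality_of = {"a": "clinical", "b": "clinical", "c": "RNA", "d": "RNA"}
values = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 1, 3, 2], "d": [1, 3, 2, 4]}
chosen = select_features(metric, modality_of, values, ["clinical", "RNA"], n_total=2)
```

Here `b` is dropped as a near-duplicate of `a`, and `d` is dropped because its correlation with `a` is 0.8.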
Multimodal analyses - DyAM model
We used our own implementation of the DyAM model, with PyTorch
v.2.0.1 Python package, to combine every possible subset of data
modalities and compared the results with those obtained with late
fusion and early fusion strategies. We adopted the exact same archi-
tecture as described in Vanguri et al.16, which used single-layer feed-
forward neural networks with a tanh activation function for unimodal
predictions and single-layer feedforward neural networks with a soft-
plus activation function for unimodal attention weights. Furthermore,
similar to ref. 16, we trained our models with a binary cross-entropy
loss with balanced class weights, a learning rate of 0.01, 125 training
epochs, an L2 regularization strength of 0.001, and the Adam
optimizer.
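For intuition, here is a plain-Python forward pass of a DyAM-like model for one patient, with one single-layer tanh predictor and one single-layer softplus attention unit per modality, and masking of missing modalities. The fusion rule (attention weights normalized over the available modalities, fused score passed through a sigmoid for the binary cross-entropy loss) is our assumption about the architecture and should be checked against Vanguri et al. or the deep-multipit repository. Training is not shown, and all names and weights are hypothetical.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def linear(weights, bias, x):
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def dyam_forward(inputs, params):
    """DyAM-like forward pass for one patient (illustrative sketch).

    inputs: modality -> feature list, or None if the modality is missing
    params: modality -> {"pred": (weights, bias), "att": (weights, bias)}
    """
    num = den = 0.0
    for modality, x in inputs.items():
        if x is None:
            continue  # masking: a missing modality receives no attention
        pred = math.tanh(linear(*params[modality]["pred"], x))  # unimodal prediction
        att = softplus(linear(*params[modality]["att"], x))     # unimodal attention
        num += att * pred
        den += att
    fused = num / den  # attention-normalized combination (assumption)
    return 1.0 / (1.0 + math.exp(-fused))  # probability for the BCE loss (assumption)


params = {
    "clinical": {"pred": ([1.0], 0.0), "att": ([0.0], 1.0)},
    "RNA": {"pred": ([-1.0], 0.0), "att": ([0.0], 2.0)},
}
p_clinical_only = dyam_forward({"clinical": [0.5], "RNA": None}, params)
p_both = dyam_forward({"clinical": [0.5], "RNA": [0.3]}, params)
```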
Data were pre-processed with robust scaling as well as median
imputation for continuous features and most-frequent imputation for
categorical features. For pathomic data, we also applied a Principal
Component Analysis (PCA) step with 40 components to reduce the
size of the neural networks. We also applied a preliminary feature
selection step as described previously.
We implemented a nested cross-validation scheme with inner 10-fold stratified cross-validation and a grid-search strategy to optimize the learning rate and the L2 regularization strength for each combination. Due to computational constraints, we limited the number of cross-validation repetitions to 10 for these nested experiments.
Statistical analyses - performance evaluation
All the predictive models were trained and tested using a 10-fold cross-validation scheme applied to the entire cohort. The folds were stratified based on class proportion for classification tasks and on censorship rate for survival tasks. In each fold and for each modality combination, patients with all modalities missing were excluded from the training set. Pre-processing operations, including missing value imputation, scaling, and univariate feature selection, were fitted to each training set and then applied to the corresponding test set to prevent any data leakage (Supplementary Fig. s4). This process was repeated 100 times, with the data shuffled in each iteration. We used the same repeats across all the experiments.
The performance of each model was evaluated using Uno's Concordance Index (C-index)47 for survival tasks and the Area Under the ROC Curve (AUC) for classification tasks. These metrics were computed for each cross-validation scheme, considering only the predictions in the test sets of the 80 patients with a complete multimodal profile to ensure a fair comparison among the different combinations (Supplementary Fig. s4). Subsequently, the metrics were averaged over all 100 repetitions, and their standard deviation was calculated to measure the variability resulting from the random partition of the data into 10 folds.
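The evaluation protocol (repeated stratified 10-fold cross-validation, with the metric computed on pooled test predictions restricted to patients with a complete profile) is sketched below for a classification task. The fold-assignment scheme, the user-supplied `predict_fold` callback, and the use of the sample standard deviation are assumptions. Survival tasks would instead stratify on the censoring indicator and use Uno's C-index rather than the AUC.

```python
import random
import statistics


def stratified_folds(labels, n_folds=10, seed=0):
    """Assign a fold to each patient, spreading each class evenly (sketch)."""
    rng = random.Random(seed)
    fold_of = [0] * len(labels)
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for rank, i in enumerate(idx):
            fold_of[i] = rank % n_folds
    return fold_of


def auc(scores, labels):
    """Mann-Whitney estimate of the ROC AUC (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(predict_fold, labels, complete, n_repeats=100, n_folds=10):
    """Mean and SD of the AUC over repeated CV, scored on complete profiles only.

    predict_fold(train_idx, test_idx) -> predictions for test_idx (user supplied)
    complete: list of booleans, True for patients with every modality available
    """
    keep = [i for i, c in enumerate(complete) if c]
    aucs = []
    for rep in range(n_repeats):
        fold_of = stratified_folds(labels, n_folds, seed=rep)  # new shuffle each repeat
        preds = [None] * len(labels)
        for k in range(n_folds):
            test = [i for i, f in enumerate(fold_of) if f == k]
            train = [i for i, f in enumerate(fold_of) if f != k]
            for i, p in zip(test, predict_fold(train, test)):
                preds[i] = p
        # Pool the test predictions, then score only complete-profile patients.
        aucs.append(auc([preds[i] for i in keep], [labels[i] for i in keep]))
    return statistics.mean(aucs), statistics.stdev(aucs)


labels = [0, 1] * 10
complete = [i % 4 != 0 for i in range(20)]  # pretend 5 patients lack a modality


def oracle(train, test):
    return [float(labels[i]) for i in test]


mean_auc, sd_auc = evaluate(oracle, labels, complete, n_repeats=3)
```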
Statistical analyses - permutation tests
The significance of the results was assessed with one-sided permutation tests, running the pipeline described above 100 times with randomly shuffled outcomes. Permutation tests were also used for univariate predictors, along with bootstrap sampling (10,000 repeats) to compute 95% confidence intervals.
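For a univariate predictor, the one-sided permutation test and the bootstrap interval can be sketched as below. The `(hits + 1) / (n_perm + 1)` estimator, the percentile bootstrap, and discarding resamples that contain a single class are our assumptions; the text does not specify these details.

```python
import random


def auc(scores, labels):
    """Mann-Whitney estimate of the ROC AUC (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_pvalue(scores, labels, n_perm=1000, seed=0):
    """One-sided p-value: share of label shuffles reaching the observed AUC."""
    rng = random.Random(seed)
    observed = auc(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        hits += auc(scores, shuffled) >= observed
    # The +1 terms count the observed labelling as one of the permutations.
    return (hits + 1) / (n_perm + 1)


def bootstrap_ci(scores, labels, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # the AUC needs both classes in the resample
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]


scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
p_value = permutation_pvalue(scores, labels, n_perm=200)
ci = bootstrap_ci(scores, labels, n_boot=200)
```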
To compare the performance of the different multimodal models and test for statistically significant differences, we applied a two-step procedure. First, for each pair of combinations $(i, j)$ and each cross-validation scheme $s$, the superiority of $j$ over $i$ was assessed with a one-sided paired permutation test48, resulting in 100 p-values $\{p^{s}_{ij}\}_{s=1}^{100}$. Subsequently, for each pair of combinations $(i, j)$, these 100 p-values were adjusted using the Benjamini-Hochberg procedure (FDR controlled at level $\alpha = 0.05$), and the frequency of statistically significant tests across the 100 tests was computed. The performance of the different predictive models was estimated on the subset of patients with a complete multimodal profile. Although the paired permutation test described in Bandos et al.48 was originally designed to compare two AUCs, we extended it to the comparison of two C-indexes, since both metrics are C statistics and the C-index can be seen as a generalization of the AUC for censored survival data.
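The second step of the comparison (Benjamini-Hochberg adjustment of the 100 per-scheme p-values, then the frequency of significant tests) is sketched below. The paired permutation test of Bandos et al. itself is not reproduced; the p-values are taken as given, and the function names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up, monotone)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_fraction(pvalues, alpha=0.05):
    """Share of the per-scheme tests that stay significant after BH."""
    adjusted = benjamini_hochberg(pvalues)
    return sum(q < alpha for q in adjusted) / len(pvalues)
```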
Survival analyses
We evaluated the ability of each predictive model, including the models trained to predict PFS or 6-month progression, to stratify patients into two distinct risk groups for overall survival. First, for each fold of each cross-validation scheme, we explored a range of thresholds going from the 30th to the 70th percentiles of the training predictions and selected the one that minimized the log-rank p-value on the training set. For PFS-related models, we focused on the stratification of PFS to find the best threshold, mimicking scenarios where OS is not available during the training process. For classification tasks, the 0.5 threshold was also considered. The learned threshold was then applied to the corresponding test set, assigning patients to a low-risk group or a high-risk group for overall survival. Risk group membership was thus collected for each patient across the test sets of the cross-validation scheme. This resulted in 100 group memberships for each patient, corresponding to the 100 cross-validation schemes. Finally, these 100 assignments were aggregated by calculating the frequency of low-risk and high-risk group assignments for each patient. Patients assigned to the high-risk group in more than 50% of the schemes were assigned to the final high-risk group, while those assigned to it in strictly less than 50% of the schemes were assigned to the final low-risk group. We compared the survival distributions of the final low- and high-risk groups for each predictive model with Kaplan-Meier curves and a log-rank test. The Benjamini-Hochberg procedure was used to control for multiple testing (FDR controlled at the level $\alpha = 0.05$). We focused on the subset of multimodal combinations that included clinical data to work with a sufficiently large cohort (i.e., 265 patients with the 4 targets available for fair comparisons). This analysis thus assessed the risk stratification ability of multimodal models that incorporated other modalities alongside clinical data whenever they were available.
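The threshold search and the majority-vote aggregation can be sketched in plain Python. Assumptions: higher scores mean higher risk, the 30th-70th percentile range is scanned on a grid of 41 linearly interpolated percentiles, and `extra` carries the additional 0.5 cut-off for calibrated classifiers. As in the text, a patient assigned to the high-risk group in exactly 50% of the schemes is left unassigned here.

```python
import math


def logrank_pvalue(times, events, group):
    """Two-group log-rank test (chi-square, 1 df); group holds booleans."""
    o_minus_e = var = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if group[i])
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] and group[i])
        o_minus_e += d1 - d * n1 / n  # observed minus expected in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0
    chi2 = o_minus_e ** 2 / var
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)


def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def best_threshold(train_scores, times, events, n_grid=41, extra=()):
    """Cut-off in the 30th-70th percentile range minimizing the log-rank p."""
    s = sorted(train_scores)
    candidates = [percentile(s, 30 + 40 * k / (n_grid - 1)) for k in range(n_grid)]
    candidates += list(extra)  # e.g. (0.5,) for calibrated classifiers

    def p_at(cut):
        high = [x > cut for x in train_scores]  # higher score = higher risk
        if all(high) or not any(high):
            return 1.0
        return logrank_pvalue(times, events, high)

    return min(candidates, key=p_at)


def final_group(high_risk_votes):
    """Aggregate one patient's per-scheme assignments (True = high risk)."""
    share = sum(high_risk_votes) / len(high_risk_votes)
    if share > 0.5:
        return "high"
    if share < 0.5:
        return "low"
    return None  # exactly 50%: not covered by the rule described in the text


times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
train_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
threshold = best_threshold(train_scores, times, events)
```

For PFS-related models, `times` and `events` would hold PFS data during the search, while the learned cut-off is then applied to the test predictions for OS stratification.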
Finally, we derived a score from each multimodal predictive model by collecting its predictions from the test sets of each cross-validation scheme and averaging them over the 100 repeats. This score was used as input to a multivariate Cox model to predict patients' OS, along with clinical features or unimodal scores. The Cox model was then fitted on the 265 patients with the 4 targets available (i.e., OS, 1-year death, PFS, and 6-month progression) for fair comparison between the models. All the input variables were first standardized to ensure comparable hazard ratios. To address missing clinical values, we used median imputation for continuous clinical features and most-frequent imputation for categorical clinical features. For unimodal scores, missing values were replaced by 0.5 for classification models and 0 for survival models, since all the models were calibrated with nested cross-validation. We used the lifelines v.0.27.4 Python package to fit the Cox models. Likelihood-ratio tests were computed manually from the difference between the log-likelihoods of the two compared models, using a chi-squared test.
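The manual likelihood-ratio test reduces to a chi-square tail probability. The sketch below uses closed-form chi-square survival functions for integer degrees of freedom so that it needs only the standard library. The log-likelihoods would come from the two fitted Cox models (for instance, lifelines' `CoxPHFitter` exposes a `log_likelihood_` attribute), and setting the degrees of freedom to the number of added variables is an assumption, since the text does not state it.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: start from df = 1 and add one closed-form term per extra 2 df.
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for k in range(1, (df + 1) // 2):
        total += term
        term *= x / (2 * k + 1)
    return total


def likelihood_ratio_test(loglik_full, loglik_reduced, df=1):
    """LR statistic 2 * (ll_full - ll_reduced) and its chi-square p-value."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)


# e.g. log-likelihoods of a Cox model with and without one added score
stat, p = likelihood_ratio_test(-100.0, -102.0, df=1)
```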
Feature importance analysis
For each algorithm and modality, we used the permutation explainer provided by the SHAP v.0.42.1 Python package49 to compute the SHAP values for each feature and each patient. SHAP values were computed only when the patient was in a test set of the cross-validation scheme. This resulted in $n_{\mathrm{patients}} \times n_{\mathrm{features}}$ SHAP values for each cross-validation scheme, where $n_{\mathrm{patients}}$ corresponds to the number of patients with the modality of interest available and $n_{\mathrm{features}}$ corresponds to the number of features extracted for this modality. All these
values were subsequently averaged across the 100 cross-validation schemes to produce the final set of $n_{\mathrm{patients}} \times n_{\mathrm{features}}$ mean SHAP values.
A positive SHAP value for a patient $p$ and a feature $f$ means that considering the feature $f$ in the predictive model of interest increases patient $p$'s probability of death or progression for classification tasks and patient $p$'s risk of death or progression for survival tasks. Conversely, a negative SHAP value means that $f$ decreases patient $p$'s probability or risk of death or progression.
For each data modality, we applied a three-step procedure to combine the SHAP values from both related tasks (i.e., OS and 1-year death, or PFS and 6-month progression) and both approaches (i.e., linear and tree ensemble methods) and obtain a consensus ranking of features with respect to their importance in the predictive models for overall survival or progression-free survival. First, we filtered out non-robust features whose impact on the predictions was not consistent across the four predictive models (for radiomics, we did not consider the Logistic Regression model since its AUC was lower than 0.5). To do so, we computed, for each feature and each model, the Spearman correlation between the SHAP values and the values of the feature of interest and then filtered out the features whose correlation sign was not consistent across the four models. Then we ranked the remaining robust features for each of the four models with respect to their absolute SHAP values averaged across all the patients. For each model $i \in \{1, \ldots, 4\}$ and each feature $f$, we thus obtained a rank $r^{i}_{f} \in \{1, \ldots, n_F\}$ (with $n_F$ the number of robust features), with 1 corresponding to the least important feature and $n_F$ to the most important one. Finally, we aggregated all these ranks across the four models to obtain a consensus ranking, taking into account the performance of each model. The consensus rank of the feature $f$ was defined as:
$$r^{\mathrm{cons}}_{f} = \frac{1}{s_1 + s_2 + s_3 + s_4} \sum_{i=1}^{4} s_i \, r^{i}_{f} \tag{1}$$
where $s_i$ corresponds to the score of model $i$ and is equal to $\max(0, \mathrm{score}_i - 0.5)$ (the score is either the AUC or the C-index). The remaining robust features were also tested with univariate permutation tests both for OS and 1-year death (or PFS and 6-month progression): 1000 univariate AUCs or C-indexes were generated with permuted labels and then compared to the original AUC or C-index. Features that remained statistically significant after the Benjamini-Hochberg correction (FDR controlled at the level $\alpha = 0.05$) were reported in the consensus ranking.
The consensus ranks were normalized with respect to the total number of consensus features. Each rank was also assigned a sign that corresponded to the sign of the Spearman correlation coefficient between the SHAP values and the values of the associated feature. A positive sign means that the effect of the feature on the predicted risk or probability of the event increases with the feature value, while a negative sign means that the effect decreases with the feature value. In this context, the term "effect" is linked to the SHAP value and can be either positive (increasing predicted risk) or negative (decreasing predicted risk).
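The consensus ranking of Eq. (1), including the sign-consistency filter and the signed normalization, is sketched below. It omits the univariate significance filter, breaks ties by averaging ranks, and assumes that normalization means dividing by the number of retained features; these choices and the function names are ours.

```python
def ranks(values):
    """1-based ranks, smallest value first, ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def consensus_ranking(shap, values, scores):
    """Signed, normalized consensus ranks following Eq. (1) (sketch).

    shap: one dict per model, feature -> per-patient SHAP values
    values: dict feature -> per-patient feature values (same patient order)
    scores: one AUC or C-index per model
    """
    # Step 1: keep features whose SHAP/value correlation sign agrees across models.
    signs = {f: [spearman(m[f], values[f]) > 0 for m in shap] for f in values}
    robust = [f for f in values if len(set(signs[f])) == 1]
    # Step 2: rank robust features per model by mean |SHAP| (1 = least important).
    weights = [max(0.0, s - 0.5) for s in scores]
    weighted = {f: 0.0 for f in robust}
    for w, model in zip(weights, shap):
        importance = [sum(abs(v) for v in model[f]) / len(model[f]) for f in robust]
        for f, r in zip(robust, ranks(importance)):
            weighted[f] += w * r
    # Step 3: Eq. (1), then divide by n_F and attach the correlation sign.
    n_f = len(robust)
    return {
        f: (1 if signs[f][0] else -1) * weighted[f] / (sum(weights) * n_f)
        for f in robust
    }


values = {"a": [1, 2, 3], "b": [1, 2, 3], "c": [1, 2, 3]}
shap = [
    {"a": [-0.3, 0.0, 0.3], "b": [0.1, 0.0, -0.1], "c": [-0.1, 0.0, 0.2]},
    {"a": [-0.2, 0.1, 0.4], "b": [0.3, 0.2, 0.1], "c": [0.2, 0.1, 0.0]},
]
result = consensus_ranking(shap, values, scores=[0.7, 0.6])
```

In this toy input, feature `c` is discarded because its SHAP/value correlation changes sign between the two models.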
Benchmark of transcriptomic signatures
We identified 36 transcriptomic biomarkers associated with immunotherapy response from a systematic literature search and curation50, encompassing various cancer types and immune checkpoint inhibitors (Supplementary Table s2). They were categorized into three groups: "marker genes" biomarkers that focused on a subset of genes to compute an overall score for each patient (22 biomarkers), "GSEA" biomarkers that applied single-sample gene set enrichment analysis (ssGSEA) to compare sets of marker genes with non-marker genes (10 biomarkers), and "deconvolution" biomarkers that used deconvolution methods to estimate the abundance of different cell populations in each sample (e.g., CD8 T cells) and combined these estimates into a score (4 biomarkers). We implemented all these biomarkers in Python.
For each of the four prediction tasks (i.e., OS, PFS, 1-year death, and 6-month progression), we evaluated the performance of all transcriptomic signatures using the C-index for survival tasks and the AUC for classification tasks. We applied the same 100 cross-validation schemes as in previous experiments, focusing on the subset of 80 patients with a complete multimodal profile. For signatures that included pre-processing steps, such as standardization or PCA, these steps were first fitted on the training set of each fold and then applied to the corresponding test set. We then compared their performance to that of the best multimodal model and the best transcriptomic model previously obtained for each task.
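As an example of fitting a signature's pre-processing on the training fold only, the sketch below implements a hypothetical "marker genes" score (mean z-score of the marker genes) whose per-gene means and standard deviations are learned on the training patients and reused on the test patients. The gene list is purely illustrative and is not one of the 36 benchmarked signatures.

```python
import statistics


def fit_marker_score(train_expr, genes):
    """Learn per-gene mean and SD on the training fold only.

    train_expr: list of dicts gene -> log expression, one per training patient
    genes: marker genes of a (hypothetical) signature
    """
    params = {}
    for gene in genes:
        column = [patient[gene] for patient in train_expr]
        # Fall back to 1.0 for a constant gene to avoid dividing by zero.
        params[gene] = (statistics.fmean(column), statistics.pstdev(column) or 1.0)
    return params


def marker_score(patient, params):
    """Mean z-score of the marker genes, using the training-fold statistics."""
    return statistics.fmean((patient[g] - mean) / sd for g, (mean, sd) in params.items())


train = [{"CD8A": 1.0, "GZMB": 2.0}, {"CD8A": 3.0, "GZMB": 4.0}]
params = fit_marker_score(train, ["CD8A", "GZMB"])
test_score = marker_score({"CD8A": 4.0, "GZMB": 3.0}, params)
```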
Reporting summary
Further information on research design is available in the Nature
Portfolio Reporting Summary linked to this article.
Data availability
The raw data generated in this study, including PET/CT scans, digitized pathological slides, and RNA-seq profiles, are not publicly available due to patient privacy requirements. Curated clinical data are available under restricted access due to patient privacy requirements; access can be obtained upon request to Emmanuel Barillot and Nicolas Girard. Derived transcriptomic, radiomic, and pathomic features, as well as clinical outcomes (OS, PFS, and best observed RECIST response), are available at https://doi.org/10.5281/zenodo.14293431. The results from the experiments performed in this study are provided as a Source Data file. Source data are provided with this paper.
Code availability
We have made all our code available in GitHub repositories with associated documentation allowing for the reproduction of our multimodal analyses with external data, additional modalities, or different features (https://github.com/sysbio-curie/multipit, https://github.com/sysbio-curie/deep-multipit). The Python code to compute the 36 transcriptomic signatures and reproduce the benchmarks presented in Fig. 7 is also available on GitHub (https://github.com/sysbio-curie/tipit_benchmark_RNA). The Python code to reproduce the figures is provided in the Source Data file.
References
1. Hendriks, L. E. et al. Non-oncogene-addicted metastatic non-small-cell lung cancer: ESMO Clinical Practice Guideline for diagnosis, treatment and follow-up. Ann. Oncol. 34, 358–376 (2023).
2. Reck, M. et al. Updated analysis of KEYNOTE-024: Pembrolizumab versus platinum-based chemotherapy for advanced non-small-cell lung cancer with PD-L1 tumor proportion score of 50% or greater. J. Clin. Oncol. 37, 537–546 (2019).
3. Gandhi, L. et al. Pembrolizumab plus chemotherapy in metastatic non-small-cell lung cancer. N. Engl. J. Med. 378, 2078–2092 (2018).
4. Paz-Ares, L. et al. A randomized, placebo-controlled trial of pembrolizumab plus chemotherapy in patients with metastatic squamous NSCLC: Protocol-specified final analysis of KEYNOTE-407. J. Thorac. Oncol. 15, 1657–1669 (2020).
5. Herbst, R. S. et al. Atezolizumab for first-line treatment of PD-L1-selected patients with NSCLC. N. Engl. J. Med. 383, 1328–1339 (2020).
6. Hellmann, M. D. et al. Nivolumab plus Ipilimumab in advanced non-small-cell lung cancer. N. Engl. J. Med. 381, 2020–2031 (2019).
7. Joshi, I. et al. Impact of baseline clinical biomarkers on treatment outcomes in patients with advanced NSCLC receiving first-line pembrolizumab-based therapy. Clin. Lung Cancer 23, 438–445 (2022).
8. Ahn, B. C. et al. Clinical decision support algorithm based on machine learning to assess the clinical response to anti-programmed death-1 therapy in patients with non-small-cell lung cancer. Eur. J. Cancer 153, 179–189 (2021).
9. Trebeschi, S., Drago, S. G., Birkbak, N. J., Kurilova, I., Cǎlin, A. M., Delli Pizzi, A. et al. Predicting response to cancer immunotherapy using noninvasive radiomic biomarkers. Ann. Oncol. 30, 998–1004 (2019).
10. Mu, W. et al. Radiomics of 18F-FDG PET/CT images predicts clinical benefit of advanced NSCLC patients to checkpoint blockade immunotherapy. Eur. J. Nucl. Med. Mol. Imaging 47, 1168–1182 (2020).
11. Deng, J. et al. Genopathomic profiling identifies signatures for immunotherapy response of lung adenocarcinoma via confounder-aware representation learning. iScience 25, https://doi.org/10.1016/j.isci.2022.105382 (2022).
12. Barrera, C. et al. Deep computational image analysis of immune cell niches reveals treatment-specific outcome associations in lung cancer. NPJ Precis. Oncol. 7, 52 (2023).
13. Patil, N. S. et al. Intratumoral plasma cells predict outcomes to PD-L1 blockade in non-small cell lung cancer. Cancer Cell 40, 289–300 (2022).
14. Ravi, A. et al. Genomic and transcriptomic analysis of checkpoint blockade response in advanced non-small cell lung cancer. Nat. Genet. 55, 807–819 (2023).
15. Zhao, Q., Xie, R., Lin, S., You, X. & Weng, X. Anti-PD-1/PD-L1 antibody therapy for pretreated advanced or metastatic nonsmall cell lung carcinomas and the correlation between PD-L1 expression and treatment effectiveness: An update meta-analysis of randomized clinical trials. Biomed. Res. Int. 2018, 3820956 (2018).
16. Vanguri, R. S. et al. Multimodal integration of radiology, pathology and genomics for prediction of response to PD-(L)1 blockade in patients with non-small cell lung cancer. Nat. Cancer 3, 1151–1164 (2022).
17. Grinsztajn, L., Oyallon, E. & Varoquaux, G. Why do tree-based models still outperform deep learning on typical tabular data? Adv. Neural Inf. Process. Syst. 35, 507–520 (2022).
18. Ishwaran, H., Kogalur, U., Blackstone, E. & Lauer, M. Random survival forests. Ann. Appl. Stat. 2, 841–860 (2008).
19. Chen, T. & Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016).
20. Becht, E. et al. Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression. Genome Biol. 17, 218 (2016).
21. Sun, D. et al. Classification of tumor immune microenvironment according to programmed death-ligand 1 expression and immune infiltration predicts response to immunotherapy plus chemotherapy in advanced patients with NSCLC. J. Thorac. Oncol. 18, 869–881 (2023).
22. Li, A. et al. STK11/LKB1-Deficient phenotype rather than mutation diminishes immunotherapy efficacy and represents STING/type I interferon/CD8+ T-cell dysfunction in NSCLC. J. Thorac. Oncol. 18, 1714–1730 (2023).
23. Seban, R. D. et al. Baseline metabolic tumor burden on FDG PET/CT scans predicts outcome in advanced NSCLC patients treated with immune checkpoint inhibitors. Eur. J. Nucl. Med. Mol. Imaging 47, 1147–1157 (2020).
24. Broz, M. L. et al. Dissecting the tumor myeloid compartment reveals rare activating antigen-presenting cells critical for T cell immunity. Cancer Cell 26, 638–652 (2014).
25. Sánchez-Paulete, A. R. et al. Cancer immunotherapy with immunomodulatory anti-CD137 and anti-PD-1 monoclonal antibodies requires BATF3-dependent dendritic cells. Cancer Discov. 6, 71–79 (2016).
26. Salmon, H. et al. Expansion and activation of CD103(+) dendritic cell progenitors at the tumor site enhances tumor responses to therapeutic PD-L1 and BRAF inhibition. Immunity 44, 924–938 (2016).
27. Barry, K. C. et al. A natural killer-dendritic cell axis defines checkpoint therapy-responsive tumor microenvironments. Nat. Med. 24, 1178–1191 (2018).
28. Goc, J. et al. Dendritic cells in tumor-associated tertiary lymphoid structures signal a Th1 cytotoxic immune contexture and license the positive prognostic value of infiltrating CD8+ T cells. Cancer Res. 74, 705–715 (2014).
29. Cabrita, R. et al. Tertiary lymphoid structures improve immunotherapy and survival in melanoma. Nature 577, 561–565 (2020).
30. Ballman, K. Biomarker: Predictive or prognostic? J. Clin. Oncol. 33, 3968–3971 (2015).
31. Stares, M. et al. Hypoalbuminaemia as a prognostic biomarker of first-line treatment resistance in metastatic non-small cell lung cancer. Front. Nutr. 8, 734735 (2021).
32. Teramukai, S. et al. Pretreatment neutrophil count as an independent prognostic factor in advanced non-small-cell lung cancer: an analysis of Japan Multinational Trial Organisation LC00-03. Eur. J. Cancer 45, 1950–1958 (2009).
33. Park, J. et al. Automatic lung cancer segmentation in [18F]FDG PET/CT using a two-stage deep learning approach. Nucl. Med. Mol. Imaging 57, 86–93 (2023).
34. Girum, K. B. et al. 18F-FDG PET maximum-intensity projections and artificial intelligence: A Win-Win combination to easily measure prognostic biomarkers in DLBCL patients. J. Nucl. Med. 63, 1925–1932 (2022).
35. Nioche, C. et al. LIFEx: A freeware for radiomic feature calculation in multimodality imaging to accelerate advances in the characterization of tumor heterogeneity. Cancer Res. 78, 4786–4789 (2018).
36. Goldstraw, P. et al. The IASLC lung cancer staging project: Proposals for revision of the TNM stage groupings in the forthcoming (Eighth) edition of the TNM classification for lung cancer. J. Thorac. Oncol. 11, 39–51 (2016).
37. Zwanenburg, A., Vallières, M., Abdalah, M. A. et al. The Image Biomarker Standardization Initiative: Standardized Quantitative Radiomics for High-Throughput Image-based Phenotyping. Radiology 295, 328–338 (2020).
38. van Griethuysen, J. J. M. et al. Computational radiomics system to decode the radiographic phenotype. Cancer Res. 77, e104–e107 (2017).
39. Cottereau, A. S. et al. Risk stratification in diffuse large B-cell lymphoma using lesion dissemination and metabolic tumor burden calculated from baseline PET/CT. Ann. Oncol. 32, 404–411 (2021).
40. Lerousseau, M. et al. Weakly supervised multiple instance learning histopathological tumor segmentation. Medical Image Computing and Computer Assisted Intervention (2020).
41. Bülow, R. D., Hölscher, D. L., Costa, I. G. & Boor, P. Extending the landscape of omics technologies by pathomics. NPJ Syst. Biol. Appl. 9, 38 (2023).
42. Servant, N., La Rosa, P. & Phupe, A. F. bioinfo-pf-curie/RNA-seq: v4.0.0 (v4.0.0). Zenodo. https://doi.org/10.5281/zenodo.7837455 (2023).
43. Jessen, E., Liu, Y., Davila, J., Kocher, J. P. & Wang, C. Determining mutational burden and signature using RNA-seq from tumor-only samples. BMC Med. Genomics 14, 65 (2021).
44. Lai, Z. et al. VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research. Nucleic Acids Res. 44, e108 (2016).
45. Niculescu-Mizil, A. & Caruana, R. Predicting good probabilities with supervised learning. Proceedings of the 22nd International Conference on Machine Learning (2005).
46. Engemann, D. A. et al. Combining magnetoencephalography with magnetic resonance imaging enhances learning of surrogate-biomarkers. Elife 9, e54055 (2020).
47. Uno, H., Cai, T., Pencina, M. J., D'Agostino, R. B. & Wei, L. J. On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat. Med. 30, 1105–1117 (2011).
48. Bandos, A. I., Rockette, H. E. & Gur, D. A permutation test sensitive to differences in areas for comparing ROC curves from a paired design. Stat. Med. 24, 2873–2893 (2005).
49. Lundberg, S. M. & Lee, S. I. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems 30 (2017).
50. Kang, H. et al. A Comprehensive benchmark of transcriptomic biomarkers for immune checkpoint blockades. Cancers 15, 4094 (2023).
Acknowledgements
We would like to thank the following collaborators at Institut Curie for
their valuable support in data management and processing: A. Nicolas,
R. Goudefroye, C. Martinat, M. Bouvet, and A. Vincent-Salomon from the
experimental pathology platform, L. Chanas, and M. Milder from Institut
Curie's Data Office, T. Ramtohul, and H. Brisse from the Department of
Radiology, S. Baulande from the Next-Generation Sequencing platform,
I. Bièche, and C. Callens from the Diagnostic and Theranostic Medicine
Division, C. Reyes, A. Rapinat, and D. Gentien from the Genomics plat-
form, Eugénie Genestant from the Computational Systems Biology of
Cancer team, and C. Kamoun, N. Servant, and P. Hupé from the bioin-
formatics core facility. We also thank M. Lefevre and S. Lefranc from
Institut Mutualiste Montsouris. This work was part of the TIPIT project
(Towards an Integrative Approach for Precision ImmunoTherapy) funded
by Fondation ARC call «SIGNIT 2020 - Signatures in Immunotherapy».
The present study was also supported by the French government under
the management of Agence Nationale de la Recherche as part of the
Investissements d'avenir program, reference ANR-19-P3IA-0001
(PRAIRIE 3IA Institute).
Author contributions
N.C. processed the collected data, performed the analyses, imple-
mented the computational tools, and wrote the manuscript. M.Le.
designed and developed the feature extraction pipeline for pathological
data and provided input on machine learning analysis. F.O. supervised
the collection of PET scans and clinical data and provided input on
radiomic and machine learning analysis. N.H.-B. processed segmented
PET scans and provided input on radiomic analysis. M.Lu. and E.W.
segmented and annotated PET scans and provided input on radiomic
analysis. S.L. managed data collection and patient recruitment. P.S.F.
collected and curated clinical data. C.L. estimated the TMB from the
RNA-seq data and provided input on omics analysis. C.B. collected
pathological slides and provided input on pathological analysis. A.Z.
provided input on omics and machine learning analysis. H.S. analyzed
pathological slides and provided input on the biological interpretation of
predictive models. T.W. supervised the collection of pathological slides,
pathological analysis, and machine learning analysis. I.B. supervised the
collection of PET scans, radiomic analysis, and machine learning
analysis. N.G. supervised data collection, patient recruitment, and data
analysis. E.B. supervised the collection of omics data, omics analysis,
and machine learning analysis. F.O., A.Z., T.W., I.B., N.G., and E.B.
designed the study. N.G. and E.B. led the project. M.Le., F.O., C.L., H.S.,
T.W., I.B., N.G., and E.B. revised and edited the manuscript. All authors
approved the manuscript.
Competing interests
Nicolas Girard has a consulting or advisory role for the following
companies: Abbvie, AMGEN, AstraZeneca, BeiGene, Bristol-Myers Squibb,
Daiichi Sankyo/AstraZeneca, Gilead Sciences, Ipsen, Janssen, LEO
Pharma, Lilly, MSD, Novartis, Pfizer, Roche, Sanofi, Takeda. The other
authors declare no competing interests.
Additional information
Supplementary information The online version contains
supplementary material available at
https://doi.org/10.1038/s41467-025-55847-5.
Correspondence and requests for materials should be addressed to
Nicolas Captier or Emmanuel Barillot.
Peer review information Nature Communications thanks Sanguk Kim
and the other anonymous reviewer(s) for their contribution to the peer
review of this work. A peer review file is available.
Reprints and permissions information is available at
http://www.nature.com/reprints
Publisher's note Springer Nature remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons
Attribution 4.0 International License, which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as
long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons licence, and indicate if
changes were made. The images or other third party material in this
article are included in the article's Creative Commons licence, unless
indicated otherwise in a credit line to the material. If material is not
included in the article's Creative Commons licence and your intended
use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright
holder. To view a copy of this licence, visit
http://creativecommons.org/licenses/by/4.0/.
© The Author(s) 2025
1 Laboratoire d'Imagerie Translationnelle en Oncologie, Institut Curie, Inserm U1288, PSL Research University, Orsay, France.
2 Bioinformatics and computational systems biology of cancer, Institut Curie, Inserm U900, PSL Research University, Paris, France.
3 CBIO-center for Computational Biology, MINES ParisTech, PSL Research University, Paris, France.
4 Department of medical imaging, Institut Curie, Paris, France.
5 Department of Nuclear Medicine/PET-scan, Institut Jules Bordet, Université Libre de Bruxelles, Brussels, Belgium.
6 Institut du Thorax Curie-Montsouris, Institut Curie, Paris, France.
7 Department of pathology, Institut Curie, Paris, France.
8 In silico R&D, Evotec, Toulouse, France.
9 Immunity and cancer, Institut Curie, Inserm U932, PSL Research University, Paris, France.
10 These authors contributed equally: Thomas Walter, Irène Buvat, Nicolas Girard, Emmanuel Barillot.
e-mail: nicolas.captier@polytechnique.org; emmanuel.barillot@curie.fr
Article https://doi.org/10.1038/s41467-025-55847-5
Nature Communications | (2025) 16:614 19