Nicolas Captier’s research while affiliated with Mines Paris, PSL University and other places


Publications (7)


Survival of NSCLC patients and Venn diagram summarizing the multimodal cohort
A OS and PFS Kaplan-Meier survival curves (solid lines) for the whole NSCLC cohort (n = 311 for OS and n = 316 for PFS) with a 95% confidence interval (shaded areas). Patients are stratified with respect to their first-line therapy, either pembrolizumab alone or pembrolizumab + chemotherapy. Log-rank p-values are reported to characterize the separation of the survival curves. B OS and PFS Kaplan-Meier survival curves (solid lines) with 95% confidence interval (shaded areas) and log-rank p-values for the patients with available PD-L1 expression (n = 295 for OS and n = 300 for PFS). Patients are stratified with respect to their PD-L1 status (positive vs negative). C OS Kaplan-Meier survival curves (solid lines) with 95% confidence interval (shaded areas) and log-rank p-values for the 43 patients with available TMB and the 174 patients with available TILs status. For the TMB, patients are stratified with a threshold of 15 mutations per megabase (see Methods). For TILs, patients are stratified with respect to their positive vs negative TILs status. D Overview of the multimodal cohort with a Venn diagram. The four data modalities and their intersections are represented (i.e., PET/CT images, clinical data, pathological slides, and bulk RNA-seq profiles). Source data are provided as a Source Data file.
Feature importance ranking for the prediction of overall survival, for clinical and transcriptomic modalities
Feature importance ranking was obtained by aggregating the SHAP values collected from both tasks (OS and 1-year death) and both approaches (linear and tree ensemble methods) (see Methods). Features that were significantly associated with 1-year death (one-sided permutation test with univariate AUCs) after Benjamini-Hochberg (BH) correction (α = 0.05) are shown with a * on the left side, while features that were significantly associated with OS (one-sided permutation test with univariate C-index) after BH correction are annotated with a * on the right side. * corresponds to an adjusted p-value below 0.05. A Consensus feature importance ranking for the clinical data modality (left) and heatmap of correlations between consensus clinical features (right). Correlations were evaluated by Spearman correlation coefficients (for continuous feature vs continuous feature), AUCs rescaled to [−1, 1] (for continuous feature vs binary categorical feature), or Matthews correlation coefficient (for binary categorical feature vs binary categorical feature). B Consensus feature importance ranking for the RNA data modality (left) and heatmap of Spearman correlations between consensus RNA features (right). Source data are provided as a Source Data file.
Performance of all the possible multimodal combinations, with a late fusion strategy and tree ensemble methods
The bar height corresponds to the performance metric (either ROC AUC or C-index) averaged across the 100 cross-validation schemes, and the error bar corresponds to ± 1 standard deviation, estimated across the 100 cross-validation schemes. A ROC AUCs associated with the prediction of 1-year death with XGBoost algorithms (top) and estimated with n = 77 patients. C-indexes associated with the prediction of OS with Random Survival Forest algorithms (bottom) and estimated with n = 79 patients. B ROC AUCs associated with the prediction of 6-month progression with XGBoost algorithms (top) and estimated with n = 75 patients. C-indexes associated with the prediction of PFS with Random Survival Forest algorithms (bottom) and estimated with n = 80 patients. * C: clinical, R: radiomic, P: pathomic, RNA: transcriptomic. Source data are provided as a Source Data file.
Marginal contribution of each modality to the multimodal predictions for late fusion strategy and XGBoost classifiers
A Heatmap of the marginal contribution (i.e., Shapley value) of each modality to the 1-year death prediction using the C + R + RNA late fusion model with XGBoost classifiers. Marginal contributions indicate how each modality influences the prediction relative to a random baseline of 0.5. Patients are stratified based on the multimodal model’s final prediction (with a 0.5 threshold), where the positive class corresponds to those who died within 1 year, and the negative class corresponds to those who survived. B For each modality and patient in clusters 1 and 2 (see A), represented by vertical lines, this plot shows the feature with the highest SHAP value that aligns with the modality’s marginal contribution. The size of each triangle indicates the absolute SHAP value, while its orientation corresponds to its sign (up for positive values that increase the predicted probability of death within 1 year and down for negative values that decrease it). The color scale represents the associated feature value relative to the whole patient cohort. C Relationship between the unimodal predictions from clinical, radiomic, and RNA modalities (i.e., unimodal tree ensemble models). Each dot is colored according to the patient’s true label. *In these plots, all marginal contributions, SHAP values, and predictions were obtained for the 77 patients with complete multimodal profiles and available 1-year death labels across the cross-validation test sets. They were collected for each of the 100 cross-validation schemes (see Methods) and subsequently averaged for each patient. Source data are provided as a Source Data file.
Best unimodal and multimodal performances across all the possible combinations of modalities and predictive algorithms
The top barplot displays the performance of the best multimodal combination for each integration strategy, while the bottom barplot shows the performance of the best unimodal algorithm for each data modality. Bar heights and error bars correspond to the mean metric (AUC or C-index) and ± 1 standard deviation, respectively, estimated across the 100 cross-validation schemes (except for the dyam_optim models for which only 10 cross-validation schemes were used, due to computational constraints). A Best performance (AUC) for the prediction of 1-year death and 6-month progression (n = 77 for 1-year death and n = 75 for 6-month progression). B Best performance (C-index) for the prediction of OS and PFS (n = 79 for OS and n = 80 for PFS). Source data are provided as a Source Data file.
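The feature importance caption above marks features whose Benjamini-Hochberg adjusted p-value falls below 0.05. The following is a minimal sketch of that adjustment in plain Python, not the authors' code; the function name and the p-values are hypothetical and serve only as an illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-feature p-values (illustration only, not study results).
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted = benjamini_hochberg(p_values)
# A feature would receive a * when its adjusted p-value is below 0.05.
significant = [q < 0.05 for q in adjusted]
print(adjusted, significant)
```

Note that the third and fourth features have raw p-values below 0.05 but do not survive the correction in this toy example, which is the situation the adjustment is designed to handle.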


Integration of clinical, pathological, radiological, and transcriptomic data improves prediction for first-line immunotherapy outcome in metastatic non-small cell lung cancer
  • Article
  • Full-text available

January 2025

·

75 Reads

·

1 Citation

Nicolas Captier

·

[...]

Immunotherapy is improving the survival of patients with metastatic non-small cell lung cancer (NSCLC), yet reliable biomarkers are needed to identify responders prospectively and optimize patient care. In this study, we explore the benefits of multimodal approaches to predict immunotherapy outcome using multiple machine learning algorithms and integration strategies. We analyze baseline multimodal data from a cohort of 317 metastatic NSCLC patients treated with first-line immunotherapy, including positron emission tomography images, digitized pathological slides, bulk transcriptomic profiles, and clinical information. Testing multiple integration strategies, most of them yield multimodal models surpassing both the best unimodal models and established univariate biomarkers, such as PD-L1 expression. Additionally, several multimodal combinations demonstrate improved patient risk stratification compared to models built with routine clinical features only. Our study thus provides evidence of the superiority of multimodal over unimodal approaches, advocating for the collection of large multimodal NSCLC datasets to develop and validate robust and powerful immunotherapy biomarkers.

Similar performance of 8 machine learning models on 71 censored medical datasets: a case for simplicity

September 2024

·

71 Reads

In the analysis of medical data with censored outcomes, identifying the optimal machine learning pipeline is a challenging task, often requiring extensive preprocessing, feature selection, model testing, and tuning. To investigate the impact of the choice of pipeline on prediction performance, we evaluated 9 machine learning models on 71 medical datasets with censored targets. Only the decision tree model was consistently underperforming, while the other 8 models performed similarly across datasets, with little to no improvement from preprocessing optimization and hyperparameter tuning. Interestingly, more complex models did not outperform simpler ones, and reciprocally. ICARE, a straightforward model univariately learning only the sign of each feature instead of a weight, demonstrated similar performance to other models across most datasets while exhibiting lower overfitting, particularly in high-dimensional datasets. These findings suggest that using the ICARE model to build signatures between centers could improve reproducibility. Our findings also challenge the traditional approach of extensive model testing and tuning to improve performance.
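The idea of learning only a sign per feature, rather than a weight, can be illustrated with a short sketch. This is not the ICARE package: the rule used to pick each sign (a concordance count against censored survival times), the z-scoring, and every function name below are assumptions made for illustration.

```python
from statistics import mean, pstdev


def feature_sign(values, times, events):
    """Return +1 if higher values tend to go with earlier events, else -1."""
    concordant = discordant = 0
    n = len(values)
    for i in range(n):
        for j in range(n):
            # A pair is usable only if patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                if values[i] > values[j]:
                    concordant += 1
                elif values[i] < values[j]:
                    discordant += 1
    return 1 if concordant >= discordant else -1


def fit(X, times, events):
    """Learn one sign per feature, plus the mean and SD used for z-scoring."""
    columns = list(zip(*X))
    signs = [feature_sign(col, times, events) for col in columns]
    stats = [(mean(col), pstdev(col) or 1.0) for col in columns]
    return signs, stats


def risk_scores(X, signs, stats):
    """Average the signed z-scores of each patient (higher means higher risk)."""
    return [
        mean(s * (x - mu) / sd for x, s, (mu, sd) in zip(row, signs, stats))
        for row in X
    ]


# Hypothetical toy cohort: 4 patients, 2 features (illustration only).
X = [[5.0, 1.0], [4.0, 2.0], [3.0, 2.5], [1.0, 4.0]]
times = [2, 4, 6, 9]    # follow-up times
events = [1, 1, 0, 1]   # 1 = event observed, 0 = censored
signs, stats = fit(X, times, events)
scores = risk_scores(X, signs, stats)
print(signs, scores)
```

Because each feature contributes only a direction, the model has no per-feature magnitude to overfit, which is consistent with the lower overfitting reported in high-dimensional datasets.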


Integration of clinical, pathological, radiological, and transcriptomic data improves the prediction of first-line immunotherapy outcome in metastatic non-small cell lung cancer

June 2024

·

57 Reads

The survival of patients with metastatic non-small cell lung cancer (NSCLC) has been increasing with immunotherapy, yet efficient biomarkers are still needed to optimize patient care. In this study, we explored the benefits of multimodal approaches to predict immunotherapy outcome using multiple machine learning algorithms and integration strategies. We leveraged a novel multimodal cohort of 317 metastatic NSCLC patients treated with first-line immunotherapy, collecting at baseline positron emission tomography images, digitized pathological slides, bulk transcriptomic profiles, and clinical information. Most integration strategies investigated yielded multimodal models surpassing both the best unimodal models and established univariate biomarkers, such as PD-L1 expression. Additionally, several multimodal combinations demonstrated improved patient risk stratification compared to models built with routine clinical features only. Our study thus provided new evidence of the superiority of multimodal over unimodal approaches, advocating for the collection of large multimodal NSCLC cohorts to develop and validate robust and powerful immunotherapy biomarkers.


RadShap: An Explanation Tool for Highlighting the Contributions of Multiple Regions of Interest to the Prediction of Radiomic Models

June 2024

·

47 Reads

·

1 Citation

Journal of Nuclear Medicine

Explaining the decisions made by a radiomic model is of significant interest, as it can provide valuable insights into the information learned by complex models and foster trust in well-performing ones, thereby facilitating their clinical adoption. Promising radiomic approaches that aggregate information from multiple regions within an image currently lack suitable explanation tools that could identify the regions that most significantly influence their decisions. Here we present a model- and modality-agnostic tool (RadShap, https://github.com/ncaptier/radshap), based on Shapley values, that explains the predictions of multiregion radiomic models by highlighting the contribution of each individual region. Methods: The explanation tool leverages Shapley values to distribute the aggregative radiomic model's output among all the regions of interest of an image, highlighting their individual contribution. RadShap was validated using a retrospective cohort of 130 patients with advanced non-small cell lung cancer undergoing first-line immunotherapy. Their baseline PET scans were used to build 1,000 synthetic tasks to evaluate the degree of alignment between the tool's explanations and our data generation process. RadShap's potential was then illustrated through 2 real case studies by aggregating information from all segmented tumors: the prediction of the progression-free survival of the non-small cell lung cancer patients and the classification of the histologic tumor subtype. Results: RadShap demonstrated strong alignment with the ground truth, with a median frequency of 94% for consistently explained predictions in the synthetic tasks. In both real-case studies, the aggregative models yielded superior performance to the single-lesion models (average [±SD] time-dependent area under the receiver operating characteristic curve was 0.66 ± 0.02 for the aggregative survival model vs. 0.55 ± 0.04 for the primary tumor survival model). 
The tool's explanations provided relevant insights into the behavior of the aggregative models, highlighting that for the classification of the histologic subtype, the aggregative model used information beyond the biopsy site to correctly classify patients who were initially misclassified by a model focusing only on the biopsied tumor. Conclusion: RadShap aligned with ground truth explanations and provided valuable insights into radiomic models' behaviors. It is implemented as a user-friendly Python package with documentation and tutorials, facilitating its smooth integration into radiomic pipelines.
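Distributing an aggregative model's output among regions of interest with Shapley values can be sketched as follows. This is an illustrative exact computation in plain Python, not the RadShap API: `toy_model`, its output for an empty set of regions, and the lesion values are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(regions, model):
    """Exact Shapley value of each region for a model that scores a list of regions."""
    n = len(regions)
    values = [0.0] * n
    for i in range(n):
        others = [k for k in range(n) if k != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [regions[k] for k in subset]
                with_i = without_i + [regions[i]]
                values[i] += weight * (model(with_i) - model(without_i))
    return values


def toy_model(regions):
    """Hypothetical aggregative model: highest uptake plus a total-volume term."""
    if not regions:
        return 0.0  # assumed output when no region is available
    return max(r["suv_max"] for r in regions) + 0.1 * sum(r["volume"] for r in regions)


# Hypothetical lesions of one patient (illustration only).
lesions = [
    {"suv_max": 8.0, "volume": 10.0},
    {"suv_max": 5.0, "volume": 20.0},
    {"suv_max": 3.0, "volume": 5.0},
]
contributions = shapley_values(lesions, toy_model)
print(contributions)
```

The contributions sum to the difference between the model's output on all lesions and its output on none, so each lesion's share can be read directly. Exact enumeration grows exponentially with the number of regions, so a sampling approximation would be needed for patients with many lesions.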


Promising Candidate Prognostic Biomarkers in [18F]FDG PET Images: Evaluation in Independent Cohorts of Non–Small Cell Lung Cancer Patients

March 2024

·

44 Reads

·

4 Citations

Journal of Nuclear Medicine

The normalized distances from the hot spot of radiotracer uptake (SUVmax) to the tumor centroid (NHOC) and to the tumor perimeter (NHOP) have recently been suggested as novel PET features reflecting tumor aggressiveness. These biomarkers characterizing the shift of SUVmax toward the lesion edge during tumor progression have been shown to be prognostic factors in breast and non-small cell lung cancer (NSCLC) patients. We assessed the impact of imaging parameters on NHOC and NHOP, their complementarity to conventional PET features, and their prognostic value for advanced-NSCLC patients. Methods: This retrospective study investigated baseline [18F]FDG PET scans: cohort 1 included 99 NSCLC patients with no treatment-related inclusion criteria (robustness study); cohort 2 included 244 NSCLC patients (survival analysis) treated with targeted therapy (93), immunotherapy (63), or immunochemotherapy (88). Although 98% of patients had metastases, radiomic features including SUVs were extracted from the primary tumor only. NHOCs and NHOPs were computed using 2 approaches: the normalized distance from the localization of SUVmax or SUVpeak to the tumor centroid or perimeter. Bland-Altman analyses were performed to investigate the impact of both spatial resolution (comparing PET images with and without gaussian postfiltering) and image sampling (comparing 2 voxel sizes) on feature values. The correlation of NHOCs and NHOPs with other features was studied using Spearman correlation coefficients (r). The ability of NHOCs and NHOPs to predict overall survival (OS) was estimated using the Kaplan-Meier method. Results: In cohort 1, NHOC and NHOP features were more robust to image filtering and to resampling than were SUVs. The correlations were weak between NHOCs and NHOPs (r ≤ 0.45) and between NHOCs or NHOPs and any other radiomic features (r ≤ 0.60). In cohort 2, the patients with short OS demonstrated higher NHOCs and lower NHOPs than those with long OS. 
NHOCs significantly distinguished 2 survival profiles in patients treated with immunotherapy (log-rank test, P < 0.01), whereas NHOPs stratified patients regarding OS in the targeted therapy (P = 0.02) and immunotherapy (P < 0.01) subcohorts. Conclusion: Our findings suggest that even in advanced NSCLC patients, NHOC and NHOP features pertaining to the primary tumor have prognostic potential. Moreover, these features appeared to be robust with respect to imaging protocol parameters and complementary to other radiomic features and are now available in LIFEx software to be independently tested by others.
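A normalized hot-spot-to-centroid distance can be sketched on a list of tumor voxels. The abstract does not state the normalization, so this sketch assumes the distance is divided by the radius of a sphere with the same volume as the tumor; that choice, the function names, and the toy tumor are assumptions, and this is not the LIFEx implementation.

```python
from math import dist, pi


def nhoc(voxels, voxel_volume_mm3):
    """Distance from the SUVmax voxel to the tumor centroid, divided by an
    equivalent-sphere radius (assumed normalization).

    `voxels` is a list of ((x, y, z) in mm, SUV) pairs.
    """
    coords = [position for position, _ in voxels]
    centroid = tuple(sum(c[k] for c in coords) / len(coords) for k in range(3))
    hot_spot = max(voxels, key=lambda v: v[1])[0]
    volume = len(voxels) * voxel_volume_mm3
    equivalent_radius = (3 * volume / (4 * pi)) ** (1 / 3)
    return dist(hot_spot, centroid) / equivalent_radius


def make_cube(hot_spot):
    """Hypothetical 3 x 3 x 3 tumor with 2 mm voxels and one hot voxel."""
    axis = (0.0, 2.0, 4.0)
    return [
        ((x, y, z), 10.0 if (x, y, z) == hot_spot else 2.0)
        for x in axis for y in axis for z in axis
    ]


central = nhoc(make_cube((2.0, 2.0, 2.0)), voxel_volume_mm3=8.0)
peripheral = nhoc(make_cube((0.0, 0.0, 0.0)), voxel_volume_mm3=8.0)
print(central, peripheral)
```

In this toy tumor a hot spot at the center gives a value of zero, and moving it to a corner increases the value, which matches the described shift of SUVmax toward the lesion edge.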


Models including pathological and radiomic features vs clinical models in predicting outcome of patients with metastatic non-small cell lung cancer treated with immunotherapy.

June 2023

·

37 Reads

Journal of Clinical Oncology

e21164 Background: Overall survival of patients with metastatic non-small cell lung cancer (NSCLC) has increased with the use of anti-PD-1 immune checkpoint inhibitors. However, the duration of response remains highly variable between patients, and only 20-30% of patients are alive at 2 years. Thus, new biomarkers for predicting response to treatment and patient outcomes are still needed to guide therapeutic decisions. In this study, we retrospectively investigated multimodal approaches that might improve the limited predictive power of clinical data. Methods: We studied a cohort of 317 patients with metastatic NSCLC treated with first-line immune checkpoint inhibitors alone or combined with platinum-based chemotherapy. Clinical data were collected for all patients; pathological slides (HES and PD-L1 staining) and baseline 18-FDG PET/CT scans were available in 237 and 130 patients, respectively. An automatic cell type detection algorithm was applied to each pathological slide and pathomic features were extracted from the resulting annotations. After semi-automated segmentation of all tumor foci in the PET/CT scans, radiomic features were calculated for each tumor lesion and aggregated across all the lesions of each patient. Prognostic models were built using random forest and XGBoost classifiers to predict patient survival at 12 months based on 1) features from single modalities (clinical, pathomic, or radiomic), or 2) features from multiple modalities, where early fusion and late fusion strategies were investigated. The models were trained and tested with cross-validation and their performances were established using the area under the ROC curve (AUC) computed on the same 88 test patients for whom all the modalities were available. Results: Unimodal strategies yielded AUCs of 0.62 ± 0.08 (1 std), 0.64 ± 0.07, and 0.59 ± 0.08 for clinical, radiomic and pathomic features, respectively. 
With late fusion, bimodal models consistently outperformed the clinical model, with the combination of radiomic and clinical features giving the best performance (AUC = 0.67 ± 0.07). The trimodal model outperformed all other modality combinations with an AUC of 0.69 ± 0.07; in particular, it was significantly superior to the clinical model (p-value < 0.001, paired t-test). The early fusion experiments confirmed the superiority of every bimodal approach over the clinical model. However, the trimodal model did not outperform the best bimodal model with early fusion. Validation will be performed on independent cohorts from external centers. Conclusions: Our study highlighted the potential of multimodal approaches for predicting the outcome of metastatic NSCLC patients treated with immunotherapy. Models integrating medical images and pathological slides usually collected from routine care outperformed a model trained on clinical data alone.
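Late fusion combines the outputs of models trained separately on each modality, and the result is then scored with the AUC. The sketch below assumes a plain average of the unimodal probabilities as the fusion rule, which the abstract does not specify; the probabilities and labels are hypothetical toy values, not study data.

```python
def late_fusion(*unimodal_probs):
    """Average per-patient probabilities across modalities (assumed fusion rule)."""
    return [sum(p) / len(p) for p in zip(*unimodal_probs)]


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical test-set outputs for 6 patients (1 = died within 12 months).
labels = [1, 1, 0, 0, 1, 0]
clinical = [0.8, 0.4, 0.5, 0.2, 0.6, 0.3]
radiomic = [0.6, 0.7, 0.3, 0.5, 0.4, 0.2]
pathomic = [0.5, 0.6, 0.4, 0.6, 0.7, 0.1]

fused = late_fusion(clinical, radiomic, pathomic)
for name, probs in [("clinical", clinical), ("radiomic", radiomic),
                    ("pathomic", pathomic), ("late fusion", fused)]:
    print(f"{name}: AUC = {roc_auc(labels, probs):.3f}")
```

Early fusion would instead concatenate the features of all modalities and train a single model on the combined table. The toy values here were chosen so that each modality misranks different patients, which is the situation in which averaging helps.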


BIODICA: a computational environment for Independent Component Analysis

April 2022

·

200 Reads

·

9 Citations

Bioinformatics

We developed BIODICA, an integrated computational environment for application of Independent Component Analysis (ICA) to bulk and single-cell molecular profiles, interpretation of the results in terms of biological functions and correlation with metadata. The computational core is the novel Python package stabilized-ica, which provides an interface to several ICA algorithms, a stabilization procedure, meta-analysis and component interpretation tools. BIODICA is equipped with a user-friendly graphical user interface, allowing non-experienced users to perform ICA-based omics data analysis. The results are provided in interactive ways, thus facilitating communication with biology experts. Availability and Implementation: BIODICA is implemented in Java, Python and JavaScript. The source code is freely available on GitHub under the MIT and the GNU LGPL licenses. BIODICA is supported on all major operating systems. URL: https://sysbio-curie.github.io/biodica-environment/

Citations (2)


... 12 Based on this result, the normalized distances from the hot spot of radiotracer uptake (SUVmax) to the tumor centroid (NHOC) and the tumor perimeter (NHOP) have been introduced as novel geometric [18F]FDG PET/CT parameters that could reflect tumor aggressiveness. 12,13 In previous studies, both NHOC and NHOP have been found to have significant correlations with survival outcomes in patients with lung and breast cancers. [12][13][14] Considering the significant relationship between tumor aggressiveness and clinical outcomes in HNSCC patients treated with CCRT, 6,9,10 it is hypothesized that NHOC and NHOP of primary HNSCC tumors could have significant associations with treatment response and survival outcomes in HNSCC patients. ...

Reference:

Prognostic significance of normalized distance from maximum standardized uptake value to tumor centroid on [18F]FDG PET/CT in head and neck squamous cell carcinoma
Promising Candidate Prognostic Biomarkers in [18F]FDG PET Images: Evaluation in Independent Cohorts of Non–Small Cell Lung Cancer Patients
  • Citing Article
  • March 2024

Journal of Nuclear Medicine

... We use the stabilized ICA which is a Python implementation of the Icasso algorithm. 33 It provides consistent ICs that are ordered based on the stability computed from the compactness of the estimates obtained from multiple runs of FastICA with random initializations. 33,34 The mixing matrix contains the weights of the ICs and is used as a feature matrix for clustering. ...

BIODICA: a computational environment for Independent Component Analysis

Bioinformatics