RESEARCH ARTICLE
Machine learning-derived peripheral blood transcriptomic
biomarkers for early lung cancer diagnosis: Unveiling
tumor-immune interaction mechanisms
Xiaohua Li¹ | Xuebing Li² | Jiangyue Qin³ | Lei Lei⁴ | Hua Guo¹ | Xi Zheng⁵ | Xuefeng Zeng¹
¹ Department of Respiratory and Critical Care Medicine, Sixth People's Hospital of Chengdu, Chengdu, Sichuan, China
² Department of Respiratory and Critical Care Medicine, People's Hospital of Yaan, Yaan, Sichuan, China
³ Department of General Practice, West China Hospital, Sichuan University, Chengdu, Sichuan, China
⁴ Department of Oncology, Sixth People's Hospital of Chengdu, Chengdu, Sichuan, China
⁵ Department of Thoracic Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan, China
Correspondence
Xiaohua Li, Department of Respiratory
and Critical Care Medicine, Sixth People's
Hospital of Chengdu, No. 16 Jianshe
South St, Chenghua District, Chengdu,
Sichuan, China.
Email: lxh_hjcd@163.com
Xuefeng Zeng, Department of Respiratory
and Critical Care Medicine, Sixth People's
Hospital of Chengdu, No. 16 Jianshe
South St, Chenghua District, Chengdu,
Sichuan, China.
Email: zxfeng1616@163.com
Funding information
National Natural Science Foundation of
China, Grant/Award Number: 81830001;
Natural Science Foundation of Sichuan
Province, Grant/Award Number:
2023NSFSC1890; Medical Science and
Technology Project of Sichuan Provincial
Health Commission, Grant/Award
Number: 21PJ153
Abstract
Lung cancer continues to be the leading cause of cancer-related mortality worldwide. Early detection and a comprehensive understanding of tumor-immune interactions are crucial for improving patient outcomes. This study aimed to develop a novel biomarker panel utilizing peripheral blood transcriptomics and machine learning algorithms for early lung cancer diagnosis, while simultaneously providing insights into tumor-immune crosstalk mechanisms. Leveraging a training cohort (GSE135304), we employed multiple machine learning algorithms to formulate a Lung Cancer Diagnostic Score (LCDS) based on peripheral blood transcriptomic features. The LCDS model's performance was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC) in multiple validation cohorts (GSE42834, GSE157086, and an in-house dataset). Peripheral blood samples were obtained from 20 lung cancer patients and 10 healthy control subjects, representing an in-house cohort recruited at the Sixth People's Hospital of Chengdu. We employed advanced bioinformatics techniques to explore tumor-immune interactions through comprehensive immune infiltration and pathway enrichment analyses. Initial screening identified 844 differentially expressed genes, which were subsequently refined to 87 genes using the Boruta feature selection algorithm. The random forest (RF) algorithm demonstrated the highest accuracy in constructing the LCDS model, yielding a mean AUC of 0.938. Lower LCDS values were significantly associated with elevated immune scores and increased CD4+ and CD8+ T-cell infiltration, indicative of enhanced antitumor-immune responses. Higher LCDS scores correlated with activation of hypoxia, peroxisome proliferator-activated receptor (PPAR), and Toll-like receptor (TLR) signaling pathways, as well as reduced DNA damage repair pathway scores. Our study presents a novel, machine learning-derived peripheral blood transcriptomic biomarker panel with potential applications in early lung cancer diagnosis. The LCDS model not only demonstrates high accuracy in distinguishing lung cancer patients from healthy individuals but also offers valuable insights into tumor-immune interactions and underlying cancer biology. This approach may facilitate early lung cancer detection and contribute to a deeper understanding of the molecular and cellular mechanisms underlying tumor-immune crosstalk. Furthermore, our findings on the relationship between LCDS and immune infiltration patterns may have implications for future research on therapeutic strategies targeting the immune system in lung cancer.
KEYWORDS
diagnosis, lung cancer, lung cancer diagnostic score, machine learning, tumor-immune interaction
Xiaohua Li, Xuebing Li, and Jiangyue Qin contributed equally to this work.
Received: 7 August 2024 | Accepted: 30 September 2024
DOI: 10.1002/biof.2129
This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
© 2024 The Author(s). BioFactors published by Wiley Periodicals LLC on behalf of International Union of Biochemistry and Molecular Biology.
BioFactors. 2025;51:e2129. wileyonlinelibrary.com/journal/biof
https://doi.org/10.1002/biof.2129
1 | INTRODUCTION
Lung cancer has the highest mortality rate among malignant tumors,[1] which underscores the importance of screening, early diagnosis, and timely treatment as effective measures to reduce mortality.[2] Although great progress has been made in the treatment of lung cancer, the prognosis of patients with advanced lung cancer remains poor.[3,4] Studies have shown that the median overall survival (mOS) of patients with stage I/II lung cancer is approximately 57 months, whereas the mOS of patients with stage III/IV lung cancer is only 7 months.[3,4] Consequently, early diagnosis is a pivotal strategy for improving survival rates and prognosis among lung cancer patients.[5–7]
Currently, the gold standard for lung cancer diagnosis is histopathological examination, but this approach is invasive, has poor patient compliance, and cannot facilitate early diagnosis of lung cancer.[8] Advances in cancer screening technology, especially the improved resolution of low-dose computed tomography (LDCT), have led to the identification of numerous lung nodules each year.[9,10] However, the increased false-positive rate and risk of overdiagnosis have resulted in many LDCT-screened patients with lung nodules undergoing unnecessary surgical procedures, thereby increasing physical and psychological burdens.[11] Given these limitations, LDCT is not recommended for widespread early lung cancer screening but should be reserved for high-risk individuals.[12] Hence, there is an urgent need for more precise and noninvasive prediction models for early-stage lung cancer to advance precision medicine in lung cancer management.
As an emerging predictive tool, noninvasive and convenient prediction models play a valuable role in early warning and auxiliary diagnosis, real-time monitoring of therapeutic efficacy, guidance of medication, exploration of drug resistance mechanisms, and assessment of prognosis in lung cancer, among other clinical applications.[9,13] With the rapid development of high-throughput sequencing technology, an increasing number of novel diagnostic predictors (including genomic, microbiomic, immunological, and imaging features) have been introduced as model variables, improving the accuracy and sensitivity of prediction models for early disease diagnosis.[14–16] Recently, several studies have demonstrated the role of machine learning in the development of early lung cancer prediction models.[17–19] Duan et al.[20] focused on four biomarkers, namely the promoter methylation levels of the p16, RASSF1A, and FHIT genes and the relative telomere length, and applied Fisher discriminant analysis and a back-propagation (BP) neural network to the auxiliary diagnosis of lung cancer. Yu et al.[21] combined CT imaging data with machine learning methods to diagnose lung cancer and determine pathological stage.
Peripheral blood tumor biomarkers serve as important clinical tools for cancer screening, offering the advantages of noninvasiveness and the ability to monitor dynamic changes.[22,23] However, existing serum tumor markers are mostly protein markers, including carcinoembryonic antigen (CEA), neuron-specific enolase (NSE), and cytokeratin 19 fragment (CYFRA21-1);[24,25] they have limited sensitivity and are primarily suited to adjunctive diagnosis. For example,[26–28] CEA is a non-organ-specific tumor-associated antigen with an AUC of approximately 0.67–0.70 in lung cancer diagnosis.[26,27] NSE is elevated only in 10%–20% of small-cell lung cancer (SCLC) patients and a small proportion of patients with benign lung diseases, and its diagnostic sensitivity for SCLC is approximately 50%–80%.[28] Recent advances have been made in combining machine learning with other peripheral blood biomarkers to predict the presence or absence of early-stage cancer.[29,30] However, machine learning prediction models incorporating transcriptomic markers from peripheral blood remain largely unexplored in early lung cancer diagnosis.
In this study, we developed the Lung Cancer Diagnostic Score (LCDS), a predictive model based on machine learning algorithms, to enable precise diagnosis of early-stage lung cancer from transcriptomic features of peripheral blood. Additionally, we combined multiomics data to explore potential molecular mechanisms underlying lung cancer development in relation to LCDS. The findings not only facilitate early lung cancer prediction but may also promote precision treatment for patients with lung cancer.
2 | METHODS
2.1 | Data sources
We downloaded expression and clinical data of peripheral blood samples from healthy individuals and lung cancer (LC) patients in GSE135304[31] (LC: N = 303, normal: N = 284), GSE42834[32] (LC: N = 16, normal: N = 126), and GSE157086 (LC: N = 2, normal: N = 3) from the Gene Expression Omnibus (GEO) database. The probe information for the expression data in GSE135304 and GSE42834 was obtained from GPL10558. Additionally, we collected peripheral blood samples from 10 healthy individuals and 20 lung cancer patients at the Sixth People's Hospital of Chengdu (in-house cohort). This study was approved by the Ethics Committee of the Sixth People's Hospital of Chengdu (No. 2022-Research projects-001), and written informed consent was obtained from all patients. The clinical baseline characteristics of the patients in the GSE135304, GSE42834, and in-house cohorts are detailed in Tables S1–S3.
2.2 | Collection of specimens
We recruited 10 healthy individuals and 20 lung cancer patients and collected fresh peripheral blood from each enrolled individual before clinical treatment. The study was reviewed and approved by the Ethics Committee of the Sixth People's Hospital of Chengdu and carried out in accordance with the World Medical Association Declaration of Helsinki Ethical Principles for Medical Research. All subjects provided written informed consent.
2.3 | Generation and normalization of RNA-sequencing data
We extracted total RNA from peripheral blood. Libraries were prepared from samples that passed quality control with a NEBNext® Ultra™ RNA Library Prep Kit (New England Biolabs, Ipswich, MA, USA) and sequenced on the Illumina HiSeq 4000 platform. Paired-end reads (150 bp) were mapped to the human reference genome (GRCh38). Transcript abundances were summarized at the gene level with tximport and normalized as transcripts per million (TPM).
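As a rough illustration of the TPM normalization named above, the following sketch converts gene-level read counts into TPM in plain Python. It is a simplified stand-in and not the tximport workflow used in the study: the function name `tpm`, the toy counts, and the gene lengths are hypothetical, and a real pipeline would typically use effective transcript lengths.

```python
def tpm(counts, lengths_bp):
    """Convert per-gene read counts to transcripts per million (TPM).

    counts and lengths_bp are parallel lists for one sample
    (hypothetical inputs, for illustration only).
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    # Reads per kilobase: divide each count by the gene length in kb.
    rpk = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total = sum(rpk)
    if total == 0:
        return [0.0] * len(rpk)
    # Rescale so that the values within the sample sum to one million.
    return [r / total * 1e6 for r in rpk]


# Made-up toy sample: three genes with arbitrary counts and lengths (bp).
example = tpm([100, 200, 300], [1000, 2000, 1500])
```

Because every sample is rescaled to the same total, TPM values are comparable across samples in terms of relative abundance, which is one reason they are a common input for downstream scoring.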
2.4 | LCDS model construction
GSE135304 was used as the training cohort, and GSE42834, GSE157086, and the in-house data were used as the external validation cohorts. In the training set GSE135304, we combined a univariable logistic regression model (false discovery rate [FDR] < 0.05) with receiver operating characteristic (ROC) analysis (AUC > 0.6) to screen 844 lung cancer-related genes. To identify the features that contributed most as model predictors and to reduce overfitting, Boruta feature selection was applied to reduce the dimensionality of the 844 genes.[33] Finally, based on the 87 lung cancer-related genes retained after Boruta feature selection (Table S4), 97 machine learning algorithm combinations were used for LCDS model construction (Table S5).
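The two-part screening rule described above (FDR < 0.05 and AUC > 0.6) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the per-gene p values are assumed to come from univariable logistic regression fitted elsewhere, the FDR is assumed to be the Benjamini-Hochberg adjustment, and the function names (`auc`, `benjamini_hochberg`, `screen_genes`) are hypothetical. The AUC here assumes that higher expression indicates cancer, so a downregulated gene would need its sign flipped first.

```python
def auc(scores_pos, scores_neg):
    """AUC as the probability that a random positive sample scores
    higher than a random negative sample (ties count as 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def screen_genes(pvalues, aucs, fdr_cutoff=0.05, auc_cutoff=0.6):
    """Indices of genes passing both the FDR and the AUC criterion."""
    fdr = benjamini_hochberg(pvalues)
    return [i for i in range(len(pvalues))
            if fdr[i] < fdr_cutoff and aucs[i] > auc_cutoff]


# Made-up p values and AUCs for four hypothetical genes.
kept = screen_genes([0.01, 0.04, 0.03, 0.5], [0.9, 0.9, 0.55, 0.7])
```

In this toy input only the first gene passes both filters; the second has a high AUC but an adjusted p value just above 0.05.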
2.5 | Relationship between LCDS and lung cancer development
For all samples in the GSE42834 and in-house cohorts, pathway scores were computed from the expression data with the “ssgsea” method in the “GSVA” R package, using the GO-BP, GO-CC, GO-MF, KEGG, Reactome, and Hallmark gene sets from the Molecular Signatures Database (MSigDB).[34,35] Additionally, the xCell algorithm was used to assess the immune cell infiltration score for each peripheral blood sample.[36] Expression data and survival data for patients with lung cancer were used to validate the prognostic value of the LCDS model (Table S6).[37–41]
2.6 | Statistical analysis
We used univariable logistic regression to screen for genes associated with the development of lung cancer, with a screening criterion of FDR < 0.05.[42] ROC curves, generated mainly with the “pROC” R package, were used to assess the sensitivity and specificity of genes associated with the development of lung cancer (AUC > 0.6).[43] Additionally, the Boruta algorithm and random forest (RF) algorithm were used to further screen the characteristic genes.[33,44] Spearman's correlation analysis was used to quantify the correlation between two continuous variables, and the correlation coefficients were expressed as R values. The Kaplan–Meier (KM) method and the “survminer” R package were used to plot survival curves, and groups were compared with the log-rank test.[45] X-tile was used to determine the optimal cutoff values for grouping in the KM analysis.[46] Decision curve analysis (DCA) was used to evaluate the clinical decision-making value of the predictive models.[47] We used the “ggplot2” R package to generate the figures in this study.[48] All statistical analyses and graph visualization were performed in R (version 4.1). Two-sided p values < 0.05 were considered statistically significant.
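Decision curve analysis compares strategies by their net benefit at a chosen threshold probability p_t, commonly defined as TP/n - (FP/n) × p_t/(1 - p_t). The sketch below computes that quantity for a model and for the "treat all" reference strategy. It is an illustration only: the function names and the toy labels and probabilities are hypothetical, and the study's own DCA was run in R.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of calling a sample positive when y_prob >= threshold.

    y_true holds 0/1 labels, y_prob the predicted probabilities.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: every sample is called positive."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)


# Made-up labels and predicted probabilities for four samples.
labels = [1, 1, 0, 0]
probs = [0.9, 0.6, 0.7, 0.2]
nb_model = net_benefit(labels, probs, 0.5)
nb_all = net_benefit_treat_all(labels, 0.5)
```

A decision curve plots these values over a range of thresholds; a model is useful at a given threshold when its net benefit exceeds both the "treat all" line and the "treat none" line, which is zero by definition.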
3 | RESULTS
3.1 | Construction of the LCDS model
Based on the training set GSE135304, we first used univariable logistic regression to screen for genes associated with the development of lung cancer (cutoff: FDR < 0.05). Next, ROC curve analysis was used to assess how well these genes distinguished lung cancer patients from healthy individuals, and genes with AUC values > 0.6 were retained. The overall analysis workflow of this study is shown in Figure 1. A total of 844 genes associated with lung cancer were identified. We then used Boruta feature selection to further screen the 844 genes, yielding 87 lung cancer-related genes, which were incorporated into the lung cancer prediction model to obtain LCDS (Figure 2A; Table S4). The correlations between the expression levels of these 87 genes in the training set GSE135304 are shown in Figure 2B. Expression levels of the 87 genes in peripheral blood differed to some extent between lung cancer patients and healthy individuals (Figure 2C).
3.2 | Evaluation of the LCDS model
To validate the predictive ability of LCDS for lung cancer patients, we calculated the AUC values of 97 machine learning algorithm combinations based on the 87 lung cancer-related genes in three external validation cohorts (GSE42834, GSE157086, and in-house) (Figure 3A, Table S6). The mean AUC across the three external validation cohorts was taken as the measure of accuracy for each algorithm combination. By this measure, the model built on the 87 genes with the RF algorithm was the most accurate, with a mean AUC of 0.938 (Figure 3A). The AUC values of the RF model were 1 and 0.985 in the training set GSE135304 and the in-house validation set, respectively (Figure 3B,C). Next, we evaluated the performance of LCDS in early-stage lung cancer patients. In both GSE135304 (training dataset) and the in-house cohort (validation dataset), LCDS showed high sensitivity and specificity for distinguishing stage I/II lung cancer patients from normal controls (AUC = 1.0, sensitivity = 1.0, specificity = 1.0; Figure 3D,E). DCA demonstrated the net benefit of the LCDS model for predicting lung cancer (Figure 3F: GSE135304; Figure 3G: in-house). In both the training set GSE135304 and the in-house validation set, LCDS showed a more statistically significant difference between lung cancer patients and healthy individuals than any single gene among the 87 lung cancer-related genes (Figure 3H: GSE135304; Figure 3I: in-house).
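The selection rule used here, picking the algorithm with the highest mean AUC across the validation cohorts, amounts to a simple unweighted average. The sketch below shows the arithmetic with made-up AUC values for two hypothetical models; the names `model_a`, `model_b`, and `best_model` are illustrative and the numbers are not taken from the study.

```python
def best_model(auc_by_model):
    """Return the model name with the highest unweighted mean AUC,
    together with the mean AUC of every model.

    auc_by_model maps a model name to a list of per-cohort AUC values.
    """
    means = {name: sum(values) / len(values)
             for name, values in auc_by_model.items()}
    best = max(means, key=means.get)
    return best, means


# Made-up per-cohort AUCs for two hypothetical models, three cohorts each.
toy_aucs = {
    "model_a": [0.90, 1.00, 0.80],
    "model_b": [0.95, 0.50, 0.85],
}
winner, mean_aucs = best_model(toy_aucs)
```

An unweighted mean gives every cohort the same influence regardless of its size, so a very small validation cohort can shift the ranking as much as a large one. A sample-size-weighted mean would be one alternative design, though it is not what the text describes.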
3.3 | Relationship between LCDS and lung carcinogenesis
To further explore the possible relationship between LCDS and lung carcinogenesis, we analyzed the relationship between LCDS and immune cells and signaling pathways. We found that lower LCDS values were associated with higher immune scores and greater infiltration of activated immune cells, including CD4+ Tcm cells, CD8+ Tcm cells, CD4+ T cells, CD8+ T cells, CD4+ memory T cells, and CD4+ naïve T cells (Figure 4A: in-house cohort; Figure 4B: GSE42834; p < 0.05, R < 0). Higher LCDS values were significantly associated with macrophage infiltration (Figure 4A,B; p < 0.05, R > 0). These results suggest that individuals in the low LCDS group have a more favorable antitumor immune microenvironment than individuals in the high LCDS group. Moreover, individuals in the high LCDS group showed stronger activity of pathways that drive cancer development and progression, including the hypoxia pathway, peroxisome proliferator-activated receptor (PPAR) pathway, and Toll-like receptor (TLR) pathway (Figure 5A: in-house; Figure 5B: GSE42834; p < 0.05, R > 0). In addition, individuals in the high LCDS group exhibited a weaker capacity for cellular response to DNA damage repair (Figure 5A: in-house; Figure 5B: GSE42834; p < 0.05, R < 0).
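The R values reported above are Spearman correlation coefficients, which are Pearson correlations computed on ranks. A minimal plain-Python sketch is given below, with tied values assigned their average rank. The function names and the toy vectors are hypothetical, and the study itself computed these statistics in R.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors.

    Assumes neither input is constant (otherwise the variance is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Made-up scores: the second vector falls as the first rises.
r_neg = spearman([1, 2, 3, 4], [10, 9, 7, 2])
```

A negative coefficient, as in this toy case, corresponds to the R < 0 pattern in which one score decreases as the other increases.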
We further extended the application of LCDS to the assessment of clinical prognosis in lung cancer patients. By evaluating the effect of LCDS on survival in 142 lung cancer patients treated with immune checkpoint inhibitors (ICIs) (Ravi et al.[37]), we found that patients in the low LCDS group had significantly longer overall survival (OS) and progression-free survival (PFS) than those in the high LCDS group (Figure 5C; OS: hazard ratio [HR] = 0.56, 95% confidence interval [CI]: 0.33–0.96, log-rank p = 0.033; PFS: HR = 0.62, 95% CI: 0.40–0.97, log-rank p = 0.034). Additionally, lung cancer patients in the low LCDS group in four further cohorts (GSE41271, GSE47115, GSE73403, and GSE101929) had significantly better prognosis than those in the high LCDS group (Figure 5C; log-rank p < 0.05, HR < 1).
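The survival comparisons above rest on Kaplan–Meier curves. As a hedged illustration of how such a curve is estimated, the sketch below computes the product-limit estimate for a single group in plain Python. The function name `kaplan_meier` and the toy follow-up data are hypothetical; the study used R packages for this step, and the log-rank test is not reproduced here.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate for one group.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set.
        at_risk -= removed
    return curve


# Made-up follow-up data: the patient at time 2 is censored.
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Censored patients do not lower the curve themselves, but they shrink the risk set, so later events produce larger drops.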
4 | DISCUSSION
Lung cancer is a multifaceted disease characterized by intricate regulatory mechanisms involving malignant cells, immune cells, stromal cells, and aberrant signaling pathways within a complex ecosystem.[49,50] Therefore, it is challenging to describe the molecular environment of the disease accurately with a single marker. Machine learning offers a potent means of processing massive amounts of high-dimensional data, enabling precise diagnosis and prediction of diseases based on correlations among data features.[51] In this study, we demonstrated the feasibility of a machine learning predictive model (LCDS) based on gene expression data from peripheral blood for early lung cancer diagnosis. The LCDS model performed well, with a mean AUC of 0.938 across three validation sets (GSE42834, GSE157086, and the in-house cohort). LCDS was also related to lung cancer mechanisms: the lower LCDS group showed a more favorable antitumor immune microenvironment (IME) and reduced activity of signaling pathways that contribute to tumorigenesis (Figure 6). Furthermore, we explored the relationship between LCDS and clinical prognosis, which revealed significantly better outcomes among lung cancer patients in the lower LCDS group.
FIGURE 1 Flowchart of construction of the lung cancer diagnostic score (LCDS). The flowchart shows the overall study design and methods used to develop and validate LCDS, including obtaining lung cancer and normal lung tissue gene expression data, feature selection using the Boruta algorithm on 87 lung cancer-related genes, selecting the optimal random forest model, and validating the model on independent datasets.
[Figure 1 flowchart contents. Data collection: in-house cohort (RNA-seq; tumor = 20, healthy individuals = 10); public cohorts for model development (training: GSE135304, T = 303, H = 284; validation: GSE42834, T = 16, H = 126, and GSE157086, T = 2, H = 3); public cohorts for survival analysis (Ravi, n = 142; GSE41271, n = 274; GSE47115, n = 44; GSE73403, n = 69; GSE101929, n = 66). Model development: logistic regression and AUC > 0.6; Boruta feature selection; 87 genes for training; integrated machine learning (GBM, Lasso, Enet, Ridge, LDA, Random Forest, SVM, plsRglm, xgboost, glmboost, Stepglm); selection of the best model with the highest average AUC in the validation set, giving the Lung Cancer Diagnostic Score (LCDS). Further evaluation: survival analysis (low versus high risk score), immune analysis, and pathway enrichment analysis.]
FIGURE 2 Screening and selection of 87 lung cancer-related genes. (A) Variable importance ranking by the Boruta algorithm showing the 87 selected genes. (B) Correlation heat map of the 87 gene expression values in the GSE135304 dataset. (C) Heat map comparing differential expression of the 87 genes in GSE135304. Red indicates upregulated genes, and blue indicates downregulated genes.
FIGURE 3 A lung cancer diagnostic score (LCDS) was developed and validated via the machine learning-based integrative procedure. (A) Area under the receiver operating characteristic (ROC) curve (AUC) of 97 machine learning models validated on the GSE157086, GSE42834, and in-house cohort (all validation datasets). (B) ROC curve of LCDS model on training cohort GSE135304 (Lung cancer vs. Healthy individuals). (C) ROC curve of LCDS model on independent in-house validation cohort (Lung cancer vs. Healthy individuals). (D) ROC curve of LCDS model on training cohort GSE135304 (Stage I/II Lung cancer vs. Healthy individuals). (E) ROC curve of LCDS model on independent in-house validation cohort (Stage I/II Lung cancer vs. Healthy individuals). (F) Decision curve analysis (DCA) plots showing the clinical utility of LCDS on GSE135304. (G) DCA plots showing the clinical utility of LCDS on the in-house cohort. (H) Top 87 nonzero feature coefficients from the final random forest (RF) model on the GSE135304. (I) Top 87 nonzero feature coefficients from the final RF model on the in-house cohort.
[Figure 3 panel labels. (A) Heat map of AUC by algorithm combination, with columns for the GSE157086, GSE42834, and in-house cohorts and an annotation for mean AUC. (F, G) Net benefit versus threshold probability for the "All", "None", and LCDS strategies. Panels are labeled "Training" and "In-house".]
FIGURE 4 Correlations between lung cancer diagnostic score (LCDS) and immune cell infiltration scores. (A) The correlation between LCDS and scores of immune cells was estimated by the xCell method using GSE135304. (B) The correlation between LCDS and scores of immune cells estimated by the xCell method using the in-house cohort.
FIGURE 5 Correlations between Lung Cancer Diagnostic Score (LCDS) and pathway activity scores. Spearman correlation coefficients between LCDS and pathway activity scores by the single sample gene enrichment analysis (ssGSEA) method using (A) GSE135304 and (B) in-house cohorts. (C) Kaplan–Meier plots of high vs. low LCDS groups showing association with overall survival (OS) and progression-free survival (PFS) in lung cancer patients, as reported by Ravi et al.,[37] GSE41271, GSE47115, GSE73403, and GSE101929 (log-rank p < 0.05, hazard ratio [HR] < 1).
An enhanced antitumor immune microenvironment likely underlies the reduced risk of lung cancer in the low LCDS group. Immune cells exert selective pressure on tumor development through interactions with tumor cells. Tumor immunogenicity is modified by the immune system in three successive stages: elimination, equilibrium, and escape.[52–54] The immune system, especially cellular immunity featuring CD8+ T cells with recognized antitumor effects and CD4+ T cells involved in the clearance of cancerous tissue,[55] serves as a pivotal antitumor mechanism. Studies have shown that the gradual transition from immune activation to immunosuppression plays a crucial role in lung cancer progression, manifested by decreased T-cell clones (CD4+ T cells and CD8+ T cells) and increased regulatory T cells (Tregs).[56] Additionally, proinflammatory monocyte-derived macrophages infiltrate significantly in patients with early-stage lung cancer.[57] Sinjab et al. found that CD8+ T cells and the inflammatory signature were significantly reduced in early-stage lung cancer tissues and adjacent normal tissues.[58] In this study, we found that individuals in the higher LCDS group had significantly fewer CD4+ T cells and CD8+ T cells and significantly more macrophages than those in the lower LCDS group.
Elevated pathway activity related to cancer progres-
sion may characterize individuals in the high LCDS
group at the molecular level. Hypoxia, as a potent selec-
tive stress, plays a moderate role in tumor cell invasion,
metastasis and angiogenesis.
59,60
Hypoxia impacts intra-
cellular mitochondrial gene expression
61,62
and induces
metabolic adaptations in cancer cells.
63
It also has impor-
tant effects on drug delivery, DNA repair, regulation of
drug resistance-related genes, the cell cycle and cell
death-related pathways, ultimately promoting malignant
tumorigenesis.
64–66
PPAR-γis highly expressed in non-
small cell lung cancer (NSCLC) and correlates significantly
CD4+T cell
CD8+T cell
Macrophage
hypoxia pathway
PPAR pathway
TLRs pathway
DDR pathway
Lung Cancer Peripheral Blood
LCDS
High versus Low
FIGURE 6 Proposed molecular mechanisms linking Lung Cancer Diagnostic Score (LCDS) to lung cancer development. This schematic
summarizes the potential mechanisms supported by the observed correlations between LCDS, immune infiltration, and pathway activities.
LCDS correlates positively with immunosuppressive tumor-associated macrophage (TAM) infiltration, hypoxia, peroxisome proliferator-
activated receptor (PPAR), and Toll-like receptor (TLR) pathway activity. LCDS correlates negatively with CD8+T-cell and CD4+T-cell
infiltration.
10 of 14 LI ET AL.
with tumor histological type, pathological differentiation and
clinical stage.
67–69
In tumor cells, TLRs contribute to cancer
development by promoting inflammation, cell proliferation,
cell survival and immunosuppression in various ways.
70,71
In
addition, aberrantly activatedTLRscanupregulatenuclear
factor κB(NF-κB) activity, inhibit JNK-mediated proapopto-
tic signaling, and ultimately create a tumor-friendly microen-
vironment.
71,72
Choi et al. found that high TLR expression
was associated with poor prognosis in cancer patients.
73
Abnormalities in the DNA damage repair system may also
contribute to development of tumors,
74
such as colorectal
cancer (CRC), endometrial cancer, ovarian cancer, and gas-
tric cancer.
75
In this study, we found that individuals with
higher LCDS had significantly increased activation scores in
hypoxia, PPAR, and TLR pathways, along with a signifi-
cantly lower DNA damage repair response.
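The high-versus-low LCDS comparison described above can be illustrated with a minimal sketch. It assumes a median split of LCDS and a two-sided Mann-Whitney U test with a normal approximation. Neither the cutoff nor the statistical test is specified in this section, the function names are hypothetical, and the numbers are toy values rather than study data.

```python
import math
from statistics import median


def split_by_median(lcds, pathway_scores):
    """Split pathway scores into high- and low-LCDS groups at the median LCDS.

    The median cutoff is an assumption made for illustration only.
    """
    cutoff = median(lcds)
    high = [s for x, s in zip(lcds, pathway_scores) if x > cutoff]
    low = [s for x, s in zip(lcds, pathway_scores) if x <= cutoff]
    return high, low


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(high, low):
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction).

    Returns the U statistic of the high group and an approximate p-value.
    """
    n1, n2 = len(high), len(low)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    ranks = average_ranks(list(high) + list(low))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))
    return u1, p


# Toy example: hypothetical LCDS values and hypoxia pathway scores.
toy_lcds = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_hypoxia = [1, 2, 3, 4, 5, 6, 7, 8]
high_group, low_group = split_by_median(toy_lcds, toy_hypoxia)
print(mann_whitney_u(high_group, low_group))
```

With small groups or many ties, an exact test or a tie-corrected variance would be more appropriate than this approximation.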
However, our study has several limitations.
First, because datasets containing peripheral blood
samples from both lung cancer patients and healthy individuals
are scarce, our analysis relied on three publicly available datasets
(GSE135304, GSE42834, and GSE157086).
Second, our LCDS model served primarily as a general screening
tool for lung cancer and did not differentiate
between histological subtypes of lung cancer
(such as LUAD and LUSC). Finally, we did not explore
the mechanism underlying the capacity of LCDS to
diagnose early-stage lung cancer.
5 | CONCLUSIONS
We used machine learning to construct a diagnostic
model (LCDS) based on transcriptomic features of
peripheral blood that can be used to predict the occur-
rence of lung cancer. We also analyzed the association
between LCDS and the molecular mechanisms of lung
cancer development. LCDS has potential biomarker value
in clinical prognosis of lung cancer patients.
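As a rough illustration of how a blood-based diagnostic score of this kind can be computed and evaluated, the sketch below combines gene expression values into a logistic score and measures discrimination by the area under the ROC curve. The gene names, weights, and intercept are hypothetical placeholders. The actual LCDS genes, coefficients, and algorithm are not given in this section, so this is not the authors' model.

```python
import math


def diagnostic_score(expression, weights, intercept):
    """Logistic score in (0, 1) from expression values and per-gene weights.

    `weights` and `intercept` are hypothetical; they are not the LCDS coefficients.
    """
    linear = intercept + sum(weights[g] * expression[g] for g in weights)
    return 1.0 / (1.0 + math.exp(-linear))


def roc_auc(scores, labels):
    """AUC as the probability that a case outscores a control (ties count half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Toy data: two hypothetical genes, two cases (label 1) and two controls (label 0).
toy_weights = {"GENE_A": 1.0, "GENE_B": -1.0}
toy_samples = [
    {"GENE_A": 2.0, "GENE_B": 0.0},
    {"GENE_A": 1.0, "GENE_B": 0.0},
    {"GENE_A": 0.0, "GENE_B": 1.0},
    {"GENE_A": 1.5, "GENE_B": 0.0},
]
toy_labels = [1, 1, 0, 0]
toy_scores = [diagnostic_score(s, toy_weights, 0.0) for s in toy_samples]
print(roc_auc(toy_scores, toy_labels))
```

The pairwise definition of AUC used here is quadratic in sample size, which is acceptable for a sketch but slower than rank-based implementations on large cohorts.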
AUTHOR CONTRIBUTIONS
Xiaohua Li: Formal analysis; Investigation; Software;
Validation; Visualization; Writing—original draft; Fund-
ing acquisition; Project administration; Supervision;
Writing—review & editing. Xuebing Li: Data curation;
Methodology; Writing—original draft. Jiangyue Qin:
Data curation; Methodology; Conceptualization; Formal
analysis. Lei Lei: Methodology. Hua Guo:
Software. Xuefeng Zeng: Funding acquisition; Method-
ology; Resources; Validation; Visualization; Writing—
review & editing. Xi Zheng: Writing—review & editing.
ACKNOWLEDGMENTS
We thank Dr. Jun Chen for helpful discussion.
FUNDING INFORMATION
This work was supported by the Medical Science and Technology
Project of Sichuan Provincial Health Commission
(21PJ153); the National Natural Science Foundation of
China (81830001); and the Natural Science Foundation of
Sichuan Province (2023NSFSC1890).
CONFLICT OF INTEREST STATEMENT
The authors declare no conflicts of interest.
DATA AVAILABILITY STATEMENT
The data used to support the findings of this study are
available in the Supporting Information.
CONSENT FOR PUBLICATION
All authors have read and approved the submitted manu-
script. No other consent for publication was required.
ORCID
Xiaohua Li https://orcid.org/0009-0003-0929-4666
REFERENCES
1. Siegel RL, Miller KD, Fuchs HE, Jemal A. Cancer statistics,
2022. CA Cancer J Clin. 2022;72(1):7–33. https://doi.org/10.
3322/caac.21708
2. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I,
Jemal A, et al. Global cancer statistics 2020: GLOBOCAN esti-
mates of incidence and mortality worldwide for 36 cancers in
185 countries. CA Cancer J Clin. 2021;71(3):209–49. https://
doi.org/10.3322/caac.21660
3. Pan J, Fang S, Tian H, Zhou C, Zhao X, Tian H, et al. lncRNA
JPX/miR-33a-5p/Twist1 axis regulates tumorigenesis and
metastasis of lung cancer by activating Wnt/β-catenin signal-
ing. Mol Cancer. 2020;19(1):9. https://doi.org/10.1186/s12943-
020-1133-9
4. Flores R, Patel P, Alpert N, Pyenson B, Taioli E. Association of
stage shift and population mortality among patients with non-
small cell lung cancer. JAMA Netw Open. 2021;4(12):e2137508.
https://doi.org/10.1001/jamanetworkopen.2021.37508
5. The National Lung Screening Trial Research Team. Reduced
lung-cancer mortality with low-dose computed tomographic
screening. N Engl J Med. 2011;365(5):395–409. https://doi.org/
10.1056/NEJMoa1102873
6. De Koning HJ, Van Der Aalst CM, De Jong PA, Scholten ET,
Nackaerts K, Heuvelmans MA, et al. Reduced lung-cancer
mortality with volume CT screening in a randomized trial. N
Engl J Med. 2020;382(6):503–13. https://doi.org/10.1056/
NEJMoa1911793
7. Kay FU, Kandathil A, Batra K, Saboo SS, Abbara S, Rajiah P.
Revisions to the tumor, node, metastasis staging of lung cancer
(8th edition): rationale, radiologic findings and clinical implica-
tions. World J Radiol. 2017;9(6):269–79. https://doi.org/10.
4329/wjr.v9.i6.269
8. Nooreldeen R, Bach H. Current and future development in
lung cancer diagnosis. Int J Mol Sci. 2021;22(16):8661. https://
doi.org/10.3390/ijms22168661
9. Oudkerk M, Liu S, Heuvelmans MA, Walter JE, Field JK. Lung
cancer LDCT screening and mortality reduction—evidence,
pitfalls and future perspectives. Nat Rev Clin Oncol. 2021;
18(3):135–51. https://doi.org/10.1038/s41571-020-00432-6
10. Chan M, Huang W, Wang J, Liu RS, Hsiao M. Next-generation
cancer-specific hybrid Theranostic nanomaterials: MAGE-A3
NIR persistent luminescence nanoparticles conjugated to Afati-
nib for in situ suppression of lung adenocarcinoma growth and
metastasis. Adv Sci. 2020;7(9):1903741. https://doi.org/10.1002/
advs.201903741
11. Fehlmann T, Kahraman M, Ludwig N, Backes C, Galata V,
Keller V, et al. Evaluating the use of circulating MicroRNA pro-
files for lung cancer detection in symptomatic patients. JAMA
Oncol. 2020;6(5):714–23. https://doi.org/10.1001/jamaoncol.
2020.0001
12. Dickson JL, Horst C, Nair A, Tisi S, Prendecki R, Janes SM.
Hesitancy around low-dose CT screening for lung cancer. Ann
Oncol. 2022;33(1):34–41. https://doi.org/10.1016/j.annonc.
2021.09.008
13. MacMahon H, Li F, Jiang Y, Armato SG III. Accuracy of the
Vancouver lung cancer risk prediction model compared with
that of radiologists. Chest. 2019;156(1):112–9. https://doi.org/
10.1016/j.chest.2019.04.002
14. Qiu YL, Zheng H, Devos A, Selby H, Gevaert O. A meta-
learning approach for genomic survival analysis. Nat Commun.
2020;11(1):6350. https://doi.org/10.1038/s41467-020-20167-3
15. Xing W, Sun H, Yan C, Zhao C, Wang D, Li M, et al. A predic-
tion model based on DNA methylation biomarkers and radio-
logical characteristics for identifying malignant from benign
pulmonary nodules. BMC Cancer. 2021;21(1):263. https://doi.
org/10.1186/s12885-021-08002-4
16. Hu F, Huang H, Jiang Y, Feng M, Wang H, Tang M, et al. Dis-
criminating invasive adenocarcinoma among lung pure
ground-glass nodules: a multi-parameter prediction model.
J Thorac Dis. 2021;13(9):5383–94. https://doi.org/10.21037/jtd-
21-786
17. Hosny A, Parmar C, Coroller TP, Grossmann P, Zeleznik R,
Kumar A, et al. Deep learning for lung cancer prognostication:
A retrospective multi-cohort radiomics study. PLOS Med. 2018;
15(11):e1002711. https://doi.org/10.1371/journal.pmed.1002711
18. Chen K, Sun J, Zhao H, Jiang R, Zheng J, Li Z, et al.
Non-invasive lung cancer diagnosis and prognosis based on
multi-analyte liquid biopsy. Mol Cancer. 2021;20(1):23. https://
doi.org/10.1186/s12943-021-01323-9
19. Takahashi S, Asada K, Takasawa K, Shimoyama R, Sakai A,
Bolatkan A, et al. Predicting deep learning based multi-omics
parallel integration survival subtypes in lung cancer using
reverse phase protein Array data. Biomolecules. 2020;10(10):
1460. https://doi.org/10.3390/biom10101460
20. Duan X, Yang Y, Tan S, Wang S, Feng X, Cui L, et al. Applica-
tion of artificial neural network model combined with four bio-
markers in auxiliary diagnosis of lung cancer. Med Biol Eng
Comput. 2017;55(8):1239–48. https://doi.org/10.1007/s11517-
016-1585-7
21. Yu L, Tao G, Zhu L, Wang G, Li Z, Ye J, et al. Prediction of
pathologic stage in non-small cell lung cancer using machine
learning algorithm based on CT image feature analysis. BMC
Cancer. 2019;19(1):464. https://doi.org/10.1186/s12885-019-
5646-9
22. Sethi S, Ali S, Philip P, Sarkar F. Clinical advances in molecu-
lar biomarkers for cancer diagnosis and therapy. Int J Mol Sci.
2013;14(7):14771–84. https://doi.org/10.3390/ijms140714771
23. Liu C, Xiang X, Han S, Lim HY, Li L, Zhang X, et al. Blood-
based liquid biopsy: insights into early detection and clinical
management of lung cancer. Cancer Lett. 2022;524:91–102.
https://doi.org/10.1016/j.canlet.2021.10.013
24. Li X, Asmitananda T, Gao L, Gai D, Song Z, Zhang Y, et al.
Biomarkers in the lung cancer diagnosis: a clinical perspective.
Neoplasma. 2012;59(5):500–7. https://doi.org/10.4149/neo_
2012_064
25. Wang B, He YJ, Tian YX, Yang RN, Zhu YR, Qiu H. Clinical
utility of Haptoglobin in combination with CEA, NSE and
CYFRA21-1 for diagnosis of lung cancer. Asian Pac J Cancer
Prev. 2014;15(22):9611–4. https://doi.org/10.7314/APJCP.2014.
15.22.9611
26. Wu LX, Li XF, Chen HF, Zhu YC, Wang WX, Xu CW, et al.
Combined detection of CEA and CA125 for the diagnosis for
lung cancer: a meta-analysis. Cell Mol Biol (Noisy-le-grand).
2018;64(15):67–70.
27. Zhou J, Diao X, Wang S, Yao Y. Diagnosis value of combined
detection of serum SF, CEA and CRP in non-small cell lung
cancer. Cancer Manag Res. 2020;12:8813–9. https://doi.org/10.
2147/CMAR.S268565
28. Liu L, Teng J, Zhang L, Cong P, Yao Y, Sun G, et al. The com-
bination of the tumor markers suggests the histological diagno-
sis of lung cancer. Biomed Res Int. 2017;2017:1–9. https://doi.
org/10.1155/2017/2013989
29. Cosma G, McArdle SE, Foulds GA, Hood SP, Reeder S,
Johnson C, et al. Prostate cancer: early detection and assessing
clinical risk using deep machine learning of high dimensional
peripheral blood flow Cytometric phenotyping data. Front Immu-
nol. 2021;12:786828. https://doi.org/10.3389/fimmu.2021.786828
30. Hood SP, Cosma G, Foulds GA, Johnson C, Reeder S,
McArdle SE, et al. Identifying prostate cancer and its clinical
risk in asymptomatic men using machine learning of high
dimensional peripheral blood flow cytometric natural killer cell
subset phenotyping data. Elife. 2020;9:e50936. https://doi.org/
10.7554/eLife.50936
31. Kossenkov AV, Qureshi R, Dawany NB, Wickramasinghe J,
Liu Q, Majumdar RS, et al. A gene expression classifier from
whole blood distinguishes benign from malignant lung nodules
detected by low-dose CT. Cancer Res. 2019;79(1):263–73.
https://doi.org/10.1158/0008-5472.CAN-18-2032
32. Bloom CI, Graham CM, Berry MPR, Rozakeas F, Redford PS,
Wang Y, et al. Transcriptional blood signatures distinguish pul-
monary tuberculosis, pulmonary sarcoidosis, pneumonias and
lung cancers. PLoS ONE. 2013;8:e70630. https://doi.org/10.
1371/journal.pone.0070630
33. Kursa MB, Rudnicki WR. Feature selection with the Boruta
package. J Stat Softw. 2010;36(11). https://doi.org/10.18637/jss.
v036.i11
34. Hänzelmann S, Castelo R, Guinney J. GSVA: gene set variation
analysis for microarray and RNA-Seq data. BMC Bioinformat-
ics. 2013;14(1):7. https://doi.org/10.1186/1471-2105-14-7
35. Liberzon A, Subramanian A, Pinchback R, Thorvaldsdóttir H,
Tamayo P, Mesirov JP. Molecular Signatures Database
(MSigDB) 3.0. Bioinformatics. 2011;27(12):1739–40. https://doi.
org/10.1093/bioinformatics/btr260
36. Aran D, Hu Z, Butte AJ. xCell: digitally portraying the tissue
cellular heterogeneity landscape. Genome Biol. 2017;18(1):220.
https://doi.org/10.1186/s13059-017-1349-1
37. Ravi A, Hellmann MD, Arniella MB, Holton M, Freeman SS,
Naranbhai V, et al. Genomic and transcriptomic analysis of
checkpoint blockade response in advanced non-small cell lung
cancer. Nat Genet. 2023;55(5):807–19. https://doi.org/10.1038/
s41588-023-01355-5
38. Riquelme E, Suraokar M, Behrens C, Lin HY, Girard L,
Nilsson MB, et al. VEGF/VEGFR-2 upregulates EZH2 expres-
sion in lung adenocarcinoma cells and EZH2 depletion
enhances the response to platinum-based and VEGFR-2–
targeted therapy. Clin Cancer Res. 2014;20(14):3849–61.
https://doi.org/10.1158/1078-0432.CCR-13-1916
39. Yu G, Herazo-Maya JD, Nukui T, Romkes M, Parwani A, Juan-
Guardela BM, et al. Matrix metalloproteinase-19 promotes met-
astatic behavior in vitro and is associated with increased mor-
tality in non-small cell lung cancer. Am J Respir Crit Care
Med. 2014;190(7):780–90. https://doi.org/10.1164/rccm.201310-
1903OC
40. Feng L, Wang J, Cao B, Zhang Y, Wu B, Di X, et al. Gene
expression profiling in human lung development: an abundant
resource for lung adenocarcinoma prognosis. PLoS ONE. 2014;
9(8):e105639. https://doi.org/10.1371/journal.pone.0105639
41. Mitchell KA, Zingone A, Toulabi L, Boeckelman J, Ryan BM.
Comparative transcriptome profiling reveals coding and non-
coding RNA differences in NSCLC from African Americans
and European Americans. Clin Cancer Res. 2017;23(23):7412–
25. https://doi.org/10.1158/1078-0432.CCR-17-0527
42. Zabor EC, Reddy CA, Tendulkar RD, Patil S. Logistic regres-
sion in clinical studies. Int J Radiat Oncol. 2022;112(2):271–7.
https://doi.org/10.1016/j.ijrobp.2021.08.007
43. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez J-
C, et al. pROC: display and analyze ROC curves. Expasy Org.
2015. Available from: https://cran.r-project.org/web/packages/
pROC/index.html
44. Naghibi SA, Ahmadi K, Daneshi A. Application of support vec-
tor machine, random forest, and genetic algorithm optimized
random Forest models in groundwater potential mapping.
Water Resour Manag. 2017;31(9):2761–75. https://doi.org/10.
1007/s11269-017-1660-3
45. Lin A, Qi C, Wei T, Li M, Cheng Q, Liu Z, et al. CAMOIP: a
web server for comprehensive analysis on multi-omics of
immunotherapy in pan-cancer. Brief Bioinform. 2022;23(3):
bbac129. https://doi.org/10.1093/bib/bbac129
46. Camp RL, Dolled-Filhart M, Rimm DL. X-Tile. Clin Cancer
Res. 2004;10(21):7252–9. https://doi.org/10.1158/1078-0432.
CCR-04-0713
47. Vickers AJ, Elkin EB. Decision curve analysis: a novel method
for evaluating prediction models. Med Decis Making. 2006;
26(6):565–74. https://doi.org/10.1177/0272989X06295361
48. Wickham H, Chang W, Henry L, Takahashi K, Wilke C,
Woo K, et al. ggplot2: create elegant data visualisations using
the grammar of graphics. 2016. Available from:
https://cran.r-project.org/web/packages/ggplot2/index.html
49. Frankell AM, Dietzen M, Al Bakir M, Lim EL, Karasaki T,
Ward S, et al. The evolution of lung cancer and impact of sub-
clonal selection in TRACERx. Nature. 2023;616(7957):525–33.
https://doi.org/10.1038/s41586-023-05783-5
50. Chen WW, Liu W, Li Y, Wang J, Ren Y, Wang G, et al. Deci-
phering the immune-tumor interplay during early-stage lung
cancer development via single-cell technology. Front Oncol.
2022;11:716042. https://doi.org/10.3389/fonc.2021.716042
51. Huang S, Yang J, Shen N, Xu Q, Zhao Q. Artificial intelligence
in lung cancer diagnosis and prognosis: current application
and future perspective. Semin Cancer Biol. 2023;89:30–7.
https://doi.org/10.1016/j.semcancer.2023.01.006
52. McGranahan N, Swanton C. Cancer evolution constrained by
the immune microenvironment. Cell. 2017;170(5):825–7.
https://doi.org/10.1016/j.cell.2017.08.012
53. Schreiber RD, Old LJ, Smyth MJ. Cancer Immunoediting: inte-
grating Immunity's roles in cancer suppression and promotion.
Science. 2011;331(6024):1565–70. https://doi.org/10.1126/science.
1203486
54. Mittal D, Gubin MM, Schreiber RD, Smyth MJ. New insights
into cancer immunoediting and its three component phases—
elimination, equilibrium and escape. Curr Opin Immunol.
2014;27:16–25. https://doi.org/10.1016/j.coi.2014.01.004
55. Riemann D, Cwikowski M, Turzer S, Giese T, Grallert M,
Schütte W, et al. Blood immune cell biomarkers in lung cancer.
Clin Exp Immunol. 2019;195(2):179–89. https://doi.org/10.
1111/cei.13219
56. Saab S, Zalzale H, Rahal Z, Khalifeh Y, Sinjab A, Kadara H.
Insights into lung cancer immune-based biology, prevention,
and treatment. Front Immunol. 2020;11:159. https://doi.org/10.
3389/fimmu.2020.00159
57. Bischoff P, Trinks A, Obermayer B, Pett JP, Wiederspahn J,
Uhlitz F, et al. Single-cell RNA sequencing reveals distinct tumor
microenvironmental patterns in lung adenocarcinoma. Oncogene.
2021;40(50):6748–58. https://doi.org/10.1038/s41388-021-02054-3
58. Sinjab A, Han G, Treekitkarnmongkol W, Hara K,
Brennan PM, Dang M, et al. Resolving the spatial and cellular
architecture of lung adenocarcinoma by multiregion single-cell
sequencing. Cancer Discov. 2021;11(10):2506–23. https://doi.
org/10.1158/2159-8290.CD-20-1285
59. Zhang C, Tang B, Hu J, Fang X, Bian H, Han J, et al. Neutro-
phils correlate with hypoxia microenvironment and promote
progression of non-small-cell lung cancer. Bioengineered. 2021;
12(1):8872–84. https://doi.org/10.1080/21655979.2021.1987820
60. DeBerardinis RJ. Tumor microenvironment, metabolism, and
immunotherapy. N Engl J Med. 2020;382(9):869–71. https://
doi.org/10.1056/NEJMcibr1914890
61. Tello D, Balsa E, Acosta-Iborra B, Fuertes-Yebra E, Elorza A,
Ordóñez A, et al. Induction of the mitochondrial NDUFA4L2
protein by HIF-1α decreases oxygen consumption by inhibiting
complex I activity. Cell Metab. 2011;14(6):768–79. https://doi.
org/10.1016/j.cmet.2011.10.008
62. Shiratsuki S, Hara T, Munakata Y, Shirasuna K, Kuwayama T,
Iwata H. Low oxygen level increases proliferation and meta-
bolic changes in bovine granulosa cells. Mol Cell Endocrinol.
2016;437:75–85. https://doi.org/10.1016/j.mce.2016.08.010
63. Tirpe AA, Gulei D, Ciortea SM, Crivii C, Berindan-Neagoe I.
Hypoxia: overview on hypoxia-mediated mechanisms with a
focus on the role of HIF genes. Int J Mol Sci. 2019;20(24):6140.
https://doi.org/10.3390/ijms20246140
64. Masoud GN, Li W. HIF-1α pathway: role, regulation and intervention
for cancer therapy. Acta Pharm Sin B. 2015;5(5):378–
89. https://doi.org/10.1016/j.apsb.2015.05.007
65. Scanlon SE, Glazer PM. Multifaceted control of DNA repair
pathways by the hypoxic tumor microenvironment. DNA
Repair. 2015;32:180–9. https://doi.org/10.1016/j.dnarep.2015.04.030
66. Gao X, Wang G, Zhao W, Han J, Diao CY, Wang XH, et al.
Blocking OLFM4/HIF-1α axis alleviates hypoxia-induced invasion,
epithelial–mesenchymal transition, and chemotherapy
resistance in non-small-cell lung cancer. J Cell Physiol. 2019;
234(9):15035–43. https://doi.org/10.1002/jcp.28144
67. Reka AK, Goswami MT, Krishnapuram R, Standiford TJ,
Keshamouni VG. Molecular cross-regulation between PPAR-γ
and other signaling pathways: implications for lung cancer
therapy. Lung Cancer. 2011;72(2):154–9. https://doi.org/10.
1016/j.lungcan.2011.01.019
68. Giaginis C, Politi E, Alexandrou P, Sfiniadakis J, Kouraklis G,
Theocharis S. Expression of peroxisome proliferator activated
receptor-gamma (PPAR-γ) in human non-small cell lung carci-
noma: correlation with Clinicopathological parameters, prolif-
eration and apoptosis related molecules and Patients' survival.
Pathol Oncol Res. 2012;18(4):875–83. https://doi.org/10.1007/
s12253-012-9517-9
69. Xu R, Luo X, Ye X, Li H, Liu H, Du Q, et al. SIRT1/PGC-1α/PPAR-γ
correlate with hypoxia-induced chemoresistance in
non-small cell lung cancer. Front Oncol. 2021;11:682762.
https://doi.org/10.3389/fonc.2021.682762
70. Pradere JP, Dapito DH, Schwabe RF. The yin and yang of toll-
like receptors in cancer. Oncogene. 2014;33(27):3485–95.
https://doi.org/10.1038/onc.2013.302
71. Martín-Medina A, Cerón-Pisa N, Martinez-Font E, Shafiek H,
Obrador-Hevia A, Sauleda J, et al. TLR/WNT: a novel relationship
in immunomodulation of lung cancer. Int J Mol Sci. 2022;
23(12):6539. https://doi.org/10.3390/ijms23126539
72. Dutta J, Fan Y, Gupta N, Fan G, Gélinas C. Current insights
into the regulation of programmed cell death by NF-κB.
Oncogene. 2006;25(51):6800–16. https://doi.org/10.1038/sj.
onc.1209938
73. Choi CH, Kang TH, Song JS, Kim YS, Chung EJ, Ylaya K, et al.
Elevated expression of pancreatic adenocarcinoma
upregulated factor (PAUF) is associated with poor prognosis
and chemoresistance in epithelial ovarian cancer. Sci Rep.
2018;8(1):12161. https://doi.org/10.1038/s41598-018-30582-8
74. Majidinia M, Yousefi B. DNA repair and damage pathways in
breast cancer development and therapy. DNA Repair. 2017;54:
22–9. https://doi.org/10.1016/j.dnarep.2017.03.009
75. Kristeleit RS, Miller RE, Kohn EC. Gynecologic cancers:
emerging novel strategies for targeting DNA repair deficiency.
Am Soc Clin Oncol Educ Book. 2016;36:e259–68. https://doi.
org/10.1200/EDBK_159086
SUPPORTING INFORMATION
Additional supporting information can be found online
in the Supporting Information section at the end of this
article.
How to cite this article: Li X, Li X, Qin J, Lei L,
Guo H, Zheng X, et al. Machine learning-derived
peripheral blood transcriptomic biomarkers for
early lung cancer diagnosis: Unveiling tumor-
immune interaction mechanisms. BioFactors. 2025;
51(1):e2129. https://doi.org/10.1002/biof.2129