Background. Pyroptosis has recently been recognized as a form of inflammatory programmed cell death. However, its prognostic role in colon cancer (CC) remains unclear. Methods. The TCGA-COAD dataset from the TCGA portal served as the training cohort, and GSE17538 from the GEO database served as the validation cohort. Differentially expressed genes (DEGs) between normal and tumor tissues were identified. Patients were classified into two subgroups according to the expression characteristics of pyroptosis-related DEGs. LASSO regression analysis was used to build the optimal prognostic signature, and its reliability was validated using Kaplan–Meier, ROC, PCA, and t-SNE analyses. A nomogram based on multivariate Cox analysis was then developed. GO and KEGG enrichment analyses were performed to investigate the potential mechanism. In addition, we explored the differences in the abundance of infiltrating immune cells and in the immune microenvironment between the high- and low-risk groups, and we assessed the association of common immune checkpoints with the risk score. Finally, we verified the expression of the pyroptosis-related hub genes at the protein level by immunohistochemistry. Results. A total of 23 pyroptosis-related DEGs were identified in the TCGA cohort. Patients were classified into two molecular clusters (MCs) based on these DEGs. Kaplan–Meier survival analysis indicated that patients in MC1 had significantly poorer overall survival (OS) than patients in MC2. Thirteen OS-related DEGs between the MCs were used to construct the prognostic signature. Patients in the high-risk group exhibited poorer OS than those in the low-risk group. When combined with clinical features, the risk score was found to be an independent prognostic factor for CC patients. These results were verified in the external dataset GSE17538. A nomogram was established and showed excellent performance.
Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses indicated that the difference in prognosis between the high- and low-risk groups may be related to the immune response mediated by local inflammation. Further analysis showed that the high-risk group had stronger immune cell infiltration and lower tumor purity than the low-risk group. Based on the correlation between the risk score and immune checkpoint expression, T-cell immunoglobulin and mucin domain-containing protein 3 (TIM-3) was predicted as a potential therapeutic target for the high-risk group. Conclusion. The 13-gene signature was associated with OS, immune cell infiltration, tumor purity, and immune checkpoints in CC patients; it could provide a basis for immunotherapy and prognosis prediction and help clinicians make decisions on individualized treatment.
1. Introduction
Colon cancer (CC), a common malignancy of the digestive system, shows a clearly rising trend in both morbidity and mortality [1]. Dietary habits, age, obesity, smoking, and lack of physical exercise are well-known risk factors for colon cancer. The most common subtype is colon adenocarcinoma (COAD), which accounts for 98% of newly diagnosed colon cancer cases and has a 5-year survival rate of 40–60% [2]. Therapeutic response still differs considerably among patients with CC. The reason is associated not only with socioeconomic factors but also with individual genetic heterogeneity [3]. Developing new strategies for early diagnosis, more precise therapy, and prognosis prediction in CC therefore remains a great challenge. Currently, the tumor-node-metastasis (TNM) stage, which is based on anatomical information, is a common tool for evaluating patient prognosis. Its major limitation is that it may not fully account for the genetic heterogeneity within individual tumors. With the development of sequencing technology, the transcriptomes of tumors are now understood in greater depth. Assessment of KRAS and BRAF mutation status or microsatellite instability (MSI) status is widely used in clinical treatment, which gives CC patients diagnosed at a middle or late stage more treatment opportunities than before [4]. However, because of the complexity of the molecular mechanisms affecting CC prognosis, prediction models based on a single gene or factor often have low accuracy. In contrast, multigene models tend to perform better in predicting the prognosis of various cancers [5–7]. Therefore, a reliable prognostic gene signature is needed to promote individualized therapy and aid survival prediction for CC patients.
Pyroptosis, also known as inflammatory cell necrosis, is a form of programmed cell death characterized by continuous cell swelling until the cell membrane ruptures, which releases the cell contents and triggers a strong inflammatory response [8]. Pyroptosis occurs when activated caspase-1 cleaves the protein gasdermin D, releasing the gasdermin N-terminal fragment, which can form pores in the plasma membrane [9]. Pyroptosis is closely related to a variety of diseases; for tumors, it is a double-edged sword. On the one hand, as an innate immune mechanism, pyroptosis can inhibit tumor development; on the other hand, as a proinflammatory mode of cell death, it can provide a suitable microenvironment for tumor growth [10]. A long-term chronic inflammatory response can lead to local tissue dysplasia and thus carcinogenesis. In particular, the large number of bacteria in the intestine may increase the chance of infection and, with it, the occurrence of pyroptosis. We therefore hypothesized that pyroptosis might play an important role in the development of colon cancer. Although several studies have linked pyroptosis with colon cancer [11–13], scientific and clinical studies on the correlation between CC and pyroptosis are still few; whether pyroptosis is correlated with CC prognosis, and what the expression characteristics of the key pyroptosis-related genes (PRGs) are during CC progression, remain largely unknown. Despite significant progress in CC gene signatures, few studies have used pyroptosis-related gene characteristics to construct a prognostic signature in CC. Accordingly, we carried out a systematic study of PRGs to explore their expression characteristics in normal and tumor tissues and to construct a prognostic signature that predicts the prognosis and immune response of patients.
2. Materials and Methods
2.1. Acquisition of Data
The level 3 RNA-seq data (workflow type: HTSeq-FPKM) of 385 COAD patients and the corresponding clinical information were obtained from The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/); data acquisition and use complied with the relevant guidelines and policies. FPKM values were then transformed as log2(FPKM + 1) for subsequent analysis. The GSE17538 gene expression profiles were acquired from the Gene Expression Omnibus (GEO: https://www.ncbi.nlm.nih.gov/gds/) database and comprise transcriptome data from 238 colon cancer patients (platform: GPL570). The raw data from GSE17538 were normalized with the RMA method. Both the TCGA and GEO databases are publicly available; thus, ethical approval was not required for the present study.
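The log2(FPKM + 1) transform described above can be illustrated with a minimal sketch. The function name and the example values are hypothetical and are not taken from the study data; the pseudocount of 1 keeps genes with zero expression at zero after transformation.

```python
import math


def log2_fpkm(fpkm_values):
    """Apply the log2(FPKM + 1) transform to a list of non-negative FPKM values."""
    if any(v < 0 for v in fpkm_values):
        raise ValueError("FPKM values must be non-negative")
    return [math.log2(v + 1.0) for v in fpkm_values]


# Hypothetical FPKM values for one gene across four samples.
example = [0.0, 1.0, 3.0, 1023.0]
transformed = log2_fpkm(example)
print(transformed)  # [0.0, 1.0, 2.0, 10.0]
```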
2.2. Identification of Differentially Expressed PRGs
A total of 33 pyroptosis-related genes were obtained from previous reports [14]. Among them, GSDMA was excluded because it has no annotation on the GPL570 platform, leaving 32 genes. The “limma” package was used to identify DEGs between tumor and adjacent normal tissues in the TCGA cohort (P < 0.05). A heatmap of the DEGs was plotted with the “pheatmap” package. PPI networks of the DEGs were constructed using STRING v11.5 (http://string-db.org/) with default parameters (confidence = 0.4). Pearson’s correlations among the DEGs were calculated using the “reshape2” package (cutoff = 0.2), and the correlation networks were generated using the “igraph” package. The MCODE plug-in in Cytoscape was used to identify the hub genes of the PPI network.
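As a rough illustration of how a correlation network can be derived from expression data, the sketch below computes pairwise Pearson correlations and keeps pairs that pass a cutoff of 0.2. It is not the authors' R workflow: the gene names and values are hypothetical, and applying the cutoff to the absolute correlation is an assumption, since the text does not say how the cutoff was applied.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_edges(expr, cutoff=0.2):
    """Return (gene1, gene2, r) for every gene pair with |r| above the cutoff.

    Assumption: the cutoff is applied to the absolute correlation.
    """
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) > cutoff:
            edges.append((g1, g2, round(r, 3)))
    return edges


# Hypothetical expression vectors (one value per sample).
toy_expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [1.0, -1.0, -1.0, 1.0],
}
edges = correlation_edges(toy_expr)
print(edges)  # [('GENE_A', 'GENE_B', 1.0)]
```

The resulting edge list is the kind of input a graph library would take to draw the network.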
2.3. PRGs-Based Classifications of CC Patients in the TCGA and GSE17538 Cohorts
Unsupervised consensus clustering, an algorithm based on k-means machine learning, was used to derive a molecular classification of both the TCGA and GSE17538 CC cohorts from the expression patterns of PRGs, using the “ConsensusClusterPlus” package [15] in R. The optimal number of clusters was determined according to the consensus score and the relative change in the area under the cumulative distribution function (CDF) curve of the consensus matrix. Kaplan–Meier survival analysis was then performed to evaluate the prognosis of patients in the different molecular clusters (MCs). We also compared the clinicopathological variables and the tumor immune microenvironment between clusters to further explore the associations between the PRG-based MCs and the clinical features or local immune status of CC patients.
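The idea behind the consensus score can be sketched as follows. Consensus clustering repeatedly clusters random subsamples of the patients and records, for each pair of patients, the fraction of runs in which both were sampled and placed in the same cluster. The sketch computes only that consensus matrix from precomputed labels; the resampling and the k-means step are handled internally by the R package and are not reproduced here. All labels and run contents below are hypothetical.

```python
def consensus_matrix(runs, n_items):
    """Compute a consensus matrix from repeated clustering runs.

    runs: list of dicts mapping item index -> cluster label, containing
          only the items that were subsampled in that run.
    Entry [i][j] is the fraction of runs containing both i and j in which
    the two items received the same label (0.0 if never sampled together).
    """
    together = [[0] * n_items for _ in range(n_items)]
    sampled = [[0] * n_items for _ in range(n_items)]
    for labels in runs:
        for i in labels:
            for j in labels:
                sampled[i][j] += 1
                if labels[i] == labels[j]:
                    together[i][j] += 1
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
         for j in range(n_items)]
        for i in range(n_items)
    ]


# Four hypothetical runs over four patients; each run samples three of them.
runs = [
    {0: "a", 1: "a", 2: "b"},
    {0: "x", 1: "x", 3: "y"},
    {1: "p", 2: "q", 3: "q"},
    {0: "m", 2: "n", 3: "n"},
]
cm = consensus_matrix(runs, 4)
print(cm[0][1], cm[2][3], cm[0][2])  # 1.0 1.0 0.0
```

Values close to 0 or 1 across the matrix indicate a stable partition, which is why the distribution of consensus values is inspected when choosing the number of clusters.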
2.4. Development and Validation of the PRGs-Based Prognostic Risk Signature
We analyzed the differences between patients in different clusters in the TCGA cohort and in the GSE17538 cohort and took the intersecting genes (P < 0.05). Then, in the TCGA cohort, we used univariate Cox regression analysis to screen for prognosis-related genes with a strict significance threshold (P < 0.0001). Afterward, LASSO Cox regression analysis was performed to construct a prognostic signature while minimizing the risk of overfitting. The risk score of each patient was calculated from the normalized expression level of each gene and its corresponding regression coefficient with the following formula: $\text{Risk score} = \sum_{i} \text{Coef}_i \times \text{Expr}_i$. Patients were then divided into a high-risk group and a low-risk group according to the median risk score. Survival curves for the high- and low-risk groups were drawn with the “survival” and “survminer” packages in R, and the accuracy of the signature was evaluated using ROC curves. PCA and t-SNE were used for dimensionality reduction to assess how well the risk signature separates patients at different risk. The stability of the risk signature was verified in the GSE17538 cohort.
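The risk-score formula and the median split can be written out as a short sketch. The coefficients, gene names, and expression values are hypothetical placeholders rather than the fitted 13-gene signature, and assigning patients whose score equals the median to the low-risk group is an assumption.

```python
from statistics import median


def risk_scores(expr_by_patient, coefs):
    """Risk score = sum over signature genes of coefficient * expression."""
    return {
        pid: sum(coefs[gene] * expr[gene] for gene in coefs)
        for pid, expr in expr_by_patient.items()
    }


def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the median risk score.

    Assumption: scores equal to the median are assigned to the low-risk group.
    """
    cut = median(scores.values())
    return {pid: ("high" if s > cut else "low") for pid, s in scores.items()}


# Hypothetical LASSO coefficients and normalized expression values.
coefs = {"GENE_A": 0.5, "GENE_B": -0.25}
patients = {
    "P1": {"GENE_A": 2.0, "GENE_B": 4.0},  # 1.0 - 1.0 = 0.0
    "P2": {"GENE_A": 4.0, "GENE_B": 0.0},  # 2.0 - 0.0 = 2.0
    "P3": {"GENE_A": 0.0, "GENE_B": 4.0},  # 0.0 - 1.0 = -1.0
    "P4": {"GENE_A": 8.0, "GENE_B": 4.0},  # 4.0 - 1.0 = 3.0
}
scores = risk_scores(patients, coefs)
groups = split_by_median(scores)
print(scores)
print(groups)
```

With these values the median score is 1.0, so P2 and P4 fall into the high-risk group and P1 and P3 into the low-risk group.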
2.5. Construction and Validation of a Predictive Nomogram
Cox regression analysis was performed to determine whether the risk score and relevant clinical parameters were predictors of OS in CC patients. To avoid collinearity among the clinical variables, we excluded the T/N/M stages and retained the AJCC stage. Subsequently, based on the results of the multivariate Cox regression analysis, a prognostic nomogram was generated to predict the 1-year, 2-year, and 3-year OS of CC patients in the TCGA cohort. Calibration curves were used to compare the OS predicted by the nomogram with the observed survival rates.
2.6. Functional Enrichment and Immune Characterization Analysis
The “limma” R package was used to identify DEGs between the high-risk and low-risk groups (P < 0.05). Gene Ontology (GO) enrichment analysis, covering biological process (BP), cellular component (CC), and molecular function (MF), and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of the DEGs were performed using the “clusterProfiler” R package. To compare the immune status between groups, the “GSVA” package was used to run single-sample gene set enrichment analysis (ssGSEA), which scored the infiltrating immune cells and the activity of immune-related pathways. ESTIMATE, immune, and stromal scores of each patient were calculated with the ESTIMATE algorithm in the “estimate” package [16] to evaluate differences in the immune microenvironment. The correlation between the expression of common immune checkpoints and the risk score was analyzed and displayed as a correlation matrix plot.
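GO and KEGG over-representation analysis of a DEG list is commonly based on a one-sided hypergeometric test. The sketch below shows that test for a single term. It is an illustration under stated assumptions, not the clusterProfiler implementation; the function name and the gene counts are hypothetical, and multiple-testing correction is omitted.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, drawn, overlap):
    """One-sided over-representation P value, P(X >= overlap).

    population: number of background genes
    annotated:  background genes annotated to the term
    drawn:      size of the DEG list
    overlap:    DEGs annotated to the term
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 background genes, 5 annotated to the term,
# 4 DEGs, of which 2 carry the annotation.
p_value = hypergeom_enrichment_p(20, 5, 4, 2)
print(round(p_value, 4))  # 0.2487
```

A small P value means the term is annotated to more DEGs than expected from random sampling of the background.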
2.7. Statistical Analysis
All statistical analyses were performed with R software (v4.0.3). Continuous variables are presented as mean ± standard deviation (SD). Normally and nonnormally distributed variables were compared using the unpaired Student’s t-test and the Wilcoxon test, respectively. Hazard ratios (HRs) and 95% confidence intervals (CIs) were estimated with univariate and multivariate Cox regression models. A P value < 0.05 was considered statistically significant (∗P < 0.05, ∗∗P < 0.01, ∗∗∗P < 0.001).
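For reference, a hazard ratio and its 95% confidence interval follow from a Cox regression coefficient and its standard error as HR = exp(β) and CI = exp(β ± z × SE). The sketch below assumes a Wald-type interval with z ≈ 1.96, which the text does not state explicitly; the coefficient and standard error are hypothetical.

```python
import math


def hazard_ratio_ci(beta, se, z=1.959963984540054):
    """Return (HR, lower, upper) for a Cox coefficient and its standard error.

    Assumption: a Wald-type interval, exp(beta +/- z * se), with z taken as
    the 97.5th percentile of the standard normal distribution.
    """
    hr = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return hr, lower, upper


# Hypothetical coefficient (log hazard ratio) and standard error.
hr, lower, upper = hazard_ratio_ci(beta=math.log(2.0), se=0.2)
print(f"HR = {hr:.2f}, 95% CI = {lower:.2f} to {upper:.2f}")
```

An interval that excludes 1 corresponds to a coefficient that differs significantly from zero at the 0.05 level.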
3. Results
3.1. Identification of DEGs between Normal and Tumor Tissues
Figure 1 provides an overview of the study workflow. A total of 39 normal and 398 tumor tissue samples with gene expression data were included in the analysis. The majority of the pyroptosis-related genes (23/32, 72%) were significantly differentially expressed between the two groups (P < 0.001). Ten genes were upregulated (CASP8, NOD1, GPX4, CASP4, PJVK, IL6, IL1B, PLCG1, NOD2, and GSDMC) and 13 were downregulated (ELANE, CASP5, NLRP7, IL18, NLRP3, NLRC4, PRKACA, NLRP1, GSDMB, CASP9, CASP3, TIRAP, and NLRP2) in tumor tissues. Figure 2(a) shows a heatmap of the expression levels of these genes. To further explore the interactions among the 23 pyroptosis-related DEGs, a PPI network was constructed (Figure 2(b)); CASP4, CASP5, and IL18 lie at the core of the network. The correlation network of the pyroptosis-related DEGs is presented in Figure 2(c). A total of 11 hub genes (NOD2, CASP4, NOD1, IL18, IL1B, NLRP1, CASP8, NLRC4, IL6, CASP5, and NLRP3) were identified by the MCODE plug-in in Cytoscape (Figure 2(d)), and their protein levels were verified using the Human Protein Atlas (HPA) database (Figure 3).