Figure - available from: International Journal of Cancer
Consensus calls identify a cluster of genomically normal cells unique to left‐sided cancer samples. (A) UMAP of epithelial cells, colored by Louvain clustering. (B) Stacked bar plot of consensus calls across 20 Louvain clusters (cancer sample and genomically cancer, orange; cancer sample and genomically normal, blue; normal sample, grey). (C) Bar plot of cluster homogeneity scores for cancer cell calls by different methods, as indicated. (D) Relative fractions of genomically normal cells in cluster 9, by cancer location (see Figure 1A). P‐value from mixed‐effects binomial model, *** P < .001. (E) Pie chart of the epithelial cell types in Louvain cluster 9, as indicated. Color code: Enterocyte (dark green), Enterocyte progenitor (light green), Immature Goblet (light purple), Stem/TA (dark blue), and Stem (light blue). (F) Dot plot of the top 10 marker genes for Louvain cluster 9. The color of each dot represents the mean normalized expression of the gene, and the size of the dot shows the fraction of cells expressing the gene. (G) UMAP colored by expression of PLA2G2A, the top marker gene specific to Louvain cluster 9.
Source publication
Single‐cell analyses can be confounded by assigning unrelated groups of cells to common developmental trajectories. For instance, cancer cells and admixed normal epithelial cells could adopt similar cell states thus complicating analyses of their developmental potential. Here, we develop and benchmark CCISM (for Cancer Cell Identification using Som...
Citations
Motivation:
Single-cell RNA sequencing (scRNA-seq) data are widely used to study cancer cell states and their heterogeneity. However, the tumour microenvironment is usually a mixture of healthy and cancerous cells, and it can be difficult to fully separate these two populations based on transcriptomics alone. If available, somatic single-nucleotide variants (SNVs) observed in the scRNA-seq data could be used to identify the cancer population and match that information with the single cells' expression profiles. However, calling somatic SNVs in scRNA-seq data is a challenging task, as most variants seen in the short-read data are not somatic, but can instead be germline variants, RNA edits, or transcription, sequencing, or processing errors. In addition, only variants located in regions that are actively transcribed in a given cell will be seen in the data.
Results:
To address these challenges, we develop CCLONE (Cancer Cell Labelling On Noisy Expression), an interpretable tool adapted to handle the uncertainty and sparsity of SNVs called from scRNA-seq data. CCLONE jointly identifies cancer clonal populations and their associated variants. We apply CCLONE to two acute myeloid leukaemia datasets and one lung adenocarcinoma dataset and show that it captures both genetic clones and somatic events for multiple patients. These results show how CCLONE can be used to gain insight into the course of the disease and the origin of cancer cells in scRNA-seq data.
Availability and implementation:
Source code is available at github.com/HaghverdiLab/CCLONE.