Schematic representation of the proposed cluster-driven batch alignment (CBA) method for aligning single-cell RNA-seq data measured in two different batches. The unsupervised clustering of cells from both batches and the network architecture of CBA are shown; explanations of the various nodes are listed in the top left corner. Cells A and C are from batch one, and cells B and D are from batch two. At its core, the alignment is done using an autoencoder in which cells A&B are aligned and embedded in a lower-dimensional representation M and, simultaneously, cells C&D are aligned and embedded in N. M&N are subsequently used to represent the aligned cells, e.g., to make a UMAP visualization. Details on the autoencoder as well as the classification layer can be found in the section that describes CBA.


Source publication
The power of single-cell RNA sequencing (scRNA-seq) in detecting cell heterogeneity or developmental process is becoming more and more evident every day. The granularity of this knowledge is further propelled when combining two batches of scRNA-seq into a single large dataset. This strategy is however hampered by technical differences between these...

Contexts in source publication

Context 1
... complete CBA workflow is shown in Figure 1. The main idea is that we aim to retain the data structure in the separate batches as much as possible. ...
Context 2
... integrate cells from two batches, we use an autoencoder. The network architecture is shown in Figure 1. Inputs are the cells of the different batches (expression vectors related to the selected PCs), the clustering of the separate batches, as well as their matching. ...
Context 3
... the core of the autoencoder is the embedding of a cell in a lower-dimensional representation in such a way that the expression profile can be reconstructed as well as possible, i.e., the reconstruction loss, L_r, should be minimized. In Figure 1 this can be seen by following cell A: the initial representation A1 is embedded with stacked dense layers into a lower-dimensional representation A3, which is then reconstructed to the original dimensions A4. To accomplish an alignment between the two batches, we perform this auto-encoding for two cells from the two different batches simultaneously. ...
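The reconstruction objective in this context can be illustrated with a minimal NumPy sketch. It assumes a single dense encoder layer and a single dense decoder layer (the source describes stacked dense layers), a tanh activation, and a mean-squared-error form for L_r; the layer sizes, the random weights, and the function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: number of selected PCs per cell and width of the embedding.
n_pcs, n_latent = 20, 5

# Randomly initialised weights stand in for trained dense layers.
W_enc = rng.normal(scale=0.1, size=(n_pcs, n_latent))
W_dec = rng.normal(scale=0.1, size=(n_latent, n_pcs))

def encode(x):
    """Map an expression vector (A1) to a lower-dimensional embedding (A3)."""
    return np.tanh(x @ W_enc)

def decode(z):
    """Map an embedding (A3) back to the original dimensions (A4)."""
    return z @ W_dec

def reconstruction_loss(x, x_hat):
    """Mean squared error between input and reconstruction (assumed form of L_r)."""
    return float(np.mean((x - x_hat) ** 2))

a1 = rng.normal(size=n_pcs)   # toy stand-in for cell A
a3 = encode(a1)               # embedding
a4 = decode(a3)               # reconstruction
loss = reconstruction_loss(a1, a4)
```

Training would adjust the weights to reduce `loss`; that optimisation step is omitted here.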
Context 4
... accomplish an alignment between the two batches, we perform this auto-encoding for two cells from the two different batches simultaneously. Moreover, we do that for two pairs of cells at the same time (for reasons explained later) in different streams, i.e., pair A&B in stream 1 and pair C&D in stream 2 (Figure 1). Focusing on pair A&B, their representations are concatenated and embedded in M and reconstructed as A4 and B4. ...
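The two-stream arrangement can be sketched as follows. This is only an illustration under stated assumptions: one encoder and one decoder per batch, a single shared layer that maps the concatenated pair to the joint embedding, and the same weights reused in both streams. None of these choices, nor the layer sizes and names, are confirmed by the excerpt.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pcs, n_hidden, n_latent = 20, 8, 4   # assumed layer sizes

# Assumed layout: per-batch encoders and decoders, one shared joint layer.
W_enc_b1 = rng.normal(scale=0.1, size=(n_pcs, n_hidden))
W_enc_b2 = rng.normal(scale=0.1, size=(n_pcs, n_hidden))
W_joint = rng.normal(scale=0.1, size=(2 * n_hidden, n_latent))
W_dec_b1 = rng.normal(scale=0.1, size=(n_latent, n_pcs))
W_dec_b2 = rng.normal(scale=0.1, size=(n_latent, n_pcs))

def stream(x_b1, x_b2):
    """Embed one batch-1 cell and one batch-2 cell jointly, then reconstruct both."""
    h1 = np.tanh(x_b1 @ W_enc_b1)
    h2 = np.tanh(x_b2 @ W_enc_b2)
    joint = np.tanh(np.concatenate([h1, h2]) @ W_joint)   # M or N
    return joint, joint @ W_dec_b1, joint @ W_dec_b2

# Toy cells: A and C from batch one, B and D from batch two.
cell_a, cell_c = rng.normal(size=(2, n_pcs))
cell_b, cell_d = rng.normal(size=(2, n_pcs))

m, a4, b4 = stream(cell_a, cell_b)   # stream 1: pair A&B
n, c4, d4 = stream(cell_c, cell_d)   # stream 2: pair C&D
```

Running both pairs through the same function makes the reconstructions A4&C4 and B4&D4 available together, which is what the same-batch requirements operate on.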
Context 5
... this reason, we make use of the two streams. Then, requirements on cells from the same batch, in Figure 1 A&C as well as B&D, can be formulated. Focusing on A&C, we require that the reconstructed versions A4&C4 are close to each other when they are from the same cluster, and we do not put any requirement when they are from different clusters. ...
Context 6
... we require that two cells from different batches but matching clusters should also be close together. For this cluster preserving loss, L_c, we thus put requirements on pairs A&B and C&D in Figure 1. Focusing on A&B, we require that their embedded representations A4&B4 are close to each other when they are in matching clusters and there are no constraints when they are not in matching clusters. ...
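The "close when matching, unconstrained otherwise" requirement can be expressed as a masked distance. The sketch below assumes a squared Euclidean distance averaged over the constrained pairs; the exact distance and weighting used for L_c are not given in this excerpt, and `masked_distance_loss` is a hypothetical helper name.

```python
import numpy as np

def masked_distance_loss(x, y, constrained):
    """Mean squared Euclidean distance over pairs flagged as constrained.

    Pairs whose flag is False contribute nothing, mirroring the
    'no constraint' case for non-matching clusters.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = np.asarray(constrained, dtype=bool)
    if not mask.any():
        return 0.0
    sq_dist = np.sum((x - y) ** 2, axis=1)
    return float(sq_dist[mask].mean())

# Toy representations for three A&B pairs (rows are cells).
a4 = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
b4 = np.array([[0.0, 1.0], [1.0, 1.0], [5.0, 5.0]])

# True where the two cells lie in matching clusters.
matching = np.array([True, True, False])

l_c = masked_distance_loss(a4, b4, matching)   # third pair is ignored
```

The same helper could serve the same-batch requirement on A4&C4 by passing a mask that is True when both cells belong to the same cluster.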
Context 7
... than 10,000 epochs are needed for training per experiment (early stopping is used to prevent overfitting). Memory usage versus training time is shown in Supplementary Figure 1 (about 1.18 GB); the memory is queried using the psutil module in Python. In LIGER, the parameter k in optimizeALS() is set to 20 for the pancreas datasets and to 40 for the mouse lung datasets (as recommended by the authors). ...
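A memory-versus-time trace of this kind could be collected roughly as shown below. The excerpt does not say which psutil call was used; this sketch assumes the resident set size of the current process, reported in units of 1024^3 bytes, and uses a placeholder in place of a real training epoch.

```python
import os
import time

import psutil

def sample_memory_gb(process):
    """Resident set size of the process in GiB (assumed measure of used memory)."""
    return process.memory_info().rss / 1024 ** 3

process = psutil.Process(os.getpid())
start = time.perf_counter()
log = []   # (elapsed seconds, memory in GiB) per epoch

for epoch in range(3):
    _ = [0.0] * 100_000   # placeholder standing in for one training epoch
    log.append((time.perf_counter() - start, sample_memory_gb(process)))
```

Each entry in `log` pairs elapsed time with memory use, which is enough to draw a memory-versus-training-time curve.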
Context 8
... is an autoencoder and therefore finds an embedding space in which the two batches are aligned. Figure 3a1 shows the aligned cells in the embedded space colored according to their batch. The batch effect is removed by CBA as cells from different batches are overlapping. ...
Context 9
... 3 shows the resulting alignments. Figure 3b1 shows that Seurat aligns the batches well, which is not the case for Scanorama (Figure 3d1). For the latter, you can see clusters of aligned data consisting of cells of only one batch, often relating to an original cluster in the separate batches (Figures 3d2,d3). ...
Context 11
... the latter, you can see clusters of aligned data consisting of cells of only one batch, often relating to an original cluster in the separate batches (Figures 3d2,d3). BBKNN (Figure 3c1) can also align both batches but seems to increase the internal variation within the clusters, resulting in touching/overlapping clusters. LIGER (Figure 3g1) also performs well; however, it merges some acinar cells with ductal cells. ...
Context 12
... (Figure 3c1) can also align both batches but seems to increase the internal variation within the clusters, resulting in touching/overlapping clusters. LIGER (Figure 3g1) also performs well; however, it merges some acinar cells with ductal cells. Interestingly, BERMUDA, which, like ours, is an autoencoder-based alignment network, has more problems in aligning the batches. ...
Context 13
... performs best on the metrics that compare the clustering of the aligned data to the clustering in the original datasets, i.e., it best preserved the original clusters, but at the expense of not really aligning the dataset. Harmony does well on all metrics, especially on how well the aligned dataset clusters (SC), but visually the aligned data is not convincing in Figures 3b1,e1. ...
