7.1 Clustering using Seurat’s FindClusters() function

We have had the most success using the graph clustering approach implemented by Seurat. In ArchR, clustering is performed using the addClusters() function, which passes any additional clustering parameters on to Seurat::FindClusters() through its ... argument. In our hands, clustering using Seurat::FindClusters() is deterministic, meaning that the exact same input will always result in the exact same output.

projHeme2 <- addClusters(
    input = projHeme2,
    reducedDims = "IterativeLSI",
    method = "Seurat",
    name = "Clusters",
    resolution = 0.8
)
## ArchR logging to : ArchRLogs/ArchR-addClusters-93b04e914-Date-2025-02-06_Time-01-01-27.957006.log
## If there is an issue, please report to github with logFile!
## 2025-02-06 01:01:28.578552 : Running Seurats FindClusters (Stuart et al. Cell 2019), 0.001 mins elapsed.
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 10250
## Number of edges: 458538
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8668
## Number of communities: 12
## Elapsed time: 1 seconds
## 2025-02-06 01:01:47.388466 : Testing Outlier Clusters, 0.314 mins elapsed.
## 2025-02-06 01:01:47.394136 : Assigning Cluster Names to 12 Clusters, 0.315 mins elapsed.
## 2025-02-06 01:01:47.422364 : Finished addClusters, 0.315 mins elapsed.

To access these clusters, we can use the $ accessor, which returns the cluster ID assigned to each cell.

head(projHeme2$Clusters)
## [1] "C9"  "C11" "C4"  "C4"  "C4"  "C7"

We can tabulate the number of cells present in each cluster:

table(projHeme2$Clusters)
## 
##   C1  C10  C11  C12   C2   C3   C4   C5   C6   C7   C8   C9 
## 1532  903 1250  633 1120  314  351  386  702 1261 1377  421
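Because cluster names are character strings, table() lists them alphabetically, so C10 through C12 appear before C2. A minimal sketch of reordering the counts numerically is shown here, assuming the cluster names carry a "C" prefix as in the output above. The counts are copied from that output, and the object names clusterCounts and clusterNumber are illustrative only.

```r
# Cell counts per cluster, copied from the table() output above.
clusterCounts <- c(
    C1 = 1532, C10 = 903, C11 = 1250, C12 = 633, C2 = 1120, C3 = 314,
    C4 = 351, C5 = 386, C6 = 702, C7 = 1261, C8 = 1377, C9 = 421
)

# Strip the assumed "C" prefix and order by the remaining integer.
clusterNumber <- as.integer(sub("^C", "", names(clusterCounts)))
clusterCounts <- clusterCounts[order(clusterNumber)]

# Clusters now run C1, C2, ..., C12.
names(clusterCounts)

# Total number of cells across all clusters.
sum(clusterCounts)
```

With a live project, the same ordering could be applied directly to the result of table(projHeme2$Clusters).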

To better understand which samples contribute to which clusters, we can build a confusion matrix of clusters against samples using the confusionMatrix() function.

cM <- confusionMatrix(paste0(projHeme2$Clusters), paste0(projHeme2$Sample))
cM
## 12 x 3 sparse Matrix of class "dgCMatrix"
##     scATAC_BMMC_R1 scATAC_CD34_BMMC_R1 scATAC_PBMC_R1
## C9             254                   5            162
## C11           1202                   .             48
## C4             351                   .              .
## C7             310                 940             11
## C1            1489                  10             33
## C6             171                 531              .
## C8             139                1238              .
## C12             86                   .            547
## C3             160                 144             10
## C10            322                   .            581
## C2             117                   2           1001
## C5              88                 298              .
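One way to summarize this matrix is to ask which sample contributes the most cells to each cluster. The following is a minimal sketch in base R, using a dense copy of four rows from the output above with each . entered as 0. The names cMdense, rowFraction, and dominantSample are illustrative and are not part of ArchR.

```r
# Dense copy of four rows of the confusion matrix printed above ("." = 0).
cMdense <- matrix(
    c(254,    5,  162,
      351,    0,    0,
      139, 1238,    0,
      117,    2, 1001),
    nrow = 4, byrow = TRUE,
    dimnames = list(
        c("C9", "C4", "C8", "C2"),
        c("scATAC_BMMC_R1", "scATAC_CD34_BMMC_R1", "scATAC_PBMC_R1")
    )
)

# Fraction of each cluster's cells that come from each sample (rows sum to 1).
rowFraction <- cMdense / rowSums(cMdense)

# Sample contributing the most cells to each cluster.
dominantSample <- colnames(cMdense)[apply(cMdense, 1, which.max)]
names(dominantSample) <- rownames(cMdense)
dominantSample
```

In this subset, C4 is drawn entirely from scATAC_BMMC_R1, whereas C9 is split between the BMMC and PBMC samples, which is the kind of pattern the heatmap makes visible at a glance.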

To plot this confusion matrix as a heatmap, we use the pheatmap package:

library(pheatmap)
# Convert counts to per-cluster fractions so that each row sums to 1
cM <- cM / Matrix::rowSums(cM)
p <- pheatmap::pheatmap(
    mat = as.matrix(cM), 
    color = paletteContinuous("whiteBlue"), 
    border_color = "black"
)
p

There are times when the relative location of cells within the 2-dimensional embedding does not agree perfectly with the identified clusters. More explicitly, cells from a single cluster may appear in multiple different areas of the embedding. In these cases, it may be appropriate to adjust the clustering parameters or the embedding parameters until the two agree.