7.1 Clustering using Seurat’s FindClusters() function
We have had the most success using the graph clustering approach implemented by Seurat. In ArchR, clustering is performed using the addClusters() function, which permits additional clustering parameters to be passed to the Seurat::FindClusters() function via the ... argument. In our hands, clustering using Seurat::FindClusters() is deterministic, meaning that the exact same input will always result in the exact same output.
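Seurat's graph clustering starts by building a k-nearest-neighbor graph and then a shared-nearest-neighbor (SNN) graph, which the Louvain algorithm partitions (the addClusters() log reports these as "Computing nearest neighbor graph" and "Computing SNN"). The base-R sketch below illustrates the SNN idea only: each edge weight is the Jaccard overlap of two cells' neighbor sets. The toy 2-D points stand in for the IterativeLSI coordinates, and the 1/15 pruning threshold is assumed to mirror Seurat's default; Seurat's own implementation differs in details such as how neighbors are searched.

```r
# Minimal base-R sketch of a shared-nearest-neighbor (SNN) graph.
# Toy 2-D points stand in for the reduced dimensions (e.g. IterativeLSI coordinates).
set.seed(1)
X <- rbind(
  matrix(rnorm(20, mean = 0), ncol = 2),    # cells 1-10 around (0, 0)
  matrix(rnorm(20, mean = 10), ncol = 2)    # cells 11-20 around (10, 10)
)
k <- 5
n <- nrow(X)

# Each cell's k nearest neighbors by Euclidean distance (the cell itself is included)
D <- as.matrix(dist(X))
knn <- t(apply(D, 1, function(d) order(d)[seq_len(k)]))

# Edge weight = Jaccard overlap of the two neighbor sets: |shared| / |union|
snn <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    shared <- length(intersect(knn[i, ], knn[j, ]))
    snn[i, j] <- shared / (2 * k - shared)
  }
}

# Drop weak edges; the 1/15 threshold is assumed to mirror Seurat's default prune.SNN
snn[snn < 1/15] <- 0
```

Because the two toy groups are far apart, no cell shares a neighbor with the other group, so the SNN graph has no edges between them and a community-detection step such as Louvain would recover the two groups.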
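projHeme2 <- addClusters(
    input = projHeme2,
    reducedDims = "IterativeLSI",   # cluster on the Iterative LSI dimensionality reduction
    method = "Seurat",              # graph-based clustering via Seurat::FindClusters()
    name = "Clusters",              # cellColData column that will store the cluster IDs
    resolution = 0.8                # passed through ... to Seurat::FindClusters(); higher values generally give more clusters
)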
## ArchR logging to : ArchRLogs/ArchR-addClusters-93b04e914-Date-2025-02-06_Time-01-01-27.957006.log
## If there is an issue, please report to github with logFile!
## 2025-02-06 01:01:28.578552 : Running Seurats FindClusters (Stuart et al. Cell 2019), 0.001 mins elapsed.
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 10250
## Number of edges: 458538
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8668
## Number of communities: 12
## Elapsed time: 1 seconds
## 2025-02-06 01:01:47.388466 : Testing Outlier Clusters, 0.314 mins elapsed.
## 2025-02-06 01:01:47.394136 : Assigning Cluster Names to 12 Clusters, 0.315 mins elapsed.
## 2025-02-06 01:01:47.422364 : Finished addClusters, 0.315 mins elapsed.
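The log reports "Maximum modularity in 10 random starts": the Louvain algorithm searches for the partition of the SNN graph with the highest modularity. As a worked illustration (a sketch, not Seurat's code), the standard modularity with a resolution parameter gamma is Q = (1/2m) * sum_ij [A_ij - gamma * k_i * k_j / (2m)] * delta(c_i, c_j), where A is the adjacency matrix, k_i the degree of node i, m the total edge weight, and delta(c_i, c_j) is 1 when nodes i and j share a community. A larger gamma penalizes large communities more strongly, which is why raising the resolution tends to yield more clusters. We assume that the resolution value passed to Seurat::FindClusters() plays the role of gamma; the exact scaling inside Seurat's modularity optimizer is not shown here.

```r
# Modularity of a partition of a weighted, undirected graph with adjacency matrix A:
#   Q = (1 / 2m) * sum_ij [A_ij - gamma * k_i * k_j / (2m)] * delta(c_i, c_j)
modularity_q <- function(A, membership, resolution = 1) {
  k <- rowSums(A)                                  # (weighted) degree of each node
  two_m <- sum(A)                                  # 2m: each edge appears twice in A
  same <- outer(membership, membership, "==")      # delta(c_i, c_j)
  expected <- resolution * outer(k, k) / two_m     # null-model term, scaled by gamma
  sum((A - expected) * same) / two_m
}

# Toy graph: two triangles (nodes 1-3 and 4-6) joined by the single edge 3-4
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
A <- matrix(0, 6, 6)
A[edges] <- 1
A[edges[, 2:1]] <- 1

q_split  <- modularity_q(A, c(1, 1, 1, 2, 2, 2))   # 5/14, about 0.357
q_single <- modularity_q(A, rep(1, 6))             # 0: a single community gains nothing
```

Splitting the graph into its two triangles scores higher than lumping every node together, which is the kind of comparison the Louvain algorithm makes at much larger scale on the SNN graph.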
To access these clusters, we can use the $ accessor (e.g. projHeme2$Clusters), which returns the cluster ID of each single cell.
We can tabulate the number of cells present in each cluster:
table(projHeme2$Clusters)
##
## C1 C10 C11 C12 C2 C3 C4 C5 C6 C7 C8 C9
## 1532 903 1250 633 1120 314 351 386 702 1261 1377 421
To better understand which samples reside in which clusters, we can create a confusion matrix of clusters versus samples using the confusionMatrix() function.
# Rows are clusters, columns are samples; paste0() converts both label vectors to character
cM <- confusionMatrix(paste0(projHeme2$Clusters), paste0(projHeme2$Sample))
cM
## 12 x 3 sparse Matrix of class "dgCMatrix"
## scATAC_BMMC_R1 scATAC_CD34_BMMC_R1 scATAC_PBMC_R1
## C9 254 5 162
## C11 1202 . 48
## C4 351 . .
## C7 310 940 11
## C1 1489 10 33
## C6 171 531 .
## C8 139 1238 .
## C12 86 . 547
## C3 160 144 10
## C10 322 . 581
## C2 117 2 1001
## C5 88 298 .
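Each entry of this matrix counts the cells that carry a given cluster label and a given sample label. The same counts can be reproduced with base R's table() on two label vectors, as in the sketch below; the labels are made-up stand-ins for projHeme2$Clusters and projHeme2$Sample. ArchR's confusionMatrix() returns the counts as a sparse dgCMatrix instead, and its rows are not sorted, as the output above shows.

```r
# Hypothetical per-cell labels standing in for projHeme2$Clusters and projHeme2$Sample
clusters <- c("C1", "C1", "C2", "C2", "C2", "C3")
samples  <- c("S1", "S2", "S1", "S1", "S2", "S2")

# Rows = clusters, columns = samples; each entry counts cells with that label pair
cM_toy <- table(clusters, samples)
print(cM_toy)

# Row sums give the per-cluster cell counts, matching table(clusters)
cluster_sizes <- rowSums(cM_toy)
```

The row sums recover the per-cluster totals, which is the same information shown by table(projHeme2$Clusters) earlier.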
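To plot this confusion matrix as a heatmap, we first divide each row by its total, so that each entry is the fraction of a cluster's cells that come from a given sample, and then plot it with the pheatmap package: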
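library(pheatmap)
# Normalize each row (cluster) so that it sums to 1
cM <- cM / Matrix::rowSums(cM)
p <- pheatmap::pheatmap(
    mat = as.matrix(cM),                       # convert the sparse dgCMatrix to a dense matrix
    color = paletteContinuous("whiteBlue"),    # ArchR continuous color palette
    border_color = "black"
)
p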
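The normalization step cM / Matrix::rowSums(cM) relies on R's recycling rule: a vector with one entry per row is recycled down each column, so every entry is divided by its own row total. The check below uses a made-up dense count matrix (we assume the Matrix package applies the same rule to the sparse dgCMatrix) and shows that this matches base R's prop.table(..., margin = 1) and that every row then sums to 1.

```r
# Made-up count matrix: rows = clusters, columns = samples
counts <- matrix(c(8, 2,
                   1, 3,
                   0, 5),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("C1", "C2", "C3"), c("S1", "S2")))

# The row-sum vector is recycled down each column, so each entry is divided by its row total
props <- counts / rowSums(counts)

# Same result with base R's prop.table() over margin 1 (rows)
props2 <- prop.table(counts, margin = 1)

print(round(props, 3))
```

After normalization, a cluster made almost entirely of one sample shows up as a single dark cell in its heatmap row, regardless of how many cells the cluster contains.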
There are times when the relative location of cells within the 2-dimensional embedding does not agree perfectly with the identified clusters. More specifically, cells from a single cluster may appear in multiple different areas of the embedding. In these cases, it may be appropriate to adjust the clustering parameters or the embedding parameters until the two agree.
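One way to spot this kind of disagreement is to count how many separate regions each cluster occupies in the embedding. The sketch below is a hypothetical diagnostic, not an ArchR function: it links the cells of each cluster by single-linkage hierarchical clustering on their 2-D coordinates and cuts the tree at a distance threshold h, an assumed value that depends on the scale of the embedding and must be tuned. The toy coordinates stand in for a 2-D embedding such as UMAP.

```r
# Hypothetical diagnostic (not part of ArchR): how many spatially separate "islands"
# does each cluster occupy in a 2-D embedding?
count_islands <- function(emb, clusters, h) {
  sapply(split(seq_len(nrow(emb)), clusters), function(idx) {
    if (length(idx) < 2) return(1L)
    # Single linkage keeps cells in one group when a chain of steps of at most ~h connects them
    tree <- hclust(dist(emb[idx, , drop = FALSE]), method = "single")
    length(unique(cutree(tree, h = h)))
  })
}

# Toy 2-D coordinates standing in for an embedding such as UMAP
set.seed(2)
blob <- function(n, x, y) cbind(rnorm(n, x, 0.5), rnorm(n, y, 0.5))
emb <- rbind(
  blob(20, 0, 0),     # cluster A
  blob(10, 10, 0),    # cluster B, first region
  blob(10, 0, 10),    # cluster B, second, distant region
  blob(20, 10, 10)    # cluster C
)
clusters <- rep(c("A", "B", "C"), times = c(20, 20, 20))

islands <- count_islands(emb, clusters, h = 3)
print(islands)   # B should be flagged as occupying 2 islands
```

A cluster reported as occupying more than one island is a candidate for revisiting the clustering or embedding parameters; single linkage is chosen here because it treats any connected chain of nearby cells as one region, whatever its shape.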