7.1 Clustering using Seurat’s FindClusters() function

We have had the most success using the graph clustering approach implemented by Seurat. In ArchR, clustering is performed using the addClusters() function, which permits additional clustering parameters to be passed to the Seurat::FindClusters() function via the ... argument. In our hands, clustering using Seurat::FindClusters() is deterministic, meaning that the exact same input will always result in the exact same output.

projHeme2 <- addClusters(
    input = projHeme2,
    reducedDims = "IterativeLSI",
    method = "Seurat",
    name = "Clusters",
    resolution = 0.8
)
## ArchR logging to : ArchRLogs/ArchR-addClusters-371b039f6f826-Date-2022-12-23_Time-06-09-39.log
## If there is an issue, please report to github with logFile!
## 2022-12-23 06:09:40 : Running Seurats FindClusters (Stuart et al. Cell 2019), 0.001 mins elapsed.
## Warning: The following arguments are not used: row.names
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 10250
## Number of edges: 453996
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8665
## Number of communities: 12
## Elapsed time: 1 seconds
## 2022-12-23 06:09:53 : Testing Outlier Clusters, 0.221 mins elapsed.
## 2022-12-23 06:09:53 : Assigning Cluster Names to 12 Clusters, 0.221 mins elapsed.
## 2022-12-23 06:09:53 : Finished addClusters, 0.221 mins elapsed.
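
The forwarding of extra parameters through ... can be illustrated with a small base R sketch. The two functions below, inner_cluster() and outer_cluster(), are hypothetical stand-ins invented for this illustration; they are not part of ArchR or Seurat and perform no clustering. The sketch only shows how arguments that the wrapper does not name itself reach the inner function unchanged.

```r
# Hypothetical stand-in for a clustering backend (NOT Seurat::FindClusters()).
# It simply reports the parameters it received.
inner_cluster <- function(x, resolution = 0.8, algorithm = 1) {
  list(n = length(x), resolution = resolution, algorithm = algorithm)
}

# Hypothetical wrapper (NOT ArchR::addClusters()).
# Any argument it does not name itself is collected in `...`
# and handed on to inner_cluster().
outer_cluster <- function(x, method = "toy", ...) {
  inner_cluster(x, ...)
}

# `resolution` and `algorithm` are not arguments of outer_cluster(),
# so both travel through `...` to inner_cluster().
res <- outer_cluster(1:10, resolution = 0.4, algorithm = 3)
str(res)
```

In the same spirit, resolution = 0.8 in the call above is not consumed by the wrapper but handed on to the clustering backend.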

To access these clusters, we can use the $ accessor, which returns the cluster ID for each single cell.

head(projHeme2$Clusters)
## [1] "C9"  "C6"  "C10" "C10" "C10" "C4"

We can tabulate the number of cells present in each cluster:

table(projHeme2$Clusters)
## 
##   C1  C10  C11  C12   C2   C3   C4   C5   C6   C7   C8   C9 
## 1553  354  315  383 1119  709 1200 1404  932 1239  614  428
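
Note that table() orders the cluster labels alphabetically, so C10 through C12 appear before C2. If a numeric ordering is preferred, one option is to convert the labels to a factor whose levels are sorted by the numeric part of the name. The sketch below is a minimal base R example on a made-up label vector; it assumes every label has the form "C" followed by an integer.

```r
# Toy cluster labels; illustrative only, not taken from projHeme2.
clusters <- c("C9", "C6", "C10", "C10", "C10", "C4", "C1", "C2")

# Strip the leading "C", convert to integer, and order the unique labels.
ids  <- unique(clusters)
lvls <- ids[order(as.integer(sub("^C", "", ids)))]

# Tabulate with the levels in numeric order.
counts <- table(factor(clusters, levels = lvls))
print(counts)
```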

To better understand which samples reside in which clusters, we can create a cluster confusion matrix across each sample using the confusionMatrix() function.

cM <- confusionMatrix(paste0(projHeme2$Clusters), paste0(projHeme2$Sample))
cM
## 12 x 3 sparse Matrix of class "dgCMatrix"
##     scATAC_BMMC_R1 scATAC_CD34_BMMC_R1 scATAC_PBMC_R1
## C9             257                   5            166
## C6             330                   .            602
## C10            354                   .              .
## C4             296                 894             10
## C1            1514                  16             23
## C3             170                 539              .
## C7            1199                   .             40
## C5             138                1266              .
## C8              84                   .            530
## C11            154                 151             10
## C12             87                 296              .
## C2             106                   1           1012
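
Conceptually, this is a cross-tabulation of two per-cell label vectors: each entry counts the cells that carry a given cluster label and a given sample label. The same idea can be sketched in base R with table(). The label vectors below are made up for illustration and are not taken from projHeme2, and table() returns a dense table rather than the sparse matrix shown above.

```r
# Toy per-cell labels; illustrative only, not taken from projHeme2.
clusters <- c("C1", "C1", "C2", "C2", "C2", "C3")
samples  <- c("S1", "S2", "S1", "S1", "S2", "S2")

# Rows are clusters, columns are samples, entries are cell counts.
counts <- table(clusters, samples)
print(counts)
```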

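To plot this confusion matrix as a heatmap, we first normalize each row so that it sums to 1 (giving the fraction of each cluster's cells that comes from each sample) and then use the pheatmap package: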

library(pheatmap)
# Convert counts to per-cluster fractions so that each row sums to 1
cM <- cM / Matrix::rowSums(cM)
p <- pheatmap::pheatmap(
    mat = as.matrix(cM), 
    color = paletteContinuous("whiteBlue"), 
    border_color = "black"
)
p
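
The row normalization in this step can be checked on a small example. The sketch below uses the counts for clusters C9 and C10 from the confusion matrix above, stored in an ordinary dense matrix rather than the sparse matrix that confusionMatrix() returns. Dividing a matrix by rowSums() scales each row by its own total, so every row of the result sums to 1.

```r
# Counts for two clusters, copied from the confusion matrix shown earlier.
counts <- matrix(
  c(257, 5, 166,
    354, 0,   0),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("C9", "C10"),
    c("scATAC_BMMC_R1", "scATAC_CD34_BMMC_R1", "scATAC_PBMC_R1")
  )
)

# Divide every entry by the total of its row.
frac <- counts / rowSums(counts)
print(round(frac, 3))

# Each row now sums to 1.
print(rowSums(frac))
```

A cluster such as C10, whose cells all come from a single sample, ends up with a fraction of 1 in that sample's column and 0 elsewhere, which is what makes sample-specific clusters stand out in the heatmap.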

There are times when the relative location of cells within the 2-dimensional embedding does not agree perfectly with the identified clusters. More explicitly, cells from a single cluster may appear in multiple different areas of the embedding. In these cases, it may be appropriate to adjust the clustering parameters or embedding parameters until there is agreement between the two.