7.1 Clustering using Seurat’s FindClusters() function
We have had the most success using the graph clustering approach implemented by Seurat. In ArchR, clustering is performed using the addClusters() function, which permits additional clustering parameters to be passed to the Seurat::FindClusters() function via the ... argument. In our hands, clustering using Seurat::FindClusters() is deterministic, meaning that the exact same input will always result in the exact same output.
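The ... mechanism is ordinary R argument forwarding: any argument that the wrapper does not declare itself is collected in ... and handed to the inner function unchanged. The minimal sketch below illustrates this with two hypothetical functions, innerCluster() standing in for Seurat::FindClusters() and wrapperCluster() standing in for addClusters(). Neither is the real ArchR or Seurat code, and the n.start argument is used here only as an example of an extra parameter passed through.

```r
# Hypothetical stand-in for Seurat::FindClusters(); it only reports the
# arguments it received so the forwarding can be inspected.
innerCluster <- function(x, resolution = 0.8, n.start = 10) {
  list(n = length(x), resolution = resolution, n.start = n.start)
}

# Hypothetical stand-in for a wrapper such as addClusters(): arguments it does
# not name itself (here, n.start) are captured in `...` and forwarded as-is.
wrapperCluster <- function(input, resolution = 0.8, ...) {
  innerCluster(input, resolution = resolution, ...)
}

res <- wrapperCluster(1:100, resolution = 0.5, n.start = 20)
str(res)
```

Because n.start is not a formal argument of wrapperCluster(), it reaches innerCluster() only through ..., which is the same pattern that lets addClusters() accept FindClusters() parameters without listing each one.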
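# Graph-based (Seurat) clustering on the IterativeLSI reduced dimensions;
# higher resolution values yield more, smaller clusters
projHeme2 <- addClusters(
    input = projHeme2,
    reducedDims = "IterativeLSI",
    method = "Seurat",
    name = "Clusters",
    resolution = 0.8
)
## ArchR logging to : ArchRLogs/ArchR-addClusters-371b039f6f826-Date-2022-12-23_Time-06-09-39.log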
## If there is an issue, please report to github with logFile!
## 2022-12-23 06:09:40 : Running Seurats FindClusters (Stuart et al. Cell 2019), 0.001 mins elapsed.
## Warning: The following arguments are not used: row.names
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 10250
## Number of edges: 453996
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8665
## Number of communities: 12
## Elapsed time: 1 seconds
## 2022-12-23 06:09:53 : Testing Outlier Clusters, 0.221 mins elapsed.
## 2022-12-23 06:09:53 : Assigning Cluster Names to 12 Clusters, 0.221 mins elapsed.
## 2022-12-23 06:09:53 : Finished addClusters, 0.221 mins elapsed.
To access these clusters, we can use the $ accessor, which returns the cluster ID for each single cell.
head(projHeme2$Clusters)
## [1] "C9" "C6" "C10" "C10" "C10" "C4"
We can tabulate the number of cells present in each cluster:
table(projHeme2$Clusters)
##
## C1 C10 C11 C12 C2 C3 C4 C5 C6 C7 C8 C9
## 1553 354 315 383 1119 709 1200 1404 932 1239 614 428
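Note that table() orders the cluster names as character strings, which is why C10, C11, and C12 appear before C2 in the output above. The sketch below, which uses a small made-up vector of cluster labels rather than the real projHeme2 data, reorders the counts numerically by stripping the "C" prefix and also reports each cluster's share of cells with prop.table().

```r
# Toy cluster labels for illustration only (not the projHeme2 clusters)
clusters <- c("C9", "C6", "C10", "C10", "C2", "C1", "C2", "C10")

# table() sorts names as strings, so "C10" lands between "C1" and "C2"
counts <- table(clusters)

# Reorder by the numeric part of each name ("C10" -> 10)
ord <- order(as.integer(sub("^C", "", names(counts))))
counts <- counts[ord]

# Fraction of all cells assigned to each cluster
props <- round(prop.table(counts), 3)
counts
props
```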
To better understand which samples reside in which clusters, we can create a cluster confusion matrix across each sample using the confusionMatrix() function.
# Count cells for each cluster (rows) by sample of origin (columns)
cM <- confusionMatrix(paste0(projHeme2$Clusters), paste0(projHeme2$Sample))
cM
## 12 x 3 sparse Matrix of class "dgCMatrix"
## scATAC_BMMC_R1 scATAC_CD34_BMMC_R1 scATAC_PBMC_R1
## C9 257 5 166
## C6 330 . 602
## C10 354 . .
## C4 296 894 10
## C1 1514 16 23
## C3 170 539 .
## C7 1199 . 40
## C5 138 1266 .
## C8 84 . 530
## C11 154 151 10
## C12 87 296 .
## C2 106 1 1012
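Conceptually, each entry of this matrix is the number of cells that carry a given cluster label and come from a given sample. Assuming confusionMatrix() performs a plain cross-tabulation of counts (it returns a sparse matrix, whereas the sketch below returns a dense base R table), the same kind of summary can be built with base R. The cluster and sample labels here are hypothetical toy values.

```r
# Hypothetical per-cell labels: one cluster ID and one sample name per cell
clusters <- c("C1", "C1", "C2", "C2", "C2", "C3")
samples  <- c("BMMC", "PBMC", "PBMC", "PBMC", "CD34", "CD34")

# Cross-tabulate: rows are clusters, columns are samples, cells are counts
cM_base <- table(cluster = clusters, sample = samples)
cM_base

# Each row sum is the size of that cluster; the grand total is the cell count
rowSums(cM_base)
```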
To plot this confusion matrix as a heatmap, we use the pheatmap package:
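library(pheatmap)
# Normalize each row so that values are the fraction of a cluster's cells
# contributed by each sample
cM <- cM / Matrix::rowSums(cM)
p <- pheatmap::pheatmap(
    mat = as.matrix(cM),
    color = paletteContinuous("whiteBlue"),
    border_color = "black"
)
p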
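The row normalization works because R recycles the vector of row sums down each column, so every entry is divided by the total of its own row. The sketch below applies this to the C9 and C10 rows copied from the confusion matrix printed above (with the sparse "." entries written as 0) and checks the result against sweep(), which states the same row-wise division explicitly.

```r
# Two rows taken from the cM output above; "." (structural zero) written as 0
cM <- matrix(c(257, 5, 166,
               354, 0,   0),
             nrow = 2, byrow = TRUE,
             dimnames = list(c("C9", "C10"),
                             c("scATAC_BMMC_R1", "scATAC_CD34_BMMC_R1", "scATAC_PBMC_R1")))

# rowSums(cM) has one value per row; recycling divides row i by its own total
cMnorm <- cM / rowSums(cM)

# Equivalent, more explicit form: divide along margin 1 (rows)
cMsweep <- sweep(cM, 1, rowSums(cM), "/")

round(cMnorm, 3)
```

After normalization every row sums to 1, so the heatmap compares the sample composition of clusters regardless of cluster size; for example, C10 is drawn entirely from scATAC_BMMC_R1.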
There are times when the relative location of cells within the 2-dimensional embedding does not agree perfectly with the identified clusters. More specifically, cells from a single cluster may appear in multiple different areas of the embedding. In these contexts, it may be appropriate to adjust the clustering parameters or embedding parameters until there is agreement between the two.