9.6 Module Scores

The reliability of gene scores derived from chromatin accessibility depends heavily on the gene being queried. For example, genes in highly gene-dense regions of the genome can be harder to represent accurately with gene scores because the surrounding peaks may not be regulating that gene's expression. For this reason, it is often best to combine gene scores from several genes when, for example, assigning cluster identities. To facilitate this, ArchR supports “modules”: user-defined groups of features whose individual signals are combined with equal weights into a single module score. The per-cell values for each user-defined module are stored as a new column in the cellColData of the given ArchRProject.
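To make the idea of equal weighting concrete, here is a minimal base-R sketch that averages the scores of several genes per cell. It is an illustration only and not ArchR's implementation: the toy matrix values and the helper name equalWeightScore are assumptions made up for this example, and the actual module score calculation in ArchR may include additional steps.

```r
# Toy gene-score matrix: rows are genes, columns are cells.
# All values are invented for illustration.
geneScores <- matrix(
  c(2, 0, 4,
    4, 2, 0,
    0, 1, 2,
    6, 3, 3),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("MS4A1", "CD79A", "CD74", "CD3D"),
    c("cell1", "cell2", "cell3")
  )
)

# Hypothetical helper (not part of ArchR): average the requested genes
# per cell, giving every gene the same weight.
equalWeightScore <- function(mat, genes) {
  absent <- setdiff(genes, rownames(mat))
  if (length(absent) > 0) {
    stop("Genes not found in matrix: ", paste(absent, collapse = ", "))
  }
  colMeans(mat[genes, , drop = FALSE])
}

# One combined score per cell for a three-gene B cell module.
bScore <- equalWeightScore(geneScores, c("MS4A1", "CD79A", "CD74"))
print(bScore)
```

Averaging across several genes means that a single unreliable gene score shifts the combined value less than it would if that gene were used alone.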

Module scores are added using the addModuleScore() function. The key parameters for this function are features and name: features is a named list of character vectors containing gene names, while name is the prefix given to each column added to cellColData.

features <- list(
  BScore = c("MS4A1", "CD79A", "CD74"),
  TScore = c("CD3D", "CD8A", "GZMB", "CCR7", "LEF1")
)
projHeme2 <- addModuleScore(projHeme2,
    useMatrix = "GeneScoreMatrix",
    name = "Module",
    features = features)
## ArchR logging to : ArchRLogs/ArchR-addModuleScore-371b032779696-Date-2022-12-23_Time-06-33-09.log
## If there is an issue, please report to github with logFile!
## 2022-12-23 06:33:13 : Computing Module 1 of 2, 0.048 mins elapsed.
## 2022-12-23 06:33:18 : Computing Module 2 of 2, 0.14 mins elapsed.
## 2022-12-23 06:33:24 : Finished Running addModuleScore, 0.231 mins elapsed.

In this example, two columns will be added to cellColData named “Module.BScore” and “Module.TScore”. This is because the column name is derived as name.featureName. If your feature list is unnamed, then the columns in cellColData will be numbered instead (i.e. “Module1” and “Module2” if name="Module" as above). The “BScore” module contains 3 gene markers of B cells (MS4A1, CD79A, and CD74) while the “TScore” module contains 5 gene markers for T cells (CD3D, CD8A, GZMB, CCR7, and LEF1).
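The naming rule described above can be sketched in a few lines of base R. The helper moduleColumnNames is hypothetical and written only for this illustration; it mirrors the behavior described in the text rather than reproducing ArchR's internal code, and it does not handle partially named lists.

```r
# Hypothetical helper (not part of ArchR): derive the column names that
# the text describes for a given feature list and name prefix.
moduleColumnNames <- function(features, name = "Module") {
  featureNames <- names(features)
  if (is.null(featureNames)) {
    # Unnamed list: columns are numbered, e.g. "Module1", "Module2"
    paste0(name, seq_along(features))
  } else {
    # Named list: columns follow the name.featureName pattern
    paste0(name, ".", featureNames)
  }
}

features <- list(
  BScore = c("MS4A1", "CD79A", "CD74"),
  TScore = c("CD3D", "CD8A", "GZMB", "CCR7", "LEF1")
)

namedCols <- moduleColumnNames(features)
unnamedCols <- moduleColumnNames(unname(features))
print(namedCols)
print(unnamedCols)
```

Naming the list entries is usually worthwhile, since numbered columns make it harder to tell later which module a column refers to.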

We can then plot these modules on an embedding.

p1 <- plotEmbedding(projHeme2,
    embedding = "UMAP",
    colorBy = "cellColData",
    name="Module.BScore",
    imputeWeights = getImputeWeights(projHeme2))
## Getting ImputeWeights
## ArchR logging to : ArchRLogs/ArchR-plotEmbedding-371b0bd75a3b-Date-2022-12-23_Time-06-33-24.log
## If there is an issue, please report to github with logFile!
## Getting UMAP Embedding
## ColorBy = cellColData
## Imputing Matrix
## Using weights on disk
## Using weights on disk
## Plotting Embedding
## 1 
## ArchR logging successful to : ArchRLogs/ArchR-plotEmbedding-371b0bd75a3b-Date-2022-12-23_Time-06-33-24.log

p2 <- plotEmbedding(projHeme2,
    embedding = "UMAP",
    colorBy = "cellColData",
    name="Module.TScore",
    imputeWeights = getImputeWeights(projHeme2))
## Getting ImputeWeights
## ArchR logging to : ArchRLogs/ArchR-plotEmbedding-371b02e3541db-Date-2022-12-23_Time-06-33-29.log
## If there is an issue, please report to github with logFile!
## Getting UMAP Embedding
## ColorBy = cellColData
## Imputing Matrix
## Using weights on disk
## Using weights on disk
## Plotting Embedding
## 1 
## ArchR logging successful to : ArchRLogs/ArchR-plotEmbedding-371b02e3541db-Date-2022-12-23_Time-06-33-29.log

plotPDF(ggAlignPlots(p1, p2, draw = FALSE, type = "h"))
## Plotting Gtable!
## NULL

Module scores can be calculated from any matrix present within your ArchRProject by passing its name to useMatrix; the entries in features must then name features of that matrix.