9.6 Module Scores
The reliability of gene scores derived from chromatin accessibility is highly dependent on the particular gene being queried. For example, genes that lie in highly gene-dense regions of the genome can be harder to represent accurately with gene scores, because the surrounding peaks may not actually regulate that gene's expression. For this reason, it is often best to combine gene scores from multiple genes when, for example, assigning cluster identities. To facilitate this, ArchR supports “modules”, which are user-defined groups of features whose individual signals are combined with equal weights into a single module score. The per-cell values for each user-defined module are stored as a new column in the cellColData of the given ArchRProject.
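To make the idea of an equal-weight module score concrete, the following base-R sketch averages the rows of a small genes-by-cells matrix for each named feature set, yielding one score per cell per module. The function moduleScoreSketch() and the toy matrix are hypothetical illustrations, not part of ArchR; addModuleScore() itself may apply additional normalization that is not shown here.

```r
# Hypothetical sketch, not ArchR's implementation: combine each module's
# genes with equal weights (a plain per-cell mean) into one score per cell.
moduleScoreSketch <- function(mat, features) {
  # mat: numeric matrix with genes as rows and cells as columns
  # features: named list of character vectors of gene names
  scores <- lapply(features, function(genes) {
    genes <- intersect(genes, rownames(mat))  # ignore genes absent from mat
    if (length(genes) == 0) stop("no module genes found in the matrix")
    colMeans(mat[genes, , drop = FALSE])      # each gene weighted 1 / n
  })
  # one column per module, one row per cell
  data.frame(scores, row.names = colnames(mat))
}

# Toy genes x cells matrix (arbitrary values, for illustration only)
genes <- c("MS4A1", "CD79A", "CD74", "CD3D", "CD8A")
mat <- matrix(1:15, nrow = 5,
              dimnames = list(genes, paste0("cell", 1:3)))

features <- list(
  BScore = c("MS4A1", "CD79A", "CD74"),
  TScore = c("CD3D", "CD8A", "GZMB", "CCR7", "LEF1")
)
scores <- moduleScoreSketch(mat, features)
# BScore per cell: 2, 7, 12
# TScore per cell (only CD3D and CD8A are present in mat): 4.5, 9.5, 14.5
```

Silently dropping missing genes mirrors a common practical choice, but it means a module's effective size can shrink without warning, so it is worth checking which genes were actually found.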
Module scores are added using the addModuleScore() function. The key parameters for this function are features and name. features is a named list of character vectors containing gene names, while name is the broad title that will be given to each column added to cellColData. The useMatrix parameter specifies which matrix the feature values are taken from.
features <- list(
  BScore = c("MS4A1", "CD79A", "CD74"),
  TScore = c("CD3D", "CD8A", "GZMB", "CCR7", "LEF1")
)
projHeme2 <- addModuleScore(projHeme2,
  useMatrix = "GeneScoreMatrix",
  name = "Module",
  features = features)
## ArchR logging to : ArchRLogs/ArchR-addModuleScore-371b032779696-Date-2022-12-23_Time-06-33-09.log
## If there is an issue, please report to github with logFile!
## 2022-12-23 06:33:13 : Computing Module 1 of 2, 0.048 mins elapsed.
## 2022-12-23 06:33:18 : Computing Module 2 of 2, 0.14 mins elapsed.
## 2022-12-23 06:33:24 : Finished Running addModuleScore, 0.231 mins elapsed.
In this example, two columns will be added to cellColData, named “Module.BScore” and “Module.TScore”, because each column name is derived as name.featureName. If your feature list is unnamed, the columns in cellColData will be numbered instead (i.e. “Module1” and “Module2” with name="Module" as above). The “BScore” module contains 3 marker genes of B cells (MS4A1, CD79A, and CD74), while the “TScore” module contains 5 marker genes of T cells (CD3D, CD8A, GZMB, CCR7, and LEF1).
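The column-naming rule described above can be expressed as a small helper. moduleColumnNames() is a hypothetical function written only to illustrate the convention; it is not part of ArchR, and it does not handle partially named lists.

```r
# Hypothetical helper (not an ArchR function) reproducing the naming rule:
# named feature list -> "name.featureName"; unnamed list -> "name1", "name2", ...
moduleColumnNames <- function(name, features) {
  if (is.null(names(features))) {
    paste0(name, seq_along(features))   # unnamed list: numbered columns
  } else {
    paste0(name, ".", names(features))  # named list: name.featureName
  }
}

moduleColumnNames("Module", list(BScore = "MS4A1", TScore = "CD3D"))
# [1] "Module.BScore" "Module.TScore"
moduleColumnNames("Module", list("MS4A1", "CD3D"))
# [1] "Module1" "Module2"
```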
We can then plot these modules on an embedding.
p1 <- plotEmbedding(projHeme2,
  embedding = "UMAP",
  colorBy = "cellColData",
  name = "Module.BScore",
  imputeWeights = getImputeWeights(projHeme2))
## Getting ImputeWeights
## ArchR logging to : ArchRLogs/ArchR-plotEmbedding-371b0bd75a3b-Date-2022-12-23_Time-06-33-24.log
## If there is an issue, please report to github with logFile!
## Getting UMAP Embedding
## ColorBy = cellColData
## Imputing Matrix
## Using weights on disk
## Using weights on disk
## Plotting Embedding
## 1
## ArchR logging successful to : ArchRLogs/ArchR-plotEmbedding-371b0bd75a3b-Date-2022-12-23_Time-06-33-24.log
p2 <- plotEmbedding(projHeme2,
  embedding = "UMAP",
  colorBy = "cellColData",
  name = "Module.TScore",
  imputeWeights = getImputeWeights(projHeme2))
## Getting ImputeWeights
## ArchR logging to : ArchRLogs/ArchR-plotEmbedding-371b02e3541db-Date-2022-12-23_Time-06-33-29.log
## If there is an issue, please report to github with logFile!
## Getting UMAP Embedding
## ColorBy = cellColData
## Imputing Matrix
## Using weights on disk
## Using weights on disk
## Plotting Embedding
## 1
## ArchR logging successful to : ArchRLogs/ArchR-plotEmbedding-371b02e3541db-Date-2022-12-23_Time-06-33-29.log
plotPDF(ggAlignPlots(p1, p2, draw = FALSE, type = "h"))
## Plotting Gtable!
## NULL
Module scores can be calculated from any matrix present within your ArchRProject by setting useMatrix accordingly.