Chapter 9 Gene Scores and Marker Genes with ArchR

While ArchR is able to robustly call clusters, it is not possible to know a priori which cell type is represented by each cluster. This task is often left to manual annotation because every application is different. There are, of course, new tools that help to do this annotation but these have largely been designed with scRNA-seq in mind.

To annotate cell types when only scATAC-seq data is available, we use prior knowledge of cell type-specific marker genes and estimate the expression of these genes from our chromatin accessibility data by using gene scores. A gene score is essentially a prediction of how highly expressed a gene will be, based on the accessibility of regulatory elements in the vicinity of the gene. To create these gene scores, ArchR allows for the use of complex, user-supplied, custom distance-weighted accessibility models. In addition to gene scores, we can also use module scores, which group multiple features (e.g. genes or peaks) together to create more refined cell group definitions.
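To make the idea of a distance-weighted accessibility model concrete, here is a minimal sketch in Python. It is not ArchR's implementation. The function names, the exponential decay weight, the 5 kb length scale, the 100 kb window, and the toy tile counts are all assumptions chosen for illustration. The sketch only shows the general principle: accessibility near a gene contributes more to its score than accessibility far away.

```python
import math

def distance_weight(distance_bp, scale_bp=5000.0):
    """Hypothetical weight: decays exponentially with distance from the gene."""
    return math.exp(-abs(distance_bp) / scale_bp)

def gene_score(tss, tiles, scale_bp=5000.0, window_bp=100_000):
    """Sum the accessibility counts around a gene, weighted by distance.

    tss    -- genomic position of the transcription start site (assumed anchor)
    tiles  -- list of (tile_center, insertion_count) pairs for a single cell
    """
    score = 0.0
    for center, count in tiles:
        distance = center - tss
        # Ignore accessibility outside the window considered for this gene.
        if abs(distance) <= window_bp:
            score += count * distance_weight(distance, scale_bp)
    return score

# Toy data: one tile at the TSS, one 5 kb away, one 500 kb away (excluded).
tiles = [(1_000_000, 4), (1_005_000, 2), (1_500_000, 10)]
score = gene_score(1_000_000, tiles)
```

In this toy example the distant tile carries the most insertions but contributes nothing to the score, while the tile at the TSS contributes at full weight. A real model would also need to handle neighboring gene boundaries, gene length, and normalization across cells, none of which this sketch attempts.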