13.1 Identifying Marker Peaks with ArchR

Often times, we are interested to know which peaks are unique to an individual cluster or a small group of clusters. We can do this in an unsupervised fashion in ArchR using the addMarkerFeatures() function in combination with useMatrix = "PeakMatrix".

First, lets remind ourselves of the cell types that we are working with in projHeme5 and their relative proportions.

table(projHeme5$Clusters2)
## 
##          B      CD4.M      CD4.N        CLP  Erythroid        GMP       Mono 
##        428        614       1239        383        709       1200       2672 
##         NK        pDC       PreB Progenitor 
##        932        315        354       1404

Now, we are ready to identify marker peaks by calling the addMarkerFeatures() function with useMatrix = "PeakMatrix". Additionally, we tell ArchR to account for differences in data quality amongst the cell groups by setting the bias parameter to account for TSS enrichment and the number of unique fragments per cell.

markerPeaks <- getMarkerFeatures(
    ArchRProj = projHeme5, 
    useMatrix = "PeakMatrix", 
    groupBy = "Clusters2",
  bias = c("TSSEnrichment", "log10(nFrags)"),
  testMethod = "wilcoxon"
)
## ArchR logging to : ArchRLogs/ArchR-getMarkerFeatures-371b070498788-Date-2022-12-23_Time-07-43-28.log
## If there is an issue, please report to github with logFile!
## MatrixClass = Sparse.Integer.Matrix
## 2022-12-23 07:43:30 : Matching Known Biases, 0.005 mins elapsed.
## ###########
## 2022-12-23 07:44:39 : Completed Pairwise Tests, 1.161 mins elapsed.
## ###########
## ArchR logging successful to : ArchRLogs/ArchR-getMarkerFeatures-371b070498788-Date-2022-12-23_Time-07-43-28.log

The object returned by the getMarkerFeatures() function is a SummarizedExperiment that contains a few different assays.

markerPeaks
## class: SummarizedExperiment 
## dim: 142475 11 
## metadata(2): MatchInfo Params
## assays(7): Log2FC Mean ... AUC MeanBGD
## rownames(142475): 1 2 ... 142474 142475
## rowData names(4): seqnames idx start end
## colnames(11): B CD4.M ... PreB Progenitor
## colData names(0):

We can use the getMarkers() function to retrieve particular slices of this SummarizedExperiment that we are interested in. The default behavior of this function is to return a list of DataFrame objects, one for each cell group.

markerList <- getMarkers(markerPeaks, cutOff = "FDR <= 0.01 & Log2FC >= 1")
markerList
## List of length 11
## names(11): B CD4.M CD4.N CLP Erythroid GMP Mono NK pDC PreB Progenitor

If we are interested in the marker peaks for a specific cell group, we can access this from the list via the $ accessor.

markerList$Erythroid
## DataFrame with 2913 rows and 7 columns
##        seqnames       idx     start       end    Log2FC         FDR  MeanDiff
##           <Rle> <integer> <integer> <integer> <numeric>   <numeric> <numeric>
## 88068     chr22      1269  30129823  30130323   4.03035 2.16505e-15  1.136746
## 2686       chr1      2686  27869046  27869546   6.49248 1.37776e-14  1.042588
## 45848     chr15      2979  75986645  75987145   4.57935 1.53210e-14  0.890110
## 45672     chr15      2803  74902681  74903181   8.79924 1.50478e-13  0.897953
## 123083     chr7      2986  65392660  65393160   8.84562 2.15504e-13  0.927354
## ...         ...       ...       ...       ...       ...         ...       ...
## 71569      chr2      1721  32817077  32817577   2.05638  0.00994914  0.226892
## 62970     chr18      2642  77279716  77280216   3.67312  0.00996505  0.163021
## 132301     chr8      5168 138915979 138916479   3.93326  0.00996505  0.255747
## 98007      chr3      7667 182791944 182792444   3.82732  0.00999030  0.159790
## 133672     chr9       683  27247710  27248210   3.62070  0.00999030  0.228739

Instead of a list of DataFrame objects, we can use getMarkers() to return a GRangesList object by setting returnGR = TRUE.

markerList <- getMarkers(markerPeaks, cutOff = "FDR <= 0.01 & Log2FC >= 1", returnGR = TRUE)
markerList
## List of length 11
## names(11): B CD4.M CD4.N CLP Erythroid GMP Mono NK pDC PreB Progenitor

This GRangesList object can similarly be subset to a GRanges object for a particular cell group using the $ accessor.

markerList$Erythroid
## GRanges object with 2913 ranges and 3 metadata columns:
##          seqnames              ranges strand |    Log2FC         FDR  MeanDiff
##             <Rle>           <IRanges>  <Rle> | <numeric>   <numeric> <numeric>
##      [1]    chr22   30129823-30130323      * |   4.03035 2.16505e-15  1.136746
##      [2]     chr1   27869046-27869546      * |   6.49248 1.37776e-14  1.042588
##      [3]    chr15   75986645-75987145      * |   4.57935 1.53210e-14  0.890110
##      [4]    chr15   74902681-74903181      * |   8.79924 1.50478e-13  0.897953
##      [5]     chr7   65392660-65393160      * |   8.84562 2.15504e-13  0.927354
##      ...      ...                 ...    ... .       ...         ...       ...
##   [2909]     chr2   32817077-32817577      * |   2.05638  0.00994914  0.226892
##   [2910]    chr18   77279716-77280216      * |   3.67312  0.00996505  0.163021
##   [2911]     chr8 138915979-138916479      * |   3.93326  0.00996505  0.255747
##   [2912]     chr3 182791944-182792444      * |   3.82732  0.00999030  0.159790
##   [2913]     chr9   27247710-27248210      * |   3.62070  0.00999030  0.228739
##   -------
##   seqinfo: 23 sequences from an unspecified genome; no seqlengths