13.1 Identifying Marker Peaks with ArchR

Often times, we are interested to know which peaks are unique to an individual cluster or a small group of clusters. We can do this in an unsupervised fashion in ArchR using the addMarkerFeatures() function in combination with useMatrix = "PeakMatrix".

First, lets remind ourselves of the cell types that we are working with in projHeme5 and their relative proportions.

table(projHeme5$Clusters2)
## 
##          B      CD4.M      CD4.N        CLP  Erythroid        GMP       Mono 
##        421        633       1250        386        702       1261       2652 
##         NK        pDC       PreB Progenitor 
##        903        314        351       1377

Now, we are ready to identify marker peaks by calling the addMarkerFeatures() function with useMatrix = "PeakMatrix". Additionally, we tell ArchR to account for differences in data quality amongst the cell groups by setting the bias parameter to account for TSS enrichment and the number of unique fragments per cell.

markerPeaks <- getMarkerFeatures(
    ArchRProj = projHeme5, 
    useMatrix = "PeakMatrix", 
    groupBy = "Clusters2",
  bias = c("TSSEnrichment", "log10(nFrags)"),
  testMethod = "wilcoxon"
)
## ArchR logging to : ArchRLogs/ArchR-getMarkerFeatures-96be4b233-Date-2025-02-06_Time-02-03-45.434572.log
## If there is an issue, please report to github with logFile!
## MatrixClass = Sparse.Integer.Matrix
## 2025-02-06 02:03:46.394397 : Matching Known Biases, 0.005 mins elapsed.
## ###########
## 2025-02-06 02:05:55.075573 : Completed Pairwise Tests, 2.15 mins elapsed.
## ###########
## ArchR logging successful to : ArchRLogs/ArchR-getMarkerFeatures-96be4b233-Date-2025-02-06_Time-02-03-45.434572.log

The object returned by the getMarkerFeatures() function is a SummarizedExperiment that contains a few different assays.

markerPeaks
## class: SummarizedExperiment 
## dim: 147407 11 
## metadata(2): MatchInfo Params
## assays(7): Log2FC Mean ... AUC MeanBGD
## rownames(147407): 1 2 ... 147406 147407
## rowData names(4): seqnames idx start end
## colnames(11): B CD4.M ... PreB Progenitor
## colData names(0):

We can use the getMarkers() function to retrieve particular slices of this SummarizedExperiment that we are interested in. The default behavior of this function is to return a list of DataFrame objects, one for each cell group.

markerList <- getMarkers(markerPeaks, cutOff = "FDR <= 0.01 & Log2FC >= 1")
markerList
## List of length 11
## names(11): B CD4.M CD4.N CLP Erythroid GMP Mono NK pDC PreB Progenitor

If we are interested in the marker peaks for a specific cell group, we can access this from the list via the $ accessor.

markerList$Erythroid
## DataFrame with 2871 rows and 7 columns
##        seqnames       idx     start       end    Log2FC         FDR  MeanDiff
##           <Rle> <integer> <integer> <integer> <numeric>   <numeric> <numeric>
## 91169     chr22      1297  30129829  30130329   4.56041 3.24640e-16  1.090454
## 45636     chr15      1214  50552646  50553146   4.33461 6.58297e-14  0.882263
## 127421     chr7      3093  65392660  65393160   8.98553 3.73382e-13  1.017888
## 8954       chr1      8954 156864638 156865138   6.58465 3.83815e-13  0.978842
## 7130       chr1      7130 110407020 110407520   4.23204 4.88410e-13  0.859836
## ...         ...       ...       ...       ...       ...         ...       ...
## 139717     chr9      2044  86957967  86958467   6.22430   0.0099907  0.148421
## 141515     chr9      3842 119117633 119118133   6.59935   0.0099907  0.193083
## 143169     chr9      5496 136544169 136544669   6.08502   0.0099907  0.134577
## 146831     chrX      2966 133937841 133938341   6.17985   0.0099907  0.143857
## 146835     chrX      2970 133999808 134000308   6.15916   0.0099907  0.141780

Instead of a list of DataFrame objects, we can use getMarkers() to return a GRangesList object by setting returnGR = TRUE.

markerList <- getMarkers(markerPeaks, cutOff = "FDR <= 0.01 & Log2FC >= 1", returnGR = TRUE)
markerList
## List of length 11
## names(11): B CD4.M CD4.N CLP Erythroid GMP Mono NK pDC PreB Progenitor

This GRangesList object can similarly be subset to a GRanges object for a particular cell group using the $ accessor.

markerList$Erythroid
## GRanges object with 2871 ranges and 3 metadata columns:
##          seqnames              ranges strand |    Log2FC         FDR  MeanDiff
##             <Rle>           <IRanges>  <Rle> | <numeric>   <numeric> <numeric>
##      [1]    chr22   30129829-30130329      * |   4.56041 3.24640e-16  1.090454
##      [2]    chr15   50552646-50553146      * |   4.33461 6.58297e-14  0.882263
##      [3]     chr7   65392660-65393160      * |   8.98553 3.73382e-13  1.017888
##      [4]     chr1 156864638-156865138      * |   6.58465 3.83815e-13  0.978842
##      [5]     chr1 110407020-110407520      * |   4.23204 4.88410e-13  0.859836
##      ...      ...                 ...    ... .       ...         ...       ...
##   [2867]     chr9   86957967-86958467      * |   6.22430   0.0099907  0.148421
##   [2868]     chr9 119117633-119118133      * |   6.59935   0.0099907  0.193083
##   [2869]     chr9 136544169-136544669      * |   6.08502   0.0099907  0.134577
##   [2870]     chrX 133937841-133938341      * |   6.17985   0.0099907  0.143857
##   [2871]     chrX 133999808-134000308      * |   6.15916   0.0099907  0.141780
##   -------
##   seqinfo: 23 sequences from an unspecified genome; no seqlengths