12.3 Calling Peaks w/ TileMatrix

As mentioned previously, ArchR also implements its own native peak caller but we do not recommend using it. While we have benchmarked this peak caller against MACS2 and note very similar performances, we do not recommend using this native peak caller unless absolutely necessary. It is no longer actively supported in the ArchR code base.

The ArchR native peak caller calls peaks on the 500-bp TileMatrix and we indicate to addReproduciblePeakSet() that we want to use this peak caller via the peakMethod parameter. Note that we are not storing the output into the projHeme4 object because we do not intend to keep this peak set and this analysis is only for illustrative purposes. Storage into the ArchRProject object would overwrite the previous reproducible peak set already stored in projHeme4.

projHemeTmp <- addReproduciblePeakSet(
    ArchRProj = projHeme4, 
    groupBy = "Clusters2",
    peakMethod = "Tiles",
    method = "p"
)
## ArchR logging to : ArchRLogs/ArchR-addReproduciblePeakSet-9286e8c5f-Date-2025-02-06_Time-01-58-48.55566.log
## If there is an issue, please report to github with logFile!
## Calling Peaks with TileMatrix. We recommend using the Macs2 Version.
## This method is still under development.
## Group Coverages Already Computed Returning Groups, Set force = TRUE to Recompute!
## List of length 11
## names(11): B CD4.M CD4.N CLP Erythroid GMP Mono NK pDC PreB Progenitor
## 2025-02-06 01:58:49.348169 : Peak Calling Parameters!, 0.013 mins elapsed.
##                 Group nCells nCellsUsed nReplicates nMin nMax maxPeaks
## B                   B    421        416           2  162  254   150000
## CD4.M           CD4.M    633        586           2   86  500   150000
## CD4.N           CD4.N   1250        548           2   48  500   150000
## CLP               CLP    386        386           2   88  298   150000
## Erythroid   Erythroid    702        671           2  171  500   150000
## GMP               GMP   1261        810           2  310  500   150000
## Mono             Mono   2652       1000           2  500  500   150000
## NK                 NK    903        822           2  322  500   150000
## pDC               pDC    314        304           2  144  160   150000
## PreB             PreB    351        351           2   40  311   150000
## Progenitor Progenitor   1377        639           2  139  500   150000
## 2025-02-06 01:58:53.23755 : Computing Total Accessibility Across All Features, 0.078 mins elapsed.
## 2025-02-06 01:58:56.607124 : Computing Pseudo-Grouped Tile Matrix, 0.134 mins elapsed.
## 2025-02-06 01:59:26.047958 : Created Pseudo-Grouped Tile Matrix (0.471 GB), 0.625 mins elapsed.
## colSums = 825382colSums = 660560colSums = 3022271colSums = 323151colSums = 1406092colSums = 153415colSums = 1096638colSums = 276220colSums = 2290759colSums = 670817colSums = 2221092colSums = 2253776colSums = 2111952colSums = 2471475colSums = 3602112colSums = 1148728colSums = 772246colSums = 640088colSums = 1295679colSums = 131806colSums = 2058797colSums = 430702
## nTiles = 6072620
## Expectation = 0.13591859856207Expectation = 0.10877677180525Expectation = 0.497688147784646Expectation = 0.0532144280392977Expectation = 0.231546185995501Expectation = 0.0252633953713554Expectation = 0.180587291811442Expectation = 0.0454861328388735Expectation = 0.377227457011965Expectation = 0.11046582858799Expectation = 0.365755143578884Expectation = 0.371137334461896Expectation = 0.347782670412441Expectation = 0.406986605452012Expectation = 0.593172633887844Expectation = 0.189165137947048Expectation = 0.127168503874769Expectation = 0.10540557452961Expectation = 0.213364083377521Expectation = 0.0217049642493685Expectation = 0.339029446927356Expectation = 0.0709252349068442
## 2025-02-06 01:59:47.694666 : Creating Group Peak Sets with Annotations!, 0.986 mins elapsed.
## 2025-02-06 01:59:55.344699 : Creating Union Peak Set with Annotations!, 1.113 mins elapsed.
## Annotating Peaks : Nearest Gene
## Annotating Peaks : Gene
## Annotating Peaks : TSS
## Annotating Peaks : GC
## Plotting Ggplot!
## 2025-02-06 02:00:09.962552 : Finished Creating Union Peak Set (263147)!, 1.357 mins elapsed.

We can similarly examine this merged peak set.

getPeakSet(projHemeTmp)
## GRanges object with 263147 ranges and 10 metadata columns:
##            seqnames              ranges strand |   mlog10p      Group
##               <Rle>           <IRanges>  <Rle> | <numeric>      <Rle>
##        [1]     chr1       752500-752999      * |     6.311       Mono
##        [2]     chr1       757500-757999      * |     3.426      CD4.N
##        [3]     chr1       758500-758999      * |     1.177         NK
##        [4]     chr1       762000-762499      * |     1.479         NK
##        [5]     chr1       762500-762999      * |    25.802         NK
##        ...      ...                 ...    ... .       ...        ...
##   [263143]     chrX 154842000-154842499      * |     6.554        CLP
##   [263144]     chrX 154842500-154842999      * |     5.993       PreB
##   [263145]     chrX 154862000-154862499      * |     3.312 Progenitor
##   [263146]     chrX 154862500-154862999      * |     1.262         NK
##   [263147]     chrX 154997000-154997499      * |     2.034 Progenitor
##            distToGeneStart nearestGene    peakType distToTSS  nearestTSS
##                  <integer> <character> <character> <integer> <character>
##        [1]           10152   LINC00115      Distal     10152  uc010nxx.2
##        [2]            5152   LINC00115      Distal      5152  uc010nxx.2
##        [3]            4152   LINC00115      Distal      4152  uc010nxx.2
##        [4]             652   LINC00115    Promoter       652  uc010nxx.2
##        [5]             152   LINC00115    Promoter       152  uc010nxx.2
##        ...             ...         ...         ...       ...         ...
##   [263143]             372       TMLHE    Intronic       372  uc004fnn.3
##   [263144]             126       TMLHE    Promoter       126  uc004fnn.3
##   [263145]           19626       TMLHE      Distal     19626  uc004fnn.3
##   [263146]           20126       TMLHE      Distal     20126  uc004fnn.3
##   [263147]          154626       TMLHE      Distal       201  uc004fnq.1
##                   GC       idx         N
##            <numeric> <integer> <numeric>
##        [1]     0.484         1         0
##        [2]     0.552         2         0
##        [3]     0.560         3         0
##        [4]     0.574         4         0
##        [5]     0.684         5         0
##        ...       ...       ...       ...
##   [263143]     0.612      6175         0
##   [263144]     0.554      6176         0
##   [263145]     0.430      6177         0
##   [263146]     0.420      6178         0
##   [263147]     0.542      6179         0
##   -------
##   seqinfo: 23 sequences from an unspecified genome; no seqlengths

12.3.1 Comparing the two peak calling methods

To compare the merged peak set generated using MACS2 vs the merged peak set generated using the ArchR native TileMatrix peak caller, we can check the perfecent of overlapping peaks etc.

First, we check the percent of MACS2-called peaks that are overlapped by the TileMatrix-called peaks.

length(subsetByOverlaps(getPeakSet(projHeme4), getPeakSet(projHemeTmp))) / length(getPeakSet(projHeme4))
## [1] 0.9565828

Then, we check the converse - the percent of TileMatrix-called peaks that are overlapped by MACS2-called peaks. You can see that this overlap is not as strong.

length(subsetByOverlaps(getPeakSet(projHemeTmp), getPeakSet(projHeme4))) / length(getPeakSet(projHemeTmp))
## [1] 0.7852683

If we increase the margins of the peaks to be wider (1000-bp peaks instead of 500-bp peaks), the percent of MACS2-called peaks that are overlapped does not change much.

length(subsetByOverlaps(resize(getPeakSet(projHeme4), 1000, "center"), getPeakSet(projHemeTmp))) / length(getPeakSet(projHeme4))
## [1] 0.9619421

But the percent of TileMatrix-called peaks overlapped by MACS2 does increase.

length(subsetByOverlaps(getPeakSet(projHemeTmp), resize(getPeakSet(projHeme4), 1000, "center"))) / length(getPeakSet(projHemeTmp))
## [1] 0.8581781