12.3 Calling Peaks w/ TileMatrix

As mentioned previously, ArchR also implements its own native peak caller but we do not recommend using it. While we have benchmarked this peak caller against MACS2 and note very similar performances, we do not recommend using this native peak caller unless absolutely necessary. It is no longer actively supported in the ArchR code base.

The ArchR native peak caller calls peaks on the 500-bp TileMatrix and we indicate to addReproduciblePeakSet() that we want to use this peak caller via the peakMethod parameter. Note that we are not storing the output into the projHeme4 object because we do not intend to keep this peak set and this analysis is only for illustrative purposes. Storage into the ArchRProject object would overwrite the previous reproducible peak set already stored in projHeme4.

projHemeTmp <- addReproduciblePeakSet(
    ArchRProj = projHeme4, 
    groupBy = "Clusters2",
    peakMethod = "Tiles",
    method = "p"
)
## ArchR logging to : ArchRLogs/ArchR-addReproduciblePeakSet-371b013a62387-Date-2022-12-23_Time-07-32-46.log
## If there is an issue, please report to github with logFile!
## Calling Peaks with TileMatrix. We recommend using the Macs2 Version.
## This method is still under development.
## Group Coverages Already Computed Returning Groups, Set force = TRUE to Recompute!
## List of length 11
## names(11): B CD4.M CD4.N CLP Erythroid GMP Mono NK pDC PreB Progenitor
## 2022-12-23 07:32:48 : Peak Calling Parameters!, 0.019 mins elapsed.
##                 Group nCells nCellsUsed nReplicates nMin nMax maxPeaks
## B                   B    428        423           2  166  257   150000
## CD4.M           CD4.M    614        584           2   84  500   150000
## CD4.N           CD4.N   1239        540           2   40  500   150000
## CLP               CLP    383        383           2   87  296   150000
## Erythroid   Erythroid    709        670           2  170  500   150000
## GMP               GMP   1200        796           2  296  500   150000
## Mono             Mono   2672       1000           2  500  500   150000
## NK                 NK    932        830           2  330  500   150000
## pDC               pDC    315        305           2  151  154   150000
## PreB             PreB    354        354           2   40  314   150000
## Progenitor Progenitor   1404        638           2  138  500   150000
## 2022-12-23 07:32:53 : Computing Total Accessibility Across All Features, 0.115 mins elapsed.
## 2022-12-23 07:32:56 : Computing Pseudo-Grouped Tile Matrix, 0.165 mins elapsed.
## 2022-12-23 07:33:20 : Created Pseudo-Grouped Tile Matrix (0.47 GB), 0.567 mins elapsed.
## colSums = 842305colSums = 708202colSums = 2919862colSums = 330893colSums = 1388486colSums = 129196colSums = 1084145colSums = 269343colSums = 2379311colSums = 666994colSums = 2190898colSums = 2114705colSums = 2031077colSums = 2497821colSums = 3576261colSums = 1227950colSums = 740585colSums = 658103colSums = 1301588colSums = 136583colSums = 2097209colSums = 411403
## nTiles = 6072620
## Expectation = 0.138705369346345Expectation = 0.116622149912229Expectation = 0.480824092401632Expectation = 0.054489330799556Expectation = 0.228646943164565Expectation = 0.0212751662379665Expectation = 0.178530024931578Expectation = 0.0443536727145779Expectation = 0.391809630768927Expectation = 0.109836281539105Expectation = 0.360782989879163Expectation = 0.348236016744008Expectation = 0.334464695633845Expectation = 0.411325095263659Expectation = 0.588915657492153Expectation = 0.202210907318423Expectation = 0.121954774051398Expectation = 0.108372168849689Expectation = 0.214337139488392Expectation = 0.0224916098817314Expectation = 0.345354888005507Expectation = 0.0677471997259832
## 2022-12-23 07:33:39 : Creating Group Peak Sets with Annotations!, 0.885 mins elapsed.
## 2022-12-23 07:33:51 : Creating Union Peak Set with Annotations!, 1.069 mins elapsed.
## Annotating Peaks : Nearest Gene
## Annotating Peaks : Gene
## Annotating Peaks : TSS
## Annotating Peaks : GC
## Plotting Ggplot!
## 2022-12-23 07:34:04 : Finished Creating Union Peak Set (261665)!, 1.289 mins elapsed.

We can similarly examine this merged peak set.

getPeakSet(projHemeTmp)
## GRanges object with 261665 ranges and 10 metadata columns:
##            seqnames              ranges strand |   mlog10p      Group
##               <Rle>           <IRanges>  <Rle> | <numeric>      <Rle>
##        [1]     chr1       752500-752999      * |     7.467       Mono
##        [2]     chr1       758500-758999      * |     0.897         NK
##        [3]     chr1       762000-762499      * |     1.954         NK
##        [4]     chr1       762500-762999      * |    24.668         NK
##        [5]     chr1       763000-763499      * |     7.089         NK
##        ...      ...                 ...    ... .       ...        ...
##   [261661]     chrX 154842500-154842999      * |     5.862       PreB
##   [261662]     chrX 154862000-154862499      * |     1.277          B
##   [261663]     chrX 154862500-154862999      * |     0.793         NK
##   [261664]     chrX 154996000-154996499      * |     4.028       PreB
##   [261665]     chrX 154997000-154997499      * |     1.815 Progenitor
##            distToGeneStart nearestGene    peakType distToTSS  nearestTSS
##                  <integer> <character> <character> <integer> <character>
##        [1]           10152   LINC00115      Distal     10152  uc010nxx.2
##        [2]            4152   LINC00115      Distal      4152  uc010nxx.2
##        [3]             652   LINC00115    Promoter       652  uc010nxx.2
##        [4]             152   LINC00115    Promoter       152  uc010nxx.2
##        [5]             277   LINC01128    Promoter        70  uc031pjk.1
##        ...             ...         ...         ...       ...         ...
##   [261661]             126       TMLHE    Promoter       126  uc004fnn.3
##   [261662]           19626       TMLHE      Distal     19626  uc004fnn.3
##   [261663]           20126       TMLHE      Distal     20126  uc004fnn.3
##   [261664]          153626       TMLHE      Distal      1201  uc004fnq.1
##   [261665]          154626       TMLHE      Distal       201  uc004fnq.1
##                   GC       idx         N
##            <numeric> <integer> <numeric>
##        [1]     0.484         1         0
##        [2]     0.560         2         0
##        [3]     0.574         3         0
##        [4]     0.684         4         0
##        [5]     0.704         5         0
##        ...       ...       ...       ...
##   [261661]     0.554      6252         0
##   [261662]     0.430      6253         0
##   [261663]     0.420      6254         0
##   [261664]     0.392      6255         0
##   [261665]     0.542      6256         0
##   -------
##   seqinfo: 23 sequences from an unspecified genome; no seqlengths

12.3.1 Comparing the two peak calling methods

To compare the merged peak set generated using MACS2 vs the merged peak set generated using the ArchR native TileMatrix peak caller, we can check the perfecent of overlapping peaks etc.

First, we check the percent of MACS2-called peaks that are overlapped by the TileMatrix-called peaks.

length(subsetByOverlaps(getPeakSet(projHeme4), getPeakSet(projHemeTmp))) / length(getPeakSet(projHeme4))
## [1] 0.958147

Then, we check the converse - the percent of TileMatrix-called peaks that are overlapped by MACS2-called peaks. You can see that this overlap is not as strong.

length(subsetByOverlaps(getPeakSet(projHemeTmp), getPeakSet(projHeme4))) / length(getPeakSet(projHemeTmp))
## [1] 0.7689183

If we increase the margins of the peaks to be wider (1000-bp peaks instead of 500-bp peaks), the percent of MACS2-called peaks that are overlapped does not change much.

length(subsetByOverlaps(resize(getPeakSet(projHeme4), 1000, "center"), getPeakSet(projHemeTmp))) / length(getPeakSet(projHeme4))
## [1] 0.9636006

But the percent of TileMatrix-called peaks overlapped by MACS2 does increase.

length(subsetByOverlaps(getPeakSet(projHemeTmp), resize(getPeakSet(projHeme4), 1000, "center"))) / length(getPeakSet(projHemeTmp))
## [1] 0.8446296