12.3 Calling Peaks w/ TileMatrix
As mentioned previously, ArchR also implements its own native peak caller but we do not recommend using it. While we have benchmarked this peak caller against MACS2 and note very similar performances, we do not recommend using this native peak caller unless absolutely necessary. It is no longer actively supported in the ArchR code base.
The ArchR native peak caller calls peaks on the 500-bp TileMatrix
and we indicate to addReproduciblePeakSet()
that we want to use this peak caller via the peakMethod
parameter. Note that we are not storing the output into the projHeme4
object because we do not intend to keep this peak set and this analysis is only for illustrative purposes. Storage into the ArchRProject
object would overwrite the previous reproducible peak set already stored in projHeme4
.
<- addReproduciblePeakSet(
projHemeTmp ArchRProj = projHeme4,
groupBy = "Clusters2",
peakMethod = "Tiles",
method = "p"
)## ArchR logging to : ArchRLogs/ArchR-addReproduciblePeakSet-371b013a62387-Date-2022-12-23_Time-07-32-46.log
## If there is an issue, please report to github with logFile!
## Calling Peaks with TileMatrix. We recommend using the Macs2 Version.
## This method is still under development.
## Group Coverages Already Computed Returning Groups, Set force = TRUE to Recompute!
## List of length 11
## names(11): B CD4.M CD4.N CLP Erythroid GMP Mono NK pDC PreB Progenitor
## 2022-12-23 07:32:48 : Peak Calling Parameters!, 0.019 mins elapsed.
## Group nCells nCellsUsed nReplicates nMin nMax maxPeaks
## B B 428 423 2 166 257 150000
## CD4.M CD4.M 614 584 2 84 500 150000
## CD4.N CD4.N 1239 540 2 40 500 150000
## CLP CLP 383 383 2 87 296 150000
## Erythroid Erythroid 709 670 2 170 500 150000
## GMP GMP 1200 796 2 296 500 150000
## Mono Mono 2672 1000 2 500 500 150000
## NK NK 932 830 2 330 500 150000
## pDC pDC 315 305 2 151 154 150000
## PreB PreB 354 354 2 40 314 150000
## Progenitor Progenitor 1404 638 2 138 500 150000
## 2022-12-23 07:32:53 : Computing Total Accessibility Across All Features, 0.115 mins elapsed.
## 2022-12-23 07:32:56 : Computing Pseudo-Grouped Tile Matrix, 0.165 mins elapsed.
## 2022-12-23 07:33:20 : Created Pseudo-Grouped Tile Matrix (0.47 GB), 0.567 mins elapsed.
## colSums = 842305colSums = 708202colSums = 2919862colSums = 330893colSums = 1388486colSums = 129196colSums = 1084145colSums = 269343colSums = 2379311colSums = 666994colSums = 2190898colSums = 2114705colSums = 2031077colSums = 2497821colSums = 3576261colSums = 1227950colSums = 740585colSums = 658103colSums = 1301588colSums = 136583colSums = 2097209colSums = 411403
## nTiles = 6072620
## Expectation = 0.138705369346345Expectation = 0.116622149912229Expectation = 0.480824092401632Expectation = 0.054489330799556Expectation = 0.228646943164565Expectation = 0.0212751662379665Expectation = 0.178530024931578Expectation = 0.0443536727145779Expectation = 0.391809630768927Expectation = 0.109836281539105Expectation = 0.360782989879163Expectation = 0.348236016744008Expectation = 0.334464695633845Expectation = 0.411325095263659Expectation = 0.588915657492153Expectation = 0.202210907318423Expectation = 0.121954774051398Expectation = 0.108372168849689Expectation = 0.214337139488392Expectation = 0.0224916098817314Expectation = 0.345354888005507Expectation = 0.0677471997259832
## 2022-12-23 07:33:39 : Creating Group Peak Sets with Annotations!, 0.885 mins elapsed.
## 2022-12-23 07:33:51 : Creating Union Peak Set with Annotations!, 1.069 mins elapsed.
## Annotating Peaks : Nearest Gene
## Annotating Peaks : Gene
## Annotating Peaks : TSS
## Annotating Peaks : GC
## Plotting Ggplot!
## 2022-12-23 07:34:04 : Finished Creating Union Peak Set (261665)!, 1.289 mins elapsed.
We can similarly examine this merged peak set.
getPeakSet(projHemeTmp)
## GRanges object with 261665 ranges and 10 metadata columns:
## seqnames ranges strand | mlog10p Group
## <Rle> <IRanges> <Rle> | <numeric> <Rle>
## [1] chr1 752500-752999 * | 7.467 Mono
## [2] chr1 758500-758999 * | 0.897 NK
## [3] chr1 762000-762499 * | 1.954 NK
## [4] chr1 762500-762999 * | 24.668 NK
## [5] chr1 763000-763499 * | 7.089 NK
## ... ... ... ... . ... ...
## [261661] chrX 154842500-154842999 * | 5.862 PreB
## [261662] chrX 154862000-154862499 * | 1.277 B
## [261663] chrX 154862500-154862999 * | 0.793 NK
## [261664] chrX 154996000-154996499 * | 4.028 PreB
## [261665] chrX 154997000-154997499 * | 1.815 Progenitor
## distToGeneStart nearestGene peakType distToTSS nearestTSS
## <integer> <character> <character> <integer> <character>
## [1] 10152 LINC00115 Distal 10152 uc010nxx.2
## [2] 4152 LINC00115 Distal 4152 uc010nxx.2
## [3] 652 LINC00115 Promoter 652 uc010nxx.2
## [4] 152 LINC00115 Promoter 152 uc010nxx.2
## [5] 277 LINC01128 Promoter 70 uc031pjk.1
## ... ... ... ... ... ...
## [261661] 126 TMLHE Promoter 126 uc004fnn.3
## [261662] 19626 TMLHE Distal 19626 uc004fnn.3
## [261663] 20126 TMLHE Distal 20126 uc004fnn.3
## [261664] 153626 TMLHE Distal 1201 uc004fnq.1
## [261665] 154626 TMLHE Distal 201 uc004fnq.1
## GC idx N
## <numeric> <integer> <numeric>
## [1] 0.484 1 0
## [2] 0.560 2 0
## [3] 0.574 3 0
## [4] 0.684 4 0
## [5] 0.704 5 0
## ... ... ... ...
## [261661] 0.554 6252 0
## [261662] 0.430 6253 0
## [261663] 0.420 6254 0
## [261664] 0.392 6255 0
## [261665] 0.542 6256 0
## -------
## seqinfo: 23 sequences from an unspecified genome; no seqlengths
12.3.1 Comparing the two peak calling methods
To compare the merged peak set generated using MACS2 vs the merged peak set generated using the ArchR native TileMatrix
peak caller, we can check the perfecent of overlapping peaks etc.
First, we check the percent of MACS2-called peaks that are overlapped by the TileMatrix
-called peaks.
length(subsetByOverlaps(getPeakSet(projHeme4), getPeakSet(projHemeTmp))) / length(getPeakSet(projHeme4))
## [1] 0.958147
Then, we check the converse - the percent of TileMatrix
-called peaks that are overlapped by MACS2-called peaks. You can see that this overlap is not as strong.
length(subsetByOverlaps(getPeakSet(projHemeTmp), getPeakSet(projHeme4))) / length(getPeakSet(projHemeTmp))
## [1] 0.7689183
If we increase the margins of the peaks to be wider (1000-bp peaks instead of 500-bp peaks), the percent of MACS2-called peaks that are overlapped does not change much.
length(subsetByOverlaps(resize(getPeakSet(projHeme4), 1000, "center"), getPeakSet(projHemeTmp))) / length(getPeakSet(projHeme4))
## [1] 0.9636006
But the percent of TileMatrix
-called peaks overlapped by MACS2 does increase.
length(subsetByOverlaps(getPeakSet(projHemeTmp), resize(getPeakSet(projHeme4), 1000, "center"))) / length(getPeakSet(projHemeTmp))
## [1] 0.8446296