12.3 Calling Peaks w/ TileMatrix
As mentioned previously, ArchR also implements its own native peak caller but we do not recommend using it. While we have benchmarked this peak caller against MACS2 and note very similar performances, we do not recommend using this native peak caller unless absolutely necessary. It is no longer actively supported in the ArchR code base.
The ArchR native peak caller calls peaks on the 500-bp TileMatrix
and we indicate to addReproduciblePeakSet()
that we want to use this peak caller via the peakMethod
parameter. Note that we are not storing the output into the projHeme4
object because we do not intend to keep this peak set and this analysis is only for illustrative purposes. Storage into the ArchRProject
object would overwrite the previous reproducible peak set already stored in projHeme4
.
projHemeTmp <- addReproduciblePeakSet(
ArchRProj = projHeme4,
groupBy = "Clusters2",
peakMethod = "Tiles",
method = "p"
)
## ArchR logging to : ArchRLogs/ArchR-addReproduciblePeakSet-9286e8c5f-Date-2025-02-06_Time-01-58-48.55566.log
## If there is an issue, please report to github with logFile!
## Calling Peaks with TileMatrix. We recommend using the Macs2 Version.
## This method is still under development.
## Group Coverages Already Computed Returning Groups, Set force = TRUE to Recompute!
## List of length 11
## names(11): B CD4.M CD4.N CLP Erythroid GMP Mono NK pDC PreB Progenitor
## 2025-02-06 01:58:49.348169 : Peak Calling Parameters!, 0.013 mins elapsed.
## Group nCells nCellsUsed nReplicates nMin nMax maxPeaks
## B B 421 416 2 162 254 150000
## CD4.M CD4.M 633 586 2 86 500 150000
## CD4.N CD4.N 1250 548 2 48 500 150000
## CLP CLP 386 386 2 88 298 150000
## Erythroid Erythroid 702 671 2 171 500 150000
## GMP GMP 1261 810 2 310 500 150000
## Mono Mono 2652 1000 2 500 500 150000
## NK NK 903 822 2 322 500 150000
## pDC pDC 314 304 2 144 160 150000
## PreB PreB 351 351 2 40 311 150000
## Progenitor Progenitor 1377 639 2 139 500 150000
## 2025-02-06 01:58:53.23755 : Computing Total Accessibility Across All Features, 0.078 mins elapsed.
## 2025-02-06 01:58:56.607124 : Computing Pseudo-Grouped Tile Matrix, 0.134 mins elapsed.
## 2025-02-06 01:59:26.047958 : Created Pseudo-Grouped Tile Matrix (0.471 GB), 0.625 mins elapsed.
## colSums = 825382colSums = 660560colSums = 3022271colSums = 323151colSums = 1406092colSums = 153415colSums = 1096638colSums = 276220colSums = 2290759colSums = 670817colSums = 2221092colSums = 2253776colSums = 2111952colSums = 2471475colSums = 3602112colSums = 1148728colSums = 772246colSums = 640088colSums = 1295679colSums = 131806colSums = 2058797colSums = 430702
## nTiles = 6072620
## Expectation = 0.13591859856207Expectation = 0.10877677180525Expectation = 0.497688147784646Expectation = 0.0532144280392977Expectation = 0.231546185995501Expectation = 0.0252633953713554Expectation = 0.180587291811442Expectation = 0.0454861328388735Expectation = 0.377227457011965Expectation = 0.11046582858799Expectation = 0.365755143578884Expectation = 0.371137334461896Expectation = 0.347782670412441Expectation = 0.406986605452012Expectation = 0.593172633887844Expectation = 0.189165137947048Expectation = 0.127168503874769Expectation = 0.10540557452961Expectation = 0.213364083377521Expectation = 0.0217049642493685Expectation = 0.339029446927356Expectation = 0.0709252349068442
## 2025-02-06 01:59:47.694666 : Creating Group Peak Sets with Annotations!, 0.986 mins elapsed.
## 2025-02-06 01:59:55.344699 : Creating Union Peak Set with Annotations!, 1.113 mins elapsed.
## Annotating Peaks : Nearest Gene
## Annotating Peaks : Gene
## Annotating Peaks : TSS
## Annotating Peaks : GC
## Plotting Ggplot!
## 2025-02-06 02:00:09.962552 : Finished Creating Union Peak Set (263147)!, 1.357 mins elapsed.
We can similarly examine this merged peak set.
getPeakSet(projHemeTmp)
## GRanges object with 263147 ranges and 10 metadata columns:
## seqnames ranges strand | mlog10p Group
## <Rle> <IRanges> <Rle> | <numeric> <Rle>
## [1] chr1 752500-752999 * | 6.311 Mono
## [2] chr1 757500-757999 * | 3.426 CD4.N
## [3] chr1 758500-758999 * | 1.177 NK
## [4] chr1 762000-762499 * | 1.479 NK
## [5] chr1 762500-762999 * | 25.802 NK
## ... ... ... ... . ... ...
## [263143] chrX 154842000-154842499 * | 6.554 CLP
## [263144] chrX 154842500-154842999 * | 5.993 PreB
## [263145] chrX 154862000-154862499 * | 3.312 Progenitor
## [263146] chrX 154862500-154862999 * | 1.262 NK
## [263147] chrX 154997000-154997499 * | 2.034 Progenitor
## distToGeneStart nearestGene peakType distToTSS nearestTSS
## <integer> <character> <character> <integer> <character>
## [1] 10152 LINC00115 Distal 10152 uc010nxx.2
## [2] 5152 LINC00115 Distal 5152 uc010nxx.2
## [3] 4152 LINC00115 Distal 4152 uc010nxx.2
## [4] 652 LINC00115 Promoter 652 uc010nxx.2
## [5] 152 LINC00115 Promoter 152 uc010nxx.2
## ... ... ... ... ... ...
## [263143] 372 TMLHE Intronic 372 uc004fnn.3
## [263144] 126 TMLHE Promoter 126 uc004fnn.3
## [263145] 19626 TMLHE Distal 19626 uc004fnn.3
## [263146] 20126 TMLHE Distal 20126 uc004fnn.3
## [263147] 154626 TMLHE Distal 201 uc004fnq.1
## GC idx N
## <numeric> <integer> <numeric>
## [1] 0.484 1 0
## [2] 0.552 2 0
## [3] 0.560 3 0
## [4] 0.574 4 0
## [5] 0.684 5 0
## ... ... ... ...
## [263143] 0.612 6175 0
## [263144] 0.554 6176 0
## [263145] 0.430 6177 0
## [263146] 0.420 6178 0
## [263147] 0.542 6179 0
## -------
## seqinfo: 23 sequences from an unspecified genome; no seqlengths
12.3.1 Comparing the two peak calling methods
To compare the merged peak set generated using MACS2 vs the merged peak set generated using the ArchR native TileMatrix
peak caller, we can check the perfecent of overlapping peaks etc.
First, we check the percent of MACS2-called peaks that are overlapped by the TileMatrix
-called peaks.
length(subsetByOverlaps(getPeakSet(projHeme4), getPeakSet(projHemeTmp))) / length(getPeakSet(projHeme4))
## [1] 0.9565828
Then, we check the converse - the percent of TileMatrix
-called peaks that are overlapped by MACS2-called peaks. You can see that this overlap is not as strong.
length(subsetByOverlaps(getPeakSet(projHemeTmp), getPeakSet(projHeme4))) / length(getPeakSet(projHemeTmp))
## [1] 0.7852683
If we increase the margins of the peaks to be wider (1000-bp peaks instead of 500-bp peaks), the percent of MACS2-called peaks that are overlapped does not change much.
length(subsetByOverlaps(resize(getPeakSet(projHeme4), 1000, "center"), getPeakSet(projHemeTmp))) / length(getPeakSet(projHeme4))
## [1] 0.9619421
But the percent of TileMatrix
-called peaks overlapped by MACS2 does increase.