13.1 Identifying Marker Peaks with ArchR
Often times, we are interested to know which peaks are unique to an individual cluster or a small group of clusters. We can do this in an unsupervised fashion in ArchR using the addMarkerFeatures()
function in combination with useMatrix = "PeakMatrix"
.
First, lets remind ourselves of the cell types that we are working with in projHeme5
and their relative proportions.
table(projHeme5$Clusters2)
##
## B CD4.M CD4.N CLP Erythroid GMP Mono
## 421 633 1250 386 702 1261 2652
## NK pDC PreB Progenitor
## 903 314 351 1377
Now, we are ready to identify marker peaks by calling the addMarkerFeatures()
function with useMatrix = "PeakMatrix"
. Additionally, we tell ArchR to account for differences in data quality amongst the cell groups by setting the bias
parameter to account for TSS enrichment and the number of unique fragments per cell.
markerPeaks <- getMarkerFeatures(
ArchRProj = projHeme5,
useMatrix = "PeakMatrix",
groupBy = "Clusters2",
bias = c("TSSEnrichment", "log10(nFrags)"),
testMethod = "wilcoxon"
)
## ArchR logging to : ArchRLogs/ArchR-getMarkerFeatures-96be4b233-Date-2025-02-06_Time-02-03-45.434572.log
## If there is an issue, please report to github with logFile!
## MatrixClass = Sparse.Integer.Matrix
## 2025-02-06 02:03:46.394397 : Matching Known Biases, 0.005 mins elapsed.
## ###########
## 2025-02-06 02:05:55.075573 : Completed Pairwise Tests, 2.15 mins elapsed.
## ###########
## ArchR logging successful to : ArchRLogs/ArchR-getMarkerFeatures-96be4b233-Date-2025-02-06_Time-02-03-45.434572.log
The object returned by the getMarkerFeatures()
function is a SummarizedExperiment
that contains a few different assays
.
markerPeaks
## class: SummarizedExperiment
## dim: 147407 11
## metadata(2): MatchInfo Params
## assays(7): Log2FC Mean ... AUC MeanBGD
## rownames(147407): 1 2 ... 147406 147407
## rowData names(4): seqnames idx start end
## colnames(11): B CD4.M ... PreB Progenitor
## colData names(0):
We can use the getMarkers()
function to retrieve particular slices of this SummarizedExperiment
that we are interested in. The default behavior of this function is to return a list of DataFrame
objects, one for each cell group.
markerList <- getMarkers(markerPeaks, cutOff = "FDR <= 0.01 & Log2FC >= 1")
markerList
## List of length 11
## names(11): B CD4.M CD4.N CLP Erythroid GMP Mono NK pDC PreB Progenitor
If we are interested in the marker peaks for a specific cell group, we can access this from the list via the $
accessor.
markerList$Erythroid
## DataFrame with 2871 rows and 7 columns
## seqnames idx start end Log2FC FDR MeanDiff
## <Rle> <integer> <integer> <integer> <numeric> <numeric> <numeric>
## 91169 chr22 1297 30129829 30130329 4.56041 3.24640e-16 1.090454
## 45636 chr15 1214 50552646 50553146 4.33461 6.58297e-14 0.882263
## 127421 chr7 3093 65392660 65393160 8.98553 3.73382e-13 1.017888
## 8954 chr1 8954 156864638 156865138 6.58465 3.83815e-13 0.978842
## 7130 chr1 7130 110407020 110407520 4.23204 4.88410e-13 0.859836
## ... ... ... ... ... ... ... ...
## 139717 chr9 2044 86957967 86958467 6.22430 0.0099907 0.148421
## 141515 chr9 3842 119117633 119118133 6.59935 0.0099907 0.193083
## 143169 chr9 5496 136544169 136544669 6.08502 0.0099907 0.134577
## 146831 chrX 2966 133937841 133938341 6.17985 0.0099907 0.143857
## 146835 chrX 2970 133999808 134000308 6.15916 0.0099907 0.141780
Instead of a list of DataFrame
objects, we can use getMarkers()
to return a GRangesList
object by setting returnGR = TRUE
.
markerList <- getMarkers(markerPeaks, cutOff = "FDR <= 0.01 & Log2FC >= 1", returnGR = TRUE)
markerList
## List of length 11
## names(11): B CD4.M CD4.N CLP Erythroid GMP Mono NK pDC PreB Progenitor
This GRangesList
object can similarly be subset to a GRanges
object for a particular cell group using the $
accessor.
markerList$Erythroid
## GRanges object with 2871 ranges and 3 metadata columns:
## seqnames ranges strand | Log2FC FDR MeanDiff
## <Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric>
## [1] chr22 30129829-30130329 * | 4.56041 3.24640e-16 1.090454
## [2] chr15 50552646-50553146 * | 4.33461 6.58297e-14 0.882263
## [3] chr7 65392660-65393160 * | 8.98553 3.73382e-13 1.017888
## [4] chr1 156864638-156865138 * | 6.58465 3.83815e-13 0.978842
## [5] chr1 110407020-110407520 * | 4.23204 4.88410e-13 0.859836
## ... ... ... ... . ... ... ...
## [2867] chr9 86957967-86958467 * | 6.22430 0.0099907 0.148421
## [2868] chr9 119117633-119118133 * | 6.59935 0.0099907 0.193083
## [2869] chr9 136544169-136544669 * | 6.08502 0.0099907 0.134577
## [2870] chrX 133937841-133938341 * | 6.17985 0.0099907 0.143857
## [2871] chrX 133999808-134000308 * | 6.15916 0.0099907 0.141780
## -------
## seqinfo: 23 sequences from an unspecified genome; no seqlengths