13.1 Identifying Marker Peaks with ArchR
Often times, we are interested to know which peaks are unique to an individual cluster or a small group of clusters. We can do this in an unsupervised fashion in ArchR using the addMarkerFeatures()
function in combination with useMatrix = "PeakMatrix"
.
First, lets remind ourselves of the cell types that we are working with in projHeme5
and their relative proportions.
table(projHeme5$Clusters2)
##
## B CD4.M CD4.N CLP Erythroid GMP Mono
## 428 614 1239 383 709 1200 2672
## NK pDC PreB Progenitor
## 932 315 354 1404
Now, we are ready to identify marker peaks by calling the addMarkerFeatures()
function with useMatrix = "PeakMatrix"
. Additionally, we tell ArchR to account for differences in data quality amongst the cell groups by setting the bias
parameter to account for TSS enrichment and the number of unique fragments per cell.
<- getMarkerFeatures(
markerPeaks ArchRProj = projHeme5,
useMatrix = "PeakMatrix",
groupBy = "Clusters2",
bias = c("TSSEnrichment", "log10(nFrags)"),
testMethod = "wilcoxon"
)## ArchR logging to : ArchRLogs/ArchR-getMarkerFeatures-371b070498788-Date-2022-12-23_Time-07-43-28.log
## If there is an issue, please report to github with logFile!
## MatrixClass = Sparse.Integer.Matrix
## 2022-12-23 07:43:30 : Matching Known Biases, 0.005 mins elapsed.
## ###########
## 2022-12-23 07:44:39 : Completed Pairwise Tests, 1.161 mins elapsed.
## ###########
## ArchR logging successful to : ArchRLogs/ArchR-getMarkerFeatures-371b070498788-Date-2022-12-23_Time-07-43-28.log
The object returned by the getMarkerFeatures()
function is a SummarizedExperiment
that contains a few different assays
.
markerPeaks## class: SummarizedExperiment
## dim: 142475 11
## metadata(2): MatchInfo Params
## assays(7): Log2FC Mean ... AUC MeanBGD
## rownames(142475): 1 2 ... 142474 142475
## rowData names(4): seqnames idx start end
## colnames(11): B CD4.M ... PreB Progenitor
## colData names(0):
We can use the getMarkers()
function to retrieve particular slices of this SummarizedExperiment
that we are interested in. The default behavior of this function is to return a list of DataFrame
objects, one for each cell group.
<- getMarkers(markerPeaks, cutOff = "FDR <= 0.01 & Log2FC >= 1")
markerList
markerList## List of length 11
## names(11): B CD4.M CD4.N CLP Erythroid GMP Mono NK pDC PreB Progenitor
If we are interested in the marker peaks for a specific cell group, we can access this from the list via the $
accessor.
$Erythroid
markerList## DataFrame with 2913 rows and 7 columns
## seqnames idx start end Log2FC FDR MeanDiff
## <Rle> <integer> <integer> <integer> <numeric> <numeric> <numeric>
## 88068 chr22 1269 30129823 30130323 4.03035 2.16505e-15 1.136746
## 2686 chr1 2686 27869046 27869546 6.49248 1.37776e-14 1.042588
## 45848 chr15 2979 75986645 75987145 4.57935 1.53210e-14 0.890110
## 45672 chr15 2803 74902681 74903181 8.79924 1.50478e-13 0.897953
## 123083 chr7 2986 65392660 65393160 8.84562 2.15504e-13 0.927354
## ... ... ... ... ... ... ... ...
## 71569 chr2 1721 32817077 32817577 2.05638 0.00994914 0.226892
## 62970 chr18 2642 77279716 77280216 3.67312 0.00996505 0.163021
## 132301 chr8 5168 138915979 138916479 3.93326 0.00996505 0.255747
## 98007 chr3 7667 182791944 182792444 3.82732 0.00999030 0.159790
## 133672 chr9 683 27247710 27248210 3.62070 0.00999030 0.228739
Instead of a list of DataFrame
objects, we can use getMarkers()
to return a GRangesList
object by setting returnGR = TRUE
.
<- getMarkers(markerPeaks, cutOff = "FDR <= 0.01 & Log2FC >= 1", returnGR = TRUE)
markerList
markerList## List of length 11
## names(11): B CD4.M CD4.N CLP Erythroid GMP Mono NK pDC PreB Progenitor
This GRangesList
object can similarly be subset to a GRanges
object for a particular cell group using the $
accessor.
$Erythroid
markerList## GRanges object with 2913 ranges and 3 metadata columns:
## seqnames ranges strand | Log2FC FDR MeanDiff
## <Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric>
## [1] chr22 30129823-30130323 * | 4.03035 2.16505e-15 1.136746
## [2] chr1 27869046-27869546 * | 6.49248 1.37776e-14 1.042588
## [3] chr15 75986645-75987145 * | 4.57935 1.53210e-14 0.890110
## [4] chr15 74902681-74903181 * | 8.79924 1.50478e-13 0.897953
## [5] chr7 65392660-65393160 * | 8.84562 2.15504e-13 0.927354
## ... ... ... ... . ... ... ...
## [2909] chr2 32817077-32817577 * | 2.05638 0.00994914 0.226892
## [2910] chr18 77279716-77280216 * | 3.67312 0.00996505 0.163021
## [2911] chr8 138915979-138916479 * | 3.93326 0.00996505 0.255747
## [2912] chr3 182791944-182792444 * | 3.82732 0.00999030 0.159790
## [2913] chr9 27247710-27248210 * | 3.62070 0.00999030 0.228739
## -------
## seqinfo: 23 sequences from an unspecified genome; no seqlengths