14.1 Motif Footprinting
Importantly, the footprints generated from this tutorial data are not as clean as would be desired but this is because of the small size of the tutorial dataset. Footprints generated from larger datasets would be expected to show even less variation.
When footprinting, the first thing we need to do is obtain the positions of the relevant motifs. To do this, we call the getPositions()
function. This function has an optional parameter called name
which can accept the name of the peakAnnotation
object from which we would like to obtain the positions. If name = NULL
, then ArchR will use the first entry in the peakAnnotation
slot. In the example shown below, we do not specify name
and ArchR uses the first entry which is our CIS-BP motifs.
This creates a GRangesList
object where each TF motif is represented by a separate GRanges
object.
## GRangesList object of length 870:
## $TFAP2B_1
## GRanges object with 16773 ranges and 1 metadata column:
## seqnames ranges strand | score
##|
## [1] chr1 852468-852479 + | 8.17731199746359
## [2] chr1 873916-873927 + | 8.32673820065588
## [3] chr1 873916-873927 - | 8.32673820065588
## [4] chr1 896671-896682 + | 9.96223327271814
## [5] chr1 896671-896682 - | 8.92408377606486
## … … … … . …
## [16769] chrX 153991101-153991112 + | 8.39549159740639
## [16770] chrX 154299568-154299579 + | 8.90119825654299
## [16771] chrX 154664929-154664940 - | 8.16690864294221
## [16772] chrX 154807684-154807695 + | 9.57636587154549
## [16773] chrX 154807684-154807695 - | 10.6117355833828
## ——-
## seqinfo: 23 sequences from an unspecified genome; no seqlengths
##
## …
## <869 more elements>
We can subset this GRangesList
to a few TF motifs that we are interested in. Because the SREBF1 TF comes up when we search for “EBF1”, we explicitly remove it from the downstream analyses below using the %ni%
helper function which provides the opposite functionality of %in%
from base R.
motifs <- c("GATA1", "CEBPA", "EBF1", "IRF4", "TBX21", "PAX5")
markerMotifs <- unlist(lapply(motifs, function(x) grep(x, names(motifPositions), value = TRUE)))
markerMotifs <- markerMotifs[markerMotifs %ni% "SREBF1_22"]
markerMotifs
## [1] “GATA1_383” “CEBPA_155” “EBF1_67” “IRF4_632” “TBX21_780” “PAX5_709”
To accurately profile TF footprints, a large number of reads are required. Therefore, cells are grouped to create pseudo-bulk ATAC-seq profiles that can be then used for TF footprinting. These pseudo-bulk profiles are stored as group coverage files which we originally created in a previous chapter to perform peak calling. If you haven’t already added group coverages to your ArchRProject
, lets do that now.
With group coverages calculated, we can now compute footprints for the subset of marker motifs that we previously selected using the getFootprints()
function. Even though ArchR implements a highly optimized footprinting workflow, it is recommended to perform footprinting on a subset of motifs rather than all motifs. As such, we provide the subset of motifs to footprint via the positions
parameter.
seFoot <- getFootprints(
ArchRProj = projHeme5,
positions = motifPositions[markerMotifs],
groupBy = "Clusters2"
)
## ArchR logging to : ArchRLogs/ArchR-getFootprints-10b7a1d11042e-Date-2020-04-15_Time-11-15-50.log
## If there is an issue, please report to github with logFile!
## 2020-04-15 11:15:51 : Computing Kmer Bias Table, 0.024 mins elapsed.
## 2020-04-15 11:16:13 : Finished Computing Kmer Tables, 0.361 mins elapsed.
## 2020-04-15 11:16:13 : Computing Footprints, 0.385 mins elapsed.
## 2020-04-15 11:16:22 : Computing Footprints Bias, 0.539 mins elapsed.
## 2020-04-15 11:16:28 : Summarizing Footprints, 0.642 mins elapsed.
Once we have retrieved these footprints, we can plot them using the plotFootprints()
function. This function can simultaneously normalize the footprints in various ways. This normalization and the actual plotting of the footprints is discussed in the next section.