16.2 Normalization of Footprints for Tn5 Bias

One major challenge with TF footprinting using ATAC-seq data is the insertion sequence bias of the Tn5 transposase, which can lead to misclassification of TF footprints. To account for Tn5 insertion bias, ArchR identifies the k-mer sequences (user-defined length, default 6) surrounding each Tn5 insertion site. To do this, ArchR identifies single-base resolution Tn5 insertion sites for each pseudo-bulk, resizes these 1-bp sites to k-bp windows (from -k/2 to +(k/2 - 1) bp relative to the insertion), and then creates a k-mer frequency table using the oligonucleotideFrequency(w=k, simplify.as="collapse") function from the Biostrings package. ArchR then calculates the expected k-mer frequencies genome-wide using the same function with the BSgenome-associated genome file.

To calculate the insertion bias for a pseudo-bulk footprint, ArchR creates a k-mer frequency matrix that represents all possible k-mers at each position across a window +/- N bp (user-defined, default 250 bp) from the motif center. Then, iterating over each motif site genome-wide, ArchR fills the positioned k-mers into this k-mer frequency matrix. Using the sample’s k-mer frequency table, ArchR can then compute the expected Tn5 insertions by multiplying the k-mer position frequency table by the observed/expected Tn5 k-mer frequency.
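As a rough illustration of the observed versus expected k-mer table described above, the following base R sketch runs the same idea on a toy sequence. The sequence, insertion positions, helper function `countKmers()`, and variable names are all hypothetical; ArchR itself works on pseudo-bulk insertion sites and a BSgenome object through Biostrings, so this should be read as a simplified sketch rather than ArchR's implementation.

```r
# Toy sketch in base R. All names and data are hypothetical.

# Count every overlapping k-mer in a single character string.
countKmers <- function(seq, k) {
  starts <- seq_len(nchar(seq) - k + 1)
  table(substring(seq, starts, starts + k - 1))
}

genome <- "ACGTACGTTTGACCAGTACGGATCCATGACGT"  # hypothetical toy "genome"
insertions <- c(5, 12, 20, 27)                # hypothetical 1-bp Tn5 insertion sites
k <- 4                                        # assumes an even k so that k/2 is whole

# Resize each 1-bp site to a k-bp window: -k/2 to +(k/2 - 1) around the insertion.
winStart <- insertions - k / 2
winEnd <- insertions + (k / 2 - 1)
observedKmers <- substring(genome, winStart, winEnd)

# Observed k-mer counts at insertion sites and expected counts across the sequence.
observed <- table(observedKmers)
expected <- countKmers(genome, k)

# Observed/expected ratio of k-mer proportions for each k-mer seen at an insertion.
obsProp <- as.numeric(observed) / sum(observed)
expProp <- as.numeric(expected[names(observed)]) / sum(expected)
kmerBias <- setNames(obsProp / expProp, names(observed))
```

A ratio above 1 for a given k-mer in `kmerBias` means that, in this toy data, insertions land in that sequence context more often than its overall abundance would suggest.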

All of this happens under the hood within the plotFootprints() function.
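To make the positional step more concrete, the sketch below tallies k-mers by position relative to a few motif centers and then weights them by a per-k-mer bias value to obtain expected insertions per position. It uses base R only; the toy sequence, motif centers, bias values, and variable names are hypothetical, and the small `k` and `flank` are chosen purely for readability. It is an assumption-laden illustration of the idea, not the code that plotFootprints() runs.

```r
# Toy sketch in base R. All names and values are hypothetical.
k <- 2                                        # assumes an even k
flank <- 3                                    # window of +/- 3 bp around each motif center
bases <- c("A", "C", "G", "T")
allKmers <- sort(as.vector(outer(bases, bases, paste0)))  # all 16 possible 2-mers

genome <- "ACGTTGCAACGTAGCTAGGCTA"            # hypothetical toy sequence
motifCenters <- c(6, 12, 17)                  # hypothetical motif center positions
positions <- -flank:flank

# k-mer by position matrix: rows are k-mers, columns are positions relative to the center.
kmerPos <- matrix(0, nrow = length(allKmers), ncol = length(positions),
                  dimnames = list(allKmers, as.character(positions)))

# Iterate over motif sites and tally the k-mer surrounding each relative position
# (window from -k/2 to +(k/2 - 1) around that position).
for (center in motifCenters) {
  for (j in seq_along(positions)) {
    site <- center + positions[j]
    kmer <- substring(genome, site - k / 2, site + (k / 2 - 1))
    kmerPos[kmer, j] <- kmerPos[kmer, j] + 1
  }
}

# Hypothetical observed/expected ratio per k-mer; in practice this would come
# from the sample's k-mer frequency table.
kmerBias <- setNames(rep(1, length(allKmers)), allKmers)
kmerBias["CG"] <- 2.0
kmerBias["TA"] <- 0.5

# Expected insertions per position: bias-weighted sum over all k-mers.
expectedInsertions <- as.vector(kmerBias %*% kmerPos)
names(expectedInsertions) <- as.character(positions)
```

With a uniform bias of 1 for every k-mer, each position would simply receive one expected insertion per motif site; positions enriched for favored k-mers rise above that and positions enriched for disfavored k-mers fall below it.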

16.2.1 Subtracting the Tn5 Bias

One normalization method subtracts the Tn5 bias from the footprinting signal. This normalization is performed by setting normMethod = "Subtract" when calling plotFootprints().

plotFootprints(
  seFoot = seFoot,
  ArchRProj = projHeme5, 
  normMethod = "Subtract",
  plotName = "Footprints-Subtract-Bias",
  addDOC = FALSE,
  smoothWindow = 5
)
## ArchR logging to : ArchRLogs/ArchR-plotFootprints-371b063162b59-Date-2022-12-23_Time-08-36-12.log
## If there is an issue, please report to github with logFile!
## 2022-12-23 08:36:13 : Plotting Footprint : GATA1_383 (1 of 6), 0.018 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Subtract
## 2022-12-23 08:36:14 : Plotting Footprint : CEBPA_155 (2 of 6), 0.047 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Subtract
## 2022-12-23 08:36:16 : Plotting Footprint : EBF1_67 (3 of 6), 0.078 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Subtract
## 2022-12-23 08:36:18 : Plotting Footprint : IRF4_632 (4 of 6), 0.108 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Subtract
## 2022-12-23 08:36:20 : Plotting Footprint : TBX21_780 (5 of 6), 0.138 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Subtract
## 2022-12-23 08:36:22 : Plotting Footprint : PAX5_709 (6 of 6), 0.168 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Subtract
## ArchR logging successful to : ArchRLogs/ArchR-plotFootprints-371b063162b59-Date-2022-12-23_Time-08-36-12.log

By default, these plots are saved in the outputDirectory of the ArchRProject. If you plotted all motifs and returned the result as a ggplot object, that object would be extremely large. An example of motif footprints from the bias-subtracted analysis is shown below.
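Separately from the plots, the arithmetic behind subtraction can be illustrated with a toy profile. The sketch below is written in base R with made-up numbers: it scales a hypothetical footprint and a hypothetical bias track by their flanking means (an assumption suggested by the "Normalizing by flanking regions" log message) and then subtracts one from the other. The variable names and the additive bias model are illustrative choices, not ArchR internals.

```r
# Toy sketch in base R. Values are hypothetical; this is not ArchR's internal code.
pos <- -10:10

# Hypothetical Tn5 bias track: flat, with a sequence-driven spike at +/- 3 bp.
bias <- rep(1, length(pos))
bias[abs(pos) == 3] <- 1.5

# Hypothetical observed signal: a protected dip at the motif plus the additive bias spike.
observed <- rep(1, length(pos))
observed[abs(pos) <= 2] <- 0.4
observed <- observed + (bias - 1)

# Scale both tracks by their mean over the outer flanks (assumed normalization step).
flankIdx <- abs(pos) >= 8
observedNorm <- observed / mean(observed[flankIdx])
biasNorm <- bias / mean(bias[flankIdx])

# Bias-subtracted footprint: about 0 outside the motif, negative where the TF protects DNA.
subtracted <- observedNorm - biasNorm
```

In this toy case the spike at +/- 3 bp cancels, leaving only the dip at the motif center. Real data are noisier and the bias is only an estimate, so the cancellation will not be this clean.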

16.2.2 Dividing by the Tn5 Bias

A second strategy for normalization divides the footprinting signal by the Tn5 bias signal. This normalization is performed by setting normMethod = "Divide" when calling plotFootprints().

plotFootprints(
  seFoot = seFoot,
  ArchRProj = projHeme5, 
  normMethod = "Divide",
  plotName = "Footprints-Divide-Bias",
  addDOC = FALSE,
  smoothWindow = 5
)
## ArchR logging to : ArchRLogs/ArchR-plotFootprints-371b023a605d6-Date-2022-12-23_Time-08-36-29.log
## If there is an issue, please report to github with logFile!
## 2022-12-23 08:36:30 : Plotting Footprint : GATA1_383 (1 of 6), 0.018 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Divide
## 2022-12-23 08:36:32 : Plotting Footprint : CEBPA_155 (2 of 6), 0.052 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Divide
## 2022-12-23 08:36:34 : Plotting Footprint : EBF1_67 (3 of 6), 0.082 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Divide
## 2022-12-23 08:36:35 : Plotting Footprint : IRF4_632 (4 of 6), 0.113 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Divide
## 2022-12-23 08:36:37 : Plotting Footprint : TBX21_780 (5 of 6), 0.143 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Divide
## 2022-12-23 08:36:39 : Plotting Footprint : PAX5_709 (6 of 6), 0.173 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = Divide
## ArchR logging successful to : ArchRLogs/ArchR-plotFootprints-371b023a605d6-Date-2022-12-23_Time-08-36-29.log

An example of motif footprints from the bias-divided analysis is shown below.
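As a conceptual aside, division can be illustrated the same way with a toy profile in base R. Here the hypothetical bias acts multiplicatively on a hypothetical occupancy profile; both tracks are scaled by their flanking means (an assumption suggested by the "Normalizing by flanking regions" log message) and the footprint is then divided by the bias. All names and numbers are illustrative and do not reproduce ArchR internals.

```r
# Toy sketch in base R. Values are hypothetical; this is not ArchR's internal code.
pos <- -10:10

# Hypothetical Tn5 bias track: flat, with a sequence-driven spike at +/- 3 bp.
bias <- rep(1, length(pos))
bias[abs(pos) == 3] <- 1.5

# Hypothetical occupancy profile: accessibility drops where the TF protects the DNA.
occupancy <- rep(1, length(pos))
occupancy[abs(pos) <= 2] <- 0.4

# Observed signal under a multiplicative bias model.
observed <- occupancy * bias

# Scale both tracks by their mean over the outer flanks (assumed normalization step).
flankIdx <- abs(pos) >= 8
observedNorm <- observed / mean(observed[flankIdx])
biasNorm <- bias / mean(bias[flankIdx])

# Bias-divided footprint: about 1 outside the motif, below 1 where the TF protects DNA.
divided <- observedNorm / biasNorm
```

Because the toy bias is multiplicative, dividing recovers the occupancy profile exactly, with a baseline of 1 rather than the baseline of 0 that subtraction produces. Positions where the estimated bias is close to zero would need care in practice, since division amplifies noise there.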

16.2.3 Footprinting Without Normalization for Tn5 Bias

While we highly recommend normalizing footprints for Tn5 sequence insertion bias, it is possible to perform footprinting without normalization by setting normMethod = "None" in the plotFootprints() function.

plotFootprints(
  seFoot = seFoot,
  ArchRProj = projHeme5, 
  normMethod = "None",
  plotName = "Footprints-No-Normalization",
  addDOC = FALSE,
  smoothWindow = 5
)
## ArchR logging to : ArchRLogs/ArchR-plotFootprints-371b02731658a-Date-2022-12-23_Time-08-36-46.log
## If there is an issue, please report to github with logFile!
## 2022-12-23 08:36:47 : Plotting Footprint : GATA1_383 (1 of 6), 0.018 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = None
## 2022-12-23 08:36:49 : Plotting Footprint : CEBPA_155 (2 of 6), 0.052 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = None
## 2022-12-23 08:36:51 : Plotting Footprint : EBF1_67 (3 of 6), 0.082 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = None
## 2022-12-23 08:36:53 : Plotting Footprint : IRF4_632 (4 of 6), 0.115 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = None
## 2022-12-23 08:36:55 : Plotting Footprint : TBX21_780 (5 of 6), 0.147 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = None
## 2022-12-23 08:36:57 : Plotting Footprint : PAX5_709 (6 of 6), 0.178 mins elapsed.
## Applying smoothing window to footprint
## Normalizing by flanking regions
## NormMethod = None
## ArchR logging successful to : ArchRLogs/ArchR-plotFootprints-371b02731658a-Date-2022-12-23_Time-08-36-46.log

An example of motif footprints without normalization is shown below.