1.1 A Brief Primer on ATAC-seq Terminology
The most fundamental component of any ATAC-seq experiment is a “fragment”. In ATAC-seq, a fragment refers to a sequenceable DNA molecule created by two transposition events. Each end of that fragment is sequenced using paired-end sequencing. The inferred single-base position of the start and end of the fragment is adjusted based on the insertion offset of Tn5. As reported previously, Tn5 transposase binds to DNA as a homodimer with 9-bp of DNA between the two Tn5 molecules. Because of this, each Tn5 homodimer binding event creates two insertions, separated by 9 bp. Thus, the actual central point of the “accessible” site is in the very center of the Tn5 dimer, not the location of each Tn5 insertion. To account for this, we apply an offset to the individual Tn5 insertions, adjusting plus-stranded insertion events by +4 bp and minus-stranded insertion events by -5 bp. This is consistent with the convention put forth during the original description of ATAC-seq. Thus, in ArchR, “fragments” refers to a table or genomic ranges object containing the chromosome, offset-adjusted single-base chromosome start position, offset-adjusted single-base chromosome end position, and unique cellular barcode ID corresponding to each sequenced fragment. Similarly, “insertions” refer to the offset-adjusted single-base position at the very center of an accessible site.