4.3 Using demuxlet with ArchR

Doublets can also be identified (in human data) based on sample genotype. If you know the genotypes of your samples ahead of time (and even if you don’t), you can determine which droplets contain cells from more than one genetically distinct sample, because those droplets would contain combinations of single-nucleotide polymorphisms that should not co-exist in a single sample. Of course, this requires you to mix multiple samples into an individual scATAC-seq reaction, which most labs do not routinely do. However, this “cell hashing” can be a good experimental workflow to reduce batch effects and costs. More justification for this can be found here.

ArchR provides easy import of the results from demuxlet via the addDemuxletResults() function. This function takes as input your ArchRProject object, a vector of .best files output by demuxlet, and a vector of sampleNames ordered so that each ArchRProject sample name lines up with its demuxlet .best file. In other words, the first entry of sampleNames must be the sample name that corresponds to the first .best file passed to the bestFiles parameter, the second entry to the second file, and so on.
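One way to keep the two vectors aligned is to declare each sample name together with its .best file in a single named vector and derive both inputs from it. The following is a minimal base R sketch; the file names and sample names are hypothetical, and the final check is only a guard written for this example, not part of ArchR.

```r
# Hypothetical mapping: names are ArchRProject sample names,
# values are the demuxlet .best files for those samples.
bestFileBySample <- c(
  Sample1 = "myBestFile1.best",
  Sample2 = "myBestFile2.best"
)

# Derive both vectors from the same object so their order always matches.
sampleNames <- names(bestFileBySample)
bestFiles <- unname(bestFileBySample)

# Guard against a missing or duplicated entry before calling addDemuxletResults().
stopifnot(
  length(bestFiles) == length(sampleNames),
  !anyDuplicated(sampleNames),
  !anyDuplicated(bestFiles)
)
```

Building both vectors from one object means that adding or reordering a sample cannot silently pair a sample name with the wrong .best file.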

Because we do not have this type of analysis for the tutorial data, we do not showcase this functionality here. The final output of this function is an updated ArchRProject object and, as with other “add-ers” in ArchR, it must be assigned back to an ArchRProject object. This will add two columns to cellColData: one labeled DemuxletClassify, which shows whether the cell is classified as a singlet, doublet, or ambiguous, and one labeled DemuxletBest, which contains the output of demuxlet identifying the sample to which the given cell corresponds. Cells not classified by demuxlet are labeled as “NotClassified” in both columns.

A hypothetical usage would look like this:

ArchRProj <- addDemuxletResults(ArchRProj = ArchRProj,
  bestFiles = c("myBestFile1.best", "myBestFile2.best"),
  sampleNames = c("Sample1", "Sample2"))
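Once the two columns exist, a typical next step is to tabulate the classifications and keep the singlets. Because the tutorial data has no demuxlet output, the sketch below uses a mock data.frame in place of cellColData. The cell names, donor labels, and the "SNG"/"DBL"/"AMB" codes are assumptions made for illustration (the codes follow demuxlet's usual prefixes), so check the values in your own project before filtering on them.

```r
# Mock stand-in for cellColData after addDemuxletResults(); all values are made up.
# Assumption: classification codes follow demuxlet's SNG/DBL/AMB prefixes.
cellColData <- data.frame(
  DemuxletClassify = c("SNG", "DBL", "SNG", "AMB", "NotClassified"),
  DemuxletBest = c("donorA", "donorA-donorB", "donorB", "donorA", "NotClassified"),
  row.names = paste0("Sample1#cell", 1:5),
  stringsAsFactors = FALSE
)

# Count cells per classification.
classCounts <- table(cellColData$DemuxletClassify)
print(classCounts)

# Keep only the names of cells classified as singlets.
singletCells <- rownames(cellColData)[cellColData$DemuxletClassify == "SNG"]
print(singletCells)
```

With a real project, the same columns could be read with getCellColData(ArchRProj), and the resulting cell names used to subset the project.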