4.3 Using demuxlet with ArchR
Doublets can also be identified (in human data) based on sample genotype. If you know the genotypes of your samples ahead of time (and even if you don’t), you can determine which droplets contain cells from more than one genetically distinct sample, because those droplets will contain combinations of single-nucleotide polymorphisms that should not co-exist in a single sample. Of course, this requires you to mix multiple samples into an individual scATAC-seq reaction, which most labs do not routinely do. However, this “cell hashing” can be a good experimental workflow to reduce batch effects and costs. More justification for this can be found here.
ArchR provides easy import of the results from demuxlet via the addDemuxletResults() function. This function takes as input your ArchRProject object, a vector of .best files output by demuxlet, and a vector of sampleNames that is ordered to allow matching of your ArchRProject sample names with the demuxlet .best files provided. More explicitly, given the order of the .best files provided to the bestFiles parameter, you should create a properly ordered vector of sample names that corresponds to each .best file.
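The ordering requirement can be made concrete with a small base R sketch. It assumes you keep a lookup from each .best file to its ArchRProject sample name; the file names, sample names, and the bestToSample helper are hypothetical and are not part of ArchR.

```r
# Hypothetical .best files, in the order they will be passed to bestFiles.
bestFiles <- c("myBestFile1.best", "myBestFile2.best")

# Hypothetical lookup from .best file name to ArchRProject sample name.
# The order of this lookup does not matter because it is indexed by name.
bestToSample <- c(
  "myBestFile2.best" = "Sample2",
  "myBestFile1.best" = "Sample1"
)

# Build sampleNames so that element i corresponds to bestFiles[i].
sampleNames <- unname(bestToSample[basename(bestFiles)])

# Fail early if any .best file has no matching sample name.
stopifnot(!anyNA(sampleNames), length(sampleNames) == length(bestFiles))

# Show the pairing that would be handed to addDemuxletResults().
print(data.frame(bestFile = bestFiles, sampleName = sampleNames))
```

Deriving sampleNames from bestFiles, rather than typing both vectors by hand, keeps the two in step if the file list is later reordered or extended.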
Because we do not have this type of analysis for the tutorial data, we do not showcase this functionality here. The final output of this function is an updated ArchRProject object and, like other “add-ers” in ArchR, must be stored into an ArchRProject object. This will add two columns to cellColData: one labeled DemuxletClassify, which shows whether the cell is classified as a singlet, doublet, or ambiguous, and one labeled DemuxletBest, which contains the output of demuxlet identifying the sample to which the given cell corresponds. Cells not classified by demuxlet are labeled as “NotClassified” in these columns.
A hypothetical usage would look like this:
ArchRProj <- addDemuxletResults(ArchRProj = ArchRProj,
  bestFiles = c("myBestFile1.best", "myBestFile2.best"),
  sampleNames = c("Sample1", "Sample2"))
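Once the two columns are in cellColData, a natural next step is to tabulate the classifications and select singlets. The sketch below uses a mock data.frame in place of a real cellColData so that it runs without ArchR. Every value in it is invented, and the labels "SNG", "DBL", and "AMB" are an assumption about how the classes are spelled; check the actual labels in your own project before filtering.

```r
# Mock stand-in for cellColData after addDemuxletResults(); all values are invented.
cellColData <- data.frame(
  Sample = c("Sample1", "Sample1", "Sample2", "Sample2", "Sample2"),
  DemuxletClassify = c("SNG", "DBL", "SNG", "AMB", "NotClassified"),
  DemuxletBest = c("donorA", "donorA-donorB", "donorB", "donorA-donorB", "NotClassified"),
  row.names = c("Sample1#CELL1", "Sample1#CELL2", "Sample2#CELL1",
                "Sample2#CELL2", "Sample2#CELL3"),
  stringsAsFactors = FALSE
)

# Count classifications per sample to see which labels are actually present.
classCounts <- table(cellColData$Sample, cellColData$DemuxletClassify)
print(classCounts)

# Assumed label for singlets; confirm it against the table above.
singletLabel <- "SNG"

# Names of the cells classified as singlets.
singletCells <- rownames(cellColData)[cellColData$DemuxletClassify == singletLabel]
print(singletCells)
```

With a real project, the same table could be built from the output of getCellColData(ArchRProj), and the resulting cell names could then be used to restrict downstream analysis to singlets.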