3.6 Filtering Doublets from an ArchRProject
After we have added information on the predicted doublets using addDoubletScores(), we can remove these predicted doublets using filterDoublets(). One of the key elements of this filtering step is the filterRatio, which sets the maximum ratio of predicted doublets to filter based on the number of pass-filter cells. For example, if there are 5000 cells, the maximum number of filtered predicted doublets would be filterRatio * 5000^2 / (100000), which simplifies to filterRatio * 5000 * 0.05, or 250 cells when filterRatio is 1. This filterRatio allows you to apply a consistent filter across multiple samples that may have different percentages of doublets because they were run with different cell loading concentrations. The higher the filterRatio, the greater the number of cells potentially removed as doublets.
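The arithmetic above can be sketched in a few lines of base R. The helper name maxDoublets is hypothetical and not part of ArchR, and rounding down to a whole number of cells is an assumption that happens to match the per-sample counts reported in the filtering output of this section.

```r
# Hypothetical helper (not an ArchR function): upper bound on the number of
# predicted doublets removed from one sample, following the formula in the text.
maxDoublets <- function(nCells, filterRatio = 1) {
  # Assumption: the bound is rounded down to a whole number of cells.
  floor(filterRatio * nCells^2 / 100000)
}

# The 5000-cell example from the text.
maxDoublets(5000)                    # 250
maxDoublets(5000, filterRatio = 2)   # 500

# Pass-filter cell counts of the three tutorial samples.
nCells <- c(scATAC_BMMC_R1 = 4932, scATAC_CD34_BMMC_R1 = 3275, scATAC_PBMC_R1 = 2454)
maxDoublets(nCells)                  # 243, 107, 60
```

Because the bound grows with the square of the cell count, a sample loaded with more cells loses a larger fraction of its cells at the same filterRatio: 243 of 4932 (4.9%) versus 60 of 2454 (2.4%).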
First, we filter the doublets. We save this as a new ArchRProject for the purposes of this stepwise tutorial, but you can always overwrite your original ArchRProject object.
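The output that follows is consistent with a call of the following form. This is a sketch, assuming the ArchR package is installed and that projHeme1 is the ArchRProject from the previous sections, already annotated by addDoubletScores(). The argument name ArchRProj and the reliance on default arguments come from the ArchR function documentation (see ?filterDoublets), not from this section.

```r
library(ArchR)

# Assumption: projHeme1 exists in the session and carries the doublet scores
# added by addDoubletScores().
# Remove predicted doublets and keep the result as a new project.
projHeme2 <- filterDoublets(ArchRProj = projHeme1)
```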
## Filtering 410 cells from ArchRProject!
## scATAC_BMMC_R1 : 243 of 4932 (4.9%)
## scATAC_CD34_BMMC_R1 : 107 of 3275 (3.3%)
## scATAC_PBMC_R1 : 60 of 2454 (2.4%)
Previously, we saw that projHeme1 had 10,661 cells. Now, we see that projHeme2 has 10,251 cells, indicating that 410 cells (3.85%) were removed by doublet filtration, as indicated above.
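These totals can be checked against the per-sample numbers with a short base R sketch. The variable names are illustrative, and the counts are copied from the filtering output above.

```r
# Per-sample counts of removed cells, copied from the filtering output above.
removed <- c(scATAC_BMMC_R1 = 243, scATAC_CD34_BMMC_R1 = 107, scATAC_PBMC_R1 = 60)
# Per-sample pass-filter cell counts before filtering.
before <- c(scATAC_BMMC_R1 = 4932, scATAC_CD34_BMMC_R1 = 3275, scATAC_PBMC_R1 = 2454)

totalBefore  <- sum(before)                  # cells in projHeme1
totalRemoved <- sum(removed)                 # cells filtered as doublets
totalAfter   <- totalBefore - totalRemoved   # cells left in projHeme2
pctRemoved   <- round(100 * totalRemoved / totalBefore, 2)
```

In the tutorial session, typing projHeme2 at the console prints the project summary shown next.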
## class: ArchRProject
## outputDirectory: /oak/stanford/groups/howchang/users/jgranja/ArchRTutorial/ArchRBook/BookOutput4/HemeTutorial
## samples(3): scATAC_BMMC_R1 scATAC_CD34_BMMC_R1 scATAC_PBMC_R1
## sampleColData names(1): ArrowFiles
## cellColData names(13): Sample TSSEnrichment … bioNames bioNames2
## numberOfCells(1): 10251
## medianTSS(1): 16.856
## medianFrags(1): 2991
If you wanted to filter more cells from the ArchRProject, you would use a higher filterRatio. To see additional arguments that can be tweaked, try ?filterDoublets.
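As an illustration, the per-sample counts in the output below match what the formula from the start of this section gives for filterRatio = 1.5 (for example, 1.5 * 4932^2 / 100000 is about 364.9), so a call of roughly this form would produce them. This is a sketch under the same assumptions as before: ArchR is installed and projHeme1 is the doublet-scored project from earlier sections. The value 1.5 is inferred from the output, not stated in the text.

```r
library(ArchR)

# Assumption: filterRatio = 1.5 is inferred from the counts in the output,
# not stated in the text. A higher value removes more predicted doublets.
projHemeTmp <- filterDoublets(ArchRProj = projHeme1, filterRatio = 1.5)
```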
## Filtering 614 cells from ArchRProject!
## scATAC_BMMC_R1 : 364 of 4932 (7.4%)
## scATAC_CD34_BMMC_R1 : 160 of 3275 (4.9%)
## scATAC_PBMC_R1 : 90 of 2454 (3.7%)
Since projHemeTmp was only created for illustrative purposes, we remove it from our R session.
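In R this is a single call to rm(). The sketch assumes projHemeTmp exists in the current session.

```r
# Drop the temporary project; projHeme1 and projHeme2 are left untouched.
rm(projHemeTmp)
```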