11.2 Making Pseudo-bulk Replicates

In ArchR, pseudo-bulk replicates are made using the addGroupCoverages() function. The key parameter here is groupBy, which defines the groups for which pseudo-bulk replicates should be made. Here, we use Clusters2, which was defined in a previous chapter by labeling our clusters with cell types from scRNA-seq data.

projHeme4 <- addGroupCoverages(ArchRProj = projHeme3, groupBy = "Clusters2")
## ArchR logging to : ArchRLogs/ArchR-addGroupCoverages-371b07c38aa5d-Date-2022-12-23_Time-07-05-17.log
## If there is an issue, please report to github with logFile!
## B (1 of 11) : CellGroups N = 2
## CD4.M (2 of 11) : CellGroups N = 2
## CD4.N (3 of 11) : CellGroups N = 2
## CLP (4 of 11) : CellGroups N = 2
## Erythroid (5 of 11) : CellGroups N = 2
## GMP (6 of 11) : CellGroups N = 2
## Mono (7 of 11) : CellGroups N = 2
## NK (8 of 11) : CellGroups N = 2
## pDC (9 of 11) : CellGroups N = 2
## PreB (10 of 11) : CellGroups N = 2
## Progenitor (11 of 11) : CellGroups N = 2
## 2022-12-23 07:05:36 : Creating Coverage Files!, 0.317 mins elapsed.
## 2022-12-23 07:05:36 : Batch Execution w/ safelapply!, 0.317 mins elapsed.
## 2022-12-23 07:05:36 : Group B._.scATAC_BMMC_R1 (1 of 22) : Creating Group Coverage File : B._.scATAC_BMMC_R1.insertions.coverage.h5, 0.317 mins elapsed.
## Number of Cells = 257
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:06:18 : Group B._.scATAC_PBMC_R1 (2 of 22) : Creating Group Coverage File : B._.scATAC_PBMC_R1.insertions.coverage.h5, 1.02 mins elapsed.
## Number of Cells = 166
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:06:59 : Group CD4.M._.scATAC_PBMC_R1 (3 of 22) : Creating Group Coverage File : CD4.M._.scATAC_PBMC_R1.insertions.coverage.h5, 1.703 mins elapsed.
## Number of Cells = 500
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:07:41 : Group CD4.M._.scATAC_BMMC_R1 (4 of 22) : Creating Group Coverage File : CD4.M._.scATAC_BMMC_R1.insertions.coverage.h5, 2.411 mins elapsed.
## Number of Cells = 84
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:08:22 : Group CD4.N._.scATAC_BMMC_R1 (5 of 22) : Creating Group Coverage File : CD4.N._.scATAC_BMMC_R1.insertions.coverage.h5, 3.095 mins elapsed.
## Number of Cells = 500
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:09:04 : Group CD4.N._.Other (6 of 22) : Creating Group Coverage File : CD4.N._.Other.insertions.coverage.h5, 3.789 mins elapsed.
## Number of Cells = 40
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:09:45 : Group CLP._.scATAC_CD34_BMMC_R1 (7 of 22) : Creating Group Coverage File : CLP._.scATAC_CD34_BMMC_R1.insertions.coverage.h5, 4.469 mins elapsed.
## Number of Cells = 296
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:10:26 : Group CLP._.scATAC_BMMC_R1 (8 of 22) : Creating Group Coverage File : CLP._.scATAC_BMMC_R1.insertions.coverage.h5, 5.159 mins elapsed.
## Number of Cells = 87
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:11:07 : Group Erythroid._.scATAC_CD34_BMMC_R1 (9 of 22) : Creating Group Coverage File : Erythroid._.scATAC_CD34_BMMC_R1.insertions.coverage.h5, 5.842 mins elapsed.
## Number of Cells = 500
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:11:49 : Group Erythroid._.scATAC_BMMC_R1 (10 of 22) : Creating Group Coverage File : Erythroid._.scATAC_BMMC_R1.insertions.coverage.h5, 6.546 mins elapsed.
## Number of Cells = 170
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:12:31 : Group GMP._.scATAC_CD34_BMMC_R1 (11 of 22) : Creating Group Coverage File : GMP._.scATAC_CD34_BMMC_R1.insertions.coverage.h5, 7.232 mins elapsed.
## Number of Cells = 500
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:13:13 : Group GMP._.scATAC_BMMC_R1 (12 of 22) : Creating Group Coverage File : GMP._.scATAC_BMMC_R1.insertions.coverage.h5, 7.933 mins elapsed.
## Number of Cells = 296
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:13:55 : Group Mono._.scATAC_PBMC_R1 (13 of 22) : Creating Group Coverage File : Mono._.scATAC_PBMC_R1.insertions.coverage.h5, 8.632 mins elapsed.
## Number of Cells = 500
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:14:36 : Group Mono._.scATAC_BMMC_R1 (14 of 22) : Creating Group Coverage File : Mono._.scATAC_BMMC_R1.insertions.coverage.h5, 9.328 mins elapsed.
## Number of Cells = 500
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:15:19 : Group NK._.scATAC_PBMC_R1 (15 of 22) : Creating Group Coverage File : NK._.scATAC_PBMC_R1.insertions.coverage.h5, 10.032 mins elapsed.
## Number of Cells = 500
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:16:02 : Group NK._.scATAC_BMMC_R1 (16 of 22) : Creating Group Coverage File : NK._.scATAC_BMMC_R1.insertions.coverage.h5, 10.747 mins elapsed.
## Number of Cells = 330
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:16:43 : Group pDC._.scATAC_BMMC_R1 (17 of 22) : Creating Group Coverage File : pDC._.scATAC_BMMC_R1.insertions.coverage.h5, 11.441 mins elapsed.
## Number of Cells = 154
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:17:24 : Group pDC._.scATAC_CD34_BMMC_R1 (18 of 22) : Creating Group Coverage File : pDC._.scATAC_CD34_BMMC_R1.insertions.coverage.h5, 12.127 mins elapsed.
## Number of Cells = 151
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:18:05 : Group PreB._.Rep1 (19 of 22) : Creating Group Coverage File : PreB._.Rep1.insertions.coverage.h5, 12.809 mins elapsed.
## Number of Cells = 314
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:18:47 : Group PreB._.Rep2 (20 of 22) : Creating Group Coverage File : PreB._.Rep2.insertions.coverage.h5, 13.498 mins elapsed.
## Number of Cells = 40
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:19:27 : Group Progenitor._.scATAC_CD34_BMMC_R1 (21 of 22) : Creating Group Coverage File : Progenitor._.scATAC_CD34_BMMC_R1.insertions.coverage.h5, 14.178 mins elapsed.
## Number of Cells = 500
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:20:09 : Group Progenitor._.scATAC_BMMC_R1 (22 of 22) : Creating Group Coverage File : Progenitor._.scATAC_BMMC_R1.insertions.coverage.h5, 14.875 mins elapsed.
## Number of Cells = 138
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:20:50 : Adding Kmer Bias to Coverage Files!, 15.556 mins elapsed.
## Completed Kmer Bias Calculation
## Adding Kmer Bias (1 of 22)
## Adding Kmer Bias (2 of 22)
## Adding Kmer Bias (3 of 22)
## Adding Kmer Bias (4 of 22)
## Adding Kmer Bias (5 of 22)
## Adding Kmer Bias (6 of 22)
## Adding Kmer Bias (7 of 22)
## Adding Kmer Bias (8 of 22)
## Adding Kmer Bias (9 of 22)
## Adding Kmer Bias (10 of 22)
## Adding Kmer Bias (11 of 22)
## Adding Kmer Bias (12 of 22)
## Adding Kmer Bias (13 of 22)
## Adding Kmer Bias (14 of 22)
## Adding Kmer Bias (15 of 22)
## Adding Kmer Bias (16 of 22)
## Adding Kmer Bias (17 of 22)
## Adding Kmer Bias (18 of 22)
## Adding Kmer Bias (19 of 22)
## Adding Kmer Bias (20 of 22)
## Adding Kmer Bias (21 of 22)
## Adding Kmer Bias (22 of 22)
## 2022-12-23 07:22:29 : Finished Creation of Coverage Files!, 17.208 mins elapsed.
## ArchR logging successful to : ArchRLogs/ArchR-addGroupCoverages-371b07c38aa5d-Date-2022-12-23_Time-07-05-17.log
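
The call above leaves every argument except groupBy at its default. As a hedged sketch, the version below spells out a few of the other arguments that shape the replicates. The values are what we understand the ArchR defaults to be (the 40-cell and 500-cell replicates in the log are consistent with minCells and maxCells), but they are assumptions and should be checked with ?addGroupCoverages for the installed version. The cov_args list and the guard around the call are illustrative scaffolding, not part of ArchR.

```r
# Arguments for addGroupCoverages(), written out explicitly.
# ASSUMPTION: these mirror the documented defaults; verify with ?addGroupCoverages.
cov_args <- list(
  groupBy       = "Clusters2", # cellColData column that defines the groups
  minCells      = 40,          # smallest number of cells allowed in a replicate
  maxCells      = 500,         # largest number of cells allowed in a replicate
  minReplicates = 2,           # at least this many replicates per group
  maxReplicates = 5,           # at most this many replicates per group
  sampleRatio   = 0.8,         # fraction of a group's cells that may be sampled per replicate
  force         = FALSE        # TRUE overwrites existing group coverage information
)

# Only run when ArchR and the tutorial project are available in this session.
if (requireNamespace("ArchR", quietly = TRUE) && exists("projHeme3")) {
  library(ArchR)
  projHeme4 <- do.call(
    addGroupCoverages,
    c(list(ArchRProj = projHeme3), cov_args)
  )
}
```

Keeping the arguments in a list makes it easy to record exactly which settings produced a given set of coverage files.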

Running addGroupCoverages() creates an entry in the project metadata of the ArchRProject that includes, for each pseudo-bulk replicate, the file path to its HDF5-format coverage file stored on disk.

projHeme4@projectMetadata$GroupCoverages$Clusters2$coverageMetadata
## DataFrame with 22 rows and 5 columns
##           Group                   Name                   File    nCells
##     <character>            <character>            <character> <integer>
## 1             B     B._.scATAC_BMMC_R1 /corces/home/rcorces..       257
## 2             B     B._.scATAC_PBMC_R1 /corces/home/rcorces..       166
## 3         CD4.M CD4.M._.scATAC_PBMC_R1 /corces/home/rcorces..       500
## 4         CD4.M CD4.M._.scATAC_BMMC_R1 /corces/home/rcorces..        84
## 5         CD4.N CD4.N._.scATAC_BMMC_R1 /corces/home/rcorces..       500
## ...         ...                    ...                    ...       ...
## 18          pDC pDC._.scATAC_CD34_BM.. /corces/home/rcorces..       151
## 19         PreB            PreB._.Rep1 /corces/home/rcorces..       314
## 20         PreB            PreB._.Rep2 /corces/home/rcorces..        40
## 21   Progenitor Progenitor._.scATAC_.. /corces/home/rcorces..       500
## 22   Progenitor Progenitor._.scATAC_.. /corces/home/rcorces..       138
##     nInsertions
##       <numeric>
## 1       1291546
## 2       1130972
## 3       4757644
## 4        511298
## 5       2129280
## ...         ...
## 18      1108616
## 19      1995046
## 20       208888
## 21      3526630
## 22       618374
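
A quick way to sanity-check the replicates is to look at how many insertions each cell contributes on average. The sketch below is self-contained: it transcribes only the rows visible in the printed output above into a plain data.frame (the name cov and the insertionsPerCell column are hypothetical), taking the full replicate names from the log. In a live session, something like as.data.frame() applied to the coverageMetadata object should give the full 22-row table instead, though that conversion is an assumption here.

```r
# Visible rows of coverageMetadata, transcribed from the printed output above.
# The File column is left out because its paths are truncated in the printout.
cov <- data.frame(
  Group = c("B", "B", "CD4.M", "CD4.M", "CD4.N",
            "pDC", "PreB", "PreB", "Progenitor", "Progenitor"),
  Name = c("B._.scATAC_BMMC_R1", "B._.scATAC_PBMC_R1",
           "CD4.M._.scATAC_PBMC_R1", "CD4.M._.scATAC_BMMC_R1",
           "CD4.N._.scATAC_BMMC_R1", "pDC._.scATAC_CD34_BMMC_R1",
           "PreB._.Rep1", "PreB._.Rep2",
           "Progenitor._.scATAC_CD34_BMMC_R1", "Progenitor._.scATAC_BMMC_R1"),
  nCells = c(257L, 166L, 500L, 84L, 500L, 151L, 314L, 40L, 500L, 138L),
  nInsertions = c(1291546, 1130972, 4757644, 511298, 2129280,
                  1108616, 1995046, 208888, 3526630, 618374),
  stringsAsFactors = FALSE
)

# Average number of insertions contributed per cell in each replicate.
cov$insertionsPerCell <- round(cov$nInsertions / cov$nCells)

# List the replicates from smallest to largest so that thin ones stand out.
cov[order(cov$nCells), c("Name", "nCells", "nInsertions", "insertionsPerCell")]
```

Replicates with few cells, such as PreB._.Rep2, carry far fewer insertions in total, which is worth keeping in mind when interpreting peaks called on them.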

We can also obtain the actual cell assignments for the pseudo-bulk replicates by calling addGroupCoverages() a second time with returnGroups = TRUE. The returned object is a list of lists in which each sub-list contains the replicates for a given cell group.

groups <- addGroupCoverages(ArchRProj = projHeme3, groupBy = "Clusters2", returnGroups = TRUE)
## ArchR logging to : ArchRLogs/ArchR-addGroupCoverages-371b0d3396f1-Date-2022-12-23_Time-07-22-29.log
## If there is an issue, please report to github with logFile!
## B (1 of 11) : CellGroups N = 2
## CD4.M (2 of 11) : CellGroups N = 2
## CD4.N (3 of 11) : CellGroups N = 2
## CLP (4 of 11) : CellGroups N = 2
## Erythroid (5 of 11) : CellGroups N = 2
## GMP (6 of 11) : CellGroups N = 2
## Mono (7 of 11) : CellGroups N = 2
## NK (8 of 11) : CellGroups N = 2
## pDC (9 of 11) : CellGroups N = 2
## PreB (10 of 11) : CellGroups N = 2
## Progenitor (11 of 11) : CellGroups N = 2
groups
## List of length 11
## names(11): B CD4.M CD4.N CLP Erythroid GMP Mono NK pDC PreB Progenitor

If we look at one of the sub-lists, we see the names of the cells that make up each pseudo-bulk replicate.

groups$B
## CharacterList of length 2
## [["scATAC_BMMC_R1"]] scATAC_BMMC_R1#TTATGTCAGTGATTAG-1 ...
## [["scATAC_PBMC_R1"]] scATAC_PBMC_R1#TGAGTCAGTACTTGAC-1 ...
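
Because the returned object is nested (group, then replicate, then cell names), it is often handy to flatten it into one row per cell. The following is a minimal base-R sketch of that idea. It uses a stand-in object, groups_mock, that contains only the two cell names visible in the output above; the names groups_mock, cells_per_replicate and assignments are hypothetical. The same approach should carry over to the real groups object, but that is an assumption we have not verified against ArchR's list classes.

```r
# Stand-in for `groups`, holding only the cell names visible in the output above.
# In a live session `groups` comes from addGroupCoverages(..., returnGroups = TRUE).
groups_mock <- list(
  B = list(
    scATAC_BMMC_R1 = "scATAC_BMMC_R1#TTATGTCAGTGATTAG-1",
    scATAC_PBMC_R1 = "scATAC_PBMC_R1#TGAGTCAGTACTTGAC-1"
  )
)

# Number of cells in every replicate of every group.
cells_per_replicate <- lapply(groups_mock, lengths)

# Flatten to one row per cell, recording its group and replicate.
assignments <- do.call(rbind, lapply(names(groups_mock), function(g) {
  reps <- groups_mock[[g]]
  data.frame(
    group     = g,
    replicate = rep(names(reps), lengths(reps)),
    cell      = unlist(reps, use.names = FALSE),
    stringsAsFactors = FALSE
  )
}))

# The cell names shown above have the form "<sample>#<barcode>"; recover the sample.
assignments$sample <- sub("#.*$", "", assignments$cell)
```

A flat table like this makes it straightforward to check which samples contribute to each replicate.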

With these pseudo-bulk replicates generated, we can now call peaks in our data. As mentioned previously, we do not want to call peaks on the merged set of all single cells, so having these more granular cell groups defined, either through clustering or otherwise, provides the ideal starting point for peak calling.