11.2 Making Pseudo-bulk Replicates
In ArchR, pseudo-bulk replicates are made using the addGroupCoverages() function. The key parameter here is groupBy, which defines the groups for which pseudo-bulk replicates should be made. Here, we are using Clusters2, which was defined in a previous chapter by labeling our clusters with cell types identified from scRNA-seq data.
projHeme4 <- addGroupCoverages(ArchRProj = projHeme3, groupBy = "Clusters2")
## ArchR logging to : ArchRLogs/ArchR-addGroupCoverages-371b07c38aa5d-Date-2022-12-23_Time-07-05-17.log
## If there is an issue, please report to github with logFile!
## B (1 of 11) : CellGroups N = 2
## CD4.M (2 of 11) : CellGroups N = 2
## CD4.N (3 of 11) : CellGroups N = 2
## CLP (4 of 11) : CellGroups N = 2
## Erythroid (5 of 11) : CellGroups N = 2
## GMP (6 of 11) : CellGroups N = 2
## Mono (7 of 11) : CellGroups N = 2
## NK (8 of 11) : CellGroups N = 2
## pDC (9 of 11) : CellGroups N = 2
## PreB (10 of 11) : CellGroups N = 2
## Progenitor (11 of 11) : CellGroups N = 2
## 2022-12-23 07:05:36 : Creating Coverage Files!, 0.317 mins elapsed.
## 2022-12-23 07:05:36 : Batch Execution w/ safelapply!, 0.317 mins elapsed.
## 2022-12-23 07:05:36 : Group B._.scATAC_BMMC_R1 (1 of 22) : Creating Group Coverage File : B._.scATAC_BMMC_R1.insertions.coverage.h5, 0.317 mins elapsed.
## Number of Cells = 257
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:06:18 : Group B._.scATAC_PBMC_R1 (2 of 22) : Creating Group Coverage File : B._.scATAC_PBMC_R1.insertions.coverage.h5, 1.02 mins elapsed.
## Number of Cells = 166
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:06:59 : Group CD4.M._.scATAC_PBMC_R1 (3 of 22) : Creating Group Coverage File : CD4.M._.scATAC_PBMC_R1.insertions.coverage.h5, 1.703 mins elapsed.
## Number of Cells = 500
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:07:41 : Group CD4.M._.scATAC_BMMC_R1 (4 of 22) : Creating Group Coverage File : CD4.M._.scATAC_BMMC_R1.insertions.coverage.h5, 2.411 mins elapsed.
## Number of Cells = 84
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:08:22 : Group CD4.N._.scATAC_BMMC_R1 (5 of 22) : Creating Group Coverage File : CD4.N._.scATAC_BMMC_R1.insertions.coverage.h5, 3.095 mins elapsed.
## Number of Cells = 500
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:09:04 : Group CD4.N._.Other (6 of 22) : Creating Group Coverage File : CD4.N._.Other.insertions.coverage.h5, 3.789 mins elapsed.
## Number of Cells = 40
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:09:45 : Group CLP._.scATAC_CD34_BMMC_R1 (7 of 22) : Creating Group Coverage File : CLP._.scATAC_CD34_BMMC_R1.insertions.coverage.h5, 4.469 mins elapsed.
## Number of Cells = 296
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:10:26 : Group CLP._.scATAC_BMMC_R1 (8 of 22) : Creating Group Coverage File : CLP._.scATAC_BMMC_R1.insertions.coverage.h5, 5.159 mins elapsed.
## Number of Cells = 87
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:11:07 : Group Erythroid._.scATAC_CD34_BMMC_R1 (9 of 22) : Creating Group Coverage File : Erythroid._.scATAC_CD34_BMMC_R1.insertions.coverage.h5, 5.842 mins elapsed.
## Number of Cells = 500
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:11:49 : Group Erythroid._.scATAC_BMMC_R1 (10 of 22) : Creating Group Coverage File : Erythroid._.scATAC_BMMC_R1.insertions.coverage.h5, 6.546 mins elapsed.
## Number of Cells = 170
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:12:31 : Group GMP._.scATAC_CD34_BMMC_R1 (11 of 22) : Creating Group Coverage File : GMP._.scATAC_CD34_BMMC_R1.insertions.coverage.h5, 7.232 mins elapsed.
## Number of Cells = 500
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:13:13 : Group GMP._.scATAC_BMMC_R1 (12 of 22) : Creating Group Coverage File : GMP._.scATAC_BMMC_R1.insertions.coverage.h5, 7.933 mins elapsed.
## Number of Cells = 296
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:13:55 : Group Mono._.scATAC_PBMC_R1 (13 of 22) : Creating Group Coverage File : Mono._.scATAC_PBMC_R1.insertions.coverage.h5, 8.632 mins elapsed.
## Number of Cells = 500
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:14:36 : Group Mono._.scATAC_BMMC_R1 (14 of 22) : Creating Group Coverage File : Mono._.scATAC_BMMC_R1.insertions.coverage.h5, 9.328 mins elapsed.
## Number of Cells = 500
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:15:19 : Group NK._.scATAC_PBMC_R1 (15 of 22) : Creating Group Coverage File : NK._.scATAC_PBMC_R1.insertions.coverage.h5, 10.032 mins elapsed.
## Number of Cells = 500
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:16:02 : Group NK._.scATAC_BMMC_R1 (16 of 22) : Creating Group Coverage File : NK._.scATAC_BMMC_R1.insertions.coverage.h5, 10.747 mins elapsed.
## Number of Cells = 330
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:16:43 : Group pDC._.scATAC_BMMC_R1 (17 of 22) : Creating Group Coverage File : pDC._.scATAC_BMMC_R1.insertions.coverage.h5, 11.441 mins elapsed.
## Number of Cells = 154
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:17:24 : Group pDC._.scATAC_CD34_BMMC_R1 (18 of 22) : Creating Group Coverage File : pDC._.scATAC_CD34_BMMC_R1.insertions.coverage.h5, 12.127 mins elapsed.
## Number of Cells = 151
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:18:05 : Group PreB._.Rep1 (19 of 22) : Creating Group Coverage File : PreB._.Rep1.insertions.coverage.h5, 12.809 mins elapsed.
## Number of Cells = 314
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:18:47 : Group PreB._.Rep2 (20 of 22) : Creating Group Coverage File : PreB._.Rep2.insertions.coverage.h5, 13.498 mins elapsed.
## Number of Cells = 40
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:19:27 : Group Progenitor._.scATAC_CD34_BMMC_R1 (21 of 22) : Creating Group Coverage File : Progenitor._.scATAC_CD34_BMMC_R1.insertions.coverage.h5, 14.178 mins elapsed.
## Number of Cells = 500
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:20:09 : Group Progenitor._.scATAC_BMMC_R1 (22 of 22) : Creating Group Coverage File : Progenitor._.scATAC_BMMC_R1.insertions.coverage.h5, 14.875 mins elapsed.
## Number of Cells = 138
## Coverage File Exists!
## Added Coverage Group
## Added Metadata Group
## Added ArrowCoverage Class
## Added Coverage/Info
## Added Coverage/Info/CellNames
## 2022-12-23 07:20:50 : Adding Kmer Bias to Coverage Files!, 15.556 mins elapsed.
## Completed Kmer Bias Calculation
## Adding Kmer Bias (1 of 22)
## Adding Kmer Bias (2 of 22)
## Adding Kmer Bias (3 of 22)
## Adding Kmer Bias (4 of 22)
## Adding Kmer Bias (5 of 22)
## Adding Kmer Bias (6 of 22)
## Adding Kmer Bias (7 of 22)
## Adding Kmer Bias (8 of 22)
## Adding Kmer Bias (9 of 22)
## Adding Kmer Bias (10 of 22)
## Adding Kmer Bias (11 of 22)
## Adding Kmer Bias (12 of 22)
## Adding Kmer Bias (13 of 22)
## Adding Kmer Bias (14 of 22)
## Adding Kmer Bias (15 of 22)
## Adding Kmer Bias (16 of 22)
## Adding Kmer Bias (17 of 22)
## Adding Kmer Bias (18 of 22)
## Adding Kmer Bias (19 of 22)
## Adding Kmer Bias (20 of 22)
## Adding Kmer Bias (21 of 22)
## Adding Kmer Bias (22 of 22)
## 2022-12-23 07:22:29 : Finished Creation of Coverage Files!, 17.208 mins elapsed.
## ArchR logging successful to : ArchRLogs/ArchR-addGroupCoverages-371b07c38aa5d-Date-2022-12-23_Time-07-05-17.log
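The log above shows the grouping in action: each cell type in Clusters2 is split into two cell groups, mostly one per sample, and no group is larger than 500 cells or smaller than 40. The following is a simplified, hypothetical sketch of that idea in base R, assuming caps of 40 and 500 cells (as far as we know, these match the defaults of ArchR's minCells and maxCells parameters). The helper make_pseudobulk_groups() is our own illustration, not ArchR's implementation. For example, groups such as CD4.N._.Other and PreB._.Rep1 in the log suggest that ArchR also pools or resamples cells when a sample contributes too few, which this sketch does not attempt.

```r
# Hypothetical helper (not part of ArchR): split each cluster into one
# candidate pseudo-bulk group per sample, drop groups below minCells and
# randomly downsample groups above maxCells.
make_pseudobulk_groups <- function(cells, clusters, samples,
                                   minCells = 40, maxCells = 500, seed = 1) {
  set.seed(seed)  # make the downsampling reproducible
  out <- list()
  for (cl in sort(unique(clusters))) {
    in_cl <- clusters == cl
    # one candidate replicate per sample within this cluster
    reps <- split(cells[in_cl], samples[in_cl])
    # drop replicates with too few cells
    reps <- reps[lengths(reps) >= minCells]
    # cap replicate size by sampling cells without replacement
    reps <- lapply(reps, function(x) {
      if (length(x) > maxCells) sample(x, maxCells) else x
    })
    out[[cl]] <- reps
  }
  out
}

# Toy data: 600 B cells (550 from R1, 50 from R2) and
# 400 NK cells (380 from R1, 20 from R2)
cells    <- sprintf("cell%04d", 1:1000)
clusters <- rep(c("B", "NK"), times = c(600, 400))
samples  <- rep(c("R1", "R2", "R1", "R2"), times = c(550, 50, 380, 20))

groups_sketch <- make_pseudobulk_groups(cells, clusters, samples)
lengths(groups_sketch$B)   # R1 capped at 500, R2 kept with 50
lengths(groups_sketch$NK)  # only R1 (380); R2 dropped because 20 < 40
```

Downsampling large groups keeps any single replicate from dominating the signal, while the lower bound avoids replicates too sparse to be informative.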
Running addGroupCoverages() creates an entry in the project metadata of the ArchRProject, including the file path of each HDF5-format coverage file stored on disk.
projHeme4@projectMetadata$GroupCoverages$Clusters2$coverageMetadata
## DataFrame with 22 rows and 5 columns
## Group Name File nCells
## <character> <character> <character> <integer>
## 1 B B._.scATAC_BMMC_R1 /corces/home/rcorces.. 257
## 2 B B._.scATAC_PBMC_R1 /corces/home/rcorces.. 166
## 3 CD4.M CD4.M._.scATAC_PBMC_R1 /corces/home/rcorces.. 500
## 4 CD4.M CD4.M._.scATAC_BMMC_R1 /corces/home/rcorces.. 84
## 5 CD4.N CD4.N._.scATAC_BMMC_R1 /corces/home/rcorces.. 500
## ... ... ... ... ...
## 18 pDC pDC._.scATAC_CD34_BM.. /corces/home/rcorces.. 151
## 19 PreB PreB._.Rep1 /corces/home/rcorces.. 314
## 20 PreB PreB._.Rep2 /corces/home/rcorces.. 40
## 21 Progenitor Progenitor._.scATAC_.. /corces/home/rcorces.. 500
## 22 Progenitor Progenitor._.scATAC_.. /corces/home/rcorces.. 138
## nInsertions
## <numeric>
## 1 1291546
## 2 1130972
## 3 4757644
## 4 511298
## 5 2129280
## ... ...
## 18 1108616
## 19 1995046
## 20 208888
## 21 3526630
## 22 618374
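The coverage metadata is an ordinary table, so once it is converted to a base R data frame (for example with as.data.frame(), which we assume works on this Bioconductor DataFrame), summaries such as per-cell sequencing depth take one line each. To keep the sketch self-contained, it hard-codes only rows printed above and keeps only cell types whose two replicates are both visible (B, CD4.M, PreB, Progenitor); rows 6 to 17 are elided in the output and are not reconstructed here.

```r
# Rows 1-4 and 19-22 of the coverageMetadata printed above; cell types whose
# second replicate falls in the elided rows (CD4.N, pDC) are left out.
meta <- data.frame(
  Group       = c("B", "B", "CD4.M", "CD4.M",
                  "PreB", "PreB", "Progenitor", "Progenitor"),
  nCells      = c(257, 166, 500, 84, 314, 40, 500, 138),
  nInsertions = c(1291546, 1130972, 4757644, 511298,
                  1995046, 208888, 3526630, 618374)
)

# Average Tn5 insertions per cell in each pseudo-bulk replicate
meta$insertionsPerCell <- meta$nInsertions / meta$nCells

# Total cells and insertions per cell type, summed over its replicates
totals <- aggregate(cbind(nCells, nInsertions) ~ Group, data = meta, FUN = sum)
totals
```

In this subset, for instance, the PBMC CD4.M replicate averages roughly 9,500 insertions per cell versus roughly 6,100 for the BMMC one, a depth difference worth keeping in mind when comparing replicates.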
We can also obtain the actual cell assignments for the pseudo-bulk replicates by calling addGroupCoverages() a second time with returnGroups = TRUE. The returned object is a list of lists, where each sub-list contains the replicates for a given cell group.
groups <- addGroupCoverages(ArchRProj = projHeme3, groupBy = "Clusters2", returnGroups = TRUE)
## ArchR logging to : ArchRLogs/ArchR-addGroupCoverages-371b0d3396f1-Date-2022-12-23_Time-07-22-29.log
## If there is an issue, please report to github with logFile!
## B (1 of 11) : CellGroups N = 2
## CD4.M (2 of 11) : CellGroups N = 2
## CD4.N (3 of 11) : CellGroups N = 2
## CLP (4 of 11) : CellGroups N = 2
## Erythroid (5 of 11) : CellGroups N = 2
## GMP (6 of 11) : CellGroups N = 2
## Mono (7 of 11) : CellGroups N = 2
## NK (8 of 11) : CellGroups N = 2
## pDC (9 of 11) : CellGroups N = 2
## PreB (10 of 11) : CellGroups N = 2
## Progenitor (11 of 11) : CellGroups N = 2
groups
## List of length 11
## names(11): B CD4.M CD4.N CLP Erythroid GMP Mono NK pDC PreB Progenitor
Looking at one of the sub-lists shows the names of the cells that make up each pseudo-bulk replicate for that cell group.
groups$B
## CharacterList of length 2
## [["scATAC_BMMC_R1"]] scATAC_BMMC_R1#TTATGTCAGTGATTAG-1 ...
## [["scATAC_PBMC_R1"]] scATAC_PBMC_R1#TGAGTCAGTACTTGAC-1 ...
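To make the shape of this object concrete, here is a toy stand-in built from plain R lists with made-up barcodes (groups_demo and its barcodes are hypothetical). In the real output each sub-list is a Bioconductor CharacterList, which to our knowledge supports the same lengths() call; plain lists are used only so the sketch runs without Bioconductor. The sample of origin can be read from the prefix before "#", matching the barcode format shown above.

```r
# Hypothetical stand-in for `groups`: a named list (one entry per cell group)
# of named character vectors (one per pseudo-bulk replicate).
groups_demo <- list(
  B = list(
    scATAC_BMMC_R1 = paste0("scATAC_BMMC_R1#cell", 1:3),
    scATAC_PBMC_R1 = paste0("scATAC_PBMC_R1#cell", 1:2)
  ),
  NK = list(
    scATAC_PBMC_R1 = paste0("scATAC_PBMC_R1#cell", 1:4)
  )
)

# Number of replicates per cell group
n_reps <- lengths(groups_demo)

# Number of cells in each replicate, flattened into one named vector
# (names combine group and replicate, e.g. "B.scATAC_BMMC_R1")
n_cells <- unlist(lapply(groups_demo, lengths))

# Sample of origin, recovered from the barcode prefix before "#"
origin <- sub("#.*$", "", groups_demo$B$scATAC_BMMC_R1)
```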
With these pseudo-bulk replicates generated, we can now call peaks in our data. As mentioned previously, we do not want to call peaks on the merged set of all single cells, so having these more granular cell groups, defined through clustering or otherwise, provides the ideal starting point for peak calling.