19.3 Exporting pseudo-bulked data to a SummarizedExperiment

Often, we want to analyze our single-cell data as if it were bulk data to get around issues of sparsity. For example, you might want to use pseudo-bulk replicates when trying to answer “how does cluster X or cell type Y differ between cases and controls?”. To enable this type of analysis, ArchR provides the getGroupSE() and getPBGroupSE() functions, which group your cells and return the aggregated data as a SummarizedExperiment (SE) object. getGroupSE() groups cells based on the values found in a column of cellColData. For example, you might want an SE containing information from each sample for each cluster. As you will remember from earlier chapters, you can add whatever information you want as new columns to cellColData, which is a convenient way to create exactly the grouping you want to extract with getGroupSE(). getPBGroupSE() is similar, but it uses the exact pseudo-bulk cell groupings that were used in your ArchRProject to create coverage files, call peaks, etc. If the difference between these functions seems subtle, re-read the chapter on pseudo-bulks.
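To make the idea of grouping concrete, here is a minimal base-R sketch of pseudo-bulk aggregation on a made-up count matrix: the cells that share a group label are summed feature by feature, and each group's sum is then divided by its number of cells. The matrix, the labels, and the divide-by-cell-count step are assumptions chosen for illustration; this is not ArchR's internal implementation.

```r
# Made-up cell-by-feature counts: 4 features (rows) x 6 cells (columns).
# In ArchR these would come from a matrix such as the PeakMatrix.
counts <- matrix(
  c(1, 0, 2, 0, 1, 0,
    0, 0, 1, 3, 0, 1,
    2, 1, 0, 0, 0, 0,
    0, 1, 1, 1, 2, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("peak", 1:4), paste0("cell", 1:6))
)

# One group label per cell, analogous to a column of cellColData.
group <- c("R1_x_B", "R1_x_B", "R1_x_NK", "R1_x_NK", "R1_x_NK", "R2_x_B")

# rowsum() sums rows that share a group, so transpose to put cells on the rows,
# then transpose back to get a features x groups matrix.
summed <- t(rowsum(t(counts), group = group))

# Divide each group's column by the number of cells in that group.
nCells <- as.vector(table(group)[colnames(summed)])
pseudoBulk <- sweep(summed, 2, nCells, "/")
pseudoBulk
```

Each column of pseudoBulk now describes one group rather than one cell, which is the shape of data getGroupSE() hands back inside a SummarizedExperiment.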

For example, if we wanted to create pseudo-bulk replicates based on Cluster and Sample, creating a single replicate for each sample x cluster combination, we could add a new column to cellColData representing this information:

projHeme5 <- addCellColData(ArchRProj = projHeme5, data = paste0(projHeme5@cellColData$Sample,"_x_",projHeme5@cellColData$Clusters2), name = "Clusters2_x_Sample", cells = getCellNames(projHeme5))
head(projHeme5@cellColData$Clusters2_x_Sample)
## [1] "scATAC_BMMC_R1_x_B"    "scATAC_BMMC_R1_x_NK"   "scATAC_BMMC_R1_x_PreB"
## [4] "scATAC_BMMC_R1_x_PreB" "scATAC_BMMC_R1_x_PreB" "scATAC_BMMC_R1_x_GMP"
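Before aggregating, it can be worth checking how many cells fall into each combination, because a combination with only a handful of cells produces a noisy pseudo-bulk profile. The sketch below uses made-up stand-ins for the Sample and Clusters2 columns to show the same paste0() construction followed by a count per label.

```r
# Made-up stand-ins for the Sample and Clusters2 columns of cellColData.
sample  <- c("R1", "R1", "R1", "R2", "R2")
cluster <- c("B", "NK", "B", "B", "Mono")

# Same construction as above: one "<sample>_x_<cluster>" label per cell.
combo <- paste0(sample, "_x_", cluster)

# Number of cells in each sample x cluster combination.
cellsPerCombo <- table(combo)
cellsPerCombo
```

On the real project, the same kind of summary should come from table(projHeme5@cellColData$Clusters2_x_Sample).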

We can then use this column in cellColData as the groupBy parameter to getGroupSE() to get a SummarizedExperiment object containing a single column per sample x cluster combination:

groupSE <- getGroupSE(ArchRProj = projHeme5, useMatrix = "PeakMatrix", groupBy = "Clusters2_x_Sample")
## ArchR logging to : ArchRLogs/ArchR-getGroupSE-371b02380712b-Date-2022-12-23_Time-09-19-33.log
## If there is an issue, please report to github with logFile!
## Getting Group Matrix
## 2022-12-23 09:19:54 : Successfully Created Group Matrix, 0.295 mins elapsed.
## Normalizing by number of Cells
## ArchR logging successful to : ArchRLogs/ArchR-getGroupSE-371b02380712b-Date-2022-12-23_Time-09-19-33.log
dim(groupSE)
## [1] 142475     25
colnames(groupSE)
##  [1] "scATAC_BMMC_R1_x_B"               "scATAC_BMMC_R1_x_CD4.M"          
##  [3] "scATAC_BMMC_R1_x_CD4.N"           "scATAC_BMMC_R1_x_CLP"            
##  [5] "scATAC_BMMC_R1_x_Erythroid"       "scATAC_BMMC_R1_x_GMP"            
##  [7] "scATAC_BMMC_R1_x_Mono"            "scATAC_BMMC_R1_x_NK"             
##  [9] "scATAC_BMMC_R1_x_pDC"             "scATAC_BMMC_R1_x_PreB"           
## [11] "scATAC_BMMC_R1_x_Progenitor"      "scATAC_CD34_BMMC_R1_x_B"         
## [13] "scATAC_CD34_BMMC_R1_x_CLP"        "scATAC_CD34_BMMC_R1_x_Erythroid" 
## [15] "scATAC_CD34_BMMC_R1_x_GMP"        "scATAC_CD34_BMMC_R1_x_Mono"      
## [17] "scATAC_CD34_BMMC_R1_x_pDC"        "scATAC_CD34_BMMC_R1_x_Progenitor"
## [19] "scATAC_PBMC_R1_x_B"               "scATAC_PBMC_R1_x_CD4.M"          
## [21] "scATAC_PBMC_R1_x_CD4.N"           "scATAC_PBMC_R1_x_GMP"            
## [23] "scATAC_PBMC_R1_x_Mono"            "scATAC_PBMC_R1_x_NK"             
## [25] "scATAC_PBMC_R1_x_pDC"
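For a case-versus-control comparison, you will usually want the sample and the cluster back as separate annotations for each column. Because "_x_" does not appear in any of the sample or cluster names shown above, splitting on it recovers both parts. The sketch below runs on a few of the column names printed above; in practice you would pass colnames(groupSE) instead.

```r
# A few of the column names printed above (in practice: colnames(groupSE)).
groupNames <- c("scATAC_BMMC_R1_x_B", "scATAC_BMMC_R1_x_CD4.M",
                "scATAC_CD34_BMMC_R1_x_CLP", "scATAC_PBMC_R1_x_pDC")

# Split each name at the "_x_" separator used when the column was created.
parts <- do.call(rbind, strsplit(groupNames, "_x_", fixed = TRUE))

# One row per group, with the sample and cluster as separate columns.
groupInfo <- data.frame(Sample = parts[, 1], Cluster = parts[, 2],
                        row.names = groupNames, stringsAsFactors = FALSE)
groupInfo
```

The two columns could then be stored with the data, for example via groupSE$Sample <- groupInfo$Sample, since the $<- method on a SummarizedExperiment sets a column of its colData.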

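Because the matrix is chosen with the useMatrix parameter, getGroupSE() can export any matrix in your ArchRProject (for example, the GeneScoreMatrix instead of the PeakMatrix) to a SummarizedExperiment object.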

getPBGroupSE() works almost identically to getGroupSE(). The primary difference is that the cell groupings it uses are determined by calling addGroupCoverages() with returnGroups = TRUE.
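The practical difference is that the groups arrive as explicit sets of cells rather than as values in a cellColData column. The sketch below aggregates a made-up count matrix from such explicit groups. The named list of cell names (pbGroups) and its replicate names are hypothetical stand-ins; they are not the documented return value of addGroupCoverages().

```r
# Made-up cell-by-feature counts: 3 features x 5 cells.
counts <- matrix(
  c(1, 0, 2, 0, 1,
    0, 2, 1, 3, 0,
    4, 1, 0, 0, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("peak", 1:3), paste0("cell", 1:5))
)

# Hypothetical pseudo-bulk groupings: each element lists the cells in one replicate.
pbGroups <- list(
  B_Rep1  = c("cell1", "cell2"),
  B_Rep2  = c("cell3"),
  NK_Rep1 = c("cell4", "cell5")
)

# For each replicate, sum its cells' counts and divide by its cell count.
pbMat <- sapply(pbGroups, function(cells) {
  rowSums(counts[, cells, drop = FALSE]) / length(cells)
})
pbMat
```

The result has one column per replicate, named after the list elements.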