10.2 Adding Pseudo-scRNA-seq profiles for each scATAC-seq cell

Now that we are satisfied with the results of our scATAC-seq and scRNA-seq integration, we can re-run the integration with addToArrow = TRUE to add the linked gene expression data to each of the Arrow files. As described previously, we pass the groupList to constrain the integration and column names to nameCell, nameGroup, and nameScore for each of the metadata columns we will add to cellColData. Here, we create projHeme3 which will be carried forward in the tutorial.

projHeme3 <- addGeneIntegrationMatrix(
    ArchRProj = projHeme2, 
    useMatrix = "GeneScoreMatrix",
    matrixName = "GeneIntegrationMatrix",
    reducedDims = "IterativeLSI",
    seRNA = seRNA,
    addToArrow = TRUE,
    force= TRUE,
    groupList = groupList,
    groupRNA = "BioClassification",
    nameCell = "predictedCell",
    nameGroup = "predictedGroup",
    nameScore = "predictedScore"
)
## ArchR logging to : ArchRLogs/ArchR-addGeneIntegrationMatrix-371b073f6cca4-Date-2022-12-23_Time-06-48-52.log
## If there is an issue, please report to github with logFile!
## 2022-12-23 06:48:54 : Running Seurat's Integration Stuart* et al 2019, 0.018 mins elapsed.
## 2022-12-23 06:48:54 : Checking ATAC Input, 0.021 mins elapsed.
## 2022-12-23 06:48:55 : Checking RNA Input, 0.047 mins elapsed.
## 2022-12-23 06:49:03 : Found 18601 overlapping gene names from gene scores and rna matrix!, 0.172 mins elapsed.
## 2022-12-23 06:49:03 : Creating Integration Blocks, 0.172 mins elapsed.
## 2022-12-23 06:49:03 : Prepping Interation Data, 0.176 mins elapsed.
## subThreading Disabled since ArchRLocking is TRUE see `addArchRLocking`
## 2022-12-23 06:49:05 : Computing Integration in 2 Integration Blocks!, 0 mins elapsed.
## 2022-12-23 06:49:05 : Block (1 of 2) : Computing Integration, 0 mins elapsed.
## 2022-12-23 06:49:06 : Block (1 of 2) : Identifying Variable Genes, 0.015 mins elapsed.
## 2022-12-23 06:49:07 : Block (1 of 2) : Getting GeneScoreMatrix, 0.037 mins elapsed.
## 2022-12-23 06:49:12 : Block (1 of 2) : Imputing GeneScoreMatrix, 0.121 mins elapsed.
## Getting ImputeWeights
## Warning: The following arguments are not used: row.names
## 2022-12-23 06:49:17 : Block (1 of 2) : Seurat FindTransferAnchors, 0.204 mins elapsed.
## 2022-12-23 06:50:13 : Block (1 of 2) : Seurat TransferData Cell Group Labels, 1.133 mins elapsed.
## 2022-12-23 06:50:13 : Block (1 of 2) : Seurat TransferData Cell Names Labels, 1.145 mins elapsed.
## 2022-12-23 06:50:18 : Block (1 of 2) : Seurat TransferData GeneMatrix, 1.216 mins elapsed.
## 2022-12-23 06:50:21 : Block (1 of 2) : Saving TransferAnchors Joint CCA, 1.275 mins elapsed.
## 2022-12-23 06:50:23 : Block (1 of 2) : Transferring Paired RNA to Temp File, 1.306 mins elapsed.
## 2022-12-23 06:50:25 : Block (1 of 2) : Completed Integration, 1.331 mins elapsed.
## 2022-12-23 06:50:26 : Block (2 of 2) : Computing Integration, 1.358 mins elapsed.
## 2022-12-23 06:50:27 : Block (2 of 2) : Identifying Variable Genes, 1.374 mins elapsed.
## 2022-12-23 06:50:29 : Block (2 of 2) : Getting GeneScoreMatrix, 1.4 mins elapsed.
## 2022-12-23 06:50:36 : Block (2 of 2) : Imputing GeneScoreMatrix, 1.52 mins elapsed.
## Getting ImputeWeights
## Warning: The following arguments are not used: row.names
## 2022-12-23 06:50:57 : Block (2 of 2) : Seurat FindTransferAnchors, 1.87 mins elapsed.
## 2022-12-23 06:51:52 : Block (2 of 2) : Seurat TransferData Cell Group Labels, 2.79 mins elapsed.
## 2022-12-23 06:51:54 : Block (2 of 2) : Seurat TransferData Cell Names Labels, 2.818 mins elapsed.
## 2022-12-23 06:52:04 : Block (2 of 2) : Seurat TransferData GeneMatrix, 2.981 mins elapsed.
## 2022-12-23 06:52:10 : Block (2 of 2) : Saving TransferAnchors Joint CCA, 3.091 mins elapsed.
## 2022-12-23 06:52:12 : Block (2 of 2) : Transferring Paired RNA to Temp File, 3.123 mins elapsed.
## 2022-12-23 06:52:16 : Block (2 of 2) : Completed Integration, 3.192 mins elapsed.
## 2022-12-23 06:52:18 : Block (1 of 2) : Plotting Joint UMAP, 3.219 mins elapsed.
## 2022-12-23 06:52:40 : Block (2 of 2) : Plotting Joint UMAP, 3.593 mins elapsed.
## 2022-12-23 06:53:11 : Transferring Data to ArrowFiles, 4.098 mins elapsed.
## 2022-12-23 06:53:11 : scATAC_BMMC_R1 (1 of 3) Getting GeneIntegrationMatrix From TempFiles!, 4.098 mins elapsed.
## 2022-12-23 06:53:14 : scATAC_BMMC_R1 (1 of 3) Adding GeneIntegrationMatrix to ArrowFile!, 4.152 mins elapsed.
## 2022-12-23 06:54:19 : scATAC_CD34_BMMC_R1 (2 of 3) Getting GeneIntegrationMatrix From TempFiles!, 5.247 mins elapsed.
## Warning in NSBS(i, x, exact = exact, strict.upper.bound = !allow.append, :
## subscript is an array, passing it thru as.vector() first
## Warning in NSBS(i, x, exact = exact, strict.upper.bound = !allow.append, :
## subscript is an array, passing it thru as.vector() first

## Warning in NSBS(i, x, exact = exact, strict.upper.bound = !allow.append, :
## subscript is an array, passing it thru as.vector() first
## 2022-12-23 06:54:22 : scATAC_CD34_BMMC_R1 (2 of 3) Adding GeneIntegrationMatrix to ArrowFile!, 5.289 mins elapsed.
## 2022-12-23 06:55:24 : scATAC_PBMC_R1 (3 of 3) Getting GeneIntegrationMatrix From TempFiles!, 6.314 mins elapsed.
## 2022-12-23 06:55:25 : scATAC_PBMC_R1 (3 of 3) Adding GeneIntegrationMatrix to ArrowFile!, 6.344 mins elapsed.
## 2022-12-23 06:56:23 : Completed Integration with RNA Matrix, 7.304 mins elapsed.
## ArchR logging successful to : ArchRLogs/ArchR-addGeneIntegrationMatrix-371b073f6cca4-Date-2022-12-23_Time-06-48-52.log

Now, when we check which matrices are available using getAvailableMatrices(), we see that the GeneIntegrationMatrix has been added to the Arrow files.

getAvailableMatrices(projHeme3)
## [1] "GeneIntegrationMatrix" "GeneScoreMatrix"       "TileMatrix"

With this new GeneIntegrationMatrix we can compare the linked gene expression with the inferred gene expression obtained through gene scores.

First, lets make sure we have added impute weights to our project:

projHeme3 <- addImputeWeights(projHeme3)
## ArchR logging to : ArchRLogs/ArchR-addImputeWeights-371b06a3c8acd-Date-2022-12-23_Time-06-56-25.log
## If there is an issue, please report to github with logFile!
## 2022-12-23 06:56:26 : Computing Impute Weights Using Magic (Cell 2018), 0 mins elapsed.

Now, lets make some UMAP plots overlayed with the gene expression values from our GeneIntegrationMatrix.

markerGenes  <- c(
    "CD34", #Early Progenitor
    "GATA1", #Erythroid
    "PAX5", "MS4A1", #B-Cell Trajectory
    "CD14", #Monocytes
    "CD3D", "CD8A", "TBX21", "IL7R" #TCells
  )

p1 <- plotEmbedding(
    ArchRProj = projHeme3, 
    colorBy = "GeneIntegrationMatrix", 
    name = markerGenes, 
    continuousSet = "horizonExtra",
    embedding = "UMAP",
    imputeWeights = getImputeWeights(projHeme3)
)
## Getting ImputeWeights
## ArchR logging to : ArchRLogs/ArchR-plotEmbedding-371b02cea7dca-Date-2022-12-23_Time-06-56-35.log
## If there is an issue, please report to github with logFile!
## Getting UMAP Embedding
## ColorBy = GeneIntegrationMatrix
## Getting Matrix Values...
## 2022-12-23 06:56:37 :
## 
## Imputing Matrix
## Using weights on disk
## Using weights on disk
## Plotting Embedding
## 1 2 3 4 5 6 7 8 9 
## ArchR logging successful to : ArchRLogs/ArchR-plotEmbedding-371b02cea7dca-Date-2022-12-23_Time-06-56-35.log

We can make the same UMAP plots but overlay them with the gene score values from our GeneScoreMatrix

p2 <- plotEmbedding(
    ArchRProj = projHeme3, 
    colorBy = "GeneScoreMatrix", 
    continuousSet = "horizonExtra",
    name = markerGenes, 
    embedding = "UMAP",
    imputeWeights = getImputeWeights(projHeme3)
)
## Getting ImputeWeights
## ArchR logging to : ArchRLogs/ArchR-plotEmbedding-371b06b28007c-Date-2022-12-23_Time-06-56-48.log
## If there is an issue, please report to github with logFile!
## Getting UMAP Embedding
## ColorBy = GeneScoreMatrix
## Getting Matrix Values...
## 2022-12-23 06:56:50 :
## 
## Imputing Matrix
## Using weights on disk
## Using weights on disk
## Plotting Embedding
## 1 2 3 4 5 6 7 8 9 
## ArchR logging successful to : ArchRLogs/ArchR-plotEmbedding-371b06b28007c-Date-2022-12-23_Time-06-56-48.log

To plot all marker genes we can use cowplot. First, lets organize our plots.

p1c <- lapply(p1, function(x){
    x + guides(color = FALSE, fill = FALSE) + 
    theme_ArchR(baseSize = 6.5) +
    theme(plot.margin = unit(c(0, 0, 0, 0), "cm")) +
    theme(
        axis.text.x=element_blank(), 
        axis.ticks.x=element_blank(), 
        axis.text.y=element_blank(), 
        axis.ticks.y=element_blank()
    )
})
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.
## `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.
## `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.
## `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.
## `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.
## `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.
## `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.
## `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.
## `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.

p2c <- lapply(p2, function(x){
    x + guides(color = FALSE, fill = FALSE) + 
    theme_ArchR(baseSize = 6.5) +
    theme(plot.margin = unit(c(0, 0, 0, 0), "cm")) +
    theme(
        axis.text.x=element_blank(), 
        axis.ticks.x=element_blank(), 
        axis.text.y=element_blank(), 
        axis.ticks.y=element_blank()
    )
})
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.
## `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.
## `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.
## `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.
## `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.
## `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.
## `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.
## `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.
## `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.

This is what the gene expression looks like:

do.call(cowplot::plot_grid, c(list(ncol = 3), p1c))

And this is what the gene score looks like:

do.call(cowplot::plot_grid, c(list(ncol = 3), p2c))

As expected, the results from these two methods for inferring gene expression are similar but not identical.

To save an editable vectorized version of this plot, we use the plotPDF() function.

plotPDF(plotList = p1, 
    name = "Plot-UMAP-Marker-Genes-RNA-W-Imputation.pdf", 
    ArchRProj = projHeme3, 
    addDOC = FALSE, width = 5, height = 5)
## Plotting Ggplot!
## Plotting Ggplot!
## Plotting Ggplot!
## Plotting Ggplot!
## Plotting Ggplot!
## Plotting Ggplot!
## Plotting Ggplot!
## Plotting Ggplot!
## Plotting Ggplot!