Add a GeneIntegrationMatrix to ArrowFiles or an ArchRProject

This function integrates multiple subsets of scATAC-seq cells with a scRNA-seq experiment, computes matched scRNA-seq profiles, and stores them in each sample's ArrowFile.

addGeneIntegrationMatrix(
  ArchRProj = NULL,
  useMatrix = "GeneScoreMatrix",
  matrixName = "GeneIntegrationMatrix",
  reducedDims = "IterativeLSI",
  seRNA = NULL,
  groupATAC = NULL,
  groupRNA = NULL,
  groupList = NULL,
  sampleCellsATAC = 10000,
  sampleCellsRNA = 10000,
  embeddingATAC = NULL,
  embeddingRNA = NULL,
  dimsToUse = 1:30,
  scaleDims = NULL,
  corCutOff = 0.75,
  plotUMAP = TRUE,
  UMAPParams = list(n_neighbors = 40, min_dist = 0.4, metric = "cosine", verbose = FALSE),
  nGenes = 2000,
  useImputation = TRUE,
  reduction = "cca",
  addToArrow = TRUE,
  scaleTo = 10000,
  genesUse = NULL,
  nameCell = "predictedCell",
  nameGroup = "predictedGroup",
  nameScore = "predictedScore",
  transferParams = list(),
  threads = getArchRThreads(),
  verbose = TRUE,
  force = FALSE,
  logFile = createLogFile("addGeneIntegrationMatrix"),
  ...
)

Arguments

ArchRProj: An ArchRProject object.
useMatrix: The name of a matrix in the ArchRProject containing gene scores to be used for RNA integration.
matrixName: The name to use for the output matrix containing scRNA-seq integration to be stored in the ArchRProject.
reducedDims: The name of the reducedDims object (e.g. "IterativeLSI") to retrieve from the designated ArchRProject. This reducedDims will be used in weighting the transfer of data from scRNA to scATAC. See Seurat::TransferData for more info.
seRNA: A SeuratObject or a scRNA-seq SummarizedExperiment (genes as rows, cells as columns) to be integrated with the scATAC-seq data.
groupATAC: A column name in cellColData of the ArchRProj that will be used to determine the subgroupings specified in groupList. This is used to constrain the integration to occur across biologically relevant groups.
groupRNA: A column name in either colData (if SummarizedExperiment) or metadata (if SeuratObject) of seRNA that will be used to determine the subgroupings specified in groupList. This is used to constrain the integration to occur across biologically relevant groups. Additionally this groupRNA is used for the nameGroup output of this function.
groupList: A list of cell groupings for both ATAC-seq and RNA-seq cells to be used for RNA-ATAC integration. This is used to constrain the integration to occur across biologically relevant groups. The list should contain one element per group, and each group should itself be a list with an ATAC element and an RNA element specifying the cells to integrate from each platform. For example: groupList <- list(groupA = list(ATAC = cellsATAC_A, RNA = cellsRNA_A), groupB = list(ATAC = cellsATAC_B, RNA = cellsRNA_B))
sampleCellsATAC: An integer describing the number of scATAC-seq cells to be used for integration. This number will be evenly sampled across the total number of cells in the ArchRProject.
sampleCellsRNA: An integer describing the number of scRNA-seq cells to be used for integration.
embeddingATAC: A data.frame of cell embeddings such as a UMAP for scATAC-seq cells to be used for density sampling. The data.frame object should have a row for each single cell described in row.names and 2 columns, one for each dimension of the embedding.
embeddingRNA: A data.frame of cell embeddings such as a UMAP for scRNA-seq cells to be used for density sampling. The data.frame object should have a row for each single cell described in row.names and 2 columns, one for each dimension of the embedding.
dimsToUse: A vector containing the dimensions from the reducedDims object to use for integration.
scaleDims: A boolean value that indicates whether to z-score the reduced dimensions for each cell. This is useful for minimizing the contribution of strong biases (dominating early PCs) and lowly abundant populations. However, this may lead to stronger sample-specific biases since it is over-weighting latent PCs. If set to NULL this will scale the dimensions based on the value of scaleDims when the reducedDims were originally created during dimensionality reduction. This idea was introduced by Timothy Stuart.
corCutOff: A numeric cutoff for the correlation of each dimension to the sequencing depth. If the dimension has a correlation to sequencing depth that is greater than the corCutOff, it will be excluded from analysis.
plotUMAP: A boolean determining whether to plot a UMAP for each integration block.
UMAPParams: The list of parameters to pass to the UMAP function if "plotUMAP = TRUE". See the function umap in the uwot package.
nGenes: The number of variable genes determined by Seurat::FindVariableFeatures() to use for integration.
useImputation: A boolean value indicating whether to use imputation for creating the Gene Score Matrix prior to integration.
reduction: The Seurat reduction method to use for integrating modalities. See Seurat::FindTransferAnchors() for possible reduction methods.
addToArrow: A boolean value indicating whether to add the log2-normalized transcript counts from the integrated matched RNA to the Arrow files.
scaleTo: Each column in the integrated RNA matrix will be normalized to a column sum designated by scaleTo prior to adding to Arrow files.
genesUse: An optional character vector of gene names to use for integration instead of the variable genes determined by Seurat (see nGenes).
nameCell: A column name to add to cellColData for the predicted scRNA-seq cell in the specified ArchRProject. This is useful for identifying which cell was closest to the scATAC-seq cell.
nameGroup: A column name to add to cellColData for the predicted scRNA-seq group in the specified ArchRProject. See groupRNA for more details.
nameScore: A column name to add to cellColData for the predicted scRNA-seq score in the specified ArchRProject. These scores reflect how confidently the scRNA-seq group was assigned to each scATAC-seq cell. Lower scores represent ambiguous predictions and higher scores represent precise predictions.
transferParams: Additional params to be passed to Seurat::TransferData.
threads: The number of threads to be used for parallel computing.
verbose: A boolean value that determines whether standard output includes verbose sections.
force: A boolean value indicating whether to force the matrix indicated by matrixName to be overwritten if it already exists in the given input.
logFile: The path to a file to be used for logging ArchR output.
...: Additional params to be passed to Seurat::FindTransferAnchors.
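
The nested structure expected by groupList can be easier to see in code. The following is a minimal sketch in base R, assuming made-up cell names, cluster labels, and a hypothetical mapping of clusters to two broad groups ("Lymphoid" and "Myeloid"); none of these names come from ArchR or from the test data.

```r
# Hypothetical scATAC-seq cell names and cluster labels (illustration only)
atacCells    <- paste0("sampleA#ATAC_", 1:6)
atacClusters <- c("C1", "C1", "C2", "C3", "C3", "C2")

# Hypothetical scRNA-seq cell names and cell-type labels (illustration only)
rnaCells <- paste0("RNA_", 1:5)
rnaTypes <- c("T", "B", "T", "Mono", "B")

# One element per group; each group holds the ATAC and RNA cells
# that should be integrated with each other.
groupList <- list(
  Lymphoid = list(
    ATAC = atacCells[atacClusters %in% c("C1", "C2")],
    RNA  = rnaCells[rnaTypes %in% c("T", "B")]
  ),
  Myeloid = list(
    ATAC = atacCells[atacClusters %in% "C3"],
    RNA  = rnaCells[rnaTypes %in% "Mono"]
  )
)

# Inspect the nested structure
str(groupList)
```

A list built this way would be supplied through the groupList argument so that cells in one group are only matched to scRNA-seq cells from the same group.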

Examples


#Get Test Project
proj <- getTestProject()

#Get RNA Matrix
sePBMC <- readRDS(
  file.path(system.file("testdata", package = "ArchR"), "seRNA_PBMC.rds")
)

#Gene Integration Matrix
proj <- addGeneIntegrationMatrix(
    ArchRProj = proj, 
    useMatrix = "GeneScoreMatrix",
    matrixName = "GeneIntegrationMatrix",
    reducedDims = "IterativeLSI",
    seRNA = sePBMC,
    addToArrow = FALSE,
    groupRNA = "CellType",
    nameCell = "predictedCell_Un2",
    nameGroup = "predictedGroup_Un2",
    nameScore = "predictedScore_Un2",
    dimsToUse = 1:10,
    nGenes = 250,
    force = TRUE
)
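
After a call like the one above, the columns named by nameGroup and nameScore are added to cellColData. The sketch below shows one way such columns might be summarized, using a mock data.frame in place of a real project. The values, the cell names, and the cutoff of 0.5 are invented for illustration and are not output of the example. With a real project, the columns could be retrieved with getCellColData().

```r
# Mock stand-in for the columns added to cellColData (values are invented)
cellMeta <- data.frame(
  predictedGroup_Un2 = c("T", "T", "B", "Mono", "B", "Mono"),
  predictedScore_Un2 = c(0.92, 0.41, 0.88, 0.73, 0.35, 0.66),
  row.names = paste0("cell_", 1:6)
)

# Hypothetical cutoff: treat predictions below this score as ambiguous
minScore  <- 0.5
confident <- cellMeta$predictedScore_Un2 >= minScore

# Number of confidently labeled cells per predicted group
table(cellMeta$predictedGroup_Un2[confident])

# Median prediction score per predicted group
tapply(cellMeta$predictedScore_Un2, cellMeta$predictedGroup_Un2, median)
```

The cutoff is a judgment call that depends on the dataset. Inspecting the score distribution per group first is usually more informative than applying a fixed threshold.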