addGeneIntegrationMatrix.Rd
This function integrates multiple subsets of scATAC-seq cells with an scRNA-seq experiment, computes matched scRNA-seq profiles, and stores them in each sample's ArrowFile.
addGeneIntegrationMatrix(
  ArchRProj = NULL,
  useMatrix = "GeneScoreMatrix",
  matrixName = "GeneIntegrationMatrix",
  reducedDims = "IterativeLSI",
  seRNA = NULL,
  groupATAC = NULL,
  groupRNA = NULL,
  groupList = NULL,
  sampleCellsATAC = 10000,
  sampleCellsRNA = 10000,
  embeddingATAC = NULL,
  embeddingRNA = NULL,
  dimsToUse = 1:30,
  scaleDims = NULL,
  corCutOff = 0.75,
  plotUMAP = TRUE,
  UMAPParams = list(n_neighbors = 40, min_dist = 0.4, metric = "cosine", verbose = FALSE),
  nGenes = 2000,
  useImputation = TRUE,
  reduction = "cca",
  addToArrow = TRUE,
  scaleTo = 10000,
  genesUse = NULL,
  nameCell = "predictedCell",
  nameGroup = "predictedGroup",
  nameScore = "predictedScore",
  transferParams = list(),
  threads = getArchRThreads(),
  verbose = TRUE,
  force = FALSE,
  logFile = createLogFile("addGeneIntegrationMatrix"),
  ...
)
An ArchRProject object.
The name of a matrix in the ArchRProject containing gene scores to be used for RNA integration.
The name to use for the output matrix containing the scRNA-seq integration results to be stored in the ArchRProject.
The name of the reducedDims object (e.g. "IterativeLSI") to retrieve from the designated ArchRProject. This reducedDims will be used in weighting the transfer of data from scRNA to scATAC. See Seurat::TransferData for more info.
A SeuratObject or a scRNA-seq SummarizedExperiment (cell x gene) to be integrated with the scATAC-seq data.
A column name in cellColData of the ArchRProj that will be used to determine the subgroupings specified in groupList. This is used to constrain the integration to occur across biologically relevant groups.
A column name in either colData (if SummarizedExperiment) or metadata (if SeuratObject) of seRNA that will be used to determine the subgroupings specified in groupList. This is used to constrain the integration to occur across biologically relevant groups. Additionally, groupRNA is used for the nameGroup output of this function.
A list of cell groupings for both ATAC-seq and RNA-seq cells to be used for RNA-ATAC integration. This is used to constrain the integration to occur across biologically relevant groups. The format should be a list of groups, each containing ATAC and RNA subgroups that specify the cells to integrate from the two platforms. For example:
groupList <- list(groupA = list(ATAC = cellsATAC_A, RNA = cellsRNA_A), groupB = list(ATAC = cellsATAC_B, RNA = cellsRNA_B))
An integer describing the number of scATAC-seq cells to be used for integration. This number will be evenly sampled across the total number of cells in the ArchRProject.
An integer describing the number of scRNA-seq cells to be used for integration.
A data.frame of cell embeddings such as a UMAP for scATAC-seq cells to be used for density sampling. The data.frame object should have one row per single cell, with cell names as row.names, and 2 columns, one for each dimension of the embedding.
A data.frame of cell embeddings such as a UMAP for scRNA-seq cells to be used for density sampling. The data.frame object should have one row per single cell, with cell names as row.names, and 2 columns, one for each dimension of the embedding.
A vector containing the dimensions from the reducedDims object to use in clustering.
A boolean value that indicates whether to z-score the reduced dimensions for each cell. This is useful for minimizing the contribution of strong biases (dominating early PCs) and lowly abundant populations. However, this may lead to stronger sample-specific biases since it over-weights latent PCs. If set to NULL, the dimensions will be scaled based on the value of scaleDims that was used when the reducedDims were originally created during dimensionality reduction. This idea was introduced by Timothy Stuart.
A numeric cutoff for the correlation of each dimension to the sequencing depth. If the dimension has a correlation to sequencing depth that is greater than the corCutOff, it will be excluded from analysis.
A boolean determining whether to plot a UMAP for each integration block.
The list of parameters to pass to the UMAP function if plotUMAP = TRUE. See the function umap in the uwot package.
The number of variable genes determined by Seurat::FindVariableFeatures() to use for integration.
A boolean value indicating whether to use imputation for creating the Gene Score Matrix prior to integration.
The Seurat reduction method to use for integrating modalities. See Seurat::FindTransferAnchors() for possible reduction methods.
A boolean value indicating whether to add the log2-normalized transcript counts from the integrated matched RNA to the Arrow files.
Each column in the integrated RNA matrix will be normalized to a column sum designated by scaleTo prior to adding to Arrow files.
If desired, a character vector of gene names to use for integration instead of the variable genes determined by Seurat.
A column name to add to cellColData for the predicted scRNA-seq cell in the specified ArchRProject. This is useful for identifying which cell was closest to the scATAC-seq cell.
A column name to add to cellColData for the predicted scRNA-seq group in the specified ArchRProject. See groupRNA for more details.
A column name to add to cellColData for the predicted scRNA-seq score in the specified ArchRProject. These scores represent the assignment accuracy of the group in the RNA cells. Lower scores represent ambiguous predictions and higher scores represent precise predictions.
Additional params to be passed to Seurat::TransferData.
The number of threads to be used for parallel computing.
A boolean value that determines whether standard output includes verbose sections.
A boolean value indicating whether to force the matrix indicated by matrixName to be overwritten if it already exists in the given input.
The path to a file to be used for logging ArchR output.
Additional params to be passed to Seurat::FindTransferAnchors.
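The nested list format expected by groupList can be assembled from per-cell group labels. The following base R sketch shows one way to do that. The cell names, the labels, and the helper makeGroupList() are all hypothetical and are not part of ArchR; in a real analysis the labels would presumably come from the columns named by groupATAC and groupRNA.

```r
# Hypothetical cell names and group labels, for illustration only.
atacCells <- paste0("atac_cell_", 1:6)
atacGroup <- c("T", "T", "B", "B", "T", "B")
rnaCells  <- paste0("rna_cell_", 1:5)
rnaGroup  <- c("B", "T", "T", "B", "T")

# Hypothetical helper (not an ArchR function): builds one entry per group
# that is present in both modalities, each holding ATAC and RNA cell names.
makeGroupList <- function(atacCells, atacGroup, rnaCells, rnaGroup) {
  shared <- intersect(unique(atacGroup), unique(rnaGroup))
  groups <- lapply(shared, function(g) {
    list(
      ATAC = atacCells[atacGroup == g],
      RNA  = rnaCells[rnaGroup == g]
    )
  })
  names(groups) <- shared
  groups
}

groupList <- makeGroupList(atacCells, atacGroup, rnaCells, rnaGroup)

# Inspect the nested structure: list(group = list(ATAC = ..., RNA = ...)).
str(groupList)
```

Restricting the list to groups found in both modalities avoids creating an integration block that has cells from only one platform.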
# Get test project
proj <- getTestProject()

# Get RNA matrix
sePBMC <- readRDS(
  file.path(system.file("testdata", package = "ArchR"), "seRNA_PBMC.rds")
)

# Add the gene integration matrix
proj <- addGeneIntegrationMatrix(
  ArchRProj = proj,
  useMatrix = "GeneScoreMatrix",
  matrixName = "GeneIntegrationMatrix",
  reducedDims = "IterativeLSI",
  seRNA = sePBMC,
  addToArrow = FALSE,
  groupRNA = "CellType",
  nameCell = "predictedCell_Un2",
  nameGroup = "predictedGroup_Un2",
  nameScore = "predictedScore_Un2",
  dimsToUse = 1:10,
  nGenes = 250,
  force = TRUE
)
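
The scaleTo argument normalizes each column of the integrated RNA matrix to a fixed column sum before the values are written to the Arrow files. The base R sketch below illustrates that kind of column-sum normalization on a toy matrix. The toy counts and the log2(x + 1) transform are assumptions made for illustration and may differ from what ArchR does internally.

```r
# Toy gene x cell count matrix (hypothetical values, filled column by column).
counts <- matrix(
  c(1, 3, 0, 4,
    2, 2, 2, 2,
    0, 0, 5, 5),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3))
)

scaleTo <- 10000

# Divide each column by its sum, then rescale so every column sums to scaleTo.
normalized <- sweep(counts, 2, colSums(counts), "/") * scaleTo

# Assumed log transform with a pseudocount of 1 (illustrative only).
logNormalized <- log2(normalized + 1)

# Each column of the normalized matrix now has the same total.
print(colSums(normalized))
```

Normalizing to a common column sum makes values comparable between cells that were sequenced to different depths.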