addGeneScoreMatrix.Rd
This function, for each sample, will independently compute counts for each tile per cell and then infer gene activity scores.
addGeneScoreMatrix(
  input = NULL,
  genes = getGenes(input),
  geneModel = "exp(-abs(x)/5000) + exp(-1)",
  matrixName = "GeneScoreMatrix",
  extendUpstream = c(1000, 1e+05),
  extendDownstream = c(1000, 1e+05),
  geneUpstream = 5000,
  geneDownstream = 0,
  useGeneBoundaries = TRUE,
  useTSS = FALSE,
  extendTSS = FALSE,
  tileSize = 500,
  ceiling = 4,
  geneScaleFactor = 5,
  scaleTo = 10000,
  excludeChr = c("chrY", "chrM"),
  blacklist = getBlacklist(input),
  threads = getArchRThreads(),
  parallelParam = NULL,
  subThreading = TRUE,
  force = FALSE,
  logFile = createLogFile("addGeneScoreMatrix")
)
An ArchRProject object or character vector of ArrowFiles.
A stranded GRanges object containing the ranges associated with all gene start and end coordinates.
A string giving a "gene model function" used for weighting peaks for gene score calculation. This string should be a function of x, where x is the stranded distance from the transcription start site of the gene.
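As a rough illustration of how the default gene model string behaves, the sketch below evaluates it in base R for a few distances. The variable names are hypothetical, and evaluating the string with eval(parse()) is an assumption made for illustration; it is not necessarily how ArchR handles the string internally.

```r
# Default gene model string, copied from the usage above.
gene_model <- "exp(-abs(x)/5000) + exp(-1)"

# Hypothetical stranded distances (in bp) from the TSS.
x <- c(0, 5000, 1e5)

# Evaluate the string as an expression of x (illustrative only).
weights <- eval(parse(text = gene_model))
print(round(weights, 4))
```

Under this model the weight decays exponentially with distance from the TSS and levels off at a floor of exp(-1), so distal positions still contribute a small, non-zero weight.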
The name to be used for storage of the gene activity score matrix in the provided ArchRProject or ArrowFiles.
The minimum and maximum number of basepairs upstream of the transcription start site to consider for gene activity score calculation.
The minimum and maximum number of basepairs downstream of the transcription start site or transcription termination site (based on 'useTSS') to consider for gene activity score calculation.
An integer describing the number of bp upstream of the gene by which to extend the gene body. This effectively makes the gene body larger, as there are proximal peaks that should be weighted equally to the gene body. This parameter is used if 'useTSS=FALSE'.
An integer describing the number of bp downstream of the gene by which to extend the gene body. This effectively makes the gene body larger, as there are proximal peaks that should be weighted equally to the gene body. This parameter is used if 'useTSS=FALSE'.
A boolean value indicating whether gene boundaries should be employed during gene activity score calculation. Gene boundaries refers to the process of preventing tiles from contributing to the gene score of a given gene if there is a second gene's transcription start site between the tile and the gene of interest.
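The gene boundary idea can be sketched in base R as clipping each gene's search window at the neighbouring transcription start sites. This is a deliberately simplified, unstranded illustration with hypothetical gene names and coordinates. It ignores the gene body and the minimum extension, and it is not ArchR's actual implementation.

```r
# Hypothetical TSS positions (bp) on one chromosome, sorted by coordinate.
tss <- c(geneA = 10000, geneB = 60000, geneC = 400000)

# Maximum extension on either side, taken from the usage defaults.
extend_max <- 1e5

# Nearest TSS on each side of every gene (none for the outermost genes).
prev_tss <- c(-Inf, head(tss, -1))
next_tss <- c(tail(tss, -1), Inf)

# Clip each window so it never crosses a neighbouring gene's TSS
# and never starts before the first base of the chromosome.
win_start <- pmax(tss - extend_max, prev_tss + 1, 1)
win_end <- pmin(tss + extend_max, next_tss - 1)

print(data.frame(tss = tss, start = win_start, end = win_end))
```

In this toy case the window for geneA stops just before the TSS of geneB, while geneC keeps its full extension because no other TSS lies within range.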
A boolean describing whether to build the gene model based on the gene TSS or the gene body.
A boolean describing whether to extend the gene TSS. By default useTSS uses the 1bp TSS while this parameter enables the extension of this region with 'geneUpstream' and 'geneDownstream' respectively.
The size of the tiles used for binning counts prior to gene activity score calculation.
The maximum counts per tile allowed. This is used to prevent large biases in tile counts.
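To make the tiling and ceiling steps concrete, here is a minimal base R sketch that bins insertion positions for a single cell into fixed-width tiles and then caps the per-tile counts. The positions and variable names are hypothetical, and the 1-based tile indexing is an assumption for illustration rather than a statement about ArchR's internals.

```r
tile_size <- 500      # matches the tileSize default
ceiling_value <- 4    # matches the ceiling default

# Hypothetical insertion positions (bp) for one cell on one chromosome.
insertions <- c(10, 120, 130, 140, 450, 499, 700, 1600)

# 1-based tile index: positions 1-500 fall in tile 1, 501-1000 in tile 2, ...
tile_index <- (insertions - 1) %/% tile_size + 1

# Count insertions per tile, then cap each tile at the ceiling.
tile_counts <- tabulate(tile_index, nbins = max(tile_index))
capped_counts <- pmin(tile_counts, ceiling_value)

print(rbind(raw = tile_counts, capped = capped_counts))
```

Capping limits the influence of a single tile with an unusually high count on the scores of nearby genes.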
A numeric scaling factor used to weight genes based on the inverse of their length, i.e. (Scale Factor)/(Gene Length). The resulting weight is scaled from 1 to the scale factor: small genes receive a weight close to the scale factor, while extremely large genes receive a weight closer to 1. This scaling helps with the relative gene score value.
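One way to picture this length-based weighting is a linear rescaling of inverse gene length onto the range from 1 to the scale factor. The sketch below uses hypothetical gene lengths, and the min-max rescaling formula is an assumption chosen to match the description above. It may differ from the exact formula used inside ArchR.

```r
gene_scale_factor <- 5  # matches the geneScaleFactor default

# Hypothetical gene lengths in bp.
gene_length <- c(short = 1000, medium = 20000, long = 500000)

# Inverse length: shorter genes get larger values.
inv_len <- 1 / gene_length

# Linearly rescale inverse length onto [1, gene_scale_factor].
gene_weight <- 1 + (gene_scale_factor - 1) *
  (inv_len - min(inv_len)) / (max(inv_len) - min(inv_len))

print(round(gene_weight, 3))
```

Here the shortest gene receives the full scale factor, the longest gene receives a weight of 1, and intermediate genes fall in between.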
Each column in the calculated gene score matrix will be normalized to a column sum designated by scaleTo.
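The column normalization can be sketched in base R as follows. The small matrix and its dimension names are hypothetical, and a dense matrix is used for simplicity. This is an illustration of the arithmetic only.

```r
scale_to <- 10000  # matches the scaleTo default

# Hypothetical raw gene scores: rows are genes, columns are cells.
scores <- matrix(
  c(2, 0, 6,
    1, 3, 4),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), c("cellA", "cellB"))
)

# Divide each column by its sum, then multiply by scale_to.
normalized <- sweep(scores, 2, colSums(scores), "/") * scale_to

print(colSums(normalized))
```

After this step every cell's scores sum to the same total, which makes gene scores comparable across cells with different sequencing depth.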
A character vector containing the seqnames of the chromosomes that should be excluded from this analysis.
A GRanges object containing genomic regions to blacklist that may be extremely over-represented and thus bias the gene scores for genes near those loci.
The number of threads to be used for parallel computing.
A list of parameters to be passed for BiocParallel/batchtools parallel computing.
A boolean determining whether to use, where possible, additional threads within each multi-threaded subprocess if the number of threads is greater than the number of input samples.
A boolean value indicating whether to force the matrix indicated by matrixName to be overwritten if it already exists in the given input.
The path to a file to be used for logging ArchR output.