This function adds peak-to-gene links to a given ArchRProject.

addPeak2GeneLinks(
  ArchRProj = NULL,
  reducedDims = "IterativeLSI",
  useMatrix = "GeneIntegrationMatrix",
  dimsToUse = 1:30,
  scaleDims = NULL,
  corCutOff = 0.75,
  cellsToUse = NULL,
  excludeChr = NULL,
  k = 100,
  knnIteration = 500,
  overlapCutoff = 0.8,
  maxDist = 250000,
  scaleTo = 10^4,
  log2Norm = TRUE,
  predictionCutoff = 0.4,
  addEmpiricalPval = FALSE,
  addPermutedPval = FALSE,
  nperm = 100,
  seed = 1,
  threads = max(floor(getArchRThreads()/2), 1),
  verbose = TRUE,
  logFile = createLogFile("addPeak2GeneLinks")
)

Arguments

ArchRProj

An ArchRProject object.

reducedDims

The name of the reducedDims object (e.g. "IterativeLSI") to retrieve from the designated ArchRProject.

useMatrix

The name of the matrix containing gene expression information to be used for determining peak-to-gene links. See getAvailableMatrices(ArchRProj) for the matrices available in a project.

dimsToUse

A vector containing the dimensions from the reducedDims object to use when forming the k-nearest neighbor cell groups.

scaleDims

A boolean value that indicates whether to z-score the reduced dimensions for each cell. This is useful for minimizing the contribution of strong biases (dominating early PCs) and low-abundance populations. However, it may lead to stronger sample-specific biases because it over-weights latent PCs. If set to NULL, the dimensions are scaled according to the value of scaleDims that was used when the reducedDims were originally created during dimensionality reduction. This idea was introduced by Timothy Stuart.
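As a rough illustration of what z-scoring the reduced dimensions for each cell means, the base R sketch below standardizes every row of a toy matrix. The helper name row_zscore and the toy data are hypothetical and only meant to show the idea; this is not ArchR's internal code.

```r
# Toy reduced-dimension matrix: 4 cells (rows) x 3 dimensions (columns).
set.seed(1)
rd <- matrix(rnorm(12), nrow = 4,
             dimnames = list(paste0("cell", 1:4), paste0("LSI", 1:3)))

# Hypothetical helper: z-score each row (cell) across its dimensions.
# Subtracting/dividing by a length-nrow vector recycles down the columns,
# so element [i, j] is centered and scaled with the statistics of row i.
row_zscore <- function(m) {
  (m - rowMeans(m)) / apply(m, 1, sd)
}

rd_scaled <- row_zscore(rd)
```

After scaling, every cell has mean 0 and standard deviation 1 across its dimensions, so no single dimension can dominate purely because of its magnitude.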

corCutOff

A numeric cutoff for the correlation of each dimension to the sequencing depth. If the dimension has a correlation to sequencing depth that is greater than the corCutOff, it will be excluded from analysis.
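The depth filter can be pictured with a small simulation. The sketch below assumes that the absolute Pearson correlation against log10 depth is compared with the cutoff; both of those choices, the variable names, and the simulated data are assumptions made for illustration rather than a statement of how ArchR computes it.

```r
set.seed(1)
n_cells <- 200
depth <- rlnorm(n_cells, meanlog = 9, sdlog = 0.5)  # simulated per-cell depth

# Three toy dimensions: the first tracks depth closely, the others do not.
rd <- cbind(
  LSI1 = log10(depth) + rnorm(n_cells, sd = 0.05),
  LSI2 = rnorm(n_cells),
  LSI3 = rnorm(n_cells)
)

cor_cutoff <- 0.75

# Absolute correlation of each dimension with depth (assumption: log10 scale).
cor_to_depth <- abs(cor(rd, log10(depth)))[, 1]

# Keep only the dimensions that do not exceed the cutoff.
dims_to_use <- which(cor_to_depth <= cor_cutoff)
```

Here LSI1 is dropped because it is almost entirely explained by depth, while LSI2 and LSI3 are retained.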

cellsToUse

A character vector of cellNames on which to compute peak-to-gene links. Use this to run the analysis on a subset of the total cells.

excludeChr

A character vector containing the seqnames of the chromosomes that should be excluded from this analysis.

k

The number of nearest neighbors (k) to use for creating single-cell groups for correlation analyses.

knnIteration

The number of k-nearest neighbor groupings to test for passing the supplied overlapCutoff.

overlapCutoff

The maximum allowable overlap between the current group and all previous groups for the current group to be added to the group list during k-nearest neighbor calculations.
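The interplay of k, knnIteration, and overlapCutoff can be sketched as a greedy selection over candidate neighbor groups. The base R code below is a simplified illustration under stated assumptions (Euclidean distance, overlap measured as the fraction of shared cells, hypothetical variable names, smaller toy values than the defaults); it is not ArchR's implementation.

```r
set.seed(1)
n_cells <- 300
rd <- matrix(rnorm(n_cells * 5), nrow = n_cells)  # toy reduced dimensions

k <- 20                # cells per group
knn_iteration <- 50    # number of candidate seed cells to try
overlap_cutoff <- 0.8  # maximum fraction of cells shared with any kept group

# Candidate groups: each sampled seed cell plus its nearest neighbors
# (the seed is at distance 0 from itself, so it is always included).
d <- as.matrix(dist(rd))
seeds <- sample(n_cells, knn_iteration)
candidates <- lapply(seeds, function(i) order(d[i, ])[seq_len(k)])

# Greedy filter: keep a group only if it overlaps every previously kept
# group by at most overlap_cutoff.
kept <- list()
for (g in candidates) {
  overlaps <- vapply(kept, function(h) length(intersect(g, h)) / k, numeric(1))
  if (length(overlaps) == 0 || max(overlaps) <= overlap_cutoff) {
    kept[[length(kept) + 1]] <- g
  }
}
```

Lowering overlap_cutoff yields fewer, more independent groups; raising knn_iteration gives the filter more candidates to choose from.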

maxDist

The maximum allowable distance in basepairs between a peak and a gene to consider for peak-to-gene linkage.

scaleTo

The depth to which each cell group is normalized. The insertion counts from the designated group of single cells are summed across all relevant peak regions from the peakSet of the ArchRProject and normalized to the total depth provided by scaleTo.

log2Norm

A boolean value indicating whether to log2 transform the single-cell groups prior to computing peak-to-gene correlations.
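The effect of scaleTo and log2Norm on a group-level count matrix can be sketched in a few lines of base R. The toy counts, the variable names, and the pseudocount of 1 are assumptions chosen for illustration.

```r
# Toy peak-by-group count matrix: 3 peaks (rows) x 2 cell groups (columns).
counts <- matrix(c(10, 30, 60,
                    5,  5, 40),
                 nrow = 3,
                 dimnames = list(paste0("peak", 1:3), c("group1", "group2")))

scale_to <- 10^4

# Depth-normalize: afterwards every group (column) sums to scale_to.
norm <- t(t(counts) / colSums(counts)) * scale_to

# Optional log2 transform (assumed pseudocount of 1 to avoid log2(0)).
log_norm <- log2(norm + 1)
```

Normalizing first puts groups of different depth on the same scale, and the log transform dampens the influence of very high-count features on the correlation.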

predictionCutoff

A numeric describing the cutoff for RNA integration to use when picking cells for groupings.

addEmpiricalPval

Add empirical p-values based on randomly correlating peaks and genes not on the same seqname.
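One way to turn such a background into a p-value is to compare each observed correlation against the mean and standard deviation of the null correlations. The sketch below assumes a two-sided normal approximation followed by FDR correction; that choice, the simulated null, and all variable names are assumptions for illustration and may differ from what ArchR does internally.

```r
set.seed(1)

# Simulated null: correlations between peaks and genes on different seqnames.
null_cor <- rnorm(1000, mean = 0, sd = 0.1)

# Two hypothetical observed peak-to-gene correlations.
observed <- c(link1 = 0.05, link2 = 0.45)

# Standardize against the null and convert to two-sided p-values.
z <- (observed - mean(null_cor)) / sd(null_cor)
emp_pval <- 2 * pnorm(-abs(z))

# Adjust for multiple testing.
emp_fdr <- p.adjust(emp_pval, method = "fdr")
```

A weak correlation such as link1 is indistinguishable from the background, whereas link2 lies far out in the tail of the null distribution.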

addPermutedPval

Add permutation-based p-values computed by correlating peaks and genes across shuffled samples. This approach was adapted from Regner et al. (2021), "A multi-omic single-cell landscape of human gynecologic malignancies".

nperm

An integer representing the number of permutations to run for the Regner et al. (2021) approach, i.e. when addPermutedPval = TRUE.

seed

A number to be used as the seed for random number generation required in knn determination. It is recommended to keep track of the seed used so that you can reproduce results downstream.

threads

The number of threads to be used for parallel computing.

verbose

A boolean value that determines whether standard output should be printed.

logFile

The path to a file to be used for logging ArchR output.

Examples


# Get Test ArchR Project
proj <- getTestProject()

# Add P2G Links
proj <- addPeak2GeneLinks(proj, k = 20)

# Get P2G Links
p2g <- getPeak2GeneLinks(proj)