This function merges cells within each designated cell group to generate pseudo-bulk replicates and then merges these replicates into a single insertion coverage file.
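Conceptually, building a pseudo-bulk profile means summing per-cell insertion counts across the cells of a group. The base-R sketch below illustrates only that idea on a toy bins-by-cells count matrix. The matrix, the `groups` vector, and the helper `pseudoBulkSum` are hypothetical. ArchR itself reads fragments from Arrow files rather than from an in-memory matrix.

```r
# Hypothetical helper (not part of ArchR): sum per-cell insertion counts
# within each group.
# counts: numeric matrix, genomic bins as rows and cells as columns.
# groups: vector assigning each column (cell) to a group.
pseudoBulkSum <- function(counts, groups) {
  stopifnot(ncol(counts) == length(groups))
  # rowsum() sums rows by group, so transpose to put cells on rows,
  # then transpose back to get a bins x groups matrix.
  t(rowsum(t(counts), group = groups))
}

# Toy data: 3 bins x 4 cells, assigned to two clusters.
counts <- matrix(c(1, 0, 2,
                   0, 1, 1,
                   3, 0, 0,
                   1, 1, 1),
                 nrow = 3,
                 dimnames = list(paste0("bin", 1:3), paste0("cell", 1:4)))
groups <- c("C1", "C1", "C2", "C2")

pb <- pseudoBulkSum(counts, groups)
pb
```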

addGroupCoverages(
  ArchRProj = NULL,
  groupBy = "Clusters",
  useLabels = TRUE,
  sampleLabels = "Sample",
  minCells = 40,
  maxCells = 500,
  maxFragments = 25 * 10^6,
  minReplicates = 2,
  maxReplicates = 5,
  sampleRatio = 0.8,
  excludeChr = NULL,
  kmerLength = 6,
  threads = getArchRThreads(),
  returnGroups = FALSE,
  parallelParam = NULL,
  force = FALSE,
  verbose = TRUE,
  logFile = createLogFile("addGroupCoverages")
)

Arguments

ArchRProj

An ArchRProject object.

groupBy

The name of the column in cellColData to use for grouping multiple cells together prior to generation of the insertion coverage file.

useLabels

A boolean value indicating whether to use sample labels to create sample-aware subgroupings during pseudo-bulk replicate generation.

sampleLabels

The name of a column in cellColData to use to identify samples. In most cases, this parameter should be left at its default ("Sample"). Change it only if you do not want to use the default sample labels stored in cellColData$Sample. However, if your individual Arrow files do not map to individual samples, you should set this parameter to a column that accurately identifies your samples. This is the case, for example, in multiplexing applications where cells from different biological samples are mixed into the same reaction and demultiplexed based on a lipid barcode or genotype.

minCells

The minimum number of cells required in a given cell group to permit insertion coverage file generation.

maxCells

The maximum number of cells to use during insertion coverage file generation.

maxFragments

The maximum number of fragments per cell group to use in insertion coverage file generation. This prevents the generation of excessively large files, which would increase memory requirements.

minReplicates

The minimum number of pseudo-bulk replicates to be generated.

maxReplicates

The maximum number of pseudo-bulk replicates to be generated.

sampleRatio

The fraction of the total cells that can be sampled to generate any given pseudo-bulk replicate.

excludeChr

A character vector containing the seqnames of the chromosomes that should be excluded from this analysis.

kmerLength

The length of the k-mer used for estimating Tn5 bias.

threads

The number of threads to be used for parallel computing.

returnGroups

A boolean value that indicates whether to return sample-guided cell groupings without creating coverages. This is used mainly by addReproduciblePeakSet() when peaks are called from a TileMatrix (peakMethod = "Tiles") rather than with MACS2.

parallelParam

A list of parameters to be passed for BiocParallel/batchtools parallel computing.

force

A boolean value that indicates whether or not to skip validation and overwrite the relevant data in the ArchRProject object if insertion coverage / pseudo-bulk replicate information already exists.

verbose

A boolean value that determines whether standard output includes verbose sections.

logFile

The path to a file to be used for logging ArchR output.
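To make the interplay of `minCells`, `maxCells`, `minReplicates`, `maxReplicates`, and `sampleRatio` more concrete, the base-R sketch below shows one simplified way that sample-aware replicates could be drawn for a single cell group. Each sample with at least `minCells` cells becomes its own replicate (up to `maxReplicates`). If that yields fewer than `minReplicates`, extra replicates are drawn by randomly sampling a `sampleRatio` fraction of the group's cells, and every replicate is capped at `maxCells`. The function `sketchReplicates` is hypothetical and is not ArchR's internal implementation, which handles more cases than this.

```r
# Hypothetical, simplified sketch of sample-aware pseudo-bulk replicate
# selection for ONE cell group. This is NOT ArchR's internal algorithm.
sketchReplicates <- function(cells, samples,
                             minCells = 40, maxCells = 500,
                             minReplicates = 2, maxReplicates = 5,
                             sampleRatio = 0.8, seed = 1) {
  set.seed(seed)
  # Split the group's cell IDs by sample label.
  bySample <- split(cells, samples)
  # Keep only samples that have at least minCells cells, largest first.
  bySample <- bySample[lengths(bySample) >= minCells]
  bySample <- bySample[order(lengths(bySample), decreasing = TRUE)]
  reps <- head(bySample, maxReplicates)
  # Too few sample-level replicates: add random subsets containing
  # sampleRatio of the group's cells (skipped if that is < minCells).
  nSub <- floor(sampleRatio * length(cells))
  while (length(reps) < minReplicates && nSub >= minCells) {
    reps[[paste0("sampled", length(reps) + 1)]] <- sample(cells, nSub)
  }
  # Cap each replicate at maxCells randomly chosen cells.
  lapply(reps, function(x) if (length(x) > maxCells) sample(x, maxCells) else x)
}

# Toy group of 300 cells from three samples; S3 is too small on its own.
cells   <- paste0("cell", 1:300)
samples <- rep(c("S1", "S2", "S3"), times = c(200, 80, 20))
reps <- sketchReplicates(cells, samples, maxCells = 150)
lengths(reps)

# A group drawn from a single sample needs one sampled replicate.
reps2 <- sketchReplicates(paste0("cell", 1:100), rep("S1", 100))
lengths(reps2)
```

In the first call, sample S3 is dropped because it has fewer than `minCells` cells, and S1 is trimmed to `maxCells`. In the second call, only one sample is available, so a random 80% subset is added to reach `minReplicates`.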
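`kmerLength` sets the size of the sequence words used when estimating Tn5 insertion bias. With the default `kmerLength = 6`, there are 4^6 = 4,096 possible k-mers. The base-R snippet below is a minimal illustration of what overlapping k-mers are. It does not reproduce ArchR's bias model, and the helper `countKmers` is hypothetical.

```r
# Hypothetical helper (not part of ArchR): count overlapping k-mers
# in a single DNA string.
countKmers <- function(dna, k = 6) {
  n <- nchar(dna)
  if (n < k) return(table(character(0)))
  starts <- seq_len(n - k + 1)
  # substring() is vectorised over start/stop positions.
  kmers <- substring(dna, starts, starts + k - 1)
  table(kmers)
}

counts2 <- countKmers("ACGTACGT", k = 2)
counts2
```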

Examples


# Get Test ArchR Project
proj <- getTestProject()

# Add Group Coverages
proj <- addGroupCoverages(proj, force = TRUE)
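Before running addGroupCoverages(), it can help to check how many cells each group contributes per sample, because groups below `minCells` cannot yield coverage files. The snippet below uses a small hypothetical data.frame in place of cellColData. In a real project, you would obtain that table with getCellColData(proj) and use your own `groupBy` and `sampleLabels` columns.

```r
# Hypothetical data.frame standing in for cellColData
# (in a real project this would come from getCellColData(proj)).
cellColData <- data.frame(
  Clusters = rep(c("C1", "C2", "C3"), times = c(120, 60, 15)),
  Sample   = c(rep(c("S1", "S2"), times = c(70, 50)),
               rep(c("S1", "S2"), times = c(45, 15)),
               rep("S1", 15))
)
minCells <- 40

# Cells per group, split by sample label.
tab <- table(cellColData$Clusters, cellColData$Sample)
tab

# Groups with fewer than minCells cells in total.
small <- names(which(rowSums(tab) < minCells))
small
```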