3.1 Creating An ArchRProject
First, we must create our ArchRProject by providing a list of Arrow files and a few other parameters. The outputDirectory here describes where all downstream analyses and plots will be saved. ArchR will automatically associate the previously provided geneAnnotation and genomeAnnotation with the new ArchRProject. These were stored when we ran addArchRGenome("hg19") in a previous chapter.
projHeme1 <- ArchRProject(
  ArrowFiles = ArrowFiles,
  outputDirectory = "HemeTutorial",
  copyArrows = TRUE # This is recommended so that if you modify the Arrow files you have an original copy for later use.
)
## Using GeneAnnotation set by addArchRGenome(Hg19)!
## Using GeneAnnotation set by addArchRGenome(Hg19)!
## Validating Arrows…
## Getting SampleNames…
##
## Copying ArrowFiles to Ouptut Directory! If you want to save disk space set copyArrows = FALSE
## 1 2 3
## Getting Cell Metadata…
##
## Merging Cell Metadata…
## Initializing ArchRProject…
We call this ArchRProject “projHeme1” because it is the first iteration of our hematopoiesis project. Throughout this walkthrough we will modify and update this ArchRProject and keep track of which version of the project we are using by incrementing the project number (e.g. “projHeme2”).
We can examine the contents of our ArchRProject:
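The command that produced the summary is not shown in this section. As a minimal sketch, assuming the project object is named projHeme1 as above, typing the object's name at the R console prints it:

```r
library(ArchR)

# Assumes projHeme1 was created by the ArchRProject() call above.
# Typing the object name at the console auto-prints its summary.
projHeme1
```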
## class: ArchRProject
## outputDirectory: /oak/stanford/groups/howchang/users/jgranja/ArchRTutorial/ArchRBook/BookOutput4/HemeTutorial
## samples(3): scATAC_BMMC_R1 scATAC_CD34_BMMC_R1 scATAC_PBMC_R1
## sampleColData names(1): ArrowFiles
## cellColData names(11): Sample TSSEnrichment … DoubletScore
## DoubletEnrichment
## numberOfCells(1): 10661
## medianTSS(1): 16.832
## medianFrags(1): 3050
We can see from the above that our ArchRProject has been initialized with a few important attributes:
- The specified outputDirectory.
- The sampleNames of each sample, which were obtained from the Arrow files.
- A matrix called sampleColData which contains data associated with each sample.
- A matrix called cellColData which contains data associated with each cell. Because we already computed doublet enrichment scores using addDoubletScores(), which added those values to each cell in the Arrow files, we can see columns corresponding to the “DoubletEnrichment” and “DoubletScore” in the cellColData matrix.
- The total number of cells in our project which represents all samples after doublet identification and removal.
- The median TSS enrichment score and the median number of fragments across all cells and all samples.
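Each of these attributes can also be retrieved individually. The following is a sketch rather than a required step. It assumes projHeme1 exists as created above, and it uses ArchR's accessor functions together with the `$` shortcut for cellColData columns. The nFrags column name is an assumption here, since the summary above elides most cellColData column names.

```r
library(ArchR)

# Assumes projHeme1 was created by the ArchRProject() call above.
getOutputDirectory(projHeme1)     # where analyses and plots are saved
getSampleNames(projHeme1)         # sample names read from the Arrow files
getSampleColData(projHeme1)       # per-sample metadata
head(getCellColData(projHeme1))   # first rows of the per-cell metadata
nCells(projHeme1)                 # total number of cells in the project

# Medians recomputed from per-cell columns. nFrags is assumed to be one of
# the cellColData columns elided in the printed summary.
median(projHeme1$TSSEnrichment)
median(projHeme1$nFrags)
```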
We can check how much memory the ArchRProject occupies within R:
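The command behind this output is not shown here. One way to produce a line in that form with base R is sketched below. report_memory is a hypothetical helper name. It relies on object.size(), which estimates the size of the in-memory R object only, so the Arrow files on disk are not counted.

```r
# Hypothetical helper: format an object's in-memory size in megabytes (10^6 bytes).
report_memory <- function(x) {
  size_mb <- as.numeric(utils::object.size(x)) / 10^6
  paste0("Memory Size = ", round(size_mb, 3), " MB")
}

# With the project from above (assumed to exist in the session):
# report_memory(projHeme1)

# Runs anywhere: a stand-in vector of one million doubles (roughly 8 MB).
report_memory(numeric(1e6))
```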
## [1] "Memory Size = 37.135 MB"
We can also ask which data matrices are available within the ArchRProject, which will be useful downstream once we start adding to this project:
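The call that lists the matrices is not shown here. In ArchR this is typically done with getAvailableMatrices(). The sketch below assumes projHeme1 exists as created above.

```r
library(ArchR)

# Assumes projHeme1 was created by the ArchRProject() call above.
# Returns the names of the data matrices stored in the project's Arrow files.
getAvailableMatrices(projHeme1)
```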
## [1] "GeneScoreMatrix" "TileMatrix"