9.3 Identifying Marker Genes
To identify marker genes based on gene scores, we call the getMarkerFeatures()
function with useMatrix = "GeneScoreMatrix"
. We specify that we want to know the cluster-specific features with groupBy = "Clusters"
which tells ArchR to use the “Clusters” column in cellColData
to stratify cell groups.
<- getMarkerFeatures(
markersGS ArchRProj = projHeme2,
useMatrix = "GeneScoreMatrix",
groupBy = "Clusters",
bias = c("TSSEnrichment", "log10(nFrags)"),
testMethod = "wilcoxon"
)## ArchR logging to : ArchRLogs/ArchR-getMarkerFeatures-371b059dca2ed-Date-2022-12-23_Time-06-31-01.log
## If there is an issue, please report to github with logFile!
## MatrixClass = Sparse.Double.Matrix
## 2022-12-23 06:31:02 : Matching Known Biases, 0.004 mins elapsed.
## ###########
## 2022-12-23 06:32:06 : Completed Pairwise Tests, 1.063 mins elapsed.
## ###########
## ArchR logging successful to : ArchRLogs/ArchR-getMarkerFeatures-371b059dca2ed-Date-2022-12-23_Time-06-31-01.log
This function returns a SummarizedExperiment
object containing relevant information on the marker features identified. This type of return value is common in ArchR and is one of the key ways that ArchR enables downstream data analysis. SummarizedExperiment
objects are similar to a matrix where rows represent features of interest (i.e. genes) and columns represent samples. A SummarizedExperiment
object contains one or more assays, each represented by a matrix-like object of numeric data, and metadata that applies to the rows or columns of the assays matrices. It is beyond the scope of this tutorial to dive into SummarizedExperiment
objects but check out the bioconductor page if you need more information.
We can get a list of DataFrame
objects, one for each of our clusters, containing the relevant marker features using the getMarkers()
function:
<- getMarkers(markersGS, cutOff = "FDR <= 0.01 & Log2FC >= 1.25")
markerList $C6
markerList## DataFrame with 179 rows and 9 columns
## seqnames start end strand name idx Log2FC
## <Rle> <integer> <integer> <integer> <character> <integer> <numeric>
## 364 chr1 25291612 25226002 2 RUNX3 364 1.47523
## 14815 chr22 39493229 39500072 1 APOBEC3H 357 2.99750
## 6856 chr14 99737822 99635625 2 BCL11B 595 1.34812
## 19040 chr6 112194655 111981535 2 FYN 881 1.30604
## 14759 chr22 37545962 37521880 2 IL2RB 301 1.95170
## ... ... ... ... ... ... ... ...
## 17256 chr5 34043371 34017963 2 C1QTNF3 116 1.64203
## 6197 chr13 103411422 103381717 2 CCDC168 385 1.31090
## 13591 chr20 3644046 3639939 2 GFRA4 71 1.54291
## 13393 chr2 232321234 232321155 2 SNORD20 1334 1.50347
## 4783 chr12 9392599 9395645 1 LINC00987 145 1.90817
## FDR MeanDiff
## <numeric> <numeric>
## 364 3.93151e-45 2.00153
## 14815 9.91762e-41 1.40132
## 6856 2.51985e-40 1.38908
## 19040 2.07674e-39 1.84158
## 14759 5.41528e-35 1.11355
## ... ... ...
## 17256 0.00747736 0.0811619
## 6197 0.00763824 0.0413260
## 13591 0.00766232 0.2052287
## 13393 0.00774152 0.1337612
## 4783 0.00848428 0.0814575
To visualize all of the marker features simultaneously, we can create a heatmap using the markerHeatmap()
function, optionally supplying some marker genes to label on the heatmap via the labelMarkers
parameter.
<- c(
markerGenes "CD34", #Early Progenitor
"GATA1", #Erythroid
"PAX5", "MS4A1", "EBF1", "MME", #B-Cell Trajectory
"CD14", "CEBPB", "MPO", #Monocytes
"IRF8",
"CD3D", "CD8A", "TBX21", "IL7R" #TCells
)
<- markerHeatmap(
heatmapGS seMarker = markersGS,
cutOff = "FDR <= 0.01 & Log2FC >= 1.25",
labelMarkers = markerGenes,
transpose = TRUE
)## Warning: 'markerHeatmap' is deprecated.
## Use 'plotMarkerHeatmap' instead.
## See help("Deprecated")
## ArchR logging to : ArchRLogs/ArchR-plotMarkerHeatmap-371b05c1b5519-Date-2022-12-23_Time-06-32-07.log
## If there is an issue, please report to github with logFile!
## Printing Top Marker Genes:
## C1:
## ARHGEF10L, TMEM61, MIR1262, DNASE2B, HSD3BP4, FCGR1CP, CTSS, S100A12, CD1C, SPTA1, FCGR2C, F5, SHISA4, LAX1, OR2W5
## C2:
## PERM1, PPIEL, TACSTD2, MIR4711, L1TD1, MIR3671, PALMD, GPR88, MYBPHL, GNAT2, CASQ2, IGSF3, NOTCH2, S100A11, ETV3L
## C3:
## CLCNKB, RPS14P3, MIR378F, GRHL3, TRIM63, NKAIN1, GJB5, GJB3, GJA4, SMIM12, RHBDL2, PPIE, FAM183A, CDCP2, PCSK9
## C4:
## CFAP74, ACTRT2, MIR34A, C1QB, TBATA, MINPP1, RASSF10, SLC6A5, PRG3, SLC22A12, KRTAP5-9, BARX2, LINC02393, SACS-AS1, CNMD
## C5:
## MMEL1, TTC34, LINC00982, MIR4689, PADI3, NBL1, HSPG2, MIR4253, CATSPER4, FCN3, SLC25A3P1, GLIS1, ACOT11, KCND3, LOC643441
## C6:
## MMP23B, FBXO44, RUNX3, GBP5, VCAM1, BCL2L15, CD160, SH2D2A, PYHIN1, SLAMF7, FCGR3A, MIR557, LOC100505918, FASLG, OPTN
## C7:
## SNORA59B, RCAN3AS, RCAN3, PATJ, IFI44L, MTF2, CHI3L2, MAGI3, CD2, HIST2H2BF, CIART, C1orf56, PIGM, HSPA6, HSPA7
## C8:
## ESPN, MIR4252, TNFRSF25, GPR89A, CRABP2, TSTD1, GAS5-AS1, BNIP3, BTBD18, LOC341056, DTX3, SNORD116-3, IPW, C15orf48, MTRNR2L4
## C9:
## LINC01342, LINC01346, IFNLR1, BEND5, SLC44A5, CNN3, LOC729970, SLC16A4, DRAM2, TENT5C, FCRL3, FCRL2, PEA15, VANGL2, SLAMF6
## C10:
## MAD2L2, DRAXIN, SLC30A2, MIR4254, ROR1, TTC24, FCRLB, SLC16A9, LHPP, IGF2, CALCA, CALCB, C11orf74, RAG2, CDC42BPG
## C11:
## PLA2G5, PLA2G2D, NCMAP, TSPAN1, TTC39A, GBP1P1, GFI1, LOC100129046, PGCP1, HAO2, FCRL5, MAEL, TBX19, FMO1, SLC9C2
## C12:
## TTLL10, TFAP2E, EDN2, ERICH3, ABCA4, UCK2, BRINP2, LEMD1, WNT9A, MIR5008, TBCE, MIR4480, MIR1265, RASGEF1A, C10orf53
## Identified 2721 markers!
## [1] "CD34" "GATA1" "PAX5" "MS4A1" "EBF1" "MME" "CD14" "CEBPB" "MPO"
## [10] "IRF8" "CD3D" "CD8A" "TBX21" "IL7R"
## Adding Annotations..
## Preparing Main Heatmap..
## 'magick' package is suggested to install to give better rasterization.
##
## Set `ht_opt$message = FALSE` to turn off this message.
## ArchR logging successful to : ArchRLogs/ArchR-plotMarkerHeatmap-371b05c1b5519-Date-2022-12-23_Time-06-32-07.log
To plot this heatmap, we can use the ComplexHeatmap::draw()
function because the heatmapGS
object is actually a list of heatmaps:
::draw(heatmapGS, heatmap_legend_side = "bot", annotation_legend_side = "bot") ComplexHeatmap
Many heatmaps in ArchR utilize the ComplexHeatmap
package which provides a very versatile and flexible framework for plotting heatmaps. If you need to change something about how the heatmap looks when plotted, you should consult the documentation for ComplexHeatmap
. For example, if you want to change the order in which the heatmap rows are plotted, you can edit the row_order
slot of the heatmap object and re-plot it:
@row_order <- c(10,11,12,3,4,5,1,2,6,8,7,9)
heatmapGS::draw(heatmapGS, heatmap_legend_side = "bot", annotation_legend_side = "bot") ComplexHeatmap
To save an editable vectorized version of this plot, we use plotPDF()
.
plotPDF(heatmapGS, name = "GeneScores-Marker-Heatmap", width = 8, height = 6, ArchRProj = projHeme2, addDOC = FALSE)
## Plotting ComplexHeatmap!