9.3 Identifying Marker Genes

To identify marker genes based on gene scores, we call the getMarkerFeatures() function with useMatrix = "GeneScoreMatrix". We specify that we want to know the cluster-specific features with groupBy = "Clusters" which tells ArchR to use the “Clusters” column in cellColData to stratify cell groups.

markersGS <- getMarkerFeatures(
    ArchRProj = projHeme2, 
    useMatrix = "GeneScoreMatrix", 
    groupBy = "Clusters",
    bias = c("TSSEnrichment", "log10(nFrags)"),
    testMethod = "wilcoxon"
)
## ArchR logging to : ArchRLogs/ArchR-getMarkerFeatures-371b059dca2ed-Date-2022-12-23_Time-06-31-01.log
## If there is an issue, please report to github with logFile!
## MatrixClass = Sparse.Double.Matrix
## 2022-12-23 06:31:02 : Matching Known Biases, 0.004 mins elapsed.
## ###########
## 2022-12-23 06:32:06 : Completed Pairwise Tests, 1.063 mins elapsed.
## ###########
## ArchR logging successful to : ArchRLogs/ArchR-getMarkerFeatures-371b059dca2ed-Date-2022-12-23_Time-06-31-01.log

This function returns a SummarizedExperiment object containing relevant information on the marker features identified. This type of return value is common in ArchR and is one of the key ways that ArchR enables downstream data analysis. SummarizedExperiment objects are similar to a matrix where rows represent features of interest (i.e. genes) and columns represent samples. A SummarizedExperiment object contains one or more assays, each represented by a matrix-like object of numeric data, and metadata that applies to the rows or columns of the assays matrices. It is beyond the scope of this tutorial to dive into SummarizedExperiment objects but check out the bioconductor page if you need more information.

We can get a list of DataFrame objects, one for each of our clusters, containing the relevant marker features using the getMarkers() function:

markerList <- getMarkers(markersGS, cutOff = "FDR <= 0.01 & Log2FC >= 1.25")
markerList$C6
## DataFrame with 179 rows and 9 columns
##       seqnames     start       end    strand        name       idx    Log2FC
##          <Rle> <integer> <integer> <integer> <character> <integer> <numeric>
## 364       chr1  25291612  25226002         2       RUNX3       364   1.47523
## 14815    chr22  39493229  39500072         1    APOBEC3H       357   2.99750
## 6856     chr14  99737822  99635625         2      BCL11B       595   1.34812
## 19040     chr6 112194655 111981535         2         FYN       881   1.30604
## 14759    chr22  37545962  37521880         2       IL2RB       301   1.95170
## ...        ...       ...       ...       ...         ...       ...       ...
## 17256     chr5  34043371  34017963         2     C1QTNF3       116   1.64203
## 6197     chr13 103411422 103381717         2     CCDC168       385   1.31090
## 13591    chr20   3644046   3639939         2       GFRA4        71   1.54291
## 13393     chr2 232321234 232321155         2     SNORD20      1334   1.50347
## 4783     chr12   9392599   9395645         1   LINC00987       145   1.90817
##               FDR  MeanDiff
##         <numeric> <numeric>
## 364   3.93151e-45   2.00153
## 14815 9.91762e-41   1.40132
## 6856  2.51985e-40   1.38908
## 19040 2.07674e-39   1.84158
## 14759 5.41528e-35   1.11355
## ...           ...       ...
## 17256  0.00747736 0.0811619
## 6197   0.00763824 0.0413260
## 13591  0.00766232 0.2052287
## 13393  0.00774152 0.1337612
## 4783   0.00848428 0.0814575

To visualize all of the marker features simultaneously, we can create a heatmap using the markerHeatmap() function, optionally supplying some marker genes to label on the heatmap via the labelMarkers parameter.

markerGenes  <- c(
    "CD34", #Early Progenitor
    "GATA1", #Erythroid
    "PAX5", "MS4A1", "EBF1", "MME", #B-Cell Trajectory
    "CD14", "CEBPB", "MPO", #Monocytes
    "IRF8", 
    "CD3D", "CD8A", "TBX21", "IL7R" #TCells
  )

heatmapGS <- markerHeatmap(
  seMarker = markersGS, 
  cutOff = "FDR <= 0.01 & Log2FC >= 1.25", 
  labelMarkers = markerGenes,
  transpose = TRUE
)
## Warning: 'markerHeatmap' is deprecated.
## Use 'plotMarkerHeatmap' instead.
## See help("Deprecated")
## ArchR logging to : ArchRLogs/ArchR-plotMarkerHeatmap-371b05c1b5519-Date-2022-12-23_Time-06-32-07.log
## If there is an issue, please report to github with logFile!
## Printing Top Marker Genes:
## C1:
##  ARHGEF10L, TMEM61, MIR1262, DNASE2B, HSD3BP4, FCGR1CP, CTSS, S100A12, CD1C, SPTA1, FCGR2C, F5, SHISA4, LAX1, OR2W5
## C2:
##  PERM1, PPIEL, TACSTD2, MIR4711, L1TD1, MIR3671, PALMD, GPR88, MYBPHL, GNAT2, CASQ2, IGSF3, NOTCH2, S100A11, ETV3L
## C3:
##  CLCNKB, RPS14P3, MIR378F, GRHL3, TRIM63, NKAIN1, GJB5, GJB3, GJA4, SMIM12, RHBDL2, PPIE, FAM183A, CDCP2, PCSK9
## C4:
##  CFAP74, ACTRT2, MIR34A, C1QB, TBATA, MINPP1, RASSF10, SLC6A5, PRG3, SLC22A12, KRTAP5-9, BARX2, LINC02393, SACS-AS1, CNMD
## C5:
##  MMEL1, TTC34, LINC00982, MIR4689, PADI3, NBL1, HSPG2, MIR4253, CATSPER4, FCN3, SLC25A3P1, GLIS1, ACOT11, KCND3, LOC643441
## C6:
##  MMP23B, FBXO44, RUNX3, GBP5, VCAM1, BCL2L15, CD160, SH2D2A, PYHIN1, SLAMF7, FCGR3A, MIR557, LOC100505918, FASLG, OPTN
## C7:
##  SNORA59B, RCAN3AS, RCAN3, PATJ, IFI44L, MTF2, CHI3L2, MAGI3, CD2, HIST2H2BF, CIART, C1orf56, PIGM, HSPA6, HSPA7
## C8:
##  ESPN, MIR4252, TNFRSF25, GPR89A, CRABP2, TSTD1, GAS5-AS1, BNIP3, BTBD18, LOC341056, DTX3, SNORD116-3, IPW, C15orf48, MTRNR2L4
## C9:
##  LINC01342, LINC01346, IFNLR1, BEND5, SLC44A5, CNN3, LOC729970, SLC16A4, DRAM2, TENT5C, FCRL3, FCRL2, PEA15, VANGL2, SLAMF6
## C10:
##  MAD2L2, DRAXIN, SLC30A2, MIR4254, ROR1, TTC24, FCRLB, SLC16A9, LHPP, IGF2, CALCA, CALCB, C11orf74, RAG2, CDC42BPG
## C11:
##  PLA2G5, PLA2G2D, NCMAP, TSPAN1, TTC39A, GBP1P1, GFI1, LOC100129046, PGCP1, HAO2, FCRL5, MAEL, TBX19, FMO1, SLC9C2
## C12:
##  TTLL10, TFAP2E, EDN2, ERICH3, ABCA4, UCK2, BRINP2, LEMD1, WNT9A, MIR5008, TBCE, MIR4480, MIR1265, RASGEF1A, C10orf53
## Identified 2721 markers!
##  [1] "CD34"  "GATA1" "PAX5"  "MS4A1" "EBF1"  "MME"   "CD14"  "CEBPB" "MPO"  
## [10] "IRF8"  "CD3D"  "CD8A"  "TBX21" "IL7R"
## Adding Annotations..
## Preparing Main Heatmap..
## 'magick' package is suggested to install to give better rasterization.
## 
## Set `ht_opt$message = FALSE` to turn off this message.
## ArchR logging successful to : ArchRLogs/ArchR-plotMarkerHeatmap-371b05c1b5519-Date-2022-12-23_Time-06-32-07.log

To plot this heatmap, we can use the ComplexHeatmap::draw() function because the heatmapGS object is actually a list of heatmaps:

ComplexHeatmap::draw(heatmapGS, heatmap_legend_side = "bot", annotation_legend_side = "bot")

Many heatmaps in ArchR utilize the ComplexHeatmap package which provides a very versatile and flexible framework for plotting heatmaps. If you need to change something about how the heatmap looks when plotted, you should consult the documentation for ComplexHeatmap. For example, if you want to change the order in which the heatmap rows are plotted, you can edit the row_order slot of the heatmap object and re-plot it:

heatmapGS@row_order <- c(10,11,12,3,4,5,1,2,6,8,7,9)
ComplexHeatmap::draw(heatmapGS, heatmap_legend_side = "bot", annotation_legend_side = "bot")

To save an editable vectorized version of this plot, we use plotPDF().

plotPDF(heatmapGS, name = "GeneScores-Marker-Heatmap", width = 8, height = 6, ArchRProj = projHeme2, addDOC = FALSE)
## Plotting ComplexHeatmap!