9.3 Identifying Marker Genes

To identify marker genes based on gene scores, we call the getMarkerFeatures() function with useMatrix = "GeneScoreMatrix". We specify that we want to know the cluster-specific features with groupBy = "Clusters" which tells ArchR to use the “Clusters” column in cellColData to stratify cell groups.

markersGS <- getMarkerFeatures(
    ArchRProj = projHeme2, 
    useMatrix = "GeneScoreMatrix", 
    groupBy = "Clusters",
    bias = c("TSSEnrichment", "log10(nFrags)"),
    testMethod = "wilcoxon"
)
## ArchR logging to : ArchRLogs/ArchR-getMarkerFeatures-930080ad0-Date-2025-02-06_Time-01-08-45.467915.log
## If there is an issue, please report to github with logFile!
## MatrixClass = Sparse.Double.Matrix
## 2025-02-06 01:08:46.30961 : Matching Known Biases, 0.003 mins elapsed.
## ###########
## 2025-02-06 01:10:24.870633 : Completed Pairwise Tests, 1.646 mins elapsed.
## ###########
## ArchR logging successful to : ArchRLogs/ArchR-getMarkerFeatures-930080ad0-Date-2025-02-06_Time-01-08-45.467915.log

This function returns a SummarizedExperiment object containing relevant information on the marker features identified. This type of return value is common in ArchR and is one of the key ways that ArchR enables downstream data analysis. SummarizedExperiment objects are similar to a matrix where rows represent features of interest (i.e. genes) and columns represent samples. A SummarizedExperiment object contains one or more assays, each represented by a matrix-like object of numeric data, and metadata that applies to the rows or columns of the assays matrices. It is beyond the scope of this tutorial to dive into SummarizedExperiment objects but check out the bioconductor page if you need more information.

We can get a list of DataFrame objects, one for each of our clusters, containing the relevant marker features using the getMarkers() function:

markerList <- getMarkers(markersGS, cutOff = "FDR <= 0.01 & Log2FC >= 1.25")
markerList$C6
## DataFrame with 460 rows and 9 columns
##       seqnames     start       end    strand        name       idx    Log2FC
##          <Rle> <integer> <integer> <integer> <character> <integer> <numeric>
## 14724    chr22  33454377  32908540         2        SYN3       266   1.50383
## 7180     chr15  33603177  34158303         1        RYR3       143   1.61610
## 15180     chr3  37493813  37861281         1       ITGA9       190   1.42296
## 730       chr1  47655771  47649261         2    PDZK1IP1       730   2.31953
## 4691     chr12   6055398   5671817         2        ANO2        53   1.66399
## ...        ...       ...       ...       ...         ...       ...       ...
## 4722     chr12   6949118   6956557         1        GNB3        84   1.72832
## 12015    chr19  58180303  58190520         1      ZSCAN4      1563   1.30289
## 17723     chr5 135528851 135527156         2      SMIM32       583   1.57534
## 18899     chr6  78173120  78171948         2       HTR1B       740   1.25133
## 22727     chrX 101112549 101087085         2        NXF5       510   1.82215
##               FDR  MeanDiff
##         <numeric> <numeric>
## 14724 1.00235e-28  1.258169
## 7180  1.12058e-22  0.913549
## 15180 6.17469e-21  0.908384
## 730   1.52549e-20  0.758720
## 4691  1.52549e-20  0.588930
## ...           ...       ...
## 4722   0.00934267 0.2275534
## 12015  0.00940705 0.0876258
## 17723  0.00941627 0.2378894
## 18899  0.00960201 0.4154427
## 22727  0.00987594 0.0780624

To visualize all of the marker features simultaneously, we can create a heatmap using the markerHeatmap() function, optionally supplying some marker genes to label on the heatmap via the labelMarkers parameter.

markerGenes  <- c(
    "CD34", #Early Progenitor
    "GATA1", #Erythroid
    "PAX5", "MS4A1", "EBF1", "MME", #B-Cell Trajectory
    "CD14", "CEBPB", "MPO", #Monocytes
    "IRF8", 
    "CD3D", "CD8A", "TBX21", "IL7R" #TCells
  )

heatmapGS <- markerHeatmap(
  seMarker = markersGS, 
  cutOff = "FDR <= 0.01 & Log2FC >= 1.25", 
  labelMarkers = markerGenes,
  transpose = TRUE
)
## Warning in markerHeatmap(seMarker = markersGS, cutOff = "FDR <= 0.01 & Log2FC >= 1.25", : 'markerHeatmap' is deprecated.
## Use 'plotMarkerHeatmap' instead.
## See help("Deprecated")
## ArchR logging to : ArchRLogs/ArchR-plotMarkerHeatmap-9e5e6cd4-Date-2025-02-06_Time-01-10-25.770159.log
## If there is an issue, please report to github with logFile!
## Printing Top Marker Genes:
## C1:
##  LOC100288069, RBP7, ARHGEF10L, CELA3A, LINC01226, C1orf185, ZFYVE9, MIR1262, F3, HSD3BP4, SRGAP2D, SNX27, S100A9, FCGR2C, MIR556
## C2:
##  PERM1, RHD, FGR, PPIEL, BTBD19, TCTEX1D4, L1TD1, MIR3671, GPR88, CASQ2, NHLH2, SLC22A15, IGSF3, FCGR1CP, LINC00869
## C3:
##  FAM131C, PLA2G5, IFNLR1, NCMAP, TSPAN1, TTC39A, SLC25A3P1, GBP1P1, GFI1, LOC100129046, S100A4, S100A16, BCAN, FCRL5, FCRL4
## C4:
##  LINC01342, MAD2L2, LDLRAP1, ITGB3BP, PGM1, ROR1, TTC24, FCRLB, MIR1231, KLHDC8A, LYPLAL1, PGBD5, OPN3, OR2L13, ZNF22
## C5:
##  TTLL10, KAZN, TFAP2E, RIMS3, TRABD2B, PGLYRP3, UCK2, TNR, KIAA1614, LEMD1, SERTAD4-AS1, CCDC3, DEPP1, PRKG1-AS1, SH2D4B
## C6:
##  PAX7, RPS14P3, FAM43B, MIR378F, GRHL3, TRIM63, GJA4, RHBDL2, PPIE, MPL, PDZK1IP1, FOXE3, DNAJC6, OLFM3, PHTF1
## C7:
##  CFAP74, MIR34A, DMRTB1, MGAT4EP, LOC148709, EIF5AL1, RASSF10, SLC6A5, MIR4488, LRRC10B, SHANK2-AS1, CACNA1C-AS4, DCLK1, HTR2A, LRRC36
## C8:
##  LINC02593, PRDM16, MIR4251, MIR4689, TMEM51-AS1, PADI3, NBL1, RAP1GAP, LDLRAD2, HSPG2, MIR4253, CATSPER4, FAM229A, EPHA10, EDN2
## C9:
##  ID3, CNR2, BEND5, SHISAL2A, CLCA3P, LOC729970, TMEM56, MFSD14A, LRRC39, SLC16A4, DRAM2, PIFO, FCRL3, FCRL2, DNM3OS
## C10:
##  MMP23B, RUNX3, IL12RB2, GBP2, GBP5, LOC729930, TGFBR3, VCAM1, HIST2H2BA, FCGR1B, CD160, PDZK1, SH2D2A, NTRK1, PYHIN1
## C11:
##  RCAN3AS, CRYBG2, PPP1R8, LCK, HOOK1, SYDE2, SNORD21, CHI3L2, CD2, HSPA7, BLZF1, SELL, RNU6-72P, PRKCQ-AS1, TAF3
## C12:
##  SNORA59B, C1orf194, AMIGO1, GPR61, CRABP2, GAS5-AS1, POLR2G, ZPR1, RPL13P5, PIP4K2C, DTX3, LINC02291, MTRNR2L4, TNK1, HOXB2
## Identified 2736 markers!
##  [1] "CD34"  "GATA1" "PAX5"  "MS4A1" "EBF1"  "MME"   "CD14"  "CEBPB" "MPO"  
## [10] "IRF8"  "CD3D"  "CD8A"  "TBX21" "IL7R"
## Adding Annotations..
## Preparing Main Heatmap..
## 'magick' package is suggested to install to give better rasterization.
## 
## Set `ht_opt$message = FALSE` to turn off this message.
## ArchR logging successful to : ArchRLogs/ArchR-plotMarkerHeatmap-9e5e6cd4-Date-2025-02-06_Time-01-10-25.770159.log

To plot this heatmap, we can use the ComplexHeatmap::draw() function because the heatmapGS object is actually a list of heatmaps:

ComplexHeatmap::draw(heatmapGS, heatmap_legend_side = "bot", annotation_legend_side = "bot")

Many heatmaps in ArchR utilize the ComplexHeatmap package which provides a very versatile and flexible framework for plotting heatmaps. If you need to change something about how the heatmap looks when plotted, you should consult the documentation for ComplexHeatmap. For example, if you want to change the order in which the heatmap rows are plotted, you can edit the row_order slot of the heatmap object and re-plot it:

heatmapGS@row_order <- c(10,11,12,3,4,5,1,2,6,8,7,9)
ComplexHeatmap::draw(heatmapGS, heatmap_legend_side = "bot", annotation_legend_side = "bot")

To save an editable vectorized version of this plot, we use plotPDF().

plotPDF(heatmapGS, name = "GeneScores-Marker-Heatmap", width = 8, height = 6, ArchRProj = projHeme2, addDOC = FALSE)
## Plotting ComplexHeatmap!