Visium QC

Author

Harvard Chan Bioinformatics Core

Published

July 2, 2025

This code is in this revision.

1 Visium Description

This report was adapted from this tutorial.

Spatial transcriptomic data from the Visium (HD) platform is in many ways similar to scRNAseq data. The data is represented per spot/bin on the slide, because we have spatial barcodes but no cellular barcodes.

For Visium data, each spot contains UMI counts for 5-20 cells instead of a single cell. The data is still quite sparse, as scRNAseq data is, but it carries additional information about spatial location in the tissue.

For Visium HD, the slide contains two 6.5 x 6.5 mm Capture Areas with a continuous lawn of oligonucleotides arrayed in millions of 2 x 2 µm barcoded squares without gaps, achieving single cell–scale spatial resolution. The data is output at 2 µm resolution as well as at multiple larger bin sizes. You can choose the bin resolution for downstream visualization and analysis.
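To make the scale concrete, the arithmetic behind this layout can be sketched in a few lines of base R. The 8 µm bin size and the helper name `bin_index()` are illustrative assumptions, not values taken from this report; substitute whichever bin resolution you choose.

```r
# Capture Area side length and native square size, as described above
capture_mm <- 6.5
square_um  <- 2
squares_per_side <- capture_mm * 1000 / square_um  # 3250 squares along each side
n_squares <- squares_per_side^2                    # about 10.6 million squares

# Hypothetical helper: map 0-based row/column indices of 2 um squares
# to the row/column index of a coarser bin
bin_index <- function(row, col, bin_um = 8, square_um = 2) {
  k <- bin_um / square_um               # squares per bin side
  stopifnot(k == round(k))              # bin size must be a multiple of the square size
  data.frame(bin_row = row %/% k, bin_col = col %/% k)
}

# An 8 x 8 patch of squares falls into a 2 x 2 grid of bins
squares <- expand.grid(row = 0:7, col = 0:7)
bins <- bin_index(squares$row, squares$col)
squares_per_bin <- table(paste(bins$bin_row, bins$bin_col, sep = "_"))
print(squares_per_bin)
```

Under these assumptions each bin pools 16 squares, which is why per-bin UMI and gene counts rise as the bin size increases.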

The term spot(s)/bin(s) is used throughout this tutorial; it corresponds to the two technologies, Visium (spots) and Visium HD (bins).

The main objective of quality control is to filter the data so that we keep only high-quality spots/bins. This makes it easier to identify distinct cell type populations when we cluster the spots/bins.

In spatial transcriptomic data, the main challenge is distinguishing spots/bins that are of poor quality from spots/bins containing reads from less complex cells. If you expect a particular cell type in your dataset to be less transcriptionally active than the other cell types, the spots/bins underneath this cell type will naturally have fewer detected genes and transcripts. However, having fewer detected genes and transcripts can also be a technical artifact rather than a biological signal.

Various metrics can be used to separate low-quality spots/bins from high-quality ones, including:

UMI counts per spot/bin - This is the number of unique transcripts detected per spot/bin. Because the spots/bins are very small, this number is lower than what we would expect for non-spatial scRNAseq data.
Genes detected per spot/bin - This is the number of unique genes detected per spot/bin. Again, because the spots/bins are very small, this number is lower than what we would expect for non-spatial scRNAseq data.
Complexity (novelty score) - The novelty score is computed as the ratio of log10(nGenes) over log10(nUMI). If there are many captured transcripts (high nUMI) but few genes detected in a spot/bin, you likely captured only a small number of genes and sequenced transcripts from those few genes over and over again. These low complexity (low novelty) spots/bins could represent a specific cell type (e.g. red blood cells, which lack a typical transcriptome), or could be due to an artifact or contamination. Generally, we expect the novelty score to be above 0.80 for good quality spots/bins.
Mitochondrial counts ratio - This metric can identify whether there is a large amount of mitochondrial contamination from dead or dying cells. We define poor quality spots/bins as those that surpass the 0.2 (20%) mitochondrial ratio mark, unless of course you are expecting this in your sample.
Hemoglobin counts ratio - This metric can identify whether there is a large amount of hemoglobin gene contamination from blood. We define poor quality spots/bins as those that surpass the 0.2 (20%) hemoglobin ratio mark, unless of course you are expecting this in your sample.
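As a rough illustration of how the count and contamination metrics above come out of a count matrix, here is a minimal base R sketch on an invented 5-gene by 3-bin matrix. The gene names, the counts, and the helper `pct()` are assumptions made for this example; the report itself computes these metrics with Seurat on the real object.

```r
# Toy counts: genes in rows, bins in columns (all values are made up)
counts <- matrix(
  c(5, 0, 1,
    0, 2, 0,
    3, 0, 0,
    2, 8, 0,
    0, 0, 4),
  nrow = 5, byrow = TRUE,
  dimnames = list(c("Actb", "Gapdh", "mt-Co1", "Hbb-bs", "Malat1"),
                  c("bin1", "bin2", "bin3"))
)

nCount   <- colSums(counts)      # UMIs per bin
nFeature <- colSums(counts > 0)  # genes detected per bin

# Hypothetical helper: percentage of a bin's counts coming from genes
# whose names match a regular expression
pct <- function(pattern) {
  hits <- grepl(pattern, rownames(counts))
  100 * colSums(counts[hits, , drop = FALSE]) / nCount
}

qc <- data.frame(
  nCount       = nCount,
  nFeature     = nFeature,
  percent_mito = pct("^mt-"),
  percent_hb   = pct("^Hb.*-")
)
# Flag bins that exceed the 20% marks described above
qc$flag_mito <- qc$percent_mito > 20
qc$flag_hb   <- qc$percent_hb > 20
print(qc)
```

In this toy table `bin1` is flagged for mitochondrial content and `bin2` for hemoglobin content, while `bin3` passes both checks. The `^mt-` and `^Hb.*-` patterns assume mouse-style gene symbols and would need adjusting for other species.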

Code

# This sets the working directory to this file's location so all files can be found
# library(rstudioapi)
# setwd(fs::path_dir(getSourceEditorContext()$path))
stopifnot(getRversion() >= "4.0.0") # requires R4
if (getRversion() < "4.3.1") warning("We recommend >= R4.3.1")
stopifnot(compareVersion(as.character(BiocManager::version()), "3.16")>=0)
stopifnot(compareVersion(as.character(packageVersion("Seurat")), "5.1")>=0)

2 Project details

Project: name_hbcXXXXX
PI: person name
Analyst: person in the core
Experiment: short description
Aim: short description

Code

# Metrics like `nCount` and `nFeature` are named with the default assay name as a suffix. To make variable usage more general, we strip that suffix using the default assay of the visium `seurat` object.
visium <- qs_read(visiumHD_obj)
visium <- PercentageFeatureSet(visium, "^mt-", col.name = "percent_mito")
visium <- PercentageFeatureSet(visium, "^Hb.*-", col.name = "percent_hb")
metaD <- visium@meta.data
# Strip the assay suffix first so that `nFeature` and `nCount` exist as exact column names
colnames(metaD) <- gsub(glue("_{DefaultAssay(visium)}"), "", colnames(metaD), fixed = TRUE)
metaD$log10GenesPerUMI <- log10(metaD$nFeature)/log10(metaD$nCount)

Let’s take a quick look at the data and make a decision on whether we need to apply any filtering.

3 Quality control per spot/bin

3.1 Number of UMIs and genes detected per spot/bin

These two metrics depend strongly on tissue type, RNA quality, and sequencing depth. Since the test data was generated with Visium HD technology, we use bins and the corresponding reference thresholds in the plots. A reference line at 100 is plotted as the suggested cut-off for both metrics.

Code

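# Mean of every numeric metadata column; NaN values (e.g. from empty bins) are ignored
summary_metaD <- colMeans(metaD[, sapply(metaD, is.numeric), drop = FALSE], na.rm = TRUE)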
metacol_label <- list("nFeature"="Genes","nCount"="UMI")
refs <- list("nFeature"=100,"nCount"=100)
dists_before <- imap(metacol_label,\(label,col)
  ggdensity(metaD,
          x = col,xscale="log10",add = "mean", rug = TRUE,
          alpha = 0.2,fill = "lightgray",
          xlab=glue("Number of {label} per bin (log10 scale)"),
          ylab="Bin density",
          title=glue('Pre-QC {label}/Bin'))+
  geom_vline(xintercept = refs[[col]],color="darkred",cex=rel(1.3),linetype="dashed")+
  annotate("text",x=summary_metaD[col],y = Inf,
           label = glue("Mean \n = {round(summary_metaD[col],0)}"),
            vjust = 1,hjust=2)
)
dists_before[[1]] | dists_before[[2]]

3.2 Overall complexity of transcriptional profile per spot/bin

We can evaluate each spot/bin in terms of how complex its RNA species are by using a measure called the novelty score. The novelty score is computed as the ratio of log10(nGenes) over log10(nUMI). If there are many captured transcripts (high nUMI) but few genes detected in a spot/bin, you likely captured only a small number of genes and sequenced transcripts from those few genes over and over again.

With scRNA-seq this is more easily interpreted because it describes a single cell. For spatial data it describes the complexity of the spot/bin, which may pool transcripts from multiple cells.
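A small worked example may help with reading the score. The two bins below are invented: both have 1000 UMIs, but one spreads them over 500 genes and the other over only 100. The function name `novelty()` is an assumption for this sketch.

```r
# Novelty score: log10 of genes detected divided by log10 of UMIs
novelty <- function(n_genes, n_umi) log10(n_genes) / log10(n_umi)

# Two hypothetical bins with the same depth but different gene diversity
complex_bin    <- novelty(n_genes = 500, n_umi = 1000)  # log10(500) / 3
repetitive_bin <- novelty(n_genes = 100, n_umi = 1000)  # 2 / 3

print(round(c(complex = complex_bin, repetitive = repetitive_bin), 2))
```

The first bin lands above the 0.80 reference and the second below it, even though both were sequenced to the same depth.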

Code

col <- "log10GenesPerUMI"
ggdensity(metaD,x = col,add = "mean", rug = TRUE,
          alpha = 0.2,fill = "lightgray",
          xlab="complexity",ylab="Bin density",title="Novelty score")+
  geom_vline(xintercept = 0.8,color="darkred",cex=rel(1.3),linetype="dashed")+
  annotate("text",x=summary_metaD[col],y = Inf,
           label = glue("Mean = {round(summary_metaD[col],2)}"),
            vjust = 1,hjust=2)+
    theme(plot.title = element_text(hjust=0.5, face="bold"))

3.3 Mitochondrial and hemoglobin gene ratios

Code

ggplot(metaD %>% 
         select(orig.ident,starts_with("percent_")) %>% 
         tidyr::gather(class,percent_unexpected,-orig.ident), 
        aes(x = orig.ident, y = percent_unexpected)) +
    geom_violin(position=position_dodge(1),alpha=1, na.rm=TRUE,trim=FALSE)+
    ggbeeswarm::geom_quasirandom(na.rm=TRUE,dodge.width=0.5,
                                 method='quasirandom',alpha=0.01)+
    geom_boxplot(width=0.1,outliers = F)+
   geom_hline(yintercept=20)+
   facet_grid(~class,scales = "free")+
    theme(
      axis.text.x = element_text(size=rel(1),face="bold"),
      plot.title = element_text(hjust = 0.5),
      strip.text.x = element_text(size = rel(1.5), colour = "black"),
      legend.position = "none"
          )+
  scale_y_log10(breaks=c(1,5,10,20,100))+
  # ylim(c(0,100))+
  labs(x="",y="% of counts from contamination genes")

4 QC metrics visualized on slides

Here, we can look at all the QC metrics we discussed above on the individual tissue slide.

Code

features2check <- c(glue('nCount_{DefaultAssay(visium)}'),
                    glue('nFeature_{DefaultAssay(visium)}'),
                    "percent_mito","percent_hb")

Code

for(f in features2check){
  cat("### ", f, "\n\n")
  p1 <- SpatialFeaturePlot(visium, 
                   features = f,
                  pt.size.factor = 4)
  print(p1)
  cat("\n\n")
}

4.1 nCount_Spatial

4.2 nFeature_Spatial

4.3 percent_mito

4.4 percent_hb

5 Top expressed genes

Now it is time to choose cut-offs for the QC metrics discussed above and remove low-quality spots/bins. We also remove mitochondrial and hemoglobin genes from the feature space, and then take a quick look at the top 20 expressed genes.
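Before committing to cut-offs, it can help to tally how many bins each one would remove on its own. The sketch below does this on an invented metadata table. The column names mirror those used in this report and the cut-off values are the suggested ones (100 genes, 100 UMIs, 20% mitochondrial, 20% hemoglobin), but `meta`, `pass`, and `removed_by_filter` are illustrative names, not part of the pipeline.

```r
# Hypothetical per-bin metadata (values invented for illustration)
meta <- data.frame(
  nFeature     = c(250, 80, 400, 120, 90),
  nCount       = c(600, 95, 1500, 300, 150),
  percent_mito = c(5, 2, 35, 10, 1),
  percent_hb   = c(1, 0, 3, 25, 0)
)
cutoffs <- list(nFeature = 100, nCount = 100, hb = 20, mito = 20)

# One logical column per filter: TRUE means the bin passes that filter
pass <- data.frame(
  nFeature = meta$nFeature > cutoffs$nFeature,
  nCount   = meta$nCount > cutoffs$nCount,
  hb       = meta$percent_hb < cutoffs$hb,
  mito     = meta$percent_mito < cutoffs$mito
)

removed_by_filter <- colSums(!pass)  # bins failing each filter individually
keep <- Reduce(`&`, pass)            # bins passing every filter

print(removed_by_filter)
print(sum(keep))
```

If a single filter removes most bins, that is usually a sign the cut-off is too strict for the tissue or bin size rather than a sign that the data is unusable.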

Code

GeneVar <- glue('nFeature_{DefaultAssay(visium)}')
UMIVar <- glue('nCount_{DefaultAssay(visium)}')
cutoffs <- list("nFeature"=100,"nCount"=100,"hb"=20,"mito"=20)
Qced <-  visium@meta.data[,GeneVar] > cutoffs$nFeature & 
     visium@meta.data[,UMIVar] > cutoffs$nCount & 
     visium$percent_hb < cutoffs$hb & 
     visium$percent_mito < cutoffs$mito
visium <- visium[,Qced]
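# Filter mitochondrial genes
visium <- visium[!grepl("^mt-", rownames(visium)), ]
# Filter hemoglobin genes (optional; apply if they are a problem in your data)
visium <- visium[!grepl("^Hb.*-", rownames(visium)), ]

# Convert counts to the fraction of total UMIs per bin, working on the sparse matrix directly
C <- GetAssayData(visium, layer = "counts")
C@x <- C@x / rep.int(Matrix::colSums(C), diff(C@p))
# Indices of the 20 genes with the highest summed fraction, in ascending order for plotting
most_expressed <- order(Matrix::rowSums(C), decreasing = TRUE)[20:1]
exprD <- as.data.frame(as.matrix(t(C[most_expressed, ]))) %>% 
  tibble::rownames_to_column("bin") %>% 
  tidyr::gather(gene,expr,-bin) %>% 
  # Keep the ranked gene order on the axis instead of alphabetical order
  dplyr::mutate(gene = factor(gene, levels = unique(gene)))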


ggplot(exprD,aes(x=gene,y=expr,color=gene,fill=gene))+
  geom_violin(position=position_dodge(1),alpha=0.5,
              na.rm=TRUE,trim=FALSE)+
  geom_boxplot(width=0.1,outliers = F,color="black")+
  theme_minimal()+
  theme(
      axis.text.x = element_text(size=rel(1),face="bold"),
      plot.title = element_text(hjust = 0.5),
      legend.position = "none"
    )+ 
  scale_y_log10(breaks=c(0.001,.01,.1,1),labels=c(0.1,1,10,100))+
  labs(x="",y="% of total UMIs/bin \n (log10 scaled)")+
  coord_flip()
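The per-bin normalization in the chunk above divides the stored values of a sparse matrix by their column totals without ever building a dense matrix. The following sketch, on an invented 3 x 3 matrix, shows why expanding the column sums with `diff(C@p)` lines them up with the stored values. It assumes the counts are a column-compressed `dgCMatrix` from the Matrix package; the toy values and the name `frac` are illustrative.

```r
library(Matrix)

# Toy sparse counts: 3 genes x 3 bins (values invented)
C <- sparseMatrix(
  i = c(1, 3, 2, 1, 2, 3),
  j = c(1, 1, 2, 3, 3, 3),
  x = c(2, 6, 5, 1, 1, 2),
  dims = c(3, 3),
  dimnames = list(c("g1", "g2", "g3"), c("b1", "b2", "b3"))
)

# C@p holds column pointers, so diff(C@p) is the number of stored values per column
nnz_per_col <- diff(C@p)

# Repeat each column total once per stored value, then divide element-wise
frac <- C
frac@x <- C@x / rep.int(Matrix::colSums(C), nnz_per_col)

# Cross-check against the dense equivalent
dense <- as.matrix(C)
expected <- sweep(dense, 2, colSums(dense), "/")
print(all.equal(as.matrix(frac), expected))
```

This works because the values in `C@x` are stored column by column. A bin with a total of zero would produce `NaN`, which is one reason to apply the UMI cut-off before this step.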

Code

# Create the results directory if needed (portable alternative to a shell `mkdir -p`)
if(!dir.exists(results_dir)){
  dir.create(results_dir, recursive = TRUE)
}
outputPath <- file.path(results_dir, "01_qc.qs")
qs_save(visium, outputPath)

We saved your QC-filtered Seurat object in ./results/01_qc.qs.

6 Methods

6.1 Citation

Code

citation("Seurat")

To cite Seurat in publications, please use:

  Hao et al. Dictionary learning for integrative, multimodal and
  scalable single-cell analysis. Nature Biotechnology (2023) [Seurat
  V5]

  Hao and Hao et al. Integrated analysis of multimodal single-cell
  data. Cell (2021) [Seurat V4]

  Stuart and Butler et al. Comprehensive Integration of Single-Cell
  Data. Cell (2019) [Seurat V3]

  Butler et al. Integrating single-cell transcriptomic data across
  different conditions, technologies, and species. Nat Biotechnol
  (2018) [Seurat V2]

  Satija and Farrell et al. Spatial reconstruction of single-cell gene
  expression data. Nat Biotechnol (2015) [Seurat V1]

To see these entries in BibTeX format, use 'print(<citation>,
bibtex=TRUE)', 'toBibtex(.)', or set
'options(citation.bibtex.max=999)'.

6.2 Session Information

Code

sessionInfo()

R version 4.5.1 (2025-06-13)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] Seurat_5.3.0       SeuratObject_5.1.0 sp_2.2-0           scales_1.4.0      
 [5] gridExtra_2.3      ggpubr_0.6.0       grafify_5.0.0.1    ggprism_1.0.5     
 [9] ggplot2_3.5.2      purrr_1.0.4        dplyr_1.1.4        qs2_0.1.5         
[13] glue_1.8.0         import_1.3.2       knitr_1.50        

loaded via a namespace (and not attached):
  [1] RcppAnnoy_0.0.22       splines_4.5.1          later_1.4.2           
  [4] tibble_3.2.1           polyclip_1.10-7        rpart_4.1.24          
  [7] fastDummies_1.7.5      lifecycle_1.0.4        Rdpack_2.6.4          
 [10] rstatix_0.7.2          globals_0.17.0         lattice_0.22-7        
 [13] MASS_7.3-65            backports_1.5.0        magrittr_2.0.3        
 [16] Hmisc_5.2-3            plotly_4.10.4          rmarkdown_2.29        
 [19] yaml_2.3.10            httpuv_1.6.16          sctransform_0.4.1     
 [22] spam_2.11-1            spatstat.sparse_3.1-0  reticulate_1.42.0     
 [25] cowplot_1.1.3          pbapply_1.7-2          minqa_1.2.8           
 [28] RColorBrewer_1.1-3     abind_1.4-8            Rtsne_0.17            
 [31] nnet_7.3-20            ggrepel_0.9.6          irlba_2.3.5.1         
 [34] listenv_0.9.1          spatstat.utils_3.1-3   goftest_1.2-3         
 [37] RSpectra_0.16-2        spatstat.random_3.3-3  fitdistrplus_1.2-2    
 [40] parallelly_1.43.0      codetools_0.2-20       tidyselect_1.2.1      
 [43] farver_2.1.2           lme4_1.1-37            matrixStats_1.5.0     
 [46] base64enc_0.1-3        spatstat.explore_3.4-2 jsonlite_2.0.0        
 [49] progressr_0.15.1       Formula_1.2-5          ggridges_0.5.6        
 [52] survival_3.8-3         emmeans_1.11.0         tools_4.5.1           
 [55] ica_1.0-3              Rcpp_1.0.14            xfun_0.52             
 [58] mgcv_1.9-3             withr_3.0.2            numDeriv_2016.8-1.1   
 [61] BiocManager_1.30.25    fastmap_1.2.0          boot_1.3-31           
 [64] digest_0.6.37          R6_2.6.1               mime_0.13             
 [67] estimability_1.5.1     colorspace_2.1-1       scattermore_1.2       
 [70] tensor_1.5             dichromat_2.0-0.1      spatstat.data_3.1-6   
 [73] tidyr_1.3.1            generics_0.1.3         data.table_1.17.0     
 [76] httr_1.4.7             htmlwidgets_1.6.4      uwot_0.2.3            
 [79] pkgconfig_2.0.3        gtable_0.3.6           lmtest_0.9-40         
 [82] htmltools_0.5.8.1      carData_3.0-5          dotCall64_1.2         
 [85] png_0.1-8              spatstat.univar_3.1-2  reformulas_0.4.0      
 [88] rstudioapi_0.17.1      reshape2_1.4.4         checkmate_2.3.2       
 [91] nlme_3.1-168           nloptr_2.2.1           zoo_1.8-14            
 [94] stringr_1.5.1          KernSmooth_2.23-26     vipor_0.4.7           
 [97] parallel_4.5.1         miniUI_0.1.2           foreign_0.8-90        
[100] pillar_1.10.2          grid_4.5.1             vctrs_0.6.5           
[103] RANN_2.6.2             promises_1.3.2         car_3.1-3             
[106] stringfish_0.16.0      xtable_1.8-4           cluster_2.1.8.1       
[109] beeswarm_0.4.0         htmlTable_2.4.3        evaluate_1.0.3        
[112] mvtnorm_1.3-3          cli_3.6.5              compiler_4.5.1        
[115] rlang_1.1.6            future.apply_1.11.3    ggsignif_0.6.4        
[118] labeling_0.4.3         plyr_1.8.9             ggbeeswarm_0.7.2      
[121] stringi_1.8.7          viridisLite_0.4.2      deldir_2.0-4          
[124] lmerTest_3.1-3         lazyeval_0.2.2         spatstat.geom_3.3-6   
[127] Matrix_1.7-3           RcppHNSW_0.6.0         patchwork_1.3.0       
[130] future_1.40.0          shiny_1.10.0           rbibutils_2.3         
[133] ROCR_1.0-11            igraph_2.1.4           broom_1.0.8           
[136] RcppParallel_5.1.10