Please download the app, and execute it to identify parameters interactively
After you have decided on your QC metrics load your raw object (i.e. right after you first read data into seurat) and put the parameters.R file you got from the shiny app in the same folder as this rmd.
Overview
Project: name_hbcXXXXX
PI: person name
Analyst: person in the core
Experiment: short description
Code
if (isUrl(seurat_obj)) { seurat_raw <-readRDS(url(seurat_obj))} else { seurat_raw <-readRDS(seurat_obj)}
Code
# if you suspect that your dataset contains doublets, you can use this code to# detect them and filter them out. if your seurat object contains data from more# than one sample, it is important to pass the metadata column containing the# sample name to scDblFinder using the "samples" argumentsce <-as.SingleCellExperiment(seurat_raw, assay ="RNA")sce <-scDblFinder(sce, samples ="orig.ident")meta_scdblfinder <- sce@colData@listData %>%as.data.frame() %>% dplyr::select(starts_with("scDblFinder")) %>%mutate(barcode = sce@colData@rownames)seurat_raw_meta <- seurat_raw@meta.dataseurat_raw_meta_new <- seurat_raw_meta %>%left_join(meta_scdblfinder, by ="barcode")rownames(seurat_raw_meta_new) <- seurat_raw_meta_new$barcodeseurat_raw@meta.data <- seurat_raw_meta_newseurat_raw <-subset(seurat_raw, scDblFinder.class =="singlet")
In this section, we review quality control (QC) metrics for the raw feature matrices generated by Cellranger. Only a low level filter excluding cells with <100 nUMIs (= number of unique molecular identifiers, or sequenced reads per cell) was applied when uploading the data into R.
Here, we look at the distribution of UMIs (unique molecular identifiers, or sequenced reads) per cell (droplet) in the dataset. Before QC, we expect a biomodal distribution with a first small peak at low numbers of UMIs (<250) corresponding to droplets that encapsulated background/dying cells, and a second higher peak centered at >1000. The line is at 0.
Code
metadata0 %>%ggplot(aes(x = nUMI, color = orig.ident, fill = orig.ident)) +geom_density(alpha =0.2) +theme_classic() +ylab("Cell density") +scale_x_log10() +geom_vline(xintercept = nCount_RNA_cutoff) +facet_wrap(. ~ orig.ident) +ggtitle("UMIs per cell in raw dataset")
Code
# Visualize the distribution of nUMIs per cell (boxplot)metadata0 %>%ggplot(aes(x = orig.ident, y =log10(nUMI), fill = orig.ident)) +geom_violin() +geom_boxplot(width =0.1, fill =alpha("white", 0.7)) +theme_classic() +geom_hline(yintercept =log10(nCount_RNA_cutoff)) +theme(axis.text.x =element_text(angle =45, vjust =1, hjust =1)) +theme(plot.title =element_text(hjust =0.5, face ="bold"))
Genes per cell
Here, we look at the number of different genes that were detected in each cell. By “detected”, we mean genes with a non-zero read count measurement. Gene detection in the range of 500 to 5000 is normal for most single-cell experiments. The line is at 0.
Code
# Visualize the distribution of genes detected per cell (histogram)metadata0 %>%ggplot(aes(x = nGene, color = orig.ident, fill = orig.ident)) +geom_density(alpha =0.2) +theme_classic() +scale_x_log10() +geom_vline(xintercept =c(nFeature_RNA_cutoff)) +facet_wrap(. ~ orig.ident) +ggtitle("Detected genes per cell in raw dataset")
Code
# Visualize the distribution of nUMIs per cell (boxplot)metadata0 %>%ggplot(aes(x = orig.ident, y =log10(nGene), fill = orig.ident)) +geom_violin() +geom_boxplot(width =0.1, fill =alpha("white", 0.7)) +theme_classic() +geom_hline(yintercept =c(log10(nFeature_RNA_cutoff))) +theme(axis.text.x =element_text(angle =45, vjust =1, hjust =1)) +theme(plot.title =element_text(hjust =0.5, face ="bold"))
Mitochondrial ratio
We evaluate overall mitochondrial gene expression as a biomarker of cellular stress during sample preparation. Typically, we expect mitochondrial genes to account for <20% of overall transcripts in each cell. The line indicates 100 %.
Code
# Visualize the distribution of mitochondrial gene expression detected per cellmetadata0 %>%ggplot(aes(color = orig.ident, x = mitoRatio, fill = orig.ident)) +geom_density(alpha =0.2) +scale_x_log10() +theme_classic() +geom_vline(xintercept = mitoRatio_cutoff) +facet_wrap(. ~ orig.ident) +ggtitle("Percentage of mitochondrial gene expression per cell in raw dataset")
Code
# Visualize the distribution of mitochondrial gene expression per cell (violin plot)metadata0 %>%ggplot(aes(x = orig.ident, y = mitoRatio, fill = orig.ident)) +geom_violin() +geom_boxplot(width =0.1, fill =alpha("white", 0.7)) +theme_classic() +geom_hline(yintercept =c(mitoRatio_cutoff)) +theme(axis.text.x =element_text(angle =45, vjust =1, hjust =1)) +theme(plot.title =element_text(hjust =0.5, face ="bold"))
Code
cat("## Ribosomal ratio \n")cat("We evaluate overall ribosomal gene expression. Different cells types are expected to have different levels of ribosomal expression. Due to this, there is no suggested cutoff for ribosomal ratio. We merely expect it to be similar among samples with similar cellular composition. Note that extremely high levels can indicate low quality reads. \n")# Visualize the distribution of mitochondrial gene expression detected per cellmetadata0 %>%ggplot(aes(color = orig.ident, x = riboRatio, fill = orig.ident)) +geom_density(alpha =0.2) +scale_x_log10() +theme_classic() +facet_wrap(. ~ orig.ident) +ggtitle("Percentage of ribosomal gene expression per cell in raw dataset")
UMIs vs. Genes
By plotting the number of UMIs per cell (x-axis) vs. the number of genes per cell (y-axis), we can visually assess whether there is a large proportion of low quality cells with low read counts and/or gene detection (bottom left quadrant of the plot). In the following representation, cells are further color-coded based on the percentage of mitochondrial genes found among total detected genes. The line for nUMI is at 0 and the line for nGene is at 0.
Code
# Visualize the correlation between genes detected and number of UMIs and determine whether strong presence of cells with low numbers of genes/UMIsmetadata0 %>%ggplot(aes(x = nUMI, y = nGene, color = mitoRatio)) +geom_point() +stat_smooth(method = lm) +scale_x_log10() +scale_y_log10() +theme_classic() +geom_vline(xintercept = nCount_RNA_cutoff) +geom_hline(yintercept = nFeature_RNA_cutoff) +ggtitle("Genes vs. nUMIs in raw dataset") +facet_wrap(~orig.ident)
Complexity
Another way to assess the quality and purity of a single-cell dataset is to look for cells that have fewer detected genes per UMI than others. Typical values for this metric are >0.8 for most cells. Cells with lower diversity in the genes they express may be low-complexity cell types such as red blood cells. With sorted populations, we expect high purity and a very similar complexity distribution across samples. The line is at 0.
Code
# Visualize the overall novelty of the gene expression by visualizing the genes detected per UMImetadata0 %>%ggplot(aes(x = Log10GenesPerUMI, color = orig.ident, fill = orig.ident)) +geom_density(alpha =0.2) +theme_classic() +geom_vline(xintercept = Log10GenesPerUMI_cutoff) +facet_wrap(. ~ orig.ident) +ggtitle("log10(Genes per UMI) in raw dataset")
Code
# Visualize the overall novelty of the gene expression by visualizing the genes detected per UMI (boxplot)metadata0 %>%ggplot(aes(x = orig.ident, Log10GenesPerUMI, fill = orig.ident)) +geom_violin() +geom_boxplot(width =0.1, fill =alpha("white", 0.7)) +theme_classic() +geom_hline(yintercept = Log10GenesPerUMI_cutoff) +theme(axis.text.x =element_text(angle =45, vjust =1, hjust =1)) +theme(plot.title =element_text(hjust =0.5, face ="bold"))
QC metrics: Filtered data
Based on the above QC metrics, we filtered the dataset to isolate cells passing the following thresholds: >0 UMIs, >0 genes, <1 mitochondrial gene ratio, and >0 complexity.
In this section, we review QC metrics for our filtered dataset.
metadata1 %>%ggplot(aes(color = orig.ident, x = nUMI, fill = orig.ident)) +geom_density(alpha =0.2) +scale_x_log10() +theme_classic() +ylab("Cell density") +xlab("nUMI") +facet_wrap(. ~ orig.ident) +ggtitle("UMIs per cell in filtered dataset")
Code
# Visualize the distribution of nUMIs per cell (boxplot)metadata1 %>%ggplot(aes(x = orig.ident, y =log10(nUMI), fill = orig.ident)) +geom_violin() +geom_boxplot(width =0.1, fill =alpha("white", 0.7)) +theme_classic() +theme(axis.text.x =element_text(angle =45, vjust =1, hjust =1)) +theme(plot.title =element_text(hjust =0.5, face ="bold"))
Genes per cell
Code
# Visualize the distribution of genes detected per cell via histogrammetadata1 %>%ggplot(aes(color = orig.ident, x = nGene, fill = orig.ident)) +geom_density(alpha =0.2) +theme_classic() +scale_x_log10() +xlab("nGene") +facet_wrap(. ~ orig.ident) +ggtitle("Detected genes per cell in filtered dataset")
Code
# Visualize the distribution of genes detected per cell (boxplot)metadata1 %>%ggplot(aes(x = orig.ident, y =log10(nGene), fill = orig.ident)) +geom_violin() +geom_boxplot(width =0.1, fill =alpha("white", 0.7)) +theme_classic() +theme(axis.text.x =element_text(angle =45, vjust =1, hjust =1)) +theme(plot.title =element_text(hjust =0.5, face ="bold"))
Mitochondrial ratio
Code
# Visualize the distribution of mitochondrial gene expression detected per cellmetadata1 %>%ggplot(aes(color = orig.ident, x = mitoRatio, fill = orig.ident)) +geom_density(alpha =0.2) +scale_x_log10() +theme_classic() +geom_vline(xintercept =0.1)
Code
# +# facet_wrap(. ~ surgery)
Code
# Visualize the distribution of mitochondrial gene expression per cell (violin plot)metadata1 %>%ggplot(aes(x = orig.ident, y = mitoRatio, fill = orig.ident)) +geom_violin() +geom_boxplot(width =0.1, fill =alpha("white", 0.7)) +theme_classic() +geom_hline(yintercept =c(mitoRatio_cutoff)) +theme(axis.text.x =element_text(angle =45, vjust =1, hjust =1)) +theme(plot.title =element_text(hjust =0.5, face ="bold"))
Code
cat("## Ribosomal ratio \n")# Visualize the distribution of ribosomal gene expression detected per cellmetadata1 %>%ggplot(aes(color = orig.ident, x = riboRatio, fill = orig.ident)) +geom_density(alpha =0.2) +scale_x_log10() +theme_classic() +facet_wrap(. ~ orig.ident) +ggtitle("Percentage of ribosomal gene expression per cell in filtered dataset")
Code
metadata1 %>%ggplot(aes(x = orig.ident, y = riboRatio, fill = orig.ident)) +geom_violin() +geom_boxplot(width =0.1, fill =alpha("white", 0.7)) +theme_classic() +theme(axis.text.x =element_text(angle =45, vjust =1, hjust =1)) +theme(plot.title =element_text(hjust =0.5, face ="bold"))
UMIs vs. Genes
Code
metadata1 %>%ggplot(aes(x = nUMI, y = nGene, color = mitoRatio)) +geom_point() +stat_smooth(method = lm) +scale_x_log10() +scale_y_log10() +theme_classic() +ggtitle("Genes vs. nUMIs in filtered dataset") +xlab("nUMI") +ylab("nGene") +facet_wrap(~orig.ident)
Complexity
Code
# Visualize the overall novelty of the gene expression by visualizing the genes detected per UMImetadata1 %>%ggplot(aes(x = Log10GenesPerUMI, color = orig.ident, fill = orig.ident)) +geom_density(alpha =0.2) +theme_classic() +facet_wrap(. ~ orig.ident) +ggtitle("log10(Genes per UMI) in filtered dataset")
Code
# Visualize the distribution of nUMIs per cell (boxplot)metadata1 %>%ggplot(aes(x = orig.ident, Log10GenesPerUMI, fill = orig.ident)) +geom_violin() +geom_boxplot(width =0.1, fill =alpha("white", 0.7)) +theme_classic() +theme(axis.text.x =element_text(angle =45, vjust =1, hjust =1)) +theme(plot.title =element_text(hjust =0.5, face ="bold"))