DEGpattern visualizations

Author: Harvard Chan Bioinformatics Core

Published: July 12, 2025

1 Overview of this report

This template was developed with materials from the HBC training Intro-to-DGE.

The default test data originally come from this paper; the required raw data can be downloaded from these links: Salmon data, Annotation file.

The steps that take the raw data to the intermediate files required for this visualization are in Data_prep.R. They are adapted from two DGE training lessons: data set up and count normalization.

This tutorial requires three intermediate .rds files containing:

  • deseq_obj: a DESeq2 object built from your tximport output
  • deseq_meta: a data.frame specifying the sample groups of interest
  • deseq_deg: a named vector with Differentially Expressed Genes (DEGs) as the names and adjusted p-values as the values
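As a minimal sketch, the three files could be loaded and checked for consistency with a helper like the one below. The function name load_degpattern_inputs and the default file names are hypothetical, and the code assumes that the row names of deseq_meta are the sample names used in deseq_obj.

```r
# Minimal sketch (hypothetical helper): load the three intermediate .rds
# files and check that they are mutually consistent. The default file
# names are placeholders; point them at whatever Data_prep.R wrote.
load_degpattern_inputs <- function(dir,
                                   obj_file  = "deseq_obj.rds",
                                   meta_file = "deseq_meta.rds",
                                   deg_file  = "deseq_deg.rds") {
  dds  <- readRDS(file.path(dir, obj_file))   # DESeq2 object
  meta <- readRDS(file.path(dir, meta_file))  # data.frame, one row per sample
  deg  <- readRDS(file.path(dir, deg_file))   # named vector: gene ID -> padj

  # Assumption: metadata row names are the sample names of the object
  stopifnot(all(colnames(dds) %in% rownames(meta)))
  # deg must be a named numeric vector of adjusted p-values
  stopifnot(is.numeric(deg), !is.null(names(deg)))
  # Every DEG must be a gene present in the DESeq2 object
  stopifnot(all(names(deg) %in% rownames(dds)))

  # Reorder metadata rows to match the sample (column) order of the object
  list(dds = dds, meta = meta[colnames(dds), , drop = FALSE], deg = deg)
}
```

In a real session, attach DESeq2 first (library(DESeq2)) so that colnames() and rownames() dispatch correctly on the DESeqDataSet.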

All test data can be found in the bcbioR test data GitHub repository.

Two additional parameters can be tuned when generating deseq_deg from the original DESeq2 results:

  • padj.cutoff: adjusted p-value cutoff applied to the DESeq2 results. Default: 0.05
  • topN: a second filter, applied after padj.cutoff, that keeps only the most significant genes for clustering to reduce computation time. If fewer genes pass padj.cutoff than the number supplied here, all of them are used for clustering. Default: 1000
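The two filters can be sketched in base R as follows. This is an assumption about how deseq_deg is built, not the code in Data_prep.R: the helper name make_deseq_deg is hypothetical, genes with an NA adjusted p-value are dropped, and the cutoff is applied as a strict less-than.

```r
# Minimal sketch (hypothetical helper) of deriving deseq_deg from a DESeq2
# results table. `res` is assumed to have gene IDs as row names and a
# `padj` column, e.g. as.data.frame(DESeq2::results(dds)).
make_deseq_deg <- function(res, padj.cutoff = 0.05, topN = 1000) {
  padj <- res$padj
  names(padj) <- rownames(res)
  # Drop NA padj values (genes filtered out by DESeq2), then apply the cutoff
  padj <- padj[!is.na(padj) & padj < padj.cutoff]
  # Rank by significance and keep at most topN genes;
  # head() returns everything when fewer than topN genes pass
  head(sort(padj), topN)
}
```

For example, make_deseq_deg(as.data.frame(DESeq2::results(dds))) returns a vector in the deseq_deg format described above, ordered from most to least significant.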

2 Identifying clusters of genes with shared expression profiles

A good next step is to identify groups of genes that share a pattern of expression change across the sample groups (levels).

To do this, we use a clustering tool called degPatterns from the DEGreport package. degPatterns uses a hierarchical clustering approach based on pairwise correlations between genes, then cuts the hierarchical tree to generate groups of genes with similar expression profiles. The tree is cut so as to optimize the diversity of the clusters, such that the inter-cluster variability is greater than the intra-cluster variability.
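To make the idea concrete, the sketch below clusters genes on the correlation of their group-averaged profiles. It is illustrative only: it uses base R's agglomerative hclust() with a fixed number of clusters k, whereas degPatterns selects its own cut of the tree, so the resulting groups will generally differ. The function name cluster_gene_profiles is hypothetical.

```r
# Illustrative sketch only, NOT the DEGreport implementation:
# correlation-based hierarchical clustering of gene expression profiles
# using base R's agglomerative hclust() and a fixed number of clusters k.
cluster_gene_profiles <- function(expr, groups, k = 4) {
  # expr: genes x samples matrix (e.g. rlog values) with gene row names
  # groups: sample group for each column of expr
  groups <- factor(groups)
  stopifnot(ncol(expr) == length(groups), nrow(expr) >= max(2, k))
  # Average replicates so each gene has one value per sample group
  profiles <- sapply(levels(groups), function(g)
    rowMeans(expr[, groups == g, drop = FALSE]))
  # Pairwise Pearson correlation between genes, turned into a distance
  d <- as.dist(1 - cor(t(profiles)))
  tree <- hclust(d, method = "average")
  # Cut the tree into k groups: named integer vector, gene -> cluster
  cutree(tree, k = k)
}
```

Genes whose averaged profile is constant have an undefined correlation and should be removed before calling it. Averaging replicates first means genes are grouped by the shape of their response across sample groups rather than by sample-level noise.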

The rlog-transformed counts for the significant genes are passed to degPatterns along with a few additional arguments:

  • metadata: the metadata data frame describing the samples
  • time: name of the metadata column to use as the variable that changes (the x-axis of each pattern)
  • col: name of the metadata column used to separate samples into subgroups
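A minimal sketch of this call, starting from the three input objects, is shown below. The helper name run_deg_patterns is hypothetical, rlog(blind = TRUE) is an assumption about how the counts were transformed, and the metadata row names are assumed to be the sample names.

```r
# Minimal sketch (hypothetical helper): rlog-transform the counts, keep the
# DEGs, and pass them to DEGreport::degPatterns(). Assumes DESeq2,
# SummarizedExperiment and DEGreport are installed, and that the row names
# of `meta` are the sample names.
run_deg_patterns <- function(dds, meta, deg, group_col) {
  # rlog-transformed counts; blind = TRUE is an assumption here
  rld <- DESeq2::rlog(dds, blind = TRUE)
  rlog_mat <- SummarizedExperiment::assay(rld)
  # Keep only the significant genes, with samples in metadata order
  sig_mat <- rlog_mat[names(deg), rownames(meta), drop = FALSE]
  DEGreport::degPatterns(
    sig_mat,
    metadata = meta,
    time = group_col,  # metadata column that defines the x-axis groups
    col = NULL         # no second variable to separate samples
  )
}
```

For example, clusters <- run_deg_patterns(dds, meta, deg, group_col = "sampletype"), where "sampletype" stands in for the grouping column of your own deseq_meta. The returned list holds the gene-to-cluster table in clusters$df and the ggplot figure in clusters$plot.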

The genes have been clustered into four different groups. For each group of genes, a boxplot illustrates the expression change across the sample groups, and an overlaid line graph shows the trend in expression change.

3 Zoom in on a specific cluster of genes

Since we are interested in Group 1, we can filter the data frame of gene-to-cluster assignments returned by degPatterns to keep only the genes in that group.
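A minimal base R sketch of this filter follows. It assumes the table has one row per gene with genes and cluster columns, as in the df element returned by degPatterns; the helper name genes_in_cluster is hypothetical.

```r
# Sketch (hypothetical helper): return the IDs of the genes in cluster k.
# `cluster_df` is assumed to have one row per gene with `genes` and
# `cluster` columns, like the df element returned by degPatterns().
genes_in_cluster <- function(cluster_df, k = 1) {
  hits <- cluster_df[cluster_df$cluster == k, , drop = FALSE]
  unique(as.character(hits$genes))
}
```

With dplyr, which this report attaches, the same filter is dplyr::filter(clusters$df, cluster == 1).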

After extracting a group of genes, we can use annotation packages to obtain additional information. We can also use these lists of genes as input to downstream functional analysis tools to obtain more biological insight and see whether the groups of genes share a specific function.
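As one hedged example of annotation, the helper below maps the gene IDs of a cluster to gene symbols with AnnotationDbi::mapIds(). The helper name annotate_genes is hypothetical, and it assumes Ensembl gene IDs and an OrgDb package for your organism (for example org.Hs.eg.db for human); this report specifies neither.

```r
# Sketch (hypothetical helper): map gene IDs to gene symbols.
# Assumes AnnotationDbi and an OrgDb package for your organism are
# installed, and that gene_ids use the key type given by `keytype`.
annotate_genes <- function(gene_ids, orgdb, keytype = "ENSEMBL") {
  AnnotationDbi::mapIds(
    orgdb,
    keys = gene_ids,
    column = "SYMBOL",
    keytype = keytype,
    multiVals = "first"  # keep the first symbol when an ID maps to several
  )
}
```

For example, annotate_genes(ids, org.Hs.eg.db::org.Hs.eg.db), where ids is a character vector of Ensembl gene IDs from the cluster of interest. IDs carrying version suffixes (such as ".12") need those suffixes stripped before lookup.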

This lesson has been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Materials and hands-on activities were adapted from the RNA-seq workflow on the Bioconductor website.

4 Conclusions

5 Methods

5.1 R package references

To cite package ‘DEGreport’ in publications use:

Pantano L (2025). DEGreport: Report of DEG analysis. doi:10.18129/B9.bioc.DEGreport https://doi.org/10.18129/B9.bioc.DEGreport, R package version 1.44.0, https://bioconductor.org/packages/DEGreport.

A BibTeX entry for LaTeX users is

@Manual{, title = {DEGreport: Report of DEG analysis}, author = {Lorena Pantano}, year = {2025}, note = {R package version 1.44.0}, url = {https://bioconductor.org/packages/DEGreport}, doi = {10.18129/B9.bioc.DEGreport}, }

To cite package ‘DESeq2’ in publications use:

Love, M.I., Huber, W., Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 Genome Biology 15(12):550 (2014)

A BibTeX entry for LaTeX users is

@Article{, title = {Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2}, author = {Michael I. Love and Wolfgang Huber and Simon Anders}, year = {2014}, journal = {Genome Biology}, doi = {10.1186/s13059-014-0550-8}, volume = {15}, issue = {12}, pages = {550}, }

To cite ggplot2 in publications, please use:

H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.

A BibTeX entry for LaTeX users is

@Book{, author = {Hadley Wickham}, title = {ggplot2: Elegant Graphics for Data Analysis}, publisher = {Springer-Verlag New York}, year = {2016}, isbn = {978-3-319-24277-4}, url = {https://ggplot2.tidyverse.org}, }

To cite package ‘dplyr’ in publications use:

Wickham H, François R, Henry L, Müller K, Vaughan D (2023). dplyr: A Grammar of Data Manipulation. doi:10.32614/CRAN.package.dplyr https://doi.org/10.32614/CRAN.package.dplyr, R package version 1.1.4, https://CRAN.R-project.org/package=dplyr.

A BibTeX entry for LaTeX users is

@Manual{, title = {dplyr: A Grammar of Data Manipulation}, author = {Hadley Wickham and Romain François and Lionel Henry and Kirill Müller and Davis Vaughan}, year = {2023}, note = {R package version 1.1.4}, url = {https://CRAN.R-project.org/package=dplyr}, doi = {10.32614/CRAN.package.dplyr}, }

5.2 R session

List and versions of the tools used to generate this report.

R version 4.5.1 (2025-06-13)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: UTC
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] ggprism_1.0.6               grafify_5.0.0.1            
 [3] R.utils_2.13.0              R.oo_1.27.1                
 [5] R.methodsS3_1.8.2           glue_1.8.0                 
 [7] knitr_1.50                  ggplot2_3.5.2              
 [9] dplyr_1.1.4                 DESeq2_1.48.1              
[11] SummarizedExperiment_1.38.1 Biobase_2.68.0             
[13] MatrixGenerics_1.20.0       matrixStats_1.5.0          
[15] GenomicRanges_1.60.0        GenomeInfoDb_1.44.0        
[17] IRanges_2.42.0              S4Vectors_0.46.0           
[19] BiocGenerics_0.54.0         generics_0.1.4             
[21] DEGreport_1.44.0           

loaded via a namespace (and not attached):
  [1] RColorBrewer_1.1-3          ggdendro_0.2.0             
  [3] rstudioapi_0.17.1           jsonlite_2.0.0             
  [5] shape_1.4.6.1               magrittr_2.0.3             
  [7] estimability_1.5.1          farver_2.1.2               
  [9] nloptr_2.2.1                rmarkdown_2.29             
 [11] GlobalOptions_0.1.2         vctrs_0.6.5                
 [13] minqa_1.2.8                 base64enc_0.1-3            
 [15] htmltools_0.5.8.1           S4Arrays_1.8.1             
 [17] broom_1.0.8                 SparseArray_1.8.0          
 [19] Formula_1.2-5               sass_0.4.10                
 [21] bslib_0.9.0                 htmlwidgets_1.6.4          
 [23] plyr_1.8.9                  cachem_1.1.0               
 [25] emmeans_1.11.2              lifecycle_1.0.4            
 [27] iterators_1.0.14            pkgconfig_2.0.3            
 [29] Matrix_1.7-3                R6_2.6.1                   
 [31] fastmap_1.2.0               GenomeInfoDbData_1.2.14    
 [33] rbibutils_2.3               clue_0.3-66                
 [35] digest_0.6.37               numDeriv_2016.8-1.1        
 [37] colorspace_2.1-1            reshape_0.8.10             
 [39] patchwork_1.3.1             crosstalk_1.2.1            
 [41] Hmisc_5.2-3                 labeling_0.4.3             
 [43] httr_1.4.7                  abind_1.4-8                
 [45] mgcv_1.9-3                  compiler_4.5.1             
 [47] withr_3.0.2                 doParallel_1.0.17          
 [49] htmlTable_2.4.3             ConsensusClusterPlus_1.72.0
 [51] backports_1.5.0             BiocParallel_1.42.1        
 [53] carData_3.0-5               psych_2.5.6                
 [55] MASS_7.3-65                 DelayedArray_0.34.1        
 [57] rjson_0.2.23                tools_4.5.1                
 [59] foreign_0.8-90              nnet_7.3-20                
 [61] nlme_3.1-168                grid_4.5.1                 
 [63] checkmate_2.3.2             cluster_2.1.8.1            
 [65] gtable_0.3.6                tidyr_1.3.1                
 [67] data.table_1.17.8           car_3.1-3                  
 [69] XVector_0.48.0              ggrepel_0.9.6              
 [71] foreach_1.5.2               pillar_1.11.0              
 [73] stringr_1.5.1               limma_3.64.1               
 [75] logging_0.10-108            circlize_0.4.16            
 [77] splines_4.5.1               lattice_0.22-7             
 [79] tidyselect_1.2.1            ComplexHeatmap_2.24.1      
 [81] locfit_1.5-9.12             reformulas_0.4.1           
 [83] gridExtra_2.3               edgeR_4.6.3                
 [85] xfun_0.52                   statmod_1.5.0              
 [87] DT_0.33                     stringi_1.8.7              
 [89] UCSC.utils_1.4.0            yaml_2.3.10                
 [91] boot_1.3-31                 evaluate_1.0.4             
 [93] codetools_0.2-20            tibble_3.3.0               
 [95] cli_3.6.5                   rpart_4.1.24               
 [97] xtable_1.8-4                Rdpack_2.6.4               
 [99] jquerylib_0.1.4             Rcpp_1.1.0                 
[101] png_0.1-8                   parallel_4.5.1             
[103] lme4_1.1-37                 mvtnorm_1.3-3              
[105] lmerTest_3.1-3              scales_1.4.0               
[107] purrr_1.0.4                 crayon_1.5.3               
[109] GetoptLong_1.0.5            rlang_1.1.6                
[111] cowplot_1.2.0               mnormt_2.1.1