Inputs
- Germline heterozygous SNPs, informative for purity/ploidy/clone estimation Need estimation for tumor-only
- Copy number calls – GC corrected and normalized (to normal or process-matched normal)
- Split copy number calls into major/minor alleles, potentially with multiple states
- Somatic variant calls with allele frequencies, for tumor subclones
- Estimate subclones from somatic calls + major/minor CNVs
Challenges
- Heterogeneous input samples ranging from WGS tumor/normal to panel/capture tumor-only, would like to have similar workflow to handle most cases
- Lack of good truth sets, so hard to determine if truth sets work well
- Most tools not fully automated and require some decision making during the process
Example figures
-
Overview of problem Figure 1: http://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0602-8
-
sequencing levels required for reconstruction depending on clonal complexity Figure 5: http://www.cell.com/action/showImagesData?pii=S2405-4712%2815%2900113-1 Figure 6: http://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0602-8
Tools
Purity, CNV allelic copy number
PureCN – purity/ploidy, classify variants as germline/somatic clonal/subclonal, panels/exome > 100x, support for no matching normals but needs process-matched normal BAM http://bioconductor.org/packages/release/bioc/html/PureCN.html
BubbleTree: purity, LOH and subclonality WGS/exome, from heterozygous variants and CNVs http://www.bioconductor.org/packages/release/bioc/html/BubbleTree.html
TitanCNA – WGS/exome; heterozygous variants => purity, CNVs into major/minor subclones, LOH https://github.com/gavinha/TitanCNA
Battenberg – WGS tumor/normal; BAMs => purity, CNV caller into major/minor subclones https://github.com/cancerit/cgpBattenberg
Subclonal reconstruction
PhyloWGS – subclone and tumor evolution from Battenberg or TITAN output (major/minor allele CN)+ VCFs https://github.com/morrislab/phylowgs
SciClone – exome/WGS: somatic CNV calls + variants. Uses only variants in CN=2 CN=2 regions, requires a relatively stable genome to have enough events. https://github.com/genome/sciclone https://github.com/hdng/clonevol
Canopy – exome/WGS: allele specific CNVs + variants. Recommend using Sequenza as input segmented CNV calls. https://github.com/yuchaojiang/Canopy https://github.com/yuchaojiang/Canopy/blob/master/instruction/SNA_CNA_input.md
Guan Lab, U of M – Somatic variants and CNVs, SMC-Het winner but demonstration implementation only https://www.synapse.org/#!Synapse:syn6087005/wiki/398911
THetA – integrated, CNVs only, academic only in latest version https://github.com/raphael-group/THetA
PyClone – academic only https://bitbucket.org/aroth85/pyclone/wiki/Home
CNV callers
- CNVkit https://github.com/etal/cnvkit
- Seq2C https://github.com/AstraZeneca-NGS/Seq2C
- Canvas https://github.com/Illumina/Canvas
Validation
tHapMix simulation – WGS tumor/normal https://github.com/Illumina/tHapMix ICGC-TCGA DREAM Tumor Heterogeneity Challenge (VCF + Battenberg stratified output), no truth sets available https://www.synapse.org/#!Synapse:syn2813581/wiki/303137
Remove artifacts
- small variants – Damage assessment
- germline CNVs https://github.com/chapmanb/bcbio-nextgen/issues/963
CNV benchmarking
HCC2218 truth set from Canvas https://github.com/Illumina/Canvas#demo-tumor-normal-enrichment-data GiaB NA24385 CNVs http://biorxiv.org/content/early/2016/12/13/093526
comments powered by Disqus