NGS FAQ

Identification of driver mutations in cancer

Cancer genomes usually contain many somatic alterations, but only a subset contribute to tumour initiation, progression, metastasis or therapy resistance. Identifying driver mutations requires careful somatic variant calling, artefact filtering, annotation, biological prioritisation and, where possible, evidence from recurrence, function, pathways and clinical knowledgebases.

Quick answer

Driver-mutation identification starts with high-quality somatic variant detection and ends with evidence-based prioritisation. In practice, this means distinguishing true tumour alterations from germline variants and technical artefacts, annotating their functional consequences, checking whether they occur in known cancer genes or hotspots, evaluating tumour-type relevance and, where possible, integrating copy-number, structural-variant, fusion, RNA-seq and pathway evidence.

Practical rule: a candidate cancer driver should have one or more strong evidence signals: known oncogenic status, recurrence in the same tumour type, hotspot location, damaging loss-of-function in a tumour suppressor, activating event in an oncogene, copy-number or fusion evidence, pathway-level support, therapeutic relevance, or experimental/clinical support in curated databases.

Driver mutations vs passenger mutations

Tumours accumulate many mutations over time. Some are biologically important; many are passengers. The challenge is to identify alterations that contribute to cancer cell fitness, tumour biology or therapeutic response.

Driver mutation: Promotes cancer initiation, progression, survival, invasion, metastasis, immune evasion or resistance. Examples include activating oncogene mutations, inactivating tumour-suppressor mutations, gene fusions and copy-number changes affecting cancer pathways.
Passenger mutation: Present in the tumour genome but not causally important for the cancer phenotype. Passenger mutations can be common in tumours with high mutation burden or defective repair pathways.

A variant can be a true somatic mutation without being a cancer driver. Somatic status and driver status are separate interpretation steps.

Data types used to identify cancer drivers

Matched tumour-normal WGS (broadest DNA view)
  • Driver evidence it can provide: SNVs, indels, CNVs, structural variants, rearrangements, mutational signatures and non-coding events.
  • Typical outputs: Somatic VCF, CNV segments, SV calls, signature profiles, genome-wide QC.

Matched tumour-normal WES (coding variants)
  • Driver evidence it can provide: SNVs and indels in coding regions, useful for known cancer genes and tumour mutational burden estimates.
  • Typical outputs: Somatic coding variants, annotated consequences, gene-level summaries.

Targeted cancer panel (actionable genes)
  • Driver evidence it can provide: High-depth analysis of selected driver genes, hotspots, fusions or copy-number regions.
  • Typical outputs: Panel variant report, hotspot calls, actionable alteration list.

RNA-seq (functional expression)
  • Driver evidence it can provide: Expression of altered genes, fusion transcripts, splice consequences, pathway activation and tumour subtype signatures.
  • Typical outputs: Expression matrix, fusion calls, splice events, pathway scores.

Copy-number profiling (gene dosage)
  • Driver evidence it can provide: Amplification of oncogenes or deletion of tumour suppressors.
  • Typical outputs: Segment files, gene-level CNV tables, focal amplification/deletion calls.

Structural-variant analysis (genome rearrangements)
  • Driver evidence it can provide: Fusions, enhancer hijacking, promoter swaps, deletions, inversions and translocations.
  • Typical outputs: SV calls, breakpoint tables, gene-fusion candidates.

Methylation / epigenomics (regulatory disruption)
  • Driver evidence it can provide: Promoter silencing, tumour classification signatures and pathway-level regulatory changes.
  • Typical outputs: Methylation profiles, DMRs, classifier outputs, regulatory annotation.

Practical workflow for identifying driver mutations

1. QC and preprocessing: Assess FASTQ quality, contamination, mapping, coverage, tumour purity, duplication and sample identity.
2. Somatic calling: Call SNVs, indels, CNVs, SVs and fusions using tumour-normal or tumour-only workflows.
3. Filtering: Remove artefacts, low-support calls, common germline variants and context-specific noise.
4. Annotation: Annotate effect, gene role, hotspot status, databases, population frequency and clinical evidence.
5. Prioritisation: Rank candidate drivers using oncogenicity, tumour-type relevance, recurrence and functional impact.
6. Integration: Add RNA expression, fusions, copy number, methylation, pathways and mutational signatures.
7. Review: Inspect key variants manually and check literature or curated knowledgebases.
8. Report: Provide transparent evidence levels, limitations, methods, parameters and candidate-driver rationale.
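
The filtering step can be sketched in a few lines of Python. This is a minimal illustration, not a production filter: the `SomaticCall` fields, the `filter_reasons` helper and every threshold below are assumptions chosen for the example, and real thresholds depend on assay, depth and caller. Recording the reason for each rejection, rather than silently dropping calls, keeps the filtering transparent for later review.

```python
from dataclasses import dataclass


@dataclass
class SomaticCall:
    """One candidate call. Field names are hypothetical, not a VCF schema."""
    gene: str
    tumour_depth: int
    tumour_alt_reads: int
    normal_alt_reads: int
    population_af: float  # allele frequency in a population database, 0.0 if absent

    @property
    def tumour_vaf(self) -> float:
        if self.tumour_depth == 0:
            return 0.0
        return self.tumour_alt_reads / self.tumour_depth


def filter_reasons(call, min_depth=20, min_alt_reads=4, min_vaf=0.05,
                   max_normal_alt_reads=1, max_population_af=0.001):
    """Return the list of filters a call fails (empty list means it passes).

    All default thresholds are illustrative assumptions.
    """
    reasons = []
    if call.tumour_depth < min_depth:
        reasons.append("low_tumour_depth")
    if call.tumour_alt_reads < min_alt_reads:
        reasons.append("low_alt_support")
    if call.tumour_vaf < min_vaf:
        reasons.append("low_vaf")
    if call.normal_alt_reads > max_normal_alt_reads:
        reasons.append("alt_in_normal")  # likely germline or shared artefact
    if call.population_af > max_population_af:
        reasons.append("common_in_population")
    return reasons


# Toy calls with placeholder gene names.
calls = [
    SomaticCall("GENE_A", 120, 30, 0, 0.0),
    SomaticCall("GENE_B", 15, 2, 0, 0.0),
    SomaticCall("GENE_C", 100, 48, 44, 0.02),
]
for call in calls:
    reasons = filter_reasons(call)
    print(call.gene, "PASS" if not reasons else ",".join(reasons))
```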

Evidence signals used to prioritise cancer drivers

Known oncogenic hotspot
  • How it supports driver status: Recurrent mutation at a specific residue or domain suggests positive selection.
  • Example interpretation: Activating hotspot in an oncogene is stronger evidence than a random missense variant.

Loss-of-function in tumour suppressor
  • How it supports driver status: Truncating mutation, frameshift, splice disruption or deletion may inactivate a tumour suppressor.
  • Example interpretation: More convincing if accompanied by second hit, loss of heterozygosity or deletion.

Copy-number amplification
  • How it supports driver status: Focal high-level amplification can increase oncogene dosage.
  • Example interpretation: More convincing when RNA expression of the amplified gene is also elevated.

Homozygous deletion
  • How it supports driver status: Deletion of both copies can inactivate a tumour suppressor.
  • Example interpretation: Needs careful evaluation of purity, ploidy and segmentation quality.

Gene fusion
  • How it supports driver status: Can create an activated kinase, abnormal transcription factor or promoter-driven overexpression.
  • Example interpretation: RNA-seq confirmation strengthens evidence for an expressed fusion transcript.

Pathway convergence
  • How it supports driver status: Multiple alterations in the same pathway can support biological relevance.
  • Example interpretation: MAPK, PI3K, DNA-repair, cell-cycle or immune-evasion pathways are common examples.

Cohort recurrence
  • How it supports driver status: Genes or positions mutated more often than expected by background mutation processes are candidate drivers.
  • Example interpretation: Useful in cohort studies, but must control for mutation rate, gene length and tumour type.

Curated evidence
  • How it supports driver status: Databases and literature can classify oncogenicity or clinical significance.
  • Example interpretation: OncoKB, CIViC and COSMIC can support interpretation when used with tumour-type context.
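
One way to keep these signals explicit during prioritisation is to record them as named flags per variant and summarise them. The sketch below is illustrative only: the flag names, the split into "strong" and "supporting" groups and the output labels are assumptions made for this example, not a validated or published classification scheme.

```python
# Hypothetical flag names; the strong/supporting split is an assumption.
STRONG_SIGNALS = frozenset({
    "known_oncogenic_hotspot",
    "lof_in_tumour_suppressor",
    "focal_amplification",
    "homozygous_deletion",
    "expressed_fusion",
    "curated_oncogenic",
})
SUPPORTING_SIGNALS = frozenset({
    "pathway_convergence",
    "cohort_recurrence",
    "elevated_expression",
    "second_hit_or_loh",
})


def summarise_evidence(signals):
    """Group a variant's evidence flags and assign a coarse review label."""
    signals = set(signals)
    strong = sorted(signals & STRONG_SIGNALS)
    supporting = sorted(signals & SUPPORTING_SIGNALS)
    unrecognised = sorted(signals - STRONG_SIGNALS - SUPPORTING_SIGNALS)
    if strong:
        label = "candidate driver (review evidence)"
    elif supporting:
        label = "supporting evidence only"
    else:
        label = "no driver evidence recorded"
    return {
        "label": label,
        "strong": strong,
        "supporting": supporting,
        "unrecognised": unrecognised,
    }


summary = summarise_evidence({"lof_in_tumour_suppressor", "second_hit_or_loh"})
print(summary["label"], summary["strong"], summary["supporting"])
```

The label is a prompt for manual review, not a classification: tumour-type context and curated knowledgebase evidence still need to be checked for each candidate.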

Single-patient analysis vs cohort driver discovery

The strategy differs depending on whether the goal is to prioritise driver alterations in one tumour or discover driver genes across a cohort.

Single tumour or clinical-style report
  • Prioritise known oncogenic and actionable alterations.
  • Use tumour type, variant allele fraction, purity and copy number.
  • Prefer matched normal DNA where possible.
  • Integrate RNA-seq for expression, fusions and splicing evidence.
  • Report evidence level and limitations clearly.
Research cohort discovery
  • Run consistent calling across all samples.
  • Estimate background mutation rates and mutational signatures.
  • Test gene-level and position-level recurrence.
  • Consider pathway-level enrichment and mutual exclusivity.
  • Validate top candidates experimentally or in independent cohorts.
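
The gene-level recurrence test can be illustrated with a deliberately simplified model. The sketch below assumes a single uniform background mutation rate per base per sample (a Poisson approximation) and asks how surprising it is to see a gene mutated in at least k of n samples under that rate. The function names and all numbers are assumptions for illustration; dedicated driver-discovery methods additionally model covariates such as mutation context and sample-specific mutation burden.

```python
import math


def prob_sample_mutated(background_rate_per_base, gene_length_bp):
    """Probability that one sample carries at least one background mutation
    in the gene, assuming a uniform rate (Poisson approximation)."""
    return 1.0 - math.exp(-background_rate_per_base * gene_length_bp)


def binomial_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, i) * p**i * (1.0 - p) ** (n - i)
        for i in range(k, n + 1)
    )


# Toy cohort: 200 samples and an assumed background rate of 2e-6 per base.
n_samples = 200
rate = 2e-6

# A short gene mutated in 12 samples versus a long gene mutated in 40 samples.
p_short = prob_sample_mutated(rate, 1_500)
p_long = prob_sample_mutated(rate, 100_000)

pval_short = binomial_upper_tail(n_samples, 12, p_short)
pval_long = binomial_upper_tail(n_samples, 40, p_long)

print(f"short gene: expected {n_samples * p_short:.1f} samples, p = {pval_short:.2e}")
print(f"long gene:  expected {n_samples * p_long:.1f} samples, p = {pval_long:.2f}")
```

In this toy example the long gene is mutated in more samples, yet only the short gene exceeds its background expectation. This is why raw mutation counts should not be used to rank genes without correcting for gene length and mutation rate.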

Annotation resources and knowledgebases

Driver identification is strengthened by curated resources, but database hits must be interpreted carefully. Some resources are gene-level, others are variant-level; some focus on biological oncogenicity, others on clinical actionability.

COSMIC / Cancer Gene Census
  • Useful for: Known cancer genes, somatic mutation catalogues, cancer-gene curation.
  • Interpretation note: Excellent for gene-level and mutation-context evidence, but not every variant in a Cancer Gene Census gene is automatically a driver.

OncoKB
  • Useful for: Oncogenicity, biological effect, clinical actionability and evidence levels.
  • Interpretation note: Useful for precision-oncology-style interpretation and therapy relevance.

CIViC
  • Useful for: Open, evidence-based interpretations of cancer variants from the literature.
  • Interpretation note: Useful for transparent, literature-supported clinical and biological variant interpretation.

ClinVar / ClinGen
  • Useful for: Clinical variant assertions, especially germline interpretation.
  • Interpretation note: Useful when hereditary predisposition or germline findings are relevant.

gnomAD and population databases
  • Useful for: Filtering common germline variants.
  • Interpretation note: Population frequency alone is not enough; artefact context and tumour biology also matter.

TCGA, ICGC and public cohorts
  • Useful for: Tumour-type recurrence, subtype context and comparative analysis.
  • Interpretation note: Useful for benchmarking a cohort or placing a tumour in broader cancer-genomic context.

Why DNA-seq alone may not be enough

DNA-seq identifies candidate alterations, but RNA-seq and other omics layers can reveal whether those alterations are expressed or functionally active. This is especially important for fusions, splice-site variants, copy-number amplifications, enhancer or promoter changes, and pathway activation.

DNA alteration: Mutation, deletion, amplification, rearrangement or fusion breakpoint.
RNA consequence: Expression, allele-specific expression, fusion transcript or abnormal splicing.
Functional context: Pathway activity, tumour subtype, mutational signature or therapeutic relevance.

Example: a DNA-level fusion breakpoint is more convincing as a functional driver if RNA-seq confirms an expressed in-frame fusion transcript involving a known oncogenic domain.
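
The fusion example can be expressed as a small cross-check between DNA-level candidates and RNA-seq fusion calls. This is a sketch under stated assumptions: the record layout (`gene5`, `gene3`, `in_frame`), the helper name and the support labels are hypothetical and do not correspond to the output format of any particular fusion caller.

```python
def annotate_fusion_support(dna_candidates, rna_fusions):
    """Label each DNA-level fusion candidate by its RNA-seq support.

    Records are matched on the (5' gene, 3' gene) pair only; a real
    workflow would also compare breakpoints and read support.
    """
    rna_index = {(f["gene5"], f["gene3"]): f for f in rna_fusions}
    annotated = []
    for cand in dna_candidates:
        rna = rna_index.get((cand["gene5"], cand["gene3"]))
        if rna is None:
            support = "dna_only"
        elif rna["in_frame"]:
            support = "expressed_in_frame"
        else:
            support = "expressed_out_of_frame"
        annotated.append({**cand, "rna_support": support})
    return annotated


# Toy records with placeholder gene names.
dna_candidates = [
    {"gene5": "GENE_A", "gene3": "KINASE_X"},
    {"gene5": "GENE_B", "gene3": "GENE_C"},
    {"gene5": "GENE_D", "gene3": "GENE_E"},
]
rna_fusions = [
    {"gene5": "GENE_A", "gene3": "KINASE_X", "in_frame": True},
    {"gene5": "GENE_D", "gene3": "GENE_E", "in_frame": False},
]

for row in annotate_fusion_support(dna_candidates, rna_fusions):
    print(f'{row["gene5"]}-{row["gene3"]}: {row["rna_support"]}')
```

A "dna_only" result does not rule out a driver event: low expression, low tumour purity or limited RNA-seq depth can all hide a real fusion transcript.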

Common pitfalls in driver-mutation identification

Calling every cancer-gene variant a driver: Gene-level cancer association is not equivalent to variant-level oncogenicity.
Ignoring matched normal DNA: Tumour-only workflows can misclassify germline variants as somatic candidates.
Ignoring tumour purity and ploidy: Variant allele fraction, copy number and clonality are difficult to interpret without purity and ploidy context.
Overtrusting prediction scores: In silico pathogenicity predictions are helpful but insufficient for driver classification by themselves.
Missing non-SNV drivers: Drivers can be CNVs, structural variants, fusions, splice events, promoter mutations or epigenetic changes.
Ignoring tumour type: The same alteration may have different biological or clinical meaning depending on cancer type and subtype.
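
To see why purity and ploidy matter, consider the expected variant allele fraction (VAF) of a clonal somatic mutation: purity × mutated copies / (purity × tumour copy number + (1 − purity) × normal copy number). The sketch below implements this relationship. It assumes the mutation is present in every tumour cell and absent from normal cells; the function name and example values are illustrative.

```python
def expected_vaf(purity, tumour_copy_number, mutated_copies, normal_copy_number=2):
    """Expected VAF of a clonal somatic mutation.

    Assumes the mutation is in all tumour cells and in no normal cells.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    if not 0 <= mutated_copies <= tumour_copy_number:
        raise ValueError("mutated_copies must be between 0 and tumour_copy_number")
    total_copies = purity * tumour_copy_number + (1.0 - purity) * normal_copy_number
    return purity * mutated_copies / total_copies


# Same mutation, different contexts (all values are illustrative).
scenarios = [
    ("60% purity, diploid, 1 mutated copy", 0.6, 2, 1),
    ("60% purity, copy-neutral LOH, 2 mutated copies", 0.6, 2, 2),
    ("40% purity, 4 copies, 1 mutated copy", 0.4, 4, 1),
]
for label, purity, copies, mutated in scenarios:
    print(f"{label}: expected VAF = {expected_vaf(purity, copies, mutated):.2f}")
```

Under these assumptions, a heterozygous clonal mutation in a 60%-pure diploid tumour is expected at a VAF of about 0.30, not 0.50, so an observed VAF cannot be read as clonal or subclonal without the purity and copy-number context.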

AI integration for driver-mutation prioritisation

AI can support cancer driver analysis by combining structured annotations, curated databases, published literature, pathway knowledge, sequence context and multi-omics results. It can be especially useful for ranking candidate variants, summarising evidence and producing transparent review-ready reports.

Useful AI-supported tasks
  • Literature triage for candidate variants and genes.
  • Evidence summarisation from curated resources.
  • Prioritisation of variants by biological plausibility.
  • Integration of DNA-seq, RNA-seq and copy-number evidence.
  • Report drafting and review checklists.
Required safeguards
  • Use versioned databases and reproducible workflows.
  • Keep human scientific review in the loop.
  • Separate research interpretation from clinical claims.
  • Document evidence sources and uncertainty.
  • Validate important findings where needed.
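
Several of these safeguards can be enforced mechanically by storing each piece of evidence as a structured record and checking it before it enters a report. The sketch below is one possible layout, not a standard: the `EvidenceRecord` fields, the `record_problems` helper and the example values (including the source name "ExampleKB") are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceRecord:
    """One evidence statement attached to a variant (hypothetical schema)."""
    variant: str
    claim: str
    source: str            # database or publication the claim comes from
    source_version: str    # release or access date, for reproducibility
    drafted_by_ai: bool
    reviewed_by: str = ""  # empty until a human reviewer signs off


def record_problems(record):
    """Return the safeguard checks a record fails (empty list means OK)."""
    problems = []
    if not record.source:
        problems.append("missing source")
    if not record.source_version:
        problems.append("missing source version")
    if record.drafted_by_ai and not record.reviewed_by:
        problems.append("AI draft lacks human review")
    return problems


# Placeholder records for illustration only.
draft = EvidenceRecord(
    variant="GENE_X p.Ala123Val",
    claim="reported as likely oncogenic",
    source="ExampleKB",
    source_version="",
    drafted_by_ai=True,
)
reviewed = EvidenceRecord(
    variant="GENE_X p.Ala123Val",
    claim="reported as likely oncogenic",
    source="ExampleKB",
    source_version="2024-01",
    drafted_by_ai=True,
    reviewed_by="analyst_1",
)

print(record_problems(draft))
print(record_problems(reviewed))
```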

How SciBerg can support driver-mutation analysis

SciBerg can help research and industrial partners identify and prioritise candidate driver mutations from NGS data using reproducible bioinformatics workflows and evidence-based interpretation.

  • FASTQ, BAM/CRAM, VCF and sample-metadata QC.
  • Somatic SNV and indel calling from tumour-normal or tumour-only data.
  • Copy-number, structural-variant and fusion analysis where data support it.
  • Variant annotation and prioritisation using cancer gene resources.
  • RNA-seq integration for expression, splicing and fusion validation.
  • Candidate driver tables with rationale and evidence categories.
  • AI-assisted literature and knowledgebase review with documented sources.
  • Transparent scripts, software versions, parameters and final reports.

Frequently asked questions

What is a driver mutation in cancer?

A driver mutation is a genomic alteration that gives a cancer cell a selective growth, survival, invasion, immune-evasion or treatment-resistance advantage. It contributes to cancer development or progression rather than simply being a by-product of genomic instability.

What is the difference between a driver and a passenger mutation?

A driver mutation contributes to the cancer phenotype, while a passenger mutation is present in the tumour genome but does not materially promote tumour growth or survival. Distinguishing them requires biological, statistical and clinical evidence.

Can a mutation be a driver in one cancer type but not another?

Yes. Driver status is context-dependent. The same alteration may be strongly oncogenic in one tumour type, weakly relevant in another, or clinically irrelevant if the affected pathway is not active in that cellular context.

Is every mutation in a known cancer gene a driver?

No. A gene may be a known cancer gene, but not every variant in that gene is oncogenic. Variant-level interpretation is essential, especially for missense variants and variants of uncertain significance.

Can tumour-only sequencing identify driver mutations?

Tumour-only sequencing can identify candidate driver alterations, but matched normal DNA is preferred because it helps distinguish somatic mutations from germline variants and technical artefacts.

How can RNA-seq help identify driver mutations?

RNA-seq can show whether a mutated gene is expressed, detect expressed fusion transcripts, reveal splicing consequences, and provide pathway-level evidence that complements DNA-seq.

Can AI identify cancer driver mutations?

AI can help prioritise candidate drivers by integrating sequence context, variant annotations, databases, literature, pathway information and multi-omics data. However, AI output should be reviewed, documented and validated rather than treated as final evidence.

Selected resources and documentation

The following resources are useful starting points for cancer-driver annotation, somatic-variant calling and evidence-based cancer-variant interpretation.