Bisulfite sequencing estimates DNA methylation by comparing converted and unconverted cytosines. Bisulfite treatment converts unmethylated cytosines to uracils, which are read as thymines, while methylated cytosines remain cytosines. At each cytosine, the ratio of reads supporting methylation (C) to all informative reads (C plus T) is used as the methylation estimate.
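As a minimal sketch of the ratio described above, the per-site estimate can be computed from the number of reads showing C and the number showing T. The function name and the counts are hypothetical and chosen only for illustration.

```python
from typing import Optional


def methylation_level(methylated: int, unmethylated: int) -> Optional[float]:
    """Fraction of informative reads that support methylation at one cytosine.

    methylated   -- reads showing C at the position (not converted)
    unmethylated -- reads showing T at the position (converted)
    Returns None when no informative read covers the site.
    """
    total = methylated + unmethylated
    if total == 0:
        return None
    return methylated / total


# Hypothetical counts for three cytosines: (C reads, T reads)
sites = {"chr1:1000": (18, 2), "chr1:1050": (3, 27), "chr1:1100": (0, 0)}
for name, (c_reads, t_reads) in sites.items():
    print(name, methylation_level(c_reads, t_reads))
```

Returning None for uncovered sites keeps "no data" distinct from "0% methylated", which matters once coverage filters are applied.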
Preprocess: QC, adapter removal and protocol-specific trimming.
Map and call: Bisulfite-aware alignment and methylation extraction.
Interpret: Coverage filtering, DMP/DMR testing, annotation and reporting.
Core principle: bisulfite-seq is not ordinary DNA-seq. The conversion chemistry changes sequence complexity, strand interpretation and quality-control expectations.
2. Bisulfite-seq assay types
Assay | Objective | Analysis focus
WGBS | Genome-wide methylation profiling. | Large data volume, genome-wide coverage, conversion efficiency and DMR discovery.
RRBS | CpG-rich reduced representation. | Restriction-enzyme bias, M-bias, RRBS-specific trimming and region-specific coverage.
Targeted bisulfite-seq | Selected loci or panels. | Target coverage, primer handling and locus-level interpretation.
Amplicon bisulfite-seq | Deep methylation profiling of specific regions. | Per-amplicon QC, overlap handling and high-depth interpretation.
3. Experimental design
Before analysis, define the biological comparison, replicate structure, batch variables, methylation context and reporting scope.
Clarify whether the analysis targets CpG, CHG, CHH or all cytosines.
Use balanced biological replicates across conditions and batches.
Record library type: directional, non-directional, PBAT-like, RRBS or targeted.
Define minimum coverage and whether single-site or region-level testing is preferred.
Record whether spike-in controls are available for conversion-efficiency estimation.
4. Inputs and metadata
Input | Format | Use
Raw reads | FASTQ.GZ | Sequencing reads after bisulfite conversion.
Reference genome | FASTA | Used to build bisulfite-converted indexes.
Sample sheet | TSV/CSV | Defines groups, batches, library type and FASTQ paths.
Gene annotation | GTF/GFF3 | Promoter, gene body and transcript annotation.
Feature regions | BED | CpG islands, promoters, enhancers, targets or custom regions.
Example sample sheet
sample_id group batch assay library_type fastq_1 fastq_2
S1 control A WGBS directional S1_R1.fastq.gz S1_R2.fastq.gz
S2 control A WGBS directional S2_R1.fastq.gz S2_R2.fastq.gz
S3 treated B WGBS directional S3_R1.fastq.gz S3_R2.fastq.gz
S4 treated B WGBS directional S4_R1.fastq.gz S4_R2.fastq.gz
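A sample sheet like this can be checked programmatically before any alignment is run. The sketch below parses the example rows and reports batches that contain only one group, a situation in which batch and condition effects cannot be separated. It assumes whitespace-delimited columns as displayed here; the function names are hypothetical.

```python
from collections import defaultdict

SAMPLE_SHEET = """\
sample_id group batch assay library_type fastq_1 fastq_2
S1 control A WGBS directional S1_R1.fastq.gz S1_R2.fastq.gz
S2 control A WGBS directional S2_R1.fastq.gz S2_R2.fastq.gz
S3 treated B WGBS directional S3_R1.fastq.gz S3_R2.fastq.gz
S4 treated B WGBS directional S4_R1.fastq.gz S4_R2.fastq.gz
"""


def read_sample_sheet(text):
    """Parse a whitespace-delimited sample sheet into a list of dicts."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def single_group_batches(samples):
    """Return batches whose samples all belong to one group."""
    groups_by_batch = defaultdict(set)
    for sample in samples:
        groups_by_batch[sample["batch"]].add(sample["group"])
    return sorted(b for b, groups in groups_by_batch.items() if len(groups) < 2)


samples = read_sample_sheet(SAMPLE_SHEET)
print(single_group_batches(samples))
```

For the example rows, both batches are reported, because every control sits in batch A and every treated sample in batch B. A real design would ideally spread each group across batches, in line with the replicate guidance in the experimental design section.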
5. FASTQ quality control
Bisulfite-seq base composition can look unusual because conversion changes C/T balance. Interpret FastQC warnings in the context of the assay.
Inspect read quality, adapter content and overrepresented sequences.
Check duplication, but interpret it differently for WGBS, RRBS and amplicons.
Compare read counts and quality between samples and between R1/R2.
Look for signs of short inserts or protocol-specific sequence bias.
6. Bisulfite-specific trimming
Trimming removes adapters and biased bases. RRBS often needs special treatment because restriction-enzyme and fill-in steps can create biased methylation at read ends.
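To illustrate how biased read ends can be spotted, the sketch below builds a simple M-bias profile (methylation per read position) and flags positions that deviate from the median. The input counts, the function names and the 0.05 tolerance are assumptions made for illustration; dedicated methylation extractors produce M-bias reports of their own.

```python
from statistics import median


def mbias_profile(calls_by_position):
    """Methylation fraction at each read position.

    calls_by_position -- list of (methylated, unmethylated) call counts,
                         one tuple per position along the read.
    """
    profile = []
    for meth, unmeth in calls_by_position:
        total = meth + unmeth
        profile.append(meth / total if total else None)
    return profile


def biased_positions(profile, tolerance=0.05):
    """0-based read positions whose methylation differs from the median
    of all positions by more than `tolerance`."""
    observed = [p for p in profile if p is not None]
    if not observed:
        return []
    centre = median(observed)
    return [
        i for i, p in enumerate(profile)
        if p is not None and abs(p - centre) > tolerance
    ]


# Hypothetical call counts for an 8-base read
counts = [(40, 60), (45, 55), (70, 30), (71, 29),
          (70, 30), (69, 31), (70, 30), (90, 10)]
print(biased_positions(mbias_profile(counts)))
```

Flagged positions at the start or end of the read are candidates for extra end trimming or for being ignored during methylation extraction.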
The aligner's directional or non-directional setting must match the library protocol. A wrong strand setting can cause low mapping rates or distorted methylation calls.
9. Deduplication
Duplicate removal is common in WGBS, but RRBS and amplicon bisulfite-seq require assay-aware interpretation because many reads may start at the same coordinate by design.
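The assay-aware logic can be sketched as follows: positional deduplication is applied only to assays where identical start coordinates are unexpected. This is a simplified single-end illustration with hypothetical names; paired-end data would also key on the mate position, and production pipelines use the deduplication step of their aligner or caller.

```python
# Assays for which identical start coordinates are treated as duplicates.
# RRBS and amplicon reads start at the same coordinates by design.
POSITIONAL_DEDUP_ASSAYS = {"WGBS"}


def deduplicate(reads, assay):
    """Keep the first read per (chrom, start, strand) for WGBS-like assays.

    reads -- iterable of (read_id, chrom, start, strand) tuples
    assay -- assay label, for example "WGBS", "RRBS" or "amplicon"
    """
    if assay not in POSITIONAL_DEDUP_ASSAYS:
        return list(reads)  # leave reads untouched for other assays
    seen = set()
    kept = []
    for read in reads:
        _, chrom, start, strand = read
        key = (chrom, start, strand)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


reads = [
    ("r1", "chr1", 100, "+"),
    ("r2", "chr1", 100, "+"),  # same start and strand as r1
    ("r3", "chr1", 100, "-"),
    ("r4", "chr1", 250, "+"),
]
print(len(deduplicate(reads, "WGBS")), len(deduplicate(reads, "RRBS")))
```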
A low mapping rate can indicate a wrong reference, contamination, poor read quality or an incorrect strand setting.
Conversion efficiency | Completeness of bisulfite conversion. | Poor conversion can create false methylation.
M-bias | Position-specific methylation bias. | May require additional end trimming.
CpG coverage | Depth at methylation sites. | Low coverage increases uncertainty; extreme coverage may reflect bias.
Global methylation | Sample-level methylation distribution. | Outliers may be biological or technical and should be investigated.
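As one way to make the conversion-efficiency metric concrete, the sketch below assumes an unmethylated spike-in control, where every cytosine should be converted, so any remaining C call counts as a conversion failure. The function names, the counts and the 0.99 threshold are illustrative assumptions, not recommended cut-offs.

```python
def conversion_efficiency(unconverted_c, converted_t):
    """Fraction of spike-in cytosines that were converted (read as T).

    Assumes the spike-in is fully unmethylated, so every C call is a
    conversion failure rather than true methylation.
    """
    total = unconverted_c + converted_t
    if total == 0:
        raise ValueError("no cytosine calls on the spike-in")
    return converted_t / total


def passes_conversion_check(unconverted_c, converted_t, min_efficiency=0.99):
    """True when the estimated efficiency reaches the chosen threshold."""
    return conversion_efficiency(unconverted_c, converted_t) >= min_efficiency


# Hypothetical call counts on the spike-in for two samples
print(passes_conversion_check(50, 9950))   # 99.5% converted
print(passes_conversion_check(400, 9600))  # 96.0% converted
```

Without a spike-in, a similar calculation is sometimes made on non-CpG cytosines in organisms where such methylation is expected to be rare, which is a weaker assumption and should be stated in the report.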
12. Differential methylation analysis
Differential methylation can be tested at single cytosines or across regions. Region-level analysis is often more robust and easier to interpret biologically.
DMPs: Single cytosines with methylation differences.
DMRs: Clusters of CpGs with coordinated methylation changes.
Regions: Promoters, CpG islands, enhancers, gene bodies or custom regions.
Common R/Bioconductor approaches include methylKit, DSS, bsseq and DMRcate. Selection depends on coverage, design complexity, sample size and whether smoothing is appropriate.
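To show what "clusters of CpGs with coordinated changes" means in practice, the sketch below merges neighbouring CpGs into candidate regions when they change in the same direction. It is a naive illustration with hypothetical thresholds and names, it performs no statistical testing, and it is not how methylKit, DSS, bsseq or DMRcate work internally.

```python
def _summarise(run):
    """Collapse a run of (chrom, pos, delta) sites into one region tuple."""
    deltas = [delta for _, _, delta in run]
    return (run[0][0], run[0][1], run[-1][1], len(run), sum(deltas) / len(deltas))


def candidate_dmrs(sites, min_delta=0.2, max_gap=300, min_cpgs=3):
    """Group adjacent CpGs with same-direction methylation differences.

    sites -- list of (chrom, pos, delta) sorted by chrom and pos, where
             delta is the mean methylation difference between two groups.
    Returns (chrom, start, end, n_cpgs, mean_delta) tuples.
    """
    regions, run = [], []
    for chrom, pos, delta in sites:
        qualifies = abs(delta) >= min_delta
        extends = bool(
            qualifies and run
            and chrom == run[-1][0]
            and pos - run[-1][1] <= max_gap
            and (delta > 0) == (run[-1][2] > 0)
        )
        if not extends:
            # Close the current run and keep it only if it is long enough
            if len(run) >= min_cpgs:
                regions.append(_summarise(run))
            run = []
        if qualifies:
            run.append((chrom, pos, delta))
    if len(run) >= min_cpgs:
        regions.append(_summarise(run))
    return regions


# Hypothetical per-CpG differences (treated minus control)
sites = [
    ("chr1", 100, 0.30), ("chr1", 150, 0.25), ("chr1", 220, 0.35),
    ("chr1", 900, 0.02),
    ("chr1", 2000, -0.40), ("chr1", 2100, -0.30),
]
print(candidate_dmrs(sites))
```

With these inputs, the first three CpGs form one candidate region, while the last two are dropped because they fall below the minimum CpG count.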
13. DMR annotation
Annotation connects methylation changes to genomic features, but proximity does not prove regulatory causality. Interpret DMRs together with biological context and, where possible, expression or chromatin data.
Promoters: Methylation near transcription start sites can be linked to gene regulation.
CpG islands: Important regulatory features often evaluated with shores and shelves.
Enhancers: Integration with ATAC-seq or ChIP-seq strengthens interpretation.
Gene bodies: Interpretation is context-dependent and should be handled carefully.
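Promoter annotation can be sketched as a strand-aware interval overlap between DMRs and windows around transcription start sites. The window sizes (2,000 bp upstream, 500 bp downstream), gene names and function names below are assumptions for illustration; real analyses would read coordinates from the GTF/GFF3 annotation and typically use BEDTools or a Bioconductor annotation package.

```python
def promoter_interval(tss, strand, upstream=2000, downstream=500):
    """Promoter window around a TSS, oriented by strand."""
    if strand == "+":
        return max(0, tss - upstream), tss + downstream
    # On the minus strand, "upstream" lies at higher coordinates
    return max(0, tss - downstream), tss + upstream


def annotate_dmrs(dmrs, genes, upstream=2000, downstream=500):
    """Map each DMR to the genes whose promoter window it overlaps.

    dmrs  -- list of (chrom, start, end)
    genes -- list of (gene_name, chrom, tss, strand)
    """
    hits = {}
    for chrom, start, end in dmrs:
        names = []
        for gene, gene_chrom, tss, strand in genes:
            if gene_chrom != chrom:
                continue
            p_start, p_end = promoter_interval(tss, strand, upstream, downstream)
            if start <= p_end and end >= p_start:  # closed-interval overlap
                names.append(gene)
        hits[(chrom, start, end)] = names
    return hits


# Hypothetical genes and DMRs
genes = [("GENE_A", "chr1", 10000, "+"), ("GENE_B", "chr1", 50000, "-")]
dmrs = [("chr1", 9000, 9500), ("chr1", 51000, 51500), ("chr1", 80000, 80200)]
for dmr, names in annotate_dmrs(dmrs, genes).items():
    print(dmr, names)
```

An overlap found this way is only positional evidence; as noted above, it does not by itself show that the DMR regulates the gene.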
Common visualization outputs include bedGraph and bigWig methylation tracks, DMR heatmaps, global methylation distributions, PCA/clustering and browser snapshots of key loci.
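A bedGraph methylation track is a plain-text file with one interval per line. The sketch below writes per-cytosine methylation as percentages, converting 1-based positions to the 0-based, half-open coordinates that bedGraph uses. The input tuples, track name and function name are hypothetical.

```python
def bedgraph_lines(sites, track_name="methylation"):
    """Yield bedGraph lines for per-cytosine methylation percentages.

    sites -- iterable of (chrom, pos_1based, methylated, unmethylated)
    Sites without informative reads are skipped.
    """
    yield f'track type=bedGraph name="{track_name}"'
    for chrom, pos, meth, unmeth in sites:
        total = meth + unmeth
        if total == 0:
            continue
        percent = 100.0 * meth / total
        # bedGraph intervals are 0-based and half-open: [pos - 1, pos)
        yield f"{chrom}\t{pos - 1}\t{pos}\t{percent:.1f}"


# Hypothetical per-site counts
sites = [("chr1", 1000, 18, 2), ("chr1", 1050, 3, 27), ("chr1", 1100, 0, 0)]
for line in bedgraph_lines(sites):
    print(line)
```

Conversion to bigWig generally expects the data lines to be sorted by chromosome and start position, so sorting before writing is a sensible precaution.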
Bisulfite-seq analysis is the computational analysis of bisulfite-converted DNA sequencing reads to estimate DNA methylation levels at cytosines, most often CpG sites, and to identify methylation differences between samples or groups.
Why does bisulfite-seq need special alignment?
Bisulfite treatment converts unmethylated cytosines to uracils, which are read as thymines. This changes read composition and requires bisulfite-aware genome indexing, alignment and methylation calling.
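The idea behind bisulfite-aware alignment can be sketched with in-silico conversion, in the style of three-letter aligners: both read and reference are converted C to T so that they match regardless of methylation state, and methylation is then called by comparing the original read to the original reference. The sequences and function names are hypothetical, and the example covers only an ungapped forward-strand read; reverse-strand reads are handled analogously with G to A conversion.

```python
C_TO_T = str.maketrans("Cc", "Tt")
G_TO_A = str.maketrans("Gg", "Aa")


def convert_c_to_t(seq):
    """In-silico bisulfite conversion for forward-strand comparison."""
    return seq.translate(C_TO_T)


def convert_g_to_a(seq):
    """Equivalent conversion used for reverse-strand comparison."""
    return seq.translate(G_TO_A)


def call_methylation(reference, read):
    """Compare an aligned forward-strand read with the original reference.

    Returns {0-based position: call} for every reference cytosine.
    """
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[i] = "methylated"      # protected from conversion
        elif read_base == "T":
            calls[i] = "unmethylated"    # converted by bisulfite
    return calls


reference = "ACGTCCGA"
read = "ATGTTCGA"  # hypothetical read: cytosines 1 and 4 converted, 5 retained

# After conversion, the read matches the reference despite the C/T differences
print(convert_c_to_t(read) == convert_c_to_t(reference))
print(call_methylation(reference, read))
```

An ordinary aligner would treat the two C-to-T differences as mismatches, which is why unconverted indexes give poor mapping on bisulfite reads.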
What is the difference between WGBS and RRBS?
WGBS profiles methylation genome-wide, while RRBS enriches CpG-rich regions using restriction digestion and size selection. RRBS typically costs less per sample but covers only a reduced, protocol-dependent part of the genome.
Which tools are commonly used?
Common tools include FastQC, MultiQC, Trim Galore, Cutadapt, Bismark, BS-Seeker2, bwa-meth, MethylDackel, methylKit, DSS, bsseq, DMRcate, BEDTools and genome-browser utilities.
What is conversion efficiency?
Conversion efficiency measures how completely unmethylated cytosines were converted. Poor conversion can cause false methylation calls, so spike-ins or other conversion checks are important where available.
Should duplicates be removed?
For WGBS, duplicate removal is commonly performed. For RRBS, targeted or amplicon bisulfite sequencing, duplicate handling must consider protocol design because identical genomic starts may be expected.
What are DMPs and DMRs?
DMPs are differentially methylated positions, usually single cytosines. DMRs are regions containing multiple CpGs with coordinated methylation differences between groups.
Can AI help with bisulfite-seq?
AI can help summarize QC, flag outliers, draft reports and interpret DMR-associated genes or pathways, while the core methylation calling and statistics should remain reproducible and auditable.