Chromatin Accessibility Tutorial

ATAC-seq data analysis: from raw reads to accessible chromatin maps.

A practical tutorial for bulk ATAC-seq and chromatin accessibility projects. It covers experimental design, metadata, FASTQ quality control, adapter trimming, alignment, mitochondrial reads, blacklist filtering, Tn5 shift correction, peak calling, TSS enrichment, FRiP, signal tracks, motif analysis, differential accessibility, visualization and reproducible reporting.

1. Overview: what is ATAC-seq data analysis?

ATAC-seq analysis converts sequencing reads from transposase-accessible chromatin into maps of regulatory regions. The analysis identifies open chromatin peaks, evaluates data quality, annotates regulatory elements, discovers enriched motifs and compares chromatin accessibility between conditions.

Preprocess: QC, trimming, alignment, filtering and blacklist removal.
Detect accessibility: Call peaks and generate normalized chromatin-accessibility tracks.
Interpret regulation: Annotate peaks, test motifs and integrate gene-expression or epigenomic data.
Core principle: ATAC-seq is a chromatin-accessibility assay. Strong analysis requires both sequencing QC and regulatory-genomics interpretation.

2. ATAC-seq assay principle

ATAC-seq uses Tn5 transposase to insert sequencing adapters preferentially into accessible DNA. Open chromatin regions generate more reads than compact or nucleosome-protected regions. The fragment-size distribution can reveal nucleosome-free and nucleosome-associated fragments.

Nucleosome-free fragments: Short fragments enriched at highly accessible regulatory sites.
Mono-nucleosome fragments: Longer fragments reflecting DNA wrapped around one nucleosome.
TSS enrichment: Signal around transcription start sites reflects regulatory accessibility and data quality.
Motif signal: Accessible regions can reveal transcription-factor motifs and regulatory programs.

3. ATAC-seq study types

Study type | Typical question | Analysis focus
Bulk ATAC-seq | Which regulatory regions are accessible in a sample or condition? | Peak calling, QC, motif enrichment and differential accessibility.
Low-input ATAC-seq | Can accessibility be profiled from limited material? | Stringent QC, duplicates, library complexity and contamination checks.
Single-cell ATAC-seq | Which regulatory states exist at cell resolution? | Barcode processing, fragments, sparse peak matrices, clustering and cell-type annotation.
Multiome RNA+ATAC | How do accessibility and expression relate in the same cells? | Joint embedding, peak-gene links and regulatory programs.
Time-course ATAC-seq | How does chromatin accessibility change over time? | Dynamic peaks, motif activity and trajectory-like regulatory interpretation.

4. Experimental design

ATAC-seq is sensitive to nuclei preparation, cell viability, mitochondrial DNA, sequencing depth and batch effects. Plan biological replicates and controls before sequencing.

Questions to answer early

  • Is the project bulk ATAC-seq, low-input ATAC-seq, single-cell ATAC-seq or multiome?
  • Which biological conditions, tissues, time points or treatments are compared?
  • How many biological replicates are available per group?
  • Are samples balanced across library-preparation batches and sequencing runs?
  • What reference genome, blacklist and gene annotation will be used?
  • Is the main goal peak discovery, differential accessibility, motif discovery, enhancer mapping or integration with RNA-seq?
  • Are there expected cell-composition changes that may influence bulk ATAC-seq signal?
Bulk ATAC-seq from mixed tissues can reflect changes in cell-type composition as well as changes in regulatory accessibility within cell types.

5. Input files and metadata

Input | Typical format | Use
Raw reads | FASTQ.GZ | Sequencing reads from ATAC-seq libraries.
Sample metadata | TSV/CSV | Groups, replicates, batches, tissue and sequencing information.
Reference genome | FASTA and aligner indexes | Coordinate system for alignment and peak calling.
Blacklist regions | BED | Regions with recurrent artefactual signal.
Gene annotation | GTF/GFF3/BED | Promoter, TSS and gene annotation for interpretation.
Chromosome sizes | TSV | Needed for bedGraph/bigWig conversion and genome-browser tracks.
Example ATAC-seq sample sheet
sample_id	group	replicate	batch	tissue	fastq_1	fastq_2
A_rep1	control	1	A	cells	A_rep1_R1.fastq.gz	A_rep1_R2.fastq.gz
A_rep2	control	2	B	cells	A_rep2_R1.fastq.gz	A_rep2_R2.fastq.gz
B_rep1	treated	1	A	cells	B_rep1_R1.fastq.gz	B_rep1_R2.fastq.gz
B_rep2	treated	2	B	cells	B_rep2_R1.fastq.gz	B_rep2_R2.fastq.gz
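
A sample sheet like the one above can be checked before any compute is spent. The following is a minimal sketch, not part of any specific pipeline: the function name check_sample_sheet is hypothetical, the column names are taken from the example sheet, and the two-replicate minimum is an assumption to adjust per project. It reports duplicate sample IDs, under-replicated groups and the case where no batch contains more than one group.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check a tab-separated ATAC-seq sample sheet.
# Assumes a header row with at least the columns sample_id and group;
# the batch column is optional. min_reps=2 is an illustrative threshold.
check_sample_sheet() {
  awk -F'\t' -v min_reps=2 '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("sample_id" in col) || !("group" in col)) {
        print "ERROR: header must contain sample_id and group"
        bad = 1
        exit
      }
      next
    }
    {
      id = $(col["sample_id"]); grp = $(col["group"])
      if (id in seen) { print "ERROR: duplicate sample_id " id; bad = 1 }
      seen[id] = 1
      reps[grp]++
      if ("batch" in col) {
        b = $(col["batch"])
        if (!((b, grp) in pair)) { pair[b, grp] = 1; groups_in_batch[b]++ }
      }
    }
    END {
      for (g in reps)
        if (reps[g] < min_reps) {
          print "ERROR: group " g " has fewer than " min_reps " replicates"
          bad = 1
        }
      for (b in groups_in_batch) { nb++; if (groups_in_batch[b] > 1) mixed = 1 }
      if (nb > 1 && !mixed)
        print "WARNING: no batch contains more than one group (batch and group may be confounded)"
      if (!bad) print "OK"
      exit (bad ? 1 : 0)
    }' "$1"
}

# Usage (path is a placeholder):
# check_sample_sheet metadata/samples.tsv
```

The function exits non-zero on errors, so it can gate a workflow run; the confounding warning is advisory only.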

6. FASTQ quality control

Raw-read QC detects adapter contamination, low-quality bases, uneven read counts and sequencing problems before alignment.

FastQC and MultiQC
mkdir -p results/qc/fastqc results/qc/multiqc

fastqc data/fastq/*.fastq.gz \
  --outdir results/qc/fastqc \
  --threads 8

multiqc results/qc \
  --outdir results/qc/multiqc
  • Inspect per-base quality, adapter content and overrepresented sequences.
  • Compare read counts across samples.
  • Review duplication in context of library complexity and enrichment.
  • Expect short fragments; adapter trimming is often relevant.

7. Adapter trimming

ATAC-seq libraries often contain short inserts, so adapter sequences may be present in reads. Trimming can improve alignment and downstream QC.

Paired-end trimming with fastp
mkdir -p results/trimmed results/qc/fastp

sample="A_rep1"

fastp \
  --in1 "data/fastq/${sample}_R1.fastq.gz" \
  --in2 "data/fastq/${sample}_R2.fastq.gz" \
  --out1 "results/trimmed/${sample}_R1.trimmed.fastq.gz" \
  --out2 "results/trimmed/${sample}_R2.trimmed.fastq.gz" \
  --html "results/qc/fastp/${sample}.html" \
  --json "results/qc/fastp/${sample}.json" \
  --thread 8

8. Alignment

ATAC-seq reads are commonly aligned to the reference genome using Bowtie2, BWA or similar short-read aligners. Paired-end data are especially useful because fragment sizes are biologically informative.

Bowtie2 paired-end alignment
mkdir -p results/alignments results/logs

sample="A_rep1"

bowtie2 \
  -x reference/bowtie2/genome \
  -1 "results/trimmed/${sample}_R1.trimmed.fastq.gz" \
  -2 "results/trimmed/${sample}_R2.trimmed.fastq.gz" \
  -p 16 \
  2> "results/logs/${sample}.bowtie2.log" | \
  samtools sort -@ 8 -o "results/alignments/${sample}.sorted.bam"

samtools index "results/alignments/${sample}.sorted.bam"

9. BAM filtering and blacklist removal

Filtering removes reads that are unmapped, low quality, duplicates, mitochondrial reads or located in known problematic regions. Filtering rules should be documented because they influence peaks and QC metrics.

Common ATAC-seq BAM filtering
mkdir -p results/filtered

sample="A_rep1"

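# Keep primary, QC-passing alignments with MAPQ >= 30 whose mate is also mapped.
# -F 1804 excludes unmapped (4), mate-unmapped (8), secondary (256),
# QC-failed (512) and duplicate (1024) reads. The duplicate bit only has an
# effect if duplicates were marked beforehand (e.g. samtools markdup or Picard).
# Mitochondrial reads are not removed by this command; exclude that contig
# separately if nuclear-only alignments are required.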
samtools view -b -q 30 -F 1804 \
  "results/alignments/${sample}.sorted.bam" \
  > "results/filtered/${sample}.mapq30.bam"

samtools index "results/filtered/${sample}.mapq30.bam"

# Remove blacklist regions.
bedtools intersect \
  -v \
  -abam "results/filtered/${sample}.mapq30.bam" \
  -b reference/blacklist.bed \
  > "results/filtered/${sample}.filtered.bam"

samtools index "results/filtered/${sample}.filtered.bam"
Blacklist regions are often removed before final peak calling and FRiP calculation to reduce recurrent artefactual signal.

10. Mitochondrial reads

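ATAC-seq libraries often contain a large share of mitochondrial reads because mitochondrial DNA is not packaged into nucleosomes and is therefore highly accessible to Tn5. A high mitochondrial fraction reduces the number of usable nuclear reads and can indicate sample-preparation issues.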

Metric | Meaning | Interpretation
Mitochondrial read fraction | Reads mapping to mitochondrial chromosome. | High values may indicate damaged cells or suboptimal nuclei preparation.
Nuclear mapped reads | Reads remaining after mitochondrial removal. | Determines usable depth for peak calling.
Sample outlier status | Whether one sample has unusually high mitochondrial content. | Investigate sample preparation, viability and batch effects.
Count reads by chromosome
samtools idxstats results/alignments/A_rep1.sorted.bam \
  > results/qc/A_rep1.idxstats.txt
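
The idxstats table can be reduced to a single mitochondrial fraction per sample. This is a minimal sketch: the function name mito_fraction is hypothetical, and the mitochondrial contig is assumed to be named chrM unless another name (for example MT) is passed as the second argument. It relies on the idxstats column order of reference name, length, mapped reads, unmapped reads.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fraction of mapped reads on the mitochondrial contig.
# $1 = samtools idxstats output, $2 = mitochondrial contig name (assumed chrM).
mito_fraction() {
  awk -F'\t' -v mito="${2:-chrM}" '
    $1 != "*" {                      # skip the summary line for unplaced reads
      total += $3
      if ($1 == mito) mt += $3
    }
    END {
      if (total == 0) { print "NA"; exit }
      printf "%.4f\n", mt / total
    }' "$1"
}

# Usage (path follows the command above):
# mito_fraction results/qc/A_rep1.idxstats.txt chrM
```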

11. Tn5 shift correction

Tn5 transposase inserts adapters with a characteristic offset from the true cut site. Shift correction is often applied before generating cut-site tracks, peak summits or footprinting inputs.

Positive strand: Often shifted +4 bp to represent the insertion site.
Negative strand: Often shifted -5 bp to represent the insertion site.
Use case: Important for footprinting, summit refinement and cut-site signal tracks.
Documentation: Record whether shifted or unshifted files were used for each analysis.
Not every downstream step requires a shifted BAM. Keep file names clear to avoid mixing shifted and unshifted alignments.
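
The +4/-5 convention described above can be applied to read intervals in BED6 format, such as those produced by bedtools bamtobed. This is a sketch under stated assumptions: the function name tn5_shift_bed is hypothetical, the input is assumed to have the strand in column 6, and dedicated tools may handle edge cases (mate fields, soft clipping) differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the common Tn5 offset to BED6 read intervals.
# Plus-strand reads: start moved by +4 bp. Minus-strand reads: end moved by -5 bp.
# Intervals that would become empty after shifting are dropped.
tn5_shift_bed() {
  awk 'BEGIN { FS = OFS = "\t" }
    $6 == "+" { $2 = $2 + 4 }
    $6 == "-" { $3 = $3 - 5 }
    $3 > $2   { print }
  ' "$1"
}

# Usage (file names are placeholders; note "shifted" in the output name):
# bedtools bamtobed -i results/filtered/A_rep1.filtered.bam > A_rep1.reads.bed
# tn5_shift_bed A_rep1.reads.bed > A_rep1.reads.tn5_shifted.bed
```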

12. Insert-size and nucleosome periodicity QC

In a good paired-end ATAC-seq library, the fragment-size distribution typically shows a periodic pattern with peaks corresponding to nucleosome-free, mono-nucleosome and multi-nucleosome fragments.

Fragment class | Typical interpretation | Use
Short fragments | Nucleosome-free accessible regions. | Often used for high-resolution peak signal.
Mono-nucleosome fragments | DNA protected by one nucleosome. | Reflects chromatin organization around accessible sites.
Di-/tri-nucleosome fragments | Longer periodic fragments. | Indicates nucleosomal patterning and library quality.
Collect insert-size statistics
samtools stats results/filtered/A_rep1.filtered.bam \
  > results/qc/A_rep1.samtools.stats.txt

multiqc results/qc --outdir results/qc/multiqc_alignment
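
The insert-size histogram in the samtools stats output (lines starting with IS, where the second column is the insert size and the third the total number of pairs) can be summarized into fragment classes. The sketch below is illustrative: the function name fragment_classes is hypothetical, and the size cut-offs (under 100 bp, 180 to 247 bp, 315 to 473 bp) are commonly used conventions treated here as assumptions, not fixed definitions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fraction of read pairs per fragment class,
# computed from the IS records of a samtools stats report.
# Size boundaries are assumed conventions; adjust to your own protocol.
fragment_classes() {
  awk -F'\t' '
    $1 == "IS" {
      size = $2; pairs = $3; total += pairs
      if (size < 100) nfr += pairs
      else if (size >= 180 && size <= 247) mono += pairs
      else if (size >= 315 && size <= 473) di += pairs
    }
    END {
      if (total == 0) { print "no IS records found"; exit 1 }
      printf "nucleosome_free\t%.3f\n", nfr / total
      printf "mono_nucleosome\t%.3f\n", mono / total
      printf "di_nucleosome\t%.3f\n", di / total
    }' "$1"
}

# Usage (path follows the command above):
# fragment_classes results/qc/A_rep1.samtools.stats.txt
```

Fragments falling between the classes are counted in the total but not assigned, so the three fractions need not sum to one.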

13. TSS enrichment and FRiP

TSS enrichment and FRiP are two of the most useful ATAC-seq quality metrics.

Metric | Meaning | Interpretation
TSS enrichment | Signal enrichment around transcription start sites. | High enrichment usually indicates strong regulatory signal and good library quality.
FRiP | Fraction of reads in called peaks. | Measures how much of the library falls in accessible regions.
Peak count | Number of accessible regions detected. | Depends on cell type, depth, peak caller and filtering.
Replicate concordance | Agreement between biological replicates. | Important for confidence in peaks and differential accessibility.
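
FRiP is a simple ratio: reads overlapping called peaks divided by all reads in the filtered BAM. The sketch below separates the arithmetic from the counting. The function name frip is hypothetical, and the commented wiring assumes the file layout used elsewhere in this tutorial plus a MACS2 narrowPeak file named after the sample.

```shell
#!/usr/bin/env bash
# Hypothetical helper: FRiP = reads in peaks / total filtered reads.
# $1 = number of reads overlapping peaks, $2 = total number of filtered reads.
frip() {
  awk -v p="$1" -v t="$2" 'BEGIN {
    if (t + 0 <= 0) { print "NA"; exit 1 }
    printf "%.4f\n", p / t
  }'
}

# Example wiring (not run here; requires samtools and bedtools, paths are placeholders):
# total=$(samtools view -c results/filtered/A_rep1.filtered.bam)
# in_peaks=$(bedtools intersect -u \
#              -abam results/filtered/A_rep1.filtered.bam \
#              -b results/peaks/A_rep1_peaks.narrowPeak | samtools view -c -)
# frip "$in_peaks" "$total"
```

Whatever counting rule is chosen (reads or fragments, before or after duplicate removal), it should be the same for every sample so that FRiP values remain comparable.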

14. Peak calling

Peak calling identifies accessible regions enriched for ATAC-seq fragments. MACS2 or MACS3 are commonly used for bulk ATAC-seq peak calling.

ATAC-seq peak calling with MACS2
mkdir -p results/peaks

sample="A_rep1"

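# -f BAMPE builds the pileup from the observed paired-end fragments, so the
# single-end options --nomodel/--shift/--extsize are not used in this mode.
# For a cut-site-centred run, use -f BAM with --nomodel --shift -100 --extsize 200 instead.
macs2 callpeak \
  -t "results/filtered/${sample}.filtered.bam" \
  -f BAMPE \
  -g hs \
  -n "${sample}" \
  --outdir results/peaks \
  -q 0.01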
Peak-calling parameters differ between ATAC-seq workflows. Use consistent, documented parameters and check whether your workflow expects shifted reads, paired-end fragments or cut-site representations.

15. Replicates and reproducibility

Biological replicate consistency is essential for trustworthy ATAC-seq. Technical replicates can be useful, but biological replication is needed for condition-level conclusions.

Peak overlap: Compare called peaks across replicates.
Signal correlation: Correlate normalized bigWig signal or peak counts.
FRiP consistency: Replicates should have comparable enrichment quality.
Outlier review: Investigate samples with unusual mitochondrial fraction, TSS enrichment or peak counts.
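
Signal correlation between two replicates can be approximated from per-peak read counts. The following is a minimal sketch, assuming a tab-separated table with a header and three columns (peak ID, counts in replicate 1, counts in replicate 2); the function name replicate_correlation is hypothetical, and the log(count + 1) transform is one reasonable choice among several.

```shell
#!/usr/bin/env bash
# Hypothetical helper: Pearson correlation of log(count + 1) between two replicates.
# $1 = TSV with header; columns: peak_id, count_rep1, count_rep2 (assumed layout).
replicate_correlation() {
  awk -F'\t' '
    NR > 1 {
      x = log($2 + 1); y = log($3 + 1)
      n++; sx += x; sy += y
      sxx += x * x; syy += y * y; sxy += x * y
    }
    END {
      vx = n * sxx - sx * sx
      vy = n * syy - sy * sy
      if (n < 2 || vx <= 0 || vy <= 0) { print "NA"; exit 1 }
      printf "%.4f\n", (n * sxy - sx * sy) / sqrt(vx * vy)
    }' "$1"
}

# Usage (file name is a placeholder):
# replicate_correlation results/counts/control_rep1_vs_rep2.tsv
```

A single correlation value hides locus-level disagreement, so it is best read alongside peak overlap and the other QC metrics listed above.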

16. Consensus peak sets

A consensus peak set is often used for read counting, annotation and differential accessibility. It can be created from merged peaks across replicates or conditions.

Create a merged consensus peak set
mkdir -p results/peaks/consensus

cat results/peaks/*_peaks.narrowPeak \
  | cut -f1-3 \
  | sort -k1,1 -k2,2n \
  | bedtools merge \
  > results/peaks/consensus/consensus_peaks.bed
Use a stricter consensus strategy for high-confidence regulatory maps and a broader union strategy for differential accessibility, depending on project goals.
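
A stricter consensus can be sketched by keeping only merged regions supported by a minimum number of samples. The example below is an awk stand-in written for illustration, not the bedtools implementation; with bedtools, a similar result can be obtained from bedtools merge with -c 4 -o count_distinct on a sample-labelled BED file. The function name consensus_min_support and the four-column input layout (chrom, start, end, sample) are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge overlapping or book-ended intervals and keep
# merged regions that are supported by at least N distinct samples.
# $1 = BED4 file (chrom, start, end, sample), $2 = minimum number of samples.
consensus_min_support() {
  sort -k1,1 -k2,2n "$1" | awk -v min="$2" 'BEGIN { OFS = "\t" }
    function emit() {
      if (chrom != "" && n >= min + 0) print chrom, start, end, n
    }
    $1 != chrom || $2 + 0 > end + 0 {   # new chromosome or a gap: close the region
      emit()
      chrom = $1; start = $2 + 0; end = $3 + 0; n = 0
      split("", seen)                   # reset the set of supporting samples
    }
    {
      if ($3 + 0 > end) end = $3 + 0
      if (!($4 in seen)) { seen[$4] = 1; n++ }
    }
    END { emit() }'
}

# Building the BED4 input from per-sample narrowPeak files (paths are placeholders):
# for f in results/peaks/*_peaks.narrowPeak; do
#   s=$(basename "$f" _peaks.narrowPeak)
#   awk -v s="$s" 'BEGIN { OFS = "\t" } { print $1, $2, $3, s }' "$f"
# done > results/peaks/consensus/all_peaks.bed4
# consensus_min_support results/peaks/consensus/all_peaks.bed4 2
```

The fourth output column is the number of distinct supporting samples, which can be used later to filter more or less strictly without re-merging.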

17. Signal tracks and genome-browser visualization

bigWig signal tracks allow visualization of chromatin accessibility across the genome and are useful for reports, genome browsers and manual review of key loci.

Create normalized bigWig with deepTools
mkdir -p results/tracks

bamCoverage \
  -b results/filtered/A_rep1.filtered.bam \
  -o results/tracks/A_rep1.RPGC.bw \
  --normalizeUsing RPGC \
  --effectiveGenomeSize 2913022398 \
  --binSize 10 \
  --numberOfProcessors 8
  • Use consistent normalization when comparing samples visually.
  • Inspect representative promoters, enhancers and positive-control regions.
  • Always interpret browser snapshots together with genome-wide statistics.

18. Peak annotation

Peak annotation connects accessible regions to promoters, enhancers, genes, CpG islands, repeats or custom regulatory features. Nearby genes are useful hypotheses, not automatic regulatory targets.

Promoters: Accessible promoters often mark active or poised transcriptional regulation.
Enhancers: Distal peaks may represent enhancers and require context for gene assignment.
Gene bodies: Accessibility within genes can reflect transcription or regulatory elements.
Intergenic peaks: May contain distal regulatory elements or unannotated features.
Annotate peaks by overlap
mkdir -p results/annotation

bedtools intersect \
  -a results/peaks/consensus/consensus_peaks.bed \
  -b annotation/promoters.bed \
  -wa -wb \
  > results/annotation/atac_peaks_overlapping_promoters.tsv
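
The promoter file used for the overlap has to be derived from a gene annotation. As a rough sketch, strand-aware promoter windows can be built from a BED6 gene or transcript file. The function name promoter_windows is hypothetical, the default window of 1000 bp on each side of the TSS is an assumption, and the windows are clamped at zero but not at chromosome ends.

```shell
#!/usr/bin/env bash
# Hypothetical helper: strand-aware promoter windows around the TSS.
# $1 = BED6 gene file, $2 = bp upstream (assumed 1000), $3 = bp downstream (assumed 1000).
# Plus strand: TSS taken at the start coordinate; minus strand: at the end coordinate.
promoter_windows() {
  awk -v up="${2:-1000}" -v down="${3:-1000}" 'BEGIN { FS = OFS = "\t" }
    {
      if ($6 == "-") { s = $3 - down; e = $3 + up }
      else           { s = $2 - up;   e = $2 + down }
      if (s < 0) s = 0
      print $1, s, e, $4, $5, $6
    }' "$1"
}

# Usage (input file name is a placeholder):
# promoter_windows annotation/genes.bed 1000 1000 > annotation/promoters.bed
```

The chosen window size changes how many peaks are labelled as promoter-associated, so it belongs in the methods section together with the annotation version.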

19. Motif analysis

Motif enrichment analysis identifies transcription-factor binding motifs enriched in accessible regions. It is especially useful for interpreting regulatory programs and differential accessibility.

Extract peak sequences for motif analysis
mkdir -p results/motifs

bedtools getfasta \
  -fi reference/genome.fa \
  -bed results/peaks/consensus/consensus_peaks.bed \
  -fo results/motifs/consensus_peak_sequences.fa
  • Use appropriate background sequences matched for GC content and accessibility context when possible.
  • Separate promoter and distal peaks if they represent different regulatory contexts.
  • Motif enrichment suggests candidate regulators; it does not prove binding without additional evidence.
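
Choosing a GC-matched background starts with knowing the GC content of the foreground sequences. The sketch below computes the GC fraction per FASTA record, for example on the file written by bedtools getfasta. The function name fasta_gc is hypothetical; N bases are counted in the sequence length, which is a simplification.

```shell
#!/usr/bin/env bash
# Hypothetical helper: GC fraction per FASTA record (multi-line records supported).
# Output: record name (first word of the header) and GC fraction, tab-separated.
fasta_gc() {
  awk '
    function emit() {
      if (name != "") printf "%s\t%.3f\n", name, (len ? gc / len : 0)
    }
    /^>/ { emit(); name = substr($1, 2); gc = 0; len = 0; next }
    {
      seq = toupper($0)
      len += length(seq)
      gc += gsub(/[GC]/, "", seq)   # gsub returns the number of G/C characters
    }
    END { emit() }' "$1"
}

# Usage (path follows the command above):
# fasta_gc results/motifs/consensus_peak_sequences.fa > results/motifs/consensus_peak_gc.tsv
```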

20. Transcription-factor footprinting

Footprinting aims to detect local depletion of Tn5 insertions at transcription-factor binding sites. It can provide regulatory hypotheses but is sensitive to bias, sequencing depth and normalization.

Input requirement: High-quality, high-depth ATAC-seq improves footprint reliability.
Tn5 bias: Sequence insertion bias must be considered.
Motif context: Footprints are usually evaluated at known or predicted motif sites.
Validation: Footprints should be interpreted with motif, expression and ChIP evidence where possible.

21. Differential accessibility analysis

Differential accessibility analysis tests whether chromatin accessibility differs between groups at peak regions. It usually uses read counts over a consensus peak set and count-based statistical models.

Step | Purpose | Notes
Consensus peak set | Defines genomic regions to test. | Use a consistent peak set across samples.
Read counting | Counts fragments in each peak for each sample. | Use filtered BAM files and consistent rules.
Normalization | Corrects library size and composition effects. | Global accessibility shifts can complicate normalization.
Statistical model | Tests group differences. | Include batches, donors or paired design where appropriate.
Annotation and motifs | Interprets differential peaks. | Connect changes to regulatory elements and candidate TFs.
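
For the read-counting step, one common route is featureCounts with the consensus peaks supplied in SAF format (GeneID, Chr, Start, End, Strand; 1-based coordinates). The sketch below converts a BED file to SAF. The function name bed_to_saf and the peak_N identifiers are hypothetical, and the strand is set to "+" as a placeholder because accessibility counting is unstranded.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a BED3+ peak file to SAF for featureCounts.
# BED starts are 0-based, SAF starts are 1-based, hence the +1.
bed_to_saf() {
  awk 'BEGIN { FS = OFS = "\t"; print "GeneID", "Chr", "Start", "End", "Strand" }
    { print "peak_" NR, $1, $2 + 1, $3, "+" }' "$1"
}

# Example wiring (not run here; requires Subread featureCounts, paths are placeholders).
# Recent Subread releases may also need --countReadPairs to count fragments; check your version.
# bed_to_saf results/peaks/consensus/consensus_peaks.bed > results/peaks/consensus/consensus_peaks.saf
# featureCounts -p -F SAF \
#   -a results/peaks/consensus/consensus_peaks.saf \
#   -o results/counts/consensus_counts.txt \
#   results/filtered/*.filtered.bam
```

The resulting count matrix is the usual input for count-based models such as those named in the table above.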

22. Integration with RNA-seq, ChIP-seq and methylation

ATAC-seq is most powerful when integrated with other regulatory and expression data.

Integration | Question | Interpretation
RNA-seq | Do accessibility changes correspond to gene-expression changes? | Supports regulatory hypotheses and gene programs.
ChIP-seq | Do accessible regions overlap TF binding or histone marks? | Helps distinguish promoters, enhancers and repressed regions.
Bisulfite-seq | Do accessibility changes correspond to DNA methylation changes? | Useful for epigenetic regulation studies.
Hi-C or promoter-capture data | Which distal peaks may contact promoters? | Improves enhancer-gene linking.

23. Note on single-cell ATAC-seq

Single-cell ATAC-seq requires specialized processing because reads are assigned to cell barcodes and converted to sparse peak-by-cell or tile-by-cell matrices.

Fragment files: Store genomic fragments linked to cell barcodes.
Cell QC: Uses TSS enrichment, fragments per cell, blacklist fraction and nucleosome signal.
Peak matrix: Sparse matrix of accessible regions by cells.
Gene activity: Approximates regulatory signal around genes for annotation and integration.

24. Example ATAC-seq analysis workflow

The following simplified workflow illustrates a common paired-end bulk ATAC-seq route. Real projects should adapt parameters to organism, sample type, replicates and validation requirements.

Minimal bulk ATAC-seq workflow
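# 0. Create output directories
mkdir -p results/qc/fastqc results/qc/fastp results/trimmed results/alignments \
  results/logs results/filtered results/peaks results/tracks

# 1. Raw-read QC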
fastqc data/fastq/*.fastq.gz --outdir results/qc/fastqc --threads 8

# 2. Trim reads
fastp \
  --in1 data/fastq/A_rep1_R1.fastq.gz \
  --in2 data/fastq/A_rep1_R2.fastq.gz \
  --out1 results/trimmed/A_rep1_R1.fastq.gz \
  --out2 results/trimmed/A_rep1_R2.fastq.gz \
  --html results/qc/fastp/A_rep1.html \
  --json results/qc/fastp/A_rep1.json \
  --thread 8

# 3. Align
bowtie2 -x reference/bowtie2/genome \
  -1 results/trimmed/A_rep1_R1.fastq.gz \
  -2 results/trimmed/A_rep1_R2.fastq.gz \
  -p 16 2> results/logs/A_rep1.bowtie2.log | \
  samtools sort -@ 8 -o results/alignments/A_rep1.sorted.bam

samtools index results/alignments/A_rep1.sorted.bam

# 4. Filter
samtools view -b -q 30 -F 1804 results/alignments/A_rep1.sorted.bam \
  > results/filtered/A_rep1.mapq30.bam

bedtools intersect -v \
  -abam results/filtered/A_rep1.mapq30.bam \
  -b reference/blacklist.bed \
  > results/filtered/A_rep1.filtered.bam

samtools index results/filtered/A_rep1.filtered.bam

# 5. Peak calling
# -f BAMPE uses the observed fragments, so --nomodel/--shift/--extsize are omitted.
macs2 callpeak \
  -t results/filtered/A_rep1.filtered.bam \
  -f BAMPE -g hs -n A_rep1 \
  --outdir results/peaks \
  -q 0.01

# 6. Signal track
bamCoverage \
  -b results/filtered/A_rep1.filtered.bam \
  -o results/tracks/A_rep1.RPGC.bw \
  --normalizeUsing RPGC \
  --effectiveGenomeSize 2913022398 \
  --binSize 10 \
  --numberOfProcessors 8

# 7. Summarize
multiqc results --outdir results/qc/multiqc_final

25. Deliverables and reporting

  • FASTQ QC, trimming and final MultiQC reports.
  • Sorted, indexed and filtered BAM files.
  • Mitochondrial read fraction, duplicate rate, mapping rate and blacklist fraction.
  • Insert-size periodicity, TSS enrichment, FRiP and peak-count summaries.
  • Peak files in BED/narrowPeak format.
  • Consensus peak set for annotation and differential accessibility.
  • Normalized bigWig signal tracks for genome-browser visualization.
  • Peak annotation tables for promoters, enhancers, genes and custom features.
  • Motif enrichment and candidate transcription-factor summaries.
  • Differential accessibility tables and plots when comparing conditions.
  • Methods section with software versions, genome assembly, blacklist version, parameters and limitations.

26. ATAC-seq analysis cheat sheet

Step | Common tools | Main outputs
FASTQ QC | FastQC, MultiQC, fastp | Raw-read QC and project summary.
Trimming | fastp, Cutadapt, Trim Galore | Trimmed FASTQ and trimming reports.
Alignment | Bowtie2, BWA, SAMtools | Sorted and indexed BAM files.
Filtering | SAMtools, BEDTools, Picard | Filtered BAM files and blacklist-removed alignments.
QC metrics | deepTools, ATACseqQC, MultiQC, custom scripts | TSS enrichment, FRiP, insert size, duplicate and mitochondrial metrics.
Peak calling | MACS2, MACS3, Genrich-style workflows | Accessible chromatin peaks.
Signal tracks | deepTools bamCoverage, bedGraphToBigWig | bigWig tracks for visualization.
Annotation | BEDTools, ChIPseeker, HOMER, custom R/Python | Peak-to-feature and peak-to-gene tables.
Motifs | HOMER, MEME Suite, chromVAR-style workflows | Motif enrichment and candidate regulators.
Differential accessibility | DiffBind, csaw, DESeq2/edgeR-style models | Differential peak tables and regulatory interpretation.

Frequently asked questions

What is ATAC-seq?

ATAC-seq is Assay for Transposase-Accessible Chromatin using sequencing. It profiles open chromatin by using a hyperactive Tn5 transposase to insert sequencing adapters into accessible DNA regions.

What does ATAC-seq measure?

ATAC-seq measures chromatin accessibility. Accessible regions often correspond to active promoters, enhancers and regulatory elements, but interpretation should be supported by annotation and, when possible, complementary data such as RNA-seq or ChIP-seq.

What are the main ATAC-seq analysis steps?

Common steps include FASTQ QC, adapter trimming, alignment, filtering, mitochondrial read assessment, duplicate handling, Tn5 shift correction, peak calling, QC metrics such as TSS enrichment and FRiP, peak annotation, motif analysis and differential accessibility analysis.

Why are mitochondrial reads important in ATAC-seq?

High mitochondrial read fraction often indicates damaged cells, poor nuclei preparation or excessive mitochondrial DNA accessibility. Expected levels vary by sample type, but very high mitochondrial fractions can reduce usable nuclear signal.

What is TSS enrichment?

TSS enrichment measures how strongly ATAC-seq signal is enriched around transcription start sites. It is a common quality metric for chromatin accessibility data and reflects signal-to-background quality.

What is FRiP in ATAC-seq?

FRiP means fraction of reads in peaks. It measures the fraction of aligned reads overlapping called accessible regions and is commonly used as an enrichment quality metric.

Why is Tn5 shift correction used?

Tn5 transposase inserts adapters with a characteristic offset relative to the cut site. Shifting read positions helps represent the actual transposition event more accurately for footprinting, peak summits and signal visualization.

What is the difference between bulk ATAC-seq and single-cell ATAC-seq?

Bulk ATAC-seq profiles average accessibility across a cell population, while single-cell ATAC-seq measures chromatin accessibility at cell resolution and requires specialized cell barcode, fragment and sparse-matrix workflows.

Can AI help with ATAC-seq analysis?

AI can help summarize QC, flag unusual samples, interpret peak annotations, prioritize motifs and integrate ATAC-seq with RNA-seq, ChIP-seq or single-cell data, while the workflow should remain reproducible and auditable.