This tutorial introduces the first quality-control step for raw sequencing reads. FastQC provides a quick, visual overview of high-throughput sequencing data before trimming, alignment, quantification, variant calling, or downstream statistical analysis.
FASTQ files are the standard starting point for many NGS workflows. Before any downstream analysis, raw reads should be inspected for quality, sequence composition, GC distribution, adapter contamination, duplicate reads, and overrepresented sequences.
FastQC is a widely used quality-control tool for high-throughput sequencing data. It can import FASTQ, BAM, and SAM files, provides summary graphs and tables, and exports an HTML report for permanent documentation.
Input: FASTQ, FASTQ.GZ, BAM, or SAM files from sequencing pipelines.
Analysis: Modular checks for quality, content, GC bias, duplication, and adapters.
Output: HTML reports, summary files, and optional extracted report directories.
Install FastQC
FastQC requires a suitable Java Runtime Environment. For reproducible bioinformatics environments, package managers such as Conda or Mamba are usually the simplest installation route.
Option 1: Install with Mamba or Conda
# Create an environment for NGS quality-control tools
mamba create -n ngs-qc -c conda-forge -c bioconda fastqc
# Activate the environment
mamba activate ngs-qc
# Check the installation
fastqc --version
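Because FastQC depends on Java, it can help to confirm that both commands are reachable before starting a long run. The following is a minimal sketch; the helper names `require_tool` and `check_fastqc_env` are illustrative and not part of FastQC.

```shell
# Return 0 if a command is on PATH; otherwise report it and return 1.
require_tool() {
    local tool="$1"
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found: $tool"
    else
        echo "missing: $tool" >&2
        return 1
    fi
}

# Check the Java runtime and the FastQC wrapper together.
check_fastqc_env() {
    local status=0
    require_tool java   || status=1
    require_tool fastqc || status=1
    return "$status"
}
```

Calling `check_fastqc_env` at the top of a QC script stops it early with a readable message instead of failing partway through a batch.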
Option 2: Manual download
If you prefer a manual installation, download FastQC from Babraham Bioinformatics and make the wrapper script executable.
# Example version-specific installation
# Check the FastQC download page and update the version number if needed
wget https://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.12.0.zip
unzip fastqc_v0.12.0.zip
cd FastQC
chmod 755 fastqc
# Optional: add FastQC to your PATH
sudo ln -s /path/to/FastQC/fastqc /usr/local/bin/fastqc
Tip
When working on shared servers or HPC systems, avoid using sudo unless permitted by your administrator. Instead, add the FastQC folder to your personal PATH or use a Conda/Mamba environment.
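One way to follow this tip without sudo is to prepend the FastQC folder to your own PATH. This is a sketch only: the `add_to_path` helper and the `$HOME/tools/FastQC` location are assumptions, so adjust the path to wherever you unpacked FastQC.

```shell
# Prepend a directory to PATH only if it is not already listed.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Hypothetical install location; change it to match your system.
FASTQC_HOME="$HOME/tools/FastQC"
add_to_path "$FASTQC_HOME"
```

Placing these lines in a shell startup file such as `~/.bashrc` makes the setting persist across sessions.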
Run FastQC
FastQC can be launched interactively, but command-line execution is usually preferred for reproducible NGS projects and batch processing.
Launch the graphical interface
fastqc
Analyze one or more FASTQ files
# Analyze individual files
fastqc sample_R1.fastq.gz sample_R2.fastq.gz
# Create an output directory and process all FASTQ files
mkdir -p fastqc_reports
fastqc -o fastqc_reports *.fastq.gz
Use multiple CPU threads
# Process up to 8 files in parallel (FastQC uses one thread per file)
fastqc -t 8 -o fastqc_reports *.fastq.gz
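FastQC parallelizes across files, so asking for more threads than there are input files brings no benefit, and each extra thread needs additional memory. The sketch below caps the thread count accordingly; `pick_threads` is a hypothetical helper, and the CPU limit is passed in by hand rather than detected.

```shell
# Choose a thread count no larger than the number of input files
# and no larger than the CPU limit you allow. Always returns >= 1.
pick_threads() {
    local n_files="$1" max_cpus="$2"
    if [ "$n_files" -lt 1 ]; then
        echo 1
    elif [ "$n_files" -lt "$max_cpus" ]; then
        echo "$n_files"
    else
        echo "$max_cpus"
    fi
}
```

For example, `fastqc -t "$(pick_threads 4 8)" -o fastqc_reports *.fastq.gz` would run four threads for four files on a machine where you allow up to eight.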
Extract report folders automatically
# Produce HTML reports and extracted data folders
fastqc --extract -t 8 -o fastqc_reports *.fastq.gz
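Each extracted folder contains a tab-separated `summary.txt` with one line per module (status, module name, file name). Assuming that layout, a short helper can list only the modules flagged WARN or FAIL; `list_flagged_modules` is an illustrative name, not a FastQC command.

```shell
# Print the modules that did not pass for one extracted report folder.
# Each line of summary.txt is: STATUS <tab> module name <tab> file name.
list_flagged_modules() {
    local summary="$1/summary.txt"
    [ -f "$summary" ] || { echo "no summary.txt in $1" >&2; return 1; }
    awk -F '\t' '$1 != "PASS" { print $1 "\t" $2 }' "$summary"
}
```

A call such as `list_flagged_modules fastqc_reports/sample_R1_fastqc` gives a quick text view of what to look at in the HTML report.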
Interpret FastQC reports
FastQC reports should not be interpreted mechanically. A warning or failure flag does not always mean a sample is unusable; the interpretation depends on library type, read length, sequencing platform, organism, and downstream analysis goal.
Per-base sequence quality: Shows whether quality decreases across read positions, especially toward read ends.
Per-sequence quality scores: Summarizes whether many reads have globally low quality.
Per-base sequence content: Highlights positional nucleotide biases that may reflect library preparation or technical effects.
Sequence duplication levels: Helps estimate redundancy, PCR duplication, and library complexity.
Adapter content: Indicates whether trimming may be needed before alignment or quantification.
Overrepresented sequences: Lists abundant sequences that may represent adapters, contaminants, rRNA, or biological signal.
Sequence length distribution: Shows whether reads have fixed or variable lengths after sequencing or preprocessing.
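Alongside the plots, the extracted `fastqc_data.txt` holds the underlying numbers in plain text. The sketch below pulls a few Basic Statistics fields from it. It assumes the usual layout, in which each module starts with a `>>Module name` line and ends with `>>END_MODULE`; `basic_stats` is a hypothetical helper.

```shell
# Print selected Basic Statistics fields from an extracted fastqc_data.txt.
# Only lines inside the ">>Basic Statistics" block are considered.
basic_stats() {
    awk -F '\t' '
        /^>>Basic Statistics/ { in_block = 1; next }
        /^>>END_MODULE/       { in_block = 0 }
        in_block && ($1 == "Total Sequences" || $1 == "Sequence length" || $1 == "%GC") {
            print $1 ": " $2
        }
    ' "$1"
}
```

Running `basic_stats fastqc_reports/sample_R1_fastqc/fastqc_data.txt` gives read count, read length, and GC content, which are useful context when judging the module flags.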
Important interpretation note
Small RNA-seq, ATAC-seq, amplicon sequencing, metagenomics, and bisulfite sequencing can produce FastQC patterns that look unusual compared with standard RNA-seq or WGS data. Always interpret QC metrics in the context of the experimental design.
Batch analysis example
In most projects, FastQC is applied to all FASTQ files in a folder and results are collected into a dedicated QC directory.
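A batch run along these lines might look like the sketch below. The function name `run_fastqc_batch` and the `FASTQC_BIN` override are assumptions made for illustration; only the `-t` and `-o` options come from the commands shown earlier.

```shell
# Run FastQC on every *.fastq.gz file in a folder.
# FASTQC_BIN is a hypothetical override (for testing or for pointing
# at a specific installation); it defaults to "fastqc".
run_fastqc_batch() {
    local in_dir="$1" out_dir="$2" threads="${3:-1}"
    local fastqc_bin="${FASTQC_BIN:-fastqc}"
    local found=0 f

    # FastQC expects the output directory to exist already.
    mkdir -p "$out_dir" || return 1

    # Stop early if the folder contains no matching files.
    for f in "$in_dir"/*.fastq.gz; do
        [ -e "$f" ] || continue     # glob matched nothing
        found=1
        break
    done
    if [ "$found" -eq 0 ]; then
        echo "no *.fastq.gz files in $in_dir" >&2
        return 1
    fi

    "$fastqc_bin" -t "$threads" -o "$out_dir" "$in_dir"/*.fastq.gz
}
```

For example, `run_fastqc_batch raw_data fastqc_reports 4` processes every compressed FASTQ file in a hypothetical `raw_data` folder with four threads.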
After running FastQC, inspect the HTML files in fastqc_reports. For larger projects, an aggregation tool such as MultiQC can combine many FastQC reports into a single summary dashboard.
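For a quick text overview without extra tools, the per-sample `summary.txt` files can be tallied by module. This sketch assumes the reports were produced with `--extract`, so that folders named `*_fastqc` exist; `tally_module_flags` is an illustrative helper.

```shell
# Count PASS/WARN/FAIL per module across all extracted report folders.
# Output columns: module name <tab> status <tab> number of files.
tally_module_flags() {
    local report_dir="$1"
    cat "$report_dir"/*_fastqc/summary.txt 2>/dev/null |
        awk -F '\t' '
            { count[$2 "\t" $1]++ }
            END { for (k in count) print k "\t" count[k] }
        ' | LC_ALL=C sort
}
```

Calling `tally_module_flags fastqc_reports` shows at a glance whether a flag affects one sample or the whole run, which often says more than any single report.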
Next steps after primary FASTQ QC
The next step depends on the sequencing assay and QC findings. Some datasets may require trimming or filtering before alignment, while others may proceed directly into alignment, quantification, variant calling, or taxonomic classification.