Trimming and filtering of NGS reads with Cutadapt.

This tutorial shows how to remove adapters, poly-A tails, primers, and low-quality bases from sequencing reads, and how to discard reads outside a chosen length range. These preprocessing steps are often required before alignment, quantification, variant calling, or downstream statistical analysis.

Overview

Read trimming and filtering are common preprocessing steps in NGS analysis. They can remove technical sequences introduced during library preparation, low-quality read ends, poly-A or poly-T stretches, fixed barcodes, and very short reads that are unlikely to align or quantify reliably.

Cutadapt is a flexible command-line tool for removing adapters, primers, poly-A tails, and other unwanted sequences from high-throughput sequencing reads. It also supports quality trimming, length filtering, fixed-base removal, and paired-end processing.

Input: FASTQ or FASTQ.GZ files from single-end or paired-end sequencing.
Processing: Adapter trimming, quality trimming, poly-A trimming, fixed-base removal, and size selection.
Output: Cleaned FASTQ files and trimming logs for documentation and downstream analysis.

Install Cutadapt

For reproducible bioinformatics work, the recommended installation method is usually a Conda or Mamba environment with the Bioconda and conda-forge channels configured.

Install with Mamba or Conda

# Create a clean environment for read trimming
mamba create -n ngs-trim -c conda-forge -c bioconda cutadapt

# Activate the environment
mamba activate ngs-trim

# Check the installed version
cutadapt --version

Alternative installation with pip

# Use this only if a Python environment is already managed carefully
python -m pip install --upgrade cutadapt

# Check the installation
cutadapt --version

Why avoid old manual setup commands?

Older tutorials often used manual source downloads and python setup.py install. For modern projects, Conda/Mamba or pip-based installation is usually easier to reproduce and maintain.

Core Cutadapt commands

Replace adapter sequences with the exact adapter or primer sequences used in your library preparation protocol. The examples below use common Illumina-style adapter fragments and should be adapted to your experiment.

Remove a 3′ adapter from single-end reads

cutadapt \
  -a AGATCGGAAGAGCACACGTCT \
  -m 18 \
  -o sample.trimmed.fastq.gz \
  sample.fastq.gz

Trim low-quality bases

# Trim low-quality bases from the 3′ end
cutadapt \
  -q 20 \
  -o sample.q20.fastq.gz \
  sample.fastq.gz

# Trim 5′ and 3′ ends with different quality cutoffs
cutadapt \
  -q 15,20 \
  -o sample.qtrim.fastq.gz \
  sample.fastq.gz

Remove fixed bases from the beginning or end

# Remove the first 3 bases from each read
cutadapt \
  -u 3 \
  -o sample.trim5p.fastq.gz \
  sample.fastq.gz

# Remove 5 bases from the 3′ end
cutadapt \
  -u -5 \
  -o sample.trim3p.fastq.gz \
  sample.fastq.gz

Trim poly-A tails

# --poly-a removes poly-A tails from the 3′ end of reads
# (requires a recent Cutadapt release; check cutadapt --help)
cutadapt \
  --poly-a \
  -m 18 \
  -o sample.polyA_trimmed.fastq.gz \
  sample.fastq.gz
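
Because --poly-a is not available in older releases, a pipeline script may want to check the installed version before relying on it. The sketch below shows one way to do that. The helper name version_at_least is hypothetical, the 4.4 threshold is an assumption that should be confirmed in the Cutadapt changelog, and sort -V requires GNU coreutils.

```shell
# Hypothetical helper: succeeds if version $2 is at least version $1.
# Relies on GNU sort -V (version sort).
version_at_least() {
  local required="$1" installed="$2"
  # The smaller of the two versions sorts first; if that is the
  # required version, the installed one is new enough.
  [ "$(printf '%s\n%s\n' "$required" "$installed" | sort -V | head -n 1)" = "$required" ]
}

# Assumed threshold for --poly-a; confirm it in the Cutadapt changelog.
MIN_POLYA_VERSION="4.4"

if command -v cutadapt >/dev/null 2>&1; then
  if version_at_least "$MIN_POLYA_VERSION" "$(cutadapt --version)"; then
    echo "cutadapt supports --poly-a"
  else
    echo "cutadapt is older than $MIN_POLYA_VERSION; --poly-a may be unavailable" >&2
  fi
fi
```

Version sort is used instead of a plain string comparison so that a release such as 4.10 is correctly treated as newer than 4.4.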

Size-select reads

# Keep reads at least 18 nt long and at most 35 nt long
cutadapt \
  -m 18 \
  -M 35 \
  -o sample.18to35nt.fastq.gz \
  sample.fastq.gz

Frequently used options

-a  Trim a 3′ adapter from single-end reads or read 1.
-A  Trim a 3′ adapter from read 2 in paired-end data.
-g  Trim a 5′ adapter or primer from single-end reads or read 1.
-q  Trim low-quality bases from read ends before adapter trimming.
-m  Discard reads shorter than the specified minimum length.
-M  Discard reads longer than the specified maximum length.
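
Before settling on -m and -M values, it can help to look at the read-length distribution of the input. The following is a minimal sketch, assuming gzip-compressed FASTQ files with standard four-line records. The function name read_length_histogram is hypothetical and not part of Cutadapt.

```shell
# Hypothetical helper: print "length<TAB>count" for every read length
# in a gzip-compressed FASTQ file, sorted by length.
# Assumes standard four-line FASTQ records (no wrapped sequences).
read_length_histogram() {
  local fastq_gz="$1"
  gzip -dc "$fastq_gz" \
    | awk 'NR % 4 == 2 { counts[length($0)]++ }
           END { for (len in counts) printf "%d\t%d\n", len, counts[len] }' \
    | sort -n
}

# Example call (file name is a placeholder):
# read_length_histogram sample.fastq.gz
```

Running the helper on adapter-trimmed reads, rather than raw reads, usually gives a more useful picture, because raw Illumina reads often all share the same length.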

Paired-end read trimming

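Paired-end processing keeps read pairs synchronized: Cutadapt reads both files together and, by default, when a filter such as -m removes one read, its mate is removed as well. Use -o for read 1 output and -p for read 2 output.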

cutadapt \
  -a AGATCGGAAGAGCACACGTCT \
  -A AGATCGGAAGAGCGTCGTGTA \
  -q 20 \
  -m 18 \
  -o sample_R1.trimmed.fastq.gz \
  -p sample_R2.trimmed.fastq.gz \
  sample_R1.fastq.gz \
  sample_R2.fastq.gz

For paired-end libraries, adapter sequences for read 1 and read 2 can differ. Always confirm the adapter sequences in the protocol or sample sheet used for the specific sequencing run.
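
A quick way to confirm that trimmed read 1 and read 2 files are still synchronized is to compare their read counts. The sketch below assumes gzip-compressed FASTQ files with four lines per record. The helper names count_reads and pairs_in_sync are hypothetical.

```shell
# Hypothetical helper: count reads in a gzip-compressed FASTQ file.
# Assumes four lines per record.
count_reads() {
  gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Hypothetical helper: succeed only if both files hold the same number of reads.
pairs_in_sync() {
  local n1 n2
  n1=$(count_reads "$1")
  n2=$(count_reads "$2")
  if [ "$n1" = "$n2" ]; then
    echo "OK: $n1 read pairs"
  else
    echo "MISMATCH: $n1 reads in $1, $n2 reads in $2" >&2
    return 1
  fi
}

# Example call (file names are placeholders):
# pairs_in_sync sample_R1.trimmed.fastq.gz sample_R2.trimmed.fastq.gz
```

Equal counts are a necessary check, not a complete one: they do not prove that the reads appear in the same order in both files.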

Adapter examples for common library preparation kits

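The current SciBerg resource lists example Cutadapt commands for several library preparation kits. The examples below preserve those adapter patterns while presenting them in a cleaner format. Note that they trim read 1 and read 2 in separate commands, so length filtering with -m can discard a read from one file but not its mate. If downstream tools need synchronized pairs, combine the two commands into a single paired-end call with -a/-A and -o/-p, as in the paired-end section above.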

CATS RNA and DNA library preparation kits

# Read 1: remove first 3 bases, poly-A-like stretches, adapter fragments,
# 5′ template-switching sequence, and reads shorter than 18 nt
cutadapt -u 3 input_R1.fastq.gz | \
cutadapt -a AAAAAAAA - | \
cutadapt -a 'AAAAAAAN$' -a 'AAAAAAN$' -a 'AAAAAN$' - | \
cutadapt -a AGAGCACACGTCTG - | \
cutadapt -O 8 -g GTTCAGAGTTCTACAGTCCGACGATCNNN - | \
cutadapt -m 18 -o output_R1.fastq.gz -

# Read 2
cutadapt -a CCCGATCGTCGG input_R2.fastq.gz | \
cutadapt -a GGGGATCGTCGG - | \
cutadapt -m 18 -o output_R2.fastq.gz -

NEBNext Ultra and Ultra II DNA library prep kits

# Read 1
cutadapt \
  -a GATCGGAAGAGCACACGT \
  -m 18 \
  -o output_R1.fastq.gz \
  input_R1.fastq.gz

# Read 2
cutadapt \
  -a GATCGGAAGAGCACACGT \
  -m 18 \
  -o output_R2.fastq.gz \
  input_R2.fastq.gz

NEBNext Small RNA library prep kit

# Read 1
cutadapt \
  -a AGATCGGAAGAGCACACGTCT \
  -m 18 \
  -o output_R1.fastq.gz \
  input_R1.fastq.gz

# Read 2
cutadapt \
  -a GATCGTCGGACTGTAGAACTC \
  -m 18 \
  -o output_R2.fastq.gz \
  input_R2.fastq.gz

TruSeq Small RNA library preparation kits

# Read 1
cutadapt \
  -a TGGAATTCTCGGGTGCCAAGG \
  -m 18 \
  -o output_R1.fastq.gz \
  input_R1.fastq.gz

# Read 2
cutadapt \
  -a GATCGTCGGACTGTAGAACTC \
  -m 18 \
  -o output_R2.fastq.gz \
  input_R2.fastq.gz

TruSeq RNA, TruSeq stranded mRNA, TruSeq stranded total RNA, and ScriptSeq RNA-seq

# Read 1
cutadapt \
  -a AGATCGGAAGAGCACACGTCT \
  -m 18 \
  -o output_R1.fastq.gz \
  input_R1.fastq.gz

# Read 2
cutadapt \
  -a AGATCGGAAGAGCGTCGTGTA \
  -m 18 \
  -o output_R2.fastq.gz \
  input_R2.fastq.gz

Adapter sequences must match your experiment

Adapter sequences can vary by kit version, indexing strategy, read structure, protocol modification, and sequencing provider. Treat these examples as starting points and verify the final parameters for every project.
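
One simple way to sanity-check a candidate adapter is to count how many of the first reads contain it. The sketch below assumes gzip-compressed FASTQ files with four-line records. The function name count_adapter_hits and the default of 100000 reads are hypothetical choices. It matches the string exactly, whereas Cutadapt also tolerates mismatches and partial matches at read ends, so the count is only a rough lower bound.

```shell
# Hypothetical helper: among the first N reads of a gzip-compressed FASTQ
# file, count how many sequence lines contain the given string exactly.
# The whole file is still decompressed, so this is slow on very large
# inputs but avoids broken-pipe errors under "set -o pipefail".
count_adapter_hits() {
  local fastq_gz="$1" adapter="$2" max_reads="${3:-100000}"
  gzip -dc "$fastq_gz" \
    | awk -v adapter="$adapter" -v max="$max_reads" '
        NR % 4 == 2 && seen < max {
          seen++
          if (index($0, adapter) > 0) hits++
        }
        END { printf "%d of %d reads contain %s\n", hits, seen, adapter }'
}

# Example call (file name is a placeholder):
# count_adapter_hits input_R1.fastq.gz AGATCGGAAGAGC 100000
```

Searching for only the first 12 to 13 bases of the adapter tends to be more informative than searching for the full sequence, because sequencing errors accumulate toward the end of long adapter matches.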

Batch trimming script

The script below processes paired-end FASTQ files named *_R1.fastq.gz and *_R2.fastq.gz, writes trimmed reads to a new folder, and stores Cutadapt reports.

#!/usr/bin/env bash
set -euo pipefail

INPUT_DIR="fastq"
OUTPUT_DIR="trimmed_fastq"
REPORT_DIR="cutadapt_reports"
THREADS=8

ADAPTER_R1="AGATCGGAAGAGCACACGTCT"
ADAPTER_R2="AGATCGGAAGAGCGTCGTGTA"

mkdir -p "$OUTPUT_DIR" "$REPORT_DIR"

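for R1 in "$INPUT_DIR"/*_R1.fastq.gz; do
  # Skip the unexpanded pattern when no R1 files are found
  [[ -e "$R1" ]] || continue
  SAMPLE=$(basename "$R1" _R1.fastq.gz)
  R2="$INPUT_DIR/${SAMPLE}_R2.fastq.gz"

  # Stop early if the mate file is missing
  if [[ ! -f "$R2" ]]; then
    echo "Missing read 2 file for sample $SAMPLE: $R2" >&2
    exit 1
  fi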

  cutadapt \
    -j "$THREADS" \
    -a "$ADAPTER_R1" \
    -A "$ADAPTER_R2" \
    -q 20 \
    -m 18 \
    -o "$OUTPUT_DIR/${SAMPLE}_R1.trimmed.fastq.gz" \
    -p "$OUTPUT_DIR/${SAMPLE}_R2.trimmed.fastq.gz" \
    "$R1" "$R2" \
    > "$REPORT_DIR/${SAMPLE}.cutadapt.txt"
done

After trimming, run FastQC again and compare the pre- and post-trimming reports. In larger projects, aggregate the FastQC and Cutadapt output into a single project-level report, for example with MultiQC.
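
For a quick overview without extra tools, the per-sample reports written by the batch script can be condensed into one table. The sketch below assumes reports named *.cutadapt.txt and assumes they contain a line with the wording "written (passing filters)", as in reports from recent Cutadapt releases. The function name summarize_reports is hypothetical.

```shell
# Hypothetical helper: print one line per Cutadapt report with the
# sample name and the "written (passing filters)" line.
# Assumes the report wording used by recent Cutadapt releases.
summarize_reports() {
  local report_dir="$1" report
  for report in "$report_dir"/*.cutadapt.txt; do
    # Skip the unexpanded pattern when the directory holds no reports
    [ -e "$report" ] || continue
    printf '%s\t%s\n' \
      "$(basename "$report" .cutadapt.txt)" \
      "$(grep -m 1 'written (passing filters)' "$report" | tr -s ' ')"
  done
}

# Example call (directory name taken from the batch script above):
# summarize_reports cutadapt_reports
```

If a report lacks the expected line, the second column is left empty, which makes unexpected report formats easy to spot.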

Next steps after trimming

After trimming and filtering, reads can be used for downstream analysis. The correct next step depends on the sequencing assay.

  1. Run FastQC on trimmed reads and compare results with the raw-read reports.
  2. Proceed to read alignment, transcript quantification, variant calling, taxonomic classification, or another assay-specific workflow.
  3. Document Cutadapt version, adapter sequences, quality cutoffs, minimum/maximum lengths, and all command-line parameters.
  4. Keep both raw and processed FASTQ files, unless your project data-management policy specifies otherwise.
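
The documentation step in the list above can be partly automated. The sketch below appends the date, the Cutadapt version, and the full command line to a log file. The function name log_trimming_run and the log layout are hypothetical, and the example file names are placeholders.

```shell
# Hypothetical helper: append a provenance record for one trimming run.
# Arguments: log file, then the full command line that was (or will be) run.
log_trimming_run() {
  local log_file="$1"
  shift
  {
    printf 'date\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    # Record "unknown" if cutadapt is not on PATH
    printf 'cutadapt_version\t%s\n' "$(cutadapt --version 2>/dev/null || echo unknown)"
    printf 'command\t%s\n' "$*"
    printf '\n'
  } >> "$log_file"
}

# Example call (placeholder log path and file names):
# log_trimming_run trimming_log.tsv cutadapt -a AGATCGGAAGAGCACACGTCT -m 18 -o out.fastq.gz in.fastq.gz
```

Keeping such a log next to the trimmed FASTQ files makes it easier to reproduce a run later with the same adapter sequences and cutoffs.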