Read alignment with Bowtie, Bowtie2, BWA, HISAT2 and STAR.
This tutorial summarizes common Linux commands for building reference indexes and aligning NGS reads with short-read and RNA-seq aligners. It covers Bowtie and Bowtie2 for small RNA and general short reads, BWA for DNA-seq, HISAT2 for splice-aware RNA-seq, and STAR for high-performance transcriptome and genome alignment.
Read alignment maps sequencing reads to a reference genome, transcriptome, custom FASTA file, or RNA-class reference. The output is usually a SAM or BAM file that can be used for read counting, variant calling, peak calling, transcript quantification, or downstream quality control.
The original SciBerg page provides command examples for Bowtie, Bowtie2, BWA, HISAT2, and STAR. This updated page preserves the core commands and adds modern installation options, workflow context, validation steps, and reusable examples.
Input: FASTQ files and a matching reference FASTA or prebuilt reference index.
Processing: Build an index, run the aligner with suitable parameters, and produce SAM or BAM output.
Different aligners are optimized for different read lengths, error models, and biological assays. The table below gives a practical starting point.
Bowtie: Useful for short reads and small RNA fragments where very strict mismatch control is needed.
Bowtie2: General-purpose short-read alignment against genome, transcriptome, or custom FASTA references.
BWA-MEM: Common choice for DNA-seq, WGS, WES, and longer short reads mapped to a genome.
HISAT2: Splice-aware RNA-seq aligner suitable for genome-based transcriptome analysis.
STAR: Fast splice-aware RNA-seq aligner for genome and transcriptome outputs, requiring substantial memory for large genomes.
Custom references: For RNA-class, contaminant, viral, or small custom databases, Bowtie/Bowtie2 are often convenient.
Install the aligners
The original page uses manual binary and source installations. For modern reproducible analysis, Conda or Mamba environments are often easier to manage.
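As a rough illustration of the environment-based route, the sketch below wraps a single `create` call that installs all five aligners plus samtools. The environment name `ngs-align` and the function name `create_align_env` are hypothetical; the package names are assumed to match the current Bioconda recipe names, so check them against your channel before relying on this.

```shell
# Hypothetical environment name; change it to suit your project.
ENV_NAME="ngs-align"

create_align_env() {
    # Prefer mamba when it is installed, otherwise fall back to conda.
    local solver="conda"
    if command -v mamba >/dev/null 2>&1; then
        solver="mamba"
    fi
    # Assumed Bioconda package names for the tools covered on this page.
    "$solver" create -y -n "$ENV_NAME" -c conda-forge -c bioconda \
        bowtie bowtie2 bwa hisat2 star samtools
}

# Usage (uncomment to run):
# create_align_env
# conda activate "$ENV_NAME"
```

Keeping all tools in one named environment makes it easier to export the exact package list later for the methods section.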
Manual installation examples from the original workflow
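# Bowtie and Bowtie2 binary downloads, version examples from the original page
# -O sets the local filename; otherwise wget saves each file as "download"
wget -O bowtie-1.2.3-linux-x86_64.zip https://sourceforge.net/projects/bowtie-bio/files/bowtie/1.2.3/bowtie-1.2.3-linux-x86_64.zip/download
unzip bowtie-1.2.3-linux-x86_64.zip
wget -O bowtie2-2.3.5.1-linux-x86_64.zip https://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.3.5.1/bowtie2-2.3.5.1-linux-x86_64.zip/download
unzip bowtie2-2.3.5.1-linux-x86_64.zip
# BWA source example
wget -O bwa-0.7.17.tar.bz2 https://sourceforge.net/projects/bio-bwa/files/bwa-0.7.17.tar.bz2/download
tar xvjf bwa-0.7.17.tar.bz2
cd bwa-0.7.17
make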
Version note
The original SciBerg page lists historical versions such as Bowtie 1.2.3, Bowtie2 2.3.5.1, BWA 0.7.17, HISAT2 2.1.0, and STAR 2.6.1a. For new projects, use current stable releases and record exact versions in the methods section.
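One way to record exact versions is to capture the first line of each tool's version banner in a small log file. The sketch below assumes the tools are on `PATH`; the function name `record_tool_versions` and the tab-separated output layout are illustrative choices, not part of the original workflow.

```shell
record_tool_versions() {
    local out_file="$1"
    local tool
    : > "$out_file"
    for tool in bowtie bowtie2 hisat2 STAR samtools; do
        if command -v "$tool" >/dev/null 2>&1; then
            # Keep only the first line of each version banner.
            printf '%s\t%s\n' "$tool" "$("$tool" --version 2>&1 | head -n 1)" >> "$out_file"
        else
            printf '%s\tnot found\n' "$tool" >> "$out_file"
        fi
    done
    # bwa has no --version flag; its usage text (on stderr) has a "Version:" line.
    if command -v bwa >/dev/null 2>&1; then
        printf 'bwa\t%s\n' "$(bwa 2>&1 | grep -m 1 '^Version' || true)" >> "$out_file"
    else
        printf 'bwa\tnot found\n' >> "$out_file"
    fi
}

# Usage:
# record_tool_versions logs/tool_versions.tsv
```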
Bowtie and Bowtie2 alignment
Bowtie and Bowtie2 are convenient for mapping reads to genomes, transcriptomes, or custom FASTA references. Bowtie is especially useful when strict mismatch control is needed for short RNA fragments.
Build Bowtie and Bowtie2 indexes
# Bowtie index
bowtie-build reference.fa reference
# Bowtie2 index
bowtie2-build reference.fa reference
Bowtie mapping with mismatch control
# 0 mismatches: preferred for small RNA fragments of approximately 15–25 nt
bowtie -p 8 -v 0 reference INPUT.fastq -S INPUT_over_reference.sam
# 1 mismatch: often acceptable for RNA fragments of approximately 25–100 nt
bowtie -p 8 -v 1 reference INPUT.fastq -S INPUT_over_reference.sam
# 2 mismatches: often acceptable for longer fragments, depending on project goals
bowtie -p 8 -v 2 reference INPUT.fastq -S INPUT_over_reference.sam
In commands such as bowtie2 -x reference, the value after -x is the index prefix, not necessarily the FASTA filename. Keep index names simple and consistent.
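To make the prefix-versus-FASTA distinction concrete, here is a minimal single-end Bowtie2 wrapper that refuses to run when the prefix does not resolve to index files. The function name `bowtie2_align_single` and its argument order are hypothetical; the `-p`, `-x`, `-U`, and `-S` options are the same ones used elsewhere on this page.

```shell
bowtie2_align_single() {
    local index_prefix="$1" fastq="$2" out_sam="$3" threads="${4:-8}"
    # A prefix is usable only if bowtie2-build wrote index files under that name
    # (.bt2 for small indexes, .bt2l for large ones).
    if [ ! -f "${index_prefix}.1.bt2" ] && [ ! -f "${index_prefix}.1.bt2l" ]; then
        echo "No Bowtie2 index found for prefix: ${index_prefix}" >&2
        return 1
    fi
    bowtie2 -p "$threads" -x "$index_prefix" -U "$fastq" -S "$out_sam"
}

# Usage, after running: bowtie2-build reference.fa reference
# bowtie2_align_single reference INPUT.fastq INPUT_over_reference.sam 8
```

Passing `reference.fa` instead of `reference` would fail the check, because no file named `reference.fa.1.bt2` exists.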
BWA alignment
BWA-MEM is commonly used for DNA sequencing data such as WGS, WES, targeted DNA panels, and other genomic reads.
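A typical BWA-MEM run indexes the reference once, aligns the reads, and pipes the output straight into a coordinate-sorted BAM. The sketch below assumes single-end reads and samtools on `PATH`; the function name `bwa_mem_to_sorted_bam` and its argument order are hypothetical.

```shell
bwa_mem_to_sorted_bam() {
    local reference="$1" fastq="$2" out_bam="$3" threads="${4:-8}"
    # bwa index writes .amb, .ann, .bwt, .pac and .sa files next to the FASTA;
    # the .bwt file is used here as a marker that indexing has already run.
    if [ ! -f "${reference}.bwt" ]; then
        bwa index "$reference"
    fi
    # Stream alignments into samtools sort to avoid a large intermediate SAM.
    bwa mem -t "$threads" "$reference" "$fastq" \
        | samtools sort -@ "$threads" -o "$out_bam" -
    samtools index "$out_bam"
}

# Usage:
# bwa_mem_to_sorted_bam reference.fa INPUT.fastq INPUT_over_reference.sorted.bam 8
```

For paired-end data, `bwa mem` accepts the two FASTQ files as consecutive arguments after the reference.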
HISAT2 alignment
HISAT2 is a splice-aware aligner for RNA-seq data. It can use genome, SNP-aware, and transcriptome-aware index configurations. The original SciBerg page notes that building human GRCh38 transcriptome-aware indexes can require very large memory and suggests using prebuilt indexes.
Download prebuilt HISAT2 indexes
# Historical URLs from the original SciBerg workflow
wget ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/data/grch38.tar.gz
tar -xzf grch38.tar.gz
wget ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/data/grch38_snp.tar.gz
tar -xzf grch38_snp.tar.gz
wget ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/data/grch38_tran.tar.gz
tar -xzf grch38_tran.tar.gz
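Once an index is unpacked, a single-end HISAT2 run looks much like a Bowtie2 run. The sketch below is a minimal wrapper; the function name `hisat2_align_single` is hypothetical, and the index prefix `grch38/genome` in the usage line is an assumption about how the archive unpacks, so list the extracted directory to confirm the actual prefix.

```shell
hisat2_align_single() {
    local index_prefix="$1" fastq="$2" out_sam="$3" threads="${4:-8}"
    # --summary-file stores the alignment summary next to the SAM output.
    hisat2 -p "$threads" -x "$index_prefix" -U "$fastq" -S "$out_sam" \
        --summary-file "${out_sam%.sam}.hisat2_summary.txt"
}

# Usage (index prefix is an assumption; check the unpacked directory):
# hisat2_align_single grch38/genome INPUT.fastq INPUT_over_grch38.sam 8
```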
For direct alignment to transcriptome FASTA references rather than genome indexes, Bowtie or Bowtie2 can be more appropriate. For splice-aware genome alignment, use HISAT2 or STAR.
STAR alignment
STAR is a high-performance aligner widely used for RNA-seq. It supports genome alignment, sorted BAM output, splice-junction-aware alignment, and transcriptome output for downstream quantification.
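The STAR alignment commands in this section expect an existing index directory. A minimal sketch of index generation follows; the function name `star_build_index` and its argument order are hypothetical. The GTF is optional: leaving it out produces an annotation-free index, and when it is given, `--sjdbOverhang` is set to read length minus one.

```shell
star_build_index() {
    local fasta="$1" index_dir="$2" threads="${3:-8}"
    local gtf="${4:-}" read_length="${5:-101}"
    mkdir -p "$index_dir"
    # Reuse the positional parameters as the STAR argument list.
    set -- --runMode genomeGenerate --runThreadN "$threads" \
        --genomeDir "$index_dir" --genomeFastaFiles "$fasta"
    if [ -n "$gtf" ]; then
        set -- "$@" --sjdbGTFfile "$gtf" --sjdbOverhang "$(( read_length - 1 ))"
    fi
    STAR "$@"
}

# Usage:
# star_build_index genome.fa folder_with_STAR_indexes_no_gtf/ 8
# star_build_index genome.fa star_index_with_gtf/ 8 genome_annotation.gtf 101
```

For small references such as a transcriptome or custom FASTA, STAR may also need a reduced `--genomeSAindexNbases` value; consult the STAR manual for the recommended setting.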
# Genome alignment plus transcriptome BAM output
STAR \
    --runThreadN 8 \
    --genomeDir folder_with_STAR_indexes_no_gtf/ \
    --sjdbGTFfile folder_with_gtf_file/genome_annotation.gtf \
    --readFilesIn INPUT.fastq \
    --quantMode TranscriptomeSAM \
    --outFileNamePrefix INPUT
# Direct alignment to a transcriptome reference index
STAR \
    --runThreadN 8 \
    --genomeDir folder_with_STAR_indexes_for_transcriptome/ \
    --readFilesIn INPUT.fastq \
    --outFileNamePrefix INPUT
Compressed FASTQ files with STAR
For .fastq.gz input files, add --readFilesCommand zcat to STAR commands so reads are decompressed on the fly.
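A small wrapper can add the decompression option only when the input is actually gzip-compressed. This is a sketch under that assumption; the function name `star_align_auto` and its argument order are hypothetical.

```shell
star_align_auto() {
    local index_dir="$1" fastq="$2" prefix="$3" threads="${4:-8}"
    # Reuse the positional parameters as the STAR argument list.
    set -- --runThreadN "$threads" --genomeDir "$index_dir" \
        --readFilesIn "$fastq" --outFileNamePrefix "$prefix"
    # Add on-the-fly decompression only for .gz input.
    case "$fastq" in
        *.gz) set -- "$@" --readFilesCommand zcat ;;
    esac
    STAR "$@"
}

# Usage:
# star_align_auto folder_with_STAR_indexes_for_transcriptome/ INPUT.fastq.gz INPUT 8
```

On systems where `zcat` does not handle `.gz` files, `--readFilesCommand gunzip -c` is a common substitute.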
Reusable alignment workflow skeleton
The script below shows a simple Bowtie2 alignment workflow that builds an index if needed, maps single-end reads, converts SAM to sorted BAM, and records mapping statistics.
#!/usr/bin/env bash
set -euo pipefail
THREADS=8
REFERENCE="reference.fa"
INDEX_PREFIX="reference_index/reference"
FASTQ="INPUT.fastq.gz"
OUT_PREFIX="INPUT_over_reference"
mkdir -p "$(dirname "$INDEX_PREFIX")" alignments logs
# Build Bowtie2 index if missing
if [[ ! -f "${INDEX_PREFIX}.1.bt2" && ! -f "${INDEX_PREFIX}.1.bt2l" ]]; then
    bowtie2-build "$REFERENCE" "$INDEX_PREFIX"
fi
# Align reads
bowtie2 \
    -q \
    -p "$THREADS" \
    -x "$INDEX_PREFIX" \
    -U "$FASTQ" \
    -S "alignments/${OUT_PREFIX}.sam" \
    2> "logs/${OUT_PREFIX}.bowtie2.log"
# Convert SAM to sorted BAM (samtools sort reads SAM input directly)
samtools sort -@ "$THREADS" -o "alignments/${OUT_PREFIX}.sorted.bam" \
    "alignments/${OUT_PREFIX}.sam"
samtools index "alignments/${OUT_PREFIX}.sorted.bam"
# Optional: remove large intermediate SAM after validation
# rm "alignments/${OUT_PREFIX}.sam"
samtools flagstat "alignments/${OUT_PREFIX}.sorted.bam" \
    > "logs/${OUT_PREFIX}.flagstat.txt"
echo "Done. Alignment files are in alignments/ and logs are in logs/"
Next steps after read alignment
After alignment, inspect mapping statistics and continue with the assay-specific downstream analysis.
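As one way to inspect mapping statistics, the sketch below pulls an overall mapping rate out of a `samtools flagstat` report. The function name `flagstat_mapping_rate` is hypothetical, and the parser assumes the default text layout of flagstat, where each line starts with the QC-passed count.

```shell
flagstat_mapping_rate() {
    # Prints mapped / total as a percentage, counting all alignment records
    # (secondary and supplementary included), or NA for an empty report.
    awk '
        / in total /                { total = $1 }
        / mapped \(/ && !/primary/  { mapped = $1 }
        END {
            if (total > 0) printf "%.2f\n", 100 * mapped / total
            else print "NA"
        }
    ' "$1"
}

# Usage:
# flagstat_mapping_rate logs/INPUT_over_reference.flagstat.txt
```

A rate far below what the assay normally gives is a cue to check the reference choice, adapter trimming, and possible contamination before continuing.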