SAM and BAM files are standard alignment outputs generated by tools such as Bowtie, Bowtie2, BWA, HISAT2, and STAR. This tutorial shows how to install samtools, convert SAM to BAM, sort alignments by coordinate, create indexes, inspect headers, and generate basic mapping summaries.
samtools is one of the most widely used command-line toolkits for working with SAM, BAM, and CRAM alignment files. After read alignment, typical operations include converting SAM to BAM, sorting BAM files, indexing sorted BAM files, extracting subsets of reads, and generating mapping statistics.
The original SciBerg page provides the core installation and conversion commands. This updated page preserves those commands and adds practical examples for modern NGS workflows.
Input: SAM or BAM files produced by short-read or RNA-seq aligners.
Processing: Convert, sort, index, filter, and summarize alignment files.
Output: Sorted BAM files, BAM indexes, mapping statistics, and analysis-ready alignments.
Install samtools
The original SciBerg workflow installs samtools from source. For reproducible projects, a Conda or Mamba environment is usually the easiest option.
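A minimal sketch of the Conda/Mamba route, assuming the conda-forge and Bioconda channels are reachable from your machine. The helper name create_samtools_env and the default environment name "samtools" are hypothetical; the function uses mamba when it is installed and falls back to conda otherwise.

```shell
# Hypothetical helper: create a Conda/Mamba environment that contains samtools.
# Usage: create_samtools_env [env_name] [package_spec]
#   env_name      defaults to "samtools" (an illustrative name)
#   package_spec  defaults to "samtools"; pass "samtools=<version>" to pin a release
create_samtools_env() {
  local env_name="${1:-samtools}"
  local spec="${2:-samtools}"
  local installer="conda"

  # Prefer mamba, a faster drop-in front end for conda, when it is available.
  if command -v mamba >/dev/null 2>&1; then
    installer="mamba"
  fi

  # Channels given with -c are searched in order, so conda-forge takes
  # priority over bioconda, matching the channel order Bioconda recommends.
  "$installer" create -y -n "$env_name" -c conda-forge -c bioconda "$spec"
}
```

After create_samtools_env finishes, conda activate samtools puts the tool on PATH, and samtools --version confirms which release was installed.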
Source-install example from the original SciBerg workflow
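# Download the source archive (samtools 1.9, as in the original workflow)
wget https://github.com/samtools/samtools/releases/download/1.9/samtools-1.9.tar.bz2
# Unpack the archive and enter the source directory
tar -vxjf samtools-1.9.tar.bz2
cd samtools-1.9
# Configure, compile, and install (the default install prefix is /usr/local)
./configure
make
sudo make install
# Check that the installed binary runs
samtools --version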
Version note
The original command uses samtools 1.9 as an example. For new projects, use a current stable release and record the exact samtools version in your workflow documentation.
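One way to follow this note is to save the first line of samtools --version to a text file kept alongside the results. The function name and the default file name samtools_version.txt below are illustrative assumptions, not part of samtools.

```shell
# Hypothetical helper: record the samtools version used for a project.
# Usage: record_samtools_version [output_file]
record_samtools_version() {
  local out="${1:-samtools_version.txt}"
  # The first line of `samtools --version` names the samtools release;
  # the following lines describe the htslib version and build options.
  samtools --version | head -n 1 > "$out"
  # Print the recorded line so it also appears in the run log.
  cat "$out"
}
```

Calling record_samtools_version results/samtools_version.txt at the start of a workflow leaves a one-line record that can be copied into the methods section.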
SAM versus BAM files
SAM is a human-readable text format. BAM is the compressed binary representation of SAM and is more efficient for storage, indexing, and downstream analysis.
SAM: Text-based alignment format. Easy to inspect but large and slow for big datasets.
BAM: Compressed binary alignment format. Preferred for most downstream workflows.
Coordinate-sorted BAM: Alignments sorted by genomic position. Required for indexing and many genome-browser workflows.
BAM index: .bai or related index file that allows fast random access to genomic regions.
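The sort order a BAM file declares is stored in the SO tag of its @HD header line, and samtools index expects coordinate-sorted input. The sketch below (all function names hypothetical) reads that tag before indexing, then uses the index for a region query. The SO tag only records what the producing tool claimed; it does not prove the records are actually in that order.

```shell
# Hypothetical helper: print the SO (sort order) value from the @HD header line,
# e.g. "coordinate", "queryname", or "unsorted". Prints nothing if the tag is absent.
bam_sort_order() {
  samtools view -H "$1" |
    awk -F'\t' '$1 == "@HD" {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SO:/) { print substr($i, 4); exit }
      }
    }'
}

# Hypothetical helper: index a BAM file only if its header declares coordinate order.
index_if_sorted() {
  local bam="$1"
  local order
  order="$(bam_sort_order "$bam")"
  if [ "$order" = "coordinate" ]; then
    samtools index "$bam"   # by default writes <bam>.bai next to the input
  else
    echo "not indexing $bam: header sort order is '${order:-missing}'" >&2
    return 1
  fi
}

# Hypothetical helper: write reads overlapping one region (e.g. chr1:10000-20000)
# to a new BAM file. Region queries need the index created above.
extract_region() {
  local bam="$1" region="$2" out="$3"
  samtools view -b -o "$out" "$bam" "$region"
}
```

Running index_if_sorted before extract_region keeps an unsorted file from reaching samtools index, where it would fail with a less obvious error.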
Keep intermediate files only when needed
SAM files can be very large. Once a sorted BAM file and logs have been validated, intermediate SAM files are often removed to save disk space.
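A cautious version of this clean-up step can make the deletion depend on samtools quickcheck, which exits with a non-zero status when a file's header is missing or invalid or, for BAM, when the end-of-file marker is absent. quickcheck does not read the records in between, so it is a fast sanity check rather than full validation; the function name below is hypothetical.

```shell
# Hypothetical helper: delete an intermediate SAM file only if the BAM passes quickcheck.
# Usage: remove_sam_if_bam_ok INPUT.sam INPUT.sorted.bam
remove_sam_if_bam_ok() {
  local sam="$1" bam="$2"
  # quickcheck inspects the header and, for BAM, the EOF block; not every record.
  if samtools quickcheck "$bam"; then
    rm -- "$sam"
    echo "removed $sam"
  else
    echo "keeping $sam: $bam failed samtools quickcheck" >&2
    return 1
  fi
}
```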
Convert SAM to BAM
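The core conversion command from the original SciBerg page converts a SAM file into a BAM file. The -b option requests BAM output, -@ sets the number of compression threads used in addition to the main thread, and -o names the output file.
# Replace 4 with the number of additional threads available on your machine
samtools view -b -@ 4 INPUT.sam -o INPUT.bam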
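When a run produces several SAM files, the same command can be applied in a loop. This sketch assumes all SAM files sit in one directory and that each BAM should be written next to its SAM with the .sam suffix replaced by .bam; the function name and the default of 4 threads are illustrative.

```shell
# Hypothetical helper: convert every *.sam file in a directory to BAM.
# Usage: convert_all_sam DIRECTORY [THREADS]
convert_all_sam() {
  local dir="$1"
  local threads="${2:-4}"   # additional compression threads passed to -@
  local sam
  for sam in "$dir"/*.sam; do
    # If nothing matches, the glob stays as literal text; skip it.
    [ -e "$sam" ] || continue
    echo "converting $sam"
    samtools view -b -@ "$threads" "$sam" -o "${sam%.sam}.bam" || return 1
  done
}
```

The loop stops at the first failed conversion instead of silently moving on, so a truncated SAM file is noticed before later steps run.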
The script below converts one SAM file to a coordinate-sorted BAM file, creates an index, writes QC summaries, and optionally removes the intermediate BAM file.