Manipulating FASTA and FASTQ files in Linux Bash/Shell
This tutorial collects practical one-line commands and reusable shell snippets for common FASTA and FASTQ operations: counting reads, inspecting file ranges, converting FASTQ to FASTA, extracting motif-containing reads, counting FASTA records, cleaning headers, and converting multi-line FASTA files into single-line format.
Linux command-line tools are extremely useful for quick inspection and manipulation of sequencing files. Many everyday operations can be performed with standard tools such as cat, zcat, wc, sed, awk, grep, perl, and bc.
The examples below modernize the original SciBerg command list and add context, validation steps, and reusable snippets for more reproducible work.
FASTQ: Count reads, inspect records, convert to FASTA, and extract reads containing motifs.
FASTA: Count records, extract headers, clean descriptions, and convert multi-line records to one-line format.
Workflow hygiene: Always document filenames, commands, tool versions, and output files for reproducibility.
Important file-format reminder
A standard FASTQ record has four lines: header, sequence, separator, and quality string. That is why line counts are divided by four when estimating the number of reads in a valid FASTQ file.
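Dividing by four is only meaningful when the line count is itself a multiple of four. A minimal sanity check along these lines can catch truncated files early. The function name `check_fastq_lines` is a hypothetical helper, and the check assumes an uncompressed file with no blank lines or wrapped sequences.

```shell
# Hypothetical helper: report the record count, or warn when the
# line count is not a multiple of four (possible truncation).
check_fastq_lines() {
    local file="$1"
    local lines
    lines=$(wc -l < "$file")
    if (( lines % 4 != 0 )); then
        echo "WARNING: $file has $lines lines (not a multiple of 4)" >&2
        return 1
    fi
    echo "$file: $(( lines / 4 )) records"
}
```

Usage: `check_fastq_lines fastq_file.fastq`. This verifies only the line count, not the content of each record.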
FASTQ operations
Count reads in an uncompressed FASTQ file
echo $(cat fastq_file.fastq | wc -l)/4 | bc
Count reads in a gzip-compressed FASTQ file
echo $(zcat fastq_file.fastq.gz | wc -l)/4 | bc
A slightly more explicit version stores the line count first and then divides by four:
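One way to write this, as a minimal sketch: the function name `count_fastq_reads` is a hypothetical helper, and `gzip -dc` is used in place of `zcat` on the assumption that it behaves the same way on both GNU and BSD systems.

```shell
# Hypothetical helper: store the line count first, then divide by four.
count_fastq_reads() {
    local file="$1"
    local lines
    if [[ "$file" == *.gz ]]; then
        # Stream the compressed file without writing an uncompressed copy.
        lines=$(gzip -dc "$file" | wc -l)
    else
        lines=$(wc -l < "$file")
    fi
    # Shell integer arithmetic replaces the call to bc.
    echo $(( lines / 4 ))
}
```

Usage: `count_fastq_reads fastq_file.fastq` or `count_fastq_reads fastq_file.fastq.gz`.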
Print a range of lines from a FASTQ file
Use sed or awk to print a range of lines from a large FASTQ file without opening the entire file in a text editor.
# Print lines 530 to 640
sed -n '530,640p;641q' fastq_file.fastq
# Print lines 530 to 540, then stop reading the file
awk 'FNR>540 {exit} FNR>=530' fastq_file.fastq
wc -l: Counts the number of lines in a file or stream.
zcat: Streams a gzip-compressed file without creating an uncompressed copy.
sed -n: Prints selected line ranges efficiently.
awk: Processes records and line ranges using programmable conditions.
Convert FASTQ to FASTA
FASTQ-to-FASTA conversion removes quality-score lines and outputs only sequence headers and sequences. This is useful for some motif-search, BLAST, reference-building, or quick inspection tasks.
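A minimal sketch of the conversion, assuming strict four-line FASTQ records (no wrapped sequences). The function name `fastq_to_fasta` is a hypothetical helper. It reads from the files given as arguments, or from standard input when none are given.

```shell
# Hypothetical helper: keep lines 1 and 2 of every four-line record.
fastq_to_fasta() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }   # header: swap leading "@" for ">"
         NR % 4 == 2 { print }                     # sequence line, unchanged
        ' "$@"
}
```

Usage: `fastq_to_fasta fastq_file.fastq > fasta_file.fa`, or for compressed input, `gzip -dc fastq_file.fastq.gz | fastq_to_fasta > fasta_file.fa`. The separator and quality lines (lines 3 and 4 of each record) are dropped.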
Convert multi-line FASTA to single-line FASTA
Some tools or downstream scripts require each FASTA record to contain a single sequence line after the header.
awk '!/^>/ { printf "%s", $0; n = "\n" } /^>/ { print n $0; n = "" } END { printf "%s", n }' \
multi_line.fa \
> single_line.fa
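To confirm the result, one option is to compare the number of header lines with the total number of lines: in a single-line FASTA file, the total should be exactly twice the record count. The sketch below assumes the file has no blank lines, and the function name `is_single_line_fasta` is a hypothetical helper.

```shell
# Hypothetical helper: succeed only if every record is exactly
# one header line followed by one sequence line.
is_single_line_fasta() {
    local file="$1"
    local records lines
    # grep -c exits non-zero on zero matches; "|| true" keeps the count of 0.
    records=$(grep -c '^>' "$file" || true)
    lines=$(grep -c '' "$file" || true)
    (( records > 0 && lines == 2 * records ))
}
```

Usage: `is_single_line_fasta single_line.fa && echo "OK"`. Counting headers with `grep -c '^>'` is also a quick way to check that the conversion kept the same number of records as the input.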
Header-cleaning caution
Cleaning headers can break compatibility with annotation files if IDs are expected to match exactly. Preserve a copy of the original reference and document any header transformations.
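As an illustration of a conservative transformation, the sketch below keeps only the ID (everything up to the first whitespace character) and writes to a new file, leaving the original untouched. Both function names are hypothetical helpers, and whether truncating at the first whitespace is appropriate depends on how your annotation files name the sequences.

```shell
# Hypothetical helper: strip descriptions from header lines only,
# writing the result to a separate output file.
clean_fasta_headers() {
    local in="$1" out="$2"
    sed '/^>/ s/[[:space:]].*$//' "$in" > "$out"
}

# Hypothetical helper: list header IDs that occur more than once,
# which can happen when truncation removes a distinguishing description.
duplicate_fasta_ids() {
    local file="$1"
    grep '^>' "$file" | sort | uniq -d
}
```

Usage: `clean_fasta_headers reference.fa reference.clean.fa`, then `duplicate_fasta_ids reference.clean.fa`. Empty output from the second command means the cleaned IDs are still unique.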
Reusable workflow snippets
Summarize FASTQ and FASTA files in a project folder
#!/usr/bin/env bash
# Print a tab-separated record count for every FASTQ and FASTA file in the current folder.
set -euo pipefail
printf 'file\ttype\trecords\n'
for fq in *.fastq *.fastq.gz; do
    [[ -e "$fq" ]] || continue   # skip patterns that matched nothing
    if [[ "$fq" == *.gz ]]; then
        lines=$(zcat "$fq" | wc -l)
    else
        lines=$(wc -l < "$fq")
    fi
    printf '%s\tFASTQ\t%s\n' "$fq" "$(( lines / 4 ))"
done
for fa in *.fa *.fasta; do
    [[ -e "$fa" ]] || continue
    # grep -c exits non-zero when it finds no headers; "|| true" keeps set -e from aborting.
    records=$(grep -c '^>' "$fa" || true)
    printf '%s\tFASTA\t%s\n' "$fa" "$records"
done
Convert all FASTQ files in a folder to FASTA
#!/usr/bin/env bash
set -euo pipefail
mkdir -p fasta_output
for fq in *.fastq; do
    [[ -e "$fq" ]] || continue
    sample="${fq%.fastq}"
    # Line 1 of each record: store the header without the leading "@"; line 2: print header and sequence.
    awk 'NR%4==1 {a=substr($0,2)} NR%4==2 {print ">" a "\n" $0}' \
        "$fq" \
        > "fasta_output/${sample}.fa"
done
Next steps
After basic FASTA/FASTQ manipulation, continue with quality control, trimming, reference preparation, read alignment, or quantification depending on the analysis goal.
Run FastQC before major downstream analysis steps.
Trim adapters or low-quality bases when required.
Prepare clean FASTA references and remove duplicates where appropriate.
Build aligner indexes after any reference modification.
Document every command and preserve raw data unchanged.