Beginner Command-Line Tutorial

Introduction to the Unix command line

A beginner-friendly tutorial for scientists, students and bioinformatics users who want to understand the terminal, navigate folders, inspect files, combine commands, work safely with large datasets and prepare for reproducible NGS data analysis on Linux, macOS or remote servers.

1. What is the Unix command line?

The Unix command line is a text-based way to control a computer. You type commands into a terminal, and the shell interprets those commands. On Linux servers, HPC clusters and many bioinformatics workstations, the command line is the main interface for running analyses.

Terminal: The application window where you type commands and see output.
Shell: The program that reads your commands. Bash and Zsh are common shells.
Commands: Programs or shell features that perform actions such as listing files, copying data or running analyses.
Why learn it? A single command can process thousands of files, repeat an analysis exactly, document your workflow and run tools that have no graphical interface.

2. Terminal, shell and prompt

When you open a terminal, you usually see a prompt. The prompt may show your username, computer name, current directory and a symbol such as $. You type commands after the prompt and press Enter.

Example prompt and command
andrey@workstation:~/projects$ pwd
/home/andrey/projects

Symbol / term  Meaning                                                     Example
$              Typical prompt for a normal user.                           $ ls
#              Often indicates a root or administrator shell. Be careful.  # apt update
~              Your home directory.                                        cd ~/projects
.              The current directory.                                      cp file.txt ./backup/
..             The parent directory.                                       cd ..

3. Filesystem and paths

Unix-like systems organize files in a tree. The top of the tree is the root directory, written as /. Your personal files are usually in your home directory, for example /home/username on Linux or /Users/username on macOS.

Absolute path: A full path from the root directory, such as /home/andrey/projects/sample.fastq.gz.
Relative path: A path starting from the current directory, such as data/sample.fastq.gz or ../results.

Common filesystem locations
/                 # root directory
/home/username    # user home directory on Linux
/Users/username   # user home directory on macOS
/tmp              # temporary files
/mnt              # mounted disks or network locations on many Linux systems
/project          # project storage on some servers or clusters
/scratch          # temporary high-performance storage on some clusters
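
As a rough illustration of the difference between absolute and relative paths, the sketch below builds a throwaway directory tree and reaches the same file both ways. The names path_demo and sample.txt are made up for this example, and mktemp -d is assumed to be available (it is on most Linux and macOS systems).

```shell
# Work in a throwaway directory so nothing real is touched (names are hypothetical).
demo=$(mktemp -d)
mkdir -p "$demo/path_demo/data" "$demo/path_demo/results"
echo "example" > "$demo/path_demo/data/sample.txt"

cd "$demo/path_demo/results"

# Absolute path: starts at the root directory and works from any location.
abs_content=$(cat "$demo/path_demo/data/sample.txt")

# Relative path: starts at the current directory; .. steps up to path_demo.
rel_content=$(cat ../data/sample.txt)

echo "absolute: $abs_content"
echo "relative: $rel_content"
```

Both commands read the same file. The relative form is shorter, but it only works while the current directory is results.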

5. Creating, copying, moving and deleting files

File operations are powerful and sometimes irreversible. Learn them carefully on test files before using them on real data.

Command   Purpose                                 Example
mkdir     Create directory.                       mkdir results
touch     Create empty file or update timestamp.  touch notes.txt
cp        Copy files or directories.              cp notes.txt notes_backup.txt
mv        Move or rename files.                   mv notes.txt project_notes.txt
rm        Remove files.                           rm old_file.txt
rm -r     Remove directories recursively.         rm -r old_folder

Be careful with rm: deleted files are usually not moved to a recycle bin. Avoid commands such as rm -rf * unless you fully understand where you are and what will be removed.
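
One way to practise these commands safely is to run them in a temporary directory first. The following is a minimal sketch under that assumption; every file and folder name in it is hypothetical.

```shell
# Practise in a throwaway directory (all names here are hypothetical).
practice=$(mktemp -d)
cd "$practice"

mkdir results                       # create a directory
touch notes.txt                     # create an empty file
cp notes.txt notes_backup.txt       # copy: both files now exist
mv notes.txt project_notes.txt      # rename: notes.txt no longer exists
cp -r results results_copy          # copy a directory recursively

ls                                  # preview what is here before deleting
rm notes_backup.txt                 # remove one file
rm -r results_copy                  # remove a directory and its contents

ls                                  # only project_notes.txt and results remain
```

When working interactively, rm -i asks for confirmation before each deletion, which can be a useful safety net while learning.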

6. Viewing and summarizing text files

Many scientific files are plain text or compressed text: FASTQ, FASTA, SAM, VCF, GTF, BED, CSV and TSV. Unix commands let you inspect them quickly.

Command   Purpose                               Example
cat       Print entire file to screen.          cat README.txt
less      View long files page by page.         less sample.vcf
head      Show first lines.                     head -n 20 sample.fastq
tail      Show last lines.                      tail -n 20 log.txt
wc        Count lines, words or bytes.          wc -l genes.txt
zcat      Print gzip-compressed files as text.  zcat reads.fastq.gz | head

Create a small example file
cat > samples.tsv << 'EOF'
sample_id	group	batch
S1	control	A
S2	control	A
S3	treated	B
S4	treated	B
EOF

cat samples.tsv
head -n 2 samples.tsv
wc -l samples.tsv
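
The same inspection commands can be applied to compressed text by streaming it through a decompressor. The sketch below is one possible way to do this; it uses gzip -dc rather than zcat because zcat on some macOS versions expects .Z files, and it rebuilds the example table with printf so that the tab characters are explicit.

```shell
demo=$(mktemp -d)
cd "$demo"

# Build a small tab-separated file; printf turns \t into a real tab character.
printf 'sample_id\tgroup\tbatch\n' > samples.tsv
printf 'S1\tcontrol\tA\n' >> samples.tsv
printf 'S2\tcontrol\tA\n' >> samples.tsv
printf 'S3\ttreated\tB\n' >> samples.tsv
printf 'S4\ttreated\tB\n' >> samples.tsv

# Compress a copy and keep the original (-c writes to standard output).
gzip -c samples.tsv > samples.tsv.gz

# Stream the decompressed text and look at the first two lines only.
gzip -dc samples.tsv.gz | head -n 2

# Count lines without writing a decompressed file to disk.
# tr removes the padding spaces that some versions of wc add.
n_lines=$(gzip -dc samples.tsv.gz | wc -l | tr -d ' ')
echo "lines: $n_lines"
```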

7. Wildcards, quoting and tab completion

Wildcards help you select many files at once. Quoting protects spaces and special characters from being interpreted by the shell.

* matches any number of characters, including none. Example: ls *.fastq.gz
? matches exactly one character. Example: ls sample?.txt
Quotes: Use quotes around file names with spaces: cat "my file.txt"
Tab completion: Press Tab to autocomplete file names and reduce typing errors.

Wildcards are expanded before the command runs. Always test with ls before using wildcards in destructive commands.
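
The behaviour described above can be tried on a few empty files. This is only a sketch; the file names are invented for the demonstration.

```shell
demo=$(mktemp -d)
cd "$demo"
touch sample1.txt sample2.txt sample10.txt "my file.txt"

# * matches any number of characters: all four .txt files.
echo *.txt

# ? matches exactly one character, so sample10.txt is not matched.
one_char=$(echo sample?.txt)
echo "$one_char"

# Quotes keep a name with a space together as one argument.
ls "my file.txt"

# Preview a wildcard with echo (or ls) before using it with rm.
echo rm sample?.txt
rm sample?.txt

remaining=$(echo *.txt)
echo "$remaining"
```

Putting echo in front of a destructive command prints the expanded command line without running it, which shows exactly which files the wildcard selected.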

8. Pipes and redirection

Pipes and redirection are central Unix ideas. A pipe sends the output of one command into another command. Redirection saves output to a file or reads input from a file.

Operator  Meaning                                            Example
|         Pipe output to another command.                    cat samples.tsv | wc -l
>         Write output to file, replacing existing content.  ls > files.txt
>>        Append output to file.                             date >> log.txt
2>        Redirect error messages.                           command 2> errors.log
<         Read input from file.                              sort < names.txt

Examples with pipes and redirection
cut -f 2 samples.tsv
cut -f 2 samples.tsv | sort
cut -f 2 samples.tsv | sort | uniq -c

cut -f 2 samples.tsv | sort | uniq -c > group_counts.txt
cat group_counts.txt
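
The difference between replacing, appending and redirecting errors is easiest to see on a scratch file. A minimal sketch, assuming a temporary directory and the hypothetical file names log.txt and errors.log:

```shell
demo=$(mktemp -d)
cd "$demo"

# > creates the file, or replaces its content if it already exists.
echo "first" > log.txt
echo "second" > log.txt      # "first" is gone

# >> appends to the end instead.
echo "third" >> log.txt

# 2> captures error messages; standard output is unaffected.
# Listing a file that does not exist is used here only to provoke an error.
ls no_such_file 2> errors.log || true

# < feeds a file to a command as standard input.
sorted=$(sort < log.txt)

cat log.txt
```

After these commands log.txt holds two lines, second and third, and errors.log holds the error message from ls.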

9. Text processing: grep, sort, uniq, cut, awk and sed

Unix text-processing tools are especially useful for logs, metadata tables, genomic intervals and annotation files.

Command   Purpose                            Example
grep      Search for text patterns.          grep treated samples.tsv
sort      Sort lines.                        sort names.txt
uniq      Collapse repeated adjacent lines.  sort names.txt | uniq -c
cut       Extract columns or characters.     cut -f 1 samples.tsv
awk       Process columns and patterns.      awk '$2=="treated"' samples.tsv
sed       Stream editing and substitutions.  sed 's/treated/case/g' samples.tsv

Text-processing examples
# Show treated samples
grep treated samples.tsv

# Count samples per group
tail -n +2 samples.tsv | cut -f 2 | sort | uniq -c

# Extract sample IDs from batch A
awk 'BEGIN{FS="\t"} NR>1 && $3=="A" {print $1}' samples.tsv
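
Two details of these tools are easy to miss: uniq only collapses lines that are next to each other, and sed prints the edited text without changing the input file. The sketch below illustrates both on a small made-up file called groups.txt.

```shell
demo=$(mktemp -d)
cd "$demo"
printf 'control\ntreated\ncontrol\ntreated\n' > groups.txt

# Without sorting, no two equal lines are adjacent, so uniq collapses nothing.
unsorted_n=$(uniq groups.txt | wc -l | tr -d ' ')

# After sorting, equal lines sit next to each other and collapse to two.
sorted_n=$(sort groups.txt | uniq | wc -l | tr -d ' ')

echo "uniq alone: $unsorted_n lines; sort | uniq: $sorted_n lines"

# sed writes the edited text to standard output; groups.txt itself is unchanged.
sed 's/treated/case/g' groups.txt > groups_renamed.txt
case_n=$(grep -c '^case$' groups_renamed.txt)
echo "lines renamed to case: $case_n"
```

This is why the counting examples always place sort before uniq -c.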

10. Permissions and executable files

Unix permissions control who can read, write or execute a file. Use ls -l to see permissions.

Inspect permissions
ls -l script.sh

# Example output:
# -rwxr-xr-x 1 user user 120 Jan 01 10:00 script.sh

Permission  Meaning for files                     Meaning for directories
r           Read file contents.                   List directory contents.
w           Modify file.                          Create, delete or rename files inside directory.
x           Execute file as a program or script.  Enter directory with cd.

Make a script executable
chmod +x script.sh
./script.sh
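
To see how chmod changes the permission string that ls -l reports, the following sketch sets explicit permissions on a tiny script and then modifies them. The file names are hypothetical, and the numeric mode 644 (owner read and write, everyone else read only) is used only to give the example a known starting point.

```shell
demo=$(mktemp -d)
cd "$demo"

# A tiny script; the file name is hypothetical.
printf '#!/bin/sh\necho "hello from script"\n' > script.sh

# Set explicit starting permissions: owner read/write, everyone else read.
chmod 644 script.sh
before=$(ls -l script.sh | cut -c 1-10)

# Add execute permission for the owner only, then run the script.
chmod u+x script.sh
after=$(ls -l script.sh | cut -c 1-10)
output=$(./script.sh)

echo "$before -> $after"
echo "$output"

# Remove write permission for everyone, for example to protect raw data.
touch raw_reads.txt
chmod a-w raw_reads.txt
ls -l raw_reads.txt
```

The letters u, g, o and a in chmod stand for the owning user, the group, others and all three together.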

11. Processes, jobs and stopping commands

A running command is a process. Long analyses may run for minutes, hours or days. Learn how to monitor and stop commands safely.

Command / shortcut  Purpose                                 Example
Ctrl-C              Interrupt the current command.          Stop a command that is running in the terminal.
Ctrl-Z              Suspend the current command.            Pause a foreground process.
ps                  List processes.                         ps aux | grep python
top / htop          Monitor CPU and memory.                 htop
jobs                List shell jobs.                        jobs
fg / bg             Move jobs to foreground or background.  fg %1
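
Ctrl-C and Ctrl-Z only work while you are typing in the terminal. As a rough non-interactive counterpart, the sketch below starts a harmless background process and stops it with kill, a standard command that is not listed in the table above. The sleep command stands in for a long analysis.

```shell
# Start a harmless long-running command in the background with &.
sleep 60 &
pid=$!                       # $! holds the process ID of the last background command
echo "started sleep with PID $pid"

# kill -0 sends no signal; it only checks that the process exists.
kill -0 "$pid" && echo "process is running"

# Ask the process to terminate (SIGTERM, the default signal of kill).
kill "$pid"

# wait collects the finished process; its exit status is non-zero after a kill.
wait "$pid" 2>/dev/null || true

if kill -0 "$pid" 2>/dev/null; then state="running"; else state="stopped"; fi
echo "process is now $state"
```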

12. Working on remote servers

Many bioinformatics analyses run on remote Linux servers or HPC clusters. You usually connect with SSH and transfer files with SCP or rsync.

SSH and file transfer examples
# Connect to a remote server
ssh username@server.example.org

# Copy one file to the server
scp sample.fastq.gz username@server.example.org:/project/data/

# Copy a folder recursively
scp -r results username@server.example.org:/project/results/

# Synchronize folders efficiently
rsync -avh --progress data/ username@server.example.org:/project/data/

For long-running remote work, tools such as screen, tmux, SLURM job scripts or workflow engines are safer than leaving commands in an ordinary SSH session, because a command started in a plain SSH session is usually terminated when the connection drops.

13. PATH, software and help pages

The shell uses the PATH variable to find programs. When you type samtools, the shell searches directories listed in PATH until it finds an executable called samtools.

Inspect software and PATH
echo "$PATH"
which bash
which python
which samtools

bash --version
python --version

man ls
ls --help    # GNU/Linux; the BSD ls on macOS may not support --help, so use man ls there

which: Shows which executable will run when you type a command.
man: Opens the manual page for many Unix commands. Press q to quit.
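
The PATH lookup can be observed directly by adding a directory to PATH and asking the shell where it finds a program. In this sketch hello_tool is a made-up stand-in program, not a real package, and command -v is used as a portable alternative to which.

```shell
demo=$(mktemp -d)
mkdir "$demo/bin"

# A stand-in program; hello_tool is a made-up name, not a real package.
printf '#!/bin/sh\necho "hello_tool 0.1"\n' > "$demo/bin/hello_tool"
chmod +x "$demo/bin/hello_tool"

# Before: the shell cannot find it because its directory is not in PATH.
command -v hello_tool || echo "hello_tool not found"

# Prepend the directory to PATH for this shell session only.
PATH="$demo/bin:$PATH"
export PATH

# After: command -v reports the full location of the executable.
found=$(command -v hello_tool)
echo "$found"
hello_tool
```

The change lasts only for the current shell session. Permanent changes are usually made in a shell startup file such as ~/.bashrc or ~/.zshrc.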

14. Writing simple shell scripts

A shell script stores commands in a file so that they can be repeated. Scripts are a major step toward reproducible analysis.

Create and run a script
cat > hello_unix.sh << 'EOF'
#!/usr/bin/env bash
set -euo pipefail    # stop on errors, unset variables and failed pipeline steps

echo "Current directory:"
pwd

echo "Files:"
ls -lh
EOF

chmod +x hello_unix.sh
./hello_unix.sh

Good script habits

  • Use clear file and folder names.
  • Add comments explaining important steps.
  • Save scripts in a scripts/ folder.
  • Write logs to a logs/ folder.
  • Record software versions in the output report.
  • Test scripts on small files before running them on full datasets.
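
Several of these habits can be combined in one small project layout. The following is a sketch under stated assumptions, not a recommended template: the script name count_groups.sh, the folder layout and the two-sample table are all invented for illustration.

```shell
project=$(mktemp -d)
cd "$project"
mkdir -p data results scripts logs
printf 'sample_id\tgroup\tbatch\nS1\tcontrol\tA\nS2\ttreated\tB\n' > data/samples.tsv

# Write the script into scripts/ (count_groups.sh is a hypothetical name).
cat > scripts/count_groups.sh << 'EOF'
#!/usr/bin/env bash
# Count samples per group in a tab-separated sample sheet.
set -euo pipefail    # stop on errors, unset variables and failed pipeline steps

input="data/samples.tsv"
output="results/group_counts.txt"

# Record when the script ran and which shell version was used.
echo "Run started: $(date)"
echo "Bash version: $BASH_VERSION"

# Skip the header line, keep the group column, then count each group.
tail -n +2 "$input" | cut -f 2 | sort | uniq -c > "$output"
echo "Wrote $output"
EOF

chmod +x scripts/count_groups.sh

# Run from the project root; save messages and errors to a log file.
./scripts/count_groups.sh > logs/count_groups.log 2>&1
cat logs/count_groups.log
```

Running the script on a two-sample table first makes it easy to check the output by eye before pointing it at a full dataset.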

15. Command-line examples for bioinformatics

The command line is especially useful for inspecting sequencing files and metadata. The following examples illustrate common operations. Adapt them to your data and tools.

Inspect compressed FASTQ files
# Show the first FASTQ record from a compressed file
zcat sample_R1.fastq.gz | head -n 4

# Count reads in a compressed FASTQ file (each record spans four lines)
echo $(( $(zcat sample_R1.fastq.gz | wc -l) / 4 ))

List FASTQ file sizes
ls -lh *.fastq.gz

# Save file sizes to a report
ls -lh *.fastq.gz > fastq_file_sizes.txt

Check a tab-separated sample sheet
# Show column names
head -n 1 samples.tsv

# Count samples by group
tail -n +2 samples.tsv | cut -f 2 | sort | uniq -c

# Find samples from one batch (the example sheet uses batches A and B)
awk 'BEGIN{FS="\t"} NR>1 && $3=="A" {print $1}' samples.tsv

FASTQ and BAM files can be very large. Prefer streaming commands, compression-aware tools and project scratch storage when working with NGS data.
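
To try the streaming approach without real sequencing data, the sketch below writes a tiny compressed FASTQ file and counts its reads. The reads are invented, and the calculation assumes the common layout of exactly four lines per record with no wrapped sequence lines.

```shell
demo=$(mktemp -d)
cd "$demo"

# Two made-up FASTQ records (four lines each), written as a gzip file.
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' | gzip -c > tiny_R1.fastq.gz

# Stream the decompressed lines into awk and divide the line count by four.
n_reads=$(gzip -dc tiny_R1.fastq.gz | awk 'END {print NR / 4}')
echo "reads: $n_reads"

# Print only the sequence line (line 2 of every 4-line record).
seqs=$(gzip -dc tiny_R1.fastq.gz | awk 'NR % 4 == 2')
echo "$seqs"
```

Because the data flows through a pipe, no decompressed copy is written to disk, which matters when the real file is tens of gigabytes.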

16. Safe command-line habits

The command line is powerful because it does exactly what you tell it. Develop careful habits from the beginning.

Check location first: Run pwd before moving, deleting or overwriting important files.
Preview wildcards: Run ls pattern* before using the same wildcard with rm, mv or cp.
Do not overwrite accidentally: Remember that > replaces files. Use >> only when you want to append.
Keep raw data read-only: Store raw FASTQ data separately and avoid editing or deleting original files.
Use scripts and logs: Commands saved in scripts are easier to review and reproduce than commands typed only once.
Back up important work: Use external drives, institutional storage or version control where appropriate.
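
As one possible guard against accidental overwriting, Bash and Zsh offer the noclobber shell option, which makes > refuse to replace an existing file. The sketch below is a small demonstration on a hypothetical results.txt; it is an optional extra, not something the habits above require.

```shell
demo=$(mktemp -d)
cd "$demo"
echo "important result" > results.txt

# With noclobber on, > refuses to overwrite an existing file.
set -o noclobber

# The attempt runs in a subshell and its error message is discarded.
if ( echo "oops" > results.txt ) 2>/dev/null; then
    outcome="overwritten"
else
    outcome="protected"
fi

# Switch the option off again so later commands behave as usual.
set +o noclobber

echo "results.txt was $outcome and still contains: $(cat results.txt)"
```

With noclobber enabled, >> still appends, and >| can be used when replacing a file is really intended.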

17. Mini exercises

Practice on a safe folder, not on real project data.

Exercise setup
mkdir -p ~/unix_practice/{data,results,scripts,logs}
cd ~/unix_practice

cat > data/samples.tsv << 'EOF'
sample_id	group	batch
S1	control	A
S2	control	A
S3	treated	B
S4	treated	B
S5	treated/C	B
EOF

  1. Use pwd and ls -lh to inspect the practice folder.
  2. Use head and cat to view data/samples.tsv.
  3. Use cut, sort and uniq -c to count samples per group.
  4. Use grep to find treated samples.
  5. Redirect the group counts to results/group_counts.txt.
  6. Create a script in scripts/ that repeats the analysis.

18. Beginner Unix command cheat sheet

Task                    Command        Example
Show current directory  pwd            pwd
List files              ls             ls -lh
Change directory        cd             cd ~/projects
Create directory        mkdir          mkdir results
Copy file               cp             cp file.txt backup.txt
Move or rename          mv             mv old.txt new.txt
Remove file             rm             rm old.txt
View long file          less           less report.txt
First lines             head           head -n 20 file.txt
Last lines              tail           tail -n 20 log.txt
Search text             grep           grep BRCA variants.tsv
Count lines             wc             wc -l samples.tsv
Extract columns         cut            cut -f 1 samples.tsv
Sort lines              sort           sort names.txt
Unique values           uniq           sort names.txt | uniq -c
Find program            which          which python
Get help                man or --help  man ls

Frequently asked questions

What is the Unix command line?

The Unix command line is a text-based interface for controlling an operating system. Instead of clicking through menus, you type commands to navigate folders, inspect files, run software, automate tasks and process data.

Is Unix the same as Linux?

Unix is a family of operating systems descended from the original Unix developed at Bell Labs, together with standards such as POSIX that describe their shared behavior. Linux is a Unix-like operating system widely used on servers, workstations, clusters and bioinformatics systems. Many command-line skills are shared across Linux, macOS and other Unix-like systems.

Which shell should beginners learn?

Bash is a good default for beginners because it is widely available on Linux and common on bioinformatics servers. Zsh is also popular and is the default shell on recent versions of macOS. Most basic commands in this tutorial work in both.

Can I damage files with the command line?

Yes. Commands such as rm, mv, chmod and commands using wildcards can change or delete files quickly. Always check your current directory, inspect file names and avoid running destructive commands until you understand them.

Why is the command line important for NGS data analysis?

Most NGS and bioinformatics tools are command-line programs. FASTQ files, BAM files, VCF files and large result tables are often processed more efficiently with shell commands and reproducible scripts than with graphical tools.

How do I get help for a Unix command?

Use commands such as man command, command --help or command -h. You can also search official documentation for Bash, GNU Coreutils, SAMtools, Nextflow and the specific tool you are using.