Spatial Omics Tutorial

Spatial transcriptomics data analysis: from tissue coordinates to biological context.

A practical tutorial for sequencing-based and imaging-based spatial transcriptomics projects. It covers experimental design, metadata, spatial expression matrices, histology images, quality control, normalization, tissue segmentation, cell-type mapping, deconvolution, spatial domains, spatially variable genes, differential expression, visualization, multi-omics integration and reproducible reporting.

1. Overview: what is spatial transcriptomics?

Spatial transcriptomics measures RNA abundance while preserving where each measurement came from in a tissue section. It links gene expression to coordinates, histology, tissue compartments and neighbouring cells. This makes it useful for cancer, immunology, neuroscience, developmental biology, organ atlases, pathology and biomarker discovery.

Capture spatial signal: Measure RNA at spots, pixels, cells or subcellular coordinates.
Connect to morphology: Use tissue images, masks and annotations to interpret spatial patterns.
Interpret biology: Identify regions, niches, cell types, gradients and disease-associated programs.
Core principle: spatial transcriptomics is both transcriptomics and image-aware tissue analysis. The strongest conclusions combine RNA signal, spatial coordinates, morphology, metadata and biological validation.

2. Spatial transcriptomics platform types

Spatial technologies differ in resolution, gene coverage, tissue compatibility and output structure. The analysis should match the platform.

Platform category | Typical data | Analysis focus
Spot-based sequencing | Expression counts per spatial spot, often with histology image. | Spot QC, deconvolution, spatial domains and tissue-region analysis.
High-resolution bead or pixel-based sequencing | Counts per small coordinate bin or bead. | Aggregation, segmentation, spatial smoothing and cell-type mapping.
Targeted imaging-based transcriptomics | Decoded transcript molecules with x/y coordinates. | Cell segmentation, molecule assignment, cell-level expression and neighbourhood analysis.
In situ sequencing | Targeted or panel-based RNA signal in tissue coordinates. | Image registration, spot/cell calling, panel interpretation and spatial statistics.
Region-of-interest profiling | Expression per selected tissue region. | ROI annotation, region-level differential expression and morphology-aware comparison.
Spatial multiomics | RNA plus protein, chromatin, morphology or other modalities. | Multi-modal integration, region annotation and mechanism discovery.

3. Experimental design

Spatial transcriptomics studies require both molecular and histological design. Tissue section quality, region selection, replication and imaging are critical.

Questions to answer early

  • Which biological question is primary: tissue architecture, tumour microenvironment, disease region, developmental pattern, cell neighbourhood or biomarker discovery?
  • Is the method whole-transcriptome, targeted panel, imaging-based, spot-based or single-cell-resolution?
  • How many biological replicates, tissue sections and regions are available per condition?
  • Are tissue sections comparable in orientation, anatomical region and preservation?
  • Are pathologist or expert tissue annotations available?
  • Is a matched single-cell or single-nucleus RNA-seq reference available for deconvolution or mapping?
  • Which covariates should be recorded: batch, slide, section, capture area, staining, fixation, ischemia time or imaging settings?
A spatial dataset can show striking patterns that are actually driven by differences between tissue sections, capture quality, batch, anatomy or sampling rather than by biology. Careful design and complete metadata are essential for reliable interpretation.

4. Input files and reference resources

Input | Typical format | Use
Expression matrix | HDF5, MTX, CSV/TSV, h5ad, RDS or platform-specific object. | Gene expression per spot, cell, pixel or region.
Spatial coordinates | CSV/TSV, JSON, parquet or object metadata. | Links expression values to tissue positions.
Histology image | TIFF, SVS, PNG, JPEG or OME-TIFF. | Tissue morphology, masks, annotations and visualization.
Segmentation masks | TIFF, PNG, GeoJSON, HDF5 or platform-specific files. | Cell boundaries or tissue compartments.
Gene annotation | GTF/GFF3 or reference package. | Gene naming, mitochondrial genes and annotation consistency.
Single-cell reference | h5ad, RDS, loom or count matrix with cell labels. | Cell-type mapping and deconvolution.
Sample metadata | TSV/CSV. | Condition, replicate, batch, tissue region and slide information.

5. Metadata for spatial studies

Metadata must describe both molecular samples and tissue sections. Spatial studies often fail when section location or imaging metadata are missing.

Example spatial transcriptomics metadata
sample_id	donor_id	condition	tissue	section_id	slide_id	region	batch	image_file
S1	D001	control	colon	sec01	slideA	mucosa	A	S1_HE.tif
S2	D002	control	colon	sec01	slideA	mucosa	A	S2_HE.tif
S3	D003	disease	colon	sec02	slideB	lesion	B	S3_HE.tif
S4	D004	disease	colon	sec02	slideB	lesion	B	S4_HE.tif

Recommended metadata fields

  • Sample, donor, condition, replicate, section, slide, capture area and batch.
  • Tissue type, anatomical region, pathology annotation and staining method.
  • Image file names, resolution, scale factors and coordinate system.
  • Platform, chemistry, gene panel or whole-transcriptome reference.
  • Matched single-cell reference or complementary omics datasets, if available.
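In the example metadata table, every control sample sits on slideA in batch A and every disease sample sits on slideB in batch B. As a result, condition cannot be separated from slide or batch effects. The following is a minimal standard-library sketch that flags this kind of complete confounding. It cross-tabulates a design column against technical columns and reports any column whose levels pair one-to-one with the design levels. The function name `find_confounded_columns` is hypothetical, and the column names follow the example table.

```python
import csv
import io
from collections import defaultdict

# The example metadata, in the same tab-separated layout as the table above
METADATA_TSV = """sample_id\tdonor_id\tcondition\ttissue\tsection_id\tslide_id\tregion\tbatch\timage_file
S1\tD001\tcontrol\tcolon\tsec01\tslideA\tmucosa\tA\tS1_HE.tif
S2\tD002\tcontrol\tcolon\tsec01\tslideA\tmucosa\tA\tS2_HE.tif
S3\tD003\tdisease\tcolon\tsec02\tslideB\tlesion\tB\tS3_HE.tif
S4\tD004\tdisease\tcolon\tsec02\tslideB\tlesion\tB\tS4_HE.tif
"""


def find_confounded_columns(rows, design_col, technical_cols):
    """Return technical columns whose levels pair one-to-one with design levels.

    A one-to-one pairing (each slide holds one condition and each condition
    sits on one slide) means the two effects cannot be separated statistically.
    """
    confounded = []
    for col in technical_cols:
        tech_to_design = defaultdict(set)
        design_to_tech = defaultdict(set)
        for row in rows:
            tech_to_design[row[col]].add(row[design_col])
            design_to_tech[row[design_col]].add(row[col])
        one_to_one = all(len(v) == 1 for v in tech_to_design.values()) and all(
            len(v) == 1 for v in design_to_tech.values()
        )
        if one_to_one:
            confounded.append(col)
    return confounded


rows = list(csv.DictReader(io.StringIO(METADATA_TSV), delimiter="\t"))
print(find_confounded_columns(rows, "condition", ["donor_id", "slide_id", "batch"]))
# → ['slide_id', 'batch']
```

Donors are not flagged, because each condition contains two donors. That nesting is the expected replicate structure. A balanced design, with both conditions on every slide, would not be flagged either.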

6. Raw processing and expression-matrix generation

Raw processing depends on the platform. Sequencing-based workflows may start from FASTQ files and spatial barcodes, while imaging-based workflows may start from decoded transcript tables and cell segmentation.

Starting data | Processing goal | Typical output
FASTQ files | Align reads, assign spatial barcodes and count genes. | Spatial count matrix, tissue positions and image alignment files.
Decoded transcript table | Assign molecules to cells or spatial bins. | Cell-by-gene or bin-by-gene matrix with coordinates.
ROI-level expression | Normalize and annotate selected regions. | Region-by-gene expression table and ROI metadata.
Multi-modal files | Link RNA to protein, morphology or chromatin data. | Integrated spatial object and modality-specific matrices.
Keep raw matrices, filtered matrices and normalized matrices separate. Spatial coordinates and image scale factors should be preserved exactly.
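For imaging-based data, the step "assign molecules to spatial bins" can be sketched as follows. The code aggregates a decoded transcript table into square bins and returns a bin-by-gene table. It assumes each molecule is an `(x, y, gene)` tuple in the same unit as `bin_size`. Real pipelines read platform-specific transcript files and build sparse matrices instead.

```python
from collections import Counter, defaultdict


def bin_transcripts(transcripts, bin_size):
    """Aggregate decoded molecules into square spatial bins.

    transcripts: iterable of (x, y, gene) tuples in the same unit as bin_size.
    Returns {(bin_x, bin_y): Counter({gene: count})}, with bin index
    floor(coordinate / bin_size).
    """
    bins = defaultdict(Counter)
    for x, y, gene in transcripts:
        key = (int(x // bin_size), int(y // bin_size))
        bins[key][gene] += 1
    return bins


def to_matrix(bins):
    """Convert binned counts to a dense bin-by-gene table with sorted labels."""
    bin_ids = sorted(bins)
    genes = sorted({g for counts in bins.values() for g in counts})
    matrix = [[bins[b].get(g, 0) for g in genes] for b in bin_ids]
    return bin_ids, genes, matrix


# Hypothetical decoded molecules (x, y, gene)
molecules = [(1.0, 2.0, "EPCAM"), (3.5, 1.0, "EPCAM"), (12.0, 4.0, "PTPRC"), (14.9, 8.0, "EPCAM")]
bin_ids, genes, matrix = to_matrix(bin_transcripts(molecules, bin_size=10.0))
print(bin_ids, genes, matrix)  # → [(0, 0), (1, 0)] ['EPCAM', 'PTPRC'] [[2, 0], [1, 1]]
```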

7. Histology and image data

Images provide tissue context and help identify anatomical regions, artefacts, tissue boundaries and morphology-driven expression patterns.

Tissue mask: Defines which spatial locations are on tissue versus background.
Image registration: Links molecular coordinates to tissue image coordinates.
Pathology annotation: Labels tumour, stroma, immune regions, necrosis or anatomical compartments.
Image features: Morphology can be used for spatial domain detection or multimodal analysis.
Coordinate orientation and scale factors can differ between platforms and image formats. Always verify that expression overlays align with tissue morphology.
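In Scanpy's Visium reader, spot positions are typically stored in full-resolution pixel units in `adata.obsm["spatial"]`. Per-image scale factors, such as `tissue_hires_scalef`, are stored under `adata.uns["spatial"][library_id]["scalefactors"]`.

The sketch below uses plain lists and a hypothetical scale factor. It converts full-resolution coordinates to a downsampled image and counts spots that land outside that image. This is a cheap first check for a wrong scale factor or swapped axes. The `(x, y)` ordering and the image size are assumptions.

```python
def to_image_coords(coords_fullres, scalefactors, key="tissue_hires_scalef"):
    """Convert full-resolution (x, y) pixel coordinates to a downsampled image."""
    scale = scalefactors[key]
    return [(x * scale, y * scale) for x, y in coords_fullres]


def count_outside_image(coords, image_height, image_width):
    """Count (x, y) points outside an image of shape (height, width).

    A non-zero count suggests a wrong scale factor, swapped x/y axes
    or an image that belongs to another section.
    """
    return sum(
        1 for x, y in coords
        if not (0 <= x < image_width and 0 <= y < image_height)
    )


# Hypothetical scale factor, spot positions and image size
scalefactors = {"tissue_hires_scalef": 0.25}
spots_fullres = [(1200.0, 800.0), (6000.0, 900.0), (7996.0, 2000.0)]
spots_hires = to_image_coords(spots_fullres, scalefactors)
print(count_outside_image(spots_hires, image_height=1600, image_width=2000))  # → 0

# Swapping x and y, a common orientation mistake, pushes one spot off the image
swapped = [(y, x) for x, y in spots_hires]
print(count_outside_image(swapped, image_height=1600, image_width=2000))  # → 1
```

A zero count does not prove correct alignment. It only rules out gross errors, so the visual overlay check remains necessary.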

8. Quality control

Spatial QC should be performed at sample, image, spot/cell and gene levels. QC metrics should be plotted both as distributions and directly on tissue coordinates.

Metric | Meaning | Interpretation
Total counts per spot/cell | RNA capture or molecule count. | Low values may indicate poor tissue capture or background; high values may reflect dense tissue or artefacts.
Detected genes | Gene complexity per spatial unit. | Low complexity can indicate low quality or background.
Mitochondrial fraction | Fraction of mitochondrial gene counts. | High values may indicate stress, damage or tissue-specific biology.
Tissue coverage | Fraction of spatial units overlapping tissue. | Poor coverage can limit interpretation.
Background signal | Expression detected outside tissue or in negative controls. | Can indicate ambient RNA, imaging artefacts or segmentation issues.
Spatial outliers | Locations with unusual expression or QC metrics. | May represent tissue artefacts, folds, necrosis, dust or biological regions.
Example Scanpy-style QC for spatial data
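import scanpy as sc

adata = sc.read_visium("data/spatial/S1")
adata.var_names_make_unique()  # gene symbols can contain duplicates

# Flag mitochondrial genes ("MT-" matches human symbols; mouse uses "mt-")
adata.var["mt"] = adata.var_names.str.startswith("MT-")
sc.pp.calculate_qc_metrics(adata, qc_vars=["mt"], inplace=True)

# Example dataset-specific filters: choose thresholds after inspecting
# QC distributions and QC maps on the tissue for each section
keep = (
    (adata.obs["total_counts"] > 500)
    & (adata.obs["n_genes_by_counts"] > 200)
    & (adata.obs["pct_counts_mt"] < 30)
)
adata = adata[keep, :].copy()  # copy so later steps modify a real object, not a view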
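Fixed thresholds such as 500 counts may not transfer between sections or platforms. One commonly used alternative, swapped in here and not part of the workflow above, is a data-adaptive rule. It flags spots more than a few median absolute deviations (MADs) from the median, often after a log transform.

This is a minimal pure-Python sketch. The cut-off of 3 MADs and the `log1p` transform are assumptions to tune per dataset.

```python
import math
import statistics


def mad_outliers(values, n_mads=3.0, log=True):
    """Flag values more than n_mads median absolute deviations from the median.

    Returns a list of booleans (True = outlier). log=True applies log1p first,
    which suits right-skewed metrics such as total counts.
    """
    x = [math.log1p(v) for v in values] if log else [float(v) for v in values]
    med = statistics.median(x)
    mad = statistics.median([abs(v - med) for v in x])
    if mad == 0:
        return [False] * len(x)
    return [abs(v - med) > n_mads * mad for v in x]


# Hypothetical total counts for six spots; the fifth looks like background or damage
total_counts = [4000, 4200, 3900, 4100, 50, 4050]
print(mad_outliers(total_counts))  # → [False, False, False, False, True, False]
```

This version flags both tails. In practice, the flags can be stored in `adata.obs` and mapped onto the tissue before anything is removed, because a cluster of flagged spots may be necrosis or another real biological region rather than an artefact.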

9. Normalization and transformation

Spatial data often require normalization for library size, capture efficiency and platform-specific biases. Keep a copy of the raw counts, because many statistical models require them.

Approach | Use | Caution
Library-size normalization | Exploratory visualization and clustering. | Can be affected by tissue density and composition.
Log transformation | Stabilizes expression for visualization. | Not always appropriate for count-based testing.
Variance-stabilizing models | Feature selection and dimension reduction. | Model assumptions should match platform and data type.
Spatial smoothing | Improve visualization or region-level signal. | Can blur boundaries and create artificial gradients if overused.
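To make library-size normalization and log transformation concrete, the sketch below applies the same arithmetic to a single spot. It scales counts to a common total of 10,000, then takes the natural log(1 + x). This roughly mirrors what `sc.pp.normalize_total(target_sum=1e4)` followed by `sc.pp.log1p` does with default settings. It is written in pure Python for illustration only.

```python
import math


def normalize_log1p(counts, target_sum=1e4):
    """Scale one spot's raw counts to target_sum, then apply natural log(1 + x).

    Returns a new list, so the raw counts passed in are left unchanged.
    """
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [math.log1p(c * target_sum / total) for c in counts]


shallow = [10, 30, 0, 60]   # hypothetical raw counts for four genes in one spot
deep = [20, 60, 0, 120]     # same composition, twice the sequencing depth
print(normalize_log1p(shallow) == normalize_log1p(deep))  # → True
```

Depth differences disappear, but composition does not. A spot dominated by one highly expressed cell type lowers the normalized values of every other gene in that spot, which is why the table warns about tissue density and composition.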

10. Spatial visualization

Spatial visualization is central to quality control and interpretation. Expression should be shown on tissue coordinates, not only as ordinary PCA or UMAP plots.

Example spatial plots with Scanpy
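import scanpy as sc

# adata: the filtered object from the QC step.
# Keep raw counts before normalizing; many statistical models need them.
adata.layers["counts"] = adata.X.copy()
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)

# QC metrics over the tissue image
sc.pl.spatial(
    adata,
    color=["total_counts", "n_genes_by_counts", "pct_counts_mt"],
    img_key="hires"
)

# Epithelial (EPCAM), stromal/fibroblast (COL1A1) and immune (PTPRC) markers
sc.pl.spatial(
    adata,
    color=["EPCAM", "COL1A1", "PTPRC"],
    img_key="hires"
)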

Useful spatial plots

  • QC metrics over tissue: counts, genes and mitochondrial fraction.
  • Marker genes over tissue: epithelial, immune, stromal or tissue-specific markers.
  • Spatial domains or clusters overlaid on histology.
  • Deconvolved cell-type abundance maps.
  • Differential-expression or pathway-score maps.

11. Tissue and cell segmentation

Segmentation assigns pixels, transcripts or expression measurements to tissue regions or cells. It is essential for imaging-based spatial transcriptomics and useful for morphology-aware analysis.

Segmentation type | Purpose | Quality checks
Tissue segmentation | Separate tissue from background. | Check boundaries, folds, holes and non-tissue artefacts.
Region segmentation | Define tumour, stroma, immune regions or anatomical compartments. | Review against histology and expert annotation.
Cell segmentation | Assign molecules or pixels to individual cells. | Check cell size, shape, nuclei alignment and transcript assignment.
Niche segmentation | Define neighbourhoods based on expression and spatial context. | Check stability across sections and samples.
Cell segmentation errors can strongly affect cell-level spatial transcriptomics. Always inspect representative regions and quantify segmentation outliers.
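Cell segmentation of imaging-based data often produces a labelled mask, where 0 is background and positive integers are cell IDs. The following is a minimal sketch of molecule assignment against such a mask. Each decoded transcript takes the label of the pixel it falls in, and the unassigned fraction is reported as a simple segmentation QC metric.

The sketch assumes the mask is a nested list indexed `[row][col]` and that transcript coordinates are already in mask pixel units. Real data would use image arrays and a registered coordinate transform.

```python
import math
from collections import Counter, defaultdict


def assign_to_cells(transcripts, label_mask):
    """Assign (x, y, gene) molecules to cells using a labelled mask.

    label_mask[row][col] holds 0 for background or a positive cell ID.
    Returns (cell_by_gene, unassigned_fraction).
    """
    height, width = len(label_mask), len(label_mask[0])
    cell_by_gene = defaultdict(Counter)
    unassigned = 0
    for x, y, gene in transcripts:
        row, col = math.floor(y), math.floor(x)  # pixel that contains the molecule
        inside = 0 <= row < height and 0 <= col < width
        cell_id = label_mask[row][col] if inside else 0
        if cell_id == 0:
            unassigned += 1
        else:
            cell_by_gene[cell_id][gene] += 1
    n = len(transcripts)
    return cell_by_gene, (unassigned / n if n else 0.0)


# Hypothetical 3 x 4 label mask: 0 = background, 1 and 2 = segmented cells
mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 2],
    [0, 0, 0, 2],
]
molecules = [
    (0.5, 0.5, "EPCAM"),
    (1.2, 1.7, "EPCAM"),
    (3.1, 2.2, "PTPRC"),
    (2.5, 0.5, "EPCAM"),     # falls on a background pixel
    (10.0, 10.0, "COL1A1"),  # falls outside the mask
]
cells, unassigned_fraction = assign_to_cells(molecules, mask)
print(dict(cells[1]), dict(cells[2]), unassigned_fraction)
# → {'EPCAM': 2} {'PTPRC': 1} 0.4
```

A high unassigned fraction, or one that varies strongly across regions, is a useful signal for reviewing segmentation in those areas.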

12. Spot-level and region-level analysis

In spot-based platforms, each spot may contain multiple cells. Spot-level analysis is useful for identifying tissue regions and expression gradients, while deconvolution can estimate cell-type composition.

Spot clustering: Groups spatial locations by expression profiles.
Region markers: Identify genes enriched in anatomical or computational domains.
Pathway scores: Map pathway or gene-set activity across tissue.
Spatial smoothing: Can help visualize gradients, but should not replace statistical testing.
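One simple form of spatial smoothing replaces each spot's value with the mean over all spots within a fixed radius, including itself. The pure-Python sketch below uses that approach and illustrates why smoothing can invent gradients. The radius and the unweighted mean are assumptions; real tools often use a neighbour graph or distance-based weights.

```python
import math


def radius_smooth(coords, values, radius):
    """Average each spot's value with all spots within `radius` (itself included).

    Uses an O(n^2) loop for clarity; real data would use a spatial index
    or a precomputed neighbour graph.
    """
    smoothed = []
    for xi, yi in coords:
        neighbour_vals = [
            v for (xj, yj), v in zip(coords, values)
            if math.hypot(xi - xj, yi - yj) <= radius
        ]
        smoothed.append(sum(neighbour_vals) / len(neighbour_vals))
    return smoothed


# A sharp boundary along a row of spots: 0 on the left, 10 on the right
coords = [(float(i), 0.0) for i in range(6)]
values = [0, 0, 0, 10, 10, 10]
print(radius_smooth(coords, values, radius=1.0))
```

The step from 0 to 10 becomes roughly 0, 0, 3.33, 6.67, 10, 10. The smoothed map therefore shows a gradient that is not present in the data.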

13. Cell-type deconvolution

Deconvolution estimates which cell types contribute to each spatial spot. It is commonly used when spot size is larger than individual cells.

Input | Purpose | Notes
Spatial expression matrix | Observed expression per spot. | May contain mixed cell types.
Single-cell reference | Cell-type expression signatures. | Should match tissue, species, condition and technology where possible.
Marker genes | Support interpretation of deconvolution results. | Markers should be specific and validated.
Spatial priors | Neighbourhood or morphology information. | Can improve mapping but may introduce assumptions.
Deconvolution results are estimates, not direct measurements. Interpret them together with marker gene maps, histology and known tissue biology.
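The following toy sketch shows what "estimating which cell types contribute to a spot" means numerically. It models a spot's expression as a non-negative mixture of reference cell-type signatures and solves non-negative least squares by projected gradient descent. This is far simpler than dedicated tools such as cell2location or RCTD, which model count data probabilistically. The signature values are hypothetical.

```python
def nnls_projected_gradient(signatures, spot, n_iter=500):
    """Estimate non-negative weights w minimising ||S w - y||^2.

    signatures: genes x cell_types matrix S (list of rows)
    spot: observed expression y for the same genes
    Uses projected gradient descent with step 1 / ||S||_F^2. This step is safe
    because the squared Frobenius norm bounds the largest eigenvalue of S^T S.
    """
    n_genes, n_types = len(signatures), len(signatures[0])
    step = 1.0 / sum(v * v for row in signatures for v in row)
    w = [0.0] * n_types
    for _ in range(n_iter):
        # Residual r = S w - y
        residual = [
            sum(signatures[g][k] * w[k] for k in range(n_types)) - spot[g]
            for g in range(n_genes)
        ]
        # Gradient S^T r, then a step and projection onto w >= 0
        for k in range(n_types):
            grad_k = sum(signatures[g][k] * residual[g] for g in range(n_genes))
            w[k] = max(0.0, w[k] - step * grad_k)
    return w


def to_proportions(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights] if total > 0 else list(weights)


# Hypothetical signatures (rows = genes; columns = epithelial, fibroblast)
signatures = [
    [10.0, 0.0],   # epithelial marker
    [0.0, 10.0],   # fibroblast marker
    [5.0, 5.0],    # gene shared by both types
]
spot = [7.0, 3.0, 5.0]  # built as 0.7 * epithelial + 0.3 * fibroblast
weights = nnls_projected_gradient(signatures, spot)
print([round(p, 3) for p in to_proportions(weights)])  # → [0.7, 0.3]
```

The non-negativity constraint matters when the unconstrained fit would assign a negative weight to some cell type. In that case, the projection pins the weight at zero instead.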

14. Spatial domain detection

Spatial domain detection identifies tissue regions with shared expression profiles, morphology or cell-type composition. Domains can correspond to tumour regions, stromal zones, immune niches, cortical layers, developmental compartments or injury areas.

Expression domains: Clusters based on spatial expression profiles.
Morphology-aware domains: Combine image features with expression data.
Neighbour-aware domains: Use spatial adjacency to encourage coherent regions.
Expert review: Validate domains against histology and marker genes.
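Neighbour-aware domain methods encourage spatially coherent regions. The simple post-processing heuristic below is illustrative only and is not the model behind BayesSpace, Giotto or SpaGCN. It relabels a spot when a clear majority of its spatial neighbours carry a different cluster label. The radius and the `min_fraction` threshold are assumptions.

```python
import math
from collections import Counter


def refine_labels(coords, labels, radius, min_fraction=0.5):
    """Relabel a spot when more than min_fraction of its neighbours share
    a different label. Neighbours are spots within `radius`, excluding itself.
    All decisions use the original labels, so the update order does not matter.
    """
    refined = list(labels)
    for i, (xi, yi) in enumerate(coords):
        neighbours = [
            labels[j] for j, (xj, yj) in enumerate(coords)
            if j != i and math.hypot(xi - xj, yi - yj) <= radius
        ]
        if not neighbours:
            continue
        top_label, top_count = Counter(neighbours).most_common(1)[0]
        if top_label != labels[i] and top_count / len(neighbours) > min_fraction:
            refined[i] = top_label
    return refined


# Hypothetical 3 x 3 patch: all domain "A" except an isolated "B" in the centre
coords = [(float(x), float(y)) for y in range(3) for x in range(3)]
labels = ["A"] * 9
labels[4] = "B"
print(refine_labels(coords, labels, radius=1.0))
# → ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A']
```

Isolated spots are absorbed, while a genuine boundary between two domains stays in place because neither side has a majority there. Such refinements should still be checked against histology, since small real structures can also look "isolated".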

15. Cell neighbourhood and niche analysis

At single-cell or near-single-cell resolution, spatial transcriptomics can show which cell types lie near each other and how local neighbourhoods differ between conditions.

Analysis | Question | Interpretation
Neighbour enrichment | Which cell types are unusually close together? | Suggests spatial organization or interaction hypotheses.
Niche detection | What recurring local cell-type mixtures exist? | Identifies tissue microenvironments.
Distance-to-region analysis | How does expression change near boundaries? | Useful for tumour-stroma, lesion-normal or anatomical gradients.
Ligand-receptor spatial analysis | Which neighbouring cells may communicate? | Generates hypotheses; protein or perturbation validation may be needed.
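Neighbour enrichment asks whether two cell types are adjacent more often than expected by chance. The sketch below keeps the neighbour graph fixed and shuffles cell-type labels to build a null distribution. It then reports a z-score for the observed number of type-to-type edges. This is similar in spirit to permutation-based neighbourhood enrichment in tools such as Squidpy (`sq.gr.nhood_enrichment`), but it is a simplified illustration. The grid layout, permutation count and seed are assumptions.

```python
import random
import statistics


def count_type_edges(edges, labels, type_a, type_b):
    """Count undirected edges joining a type_a cell to a type_b cell."""
    return sum(1 for i, j in edges if {labels[i], labels[j]} == {type_a, type_b})


def neighbour_enrichment_z(edges, labels, type_a, type_b, n_perm=500, seed=0):
    """z-score of observed type_a-type_b edges versus label permutations."""
    rng = random.Random(seed)
    observed = count_type_edges(edges, labels, type_a, type_b)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(count_type_edges(edges, shuffled, type_a, type_b))
    sd = statistics.pstdev(null)
    return (observed - statistics.mean(null)) / sd if sd > 0 else 0.0


def grid_edges(n_rows, n_cols):
    """Edges between 4-connected neighbours on a grid (row-major cell indices)."""
    edges = []
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if c + 1 < n_cols:
                edges.append((i, i + 1))
            if r + 1 < n_rows:
                edges.append((i, i + n_cols))
    return edges


# Hypothetical 6 x 6 tissue patch: tumour cells on the left half, immune cells on the right
edges = grid_edges(6, 6)
labels = ["tumour" if c < 3 else "immune" for r in range(6) for c in range(6)]
z_same = neighbour_enrichment_z(edges, labels, "tumour", "tumour")
z_mixed = neighbour_enrichment_z(edges, labels, "tumour", "immune")
print(round(z_same, 1), round(z_mixed, 1))  # strongly positive, strongly negative
```

A high z-score supports a spatial-organization hypothesis, not direct evidence of interaction. Tissue structure alone, such as compartments and layers, can produce strong enrichment.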

16. Spatially variable genes

Spatially variable genes show non-random spatial expression patterns. They can reveal anatomical markers, gradients, local activation programs or disease-associated regions.

Cluster markers: Genes enriched in spatial clusters or domains.
Spatial autocorrelation: Genes whose expression is correlated across nearby locations.
Boundary genes: Genes changing near tissue interfaces or anatomical transitions.
Gradient genes: Genes changing along spatial axes or tissue structures.
Spatial variability can be driven by tissue composition, sequencing depth, section artefacts or morphology. Use QC and biological validation to support interpretation.
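Spatial autocorrelation is commonly summarized with Moran's I:

I = (N / W) · Σᵢⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)²

Here N is the number of locations and W is the sum of all weights wᵢⱼ. Values above the null expectation of −1/(N − 1) mean that nearby locations have similar expression, and negative values mean that neighbours tend to differ.

The following is a from-scratch sketch with binary adjacency weights. Library implementations such as `sq.gr.spatial_autocorr(mode="moran")` compute this for many genes at once, and their weighting and testing choices may differ.

```python
def morans_i(values, neighbours):
    """Moran's I with binary weights.

    neighbours[i] lists the indices adjacent to location i (symmetric adjacency).
    I = (N / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("values are constant; Moran's I is undefined")
    w_total = sum(len(nbrs) for nbrs in neighbours)
    num = sum(dev[i] * dev[j] for i in range(n) for j in neighbours[i])
    return (n / w_total) * num / denom


# Four spots in a row, each adjacent to the next
chain = [[1], [0, 2], [1, 3], [2]]
print(morans_i([1, 2, 3, 4], chain))  # smooth gradient: positive I
print(morans_i([1, 0, 1, 0], chain))  # alternating pattern: negative I
```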

17. Differential expression and differential region analysis

Differential analysis can compare genes between tissue regions, spatial domains, cell types or biological conditions. The correct statistical unit is often the biological replicate or tissue section, not only individual spots.

Comparison | Use case | Caution
Region versus region | Compare tumour versus stroma or lesion versus normal tissue. | Region labels should be consistent and reviewed.
Condition versus condition | Compare disease and control spatial expression. | Use replicate-aware models where possible.
Domain marker testing | Identify genes defining spatial domains. | Domains may contain different cell-type compositions.
Cell-type-specific spatial DE | Compare expression in mapped cell types across locations or conditions. | Requires reliable cell-type mapping or segmentation.
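Treating the biological replicate or section as the statistical unit usually starts with pseudobulk aggregation. Raw counts from all spots that share a sample and a region are summed into one profile, which can then be passed to count-based tools such as DESeq2 or edgeR.

This is a minimal sketch with hypothetical spot IDs. It uses region labels in the style of the metadata example and assumes counts are stored as nested dictionaries.

```python
from collections import defaultdict


def pseudobulk(spot_counts, spot_meta, keys=("sample_id", "region")):
    """Sum raw counts over spots that share the same sample and region.

    spot_counts: {spot_id: {gene: raw_count}}
    spot_meta: {spot_id: {"sample_id": ..., "region": ...}}
    Returns {(sample_id, region): {gene: summed_count}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for spot_id, counts in spot_counts.items():
        group = tuple(spot_meta[spot_id][k] for k in keys)
        for gene, count in counts.items():
            totals[group][gene] += count
    return {group: dict(genes) for group, genes in totals.items()}


# Hypothetical raw counts and annotations for four spots from two samples
spot_counts = {
    "s1_a": {"EPCAM": 5, "PTPRC": 1},
    "s1_b": {"EPCAM": 3},
    "s1_c": {"PTPRC": 4},
    "s2_a": {"EPCAM": 2, "PTPRC": 2},
}
spot_meta = {
    "s1_a": {"sample_id": "S1", "region": "lesion"},
    "s1_b": {"sample_id": "S1", "region": "lesion"},
    "s1_c": {"sample_id": "S1", "region": "mucosa"},
    "s2_a": {"sample_id": "S2", "region": "lesion"},
}
print(pseudobulk(spot_counts, spot_meta))
```

Each resulting (sample, region) profile becomes one column in the downstream count model. Replication therefore comes from samples, not from the number of spots.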

18. Integration with single-cell RNA-seq

Single-cell or single-nucleus RNA-seq references are often used to annotate spatial data, estimate cell-type composition and transfer cell-state labels.

Prepare reference: Annotate single-cell clusters and clean low-quality cells.
Map to space: Deconvolve spots or transfer labels to spatial cells.
Validate in tissue: Check marker gene maps and histology consistency.
A single-cell reference from the wrong tissue, disease state or technology can mislead spatial annotation. Use matched references where possible.
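A very simple form of label transfer works in three steps. It keeps only genes present in both the reference and the spatial data, correlates each spatial cell with the mean profile (centroid) of each reference cell type, and assigns the best match. Nearest-centroid correlation is a deliberately simple stand-in, not the method of any particular tool. The centroids and profiles below are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def transfer_labels(spatial_profiles, spatial_genes, centroids, reference_genes):
    """Assign each spatial profile to the reference cell type with the highest
    correlation, using only genes present in both datasets."""
    reference_set = set(reference_genes)
    shared = [g for g in spatial_genes if g in reference_set]
    s_idx = [spatial_genes.index(g) for g in shared]
    r_idx = [reference_genes.index(g) for g in shared]
    labels = []
    for profile in spatial_profiles:
        p = [profile[i] for i in s_idx]
        scores = {
            cell_type: pearson(p, [centroid[i] for i in r_idx])
            for cell_type, centroid in centroids.items()
        }
        labels.append(max(scores, key=scores.get))
    return labels, shared


# Hypothetical reference centroids (mean expression per cell type)
reference_genes = ["EPCAM", "KRT8", "COL1A1", "PTPRC", "CD3E"]
centroids = {
    "epithelial": [9, 8, 0.5, 0.2, 0.1],
    "fibroblast": [0.3, 0.5, 9, 0.2, 0.1],
    "T cell": [0.1, 0.2, 0.3, 7, 6],
}
# Hypothetical targeted panel: different gene order, KRT8 absent
spatial_genes = ["PTPRC", "EPCAM", "COL1A1", "CD3E"]
spatial_profiles = [[0, 6, 1, 0], [5, 0, 0, 4], [0, 1, 7, 0]]
labels, shared = transfer_labels(spatial_profiles, spatial_genes, centroids, reference_genes)
print(labels, len(shared))  # → ['epithelial', 'T cell', 'fibroblast'] 4
```

With small targeted panels, the number of shared genes can be low. Reporting it alongside the transferred labels is part of validating the mapping.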

19. Spatial multiomics

Spatial transcriptomics can be combined with protein imaging, immunofluorescence, H&E morphology, spatial ATAC, DNA variants, methylation, metabolomics or clinical/pathology data.

Integration | Question | Interpretation
RNA + histology | How do expression patterns relate to morphology? | Links molecular states to tissue architecture.
RNA + protein | Do transcripts match protein expression or immune markers? | Improves cell-type and pathway interpretation.
RNA + scRNA-seq | Which cell types or states occupy spatial locations? | Enables deconvolution and label transfer.
RNA + clinical metadata | Which spatial features associate with outcome or treatment response? | Requires careful statistics and validation cohorts.

20. Statistical considerations

Spatial transcriptomics data contain dependencies that ordinary expression analysis may ignore. Nearby spots can be correlated, spots from the same tissue section are not independent, and tissue morphology can confound comparisons.

Spatial autocorrelation: Neighbouring locations may share expression patterns.
Replicate structure: Use sample- or section-level replication for condition-level claims.
Multiple testing: Spatial gene and region tests need correction for many hypotheses.
Compositional effects: Region-level expression may reflect cell-type mixture changes.
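When thousands of genes are tested for spatial variability or region differences, p-values are commonly adjusted with the Benjamini-Hochberg false discovery rate procedure. This is a minimal pure-Python sketch. Established implementations, such as `multipletests(method="fdr_bh")` in statsmodels, should be preferred in real analyses.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from four spatial gene tests
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```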

21. Example spatial transcriptomics workflow

The following simplified workflow illustrates a common spot-based spatial transcriptomics route in Python. Real projects should adapt file paths, platform-specific import functions, QC thresholds and models.

Minimal spot-based workflow with Scanpy/Squidpy-style steps
from pathlib import Path

import scanpy as sc
import squidpy as sq

# 1. Import spatial object
adata = sc.read_visium("data/spatial/S1")
adata.var_names_make_unique()

# 2. QC metrics ("MT-" prefix assumes human gene symbols)
adata.var["mt"] = adata.var_names.str.startswith("MT-")
sc.pp.calculate_qc_metrics(adata, qc_vars=["mt"], inplace=True)

# 3. Dataset-specific filtering (copy so later steps do not modify a view)
keep = (
    (adata.obs["total_counts"] > 500)
    & (adata.obs["n_genes_by_counts"] > 200)
    & (adata.obs["pct_counts_mt"] < 30)
)
adata = adata[keep, :].copy()

# 4. Preserve raw counts
adata.layers["counts"] = adata.X.copy()

# 5. Normalize and transform
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)

# 6. Variable genes, PCA and clustering
sc.pp.highly_variable_genes(adata, n_top_genes=3000)
sc.tl.pca(adata)
sc.pp.neighbors(adata, n_neighbors=15, n_pcs=30)
sc.tl.leiden(adata, resolution=0.6)

# 7. Spatial neighbours and spatial statistics
# (Moran's I is computed for highly variable genes when that annotation
#  exists; results are stored in adata.uns["moranI"])
sq.gr.spatial_neighbors(adata)
sq.gr.spatial_autocorr(adata, mode="moran")

# 8. Visualize spatial clusters and markers
sc.pl.spatial(adata, color=["leiden", "EPCAM", "COL1A1", "PTPRC"])

# 9. Save processed object (create the output folder first)
out = Path("results/objects/S1_spatial_processed.h5ad")
out.parent.mkdir(parents=True, exist_ok=True)
adata.write(out)

22. Deliverables and reporting

  • Raw and filtered spatial expression matrices.
  • Spatial coordinate files, image scale factors and registered histology images.
  • Sample-level and spot/cell-level QC reports.
  • Spatial plots of QC metrics, marker genes and tissue regions.
  • Annotated spatial objects such as h5ad, Seurat RDS or platform-specific files.
  • Tissue segmentation, region annotation or cell segmentation outputs when applicable.
  • Spatial domain, cluster and marker-gene tables.
  • Deconvolution or label-transfer results with reference details.
  • Spatially variable gene tables and differential analysis outputs.
  • High-resolution report figures suitable for manuscripts or internal reporting.
  • Methods section with platform, software versions, reference files, parameters and limitations.
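Software versions for the methods section can be captured automatically at the end of an analysis. The standard-library sketch below records the Python version and the installed versions of a chosen package list. Missing packages are reported rather than raising an error. The package list is only an example.

```python
import json
import platform
from importlib import metadata


def software_versions(packages):
    """Collect installed package versions for a methods section or report."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


print(json.dumps(software_versions(["scanpy", "squidpy", "anndata"]), indent=2))
```

Writing this dictionary to a JSON file next to the processed object keeps the reported versions tied to the exact results they produced.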

23. Spatial transcriptomics analysis cheat sheet

Step | Common tools | Main outputs
Raw processing | Platform software, Space Ranger-style workflows, custom import scripts | Spatial count matrices, coordinates and image metadata.
Object handling | Seurat, Scanpy, AnnData, SpatialExperiment | Integrated analysis objects.
QC | Seurat, Scanpy, Squidpy, Giotto, custom R/Python | Counts, genes, mitochondrial fraction and tissue-coverage plots.
Image analysis | QuPath, napari, Cellpose, StarDist, Squidpy | Masks, segmentation and image-derived features.
Normalization | Seurat, Scanpy, SCTransform-style workflows, scran-style methods | Normalized expression and variable genes.
Deconvolution | cell2location, RCTD, SPOTlight, stereoscope-style methods | Cell-type abundance per spot or location.
Spatial domains | BayesSpace, Giotto, Squidpy, SpaGCN-style methods | Tissue regions and spatial clusters.
Spatial genes | SpatialDE-style methods, Moran's I, Squidpy, Seurat/Scanpy workflows | Spatially variable gene tables.
Differential analysis | DESeq2/edgeR pseudobulk, mixed models, region-level models | Differential expression or differential region tables.
Reporting | R Markdown, Quarto, notebooks, workflow reports, AI-assisted summaries | QC, spatial figures, annotations, methods and interpretation.

Frequently asked questions

What is spatial transcriptomics?

Spatial transcriptomics measures gene expression while preserving spatial information from tissue sections or cells. It connects transcriptomic profiles to tissue architecture, cell neighbourhoods and anatomical regions.

How is spatial transcriptomics different from single-cell RNA-seq?

Single-cell RNA-seq usually dissociates tissue and loses spatial context, while spatial transcriptomics retains coordinates. Spatial methods may have lower transcriptome coverage or lower cellular resolution depending on the platform.

What are the main types of spatial transcriptomics data?

Common categories include spot-based sequencing, bead-based sequencing, targeted imaging-based methods, in situ sequencing, region-of-interest profiling and single-cell spatial multiomics.

What are spots, pixels and cells in spatial data?

A spot or pixel is a measured spatial location with expression counts. Depending on platform resolution, a spot may contain many cells, a few cells, one cell or subcellular signal. Cell segmentation methods try to assign transcripts to individual cells.

What are the main QC metrics?

Important metrics include reads or molecules per spot/cell, detected genes, mitochondrial fraction, tissue coverage, background signal, spot/cell segmentation quality, spatial outliers and replicate consistency.

Do I need histology images for spatial transcriptomics analysis?

Most spatial workflows benefit strongly from histology or fluorescence images because images define tissue boundaries, morphology, anatomical regions and quality-control context.

What is deconvolution in spatial transcriptomics?

Deconvolution estimates the cell-type composition of each spatial spot, often using a single-cell RNA-seq reference. It is especially useful when each spot contains multiple cells.

What is spatial domain detection?

Spatial domain detection identifies tissue regions or niches with similar expression, morphology or cell-type composition. It can reveal anatomical compartments, tumour microenvironments or disease-associated regions.

Can AI help with spatial transcriptomics?

AI can help with image-aware QC, tissue segmentation, cell segmentation, annotation, spatial pattern discovery, integration with single-cell references, report drafting and interpretation. The analysis should still remain reproducible and be reviewed by domain experts.