Quickstart

This guide walks through the three core steps of a ZMAP workflow: loading reference data, annotating a query dataset, and visualizing results.

Loading Reference Data

The load_zmap_h5ad() function downloads ZMAP reference H5AD files and caches them locally, so repeated calls reuse the cached copy. On Google Colab with Google Drive mounted, the cache persists across sessions.

import zmap

# Default: processed_slim_tpm (best for visualization)
adata_ref = zmap.ref.load_zmap_h5ad()

# Symphony reference (required for label transfer)
adata_sym = zmap.ref.load_zmap_h5ad(kind="symphony")
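
The download-once behavior of load_zmap_h5ad() follows a common caching pattern, sketched here with the standard library only. This is not ZMAP's implementation: load_cached, fake_fetch, the cache directory, and the file name are hypothetical stand-ins chosen for illustration.

```python
import tempfile
from pathlib import Path
from typing import Callable


def load_cached(name: str, cache_dir: Path, fetch: Callable[[str], bytes]) -> Path:
    """Return the local path for `name`, fetching it only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if not target.exists():
        # Write to a temporary name first so an interrupted download
        # never leaves a truncated file that looks like a cache hit.
        partial = target.with_name(target.name + ".part")
        partial.write_bytes(fetch(name))
        partial.replace(target)
    return target


# Hypothetical stand-in for a real download; records each call.
calls = []


def fake_fetch(name: str) -> bytes:
    calls.append(name)
    return b"placeholder bytes for " + name.encode()


cache_dir = Path(tempfile.mkdtemp()) / "zmap_cache"  # hypothetical location
first = load_cached("processed_slim_tpm.h5ad", cache_dir, fake_fetch)
second = load_cached("processed_slim_tpm.h5ad", cache_dir, fake_fetch)
```

The second call returns the same path without invoking the fetch function again. In a pattern like this, pointing the cache directory at a mounted Drive folder is what lets the files outlive a single Colab session.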

Available presets:

"processed_slim_tpm" — TPM counts, best for plotting
"processed_slim" — raw counts only
"processed" — full dataset with intermediate layers
"raw" — unprocessed raw counts
"symphony" — Symphony reference for label transfer

Loading Consensus Markers

# Top 50 CellType markers as a dict
markers = zmap.ref.load_consensus_markers()

# Top 10 Tissue markers as a panel DataFrame (for dotplots)
panel = zmap.ref.load_consensus_markers(
    level="Tissue",
    n_per_group=10,
    format="panel",
)

Annotating a Query Dataset

The full pipeline (preprocessing, embedding, label transfer, and plotting) runs in a single call on a query AnnData object that you have already loaded as adata_query:

adata_query = zmap.predict.annotate_with_zmap(
    adata_query,
    query_raw_counts_source="counts",   # where raw counts live
    cluster_col="leiden",               # your cluster column
)

# Results are in adata_query.obs:
#   - ZMAP_CellType_predicted
#   - ZMAP_CellType_predicted_prob
#   - ZMAP_time_id
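
To give some intuition for a predicted label paired with a probability, here is a minimal sketch of kNN label transfer by majority vote. It assumes the probability is the fraction of neighbors that agree with the winning label; whether ZMAP computes ZMAP_CellType_predicted_prob this way is an assumption, and knn_vote and the toy coordinates are hypothetical.

```python
import math
from collections import Counter


def knn_vote(query, ref_points, ref_labels, n_neighbors=3):
    """Return (label, vote fraction) from the query's nearest reference points."""
    # Rank reference points by Euclidean distance to the query.
    order = sorted(
        range(len(ref_points)),
        key=lambda i: math.dist(query, ref_points[i]),
    )
    nearest = order[:n_neighbors]
    votes = Counter(ref_labels[i] for i in nearest)
    label, count = votes.most_common(1)[0]
    # Assumption: confidence is the share of neighbors that agree.
    return label, count / len(nearest)


# Toy reference embedding with two well-separated groups.
ref_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
ref_labels = ["neuron", "neuron", "neuron", "hepatocyte", "hepatocyte"]

label, prob = knn_vote((0.3, 0.3), ref_points, ref_labels, n_neighbors=3)
```

With three neighbors the query sits entirely among the "neuron" points, so the vote is unanimous. Raising n_neighbors to 5 pulls in the distant group and lowers the vote fraction, which is why a low probability can flag cells that lie between reference populations.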

For finer control, the individual steps are also exposed:

# 1. Preprocess
zmap.predict.preprocess_adata_query(adata_query, counts_source="counts")

# 2. kNN label transfer (after Symphony embedding)
zmap.predict.predict_labels_kNN(
    adata_query, adata_ref,
    ref_label_col="ZMAP_CellType",
    n_neighbors=25,
)

# 2b. Optional: tissue-aware kNN (its output works with the same step-3 summary)
zmap.predict.predict_label_tissue_kNN(
    adata_query, adata_ref,
    ref_label_col="ZMAP_CellType",
    tissue_mode="hard",
    knn_backend="auto",
)

# 3. Cluster-level summary
df = zmap.predict.aggregate_by_cluster(
    adata_query,
    cluster_col="leiden",
    label_space="ZMAP_CellType",
)
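
A cluster-level summary of this kind can be pictured as a per-cluster tally of per-cell labels. The sketch below uses only the standard library and is not ZMAP's implementation: summarize_clusters and its output fields (top_label, fraction, n_cells) are hypothetical, and the actual columns returned by aggregate_by_cluster may differ.

```python
from collections import Counter, defaultdict


def summarize_clusters(clusters, labels):
    """For each cluster, report the most common label and its share of cells."""
    by_cluster = defaultdict(Counter)
    for cluster, label in zip(clusters, labels):
        by_cluster[cluster][label] += 1

    summary = {}
    for cluster, counts in by_cluster.items():
        n_cells = sum(counts.values())
        top_label, top_count = counts.most_common(1)[0]
        summary[cluster] = {
            "top_label": top_label,           # majority label in the cluster
            "fraction": top_count / n_cells,  # share of cells with that label
            "n_cells": n_cells,
        }
    return summary


# Toy input: one cluster ID and one predicted label per cell.
clusters = ["0", "0", "0", "0", "1", "1"]
labels = ["neuron", "neuron", "neuron", "glia", "hepatocyte", "hepatocyte"]
summary = summarize_clusters(clusters, labels)
```

Here cluster "0" is mostly "neuron" with one dissenting cell, while cluster "1" is uniformly "hepatocyte". A low majority fraction is a useful hint that a cluster mixes cell types and may need sub-clustering.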

Dotplot Visualization

Two-panel gene view (cell types × timepoints and studies):

zmap.dotplot.gene_view(adata_ref, "sox2")

Sibling comparison dotplot:

zmap.dotplot.group_view(adata_ref, "hepatocyte")

Descendant / sub-cluster dotplot:

zmap.dotplot.group_descendants_vs_markers(
    adata_ref,
    parent="forebrain",
    parent_col="ZMAP_Tissue",
    child_col="ZMAP_Cluster",
)