zmap.predict — Label Transfer & Annotation
End-to-end pipeline and lower-level functions for transferring ZMAP reference labels to query single-cell datasets via kNN voting in Symphony/Harmony embedding space.
Full Pipeline
- zmap.predict.annotate_with_zmap(adata_query, *, query_raw_counts_source, adata_ref=None, ref_kind='symphony', ref_label_col='ZMAP_CellType', label_space=None, query_truth_col=None, query_label_col=None, cluster_col=None, do_preprocess=True, do_map_embedding=True, do_ingest=True, tissue_aware=False, evaluate=False, n_neighbors=25, marker_validation=True, preprocess_kwargs=None, predict_kwargs=None, verbosity=2, debug=False, print_summary=None, show_plots=None, save_outputs=True, output_dir='zmap_predict')[source]
End-to-end ZMAP annotation pipeline: preprocess → embed → transfer labels → plot.
This is the primary entry point for annotating a new single-cell dataset with ZMAP reference labels. It chains the following steps:
1. Preprocess: normalize raw counts to TPM + log1p (preprocess_adata_query).
2. Embed: map the query into the ZMAP Symphony PCA embedding and ingest it into the reference UMAP (requires symphonypy).
3. Label transfer: kNN voting to assign cell-type, tissue, and time labels (predict_labels_kNN; optional tissue-aware mode via predict_labels_tissue_kNN).
4. Summarize: store a simplified run summary in adata_query.uns['zmap_labels'][<space>]['Run Summary Simple'].
5. Plot: overlay query cells on the reference UMAP with on-data labels (plot_embedding_with_ondata_labels).
6. Map labels (optional): cross-tabulate ZMAP labels against an existing query labeling (e.g. Leiden clusters) via map_query_labels.
All run parameters are stored in adata_query.uns['zmap_labels'][label_space]['_run_config'] so that the on-demand accessors (plot_qc, plot_embedding, plot_time, plot_overlap_matrix, show_summary) can reproduce pipeline outputs from adata_query alone, with no extra arguments.
- Parameters:
adata_query (ad.AnnData) – Query dataset to annotate. Modified in-place.
query_raw_counts_source (str) – Where raw integer counts are stored in adata_query. Pass "X" to use adata_query.X, or a layer name (e.g. "counts") to use adata_query.layers[query_raw_counts_source]. Required; there is no default.
adata_ref (ad.AnnData | None) – Pre-loaded ZMAP reference object. When None, the reference is loaded automatically using load_zmap_h5ad(kind=ref_kind).
ref_kind (str) – Which reference preset to load when adata_ref=None. Passed to load_zmap_h5ad. Use "symphony" for label transfer.
ref_label_col (str) – Column in the reference obs whose labels are transferred to the query. Also controls which UMAP overlay plot is generated.
label_space (str | None) – Namespace for output columns and uns keys. Defaults to ref_label_col.
query_truth_col (str | None) – Ground-truth label column in adata_query.obs, used for evaluation metrics when evaluate=True.
query_label_col (str | None) – Column in adata_query.obs containing user-defined cluster or label IDs (e.g. "leiden"). When provided, enables cluster-level consensus aggregation and the label-overlap matrix. Recommended for most workflows.
cluster_col (str | None) – Deprecated alias for query_label_col.
do_preprocess (bool) – Run TPM normalization + log1p on the query before mapping. Set to False if adata_query.X is already log-normalized.
do_map_embedding (bool) – Run Symphony embedding mapping. Requires symphonypy. Set to False if the query already has an X_pca_harmony embedding.
do_ingest (bool) – Ingest the query into the reference UMAP after Symphony mapping. Only applies when do_map_embedding=True.
tissue_aware (bool) – Use tissue-aware kNN transfer (predict_labels_tissue_kNN). Equivalent to predict_kwargs={"use_tissue_aware_knn": True, "auto_pseudo_tissue": True}. When True, any additional tissue-aware options can still be passed via predict_kwargs.
evaluate (bool) – Compute accuracy and evaluation metrics against query_truth_col. Requires query_truth_col to be set. Equivalent to predict_kwargs={"evaluate": True, "plot_eval_curves": True}.
n_neighbors (int) – Number of nearest neighbors for kNN label voting. With Gaussian distance weighting (the default), 25 is robust: distant neighbors are downweighted automatically, so the effective neighborhood adapts to local density.
marker_validation (bool) – Validate predicted labels by comparing DE markers against the ZMAP consensus marker ledger. Discovers the top 20 DE genes per predicted group and measures overlap with the top 100 reference markers. Results are stored in adata_query.uns['zmap_labels'][label_space]['Marker Validation'].
preprocess_kwargs (Mapping[str, Any] | None) – Extra keyword arguments forwarded to preprocess_adata_query (e.g. {"strict_counts": True}).
predict_kwargs (Mapping[str, Any] | None) – Extra keyword arguments forwarded to predict_labels_kNN. For common options, prefer the top-level tissue_aware and evaluate parameters instead of passing dicts manually.
verbosity (int) – Controls how much output is printed and displayed inline:
    0: silent (no print, no inline plots).
    1: progress lines only ([ZMAP] Step complete (Xs)).
    2: compact summary + UMAP overlay + combined QC figure.
    3: full display, with all tables via display() and all plots including the heatmap.
debug (bool) – If True, re-raise exceptions from plotting and aggregation steps instead of catching them. Useful for development and troubleshooting.
print_summary (bool | None) – Deprecated. Use verbosity instead. When explicitly set, False caps verbosity at 0.
show_plots (bool | None) – Deprecated. Use verbosity instead. When explicitly set, False caps verbosity at 1.
save_outputs (bool) – Save the cell annotations CSV, the cluster summary CSV, and all figures to {output_dir}/{label_space}/.
output_dir (str)
- Returns:
The annotated query dataset (same object, modified in-place). Key additions to adata_query:
.obs[f"{label_space}_predicted"]: transferred cell labels.
.obs[f"{label_space}_prob"]: label confidence (0–1).
.obs["ZMAP_time_id_predicted"]: predicted time (hpf).
.obsm["X_umap"]: UMAP coordinates (if ingested).
.uns['zmap_labels']['_last_space']: most recent label_space.
.uns['zmap_labels'][label_space]['_run_config']: stored run parameters for the zero-argument on-demand plot accessors.
.uns['zmap_labels'][label_space]['Run Summary Simple']: key/value run summary.
.uns['zmap_labels'][label_space]['Cell Annotations']: per-cell table.
.uns['zmap_labels'][label_space]['Cluster Summary']: cluster consensus table (only when query_label_col is provided).
.uns['zmap_labels'][label_space]['Label Mapping']: label overlap matrix (only when query_label_col is provided).
.uns['zmap_labels'][label_space]['Marker Validation']: DE marker overlap with the ZMAP reference ledger (only when marker_validation=True).
- Return type:
ad.AnnData
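The marker_validation step described above compares the top DE genes of each predicted group with the top reference markers. The sketch below shows one plausible way to score that overlap for a single group. The helper name marker_overlap, the fraction-based score, and the toy gene lists are assumptions made for illustration; the exact metric that ZMAP stores is not specified here.

```python
def marker_overlap(query_de_genes, ref_markers, n_query=20, n_ref=100):
    """Fraction of the top query DE genes that appear among the top reference markers.

    Hypothetical helper (not part of zmap.predict). Both inputs are gene
    lists ranked from strongest to weakest marker.
    """
    top_query = list(query_de_genes)[:n_query]
    top_ref = set(list(ref_markers)[:n_ref])
    if not top_query:
        return 0.0
    shared = [gene for gene in top_query if gene in top_ref]
    return len(shared) / len(top_query)


# Toy, made-up rankings for one predicted group and its reference label.
query_genes = ["myod1", "myog", "actc1b", "sox2"]
reference = ["myod1", "myf5", "myog", "actc1b", "desma"]
score = marker_overlap(query_genes, reference)
```

A score near 1 would suggest that the predicted group expresses the markers expected for its label, while a low score flags a group worth inspecting manually.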
Examples
Minimal usage:
>>> adata = zmap.predict.annotate_with_zmap(
...     adata_query,
...     query_raw_counts_source="counts",
...     query_label_col="leiden",
... )
Tissue-aware mode:
>>> adata = zmap.predict.annotate_with_zmap(
...     adata_query,
...     query_raw_counts_source="counts",
...     tissue_aware=True,
... )
Evaluation mode with ground-truth labels:
>>> adata = zmap.predict.annotate_with_zmap(
...     adata_query,
...     query_raw_counts_source="counts",
...     query_truth_col="manual_annotation",
...     evaluate=True,
... )
Re-plot any output with zero arguments:
>>> zmap.predict.plot_qc(adata)
>>> zmap.predict.plot_embedding(adata)
>>> zmap.predict.plot_overlap_matrix(adata)
>>> zmap.predict.show_summary(adata)
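The interaction between verbosity and the deprecated print_summary and show_plots flags can be pictured with the small sketch below. It only restates the rule from the parameter descriptions (an explicit False caps verbosity at 0 or 1, respectively). The function name resolve_verbosity is hypothetical, and treating explicit True values as having no effect is an assumption.

```python
def resolve_verbosity(verbosity=2, print_summary=None, show_plots=None):
    """Combine verbosity with the deprecated flags (illustrative sketch only)."""
    level = verbosity
    if show_plots is False:
        # Plots suppressed: at most progress lines remain.
        level = min(level, 1)
    if print_summary is False:
        # Summary suppressed: fully silent.
        level = min(level, 0)
    return level


effective = resolve_verbosity(verbosity=3, show_plots=False)
```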
Preprocessing
- zmap.predict.preprocess_adata_query(adata_query, *, counts_source, target_sum=1000000.0, inplace=True, integer_tol=0.001, strict_counts=False)[source]
Normalize raw counts in a query AnnData for ZMAP/Symphony label transfer.
Reads raw counts from the specified location, performs library-size normalization (TPM-style) followed by log1p, and writes the result into adata.X. Preprocessing metadata is recorded in adata.uns['ZMAP_preprocessing']['query'].
This function is called automatically by annotate_with_zmap when do_preprocess=True. Call it manually only if you need fine-grained control over normalization before running the pipeline.
- Parameters:
adata_query (AnnData) – Query dataset. Modified in-place when inplace=True.
counts_source (str) – Where raw integer counts are stored. Pass "X" to use adata.X, or a layer name (e.g. "counts") to use adata.layers[counts_source]. This parameter is required and has no default, so the source of the counts must always be stated explicitly.
target_sum (float) – Library size each cell is normalized to before log1p. The default (1,000,000) yields counts-per-million values, referred to in this documentation as TPM-scale.
inplace (bool) – If True, modify adata_query in-place and return it. If False, operate on a copy and return the copy.
integer_tol (float) – Tolerance used when checking whether values are integer-like. Values deviating from the nearest integer by more than this amount count towards the non-integer fraction.
strict_counts (bool) – If True, raise a ValueError when the data contains NaN/inf, negative values, or appears non-integer-like (more than 1% of non-zero values deviate from an integer). If False, emit a warning instead.
- Returns:
The preprocessed AnnData (same object when inplace=True).
- Return type:
AnnData
- Raises:
KeyError – If counts_source is not "X" and is not found in adata.layers.
TypeError – If the raw data is not numeric.
ValueError – If strict_counts=True and data quality checks fail.
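The data quality checks governed by integer_tol and strict_counts can be sketched as follows. This is a minimal pure-Python illustration of the rules stated above, not the library's implementation: the helper names non_integer_fraction and check_counts, the max_fraction argument, and the returned list of problems are all assumptions.

```python
import math
import warnings


def non_integer_fraction(values, integer_tol=1e-3):
    """Share of non-zero values farther than integer_tol from the nearest integer."""
    nonzero = [v for v in values if v != 0]
    if not nonzero:
        return 0.0
    off = sum(1 for v in nonzero if abs(v - round(v)) > integer_tol)
    return off / len(nonzero)


def check_counts(values, integer_tol=1e-3, strict_counts=False, max_fraction=0.01):
    """Collect data-quality problems; raise if strict_counts, otherwise warn."""
    problems = []
    if any(math.isnan(v) or math.isinf(v) for v in values):
        problems.append("contains NaN/inf")
    # Only finite values can be rounded, so filter before the remaining checks.
    finite = [v for v in values if math.isfinite(v)]
    if any(v < 0 for v in finite):
        problems.append("contains negative values")
    if non_integer_fraction(finite, integer_tol) > max_fraction:
        problems.append("appears non-integer-like")
    if problems:
        message = "; ".join(problems)
        if strict_counts:
            raise ValueError(message)
        warnings.warn(message)
    return problems


clean = check_counts([0, 1, 2, 5.0])
suspicious = check_counts([0.5, 1.2, 3.0])  # looks like already-normalized data
```

Passing already log-normalized values as counts is the typical mistake this kind of check is meant to catch.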
Notes
After this call, adata.X contains log-normalized (TPM + log1p) values regardless of what was in adata.X before. The original counts in counts_source are not modified.
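The normalization itself is simple enough to show on a single cell. The sketch below scales one cell's counts to target_sum and applies log1p, using plain Python lists instead of an AnnData matrix; the function name normalize_log1p and the zero-library-size handling are assumptions for illustration.

```python
import math


def normalize_log1p(counts, target_sum=1_000_000.0):
    """Scale one cell's counts to target_sum, then apply log1p.

    Returns a new list, so the input counts are left untouched.
    """
    total = sum(counts)
    if total == 0:
        # Assumed behavior for empty cells: keep all values at zero.
        return [0.0 for _ in counts]
    return [math.log1p(c * target_sum / total) for c in counts]


cell = [10, 0, 30, 60]
normalized = normalize_log1p(cell)
# Undoing log1p recovers values that sum to target_sum.
recovered_total = sum(math.expm1(v) for v in normalized)
```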
kNN Label Transfer
- zmap.predict.predict_labels_tissue_kNN(adata_query, adata_ref, *, ref_label_col, label_space=None, query_truth_col=None, ref_basis='X_pca_harmony', query_basis='X_pca_harmony', label_suffix=None, time_labels='time_id', n_neighbors=25, metric='cosine', ref_latent_key=None, query_latent_key=None, k=None, knn_metric=None, tissue_col=None, tissue_mode='hard', ref_tissue_col='ZMAP_Tissue', query_tissue_col='ZMAP_Tissue', tissue_penalty_lambda=1.0, hard_fallback_min_cells=10, knn_backend='auto', knn_device='auto', knn_nprobe=None, knn_l2norm=False, class_prior_alpha=0.0, pseudo_tissue_k=None, pseudo_tissue_threshold=0.0, pseudo_tissue_margin_threshold=0.0, auto_pseudo_tissue=True, fallback_to_plain_knn=True, pseudo_tissue_unknown_label='unknown', reuse_knn_cache=True, confidence_threshold=None, margin_threshold=0.0, include_unassigned=False, run_time_prediction=False, time_col='time_group_id', time_order=None, time_topk=5, time_hard_topk=5, time_trim_extremes=1, time_tau=0.0, time_monotone_delta=0, time_monotone_gamma=1.0, omit_labels=['unknown', 'nan', 'unassigned'], class_balance=None, time_balance=None, balance_gamma=1, balance_eps=1e-09, vote_weighting='gaussian', vote_sigma=None, time_stat_function='trimmed_mean', time_trim_alpha=0.25, time_winsor_alpha=0.25, time_distance='gaussian', time_sigma=None, time_inv_eps=1e-06, time_inv_power=1.0, evaluate=False, plot_eval_curves=False, plot_mapping_qc=True, save_mapping_qc=True, show_qc_plots=True, p_thresh=0.8, d_thresh=None, min_cells_per_label=15, apply_filters=True, output_dir='zmap_predict')[source]
Tissue-aware variant of step-3 label transfer.
This function computes a tissue-aware neighbor graph from the step-2 embedding (query_basis), caches it in adata_query.uns['zmap_neighbors'], and then reuses predict_labels_kNN(…) for voting, QC, and summary, so that step-4 inputs remain unchanged.
- Parameters:
ref_label_col (str)
label_space (str | None)
query_truth_col (str | None)
ref_basis (str)
query_basis (str)
label_suffix (str | None)
time_labels (str)
n_neighbors (int)
metric (str)
ref_latent_key (str | None)
query_latent_key (str | None)
k (int | None)
knn_metric (str | None)
tissue_col (str | None)
tissue_mode (str)
ref_tissue_col (str)
query_tissue_col (str)
tissue_penalty_lambda (float)
hard_fallback_min_cells (int | None)
knn_backend (str)
knn_device (str)
knn_nprobe (int | None)
knn_l2norm (bool)
class_prior_alpha (float)
pseudo_tissue_k (int | None)
pseudo_tissue_threshold (float)
pseudo_tissue_margin_threshold (float)
auto_pseudo_tissue (bool)
fallback_to_plain_knn (bool)
pseudo_tissue_unknown_label (str)
reuse_knn_cache (bool)
confidence_threshold (float | None)
margin_threshold (float)
include_unassigned (bool)
run_time_prediction (bool)
time_col (str)
time_topk (int)
time_hard_topk (int)
time_trim_extremes (int)
time_tau (float)
time_monotone_delta (int)
time_monotone_gamma (float)
class_balance (str | None)
time_balance (str | None)
balance_gamma (float)
balance_eps (float)
vote_weighting (str | None)
vote_sigma (float | None)
time_stat_function (str)
time_trim_alpha (float)
time_winsor_alpha (float)
time_distance (str | None)
time_sigma (float | None)
time_inv_eps (float)
time_inv_power (float)
evaluate (bool)
plot_eval_curves (bool)
plot_mapping_qc (bool)
save_mapping_qc (bool)
show_qc_plots (bool)
p_thresh (float | None)
d_thresh (float | None)
min_cells_per_label (int)
apply_filters (bool)
output_dir (str)
- zmap.predict.predict_labels_kNN(adata_query, adata_ref, *, ref_label_col, label_space=None, query_truth_col=None, ref_basis='X_pca_harmony', query_basis='X_pca_harmony', label_suffix=None, time_labels='time_id', n_neighbors=25, metric='cosine', knn_backend='auto', knn_device='auto', knn_nprobe=None, omit_labels=['unknown', 'nan', 'unassigned'], class_balance=None, time_balance=None, balance_gamma=1, balance_eps=1e-09, vote_weighting='gaussian', vote_sigma=None, time_stat_function='trimmed_mean', time_trim_alpha=0.25, time_winsor_alpha=0.25, time_distance='gaussian', time_sigma=None, time_inv_eps=1e-06, time_inv_power=1.0, evaluate=False, plot_eval_curves=False, plot_mapping_qc=True, save_mapping_qc=True, show_qc_plots=True, p_thresh=0.8, d_thresh=None, min_cells_per_label=15, apply_filters=True, output_dir='zmap_predict', expected_cache_mode='none')[source]
Transfer cell-type labels from a reference to a query dataset using kNN voting.
Builds a kNN index over the reference embedding, votes on labels using distance-weighted nearest neighbors, and writes per-cell predictions and confidence scores into adata_query.obs. Reference cells with excluded labels (omit_labels) are removed before the index is built, so every one of the k neighbors carries a usable label; under uniform voting this yields clean 1/k probability steps in the vote tallies.
Results are stored under adata_query.uns['zmap_labels'][label_space].
- Parameters:
adata_query (anndata.AnnData) – Query dataset to annotate.
adata_ref (anndata.AnnData) – Reference dataset providing labels and the embedding basis.
ref_label_col (str) – Column in adata_ref.obs containing the labels to transfer.
label_space (str | None) – Namespace used for output columns and uns keys. Defaults to ref_label_col when None.
query_truth_col (str | None) – Optional ground-truth label column in adata_query.obs used for evaluation metrics when evaluate=True.
ref_basis (str) – obsm key in adata_ref containing the reference embedding.
query_basis (str) – obsm key in adata_query containing the query embedding.
label_suffix (str | None) – Suffix appended to the predicted label column name in adata_query.obs.
time_labels (str) – Column in adata_ref.obs containing numeric developmental time values for time-score aggregation.
n_neighbors (int) – Number of nearest neighbors used for voting.
metric (str) – Distance metric for the kNN index. Passed directly to the underlying nearest-neighbor library.
omit_labels (list[str] | None) – Labels in ref_label_col to exclude from the kNN index entirely. Cells carrying these labels are removed before index construction.
class_balance (str | None) – Strategy for reweighting votes by class frequency. None applies no reweighting; "global_inverse" upweights underrepresented classes.
time_balance (str | None) – Strategy for reweighting votes by time-point frequency. Options mirror class_balance.
balance_gamma (float) – Exponent applied to inverse-frequency weights. Higher values increase the strength of balancing.
vote_weighting (str | None) – Distance weighting scheme applied to neighbor votes during label transfer. None uses uniform 1/k voting (discrete probabilities); "gaussian" applies a Gaussian kernel (continuous probabilities, recommended); "inverse" uses inverse-distance weights. Gaussian weighting produces better-calibrated confidence scores, smoother ROC/PR curves, and makes d_thresh unnecessary.
vote_sigma (float | None) – Bandwidth for the Gaussian kernel when vote_weighting="gaussian". If None, uses the per-cell median neighbor distance (adaptive).
time_stat_function (str) – Aggregation function for predicting a continuous time score per cell. One of "mean", "median", "trimmed_mean", "winsor_mean".
time_trim_alpha (float) – Trim fraction used when time_stat_function="trimmed_mean". Must be in [0, 0.5).
time_winsor_alpha (float) – Winsorization fraction used when time_stat_function="winsor_mean". Must be in [0, 0.5).
time_distance (str | None) – Distance weighting scheme applied to neighbors when computing the time score. None uses uniform weights; "gaussian" applies a Gaussian kernel; "inverse" uses inverse-distance weights.
time_sigma (float | None) – Bandwidth for the Gaussian kernel when time_distance="gaussian". If None, uses the per-cell median neighbor distance.
evaluate (bool) – Compute accuracy and other evaluation metrics against query_truth_col. Requires query_truth_col to be set.
plot_eval_curves (bool) – Plot confidence-threshold curves when evaluate=True.
plot_mapping_qc (bool) – Plot per-cell confidence and distance QC distributions after prediction.
save_mapping_qc (bool) – Save QC plots to disk under output_dir.
show_qc_plots (bool) – Call plt.show() for QC plots. Set to False when display is managed by a higher-level wrapper (e.g. annotate_with_zmap).
p_thresh (float | None) – Minimum vote probability required to assign a label. Cells below this threshold are marked as unassigned. With vote_weighting="gaussian", this is the only filter needed.
d_thresh (float | None) – Deprecated. Maximum allowable mean distance to neighbors. Kept for backward compatibility but redundant when vote_weighting is set, as distance information is already incorporated into the vote probabilities.
min_cells_per_label (int) – Minimum number of reference cells a label must have to be included in voting. Labels with fewer cells are handled like entries in omit_labels.
apply_filters (bool) – Apply the p_thresh filter to produce the final predicted label column. Set to False to retain raw predictions.
knn_backend (str)
knn_device (str)
knn_nprobe (int | None)
balance_eps (float)
time_inv_eps (float)
time_inv_power (float)
output_dir (str)
expected_cache_mode (str)
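To make the vote_weighting, vote_sigma, and p_thresh options above more concrete, here is a minimal sketch of Gaussian-weighted voting for one query cell. It assumes the common kernel form exp(-d² / (2σ²)) and an adaptive σ equal to the median neighbor distance; the actual kernel used by predict_labels_kNN may differ, and the helper names gaussian_vote and apply_p_thresh are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median


def gaussian_vote(neighbor_labels, neighbor_distances, vote_sigma=None):
    """Return (top_label, top_probability, all_probabilities) for one query cell."""
    if vote_sigma is None:
        # Adaptive bandwidth: median distance to this cell's neighbors.
        vote_sigma = median(neighbor_distances)
    sigma = max(vote_sigma, 1e-12)  # guard against a zero bandwidth
    weights = [math.exp(-(d * d) / (2.0 * sigma * sigma)) for d in neighbor_distances]
    total = sum(weights)
    if total == 0:
        # All weights underflowed: fall back to uniform 1/k voting.
        weights = [1.0] * len(neighbor_distances)
        total = float(len(weights))
    tally = defaultdict(float)
    for label, weight in zip(neighbor_labels, weights):
        tally[label] += weight / total
    top = max(tally, key=tally.get)
    return top, tally[top], dict(tally)


def apply_p_thresh(label, prob, p_thresh=0.8):
    """Keep the label only if its vote probability reaches p_thresh."""
    if p_thresh is None or prob >= p_thresh:
        return label
    return None  # treated as unassigned


labels = ["muscle", "muscle", "neuron", "muscle", "neuron"]
dists = [0.1, 0.2, 0.9, 0.15, 1.0]
top, prob, probs = gaussian_vote(labels, dists)
final = apply_p_thresh(top, prob)
```

In this toy case uniform voting would give "muscle" a probability of 3/5, below the default p_thresh, whereas the Gaussian kernel almost entirely discounts the two distant "neuron" neighbors.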
- Returns:
Results are written directly into adata_query:
adata_query.obs[f"{label_space}_predicted"]: predicted labels.
adata_query.obs[f"{label_space}_prob"]: top-label vote probability.
adata_query.obs["ZMAP_time_id_predicted"]: predicted developmental time.
adata_query.uns['zmap_labels'][label_space]: full run metadata.
- Return type:
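The robust time aggregators named in time_stat_function ("trimmed_mean" and "winsor_mean") can be illustrated on the time values of one cell's neighbors. The sketch below is unweighted and rounds the trim count down; how the library combines these statistics with time_distance weights is not specified here, so treat the details as assumptions.

```python
def trimmed_mean(values, alpha=0.25):
    """Mean after dropping the lowest and highest alpha fraction of values."""
    if not 0 <= alpha < 0.5:
        raise ValueError("alpha must be in [0, 0.5)")
    ordered = sorted(values)
    cut = int(alpha * len(ordered))  # number of values dropped from each end
    kept = ordered[cut:len(ordered) - cut]
    return sum(kept) / len(kept)


def winsor_mean(values, alpha=0.25):
    """Mean after clipping the extreme alpha fraction to the nearest kept value."""
    if not 0 <= alpha < 0.5:
        raise ValueError("alpha must be in [0, 0.5)")
    ordered = sorted(values)
    n = len(ordered)
    cut = int(alpha * n)
    low, high = ordered[cut], ordered[n - cut - 1]
    clipped = [min(max(v, low), high) for v in ordered]
    return sum(clipped) / n


# Neighbor times in hpf (made-up values); 48 is an outlier.
times = [10, 12, 12, 14, 14, 16, 18, 48]
plain = sum(times) / len(times)
robust_trim = trimmed_mean(times)
robust_winsor = winsor_mean(times)
```

Both robust estimates stay at 14 hpf here, while the plain mean is pulled up to 18 hpf by the single outlying neighbor.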
Post-processing & Summaries
- zmap.predict.summarize_knn_run(adata_query, label_key)[source]
Return a concise summary table for a completed kNN label-transfer run.
Reads the run metadata stored in adata_query.uns['zmap_labels'][label_key] and formats the key statistics as a two-column DataFrame.
- Parameters:
adata_query (anndata.AnnData) – Query dataset that has been annotated by predict_labels_kNN or annotate_with_zmap.
label_key (str) – The label_space used when the prediction was run (matches the key under adata_query.uns['zmap_labels']).
- Returns:
Two-column table with columns ["Key", "Value"] containing:
label_space: label namespace used.
n_neighbors: number of neighbors in the kNN run.
metric: distance metric used.
p_thresh: probability threshold applied.
n_assigned: number of cells that received a label.
pct_assigned: percentage of cells that received a label.
- Return type:
pd.DataFrame
- Raises:
KeyError – If label_key is not found in adata_query.uns['zmap_labels'], or if the run metadata is missing a "Run Summary" entry.
- zmap.predict.aggregate_by_cluster(adata_query, cluster_col, label_space, *, save_csv=True, output_dir='zmap_predict')[source]
Aggregate cell-level ZMAP annotations to cluster-level consensus calls.
For each cluster in cluster_col, identifies the plurality label among all QC-assigned (non-NA) cells, then computes the fraction of assigned cells carrying that label (consensus fraction), the mean per-cell kNN vote probability for those cells, and the margin over the second-ranked label. Also reports raw coverage counts so the user can assess per-cluster annotation quality (e.g., clusters where most cells were rejected).
- Parameters:
adata_query (AnnData) – Query dataset annotated by predict_labels_kNN or annotate_with_zmap.
cluster_col (str) – Column in adata_query.obs containing user-defined cluster IDs (e.g. "leiden").
label_space (str) – Label namespace used during prediction (must match adata_query.uns['zmap_labels'][label_space]). Used to derive the predicted-label and probability column names.
save_csv (bool) – Write the cluster summary table to {output_dir}/{label_space}_cluster_summary.csv.
output_dir (str)
- Returns:
One row per cluster, sorted by cluster ID, with columns:
cluster: cluster identifier.
n_cells_total: total cells in cluster.
n_cells_assigned: cells with a non-NA predicted label (passed QC).
pct_assigned: percentage of cells that passed QC.
top_label: plurality ZMAP label among assigned cells.
top_fraction: fraction of assigned cells carrying the top label.
mean_prob: mean kNN vote probability of top-label cells.
margin: top_fraction − second_fraction; NaN when fewer than 2 distinct labels are present.
second_label: second-ranked label; NaN when only one label is present.
second_fraction: fraction of second-ranked label; NaN when only one label is present.
- Return type:
pd.DataFrame
- Raises:
KeyError – If cluster_col or the predicted-label column derived from label_space is not found in adata_query.obs.
Notes
The aggregation operates only on cells whose predicted label is non-NA (i.e., cells that passed QC filters in predict_labels_kNN). Rejected cells are counted in n_cells_total but excluded from voting, so that top_fraction and margin reflect the confidence of the accepted predictions rather than being diluted by noise.
mean_prob reflects the mean per-cell kNN vote probability for top-label cells only, and is distinct from top_fraction. top_fraction captures cluster-level consensus (how unanimously assigned cells agree); mean_prob captures how confident the kNN classifier was for those individual cells.
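The distinction between top_fraction, mean_prob, and margin is easiest to see on a single cluster. The following sketch recomputes those quantities from plain Python lists, using None where the real table would hold NaN/NA; the function name cluster_consensus and the dictionary output are illustrative assumptions rather than the library's implementation.

```python
from collections import Counter


def cluster_consensus(labels, probs):
    """Summarize one cluster; labels holds None for cells rejected by QC."""
    assigned = [(lab, p) for lab, p in zip(labels, probs) if lab is not None]
    summary = {
        "n_cells_total": len(labels),
        "n_cells_assigned": len(assigned),
        "top_label": None, "top_fraction": None, "mean_prob": None,
        "second_label": None, "second_fraction": None, "margin": None,
    }
    if not assigned:
        return summary
    ranked = Counter(lab for lab, _ in assigned).most_common()
    top_label, top_n = ranked[0]
    top_probs = [p for lab, p in assigned if lab == top_label]
    summary["top_label"] = top_label
    summary["top_fraction"] = top_n / len(assigned)
    # mean_prob uses only the cells that carry the top label.
    summary["mean_prob"] = sum(top_probs) / len(top_probs)
    if len(ranked) > 1:
        second_label, second_n = ranked[1]
        summary["second_label"] = second_label
        summary["second_fraction"] = second_n / len(assigned)
        summary["margin"] = summary["top_fraction"] - summary["second_fraction"]
    return summary


# One toy cluster: four assigned cells and one rejected cell.
labels = ["muscle", "muscle", "muscle", "neuron", None]
probs = [0.9, 0.8, 1.0, 0.85, 0.4]
result = cluster_consensus(labels, probs)
```

Here the rejected cell counts towards n_cells_total only, so top_fraction is 3/4 rather than 3/5.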
- zmap.predict.build_cell_annotations_table(adata_query, label_space, *, cluster_col=None, time_col='ZMAP_time_id_predicted', save_csv=True, output_dir='zmap_predict')[source]
Build a concise per-cell annotation table from a completed ZMAP run.
Extracts the annotation-relevant columns from adata_query.obs into a clean, self-contained DataFrame suitable for inspection, CSV export, or downstream analysis. Only annotation columns produced by ZMAP are included; the full obs is not copied.
- Parameters:
adata_query (AnnData) – Annotated query dataset.
label_space (str) – Label namespace used during prediction (matches adata_query.uns['zmap_labels'][label_space]).
cluster_col (str | None) – If provided, include this column (e.g. "leiden") as the first data column so that cells can be linked back to user-defined clusters.
time_col (str) – Column in adata_query.obs containing predicted developmental time. Must match the column written by predict_labels_kNN (which depends on time_labels and label_suffix).
save_csv (bool) – Write the table to {output_dir}/{label_space}_cell_annotations.csv.
output_dir (str)
- Returns:
One row per cell. cell_id is the obs index (cell barcode). Additional columns are included when present in adata_query.obs:
{cluster_col}: user-defined cluster ID (if provided).
{label_space}_predicted: assigned label (NA if rejected).
{label_space}_prob: kNN vote probability (0–1).
{label_space}_reject_flag: True if the cell failed QC.
{label_space}_reason: which filter triggered rejection.
{time_col}: predicted developmental time (hpf).
- Return type:
pd.DataFrame
Visualization
- zmap.predict.plot_embedding_with_ondata_labels(adata_ref, adata_test, *, color_key='ZMAP_Tissue_predicted', basis='X_umap', filter_na=True, palette=None, palette_uns_key=None, show_time_strip=True, time_key='ZMAP_time_id', time_strip_width_ratio=0.03, time_strip_kwargs=None, figsize=(6, 6), dpi=200, ref_size=2, ref_alpha=0.3, test_size=2, test_alpha=1.0, cmap='jet', frameon=False, sort_order=True, legend_loc='on data', legend_fontsize=5, legend_fontweight='normal', show_labels=True, recolor_labels_from_palette=True, text_stroke_width=1.0, replace_underscores=True, linebreak_from='_', linebreak_to='\\n', adjust_expand=(1.2, 1.5), arrowprops=None, min_arrow_len=0, match_arrow_color_to_text=True, arrow_alpha=0.8, ref_kwargs=None, test_kwargs=None, show=False, save=True, return_ax=False, output_dir='zmap_predict')[source]
Plot a query dataset overlaid on the reference embedding, with on-data labels and an optional vertical time distribution strip.
Renders two layers: (1) the full reference embedding as a faint grey background for spatial context, and (2) the query cells colored by a predicted label column. Labels are drawn directly on the embedding using adjustText to minimize overlap. A vertical colorbar histogram of predicted developmental time (ZMAP_time_id) can optionally be added as a strip on the right side of the figure.
- Parameters:
adata_ref (anndata.AnnData) – Reference dataset, used only for the background embedding.
adata_test (anndata.AnnData) – Query dataset with predicted labels to overlay.
color_key (str) – Column in adata_test.obs containing the categorical labels to color and annotate. Typically a _predicted column from predict_labels_kNN.
basis (str) – obsm key used for the 2D embedding coordinates in both datasets.
filter_na (bool) – Drop query cells with NaN in color_key before plotting.
palette (dict | None) – Explicit {label: color} mapping. When None, the palette is resolved via sync_zmap_colors.
palette_uns_key (str | None) – uns key to look up the palette in adata_test. Inferred from color_key when None.
show_time_strip (bool) – Draw a vertical colorbar histogram of adata_test.obs[time_key] on the right side of the figure.
time_key (str) – Column in adata_test.obs containing predicted developmental time values (hours post-fertilization) for the time strip.
time_strip_width_ratio (float) – Width of the time strip as a fraction of the total figure width.
time_strip_kwargs (dict | None) – Additional keyword arguments forwarded to plot_colorbar_histogram.
figsize (tuple[float, float]) – Figure size in inches (width, height).
dpi (int) – Figure resolution.
ref_size (float) – Scatter point size for reference background cells.
ref_alpha (float) – Opacity of reference background points. Lower values push the reference further into the background.
test_size (float) – Scatter point size for query (projected) cells.
test_alpha (float) – Opacity of query overlay points.
cmap (str) – Colormap used for the reference background scatter.
legend_loc (str) – Where to place the category legend. "on data" draws labels directly at centroid positions; other values follow matplotlib legend conventions. Ignored when show_labels=False (forced to "none").
legend_fontsize (float) – Font size for on-data legend labels.
legend_fontweight (str) – Font weight for on-data legend labels.
show_labels (bool) – If True, draw on-data text labels at category centroids with adjustText repositioning and optional arrow connectors. If False, suppress all text labels and arrows so that only the colored scatter is shown, which is useful for clean figures or when the number of categories is too large for readable labels.
replace_underscores (bool) – Replace underscores in label strings with line breaks for cleaner on-data annotation.
adjust_expand (tuple[float, float]) – (x_expand, y_expand) passed to adjustText for label placement.
match_arrow_color_to_text (bool) – Color annotation arrows to match their corresponding text label.
ref_kwargs (dict | None) – Extra keyword arguments forwarded to the reference sc.pl.embedding call. Explicit ref_alpha takes priority over alpha in this dict.
test_kwargs (dict | None) – Extra keyword arguments forwarded to the query sc.pl.embedding call. Explicit test_alpha takes priority over alpha in this dict.
show (bool) – Call plt.show() after rendering.
save (bool) – Save the figure as PNG and PDF to output_dir.
return_ax (bool) – Return the figure and axes objects (see Returns) instead of None.
frameon (bool)
sort_order (bool)
recolor_labels_from_palette (bool)
text_stroke_width (float)
linebreak_from (str)
linebreak_to (str)
arrowprops (dict | None)
min_arrow_len (float)
arrow_alpha (float)
output_dir (str)
- Returns:
(fig, ax_umap, ax_strip) when return_ax=True, otherwise None.
- Return type:
tuple | None
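The label clean-up controlled by replace_underscores can be pictured as a plain string substitution. The sketch below assumes that linebreak_from and linebreak_to are the substring to replace and its replacement, which their names and defaults suggest but the parameter list does not state; format_ondata_label is a hypothetical helper.

```python
def format_ondata_label(label, replace_underscores=True,
                        linebreak_from="_", linebreak_to="\n"):
    """Turn e.g. 'neural_crest' into a two-line on-data label (illustrative)."""
    if not replace_underscores:
        return label
    return label.replace(linebreak_from, linebreak_to)


pretty = format_ondata_label("neural_crest")
untouched = format_ondata_label("neural_crest", replace_underscores=False)
```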
- zmap.predict.plot_colorbar_histogram(values, *, bins=100, hist_range=None, value_min=None, value_max=None, cmap='Greys', vmin=0.0, vmax=1.0, bar_height=1.0, y_min=0, y_max=120, fig_width=8, fig_height=0.6, xlabel='Predicted Time (hpf)', xlabel_size=15, tick_label_size=15, title=None, title_size=13, log=False, nan_policy='drop', box=True, box_lw=1.2, box_color='black', ax=None)[source]
Plot a colorbar-styled horizontal histogram strip for a distribution of values.
Renders a single thin bar in which each bin is colored by bin density using a colormap, giving a compact “colorbar histogram” suitable for showing developmental time distributions alongside UMAP embeddings.
Used internally by plot_embedding_with_ondata_labels to draw the vertical time strip, but can also be called standalone.
- Parameters:
values (array-like) – Numeric values to histogram (e.g. predicted time in hpf). Non-finite values are handled according to nan_policy.
bins (int or array-like, default 100) – Number of histogram bins, or explicit bin edges.
hist_range (tuple of float or None, default None) – (min, max) range for the histogram. Inferred from data when None.
value_min (float or None, default None) – Lower clipping bound. If provided, values are clipped to [value_min, value_max] before binning. Also sets hist_range when both bounds are given and hist_range is None.
value_max (float or None, default None) – Upper clipping bound. If provided, values are clipped to [value_min, value_max] before binning. Also sets hist_range when both bounds are given and hist_range is None.
cmap (str, default "Greys") – Matplotlib colormap name used to color bins by density.
vmin (float, default 0.0) – Lower limit of the colormap normalization range (applied to normalized bin counts).
vmax (float, default 1.0) – Upper limit of the colormap normalization range (applied to normalized bin counts).
bar_height (float, default 1.0) – Height of the histogram bar in data units.
y_min (float, default 0) – Lower y-axis limit for the plot.
y_max (float, default 120) – Upper y-axis limit for the plot. Defaults to y_min + bar_height when set to None.
fig_width (float, default 8) – Figure width in inches. Only used when ax=None.
fig_height (float, default 0.6) – Figure height in inches. Only used when ax=None.
xlabel (str, default "Predicted Time (hpf)") – X-axis label.
xlabel_size (float, default 15) – Font size for the axis label.
tick_label_size (float, default 15) – Font size for the tick labels.
title (str or None, default None) – Optional title drawn above the strip.
title_size (float, default 13) – Font size for the title.
log (bool, default False) – If True, apply log1p to bin counts before coloring.
nan_policy (str, default "drop") – How to handle non-finite values. Currently only "drop" is supported.
box (bool, default True) – Draw a bounding box around the strip.
box_lw (float, default 1.2) – Line width of the bounding box.
box_color (str, default "black") – Color of the bounding box.
ax (matplotlib.axes.Axes or None, default None) – Axes to draw into. If None, a new figure and axes are created.
- Returns:
The axes containing the colorbar histogram strip.
- Return type:
matplotlib.axes.Axes
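The numbers behind such a strip are just normalized bin counts. The sketch below computes one intensity per bin without any plotting, assuming counts are scaled by the largest bin so that they fall in the default [vmin, vmax] = [0, 1] range; that scaling choice and the helper name histogram_intensities are assumptions, not the documented behavior.

```python
import math


def histogram_intensities(values, bins=100, hist_range=None, log=False):
    """Per-bin color intensities in [0, 1] for a colorbar-style histogram strip."""
    # Mirrors nan_policy="drop": ignore NaN and infinite values.
    finite = [v for v in values if math.isfinite(v)]
    if not finite:
        return [0.0] * bins
    lo, hi = hist_range if hist_range is not None else (min(finite), max(finite))
    if hi <= lo:
        hi = lo + 1.0  # avoid a zero-width range
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in finite:
        if v < lo or v > hi:
            continue  # outside the requested range
        idx = min(int((v - lo) / width), bins - 1)  # right edge goes in last bin
        counts[idx] += 1
    scaled = [math.log1p(c) for c in counts] if log else [float(c) for c in counts]
    peak = max(scaled)
    return [s / peak if peak > 0 else 0.0 for s in scaled]


# Made-up predicted times (hpf) with one missing value.
times = [10, 11, 12, 24, 24, 24, 24, float("nan")]
strip = histogram_intensities(times, bins=4, hist_range=(0, 40))
```

Each intensity would then be mapped through the colormap named by cmap to color the corresponding segment of the strip.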
- zmap.predict.sync_zmap_colors(adata, obs_key='ZMAP_CellType', *, ref_adata=None, ref_obs_key=None, unknown_color='#BDBDBD')[source]
Synchronize a categorical color palette between a query and reference AnnData.
Ensures that adata.uns[f"{obs_key}_colors"] is populated and aligned with the categories in adata.obs[obs_key]. The palette is sourced from adata.uns directly if already present, or copied from ref_adata if provided.
Called automatically by plot_embedding_with_ondata_labels. Call manually when you need consistent colors across multiple plots or custom figure code.
- Parameters:
adata (anndata.AnnData) – Dataset whose color palette to set or update. Modified in-place.
obs_key (str, default "ZMAP_CellType") – Column in adata.obs whose categories need a synchronized palette.
ref_adata (anndata.AnnData or None, default None) – Reference dataset from which to copy the palette when adata does not already have one. Looks for ref_adata.uns[f"{ref_obs_key}_color_map"] or ref_adata.uns[f"{ref_obs_key}_colors"].
ref_obs_key (str or None, default None) – Column in ref_adata.obs to use as the color source. Defaults to obs_key when None.
unknown_color (str, default "#BDBDBD") – Hex color assigned to any category not found in the palette.
- Returns:
Ordered list of hex color strings, one per category in adata.obs[obs_key].cat.categories.
- Return type:
list[str]
- Raises:
KeyError – If no palette is found in adata.uns and ref_adata is either not provided or does not contain a matching palette.
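The lookup order and the unknown_color fallback can be sketched with plain dictionaries standing in for the uns entries. This is an illustration of the behavior described above under simplifying assumptions (palettes given as {label: color} dicts, a hypothetical resolve_palette helper), not the function's actual code.

```python
def resolve_palette(categories, own_palette=None, ref_palette=None,
                    unknown_color="#BDBDBD"):
    """Return one hex color per category, in category order."""
    # Prefer the dataset's own palette, then the reference palette.
    source = own_palette if own_palette else ref_palette
    if not source:
        raise KeyError("no palette found in the dataset or the reference")
    # Categories missing from the palette fall back to unknown_color.
    return [source.get(cat, unknown_color) for cat in categories]


reference_palette = {"muscle": "#E41A1C", "neuron": "#377EB8"}
colors = resolve_palette(["muscle", "neuron", "novel_type"],
                         ref_palette=reference_palette)
```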
- zmap.predict.map_query_labels(adata_query, obs_A, obs_B, *, normalize='row', title=None, reorder_columns=True, reorder_rows=True, cmap=matplotlib.pyplot.cm.Blues, overlay_values=False, vmin=None, vmax=None, show_plot=True, return_df=False, figsize=8, save_plots=True, save_mapping=True, file_prefix=None, output_dir='zmap_predict')[source]
Compute and visualize the overlap between two label columns in a query AnnData.
Builds a contingency matrix comparing two categorical obs columns (e.g. ZMAP predicted labels vs. Leiden clusters), applies optional row- or column-wise normalization, and plots the result as a heatmap. Also computes a per-group best-match mapping table.
- Parameters:
adata_query (anndata.AnnData) – Annotated query dataset containing both label columns.
obs_A (str) – Column in adata_query.obs used as the reference labeling (appears as columns in the overlap matrix).
obs_B (str) – Column in adata_query.obs used as the query labeling (appears as rows in the overlap matrix).
normalize (str or None, default "row") – Normalization applied to the raw overlap counts before plotting. One of:
    "row": each row sums to 1 (for each obs_B group, the fraction of its cells in each obs_A label).
    "column": each column sums to 1 (for each obs_A label, the fraction of its cells in each obs_B group).
    None: plot raw cell counts.
    True is treated as "row" and False as None for backward compatibility.
title (str or None, default None) – Plot title. Auto-generated from obs_A and obs_B when None.
reorder_columns (bool, default True) – Sort columns by the position of their best-matching row.
reorder_rows (bool, default True) – Sort rows by the position of their best-matching column.
cmap (matplotlib colormap, default plt.cm.Blues) – Colormap for the heatmap.
overlay_values (bool, default False) – Overlay numeric values in each heatmap cell.
vmin (float or None, default None) – Lower colormap normalization limit.
vmax (float or None, default None) – Upper colormap normalization limit.
show_plot (bool, default True) – Display the plot immediately.
return_df (bool, default False) – Return the best-match mapping table as a pd.DataFrame.
figsize (float, default 8) – Figure size (passed as both width and height in inches).
save_plots (bool, default True) – Save PNG and PDF of the heatmap under output_dir.
save_mapping (bool, default True) – Save the best-match mapping table as a CSV under output_dir.
file_prefix (str | None)
output_dir (str)
- Returns:
When return_df=True, a per-group best-match table mapping each obs_B label to its most-overlapping obs_A label. None otherwise.
- Return type:
pd.DataFrame or None
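The overlap matrix and the best-match table can be reproduced on toy data with the standard library alone. The sketch below follows the row/column convention stated above (obs_B as rows, obs_A as columns); nested dicts stand in for the DataFrame, and the helper names overlap_matrix and best_match are assumptions for illustration.

```python
from collections import Counter


def overlap_matrix(labels_a, labels_b, normalize="row"):
    """Contingency matrix as {obs_B label: {obs_A label: value}}."""
    if normalize is True:       # backward-compatible aliases
        normalize = "row"
    elif normalize is False:
        normalize = None
    rows, cols = sorted(set(labels_b)), sorted(set(labels_a))
    counts = Counter(zip(labels_b, labels_a))
    matrix = {r: {c: float(counts[(r, c)]) for c in cols} for r in rows}
    if normalize == "row":
        for r in rows:
            total = sum(matrix[r].values())
            for c in cols:
                matrix[r][c] = matrix[r][c] / total if total else 0.0
    elif normalize == "column":
        for c in cols:
            total = sum(matrix[r][c] for r in rows)
            for r in rows:
                matrix[r][c] = matrix[r][c] / total if total else 0.0
    elif normalize is not None:
        raise ValueError("normalize must be 'row', 'column', or None")
    return matrix


def best_match(matrix):
    """Map each obs_B label to its most-overlapping obs_A label."""
    return {row: max(cols, key=cols.get) for row, cols in matrix.items()}


# Toy data: ZMAP predictions (obs_A) and Leiden clusters (obs_B) for six cells.
predicted = ["muscle", "muscle", "neuron", "neuron", "neuron", "muscle"]
leiden = ["0", "0", "1", "1", "0", "0"]
row_norm = overlap_matrix(predicted, leiden, normalize="row")
mapping = best_match(row_norm)
```

With row normalization, each Leiden cluster's row can be read directly as the composition of that cluster in terms of ZMAP labels.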