
Runs, Storage & Compare

Understand the data structures pybenchx produces and how to work with the .pybenchx/ storage layout programmatically.

Objects live in pybench.run_model.

from pybench.run_model import Run, RunMeta, VariantResult, StatSummary
  • RunMeta: metadata captured once per invocation (tool_version, started_at, durations, profile, git info, environment strings, GC state).
  • VariantResult: per-benchmark statistics plus optional raw samples.
  • StatSummary: aggregate metrics (mean/median/stdev/min/max/p75/p99/p995).
  • Run: the container saved to disk; it binds RunMeta, the suite signature, and a list of VariantResult instances.

All dataclasses are serializable with dataclasses.asdict. Reporters and the storage layer rely on that contract.
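
For instance, a minimal serialization sketch built only on that contract (dataclasses.asdict recurses into the nested RunMeta, VariantResult, and StatSummary dataclasses; default=str is a guard for values such as timestamps that json cannot encode directly):

import dataclasses
import json

from pybench.run_model import Run

def run_to_json(run: Run) -> str:
    # Recursively convert the Run and its nested dataclasses into plain dicts/lists.
    payload = dataclasses.asdict(run)
    return json.dumps(payload, indent=2, default=str)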

suite_signature is produced by pybench.suite_sig.suite_signature_from_cases. It hashes the loaded cases to detect when you rename or remove benchmarks across runs. Keep it if you want robust comparisons.

pybench.run_store centralizes reading and writing .pybenchx/ artifacts.

from pathlib import Path
from pybench.run_store import (
    get_root,
    save_run,
    save_baseline,
    load_run,
    load_baseline,
    list_recent_runs,
    list_baselines,
    clean_old_runs,
    get_storage_stats,
    default_export_path,
)
  • get_root(base=None) returns the resolved .pybenchx/ directory (creating it when missing).
  • ensure_dirs(root) mirrors the directory bootstrap the CLI uses; normally get_root already calls it for you.
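
As a quick sketch, bootstrapping the storage root and peeking at recent history (only documented entry points are used; get_root is assumed to return a pathlib.Path, matching the Path objects list_recent_runs returns):

from pybench.run_store import get_root, list_recent_runs

root = get_root()                       # resolves .pybenchx/, creating it when missing
print(root / "runs", root / "baselines")
for path in list_recent_runs(limit=5):  # Path objects sorted by modification time
    print(path.name)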

Directory layout:

.pybenchx/
├── runs/          # Timestamped history (auto-saved and labeled runs)
├── baselines/     # Named JSON baselines
└── exports/       # Reporter outputs, one subfolder per format
    ├── json/
    ├── markdown/
    ├── csv/
    └── chart/

A README is auto-generated to help teammates understand the folder if it ends up in version control.

  • save_run(run, label=None) stores the JSON representation under runs/ using a timestamped filename (<ts>_<branch>@<sha>.<profile>.json).
  • save_baseline(run, name) stores the run as <name>.json under baselines/.
  • auto_save_run(run) mirrors the CLI behavior and is called after every execution; labeled runs are never deleted by clean_old_runs.
  • load_run(path) loads a run from disk and reconstructs RunMeta, VariantResult, and StatSummary objects.
  • load_baseline(name_or_path) resolves either a named baseline in .pybenchx/baselines or a direct path.
  • list_recent_runs(limit=10) returns Path objects sorted by modification time.
  • list_baselines() returns sorted baseline names.
  • get_latest_run() grabs the freshest auto-saved run.
  • clean_old_runs(keep=100) deletes the oldest auto-saved runs, preserving labeled artifacts and the newest keep history entries.
  • get_storage_stats() reports counts, disk usage, and age bounds for both runs and baselines—great for CLI status commands or dashboards.
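
The helpers above compose into a short housekeeping script; `run` is assumed to be a Run you just executed or loaded:

from pybench.run_store import save_run, save_baseline, clean_old_runs, get_storage_stats

save_run(run)                    # timestamped history entry under runs/
save_baseline(run, "nightly")    # named baseline at baselines/nightly.json
clean_old_runs(keep=50)          # trim auto-saved history; labeled runs survive
print(get_storage_stats())       # counts, disk usage, and age bounds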

default_export_path(run, fmt, base_name=None) returns the location reporters use by default. Formats map to:

Spec          Directory                     Extension
json          .pybenchx/exports/json/       .json
md/markdown   .pybenchx/exports/markdown/   .md
csv           .pybenchx/exports/csv/        .csv
chart         .pybenchx/exports/chart/      .html

Pass a base_name if you want deterministic filenames (otherwise the run’s timestamp-based stem is used).
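
For example, a deterministic markdown export location (the "ci-report" stem is only an illustration, and `run` is again assumed to be a Run you already have in hand):

from pybench.run_store import default_export_path

path = default_export_path(run, "md", base_name="ci-report")
print(path)  # .pybenchx/exports/markdown/ci-report.md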

The CLI surfaces pybench compare and --compare/--vs. Underneath lives pybench.compare:

from pybench.compare import diff, parse_fail_policy, violates_policy
  • diff(current_run, baseline_run, alpha=0.05) returns a DiffReport with per-variant deltas, p-values, and a suite_changed flag when the benchmark lineup differs.
  • parse_fail_policy("mean:5%,p99:10%") converts the CLI string into a dictionary for enforcement.
  • violates_policy(report, policy) returns True when a regression exceeds your thresholds (and is statistically significant).

Combine them with pybench.run_store.load_baseline() to implement custom regression gates.

from pybench.compare import diff, parse_fail_policy, violates_policy
from pybench.run_store import load_baseline

baseline_name, baseline = load_baseline("main")
report = diff(current_run, baseline)            # current_run: the Run you just executed
policy = parse_fail_policy("mean:5%,p99:10%")
if violates_policy(report, policy):
    raise SystemExit("regression detected")
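
Before enforcing the policy you may also want to check the suite_changed flag the DiffReport exposes (see above), so a changed benchmark lineup is not mistaken for a regression:

if report.suite_changed:
    print("warning: benchmark lineup changed since the baseline was recorded")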

Reporters consume Run objects and optionally write to disk.

from pybench.reporters.table import render_table
from pybench.reporters.json import render_json

Each reporter exports a render_* function (or a Reporter class) that takes a Run plus configuration such as the export path. See Reporters & Exports for the full list and extension points.
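
A hedged sketch of driving two reporters by hand, assuming the render_* functions accept the Run plus an output path for file formats (the exact parameter shapes may differ; Reporters & Exports is authoritative):

from pybench.reporters.table import render_table
from pybench.reporters.json import render_json
from pybench.run_store import default_export_path

render_table(run)                                    # console summary of the Run
render_json(run, default_export_path(run, "json"))   # path argument shape is an assumption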

  • Persist nightly baselines by calling save_baseline(run, "main") after CI passes.
  • Build dashboards by loading runs from .pybenchx/runs/ and uploading the JSON to your metrics system.
  • Enforce regression thresholds in custom scripts by calling diff() and violates_policy() and exiting with a non-zero code when a policy fails.