# Runs, Storage & Compare

Understand the data structures pybenchx produces and how to work with the `.pybenchx/` storage layout programmatically.
## Run data model

Objects live in `pybench.run_model`.
```python
from pybench.run_model import Run, RunMeta, VariantResult, StatSummary
```

- `RunMeta`: metadata captured once per invocation (`tool_version`, `started_at`, durations, profile, git info, environment strings, GC state).
- `VariantResult`: per-benchmark statistics plus optional raw samples.
- `StatSummary`: aggregate metrics (mean/median/stdev/min/max/p75/p99/p995).
- `Run`: the container saved to disk; it binds `RunMeta`, the suite signature, and a list of `VariantResult` instances.
All dataclasses are serializable with `dataclasses.asdict`. Reporters and the storage layer rely on that contract.
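Because the model is plain dataclasses, dumping a run to JSON needs nothing beyond the standard library. A minimal sketch, assuming `run` is a `Run` you just executed or reloaded:

```python
import dataclasses
import json

# A Run (and its nested RunMeta/VariantResult/StatSummary objects)
# flattens to plain dicts via dataclasses.asdict.
payload = dataclasses.asdict(run)

# default=str covers values such as datetimes that json cannot encode natively.
print(json.dumps(payload, indent=2, default=str))
```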
## Suite signatures

`suite_signature` is produced by `pybench.suite_sig.suite_signature_from_cases`. It hashes the loaded cases to detect when you rename or remove benchmarks across runs. Keep it if you want robust comparisons.
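If you build runs yourself, you can recompute the signature from the same case list the runner collected. A sketch where `cases` stands in for the loaded benchmark cases (the collection API itself is out of scope here):

```python
from pybench.suite_sig import suite_signature_from_cases

# `cases` is assumed to be the list of benchmark cases your loader produced.
signature = suite_signature_from_cases(cases)

# Runs recorded from the same lineup share this hash, so a mismatch against a
# stored baseline's signature flags renamed or removed benchmarks.
print(signature)
```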
## Storage helpers

`pybench.run_store` centralizes reading and writing `.pybenchx/` artifacts.
```python
from pathlib import Path

from pybench.run_store import (
    get_root,
    save_run,
    save_baseline,
    load_run,
    load_baseline,
    list_recent_runs,
    list_baselines,
    clean_old_runs,
    get_storage_stats,
    default_export_path,
)
```

### Root discovery

- `get_root(base=None)` returns the resolved `.pybenchx/` directory (creating it when missing).
- `ensure_dirs(root)` mirrors the directory bootstrap the CLI uses; normally `get_root` already calls it for you.
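A quick way to confirm where artifacts land, assuming `get_root` returns a `pathlib.Path` (consistent with the `Path`-based helpers above):

```python
from pybench.run_store import get_root

# Resolve .pybenchx/ for the current project, creating it if it is missing.
root = get_root()
print(root)
print(sorted(p.name for p in root.iterdir()))  # e.g. ['baselines', 'exports', 'runs']
```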
Directory layout:

```
.pybenchx/
├── runs/        # Timestamped history (auto-saved and labeled runs)
├── baselines/   # Named JSON baselines
└── exports/     # Reporter outputs (markdown/csv/json/chart/...)
    └── chart/
```

A README is auto-generated to help teammates understand the folder if it ends up in version control.
### Saving runs

- `save_run(run, label=None)` stores the JSON representation under `runs/` using a timestamped filename (`<ts>_<branch>@<sha>.<profile>.json`).
- `save_baseline(run, name)` stores the run as `<name>.json` under `baselines/`.
- `auto_save_run(run)` mirrors the CLI behavior and is called after every execution; labeled runs are never deleted by `clean_old_runs`.
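A minimal sketch, assuming `run` is a `pybench.run_model.Run` from a finished execution:

```python
from pybench.run_store import save_run, save_baseline

save_run(run)                    # timestamped history entry under runs/
save_run(run, label="refactor")  # labeled entries are exempt from cleanup
save_baseline(run, "main")       # written as baselines/main.json
```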
### Loading and listing

- `load_run(path)` loads a run from disk and reconstructs `RunMeta`, `VariantResult`, and `StatSummary` objects.
- `load_baseline(name_or_path)` resolves either a named baseline in `.pybenchx/baselines` or a direct path.
- `list_recent_runs(limit=10)` returns `Path` objects sorted by modification time.
- `list_baselines()` returns sorted baseline names.
- `get_latest_run()` grabs the freshest auto-saved run.
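A reload sketch, assuming a baseline named `main` already exists (`load_baseline` returns a name/run pair, as in the compare example below):

```python
from pybench.run_store import list_recent_runs, list_baselines, load_run, load_baseline

# Reload the five most recent runs from .pybenchx/runs/.
recent = [load_run(path) for path in list_recent_runs(limit=5)]

# Named baselines live under .pybenchx/baselines/.
print(list_baselines())
baseline_name, baseline = load_baseline("main")
```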
### Housekeeping

- `clean_old_runs(keep=100)` deletes the oldest auto-saved runs, preserving labeled artifacts and the newest `keep` history entries.
- `get_storage_stats()` reports counts, disk usage, and age bounds for both runs and baselines, which makes it handy for CLI status commands or dashboards.
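A housekeeping sketch; the exact shape of the stats object isn't documented here, so it is simply printed:

```python
from pybench.run_store import clean_old_runs, get_storage_stats

# Keep only the newest 50 auto-saved runs; labeled runs and baselines are preserved.
clean_old_runs(keep=50)

# Counts, disk usage, and age bounds for runs and baselines.
print(get_storage_stats())
```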
### Export paths

`default_export_path(run, fmt, base_name=None)` returns the location reporters use by default. Formats map to:
| Spec | Directory | Extension |
|---|---|---|
| `json` | `.pybenchx/exports/json/` | `.json` |
| `md` / `markdown` | `.pybenchx/exports/markdown/` | `.md` |
| `csv` | `.pybenchx/exports/csv/` | `.csv` |
| `chart` | `.pybenchx/exports/chart/` | `.html` |
Pass a `base_name` if you want deterministic filenames (otherwise the run's timestamp-based stem is used).
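A sketch of resolving export targets, assuming `run` is a loaded `Run`:

```python
from pybench.run_store import default_export_path

# Timestamp-based stem under .pybenchx/exports/json/
print(default_export_path(run, "json"))

# Deterministic filename: .pybenchx/exports/markdown/nightly.md
print(default_export_path(run, "md", base_name="nightly"))
```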
## Comparing runs programmatically

The CLI surfaces `pybench compare` and `--compare`/`--vs`. Underneath lives `pybench.compare`:

```python
from pybench.compare import diff, parse_fail_policy, violates_policy
```

- `diff(current_run, baseline_run, alpha=0.05)` returns a `DiffReport` with per-variant deltas, p-values, and a `suite_changed` flag when the benchmark lineup differs.
- `parse_fail_policy("mean:5%,p99:10%")` converts the CLI string into a dictionary for enforcement.
- `violates_policy(report, policy)` returns `True` when a regression exceeds your thresholds (and is statistically significant).
Combine them with `pybench.run_store.load_baseline()` to implement custom regression gates:
```python
from pybench.compare import diff, parse_fail_policy, violates_policy
from pybench.run_store import load_baseline

# `current_run` is the Run you just executed (or reloaded with load_run).
baseline_name, baseline = load_baseline("main")
report = diff(current_run, baseline)
policy = parse_fail_policy("mean:5%,p99:10%")

if violates_policy(report, policy):
    raise SystemExit("regression detected")
```

## Working with reporters
Reporters consume `Run` objects and optionally write to disk.
```python
from pybench.reporters.table import render_table
from pybench.reporters.json import render_json
```

Each reporter exports a `render_*` function (or a `Reporter` class) that takes a `Run` plus configuration such as the export path. See Reporters & Exports for the full list and extension points.
## Suggested workflows

- Persist nightly baselines by calling `save_baseline(run, "main")` after CI passes.
- Build dashboards by loading runs from `.pybenchx/runs/` and uploading the JSON to your metrics system (see the sketch after this list).
- Enforce regression thresholds in custom scripts by calling `diff` and `violates_policy` (as shown above) and exiting with a non-zero code when a policy fails.
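A minimal dashboard-feed sketch; the upload step is a stand-in that writes one JSON document per line for whatever ingestion pipeline you use:

```python
import dataclasses
import json

from pybench.run_store import list_recent_runs, load_run

# Convert recent runs into plain dicts; dataclasses.asdict works because the
# whole run model is serializable dataclasses (see "Run data model").
payloads = [dataclasses.asdict(load_run(path)) for path in list_recent_runs(limit=20)]

with open("pybenchx-runs.jsonl", "w") as fh:
    for payload in payloads:
        fh.write(json.dumps(payload, default=str) + "\n")
```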