# Runs, Storage & Compare

Understand the data structures pybenchx produces and how to work with the `.pybenchx/` storage layout programmatically.
## Run data model

Objects live in `pybench.run_model`.

```python
from pybench.run_model import Run, RunMeta, VariantResult, StatSummary
```

- `RunMeta`: metadata captured once per invocation (`tool_version`, `started_at`, durations, profile, git info, environment strings, GC state).
- `VariantResult`: per-benchmark statistics plus optional raw samples.
- `StatSummary`: aggregate metrics (mean/median/stdev/min/max/p75/p99/p995).
- `Run`: the container saved to disk; it binds `RunMeta`, the suite signature, and a list of `VariantResult` instances.

All dataclasses are serializable with `dataclasses.asdict`. Reporters and the storage layer rely on that contract.
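Because everything is a plain dataclass, a run round-trips to JSON with the standard library alone. A minimal sketch, assuming `run` is a `Run` instance you already hold in memory:

```python
import json
from dataclasses import asdict

# Assumption: `run` is an in-memory pybench.run_model.Run.
# `default=str` handles values (e.g. timestamps) that json cannot encode natively.
payload = json.dumps(asdict(run), default=str)
```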
## Suite signatures

`suite_signature` is produced by `pybench.suite_sig.suite_signature_from_cases`. It hashes the loaded cases to detect when you rename or remove benchmarks across runs. Keep it if you want robust comparisons.
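A minimal sketch, assuming `cases` is the collection of benchmark cases pybench has already discovered for the invocation:

```python
from pybench.suite_sig import suite_signature_from_cases

# Assumption: `cases` is the list of loaded benchmark cases.
# The resulting signature is what Run carries alongside its results.
signature = suite_signature_from_cases(cases)
```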
## Storage helpers

`pybench.run_store` centralizes reading and writing `.pybenchx/` artifacts.

```python
from pathlib import Path

from pybench.run_store import (
    get_root,
    save_run,
    save_baseline,
    load_run,
    load_baseline,
    list_recent_runs,
    list_baselines,
    clean_old_runs,
    get_storage_stats,
    default_export_path,
)
```
## Root discovery

- `get_root(base=None)` returns the resolved `.pybenchx/` directory (creating it when missing).
- `ensure_dirs(root)` mirrors the directory bootstrap the CLI uses; normally `get_root` already calls it for you.
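A minimal sketch of resolving the storage root, assuming the return value is a `pathlib.Path`:

```python
from pybench.run_store import get_root

# Resolves the .pybenchx/ directory, creating it when missing.
root = get_root()
print(root / "runs", root / "baselines", root / "exports")
```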
Directory layout:

```
.pybenchx/
├── runs/            # Timestamped history (auto-saved and labeled runs)
├── baselines/       # Named JSON baselines
├── exports/         # Reporter outputs (markdown/csv/json/chart/...)
└── exports/chart/
```

A README is auto-generated to help teammates understand the folder if it ends up in version control.
## Saving runs

- `save_run(run, label=None)` stores the JSON representation under `runs/` using a timestamped filename (`<ts>_<branch>@<sha>.<profile>.json`).
- `save_baseline(run, name)` stores the run as `<name>.json` under `baselines/`.
- `auto_save_run(run)` mirrors the CLI behavior and is called after every execution; labeled runs are never deleted by `clean_old_runs`.
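A minimal sketch, assuming `run` is the `Run` produced by an execution; the label and baseline names are illustrative:

```python
from pybench.run_store import save_run, save_baseline

# Labeled runs are never deleted by clean_old_runs.
save_run(run, label="pre-refactor")

# Written to .pybenchx/baselines/main.json.
save_baseline(run, "main")
```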
## Loading and listing

- `load_run(path)` loads a run from disk and reconstructs `RunMeta`, `VariantResult`, and `StatSummary` objects.
- `load_baseline(name_or_path)` resolves either a named baseline in `.pybenchx/baselines` or a direct path.
- `list_recent_runs(limit=10)` returns `Path` objects sorted by modification time.
- `list_baselines()` returns sorted baseline names.
- `get_latest_run()` grabs the freshest auto-saved run.
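A minimal sketch that walks recent history; field access on the loaded `Run` is omitted because the exact attribute names live in `pybench.run_model`:

```python
from pybench.run_store import list_recent_runs, load_run

# Iterate the three most recently modified run files and rehydrate them.
for path in list_recent_runs(limit=3):
    run = load_run(path)
    print(path.name, run)
```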
## Housekeeping

- `clean_old_runs(keep=100)` deletes the oldest auto-saved runs, preserving labeled artifacts and the newest `keep` history entries.
- `get_storage_stats()` reports counts, disk usage, and age bounds for both runs and baselines, which is handy for CLI status commands or dashboards.
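A minimal housekeeping sketch; the `keep` value is illustrative, and the stats are printed as whatever structure `get_storage_stats()` returns:

```python
from pybench.run_store import clean_old_runs, get_storage_stats

# Trim auto-saved history to the 50 newest entries (labeled runs survive).
clean_old_runs(keep=50)

# Report counts, disk usage, and age bounds for runs and baselines.
print(get_storage_stats())
```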
## Export paths

`default_export_path(run, fmt, base_name=None)` returns the location reporters use by default. Formats map to:

| Spec | Directory | Extension |
| --- | --- | --- |
| `json` | `.pybenchx/exports/json/` | `.json` |
| `md` / `markdown` | `.pybenchx/exports/markdown/` | `.md` |
| `csv` | `.pybenchx/exports/csv/` | `.csv` |
| `chart` | `.pybenchx/exports/chart/` | `.html` |

Pass a `base_name` if you want deterministic filenames (otherwise the run's timestamp-based stem is used).
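A minimal sketch, assuming `run` is a loaded `Run`; the `base_name` is illustrative:

```python
from pybench.run_store import default_export_path

# Deterministic markdown target, e.g. .pybenchx/exports/markdown/nightly.md
target = default_export_path(run, "md", base_name="nightly")

# Without base_name, the run's timestamp-based stem is used instead.
auto_target = default_export_path(run, "csv")
```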
## Comparing runs programmatically

The CLI surfaces `pybench compare` and `--compare/--vs`. Underneath lives `pybench.compare`:

```python
from pybench.compare import diff, parse_fail_policy, violates_policy
```
- `diff(current_run, baseline_run, alpha=0.05)` returns a `DiffReport` with per-variant deltas, p-values, and a `suite_changed` flag when the benchmark lineup differs.
- `parse_fail_policy("mean:5%,p99:10%")` converts the CLI string into a dictionary for enforcement.
- `violates_policy(report, policy)` returns `True` when a regression exceeds your thresholds (and is statistically significant).
Combine them with `pybench.run_store.load_baseline()` to implement custom regression gates:

```python
from pybench.compare import diff, parse_fail_policy, violates_policy
from pybench.run_store import load_baseline

# `current_run` is the Run produced by the current execution.
baseline_name, baseline = load_baseline("main")
report = diff(current_run, baseline)
policy = parse_fail_policy("mean:5%,p99:10%")

if violates_policy(report, policy):
    raise SystemExit("regression detected")
```
## Working with reporters

Reporters consume `Run` objects and optionally write to disk.

```python
from pybench.reporters.table import render_table
from pybench.reporters.json import render_json
```

Each reporter exports a `render_*` function (or a `Reporter` class) that takes a `Run` plus configuration such as the export path. See Reporters & Exports for the full list and extension points.
## Suggested workflows

- Persist nightly baselines by calling `save_baseline(run, "main")` after CI passes.
- Build dashboards by loading runs from `.pybenchx/runs/` and uploading the JSON to your metrics system (see the sketch below).
- Enforce regression thresholds in custom scripts by calling `diff()` and exiting with a non-zero code when `violates_policy()` reports a failure.
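For the dashboard workflow, a minimal sketch that gathers the raw run JSON; the upload step is left to whatever metrics system you use:

```python
import json
from pybench.run_store import get_root

# Collect every saved run as a plain dict, ready to ship to a dashboard.
runs_dir = get_root() / "runs"
payloads = [json.loads(p.read_text()) for p in sorted(runs_dir.glob("*.json"))]
```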