Internals
Notes for maintainers, extension authors, and curious users who want to understand how pybenchx is stitched together.
Architecture in one glance
bench files → discovery → variant planning → runner → reporters + store → exports/compare
- Discovery (`pybench.discovery`) finds benchmark files under the paths you pass to the CLI (`**/*bench.py` by default) and imports them with a stable synthetic module name so repeated runs do not pollute `sys.modules` (a sketch of that import technique follows this list).
- Registry (`pybench.bench_model`) collects decorated functions (`Bench`/`bench`) into `Case` objects. Parameter grids expand into concrete `VariantSpec` rows before execution.
- Runner (`pybench.runner`) prepares each variant (calibration, warmup, repeats) and produces `VariantResult` objects with raw samples and derived statistics.
- Timing helpers (`pybench.timing`, `pybench.profiles`) pick the right profile, budget, and sample counts; context-mode measurements use `BenchContext` to isolate the timed region.
- Storage (`pybench.run_store`) persists runs and baselines to `.pybenchx/` and exposes the commands (`list`, `stats`, `clean`) used by the CLI.
- Reporters (`pybench.reporters.*`) render tabular output, JSON/CSV/Markdown payloads, or interactive charts. They all consume the same normalized `Run` object.
- Compare (`pybench.compare`) loads historical runs and computes deltas, including Mann–Whitney U significance tests and threshold enforcement (`--fail-on`).
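The "stable synthetic module name" trick is a standard `importlib` pattern. The sketch below shows the general technique only; the function name, the `pybench_discovered` prefix, and the naming scheme are illustrative assumptions, not `pybench.discovery`'s actual code.

```python
# Sketch: import a benchmark file under a stable, derived module name so that
# repeated discovery runs overwrite one sys.modules entry instead of piling up
# new ones. Names here are illustrative, not pybenchx internals.
import importlib.util
import sys
from pathlib import Path

def import_bench_file(path: Path):
    stem = path.with_suffix("").as_posix().replace("/", "_").replace(".", "_")
    mod_name = f"pybench_discovered.{stem}"        # stable: same file -> same name
    spec = importlib.util.spec_from_file_location(mod_name, path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[mod_name] = module                 # re-imports replace this entry
    spec.loader.exec_module(module)                # running the module fires the decorators
    return module
```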
Data model
- `Case`: metadata attached to each decorated function (`name`, `group`, `mode`, `params`, `baseline`, etc.). Parameters are optional and copied per variant so mutation inside benchmarks is safe.
- `VariantSpec`: the concrete combination of arguments derived from a `Case` after applying CLI overrides (`-P repeat=5`).
- `VariantResult`: statistics and timing samples for a `VariantSpec`. Holds the mean, `p75`/`p95`/`p99`, speedups versus the baseline, and optional raw samples when exports are requested.
- `Run`: the container saved to disk; it includes metadata (profile, CLI args, git info when available) plus the list of `VariantResult`s.
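For orientation, the shapes above can be pictured roughly as the dataclasses below. This is an illustration assembled from the descriptions in this section; the real classes in `pybench.bench_model` and `pybench.runner` may use different field names and carry more state.

```python
# Illustrative shapes only, inferred from the prose above; not the real classes.
from dataclasses import dataclass, field

@dataclass
class Case:
    name: str
    group: str | None = None
    mode: str = "function"              # or a context mode when BenchContext is used
    params: dict[str, list] = field(default_factory=dict)
    baseline: bool = False

@dataclass
class VariantSpec:
    case: Case
    args: dict[str, object]             # one concrete point in the parameter grid

@dataclass
class VariantResult:
    spec: VariantSpec
    mean_ns: float
    p75_ns: float
    p95_ns: float
    p99_ns: float
    speedup_vs_baseline: float | None = None
    samples: list[int] | None = None    # raw samples kept only when exports ask for them

@dataclass
class Run:
    metadata: dict[str, object]         # profile, CLI args, git info when available
    results: list[VariantResult] = field(default_factory=list)
```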
From CLI to measurements
- Command parsing (`pybench.cli`) handles `pybench run/list/stats/clean`. `run` builds a `RunnerConfig` combining CLI flags, the default profile, and overrides.
- Suite discovery loads all requested modules, letting decorators register cases into the global `_ALL_BENCHES` registry.
- Variant planning (`utils.prepare_variants`) expands parameters, applies include/exclude filters (`-k`, `--group`), and detects whether `BenchContext` is used.
- Calibration (`runner.calibrate_n`) approximates how many loop iterations fit inside the profile budget, honoring `--budget`, `--max-n`, and `--min-time`.
- Execution (`runner.run_single_repeat`) performs warmups (unless disabled), repeatedly calls the benchmark, times it with `perf_counter_ns`, and records results. GC collection and optional `gc.freeze()` keep noise down. (The calibrate-then-measure loop is sketched after this list.)
- Post-processing sorts results, computes baselines (`suite_sig.choose_baseline`), and attaches speedup labels (`≈ same`, `faster`, `slower`).
- Outputs go to the terminal table plus any reporters requested via `--export`. When `--save`/`--save-baseline` are provided, the `Run` is serialized into `.pybenchx/` and future comparisons can reference it.
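The following is a stripped-down sketch of the calibrate-then-measure approach described above, assuming nothing beyond what this section states; the real logic in `runner.calibrate_n` and `runner.run_single_repeat` handles more cases (minimum time, context mode, disabling warmups, `gc.freeze()`).

```python
# Minimal sketch of calibration plus timed repeats; not pybenchx's runner code.
import gc
import time

def calibrate_n(fn, budget_ns: int, max_n: int) -> int:
    """Grow the inner-loop count until one batch roughly fills the time budget."""
    n = 1
    while n < max_n:
        start = time.perf_counter_ns()
        for _ in range(n):
            fn()
        if time.perf_counter_ns() - start >= budget_ns:
            break
        n *= 2
    return min(n, max_n)

def run_repeats(fn, n: int, repeats: int, warmup: int) -> list[float]:
    for _ in range(warmup):                  # warmup calls are never recorded
        fn()
    gc.collect()                             # drain pending garbage before timing
    samples = []
    for _ in range(repeats):
        start = time.perf_counter_ns()
        for _ in range(n):
            fn()
        samples.append((time.perf_counter_ns() - start) / n)   # ns per call
    return samples
```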
Profiles and calibration knobs
Profiles are declared in `pybench.profiles`. Each contains default `repeat`, `warmup`, and a time budget per variant. The CLI can tweak runtime behavior:

- `--profile smoke|thorough|custom.json` selects a built-in profile or your own JSON profile.
- `--budget` overrides the total target nanoseconds per variant; `--max-n` guards against runaway loops.
- `--min-time` ensures even fast benchmarks run long enough to be meaningful.
- `--sequential` forces single-thread execution even if we add parallel runners later.
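A custom profile passed via `--profile custom.json` can encode the same knobs. The key names below are assumptions based on the description above (repeat count, warmup iterations, per-variant budget); check `pybench.profiles` for the actual schema before relying on them.

```python
# Writes a hypothetical custom profile; the keys mirror the knobs described
# above but are assumed, not a documented schema.
import json
from pathlib import Path

custom_profile = {
    "repeat": 20,              # repeats per variant
    "warmup": 5,               # un-timed warmup iterations
    "budget_ns": 500_000_000,  # ~0.5 s time budget per variant
}
Path("custom.json").write_text(json.dumps(custom_profile, indent=2))
# then: pybench run <paths> --profile custom.json
```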
Calibration seeds the function once to detect context usage. If the benchmark calls `start()`/`end()` on its `BenchContext`, only that region is timed; otherwise pybenchx falls back to timing the full function.
Storage layout (.pybenchx/)
```
.pybenchx/
├── baselines/   # Named baselines saved via --save-baseline
├── runs/        # Timestamped run artifacts saved via --save
├── exports/     # Files generated by --export
└── cache/       # Helper data (e.g., discovery cache) when enabled
```
- `pybench list` reads these directories and surfaces human-friendly summaries.
- `pybench stats` calculates disk usage, retention, and age so you know when to prune.
- `pybench clean --keep N` keeps the newest N entries per category and safely deletes the rest.
Artifacts are plain JSON (zstd-compressed when available). They can be inspected or checked into CI logs if desired.
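If you want to poke at an artifact by hand, something like the sketch below works under the assumption stated above (plain JSON, zstd-compressed when the codec is available); the file path and extension conventions are not specified here, so treat it as illustrative.

```python
# Illustrative loader for a saved artifact: try zstd first, fall back to plain JSON.
import json
from pathlib import Path

def load_artifact(path: Path) -> dict:
    raw = path.read_bytes()
    try:
        import zstandard
        try:
            raw = zstandard.ZstdDecompressor().decompress(raw)
        except zstandard.ZstdError:
            pass                    # not a zstd frame; assume plain JSON bytes
    except ImportError:
        pass                        # codec not installed; artifact should be plain JSON
    return json.loads(raw)
```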
Reporters and extending output
Reporters live under `pybench/reporters/`. Every reporter implements a simple interface:
```python
class Reporter:
    def render(self, run: Run, *, save_path: Path | None) -> None: ...
```
- `_ansi` backs the CLI table (speedup badges, color heuristics).
- `markdown`, `csv`, and `json` emit static artifacts.
- `chart` builds an HTML/JS bundle powered by Chart.js for interactive inspection.
Adding a reporter usually means serializing `Run` and emitting to `save_path` (defaulting under `.pybenchx/exports/`).
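As a shape-only sketch, a tab-separated reporter might look like the following. The `render()` signature comes from the interface above; the `run.results`, `spec.case.name`, and `mean_ns` attribute names follow the illustrative data-model sketch earlier on this page rather than the real classes, and the registration hook that makes the CLI aware of a new reporter is not shown.

```python
# Sketch of a reporter implementing the render() interface shown above.
from pathlib import Path

class TsvReporter:
    def render(self, run, *, save_path: Path | None) -> None:
        out = save_path or Path(".pybenchx/exports/results.tsv")
        out.parent.mkdir(parents=True, exist_ok=True)
        lines = ["name\tmean_ns"]
        for result in run.results:                       # attribute names are assumed
            lines.append(f"{result.spec.case.name}\t{result.mean_ns:.1f}")
        out.write_text("\n".join(lines) + "\n")
```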
Comparing runs
`pybench compare` and `pybench run --compare` rely on `compare.py`:
- Loads baseline run(s) from disk or from a path supplied on the command line.
- Aligns variants by suite signature (name/group/params) so renames are detected early.
- Computes deltas for mean and percentiles, then performs a Mann–Whitney U test for significance (sketched after this list).
- Applies thresholds from `--fail-on`; if a metric exceeds its budget, the command exits non-zero to gate CI.
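The statistical step can be illustrated with SciPy's implementation of the Mann–Whitney U test. This only demonstrates the idea; `compare.py` may compute the test itself rather than depend on SciPy, and the sample numbers below are made up.

```python
# Two sets of timing samples (ns per call): are the candidate's samples drawn
# from a different distribution than the baseline's?
from scipy.stats import mannwhitneyu

baseline_ns  = [101.2, 99.8, 100.5, 102.0, 100.1, 99.5, 101.7, 100.9]
candidate_ns = [108.4, 107.9, 109.1, 108.0, 110.2, 107.5, 108.8, 109.6]

stat, p_value = mannwhitneyu(baseline_ns, candidate_ns, alternative="two-sided")
print(f"U={stat:.1f}  p={p_value:.4f}  significant={p_value < 0.05}")
```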
Quick comparisons (`--vs main|last`) defer to `run_store` helpers that resolve the baseline name inside `.pybenchx/baselines/`.
Utilities worth knowing

- `pybench.utils` bundles formatting helpers (`fmt_time_ns`, `fmt_speedup`), percentile math, and environment detection.
- `pybench.params` implements cartesian expansion and CLI override plumbing (the expansion idea is sketched below).
- `pybench.suite_sig` produces deterministic identifiers for groups/cases so history remains aligned even as code moves around.
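Conceptually, the cartesian expansion in `pybench.params` amounts to an `itertools.product` over the parameter values; the helper below is an illustration of that idea, not the module's actual API.

```python
# Every combination of parameter values becomes one variant.
from itertools import product

def expand_grid(params: dict[str, list]) -> list[dict[str, object]]:
    if not params:
        return [{}]                         # a case without params still has one variant
    keys = list(params)
    return [dict(zip(keys, combo)) for combo in product(*params.values())]

# {"size": [10, 1000], "mode": ["fast", "safe"]} -> 4 variants
print(expand_grid({"size": [10, 1000], "mode": ["fast", "safe"]}))
```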
Understanding these modules makes it easier to extend pybenchx—for example, adding new CLI flags, tuning calibration logic, or teaching reporters to emit bespoke dashboards.