Behavior
How pybenchx runs, measures, and keeps noise in check.
Profiles
Profiles define repeat counts, warmup passes, and the calibration budget per variant.
| profile | repeat | warmup | calibration | best for |
|---|---|---|---|---|
| smoke | 3 | 0 | off | quick iteration during development |
| thorough | 30 | 2 | ~1 s/variant | pre-merge validation, CI checks |
Use `--profile smoke` while exploring; switch to `--profile thorough` (or a custom JSON profile) when you want stable numbers.
Custom tuning knobs
- `--budget 50ms` sets the target time per variant across repeats.
- `--max-n 1_000_000` caps the number of loop iterations.
- `--min-time 2ms` enforces a floor so fast functions still run long enough.
- `--repeat` / `--warmup` override profile settings for quick what-if experiments.
Profiles live in `pybench/profiles.py`; they are simple dataclasses, so shipping a new profile boils down to adding a JSON file with the same fields.
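The dataclass-plus-JSON arrangement can be sketched roughly as below. The field names mirror the table above, but the actual definitions in `pybench/profiles.py` may differ in detail, so treat this as an illustration rather than the real API:

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Profile:
    # Field names follow the profiles table; the real dataclasses
    # in pybench/profiles.py are the source of truth.
    name: str
    repeat: int
    warmup: int
    calibration: Optional[float]  # budget in seconds per variant, None = off


SMOKE = Profile(name="smoke", repeat=3, warmup=0, calibration=None)
THOROUGH = Profile(name="thorough", repeat=30, warmup=2, calibration=1.0)


def load_profile(path: str) -> Profile:
    """Build a Profile from a JSON file carrying the same fields."""
    with open(path) as f:
        return Profile(**json.load(f))
```

Because the profile is just data, a custom JSON file with `name`, `repeat`, `warmup`, and `calibration` keys is all a new profile needs.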
Calibration and timing pipeline
- `calibrate_n` runs the benchmark in small bursts, growing `n` exponentially until the target budget is met. A refinement step probes ±20% to avoid overshooting.
- Context mode calls the benchmark once to detect whether `BenchContext.start()`/`end()` is used; if not, pybenchx reverts to whole-function timing.
- Warmups run before sampling (unless disabled with `--no-warmup`) to warm caches and JITs.
- Each repeat wraps the tight loop with `perf_counter_ns` (or `perf_counter` as a fallback) for precise timings.
- Raw samples feed percentile calculations (`p50`/`p75`/`p95`/`p99`) via linear interpolation.
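The two numeric pieces of the pipeline, exponential calibration and interpolated percentiles, can be sketched like this. This is a simplified illustration of the behavior described above, not pybenchx's actual implementation (for one thing, it omits the ±20% refinement step):

```python
import time


def calibrate_n(fn, budget_ns: float, max_n: int = 1_000_000) -> int:
    """Grow n exponentially until one burst of n calls meets the budget."""
    n = 1
    while n < max_n:
        start = time.perf_counter_ns()
        for _ in range(n):
            fn()
        if time.perf_counter_ns() - start >= budget_ns:
            break
        n *= 2  # exponential growth keeps calibration cheap
    return min(n, max_n)


def percentile(samples: list[float], q: float) -> float:
    """Percentile with linear interpolation between the closest ranks."""
    xs = sorted(samples)
    pos = (len(xs) - 1) * q
    lo, frac = int(pos), pos - int(pos)
    if lo + 1 < len(xs):
        return xs[lo] + (xs[lo + 1] - xs[lo]) * frac
    return xs[lo]
```

Interpolation matters for small sample counts: with 30 repeats, `p99` falls between the two slowest samples rather than snapping to the maximum.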
Accuracy & robustness
- Clock – monotonic, high-resolution (`perf_counter_ns` on Python ≥3.7).
- GC – the runner performs `gc.collect()` before measurements and may freeze GC during timed sections (Python ≥3.11) to reduce pauses.
- Environment – colors only print on a TTY; use `--no-color` in CI logs. The CLI also respects `PYBENCH_DISABLE_GC` when you want to opt out of GC tweaks entirely.
- Noise hints – the table highlights outliers (`p99` vs mean) so you can spot unstable cases quickly.
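The GC handling could look roughly like the context manager below. This is an assumption-laden sketch: the doc only says pybenchx collects before measuring, may freeze GC, and respects `PYBENCH_DISABLE_GC`; disabling collection inside the timed section is this sketch's own choice:

```python
import gc
import os
import sys
from contextlib import contextmanager


@contextmanager
def quiet_gc():
    """Collect garbage up front and keep the collector quiet while timing."""
    if os.environ.get("PYBENCH_DISABLE_GC"):
        yield  # user opted out of all GC tweaks
        return
    gc.collect()  # clear pending garbage before sampling starts
    frozen = sys.version_info >= (3, 11)
    if frozen:
        gc.freeze()  # move survivors to the permanent generation
    gc.disable()  # avoid collection pauses inside the timed loop
    try:
        yield
    finally:
        gc.enable()
        if frozen:
            gc.unfreeze()
```

Freezing keeps long-lived objects out of the young generations, so any collection that does happen mid-run scans less and pauses for less time.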
Error handling and exits
- Exceptions inside benchmarks bubble up with context; failing cases never poison other variants.
- `--fail-fast` stops the run on the first error.
- `--fail-on mean:7%,p99:12%` applies thresholds during comparisons and exits non-zero when budgets are exceeded, which makes it a good fit for CI.
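A hypothetical sketch of how a `--fail-on` spec could be parsed and evaluated; the function names and comparison logic here are assumptions, not pybenchx's actual code:

```python
def parse_fail_on(spec: str) -> dict[str, float]:
    """Parse a spec like "mean:7%,p99:12%" into {"mean": 0.07, "p99": 0.12}."""
    thresholds = {}
    for part in spec.split(","):
        metric, pct = part.split(":")
        thresholds[metric.strip()] = float(pct.strip().rstrip("%")) / 100
    return thresholds


def exceeds(baseline: dict, candidate: dict, thresholds: dict) -> bool:
    """True when any metric regressed past its budget (exit non-zero in CI)."""
    return any(
        (candidate[m] - baseline[m]) / baseline[m] > limit
        for m, limit in thresholds.items()
    )
```

A CI wrapper would call `sys.exit(1)` when `exceeds(...)` returns true, so the pipeline fails the moment a comparison blows its budget.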
Reporting highlights
- The CLI table groups benchmarks by `group` and annotates baselines with a star (★).
- `≈ same` shows up when the relative mean differs by ≤1% and the statistical test agrees.
- All reporters consume the same `Run` object; adding a format is as easy as implementing a new reporter class.
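A new reporter might look something like the sketch below. The `Run` and result types here are minimal stand-ins (the real `Run` object in pybenchx carries more detail), and `CsvReporter` is a hypothetical example format:

```python
from dataclasses import dataclass, field


@dataclass
class CaseResult:
    # Stand-in for a single benchmark result; real fields will differ.
    name: str
    group: str
    mean_ns: float


@dataclass
class Run:
    results: list[CaseResult] = field(default_factory=list)


class CsvReporter:
    """Hypothetical extra format: every reporter just consumes a Run."""

    def render(self, run: Run) -> str:
        lines = ["group,name,mean_ns"]
        lines += [f"{r.group},{r.name},{r.mean_ns}" for r in run.results]
        return "\n".join(lines)
```

Because reporters share one input type, adding CSV, JSON, or Markdown output never touches the measurement pipeline.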