
Getting Started

```bash
pip install pybenchx
# or
uv pip install pybenchx
```

Create `examples/hello_bench.py`:

```python
from pybench import bench

@bench(name="hello", n=1_000, repeat=10)
def hello():
    return sum(range(50))
```

Run:

```bash
pybench run examples/
```

Filter cases and tweak parameters at runtime; `-P` overrides the values set in the decorator:

```bash
pybench run examples/ -k hello -P repeat=5 -P n=10_000
```

For finer control over what gets timed, group cases into a suite and use context mode:

```python
from pybench import Bench, BenchContext

suite = Bench("math")

@suite.bench(name="baseline", baseline=True, repeat=10)
def baseline(b: BenchContext):
    setup = list(range(100))  # per-iteration setup, excluded from timing
    b.start()
    sum(setup)  # only this region is timed
    b.end()
```

Why: context mode keeps per-iteration setup out of the timed region, improving the signal-to-noise ratio of the measurement.

- Directories expand to `**/*bench.py`.
- Name cases with `name=`; use `group=` to cluster related cases, e.g. `@suite.bench(name="join-basic", group="strings")`.
- Set `baseline=True` on one case per group; the others are reported as "vs base" (see the sketch after this list).
- Without an explicit baseline, a case whose name contains "baseline" or "base" may be used.
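
For illustration, here is a minimal sketch of one group with an explicit baseline. The case names are chosen to match the comparison output shown later on this page, and the context-mode signature from the example above is assumed to apply to suite cases as well:

```python
from pybench import Bench, BenchContext

strings = Bench("strings")

@strings.bench(name="join-baseline", group="strings", baseline=True)
def join_baseline(b: BenchContext):
    parts = ["x"] * 100
    b.start()
    "".join(parts)  # timed: single linear-time join
    b.end()

@strings.bench(name="join_plus", group="strings")
def join_plus(b: BenchContext):
    parts = ["x"] * 100
    b.start()
    s = ""
    for p in parts:  # timed: repeated concatenation, reported "vs base"
        s += p
    b.end()
```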
Built-in profiles control calibration and repeat counts:

```bash
pybench run examples/ --profile thorough  # ~1 s per variant, repeat=30
pybench run examples/ --profile smoke     # no calibration, repeat=3 (default)
```
Runs can be saved, exported, and compared against history:

```bash
# Save a run and a baseline
pybench run examples/ --save runA
pybench run examples/ --save-baseline main

# Export the latest run to Markdown, CSV, and an interactive chart
pybench run examples/ --export md:bench.md
pybench run examples/ --export csv:bench.csv
pybench run examples/ --export chart:bench.html

# Compare against a named baseline and enforce thresholds
pybench run examples/ --compare main --fail-on mean:7%,p99:12%

# Quick comparisons against history
pybench run examples/ --vs main
pybench run examples/ --vs last
```
Comparison output annotates each case with a delta, a p-value, and a verdict:

```text
$ pybench run examples/ --compare main --fail-on mean:5%,p99:10%
comparing against: main
strings/join-baseline: Δ=+0.00% p=n/a [same]
strings/join_split: Δ=-2.10% p=0.12 [same]
strings/join_plus: Δ=+98.50% p=0.000 [worse]
thresholds violated
```
- Δ: percent change in mean time (positive = slower, i.e. a regression).

- p: p-value (Mann–Whitney U, approximate). A same/better/worse verdict requires significance (α = 0.05) in addition to the 1% threshold.

- `--fail-on mean:%,p99:%` applies per-metric limits; p99 uses the actual P99 delta.
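
To make the rule concrete, here is a minimal Python sketch of the verdict and threshold logic as described above. It is an illustration of the documented behavior, not pybenchx's actual implementation:

```python
# Illustrative sketch only; pybenchx's real logic may differ in details.
def verdict(delta_pct: float, p_value: float,
            alpha: float = 0.05, min_delta_pct: float = 1.0) -> str:
    """'worse'/'better' require both statistical significance and a
    delta beyond the 1% threshold; everything else reads as 'same'."""
    if p_value < alpha and abs(delta_pct) > min_delta_pct:
        return "worse" if delta_pct > 0 else "better"
    return "same"

def fails(delta_mean_pct: float, delta_p99_pct: float,
          mean_limit_pct: float, p99_limit_pct: float) -> bool:
    """--fail-on mean:X%,p99:Y% applies one limit per metric."""
    return delta_mean_pct > mean_limit_pct or delta_p99_pct > p99_limit_pct
```

Against the sample output, `verdict(98.5, 0.000)` returns `"worse"`, and a +98.50% mean delta exceeds the 5% limit, which is why the run reports "thresholds violated".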

- Artifacts live under `.pybenchx/` (created automatically).

- Use `pybench list`, `pybench stats`, and `pybench clean --keep N` to inspect and prune history, for example:
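
A typical maintenance pass, using only the commands named above (the `--keep` value is arbitrary):

```bash
pybench list             # inspect saved runs and baselines
pybench stats            # summary statistics over stored history
pybench clean --keep 10  # prune, keeping the 10 most recent runs
```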

- Read CLI for options (`--no-color`, `--sort`, `--budget`, `--max-n`).
- See API for parameterization (`params={...}`) and suites; a hypothetical parameterized case is sketched below.
- Explore Examples for common patterns.
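
As a taste of parameterization, here is a hypothetical sketch. The exact contract of `params={...}` (how variants are expanded and how values are passed to the case) is assumed here, so treat the API reference as authoritative:

```python
from pybench import bench

# Hypothetical: assumes each value in params expands into its own variant
# and is passed to the function as a keyword argument.
@bench(name="sum-range", params={"size": [100, 1_000, 10_000]})
def sum_range(size: int):
    return sum(range(size))
```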