
Getting Started

```bash
pip install pybenchx
# or
uv pip install pybenchx
```

Create `examples/hello_bench.py`:

```python
from pybench import bench

@bench(name="hello", n=1_000, repeat=10)
def hello():
    return sum(range(50))
```

Run:

```bash
pybench run examples/
```

Filter cases and tweak parameters at runtime; `-P` overrides the values set in the decorator:

```bash
pybench run examples/ -k hello -P repeat=5 -P n=10_000
```

For finer control over what gets timed, group cases into a suite and use context mode:

```python
from pybench import Bench, BenchContext

suite = Bench("math")

@suite.bench(name="baseline", baseline=True, repeat=10)
def baseline(b: BenchContext):
    setup = list(range(100))  # per-iteration setup, excluded from timing
    b.start()
    sum(setup)  # only this region is timed
    b.end()
```

Why: context mode keeps per-iteration setup out of the timed region, improving the signal-to-noise ratio of the measurement.

- Directories expand to `**/*bench.py`.
- Name cases with `name=`; use `group=` to cluster related cases, e.g. `@suite.bench(name="join-basic", group="strings")`.
- Set `baseline=True` on one case per group; the others are reported as "vs base" (see the sketch after this list).
- Without an explicit baseline, a case whose name contains "baseline" or "base" may be used.
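
For illustration, here is a minimal sketch of one group with an explicit baseline. The case names are chosen to match the comparison output shown later on this page, and the context-mode signature from the example above is assumed to apply to suite cases as well:

```python
from pybench import Bench, BenchContext

strings = Bench("strings")

@strings.bench(name="join-baseline", group="strings", baseline=True)
def join_baseline(b: BenchContext):
    parts = ["x"] * 100
    b.start()
    "".join(parts)  # timed: single linear-time join
    b.end()

@strings.bench(name="join_plus", group="strings")
def join_plus(b: BenchContext):
    parts = ["x"] * 100
    b.start()
    s = ""
    for p in parts:  # timed: repeated concatenation, reported "vs base"
        s += p
    b.end()
```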
Built-in profiles control calibration and repeat counts:

```bash
pybench run examples/ --profile thorough  # ~1 s per variant, repeat=30
pybench run examples/ --profile smoke     # no calibration, repeat=3 (default)
```
Runs can be saved, exported, and compared against history:

```bash
# Save a run and a baseline
pybench run examples/ --save runA
pybench run examples/ --save-baseline main

# Export the latest run to Markdown, CSV, and an interactive chart
pybench run examples/ --export md:bench.md
pybench run examples/ --export csv:bench.csv
pybench run examples/ --export chart:bench.html

# Compare against a named baseline and enforce thresholds
pybench run examples/ --compare main --fail-on mean:7%,p99:12%

# Quick comparisons against history
pybench run examples/ --vs main
pybench run examples/ --vs last
```
Comparison output annotates each case with a delta, a p-value, and a verdict:

```text
$ pybench run examples/ --compare main --fail-on mean:5%,p99:10%
comparing against: main
strings/join-baseline: Δ=+0.00% p=n/a [same]
strings/join_split: Δ=-2.10% p=0.12 [same]
strings/join_plus: Δ=+98.50% p=0.000 [worse]
thresholds violated
```
- Δ: percent change in mean time (positive = slower, i.e. a regression).

- p: p-value (Mann–Whitney U, approximate). A same/better/worse verdict requires significance (α = 0.05) in addition to the 1% threshold.

- `--fail-on mean:%,p99:%` applies per-metric limits; p99 uses the actual P99 delta.
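
To make the rule concrete, here is a minimal Python sketch of the verdict and threshold logic as described above. It is an illustration of the documented behavior, not pybenchx's actual implementation:

```python
# Illustrative sketch only; pybenchx's real logic may differ in details.
def verdict(delta_pct: float, p_value: float,
            alpha: float = 0.05, min_delta_pct: float = 1.0) -> str:
    """'worse'/'better' require both statistical significance and a
    delta beyond the 1% threshold; everything else reads as 'same'."""
    if p_value < alpha and abs(delta_pct) > min_delta_pct:
        return "worse" if delta_pct > 0 else "better"
    return "same"

def fails(delta_mean_pct: float, delta_p99_pct: float,
          mean_limit_pct: float, p99_limit_pct: float) -> bool:
    """--fail-on mean:X%,p99:Y% applies one limit per metric."""
    return delta_mean_pct > mean_limit_pct or delta_p99_pct > p99_limit_pct
```

Against the sample output, `verdict(98.5, 0.000)` returns `"worse"`, and a +98.50% mean delta exceeds the 5% limit, which is why the run reports "thresholds violated".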

- Artifacts live under `.pybenchx/` (created automatically).

- Use `pybench list`, `pybench stats`, and `pybench clean --keep N` to inspect and prune history, for example:
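
A typical maintenance pass, using only the commands named above (the `--keep` value is arbitrary):

```bash
pybench list             # inspect saved runs and baselines
pybench stats            # summary statistics over stored history
pybench clean --keep 10  # prune, keeping the 10 most recent runs
```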

- Read CLI for options (`--no-color`, `--sort`, `--budget`, `--max-n`).
- See API for parameterization (`params={...}`) and suites; a hypothetical parameterized case is sketched below.
- Explore Examples for common patterns.
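
As a taste of parameterization, here is a hypothetical sketch. The exact contract of `params={...}` (how variants are expanded and how values are passed to the case) is assumed here, so treat the API reference as authoritative:

```python
from pybench import bench

# Hypothetical: assumes each value in params expands into its own variant
# and is passed to the function as a keyword argument.
@bench(name="sum-range", params={"size": [100, 1_000, 10_000]})
def sum_range(size: int):
    return sum(range(size))
```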