Behavior & Accuracy
Understand what PyBenchx measures and how it keeps numbers stable.
Timing model
- Clock: `time.perf_counter_ns()` when available (monotonic, high resolution).
- Units: all internal measurements are in nanoseconds; the table formats them as ns/µs/ms/s.
- Mean time: values in the table are per-call means (not totals per repeat).
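A minimal sketch of this timing model. It assumes one repeat simply calls the function `n` times between two `perf_counter_ns()` readings and divides by `n`. `mean_per_call_ns` and `format_ns` are hypothetical helpers for illustration, not PyBenchx's API, and the two-decimal formatting is an assumption.

```python
import time


def mean_per_call_ns(fn, n):
    """Hypothetical helper: time n calls of fn and return the per-call mean in ns."""
    start = time.perf_counter_ns()  # monotonic clock, integer nanoseconds
    for _ in range(n):
        fn()
    total_ns = time.perf_counter_ns() - start
    return total_ns / n  # per-call mean, not the total for the repeat


def format_ns(ns):
    """Hypothetical formatter: pick the largest unit (s/ms/µs) that keeps the value >= 1."""
    for unit, scale in (("s", 1e9), ("ms", 1e6), ("µs", 1e3)):
        if ns >= scale:
            return f"{ns / scale:.2f} {unit}"
    return f"{ns:.2f} ns"


print(format_ns(mean_per_call_ns(lambda: sum(range(100)), 10_000)))
```

Keeping everything in integer nanoseconds until display time avoids mixing units inside the statistics.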
Per-variant calibration (n)
- Goal: run each repeat for roughly a fixed time budget (`--budget`, split across repeats).
- How: `_calibrate_n` grows `n` exponentially until a repeat takes close to the target time, then refines it with ±20% probes.
- Cap: `--max-n` bounds the calibrated `n`.
- Never lower than the code-specified `n`: the final `n` for a variant is `max(case.n, calibrated_n)`.
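The calibration steps can be sketched as follows. This is not `_calibrate_n` itself: `calibrate_n` and `final_n` are hypothetical names. The sketch assumes "exponential growth" means doubling, and that the "±20% probes" try 0.8×, 1.0×, and 1.2× of a linear estimate, keeping whichever lands closest to the per-repeat target.

```python
def calibrate_n(run_ns, target_ns, max_n):
    """Hypothetical calibration sketch.

    run_ns(n) must return the elapsed nanoseconds for one repeat of n calls.
    """
    # Phase 1: grow n by doubling until a repeat reaches at least half the
    # target time, or until the cap (analogous to --max-n) is hit.
    n = 1
    elapsed = run_ns(n)
    while elapsed < target_ns / 2 and n < max_n:
        n = min(n * 2, max_n)
        elapsed = run_ns(n)

    # Phase 2: extrapolate linearly, then probe -20%, 0, and +20% around the
    # estimate (clamped to [1, max_n]) and keep the closest to the target.
    estimate = n * target_ns / max(elapsed, 1)
    best_n, best_err = n, abs(elapsed - target_ns)
    for factor in (0.8, 1.0, 1.2):
        candidate = max(1, min(max_n, round(estimate * factor)))
        err = abs(run_ns(candidate) - target_ns)
        if err < best_err:
            best_n, best_err = candidate, err
    return best_n


def final_n(case_n, calibrated_n):
    # The code-specified n acts as a floor.
    return max(case_n, calibrated_n)


# Usage with a fake workload that costs exactly 1000 ns per call:
fake_run = lambda n: n * 1000
print(final_n(10, calibrate_n(fake_run, target_ns=1_000_000, max_n=10**6)))
```

With a 1 ms per-repeat target and 1 µs per call, the sketch settles on `n = 1000`; the `case.n` floor only matters when the code asks for more iterations than calibration would.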
Repeats and warmup
- `warmup` executes untimed iterations to warm caches and the CPU.
- `repeat` controls how many per-call means are collected; more repeats make the reported statistics more stable.
- Profiles:
  - fast: ~150 ms per variant, `repeat=10`
  - thorough: ~1 s per variant, `repeat=30`
  - smoke: no calibration, `repeat=3`, `warmup=0`
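A hedged sketch of how the profiles, the per-repeat budget split, and the warmup/repeat loop could fit together. The `PROFILES` dictionary and its key names are hypothetical encodings of the presets above, not PyBenchx's configuration format. Warmup counts for `fast` and `thorough` are not documented here, so they are left out.

```python
import time

# Hypothetical encoding of the documented presets; key names are illustrative.
PROFILES = {
    "fast": {"budget_ms": 150, "repeat": 10, "calibrate": True},
    "thorough": {"budget_ms": 1000, "repeat": 30, "calibrate": True},
    "smoke": {"budget_ms": None, "repeat": 3, "warmup": 0, "calibrate": False},
}


def per_repeat_budget_ns(profile):
    """Split the per-variant budget evenly across repeats (None when not calibrating)."""
    p = PROFILES[profile]
    if not p["calibrate"]:
        return None
    return p["budget_ms"] * 1_000_000 // p["repeat"]


def collect_means(fn, n, repeat, warmup):
    """Run untimed warmup calls, then collect one per-call mean per repeat."""
    for _ in range(warmup):
        fn()  # untimed: only warms caches and the CPU
    means = []
    for _ in range(repeat):
        start = time.perf_counter_ns()
        for _ in range(n):
            fn()
        means.append((time.perf_counter_ns() - start) / n)
    return means
```

Under these assumptions, `fast` gives each repeat about 15 ms (150 ms / 10) and `thorough` about 33 ms (1 s / 30). The larger `repeat` count is what buys the more stable percentiles.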
Garbage collector
- Before running: a collection is forced, and the CLI may call `gc.freeze()` when it is available.
- During runs: GC is disabled in `run_case` to reduce interruptions and re-enabled at the end.
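A minimal sketch of this GC handling using only the standard `gc` module. For brevity, `run_with_gc_paused` is a hypothetical helper that folds the pre-run collection/freeze and the in-run disable into one place. The `try`/`finally` makes sure the collector is re-enabled even if the measured code raises.

```python
import gc


def run_with_gc_paused(measure):
    """Hypothetical wrapper: collect (and freeze, if supported), then run measure() with GC off."""
    gc.collect()
    if hasattr(gc, "freeze"):  # gc.freeze() exists on Python 3.7+
        gc.freeze()  # move surviving objects to a permanent generation
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        return measure()
    finally:
        if was_enabled:
            gc.enable()  # restore the collector even if measure() raises


print(run_with_gc_paused(gc.isenabled))  # False: GC is off while measuring
```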
Context mode detection
- If your function’s first parameter is annotated with `BenchContext` (or is named `b`, `_b`, `ctx`, or `context`), it runs in “context” mode.
- Context mode measures only the time between `start()` and `end()` on each iteration, so you can do setup before or after without polluting timings.
- If the function doesn’t actually call `start()`/`end()`, PyBenchx falls back to timing the whole call (like function mode).
- Tip: avoid instance methods for context mode, because their first parameter is `self`; prefer free functions whose first parameter can be `BenchContext`.
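The detection rule can be approximated with `inspect.signature`. This is a sketch only: `is_context_mode` is a hypothetical name, and `BenchContext` here is an empty stand-in class, not the real PyBenchx type. Checking the string `"BenchContext"` as well covers modules that use `from __future__ import annotations`.

```python
import inspect


class BenchContext:
    """Stand-in for PyBenchx's context type, used only for this sketch."""


CONTEXT_PARAM_NAMES = {"b", "_b", "ctx", "context"}


def is_context_mode(fn):
    """Hypothetical detection rule based on the first parameter of fn."""
    params = list(inspect.signature(fn).parameters.values())
    if not params:
        return False
    first = params[0]
    # Accept a real annotation or a string one (postponed annotations).
    if first.annotation is BenchContext or first.annotation == "BenchContext":
        return True
    return first.name in CONTEXT_PARAM_NAMES


def bench_sort(ctx):  # detected by parameter name
    ...


def bench_join(bc: BenchContext):  # detected by annotation
    ...


def bench_plain(data=None):  # function mode
    ...


class Suite:
    def bench_method(self, ctx):  # first parameter is `self`, so not detected
        ...
```

The `Suite.bench_method` case shows why the tip above recommends free functions: looked up on the class, the method's first parameter is `self`, so neither the name nor the annotation rule fires.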
What counts as time
- Function mode: everything inside the function call.
- Context mode: only the accumulated deltas between `start()` and `end()`. Multiple pairs per iteration are allowed and their elapsed times are summed; nested calls are ignored.
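A sketch of this accounting, assuming "nested calls are ignored" means a second `start()` while one is already open has no effect. `SketchContext` and `time_iteration` are hypothetical stand-ins, not PyBenchx's `BenchContext`. The clock is injectable so the arithmetic can be checked with fake timestamps.

```python
import time


class SketchContext:
    """Hypothetical stand-in for BenchContext: sums time between start() and end()."""

    def __init__(self, clock=time.perf_counter_ns):
        self._clock = clock
        self._t0 = None  # timestamp of the open start(), if any
        self.total_ns = 0  # accumulated start()/end() deltas
        self.used = False  # whether start() was ever called

    def start(self):
        if self._t0 is not None:
            return  # assumption: a nested start() is ignored
        self.used = True
        self._t0 = self._clock()

    def end(self):
        if self._t0 is None:
            return  # assumption: end() without a matching start() is ignored
        self.total_ns += self._clock() - self._t0
        self._t0 = None


def time_iteration(fn, clock=time.perf_counter_ns):
    """Return measured ns for one call of fn(ctx); fall back to the whole call."""
    ctx = SketchContext(clock)
    t0 = clock()
    fn(ctx)
    whole_ns = clock() - t0
    return ctx.total_ns if ctx.used else whole_ns


def bench_sort(ctx):
    data = list(range(10_000, 0, -1))  # setup: not timed
    ctx.start()
    sorted(data)  # only this is timed
    ctx.end()


print(time_iteration(bench_sort))
```

With fake timestamps `0, 100, 130, 200, 250, 300`, two `start()`/`end()` pairs contribute 30 ns + 50 ns, so one iteration counts as 80 ns rather than the 300 ns the whole call took.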
Errors and robustness
- If calibration or warmup raises, the CLI catches the error and continues where reasonable; the variant may then run with the code-specified `n`.
- Discovery or import errors in a file stop that file’s cases from loading.
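A hedged sketch of both behaviors. `choose_n` and `load_bench_files` are hypothetical helpers, and importing files with `importlib.util.spec_from_file_location` is an assumed discovery mechanism, not a description of PyBenchx's loader. The point is the failure scope: a calibration error falls back to the code-specified `n`, and an import error skips only the offending file.

```python
import importlib.util
import pathlib


def choose_n(case_n, calibrate):
    """Hypothetical: use max(case_n, calibrated n), or case_n if calibration raises."""
    try:
        return max(case_n, calibrate())
    except Exception:
        return case_n  # keep going with the code-specified n


def load_bench_files(paths):
    """Hypothetical discovery: a file that fails to import contributes no cases."""
    modules, errors = {}, {}
    for path in paths:
        try:
            spec = importlib.util.spec_from_file_location(pathlib.Path(path).stem, str(path))
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)  # runs the file's top-level code
        except Exception as exc:
            errors[str(path)] = exc  # skip only this file
            continue
        modules[str(path)] = module
    return modules, errors
```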
Table math
- Percentiles: linear interpolation on the sorted per-call means.
- iter/s: computed as `1e9 / mean_ns` and shown with K/M suffixes.
- vs base:
  - The baseline is chosen via `baseline=True` or by a name heuristic (the name contains “baseline” or “base”).
  - “≈ same” appears when the relative difference in mean time versus the baseline is ≤ 1%.
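The table math can be sketched as below. Several details are assumptions, not confirmed PyBenchx behavior:

- The percentile rank is `k = (p / 100) * (N - 1)`, the common linear-interpolation rule that is also NumPy's default.
- The K/M thresholds are 1e3 and 1e6.
- The "faster/slower" wording is illustrative.
- The name heuristic is a case-insensitive substring check; note that it would also match names like "database".

```python
import math


def percentile(values, p):
    """Linear interpolation between the two closest ranks of the sorted values."""
    xs = sorted(values)
    if not xs:
        raise ValueError("no samples")
    k = (len(xs) - 1) * p / 100
    lo, hi = math.floor(k), math.ceil(k)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)


def iters_per_sec(mean_ns):
    """1e9 / mean_ns, abbreviated with K/M suffixes."""
    rate = 1e9 / mean_ns
    for suffix, scale in (("M", 1e6), ("K", 1e3)):
        if rate >= scale:
            return f"{rate / scale:.1f}{suffix}"
    return f"{rate:.1f}"


def vs_base(mean_ns, base_ns, tol=0.01):
    """'≈ same' within 1% of the baseline mean; otherwise an illustrative ratio."""
    rel = (mean_ns - base_ns) / base_ns
    if abs(rel) <= tol:
        return "≈ same"
    if mean_ns < base_ns:
        return f"{base_ns / mean_ns:.2f}x faster"
    return f"{mean_ns / base_ns:.2f}x slower"


def pick_baseline(variants):
    """variants: list of (name, baseline_flag). Explicit flag wins, then the name heuristic."""
    for name, flag in variants:
        if flag:
            return name
    for name, _ in variants:
        if "base" in name.lower():  # also covers "baseline"
            return name
    return None
```

For example, means of 100.5 ns against a 100 ns baseline differ by 0.5%, which falls inside the 1% band and reads as "≈ same". A 250 ns mean corresponds to 4,000,000 calls per second, shown as "4.0M".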
Sorting, color, and header
- Sorting: `--sort group|time` orders rows within groups; add `--desc` to invert the order.
- Color: enabled by default only when output is a TTY; disable with `--no-color`.
- Header: shows CPU, Python version, clock resolution, total runtime, and the active profile/mode.
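A sketch of three of these pieces, with the assumptions stated up front. `use_color`, `sort_by_time`, and `header_fields` are hypothetical helpers. Only the `time` sort key is sketched, because the exact semantics of `group` ordering are not spelled out here. The sketch assumes groups keep their first-appearance order and `--desc` reverses only the order within each group. A caller would pass `sys.stdout.isatty()` as `is_tty`. The header values come from the standard `platform` and `time.get_clock_info` APIs.

```python
import platform
import time


def use_color(is_tty, no_color=False):
    """Hypothetical rule: color only on a TTY, unless --no-color was given."""
    return is_tty and not no_color


def sort_by_time(rows, desc=False):
    """Hypothetical --sort time: order by mean within each group, keeping group order."""
    group_rank = {}
    for row in rows:
        group_rank.setdefault(row["group"], len(group_rank))  # first appearance
    sign = -1 if desc else 1
    return sorted(rows, key=lambda r: (group_rank[r["group"]], sign * r["mean_ns"]))


def header_fields():
    """Some of the header facts, gathered from the standard library."""
    return {
        "cpu": platform.processor() or platform.machine(),
        "python": platform.python_version(),
        "clock_resolution_s": time.get_clock_info("perf_counter").resolution,
    }


rows = [
    {"group": "g1", "name": "a", "mean_ns": 30},
    {"group": "g2", "name": "b", "mean_ns": 5},
    {"group": "g1", "name": "c", "mean_ns": 10},
]
print([r["name"] for r in sort_by_time(rows)])  # ['c', 'a', 'b']
```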
Recommendations
- Use context mode to isolate hot code from setup.
- Use `--profile fast` while iterating and `--profile thorough` when publishing results.
- Keep variants comparable within the same `group`, and define a clear baseline.
- For highly stable numbers, pin the CPU frequency (disable turbo boost) and run on a quiet machine.