ADR 006: Self-Profiling — One Command Away and Load-Test Integrity¶

Status: Proposed Date: 2026-05-29

Context¶

The engineering policy states that profiling must be one command away: a developer or agent should be able to profile a test, profile normal runtime usage, record profiler output, and inspect the result through documented commands without inventing a workflow. ADR 005 makes regressions detectable; this ADR makes them investigable.

Profiling a load generator carries a constraint ordinary profiling does not. The tool’s own overhead competes with the latency it measures, so a profiler that distorts timing can turn a diagnosis into a fiction. The questions a load-test profile must answer — is the generator CPU-bound, is it starved on the GIL, is the cost in scheduling or in metric reduction — depend on distinguishing wall time from CPU time from GIL-held time, at low or zero overhead.

Decision¶

Profiling rampa is one documented command away, defaults to zero-runtime-dependency standard-library tooling, and never distorts what a load test measures.

A developer can, with one command: profile a test, profile a normal run, and profile the native accelerator; record the result in a standard, shareable format; and be shown how to inspect it. The default tools add no runtime dependency. Profiling honors the measurement-integrity rules of ADR 003.

Scope¶

This ADR governs rampa’s profiling workflow and tooling defaults: the one-command targets, the default tools and output formats, the memory-growth check, and the integrity rules profiling must observe. It does not cover regression detection (ADR 005) or the test harness (ADR 004).

Requirements¶

1. One command, standard output, both paths¶

Documented just/script targets profile a test and profile a normal run, against both the pure-Python and native paths, emit a standard format (pstats, collapsed stacks, speedscope, or gecko), and print the command to inspect the result afterward. The developer never reconstructs the workflow.

mypy’s profiling script is the model: one command builds, runs perf record, and then prints the exact command to analyze the captured profile (misc/profile_check.py). orjson ships profiling as a literal one-liner (script/profile). ty exposes a single --profile/TY_PROFILE flag that writes a flame profile, keeping spans available in a release build so no debug redeploy is needed (crates/ty/src/logging.rs).

2. Standard library first¶

The default profilers add no runtime dependency. On rampa’s supported Python 3.14 runtime, use cProfile/pstats for deterministic call profiles (Lib/cProfile.py). CPython’s stdlib sampling profiler, python -m profiling.sampling, is a Python 3.15+ option for attachable wall, CPU, and GIL-mode sampling, including baseline-diff flamegraphs (Lib/profiling/sampling/). Richer third-party tools (py-spy, austin, memray, scalene) are documented as options, never required to install or run the package.

3. Memory-growth check¶

A memory check guards the paths a long-running generator stresses — connection pools, metric buffers, sub-sinks. pytest-memray’s @pytest.mark.limit_memory is a ready-made check (bloomberg/pytest-memray, bloomberg/memray). This complements ADR 004’s repeat-and-diff leak check: one bounds peak allocation, the other proves the count flatlines.

4. Integrity (binds ADR 003)¶

Profiling must not change what a load test measures. Prefer low- or zero-overhead sampling, and keep instrumentation zero-overhead when inactive — the principle behind CPython’s PEP 669 sys.monitoring, which instruments bytecode in place and costs nothing when no tool is attached. The wall-vs-CPU-vs-GIL distinction is documented for every profiling mode, because for an I/O-bound generator a CPU profile and a wall profile answer different questions. Profile a release-shaped build with symbols retained rather than a slow debug build.

5. The native path¶

Profiling the Rust accelerator is documented and does not require a debug redeploy: a profiling build profile (maturin --profile profiling), perf/flamegraph/samply, and, where a build is the target, a profile-guided-optimization loop. maturin implements the PGO instrument-train-rebuild cycle (PyO3/maturin); pydantic-core wires a build-profiling target and a full PGO loop into its Makefile.

Profiling record¶

A pull request that adds or changes a profiling workflow records:

command:                 the documented just/script target
scope:                   a test | a normal run | the native accelerator
both paths:              python-only + native
tool:                    cProfile | profiling.sampling (Python 3.15+) | py-spy | austin | memray | scalene
overhead mode:           wall | cpu | gil
output format:           pstats | collapsed | speedscope | gecko | flamegraph
baseline diff:           none | diff-flamegraph vs named baseline
build:                   release-shaped, symbols kept (no debug redeploy)
integrity:               zero-overhead-when-off; does not alter measurement

Pull request checklist¶

[ ] Profiling a test and a run is one documented command, for both paths.
[ ] The default path uses standard-library tooling (no new runtime dependency).
[ ] A memory-growth check guards connection / buffer / metrics paths.
[ ] Profiling output is a standard, shareable format, and the inspect command is printed.
[ ] Profiling does not alter what the load test measures (wall/cpu/gil documented).

Consequences¶

Positive¶

The “profiling one command away” policy becomes real tooling, not a sentence.
Stdlib defaults keep the package dependency-free on Python 3.14, with attach, GIL-mode, and baseline-diff profiling available on Python 3.15+.
The wall/CPU/GIL distinction lets a diagnosis of an I/O-bound generator be correct.
A memory-growth check catches the leaks long runs expose before they ship.

Tradeoffs¶

The richest stdlib sampler (profiling.sampling) requires Python 3.15+; the supported Python 3.14 runtime uses cProfile plus optional third-party tools.
Maintaining one-command targets for both paths and the native build is ongoing upkeep.
A memory-growth check adds a dependency to the dev/test extra (not the runtime).

Risks¶

Profiler distortion: a high-overhead profiler misattributes load-test cost. Mitigation: prefer sampling, document the overhead mode, profile release-shaped builds.
Tool sprawl: requiring many profilers raises the barrier. Mitigation: stdlib default, the rest optional.

Relationship to ADR 002, 003, 004, and 005¶

This ADR realizes the engineering policy’s “profiling one command away” and binds it to ADR 003’s measurement-integrity rules. It investigates the regressions ADR 005 detects, profiles the harness ADR 004 provides, and informs the per-path measurement ADR 002 requires before native code.

Prior art¶

CPython (python/cpython) — stdlib sampling profiler with attach, wall/CPU/GIL modes, and baseline-diff flamegraphs (@v3.15.0a1), deterministic cProfile (@v3.14.5), and PEP 669 zero-overhead-when-off instrumentation: Lib/profiling/sampling/, Lib/cProfile.py.
ty / ruff (astral-sh/ruff@0.15.14) — one --profile flag → flame profile, spans kept in release builds: crates/ty/src/logging.rs.
mypy (python/mypy@v1.20.2) — one command profiles and prints the analysis command; runtime memory profiling: misc/profile_check.py, mypy/memprofile.py.
orjson (ijl/orjson@3.11.9) — profiling as a literal one-liner: script/profile.
Profiler tools — py-spy (benfred/py-spy@v0.4.2, attach + GIL mode + speedscope), austin (p403n1x87/austin@v4.0.0, time + memory), memray (bloomberg/memray@v1.19.2) with the limit_memory check (bloomberg/pytest-memray@v1.8.0), and scalene (plasma-umass/scalene, no release tags; line-level CPU + memory).
maturin (PyO3/maturin@v1.13.3) and pydantic-core (pydantic/pydantic-core@v2.41.5) — profiling build profile and a PGO instrument-train-rebuild loop: PyO3/maturin, pydantic-core/Makefile.

Final position¶

A regression you cannot profile in one command is a regression you will not investigate. rampa keeps profiling one command away, dependency-free by default, and faithful to what a load test measures — wall, CPU, or GIL, never distorted by the act of looking.