abdiff: A/B differential regression hunter for rsync

testsuite/abdiff.py runs the same benign transfer with two rsync binaries
(A = build under test, B = a baseline) and compares the OUTCOME -- exit code,
stderr, --stats "Literal data", the destination tree (content + full metadata),
the --itemize list, and (with --cost) peak process-group RSS. For benign input
the two must be indistinguishable; any divergence is a regression candidate.
It is a developer tool, NOT a runtests.py test (does not end in _test.py).

Capabilities:
- Scenario sweeps over options / path shapes / file types / sizes / modes /
  selection / placement / wire / transports, plus domain-knowledge pairwise +
  combo sweeps and a stochastic fuzzer/role matrix.
- Transport lanes: local, ssh split (lsh.sh), stdio-pipe daemon, a REAL TCP
  daemon (bound port + greeting/handshake/auth challenge-response), and the
  restricted rrsync wrapper (support/rrsh.sh; each binary paired with its own
  version's rrsync via --rrsync-a/--rrsync-b, since rrsync ships in the script).
- Stability gate: each binary is run N times and escalated on a candidate diff;
  nondeterministic scenarios are quarantined FLAKY, never reported as regressions.
- Parallel (-j, default 20) with a per-run findings log; --loop runs until
  --timelimit (or Ctrl-C), feeding the pool a half-random / half-systematic
  stream of new combinations. As root an "all" run also folds in the root-only
  sweeps (priv, daemonchroot).
- General coverage levers: a cost oracle (--cost, peak RSS over the whole process
  group), transport lifted as an orthogonal axis, a resume/redo sweep, and
  type-transition / nanosecond-mtime / scale (--scale N) fixtures.

Documented in testsuite/README.md.
Andrew Tridgell
2026-06-11 08:25:50 +10:00
parent 8b042907e5
commit 4ef775fa97
3 changed files with 2895 additions and 0 deletions

support/rrsh.sh Executable file

@@ -0,0 +1,21 @@
#!/bin/sh
# abdiff helper: a "remote shell" that emulates an sshd forced-command of
# `rrsync DIR`. rsync invokes a remote shell as:
# <shell-words...> [ssh-opts] <host> <rsync --server ...>
# so when used as -e "sh rrsh.sh <RRSYNC> <DIR>" rsync calls us as:
# sh rrsh.sh <RRSYNC> <DIR> [opts] lh rsync --server ...
# We hand the server command to rrsync via SSH_ORIGINAL_COMMAND (exactly as
# sshd would) and exec the restricted wrapper, so abdiff can A/B the rrsync
# path itself. Only the pretend hosts "lh"/"localhost" are accepted.
RRSYNC="$1"; DIR="$2"; shift 2
while [ $# -gt 0 ]; do
    case "$1" in
    -l) shift 2 ;;
    lh|localhost) shift; break ;;
    -*) shift ;;
    *) break ;;
    esac
done
SSH_ORIGINAL_COMMAND="$*"
export SSH_ORIGINAL_COMMAND
exec "$RRSYNC" "$DIR"

testsuite/README.md

@@ -184,3 +184,53 @@ Each target must be provisioned with the build toolchain its workflow installs
(autoconf, automake, a C compiler, perl, a python3 markdown module such as
cmarkgfm or commonmark unless the flags pass `--disable-md2man`, and the dev
libraries its configure flags enable). A missing piece shows up as `BUILD-FAIL`.

## Differential regression hunting (abdiff.py)

`testsuite/abdiff.py` is a developer tool, **not** a `*_test.py`, so `runtests.py`
ignores it. It hunts *regressions* by running the **same benign transfer** with
two rsync binaries (`A` = the build under test, `B` = a baseline) and comparing
the OUTCOME. The oracle is: for a benign input, a correctness/behaviour change
between the builds must be **invisible**, so A and B must produce an identical
result. Any divergence is a regression candidate to investigate and, if real,
minimize into a `*_test.py`.

It compares exit code, stderr (error markers + normalised text), `--stats`
"Literal data", the destination tree (content + full metadata: mode/uid/gid/
mtime/size/symlink target/xattrs/ACLs/hardlink grouping), the `--itemize` list,
and — with `--cost` — peak process-group RSS (a resource-regression oracle that
functional comparison misses). A **stability gate** runs each binary several
times and escalates on a candidate diff; nondeterministic scenarios are
quarantined `FLAKY`, never reported as regressions.
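
A minimal Python sketch of that comparison, assuming each run has already been reduced to a hashable record. The `Outcome` fields, the `compare` and `cost_regressed` names, and the 25% RSS tolerance are hypothetical illustrations, not abdiff.py's internals; the escalation step (taking more samples once a diff appears) is left out.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union

# Hypothetical record of one run's observable result (not abdiff.py's own type).
@dataclass(frozen=True)
class Outcome:
    exit_code: int
    stderr: str                        # normalised stderr text
    literal_data: int                  # --stats "Literal data", in bytes
    tree: tuple                        # sorted per-entry metadata rows
    itemize: tuple                     # sorted --itemize lines
    peak_rss_kb: Optional[int] = None  # only with --cost; not compared exactly

# Fields that must be identical between A and B for a benign input.
FUNCTIONAL = ("exit_code", "stderr", "literal_data", "tree", "itemize")


def functional(o: Outcome) -> tuple:
    return tuple(getattr(o, name) for name in FUNCTIONAL)


def is_stable(runs: Sequence[Outcome]) -> bool:
    # A binary is stable when every repeat gave the same functional result.
    return len({functional(r) for r in runs}) == 1


def compare(a_runs: Sequence[Outcome], b_runs: Sequence[Outcome]) -> Union[str, list]:
    """Return "FLAKY", or the (possibly empty) list of fields where A and B differ."""
    if not (is_stable(a_runs) and is_stable(b_runs)):
        return "FLAKY"                 # nondeterministic: quarantine, never a regression
    a, b = a_runs[0], b_runs[0]
    return [name for name in FUNCTIONAL if getattr(a, name) != getattr(b, name)]


def cost_regressed(a: Outcome, b: Outcome, ratio: float = 1.25) -> bool:
    # Resource oracle: flag A only if its peak RSS exceeds B's by the (assumed) ratio.
    if a.peak_rss_kb is None or b.peak_rss_kb is None:
        return False
    return a.peak_rss_kb > b.peak_rss_kb * ratio
```

Peak RSS is kept out of the exact-match fields on purpose: it varies a little between otherwise identical runs, so this sketch judges it against a tolerance instead.
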

Run it from the build directory (so `./rsync` and `old_versions/` resolve):
```sh
testsuite/abdiff.py                                 # default: ./rsync vs old_versions/rsync_3.4.1
testsuite/abdiff.py --sweep all -j5                 # broad single pass, 5-way parallel
testsuite/abdiff.py --loop --timelimit 3600 --cost  # hunt for an hour, resource oracle on
testsuite/abdiff.py --list --sweep all              # list scenarios without running
```

Each finding is classed `DIFF` (regression candidate), `ALLOW` (an intentional,
documented behaviour change listed in the tool's allowlist), `BETTER` (A succeeds
where B fails), `FLAKY`, or `TIMEOUT`. Findings are printed and appended to a
per-run `abdiff-log_<TIME>.txt` (and the curated `--findings` log).
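
One way to map a comparison onto those classes is sketched below in Python. The precedence order (TIMEOUT, then FLAKY, then ALLOW, BETTER, DIFF), the `classify` and `log_name` helpers, and the timestamp format substituted for `<TIME>` are assumptions for illustration; the real rules live in abdiff.py.

```python
import time
from typing import Collection, Optional, Union

# Hypothetical classifier; abdiff.py's actual precedence may differ.
def classify(diff: Union[str, list], a_exit: int, b_exit: int, scenario: str,
             allowlist: Collection[str], timed_out: bool = False) -> Optional[str]:
    """`diff` is "FLAKY" or the list of differing outcome fields."""
    if timed_out:
        return "TIMEOUT"
    if diff == "FLAKY":
        return "FLAKY"
    if not diff:
        return None                  # identical outcome: nothing to report
    if scenario in allowlist:
        return "ALLOW"               # intentional, documented behaviour change
    if a_exit == 0 and b_exit != 0:
        return "BETTER"              # A succeeds where B fails
    return "DIFF"                    # regression candidate


def log_name(when: Optional[float] = None) -> str:
    # Per-run log file name; the <TIME> format used here is an assumption.
    return time.strftime("abdiff-log_%Y%m%d-%H%M%S.txt", time.localtime(when))
```
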

Key options: `-j N` parallelism; `--sweep NAME|all`; `--loop` (endless
random + systematic-combo stream) bounded by `--timelimit SECS`; `--cost`
(+`--scale N` for the large-tree fixtures); `--repeat N` (stability samples);
`--rsync-a`/`--rsync-b` the two binaries. Run **as root** to fold in the
owner/device/specials/fake-super and chroot-daemon sweeps automatically.
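
The `--loop` stream can be pictured with the Python generator below: it alternates systematic draws (walking the full product of the option axes) with random ones until a deadline passes. The strict alternation, the `combo_stream` name and the axis dictionary are illustrative assumptions; abdiff.py's own scheduler and its `-j` worker pool are not shown.

```python
import itertools
import random
import time
from typing import Callable, Dict, Iterator, Optional, Sequence

# Hypothetical sketch of a half-systematic / half-random combination stream.
def combo_stream(axes: Dict[str, Sequence[str]], timelimit: float,
                 seed: Optional[int] = None,
                 clock: Callable[[], float] = time.monotonic) -> Iterator[Dict[str, str]]:
    rng = random.Random(seed)
    names = sorted(axes)
    systematic = itertools.product(*(axes[n] for n in names))
    deadline = clock() + timelimit
    for i in itertools.count():
        if clock() >= deadline:          # --timelimit reached
            return
        combo = next(systematic, None) if i % 2 == 0 else None
        if combo is None:                # odd draw, or the product is exhausted
            combo = tuple(rng.choice(axes[n]) for n in names)
        yield dict(zip(names, combo))
```

A consumer would submit each yielded dict to its worker pool; the injectable `clock` only exists so the deadline can be tested without waiting.
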

Transport lanes (a feature broken only over the wire is invisible to a local
copy): local, an ssh split (`support/lsh.sh`), a stdio-pipe daemon, a **real TCP
daemon** (bound port + greeting/handshake, and an auth challenge-response
variant), and the restricted **rrsync** wrapper (`support/rrsh.sh`). rrsync's
behaviour ships in the *script*, so pair each binary with its own version's
rrsync via `--rrsync-a`/`--rrsync-b` (for `--rrsync-b`, give B's rrsync, e.g. one
extracted from that release's `support/rrsync`).
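
For reference, the argument handling in `support/rrsh.sh` amounts to the Python function below. rsync runs the wrapper via `-e "sh rrsh.sh <RRSYNC> <DIR>"`; the wrapper skips ssh-style options and the pretend host, and whatever remains becomes `SSH_ORIGINAL_COMMAND` for rrsync. The `rrsh_command` name and its return shape are illustrative, and it mirrors the shell loop for well-formed arguments only.

```python
from typing import List, Sequence, Tuple

# Hypothetical Python mirror of the case/shift loop in support/rrsh.sh.
def rrsh_command(argv: Sequence[str]) -> Tuple[str, str, str]:
    """argv is everything after `sh rrsh.sh`: RRSYNC DIR [ssh-opts] HOST CMD...

    Returns (rrsync, dir, ssh_original_command).
    """
    rrsync, directory = argv[0], argv[1]
    rest: List[str] = list(argv[2:])
    while rest:
        word = rest[0]
        if word == "-l":
            rest = rest[2:]              # -l USER: drop the option and its argument
        elif word in ("lh", "localhost"):
            rest = rest[1:]              # drop the pretend host, then stop
            break
        elif word.startswith("-"):
            rest = rest[1:]              # any other ssh option is ignored
        else:
            break                        # unexpected host stays in the command
    # sshd passes the command as one string; "$*" joins the words with spaces.
    return rrsync, directory, " ".join(rest)
```

Note that an unexpected host word is not stripped: it stays at the front of `SSH_ORIGINAL_COMMAND`.
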

Cross-version baselines are the static binaries already in `old_versions/`;
`old_versions/build_static.sh` builds more from a git tag (and you can grab a
matching `support/rrsync` from the same tag for the rrsync lane).
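
Pulling the matching rrsync out of a tag needs no checkout: `git show <tag>:support/rrsync` prints the file as it was at that tag. A small Python helper along those lines is sketched below; the `extract_rrsync` name, the output path and the example tag `v3.4.1` are assumptions, and `run` is injectable only so the sketch can be tested without a repository.

```python
import subprocess
from pathlib import Path
from typing import Callable

# Hypothetical helper: extract a release's support/rrsync for the --rrsync-b lane.
def extract_rrsync(tag: str, out: str, repo: str = ".",
                   run: Callable = subprocess.run) -> Path:
    # `git show TAG:PATH` writes that blob to stdout without touching the work tree.
    result = run(["git", "-C", repo, "show", f"{tag}:support/rrsync"],
                 capture_output=True, check=True)
    path = Path(out)
    path.write_bytes(result.stdout)
    path.chmod(0o755)                  # rrsh.sh execs "$RRSYNC" directly
    return path
```

For example, `extract_rrsync("v3.4.1", "rrsync-3.4.1")` would produce a file to pass as `--rrsync-b rrsync-3.4.1` alongside `--rsync-b old_versions/rsync_3.4.1`.
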

testsuite/abdiff.py Normal file

File diff suppressed because it is too large.