abdiff: A/B differential regression hunter for rsync

testsuite/abdiff.py runs the same benign transfer with two rsync binaries
(A = build under test, B = a baseline) and compares the OUTCOME -- exit code,
stderr, --stats "Literal data", the destination tree (content + full
metadata), the --itemize list, and (with --cost) peak process-group RSS.
For benign input the two must be indistinguishable; any divergence is a
regression candidate. It is a developer tool, NOT a runtests.py test (does
not end in _test.py).

Capabilities:
- Scenario sweeps over options / path shapes / file types / sizes / modes /
  selection / placement / wire / transports, plus domain-knowledge pairwise +
  combo sweeps and a stochastic fuzzer/role matrix.
- Transport lanes: local, ssh split (lsh.sh), stdio-pipe daemon, a REAL TCP
  daemon (bound port + greeting/handshake/auth challenge-response), and the
  restricted rrsync wrapper (support/rrsh.sh; each binary paired with its own
  version's rrsync via --rrsync-a/--rrsync-b, since rrsync ships in the
  script).
- Stability gate: each binary is run N times and escalated on a candidate
  diff; nondeterministic scenarios are quarantined FLAKY, never reported as
  regressions.
- Parallel (-j, default 20) with a per-run findings log; --loop runs until
  --timelimit (or Ctrl-C), feeding the pool a half-random / half-systematic
  stream of new combinations. As root an "all" run also folds in the
  root-only sweeps (priv, daemonchroot).
- General coverage levers: a cost oracle (--cost, peak RSS over the whole
  process group), transport lifted as an orthogonal axis, a resume/redo
  sweep, and type-transition / nanosecond-mtime / scale (--scale N) fixtures.

Documented in testsuite/README.md.
2026-06-13 00:27:12 -04:00 · 2026-06-11 08:25:50 +10:00
parent 8b042907e5
commit 4ef775fa97
3 changed files with 2895 additions and 0 deletions
--- a/support/rrsh.sh
+++ b/support/rrsh.sh
@@ -0,0 +1,21 @@
+#!/bin/sh
+# abdiff helper: a "remote shell" that emulates an sshd forced-command of
+# `rrsync DIR`.  rsync invokes a remote shell as:
+#     <shell-words...> [ssh-opts] <host> <rsync --server ...>
+# so when used as  -e "sh rrsh.sh <RRSYNC> <DIR>"  rsync calls us as:
+#     sh rrsh.sh <RRSYNC> <DIR> [opts] lh rsync --server ...
+# We hand the server command to rrsync via SSH_ORIGINAL_COMMAND (exactly as
+# sshd would) and exec the restricted wrapper, so abdiff can A/B the rrsync
+# path itself.  Only the pretend hosts "lh"/"localhost" are accepted.
+RRSYNC="$1"; DIR="$2"; shift 2
+while [ $# -gt 0 ]; do
+    case "$1" in
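+        -l) shift 2 ;;                # -l USER: drop the flag and its argument
+        lh|localhost) shift; break ;; # pretend host: the rest is the server command
+        -*) shift ;;                  # any other ssh-style option: ignore it
+        *) break ;;                   # anything else: stop parsing, keep it in the command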
+    esac
+done
+SSH_ORIGINAL_COMMAND="$*"
+export SSH_ORIGINAL_COMMAND
+exec "$RRSYNC" "$DIR"
--- a/testsuite/README.md
+++ b/testsuite/README.md
@@ -184,3 +184,53 @@ Each target must be provisioned with the build toolchain its workflow installs
 (autoconf, automake, a C compiler, perl, a python3 markdown module such as
 cmarkgfm or commonmark unless the flags pass `--disable-md2man`, and the dev
 libraries its configure flags enable). A missing piece shows up as `BUILD-FAIL`.
+
+## Differential regression hunting (abdiff.py)
+
+`testsuite/abdiff.py` is a developer tool — **not** a `*_test.py`, so `runtests.py`
+ignores it. It hunts *regressions* by running the **same benign transfer** with
+two rsync binaries (`A` = the build under test, `B` = a baseline) and comparing
+the OUTCOME. The oracle: for a benign input, any correctness or behaviour
+change between the builds must be **invisible**, so A and B must produce
+identical results. Any divergence is a regression candidate to investigate
+and, if real, minimize into a `*_test.py`.
+
+It compares exit code, stderr (error markers + normalised text), `--stats`
+"Literal data", the destination tree (content + full metadata: mode/uid/gid/
+mtime/size/symlink target/xattrs/ACLs/hardlink grouping), the `--itemize` list,
+and — with `--cost` — peak process-group RSS (a resource-regression oracle that
+functional comparison misses). A **stability gate** runs each binary several
+times and escalates on a candidate diff; nondeterministic scenarios are
+quarantined `FLAKY`, never reported as regressions.
+
+Run it from the build directory (so `./rsync` and `old_versions/` resolve):
+
+```sh
+testsuite/abdiff.py                       # default: ./rsync vs old_versions/rsync_3.4.1
+testsuite/abdiff.py --sweep all -j5       # broad single pass, 5-way parallel
+testsuite/abdiff.py --loop --timelimit 3600 --cost   # hunt for an hour, resource oracle on
+testsuite/abdiff.py --list --sweep all    # list scenarios without running
+```
+
+Each finding is classed `DIFF` (regression candidate), `ALLOW` (an intentional,
+documented behaviour change listed in the tool's allowlist), `BETTER` (A succeeds
+where B fails), `FLAKY`, or `TIMEOUT`. Findings are printed and appended to a
+per-run `abdiff-log_<TIME>.txt` (and the curated `--findings` log).
+
+Key options: `-j N` (parallelism); `--sweep NAME|all`; `--loop` (endless
+random + systematic-combo stream), bounded by `--timelimit SECS`; `--cost`
+(plus `--scale N` for the large-tree fixtures); `--repeat N` (stability samples);
+`--rsync-a`/`--rsync-b` (the two binaries). Run **as root** to fold in the
+owner/device/specials/fake-super and chroot-daemon sweeps automatically.
+
+Transport lanes (a feature broken only over the wire is invisible to a local
+copy): local, an ssh split (`support/lsh.sh`), a stdio-pipe daemon, a **real TCP
+daemon** (bound port + greeting/handshake, and an auth challenge-response
+variant), and the restricted **rrsync** wrapper (`support/rrsh.sh`). rrsync's
+behaviour ships in the *script*, so pair each binary with its own version's
+rrsync via `--rrsync-a`/`--rrsync-b` (for B, supply the `support/rrsync`
+taken from that release).
+
+Cross-version baselines are the static binaries already in `old_versions/`;
+`old_versions/build_static.sh` builds more from a git tag (and you can grab a
+matching `support/rrsync` from the same tag for the rrsync lane).
--- a/testsuite/abdiff.py
+++ b/testsuite/abdiff.py