From 4ef775fa970b8290cc9b1756e09c113ff794242b Mon Sep 17 00:00:00 2001
From: Andrew Tridgell
Date: Thu, 11 Jun 2026 08:25:50 +1000
Subject: [PATCH] abdiff: A/B differential regression hunter for rsync

testsuite/abdiff.py runs the same benign transfer with two rsync binaries
(A = build under test, B = a baseline) and compares the OUTCOME -- exit
code, stderr, --stats "Literal data", the destination tree (content + full
metadata), the --itemize list, and (with --cost) peak process-group RSS.
For benign input the two must be indistinguishable; any divergence is a
regression candidate. It is a developer tool, NOT a runtests.py test (does
not end in _test.py).

Capabilities:
- Scenario sweeps over options / path shapes / file types / sizes / modes /
  selection / placement / wire / transports, plus domain-knowledge pairwise
  + combo sweeps and a stochastic fuzzer/role matrix.
- Transport lanes: local, ssh split (lsh.sh), stdio-pipe daemon, a REAL TCP
  daemon (bound port + greeting/handshake/auth challenge-response), and the
  restricted rrsync wrapper (support/rrsh.sh; each binary paired with its
  own version's rrsync via --rrsync-a/--rrsync-b, since rrsync ships in the
  script).
- Stability gate: each binary is run N times and escalated on a candidate
  diff; nondeterministic scenarios are quarantined FLAKY, never reported as
  regressions.
- Parallel (-j, default 20) with a per-run findings log; --loop runs until
  --timelimit (or Ctrl-C), feeding the pool a half-random / half-systematic
  stream of new combinations. As root an "all" run also folds in the
  root-only sweeps (priv, daemonchroot).
- General coverage levers: a cost oracle (--cost, peak RSS over the whole
  process group), transport lifted as an orthogonal axis, a resume/redo
  sweep, and type-transition / nanosecond-mtime / scale (--scale N)
  fixtures.

Documented in testsuite/README.md.
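
For illustration only, the stability gate described above could be sketched
roughly as follows. This is not the code in testsuite/abdiff.py: the names
Outcome, run_once and classify are hypothetical, only exit code and stderr
are compared, and the exact escalation policy (re-run both sides `repeats`
more times once a candidate diff is seen) is an assumption.

```python
import subprocess
import sys
from typing import NamedTuple, Sequence


class Outcome(NamedTuple):
    """Observable result of one run (a small subset of what abdiff compares)."""
    exit_code: int
    stderr: str


def run_once(cmd: Sequence[str], timeout: float = 60.0) -> Outcome:
    """Run one command and capture the parts of the outcome being compared."""
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return Outcome(proc.returncode, proc.stderr)


def classify(cmd_a: Sequence[str], cmd_b: Sequence[str], repeats: int = 3) -> str:
    """Return "SAME", "DIFF" or "FLAKY" for one scenario (hypothetical policy)."""
    first_a, first_b = run_once(cmd_a), run_once(cmd_b)
    if first_a == first_b:
        return "SAME"
    # Candidate diff: escalate by re-running each side and checking that
    # every run of the same command produced the same outcome.
    runs_a = {first_a} | {run_once(cmd_a) for _ in range(repeats)}
    runs_b = {first_b} | {run_once(cmd_b) for _ in range(repeats)}
    if len(runs_a) > 1 or len(runs_b) > 1:
        return "FLAKY"  # a side disagrees with itself: quarantine, do not report
    return "DIFF"       # both sides are stable and still differ
```

In this sketch a scenario is only reported as DIFF when each side agrees
with itself across all runs, which mirrors the stated goal of never
reporting a nondeterministic scenario as a regression.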
---
 support/rrsh.sh     |   21 +
 testsuite/README.md |   50 +
 testsuite/abdiff.py | 2824 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 2895 insertions(+)
 create mode 100755 support/rrsh.sh
 create mode 100644 testsuite/abdiff.py

diff --git a/support/rrsh.sh b/support/rrsh.sh
new file mode 100755
index 00000000..83025f2f
--- /dev/null
+++ b/support/rrsh.sh
@@ -0,0 +1,21 @@
+#!/bin/sh
+# abdiff helper: a "remote shell" that emulates an sshd forced-command of
+# `rrsync DIR`. rsync invokes a remote shell as:
+#     [ssh-opts]
+# so when used as -e "sh rrsh.sh " rsync calls us as:
+#     sh rrsh.sh [opts] lh rsync --server ...
+# We hand the server command to rrsync via SSH_ORIGINAL_COMMAND (exactly as
+# sshd would) and exec the restricted wrapper, so abdiff can A/B the rrsync
+# path itself. Only the pretend hosts "lh"/"localhost" are accepted.
+RRSYNC="$1"; DIR="$2"; shift 2
+while [ $# -gt 0 ]; do
+  case "$1" in
+    -l) shift 2 ;;
+    lh|localhost) shift; break ;;
+    -*) shift ;;
+    *) break ;;
+  esac
+done
+SSH_ORIGINAL_COMMAND="$*"
+export SSH_ORIGINAL_COMMAND
+exec "$RRSYNC" "$DIR"
diff --git a/testsuite/README.md b/testsuite/README.md
index 5c15f5de..4883a5e8 100644
--- a/testsuite/README.md
+++ b/testsuite/README.md
@@ -184,3 +184,53 @@ Each target must be provisioned with the build toolchain its workflow installs
 (autoconf, automake, a C compiler, perl, a python3 markdown module such as
 cmarkgfm or commonmark unless the flags pass `--disable-md2man`, and the dev
 libraries its configure flags enable). A missing piece shows up as `BUILD-FAIL`.
+
+## Differential regression hunting (abdiff.py)
+
+`testsuite/abdiff.py` is a developer tool — **not** a `*_test.py`, so `runtests.py`
+ignores it. It hunts *regressions* by running the **same benign transfer** with
+two rsync binaries (`A` = the build under test, `B` = a baseline) and comparing
+the OUTCOME. The oracle is: for a benign input, a correctness/behaviour change
+between the builds must be **invisible**, so A and B must produce an identical
+result. Any divergence is a regression candidate to investigate and, if real,
+minimize into a `*_test.py`.
+
+It compares exit code, stderr (error markers + normalised text), `--stats`
+"Literal data", the destination tree (content + full metadata: mode/uid/gid/
+mtime/size/symlink target/xattrs/ACLs/hardlink grouping), the `--itemize` list,
+and — with `--cost` — peak process-group RSS (a resource-regression oracle that
+functional comparison misses). A **stability gate** runs each binary several
+times and escalates on a candidate diff; nondeterministic scenarios are
+quarantined `FLAKY`, never reported as regressions.
+
+Run it from the build directory (so `./rsync` and `old_versions/` resolve):
+
+```sh
+testsuite/abdiff.py                                  # default: ./rsync vs old_versions/rsync_3.4.1
+testsuite/abdiff.py --sweep all -j5                  # broad single pass, 5-way parallel
+testsuite/abdiff.py --loop --timelimit 3600 --cost   # hunt for an hour, resource oracle on
+testsuite/abdiff.py --list --sweep all               # list scenarios without running
+```
+
+Each finding is classed `DIFF` (regression candidate), `ALLOW` (an intentional,
+documented behaviour change listed in the tool's allowlist), `BETTER` (A succeeds
+where B fails), `FLAKY`, or `TIMEOUT`. Findings are printed and appended to a
+per-run `abdiff-log_