`make` alone does not build the CHECK_PROGS test helpers (tls, trimslash,
t_chmod_secure, ...), so runtests.py exited immediately with "missing
test helper program(s)", produced no valgrind logs, and the scan step
failed every job with "the suite did not run". Use `make check-progs`,
which builds rsync plus the helpers and symlink fixtures without running
the suite.
rsync groups the "sent/received N bytes" summary numbers using the
locale's thousands separator (e.g. de_DE uses '.'), which broke the
[\d,]+ parser and failed the test for testers in non-C locales. Run the
peer client under LC_ALL=C so the output is deterministic.
Reported-by: Michael Mess <michael@michaelmess.de>
Add a .github/workflows/valgrind.yml that runs the full suite under
valgrind in a 2x2 matrix (user/root x pipe/tcp transport) and gates on
memory errors. It uses --leak-check=no: rsync intentionally leaves
file-list/socket/option memory unfreed at exit, so a leak check is
inherently noisy; the gate flags uninitialised reads, invalid
read/write, bad frees and uninit syscall params instead.
Add testsuite/valgrind.supp covering the known-benign reports (rwrite
strlcpy over-read on a non-NUL-terminated peer message, atomic_create/
delete_item st_mode read under fakeroot, libfakeroot msgsnd padding,
plus popt/xxhash leaks for manual --leak-check audits). runtests.py
--valgrind now loads it automatically.
The hardening in c44c90e9 added a check in simple_recv_token() rejecting
any uncompressed literal-run length > CHUNK_SIZE (32k). That assumption
breaks interoperability: other rsync implementations -- e.g. the acrosync
library used by the iOS "PhotoBackup" app -- use a 64k block size and
send literal runs of 65536 bytes, which 3.4.3+ now rejects with
"invalid uncompressed token length 65536".
The check was unnecessary: simple_recv_token() already reads the run
CHUNK_SIZE bytes at a time via the residue loop (n = MIN(CHUNK_SIZE,
residue)), so read_buf() never writes past the static CHUNK_SIZE buffer
regardless of the wire-supplied length. Drop the check to restore
interop; the compressed-token integer-overflow fix from c44c90e9 (the
MAX_TOKEN_INDEX / rx_token caps) is left unchanged.
Fixes#1002
Reported-by: Jack Whitham
FreeBSD and OpenBSD return EFTYPE (errno 79) when chmod-ing a sticky bit
onto a regular file as non-root, rather than EPERM/EACCES. Catch OSError
and check errno against the expected skip set so the test skips correctly
on those platforms instead of erroring out.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
testsuite/abdiff.py runs the same benign transfer with two rsync binaries
(A = build under test, B = a baseline) and compares the OUTCOME -- exit code,
stderr, --stats "Literal data", the destination tree (content + full metadata),
the --itemize list, and (with --cost) peak process-group RSS. For benign input
the two must be indistinguishable; any divergence is a regression candidate.
It is a developer tool, NOT a runtests.py test (does not end in _test.py).
Capabilities:
- Scenario sweeps over options / path shapes / file types / sizes / modes /
selection / placement / wire / transports, plus domain-knowledge pairwise +
combo sweeps and a stochastic fuzzer/role matrix.
- Transport lanes: local, ssh split (lsh.sh), stdio-pipe daemon, a REAL TCP
daemon (bound port + greeting/handshake/auth challenge-response), and the
restricted rrsync wrapper (support/rrsh.sh; each binary paired with its own
version's rrsync via --rrsync-a/--rrsync-b, since rrsync ships in the script).
- Stability gate: each binary is run N times and escalated on a candidate diff;
nondeterministic scenarios are quarantined FLAKY, never reported as regressions.
- Parallel (-j, default 20) with a per-run findings log; --loop runs until
--timelimit (or Ctrl-C), feeding the pool a half-random / half-systematic
stream of new combinations. As root an "all" run also folds in the root-only
sweeps (priv, daemonchroot).
- General coverage levers: a cost oracle (--cost, peak RSS over the whole process
group), transport lifted as an orthogonal axis, a resume/redo sweep, and
type-transition / nanosecond-mtime / scale (--scale N) fixtures.
Documented in testsuite/README.md.
A standalone dev tool (run directly, not via runtests.py) for catching
performance regressions between rsync releases. Given two rsync binaries it
builds one deterministic test tree -- heavy-tailed file sizes, a directory
spine, symlinks, hard links and a spread of permission modes, modelled on the
gentestdata generator -- then runs the two binaries ALTERNATELY for N loops,
timing each transfer, and reports the mean and standard deviation per binary.
Each loop times a full copy into an emptied destination and an incremental
no-op against an already-synced one (rsync's scan/file-list/stat overhead,
where many regressions hide); --mode selects. The first run of each binary is
dropped to reduce page-cache impact, the run order alternates to cancel drift,
and a B-vs-A slowdown is flagged only when it exceeds the run-to-run noise.
The file used ".align 16" intending 16-byte alignment (GNU/ELF semantics).
On macOS the Mach-O assembler reads ".align N" as 2^N, so it requested
64KB alignment for __TEXT,__text, producing:
ld: warning: reducing alignment of section __TEXT,__text from 0x10000
to 0x1000 because it exceeds segment maximum alignment
The linker clamps it back, so it was harmless, but .balign 16 means
16 bytes on every target and silences the warning.
When built with --enable-roll-asm, get_checksum1() called the AVX2 asm
routine get_checksum1_avx2_asm() unconditionally. Unlike the intrinsic
path (get_checksum1_avx2_64), which is function-multiversioned with a
target("default") fallback and so resolves safely on any CPU, the asm
routine is a single AVX2-only symbol with no fallback. On an x86-64 host
without AVX2 (an older CPU, or a VM that does not expose AVX2) the first
block checksum executes a VEX-encoded instruction and dies with SIGILL,
which surfaces as "connection unexpectedly closed (0 bytes received so
far)" and a code-12 protocol error.
Gate the asm call on a cached __builtin_cpu_supports("avx2") check, the
same signal the intrinsic resolver uses. When AVX2 is absent we skip it
and the SSSE3/SSE2/scalar steps (safe everywhere) do the work. Apply the
same guard in the simdtest harness so it can run on non-AVX2 hosts too.
Run the clang static analyzer over a check-progs build, publish the HTML report
as an artifact, and print the bug count to the run summary. INFORMATIONAL only:
it does not pass --status-bugs, so it surfaces new analyzer findings without
going red on the existing (overwhelmingly false-positive) reports.
Runs on push/PR to master and via workflow_dispatch. No cron: it is
informational and its output only changes with the code (push/PR) or the clang
version, so a daily run on an unchanged tree would add noise without value.
Add a clang AddressSanitizer + UndefinedBehaviorSanitizer workflow that builds
rsync with -fsanitize=address,undefined -fno-sanitize-recover=undefined -DNDEBUG
and runs the full test suite over both the stdio-pipe and TCP daemon transports.
UBSAN_OPTIONS=halt_on_error=1 together with -fno-sanitize-recover=undefined makes
any undefined behaviour fatal, so this job gates: the tree must stay UBSan-clean.
The remaining findings are fixed in code (hashtable/mdfour shifts, xattrs, and
log.c's file_struct, kept aligned via rounding.h); only byteorder.h's intentional
unaligned accessors are suppressed, with no_sanitize. -DNDEBUG builds as a release
does (assert() compiled out) so ASan covers the production code paths.
Runs on push/PR to master and via workflow_dispatch, plus a weekly cron to
catch breakage from a moving ubuntu-latest/clang toolchain (push/PR already
cover every code change, so daily would just re-run an unchanged tree).
The cherry-picked #428 wrapped no_sanitize attributes on read_varint() and
read_varlong() in `#ifndef CAREFUL_ALIGNMENT`, but byteorder.h always
#defines CAREFUL_ALIGNMENT (to 0 or 1), so that guard is never true and the
attributes were dead code.
They are also unnecessary: both functions read the assembled value through
an aligned union member (union { char b[5]; int32 x; }), not an unaligned
cast, so UBSan's alignment check never fires there (verified: the ASan+UBSan
suite is clean without them). Remove the whole block rather than fix the
guard. (The byteorder.h annotations from #428, which are real and correctly
placed inside the !CAREFUL_ALIGNMENT branch, are kept.)
rsync sets CAREFUL_ALIGNMENT for architectures which do not support
unaligned access. Disable UBSAN for functions which may use unaligned
accesses when CAREFUL_ALIGNMENT is set.
Bug: https://github.com/WayneD/rsync/issues/427
Signed-off-by: Sam James <sam@gentoo.org>
(cherry picked from commit 11c1e934e8)
log_delete() builds a struct file_struct inside a char buffer offset by the
(EXTRA_LEN-granular) extra data. The EXTRA_ROUNDING block that rounds that
offset up to the struct's alignment (exactly as flist.c does for its pool
allocations) was dead code here: log.c never included rounding.h, so
EXTRA_ROUNDING was undefined and the rounding never ran, leaving the
file_struct pointer potentially under-aligned. That trips UBSan's alignment
check and would fault on strict-alignment arches.
Include rounding.h (and add the Makefile dependency) so the existing rounding
actually applies -- fixing the alignment at the source rather than suppressing
the sanitizer.
Three pre-existing issues UBSan flags during the xattr tests:
* xattr_lookup_hash(): the summed hashlittle2() values overflow the
signed int64 accumulator (UB). Accumulate in uint64_t and convert back
at return -- the key is only used for hash-table equality, so the value
is unchanged.
* rsync_xal_get(): for an empty list (count == 0) the loop init
`rxa += count-1` forms `items - 1` on a NULL `items` (UB). Guard with
`if (count)`.
* rsync_xal_store(): `memcpy(dst, xalp->items, 0)` passes a NULL source for
an empty list (UB). Guard with `if (xalp->count)`.
UBSan flags two spots that shift a value into the top bits of a word via a
signed operand:
* lib/mdfour.c copy64(): `in[i] << 24` promotes the uchar to int, so a
byte >= 128 overflows int (UB). Cast each byte to uint32.
* hashtable.c NON_ZERO_64(): `(int64)(x) << 32` overflows int64 whenever
x's high bit is set. Shift as uint64_t (covers all four call sites).
Behavior-preserving -- only the intermediate type changes; the resulting
bit pattern is identical.
In a git worktree .git is a file (a gitdir pointer), not a directory,
so os.path.isdir('.git') wrongly aborted with "no .git dir" when the
release was run from a worktree. Use os.path.exists() so it works from
both a normal checkout and a linked worktree.
Every platform build (the BSD/Solaris/macOS/cygwin/almalinux/ubuntu jobs),
coverage, the version-mix job and the android static build ran on a daily cron
*in addition to* push and pull_request to master. Since push/PR already cover
every code change, the cron only adds drift coverage -- catching breakage from a
moving runner image or toolchain that no commit triggers. Those images do not
change daily, so a daily run mostly re-tests an unchanged tree.
Move them all to a weekly cron (Mondays, keeping each job's existing time) to
keep that drift coverage at roughly a seventh of the Actions spend and log
noise. fleettest was already weekly. Per-change CI on push/PR is unchanged, and
workflow_dispatch still allows an on-demand run.
A run killed without a parent-death backstop can strand a TOCTOU path-flipper
(a busy `python -c` rename loop that pins a CPU) and an orphaned test rsyncd
(--no-detach --address=127.0.0.1) that squats its fixed port -- the wedge the
claim_ports() bind-probe now reports and points at --cleanup. Sweep both, best
effort, before removing the run dirs.
Each sweep counts the pattern, kills it (with a `sudo -n` retry for a process a
root-running test left), then re-counts after a settle: KILLED reports what
actually died, and a process that survives (pkill blocked, no passwordless sudo,
missing/limited pkill) is reported as SURVIVED and fails the run instead of
falsely claiming success.
Run-dir removal falls back to `sudo -n rm` so a dir whose contents a root test
owns is removed instead of failing with "Permission denied" (the failure mode
seen on the ubuntu/mac targets); only a dir that survives even sudo is failed.
The kill patterns use the pgrep self-exclusion trick ('r[e]name', 'det[a]ch')
so they match a real process's "rename"/"detach" but not the literal pattern in
the cleanup shell's own argv -- run_on() passes the whole script as the remote
argv, so without it --cleanup would signal itself. The patterns are host-global
(not scoped to one run), so --cleanup is documented to run between runs, not
during one.
claim_ports() takes a POSIX byte-range lock per port, which serializes
concurrent live test runs. But the kernel drops that lock the instant the
holding process dies, even if the run left an orphaned rsync --daemon still
bound to the port -- which happens when a run is SIGKILLed on a platform with
no parent-death backstop (rsyncfns only arms PR_SET_PDEATHSIG, Linux-only, so
the BSDs/Solaris/macOS can strand a daemon). A later run then wins the freed
lock while the socket is still squatted and dies with a cryptic "bind() failed:
Address already in use" / "did not see server greeting".
After taking each lock, actually bind the port (SO_REUSEADDR, so a port merely
in TIME_WAIT is not a false positive; only a live squatter fails) and close it
immediately. On failure stop with an actionable message naming the port and the
likely orphaned daemon. Closes the gap that masked the OpenBSD daemon-auth wedge.
When --testsuite-repo provides the suite, the build tree (--repo) need not
carry runtests.py -- it may be an older release whose shell testsuite predates
the Python runtests.py (e.g. a 3.4.1 backport branch built and tested with the
current suite). Check runtests.py in TESTSUITE_REPO and only require the build
tree to be rsync source (rsync.h).
--repo couples the built source and the test suite that exercises it.
--testsuite-repo PATH overlays runtests.py + testsuite/ from a second tree onto
the staged build tree, and sources the expected-skip workflows from it, so one
can build an older release (e.g. a 3.4.x stable branch) and run the current
comprehensive suite against that binary. Defaults to --repo, so the existing
single-tree behaviour is unchanged.
The shell testsuite was removed in 1f689ec0 (rewritten in Python); only
*_test.py remain, yet collect_tests still globbed *.test and _testbase mapped
foo.test and foo_test.py to the same canonical name. Harmless on a master tree
(no .test files), but when an older tree's *.test files are present -- e.g.
fleettest --testsuite-repo building a 3.4.x release whose shell suite still
exists -- both glob to the same test name and scratch dir and race under -j,
producing spurious failures. Drop .test discovery entirely.
The regression test honestly skips when it cannot force the receiver's
output mkstemp() to fail -- as root (root bypasses DAC) and on Cygwin
(chmod 0555 does not deny the owner a write). The ubuntu, ubuntu-22.04,
almalinux and macOS jobs run `make check` as root, and Cygwin can't
enforce the unwritable directory, so the test skips on all of them.
runtests.py fails a run on any skip-set mismatch, so add the test to
those jobs' RSYNC_EXPECT_SKIPPED lists; the BSD/Solaris jobs run as root
too but enforce no expected-skip set, so they need no change.
Also tighten the pass condition. The post-chmod writability probe already
guarantees the receiver discards (mkstemp must fail), so an exit 0 would
mean the file actually transferred and the discard path was never
exercised -- a silent false-pass. Require exactly exit 23 (the forced
discard leaves the file untransferred); 12 remains the pre-fix crash.
Drives a real sender<->receiver pair (client sender -> daemon receiver,
both the binary under test in the default pipe transport) so the receiver
actually takes the recv_files discard path -- a local `rsync a b` does
not. The basis and source share a leading block so the generator emits
real sums and the receiver gets a block MATCH; the destination directory
is made unwritable so the receiver's output mkstemp() fails and it
discards the delta. Pre-fix the receiver SIGSEGVs in full_fname(NULL),
which the client sees as a protocol-data-stream error (code 12); post-fix
it drains the delta and reports a benign code 23 (or 0).
Skips (exit 77) when run as root, since root bypasses DAC and the
unwritable destination would not make mkstemp() fail -- so the discard
path, and the bug, would never be reached.
Verified red-on-buggy / green-on-fixed against the 0d0399bb receiver.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
receive_data() crashed a receiver that was merely DISCARDING a file's
delta stream. discard_receive_data() calls receive_data() with
fname == NULL and fd == -1, so size_r == 0 and mapbuf == NULL. A normal
block-MATCH token (against a block the basis and source share) then
reaches the !mapbuf branch added in 31fbb17d ("receiver: fix absolute
--partial-dir delta resume"), which calls full_fname(fname). full_fname()
dereferences its argument unconditionally (util1.c: `if (*fn == '/')`),
so fname == NULL faults there -> receiver SIGSEGV.
This is a normal-operation crash with a stock cooperating sender, not an
adversarial one. The generator hands the sender real block sums whenever
the basis is readable and we're in delta mode; the receiver only decides
to discard afterwards, when its output cannot be produced -- e.g. the
destination directory is not writable (mkstemp fails), the basis turns
out to be a directory, or a --partial-dir resume is skipped. A MATCH
token arriving during that discard hit the NULL deref.
The 31fbb17d branch is correct only for a REAL output transfer (fd != -1,
fname valid): there, a block match with no mapped basis is a genuine
protocol inconsistency (the generator promised a basis the receiver could
not open), and honoring it would silently omit those bytes from the
verification checksum or leave a hole, so hard-erroring -- and
full_fname(fname) -- is right. It conflated that with the discard path.
The discriminator is fd, not mapbuf: on the discard path fd == -1 always;
on the real-output inconsistency fd != -1. Scope the "no basis file"
protocol error to fd != -1 (where fname is non-NULL and full_fname is
safe) and, on the discard path (fd == -1), absorb the matched bytes
benignly (offset += len; continue) -- symmetric with the literal-token
handling just above, and restoring the pre-31fbb17d behavior. The
real-transfer inconsistency check is preserved unchanged.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A slow or heavily-loaded fleet box can occasionally flake a concurrency-
sensitive test (e.g. a daemon/lsh test under -j8 on a nested-VM Solaris box).
Rather than dropping the whole target to a lower -j, add a per-target
"max_retry" property: after a run, each failed test is re-run on its own up to
max_retry more times, and any that then pass are dropped from the failure list.
Recovered tests are listed in a new "RECOVERED" report section, so a flake is
surfaced, never silently hidden.
Applies to every pass for the target (pipe, tcp, protoNN, nonroot). Default 0
keeps the current no-retry behaviour.
A target can list older "protocols" (e.g. [30, 29]) in the fleet config;
each runs as an extra stdio-pipe pass with runtests --protocol=N, the fleet
analogue of a workflow's check30/check29 steps. The passes reuse the same
parsed RSYNC_EXPECT_SKIPPED list as the default pipe run and appear as protoNN
columns in the report and --timing breakdown. Targets without the key run only
the default protocol and show "-" there.
The example config's ubuntu-2604 target (mirroring ubuntu-build.yml, which has
check30/check29 steps) now sets protocols: [30, 29].
Covers both halves: a --mkpath file-to-file --dry-run must succeed and
match the real run (the #880 abort), and a plain file-to-file --dry-run
onto an existing differing destination must still itemize the real change
rather than report it as brand new. Both compare "--dry-run -i" output
against the real run.
Co-authored-by: Stiliyan Tonev (Bark) <stiliyan21@gmail.com>
A single-file --mkpath copy whose destination parent does not exist
failed under --dry-run: make_path() only *reports* the directories it
would create in a dry run, so change_dir#3 then tried to chdir into a
parent that isn't there and aborted with "change_dir#3 ... failed".
When the parent is genuinely missing in a dry run, skip the chdir and
mark the destination as not-yet-present (dry_run++), exactly as the
multi-file/dir-creation path already does, so the generator doesn't
probe the missing tree. Gating it on the missing-parent case keeps an
ordinary file-to-file dry run chdir'ing into and itemizing against an
existing destination.
Fixes: #880
Co-authored-by: Stiliyan Tonev (Bark) <stiliyan21@gmail.com>
doc/rsync.sgml is a 1996-2002 DocBook user manual (with README-SGML
describing the docbook-utils build) that was long ago superseded by the
markdown man pages. It is unmaintained and referenced by nothing in the
build. This empties doc/.