Compare commits

8 Commits

Author SHA1 Message Date
Andrew Tridgell
66712a90b3 rsync-web: updates for the 3.4.4 release 2026-06-08 14:45:03 +10:00
Andrew Tridgell
b780749ffb release.py: accept a git worktree in require_top_of_checkout()
In a git worktree .git is a file (a gitdir pointer), not a directory,
so os.path.isdir('.git') wrongly aborted with "no .git dir" when the
release was run from a worktree. Use os.path.exists() so it works from
both a normal checkout and a linked worktree.
2026-06-08 14:44:37 +10:00
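The difference between the two checks can be reproduced in isolation. This is a minimal sketch, assuming (as the commit message states) that a linked worktree's `.git` is a regular file holding a gitdir pointer; the helper name `has_git_marker`, the temporary directories and the pointer path are hypothetical and not part of release.py.

```python
import os
import tempfile

def has_git_marker(top):
    """Return True if `top` holds a .git entry, whether it is a directory
    (normal checkout) or a file (linked worktree)."""
    return os.path.exists(os.path.join(top, '.git'))

# Simulate a linked worktree: .git is a file containing a gitdir pointer.
worktree = tempfile.mkdtemp()
with open(os.path.join(worktree, '.git'), 'w') as f:
    f.write('gitdir: /hypothetical/main/.git/worktrees/wt\n')

# Simulate a normal checkout: .git is a directory.
checkout = tempfile.mkdtemp()
os.mkdir(os.path.join(checkout, '.git'))

# The old isdir() check rejects the worktree; exists() accepts both layouts.
print(os.path.isdir(os.path.join(worktree, '.git')))
print(has_git_marker(worktree), has_git_marker(checkout))
```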
Andrew Tridgell
d25c5e4b11 ci: move the daily scheduled jobs to weekly
Every platform build (the BSD/Solaris/macOS/cygwin/almalinux/ubuntu jobs),
coverage, the version-mix job and the android static build ran on a daily cron
*in addition to* push and pull_request to master. Since push/PR already cover
every code change, the cron only adds drift coverage -- catching breakage from a
moving runner image or toolchain that no commit triggers. Those images do not
change daily, so a daily run mostly re-tests an unchanged tree.

Move them all to a weekly cron (Mondays, keeping each job's existing time) to
keep that drift coverage at roughly a seventh of the Actions spend and log
noise. fleettest was already weekly. Per-change CI on push/PR is unchanged, and
workflow_dispatch still allows an on-demand run.
2026-06-08 10:25:38 +10:00
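The "roughly a seventh" figure follows from the day-of-week field alone. A simplified sketch, assuming a five-field cron expression whose day-of-month and month fields are `*`; the function `runs_per_week` is hypothetical and understands only `*` or a comma-separated list of day numbers, not ranges or steps.

```python
def runs_per_week(cron_expr):
    """Count the weekdays on which a 5-field cron expression fires.

    Simplified: only '*' or a comma-separated list of day numbers is
    understood in the day-of-week field.
    """
    minute, hour, dom, month, dow = cron_expr.split()
    if dow == '*':
        return 7
    # 0 and 7 both mean Sunday in cron, so fold them together.
    return len({int(d) % 7 for d in dow.split(',')})

daily = runs_per_week('42 8 * * *')
weekly = runs_per_week('42 8 * * 1')
print(daily, weekly)
```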
Andrew Tridgell
1ddfe17d65 fleettest: --cleanup also kills stray flippers/daemons and root-owned dirs
A run killed without a parent-death backstop can strand a TOCTOU path-flipper
(a busy `python -c` rename loop that pins a CPU) and an orphaned test rsyncd
(--no-detach --address=127.0.0.1) that squats its fixed port -- the wedge the
claim_ports() bind-probe now reports and points at --cleanup. Sweep both, best
effort, before removing the run dirs.

Each sweep counts the pattern, kills it (with a `sudo -n` retry for a process a
root-running test left), then re-counts after a settle: KILLED reports what
actually died, and a process that survives (pkill blocked, no passwordless sudo,
missing/limited pkill) is reported as SURVIVED and fails the run instead of
falsely claiming success.

Run-dir removal falls back to `sudo -n rm` so a dir whose contents a root test
owns is removed instead of failing with "Permission denied" (the failure mode
seen on the ubuntu/mac targets); only a dir that survives even sudo is failed.

The kill patterns use the pgrep self-exclusion trick ('r[e]name', 'det[a]ch')
so they match a real process's "rename"/"detach" but not the literal pattern in
the cleanup shell's own argv -- run_on() passes the whole script as the remote
argv, so without it --cleanup would signal itself. The patterns are host-global
(not scoped to one run), so --cleanup is documented to run between runs, not
during one.
2026-06-08 09:41:59 +10:00
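The self-exclusion trick can be checked with an ordinary regex engine. A minimal sketch, assuming Python's `re.search` behaves like the extended-regex match `pgrep -f` applies to a full command line for this simple pattern; both command lines below are made-up examples, not the real fleettest argv.

```python
import re

PATTERN = 'r[e]name'  # bracketed form, as in the flipper sweep pattern

def matches(cmdline):
    """True if the regex is found anywhere in a command line, roughly how
    `pgrep -f` treats a process's full argv."""
    return re.search(PATTERN, cmdline) is not None

stray = "python -c import os; os.rename('a', 'b')"  # hypothetical flipper argv
cleanup = "sh -c pkill -f 'r[e]name'"               # hypothetical cleanup argv

# The class [e] matches the letter e, so "rename" is found in the stray
# process, but the cleanup shell's argv holds the text "r[e]name", which
# contains no "rename" substring.
print(matches(stray), matches(cleanup))
```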
Andrew Tridgell
6e6b4135ab testsuite: verify a claimed test port is actually bindable
claim_ports() takes a POSIX byte-range lock per port, which serializes
concurrent live test runs. But the kernel drops that lock the instant the
holding process dies, even if the run left an orphaned rsync --daemon still
bound to the port -- which happens when a run is SIGKILLed on a platform with
no parent-death backstop (rsyncfns only arms PR_SET_PDEATHSIG, Linux-only, so
the BSDs/Solaris/macOS can strand a daemon). A later run then wins the freed
lock while the socket is still squatted and dies with a cryptic "bind() failed:
Address already in use" / "did not see server greeting".

After taking each lock, actually bind the port (SO_REUSEADDR, so a port merely
in TIME_WAIT is not a false positive; only a live squatter fails) and close it
immediately. On failure stop with an actionable message naming the port and the
likely orphaned daemon. Closes the gap that masked the OpenBSD daemon-auth wedge.
2026-06-08 09:41:59 +10:00
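The bind probe described above can be tried on its own. A minimal sketch, assuming POSIX socket semantics (on Windows SO_REUSEADDR behaves differently); `is_bindable` is a hypothetical stand-in that returns a boolean rather than failing the test, and the "squatter" is an ordinary listening socket playing the part of an orphaned daemon.

```python
import socket

def is_bindable(port):
    """Try to bind 127.0.0.1:port with SO_REUSEADDR and report success."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        s.bind(('127.0.0.1', port))
        return True
    except OSError:
        return False
    finally:
        s.close()  # release the port straight away

# Stand-in for an orphaned daemon: a listener on a kernel-chosen port.
squatter = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
squatter.bind(('127.0.0.1', 0))
squatter.listen(1)
port = squatter.getsockname()[1]

while_squatted = is_bindable(port)  # a live listener holds the port
squatter.close()
after_release = is_bindable(port)   # nothing is bound any more
print(while_squatted, after_release)
```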
Andrew Tridgell
c2b8e4532b fleettest: require runtests.py in --testsuite-repo, not the build tree
When --testsuite-repo provides the suite, the build tree (--repo) need not
carry runtests.py -- it may be an older release whose shell testsuite predates
the Python runtests.py (e.g. a 3.4.1 backport branch built and tested with the
current suite).  Check runtests.py in TESTSUITE_REPO and only require the build
tree to be rsync source (rsync.h).
2026-06-08 06:29:49 +10:00
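The split requirement can be sketched as a small check. This is an illustration only: `check_trees` is a hypothetical helper that returns an error string instead of printing and exiting, and the two temporary trees are made up.

```python
from pathlib import Path
import tempfile

def check_trees(repo, testsuite_repo):
    """Return an error string, or None if both trees look usable."""
    if not (testsuite_repo / "runtests.py").is_file():
        return f"{testsuite_repo} has no runtests.py"
    if not (repo / "rsync.h").is_file():
        return f"{repo} is not an rsync source tree (no rsync.h)"
    return None

# Hypothetical trees: an older build tree with rsync.h but no runtests.py,
# and a suite tree that carries runtests.py.
old_tree = Path(tempfile.mkdtemp())
(old_tree / "rsync.h").touch()
suite_tree = Path(tempfile.mkdtemp())
(suite_tree / "runtests.py").touch()

print(check_trees(old_tree, suite_tree))  # build tree lacks runtests.py: accepted
print(check_trees(old_tree, old_tree))    # suite tree lacks runtests.py: rejected
```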
Andrew Tridgell
7b66c0665f fleettest: add --testsuite-repo to run another tree's suite against this build
--repo couples the built source and the test suite that exercises it.
--testsuite-repo PATH overlays runtests.py + testsuite/ from a second tree onto
the staged build tree, and sources the expected-skip workflows from it, so one
can build an older release (e.g. a 3.4.x stable branch) and run the current
comprehensive suite against that binary. Defaults to --repo, so the existing
single-tree behaviour is unchanged.
2026-06-08 06:29:49 +10:00
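The "defaults to --repo" behaviour amounts to a two-step fallback. A minimal sketch with a stripped-down parser; `resolve_trees` is a hypothetical helper and the example paths are placeholders.

```python
import argparse
from pathlib import Path

def resolve_trees(argv):
    """Return (build tree, suite tree) for the given command line."""
    ap = argparse.ArgumentParser()
    ap.add_argument("--repo")
    ap.add_argument("--testsuite-repo")
    args = ap.parse_args(argv)
    repo = Path(args.repo).resolve() if args.repo else Path.cwd()
    # Without --testsuite-repo the suite comes from the build tree itself.
    suite = Path(args.testsuite_repo).resolve() if args.testsuite_repo else repo
    return repo, suite

same_repo, same_suite = resolve_trees([])
split_repo, split_suite = resolve_trees(
    ["--repo", "/tmp/old-tree", "--testsuite-repo", "/tmp/new-tree"])
print(same_repo == same_suite, split_repo == split_suite)
```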
Andrew Tridgell
49f8dd1ca4 runtests: stop discovering obsolete *.test shell tests
The shell testsuite was removed in 1f689ec0 (rewritten in Python); only
*_test.py remain, yet collect_tests still globbed *.test and _testbase mapped
foo.test and foo_test.py to the same canonical name. Harmless on a master tree
(no .test files), but when an older tree's *.test files are present -- e.g.
fleettest --testsuite-repo building a 3.4.x release whose shell suite still
exists -- both glob to the same test name and scratch dir and race under -j,
producing spurious failures. Drop .test discovery entirely.
2026-06-08 06:29:49 +10:00
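The collision is easy to see with the pre-change name mapping. The function below is a sketch reconstructed from the lines this commit removes, renamed `old_testbase` for illustration; the two paths are hypothetical.

```python
import os

_PY_TEST_SUFFIX = '_test.py'

def old_testbase(path):
    """Pre-change mapping: both the shell and the Python suffix are stripped."""
    base = os.path.basename(path)
    if base.endswith('.test'):
        return base[:-len('.test')]
    if base.endswith(_PY_TEST_SUFFIX):
        return base[:-len(_PY_TEST_SUFFIX)]
    return base

# A shell test and its Python rewrite sitting in the same suite directory.
names = [old_testbase(p)
         for p in ('testsuite/mkpath.test', 'testsuite/mkpath_test.py')]
print(names)
```

Both entries reduce to the same canonical name, which is why the two would share a scratch dir and race under -j.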
17 changed files with 242 additions and 52 deletions

@@ -18,7 +18,7 @@ on:
- '.github/workflows/*.yml'
- '!.github/workflows/almalinux-8-build.yml'
schedule:
-- cron: '42 8 * * *'
+- cron: '42 8 * * 1'
jobs:
test:

@@ -21,7 +21,7 @@ on:
- '.github/workflows/*.yml'
- '!.github/workflows/android-static-build.yml'
schedule:
-- cron: '42 8 * * *'
+- cron: '42 8 * * 1'
workflow_dispatch:
env:

@@ -12,7 +12,7 @@ on:
- '.github/workflows/*.yml'
- '!.github/workflows/coverage.yml'
schedule:
-- cron: '42 9 * * *'
+- cron: '42 9 * * 1'
workflow_dispatch:
jobs:

@@ -12,7 +12,7 @@ on:
- '.github/workflows/*.yml'
- '!.github/workflows/cygwin-build.yml'
schedule:
-- cron: '42 8 * * *'
+- cron: '42 8 * * 1'
jobs:
test:

@@ -12,7 +12,7 @@ on:
- '.github/workflows/*.yml'
- '!.github/workflows/freebsd-build.yml'
schedule:
-- cron: '42 8 * * *'
+- cron: '42 8 * * 1'
jobs:
test:

@@ -12,7 +12,7 @@ on:
- '.github/workflows/*.yml'
- '!.github/workflows/macos-build.yml'
schedule:
-- cron: '42 8 * * *'
+- cron: '42 8 * * 1'
jobs:
test:

@@ -12,7 +12,7 @@ on:
- '.github/workflows/*.yml'
- '!.github/workflows/netbsd-build.yml'
schedule:
-- cron: '42 8 * * *'
+- cron: '42 8 * * 1'
jobs:
test:

@@ -12,7 +12,7 @@ on:
- '.github/workflows/*.yml'
- '!.github/workflows/openbsd-build.yml'
schedule:
-- cron: '42 8 * * *'
+- cron: '42 8 * * 1'
jobs:
test:

@@ -12,7 +12,7 @@ on:
- '.github/workflows/*.yml'
- '!.github/workflows/solaris-build.yml'
schedule:
-- cron: '42 8 * * *'
+- cron: '42 8 * * 1'
jobs:
test:

@@ -16,7 +16,7 @@ on:
- '.github/workflows/*.yml'
- '!.github/workflows/ubuntu-22.04-build.yml'
schedule:
-- cron: '42 8 * * *'
+- cron: '42 8 * * 1'
jobs:
test:

@@ -12,7 +12,7 @@ on:
- '.github/workflows/*.yml'
- '!.github/workflows/ubuntu-build.yml'
schedule:
-- cron: '42 8 * * *'
+- cron: '42 8 * * 1'
jobs:
test:

@@ -33,7 +33,7 @@ on:
- '.github/workflows/*.yml'
- '!.github/workflows/ubuntu-version-mix.yml'
schedule:
-- cron: '52 8 * * *'
+- cron: '52 8 * * 1'
jobs:
version-mix:

@@ -104,8 +104,8 @@ def require_samba_host():
def require_top_of_checkout():
if not os.path.isfile('packaging/release.py'):
die("Run this script from the top of your rsync checkout.")
-if not os.path.isdir('.git'):
-die("There is no .git dir in the current directory.")
+if not os.path.exists('.git'):
+die("There is no .git in the current directory (run from the top of a git checkout or worktree).")
def replace_or_die(regex, repl, txt, die_msg):
@@ -636,6 +636,8 @@ If you have a 'samba' remote configured (git.samba.org:/data/git/rsync.git):
Then upload the tarball + .asc to the GitHub release for {v_ver},
and announce on rsync-announce@, rsync@, and Discord.
+NOTE! Also update the PPAs if needed
""")

@@ -26,6 +26,58 @@ License</A> and is currently being maintained by
<img src="badge.svg">
</a></div>
+<h3>Rsync version 3.4.4 released</h3>
+<i class=date>June 8th, 2026</i>
+<p>Rsync version 3.4.4 has been released. This is a regression fix
+release for the issues that have been reported with the 3.4.3
+security release. Many thanks to everyone who reported the issues
+(see <a href="https://download.samba.org/pub/rsync/NEWS#3.4.4">NEWS.md</a>
+for credits).<p>
+The 3.4.3 release had so many issues for two main reasons:
+<ul>
+<li>the 3.4 testsuite did not have broad enough coverage to
+catch the regressions noticed by users
+<li>the nature of a security release prevents wide beta testing,
+resulting in not enough manual testing in disparate environments
+</ul>
+To fix this for future releases we have greatly expanded the test
+suite for 3.5 (currently in master) and grown the development
+team, especially with more people with security expertise.
+Thanks for your patience!
+<p>See the <a href="https://download.samba.org/pub/rsync/NEWS#3.4.4">3.4.4 NEWS</a> for a detailed changelog.
+The latest manpages are also available for:<ul>
+<li><a href="https://download.samba.org/pub/rsync/rsync.1"><b>rsync</b>(1)</a>
+<li><a href="https://download.samba.org/pub/rsync/rsync-ssl.1"><b>rsync-ssl</b>(1)</a>
+<li><a href="https://download.samba.org/pub/rsync/rsyncd.conf.5"><b>rsyncd.conf</b>(5)</a>
+<li><a href="https://download.samba.org/pub/rsync/rrsync.1"><b>rrsync</b>(1)</a>
+</ul>
+<p>The source tar is available here:
+<b><a href="https://download.samba.org/pub/rsync/src/rsync-3.4.4.tar.gz">rsync-3.4.4.tar.gz</a>
+(<a href="https://download.samba.org/pub/rsync/src/rsync-3.4.4.tar.gz.asc">signature</a>)</b>,
+and the diffs from version 3.4.3 are available here:
+<b><a href="https://download.samba.org/pub/rsync/src-diffs/rsync-3.4.3-3.4.4.diffs.gz">rsync-3.4.3-3.4.4.diffs.gz</a>
+(<a href="https://download.samba.org/pub/rsync/src-diffs/rsync-3.4.3-3.4.4.diffs.gz.asc">signature</a>)</b>.
+<p>Patch sets are also available for the older stable series, for
+distributors not yet able to move to 3.4.4. Each is GPG signed. The
+<i>full</i> set applies to a pristine release tarball; the <i>update</i>
+set has only the patches added since the previous security release:<ul>
+<li>rsync 3.2.7:
+<b><a href="https://download.samba.org/pub/rsync/src/rsync-3.2.7-full.tar.gz">rsync-3.2.7-full.tar.gz</a>
+(<a href="https://download.samba.org/pub/rsync/src/rsync-3.2.7-full.tar.gz.asc">signature</a>)</b>,
+<b><a href="https://download.samba.org/pub/rsync/src/rsync-3.2.7-update.tar.gz">rsync-3.2.7-update.tar.gz</a>
+(<a href="https://download.samba.org/pub/rsync/src/rsync-3.2.7-update.tar.gz.asc">signature</a>)</b>
+<li>rsync 3.4.1:
+<b><a href="https://download.samba.org/pub/rsync/src/rsync-3.4.1-full.tar.gz">rsync-3.4.1-full.tar.gz</a>
+(<a href="https://download.samba.org/pub/rsync/src/rsync-3.4.1-full.tar.gz.asc">signature</a>)</b>,
+<b><a href="https://download.samba.org/pub/rsync/src/rsync-3.4.1-update.tar.gz">rsync-3.4.1-update.tar.gz</a>
+(<a href="https://download.samba.org/pub/rsync/src/rsync-3.4.1-update.tar.gz.asc">signature</a>)</b>
+</ul>
<h3>Rsync version 3.4.3 released</h3>
<i class=date>May 20th, 2026</i>

@@ -191,35 +191,31 @@ _PY_TEST_SUFFIX = '_test.py'
def _is_test_path(path):
-base = os.path.basename(path)
-return base.endswith('.test') or base.endswith(_PY_TEST_SUFFIX)
+return os.path.basename(path).endswith(_PY_TEST_SUFFIX)
def _testbase(path):
"""Strip the test extension to get the canonical test name."""
base = os.path.basename(path)
-if base.endswith('.test'):
-return base[:-len('.test')]
if base.endswith(_PY_TEST_SUFFIX):
return base[:-len(_PY_TEST_SUFFIX)]
return base
def collect_tests(suitedir, patterns):
-"""Collect test scripts (.test or _test.py) matching the given patterns."""
+"""Collect test scripts (_test.py) matching the given patterns."""
if not patterns:
-candidates = (glob.glob(os.path.join(suitedir, '*.test'))
-+ glob.glob(os.path.join(suitedir, '*' + _PY_TEST_SUFFIX)))
+candidates = glob.glob(os.path.join(suitedir, '*' + _PY_TEST_SUFFIX))
tests = sorted(p for p in candidates if _is_test_path(p))
else:
seen = set()
tests = []
for pat in patterns:
# Accept either bare name ("mkpath"), explicit extension, or glob.
-if pat.endswith('.test') or pat.endswith('.py'):
+if pat.endswith('.py'):
pats = [pat]
else:
-pats = [pat + '.test', pat + _PY_TEST_SUFFIX]
+pats = [pat + _PY_TEST_SUFFIX]
for p in pats:
for m in sorted(glob.glob(os.path.join(suitedir, p))):
if _is_test_path(m) and m not in seen:

@@ -31,8 +31,11 @@ without interfering: each pushes, builds and tests in isolation. The run dir is
removed when the run ends -- on success or failure, and best-effort on
Ctrl-C/kill (pass --keep to retain it for inspection). A run that is hard-killed
(SIGKILL), or signalled mid-push, or whose ssh dies during cleanup can leave a
-stray <builddir>-<id> behind; sweep those with `fleettest.py --cleanup`
-(optionally scoped with --targets). Because each
+stray <builddir>-<id> behind -- plus an orphaned path-flipper or test rsyncd on
+platforms without a parent-death backstop; sweep all of those (root-owned files
+included, via sudo -n) with `fleettest.py --cleanup` (optionally scoped with
+--targets). Run --cleanup between runs, not during one: its process kills are
+host-global and would also catch a concurrent run's flipper/daemon. Because each
run starts from a fresh dir, every build is a full configure + build.
PROVISIONING: each target must have the build toolchain its workflow's prepare
@@ -83,7 +86,11 @@ from pathlib import Path
# source tree these point at, so it must be run from inside an rsync checkout
# or given --repo PATH.
REPO = Path.cwd()
-WORKFLOWS = REPO / ".github" / "workflows"
+# Source tree providing the test suite (runtests.py + testsuite/). Defaults to
+# REPO; --testsuite-repo decouples it so one tree is built and another's suite is
+# run against the result.
+TESTSUITE_REPO = REPO
+WORKFLOWS = TESTSUITE_REPO / ".github" / "workflows"
# Fleet config (overridable with --fleet): ~/.fleettest.json is tried first, then
# fleettest.json next to this script. The example template sits next to the
@@ -767,33 +774,94 @@ def _on_signal(signum, frame):
os._exit(130 if signum == signal.SIGINT else 143)
+# sweep() counts a pattern, kills it (best effort; sudo -n retry for processes a
+# root-running test left), then RE-counts after a settle so we report what
+# actually died (KILLED = before-after) and flag any survivor (SURVIVED, sets
+# fail) instead of claiming success when pkill couldn't reach it. The patterns
+# use the pgrep self-exclusion trick -- 'r[e]name'/'det[a]ch' match a real
+# process's "rename"/"detach" but not the bracketed literal in this script's own
+# argv (run_on passes the whole script as the remote argv), so we never signal
+# ourselves. @BASE@ is substituted with the target's run-dir prefix.
+_CLEANUP_SCRIPT = r'''fail=0
+sweep() {
+command -v pgrep >/dev/null 2>&1 || return 0
+before=$(pgrep -f "$2" 2>/dev/null | wc -l | tr -d ' ')
+[ "$before" -gt 0 ] 2>/dev/null || return 0
+pkill -f "$2" 2>/dev/null
+sudo -n pkill -f "$2" 2>/dev/null
+sleep 1
+after=$(pgrep -f "$2" 2>/dev/null | wc -l | tr -d ' ')
+killed=$((before - after))
+[ "$killed" -gt 0 ] 2>/dev/null && echo "KILLED $killed stray $1(s)"
+if [ "$after" -gt 0 ] 2>/dev/null; then
+echo "SURVIVED $after stray $1(s)"
+fail=1
+fi
+}
+sweep flipper 'r[e]name.*r[e]name.*r[e]name'
+sweep daemon 'det[a]ch --address=127.0.0.1'
+for d in @BASE@-*; do
+[ -e "$d" ] || continue
+if rm -rf -- "$d" 2>/dev/null || sudo -n rm -rf -- "$d" 2>/dev/null; then
+echo "REMOVED $d"
+else
+echo "FAILED $d"
+fail=1
+fi
+done
+exit $fail
+'''
def cleanup_remnants(targets: list[Target]) -> int:
-"""--cleanup mode: remove every <base>-* run dir on each target, reporting
-what each removed. Returns a process exit code. Only suffixed run dirs are
-swept -- a bare <base> is left alone."""
+"""--cleanup mode: on each target, kill the stray processes a killed run can
+leave behind, then remove every <base>-* run dir, reporting what went.
+Returns a process exit code. Only suffixed run dirs are swept -- a bare
+<base> is left alone.
+A run that is SIGKILLed (or whose ssh drops) can strand two kinds of process
+on platforms without a parent-death backstop: the TOCTOU path-flipper (a
+busy `python -c` rename loop that pins a CPU) and an orphaned test rsyncd
+(`--no-detach --address=127.0.0.1`, which then squats its fixed port -- the
+very wedge claim_ports()' bind-probe now reports). Both are killed best
+effort (sudo -n retry for root-owned ones); a kill is verified by re-counting
+afterwards, and a process that survives is reported and fails the run.
+CAVEAT: the kill patterns are host-global, not scoped to a particular run, so
+--cleanup assumes no *other* fleettest run is active on the target -- it
+would also kill a concurrent run's flipper/daemon (and any manual `rsync
+--daemon --no-detach --address=127.0.0.1`). Run it between runs, not during
+one. Run dirs whose contents a root test owns are removed via a `sudo -n rm`
+fallback; only a dir that survives even that is a failure."""
rc = 0
for t in targets:
base = t.builddir
if _unsafe_builddir(base):
log(f"[{t.name}] skipped (unsafe builddir {base!r})")
continue
-# Echo each match before removing it so the harness can report what
-# went; an unmatched glob stays literal and is skipped by the -e test.
-script = (f'set -e\n'
-f'for d in {base}-*; do\n'
-f' [ -e "$d" ] || continue\n'
-f' echo "$d"\n'
-f' rm -rf -- "$d"\n'
-f'done\n')
-r = run_on(t, script, timeout=120)
-removed = [ln for ln in r.out.splitlines() if ln.strip()]
-if r.rc != 0:
+# Structured markers (KILLED/SURVIVED/REMOVED/FAILED) keep the report
+# clean even though run_on() folds stderr into stdout.
+r = run_on(t, _CLEANUP_SCRIPT.replace("@BASE@", base), timeout=120)
+lines = r.out.splitlines()
+removed = [ln.split(" ", 1)[1] for ln in lines if ln.startswith("REMOVED ")]
+failed = [ln.split(" ", 1)[1] for ln in lines if ln.startswith("FAILED ")]
+killed = [ln.replace("KILLED ", "killed ", 1)
+for ln in lines if ln.startswith("KILLED ")]
+survived = [ln.replace("SURVIVED ", "still alive: ", 1)
+for ln in lines if ln.startswith("SURVIVED ")]
+msgs = killed[:]
+if removed:
+msgs.append("removed: " + " ".join(removed))
+if survived:
rc = 1
-log(f"[{t.name}] cleanup error (rc={r.rc}): {r.out.strip()[:200]}")
-elif removed:
-log(f"[{t.name}] removed: {' '.join(removed)}")
-else:
-log(f"[{t.name}] nothing to remove")
+msgs += survived
+if failed:
+rc = 1
+msgs.append("could not remove (even with sudo): " + " ".join(failed))
+if r.rc not in (0, 1):
+rc = 1
+msgs.append(f"cleanup error rc={r.rc}: {r.out.strip()[:160]}")
+log(f"[{t.name}] " + ("; ".join(msgs) if msgs else "nothing to remove"))
return rc
@@ -809,24 +877,44 @@ def main() -> int:
ap.add_argument("--keep", action="store_true",
help="keep each run's build dir (default: remove it at exit)")
ap.add_argument("--cleanup", action="store_true",
-help="remove stray <builddir>-* run dirs on the targets, then exit")
+help="kill stray flippers/test daemons and remove stray "
+"<builddir>-* run dirs (root-owned via sudo -n) on the "
+"targets, then exit; run between runs, not during one "
+"(kills are host-global)")
ap.add_argument("--jobs", type=int, help="override -j for both transports")
ap.add_argument("--timing", action="store_true",
help="report per-target wall-clock (push/build/test) to find "
"the slowest target")
ap.add_argument("--repo", help="rsync source tree to build (default: cwd)")
+ap.add_argument("--testsuite-repo",
+help="rsync tree to take runtests.py + testsuite/ from "
+"(default: --repo). Build one tree and run another's test "
+"suite against it, e.g. --repo ../rsync-v3.4 --testsuite-repo .")
ap.add_argument("--fleet", help="fleet config JSON (default: ~/.fleettest.json, "
"else fleettest.json next to this script)")
ap.add_argument("--list", action="store_true", help="list targets and exit")
args = ap.parse_args()
-global REPO, WORKFLOWS
+global REPO, WORKFLOWS, TESTSUITE_REPO
REPO = Path(args.repo).resolve() if args.repo else Path.cwd()
-WORKFLOWS = REPO / ".github" / "workflows"
-if not args.cleanup and not (REPO / "runtests.py").is_file():
-print(f"{REPO} is not an rsync source tree (no runtests.py); "
-f"run from inside a checkout or pass --repo", file=sys.stderr)
-return 2
+TESTSUITE_REPO = Path(args.testsuite_repo).resolve() if args.testsuite_repo else REPO
+# The expected-skip lists travel with the suite, so read workflows from the
+# tree that provides the tests.
+WORKFLOWS = TESTSUITE_REPO / ".github" / "workflows"
+if not args.cleanup:
+# The Python test suite (runtests.py + testsuite/) comes from
+# TESTSUITE_REPO, so that is where runtests.py must live. The build tree
+# (REPO) only has to be a buildable rsync source -- it may be an older
+# release whose runtests.py predates the Python suite, or lacks it.
+if not (TESTSUITE_REPO / "runtests.py").is_file():
+print(f"{TESTSUITE_REPO} has no runtests.py; run from inside a "
+f"checkout or pass --testsuite-repo a tree with the Python "
+f"test suite", file=sys.stderr)
+return 2
+if not (REPO / "rsync.h").is_file():
+print(f"{REPO} is not an rsync source tree (no rsync.h); "
+f"run from inside a checkout or pass --repo", file=sys.stderr)
+return 2
if args.fleet:
config_path = Path(args.fleet).resolve()
@@ -905,6 +993,19 @@ def main() -> int:
print(f"git archive failed: {ar.stderr}", file=sys.stderr)
return 2
+# --testsuite-repo: overlay another tree's runtests.py + testsuite/ onto
+# the built source (merge, no delete). Build REPO's rsync, but run
+# TESTSUITE_REPO's suite against it. The leftover .test files from REPO
+# are ignored by a Python runtests.py (it globs *_test.py).
+if TESTSUITE_REPO != REPO:
+ov = subprocess.run(
+f"git -C {TESTSUITE_REPO} archive HEAD -- runtests.py testsuite "
+f"| tar -x -C {staging}",
+shell=True, capture_output=True, text=True)
+if ov.returncode != 0:
+print(f"testsuite overlay archive failed: {ov.stderr}", file=sys.stderr)
+return 2
# Tests that opt into the non-root pass (same for every target).
args.nonroot_tests = discover_nonroot_tests(Path(staging) / "testsuite")

@@ -175,6 +175,42 @@ def _open_lock_file() -> int:
return fd
+def _probe_bindable(port: int) -> 'None':
+"""Confirm `port` is actually free once we hold its claim_ports() lock.
+The byte-range lock only coordinates *live* test drivers, and the kernel
+releases it the instant the holding process dies -- even if that driver left
+an orphaned daemon still bound to the port. That happens when a run is
+SIGKILLed (or its ssh drops) on a platform with no parent-death backstop:
+rsyncfns only arms PR_SET_PDEATHSIG, which is Linux-only, so on the
+BSDs/Solaris/macOS a killed fleettest run can strand its rsyncd, which then
+squats the fixed test port forever. A later run wins the (now-free) lock but
+the socket is still taken, and the daemon dies with a cryptic "bind() failed:
+Address already in use" / the client "did not see server greeting".
+So actually try to bind it. SO_REUSEADDR is used so a port merely in
+TIME_WAIT (recently and cleanly closed) is NOT a false positive; only a
+live bound/listening socket -- a real squatter -- makes the bind fail, and
+then we stop here with an actionable message instead of failing obscurely
+later. The probe socket is closed immediately, freeing the port for the
+daemon that is about to bind it.
+"""
+s = _socket.socket(_socket.AF_INET, _socket.SOCK_STREAM)
+s.setsockopt(_socket.SOL_SOCKET, _socket.SO_REUSEADDR, 1)
+try:
+s.bind(('127.0.0.1', port))
+except OSError as e:
+test_fail(
+f"port {port} was claimed for this run but something is still bound "
+f"to 127.0.0.1:{port} ({e.strerror}). The claim_ports() lock only "
+"serializes live test runs, so a still-bound port almost always "
+"means an orphaned 'rsync --daemon' from a previously killed run "
+f"(find it with `fstat | grep {port}` / `netstat -an | grep {port}` "
+"and kill it, or run `fleettest.py --cleanup`), then retry.")
+finally:
+s.close()
def claim_ports(*ports: int) -> 'None':
"""Reserve the given TCP port numbers for the rest of this process.
@@ -210,6 +246,9 @@ def claim_ports(*ports: int) -> 'None':
# F_SETLKW via fcntl.lockf(LOCK_EX, length, start): exclusive
# byte-range lock on byte `port`, blocking until acquired.
fcntl.lockf(_port_lock_fd, fcntl.LOCK_EX, 1, port)
+# The lock only proves no other live test run owns the port; an orphaned
+# daemon from a killed run can still squat it (see _probe_bindable).
+_probe_bindable(port)
# --- standalone rsyncd helpers ---------------------------------------------