tailscale

mirror of https://github.com/tailscale/tailscale.git synced 2026-07-20 04:22:12 -04:00

Author	SHA1	Message	Date
Simon Law	da8cd5cc7f	ipn/ipnlocal: fix documentation typo, NodeAttrCacheNetworkMaps (#19851) Updates #cleanup Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-22 22:19:10 -07:00
Simon Law	988615dbad	ipn/ipnlocal,tstest/integration: pause the control client consistently (#19846) There are two places where tailscaled transitions into a paused state: 1. tailscaled’s controlclient is initially created, 2. tailscale down, or the GUI equivalent, commands it to. This patch unifies the implementation of both scenarios into LocalBackend.shouldPauseControlClientLocked to prevent the implementation from drifting. The flaky tstest/integration.TestNoControlConnWhenDown test exposed this mismatch, but only by accident. This patch also changes TestNode.MustDown so that it runs `tailscale down` and then waits for the testcontrol server to finish handling any associated /machine/map requests. Fixes #19831 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-22 17:58:44 -07:00
Adrian Dewhurst	5d8f401956	net/dns: fix handling non-IP single split DNS Fixes #19834 Change-Id: I4d48efed00cd080b14c6fd713ff21e53a5a6ee3c Signed-off-by: Adrian Dewhurst <adrian@tailscale.com>	2026-05-22 20:45:58 -04:00
Brad Fitzpatrick	5295e3e119	ipn/{ipnstate,ipnlocal}: add integer NodeID to PeerStatus In `aa5da2e5f2` we made the IPN bus include deltas, including the PeersRemoved, sending a slice of integer NodeIDs that were removed. But when updating xcode, I realized there was no way to map those integers to the stable node IDs used in other places. I was considering changing the just-added ipn.Notify.PeersRemoved from an IntID to a string StableID, but then it doesn't match the MapResponse wire protocol, which we've tried to match so far. Instead, just add the integer ID as well. Callers can use whichever world they want, having both. It's a little regrettable that we still have two worlds of IDs, but oh well. Neither is really suitable to a hypothetical future fully federated world of control servers anyway, so we'll need a third type later anyway, so just live with the two we have for now. Updates #12542 Change-Id: Ib8fd48a265e1da1f8779152f141f624a7f7260e9 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-22 08:16:55 -07:00
Amal Bansode	e32b9bde1d	control/controlclient: fix deadlock in map session change queue processing (#19828) Holding an exclusive lock while writing to the unbuffered changequeue chan is likely going to deadlock when the run() path may try to grab the same lock before reading from the chan to drain it (on map session close). This causes the client to stop processing new map responses and TSMP disco key advertisements. There is a good probability of inducing this deadlock using the old code and new test added in this commit: TestUpdateDiscoForNodeCallback/test_deadlock. Also fix an unintentional regression in how the client responds to a mapResponse sleep command. `85bb5f84a5` moved the processing of mapResponses into a new goroutine, serialized via mapSession's changequeue. Thus, controlclient stopped sleeping in the same goroutine servicing mapResponses/control connections. This commit brings us back to sleeping synchronously in the same goroutine as controlclient. Updates #12639 Signed-off-by: Amal Bansode <amal@tailscale.com> Signed-off-by: Claus Lensbøl <claus@tailscale.com> Co-authored-by: Claus Lensbøl <claus@tailscale.com>	2026-05-22 07:13:18 -07:00
Simon Law	fd2405ca8f	tstest/integration: mark TestNoControlConnWhenDown as a flaky test (#19832) Updates #19831 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-21 17:36:09 -07:00
Simon Law	7dabebc691	net/traffic: switch rendezvous hashing from SHA256 to FNV-1a (#19821) In PR tailscale/corp#30448, we originally decided to break ties using SHA256 for our rendezvous hashing algorithm. Now that we’ve had some experience with it, we think that FNV-1a is a better choice. It distributes bits evenly, it’s much faster, and it doesn’t need to be cryptographically secure. The FNV designers recommend FNV-1a over the deprecated FNV-1. This PR makes the switch and updates the related tests, since changing the algorithm changes which stable pick gets selected. As of 2026-05, this is the best time to make this change, since there are almost no clients in the wild with traffic steering enabled. Updates #17366 Updates tailscale/corp#29964 Updates tailscale/corp#29966 Updates tailscale/corp#33033 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-21 10:11:59 -07:00
Brad Fitzpatrick	aa5da2e5f2	ipn/ipnlocal, control/controlclient: process node adds/removes in constant time For large tailnets (~50k+ nodes) with frequent peer churn (ephemeral GitHub Actions workers etc.), tailscaled used to rebuild the full netmap and fan it out on the IPN bus on every MapResponse that added or removed a peer. There were two O(N) costs per delta: the full netmap rebuild + every Notify.NetMap encode to every bus watcher. This change tackles both: 1. Plumb O(1) peer add/remove through the delta path. PeersChanged and PeersRemoved no longer prevent the delta happy path; instead, they mutate the per-node-backend peer map in place. 2. Restrict ipn.Notify.NetMap emission to the platforms whose host GUIs still depend on it (Windows, macOS, iOS) and migrate in-tree consumers off it everywhere else: - Migrate reactive consumers (containerboot, kube agents, sniproxy, tsconsensus, etc.) off Notify.NetMap to the previously-added Notify.SelfChange signal so they no longer have to subscribe to the full netmap. - Add ipn.NotifyNoNetMap so GUI clients on "legacy-emit" platforms that have already migrated can opt out of the per-watcher NetMap encode. - Gate Notify.NetMap emission on the producer side by a compile-time GOOS check, so the supporting code is dead-code-eliminated on Linux and other geese where no GUI consumer needs it. Re-running BenchmarkGiantTailnet from tstest/largetailnet, which was added along with baseline numbers on unmodified main in `ad5436af0d`, the per-delta cost (one peer add+remove pair) is now ~O(1) regardless of tailnet size N:
N         no-watcher (ms/op)       bus-watcher (ms/op)
          before  now   factor     before  now   factor
10000     32      0.11  300x       166     0.13  1300x
50000     222     0.11  2000x      865     0.13  6700x
100000    504     0.12  4100x      1765    0.13  13400x
250000    1551    0.12  12500x     4696    0.15  32400x
Updates #12542 Change-Id: I94e34b37331d1a8ec74c299deffadf4d061fda9e Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-21 09:26:19 -07:00
Brad Fitzpatrick	2703f91174	wgengine/magicsock: fix data race in TestSetDERPMapDoReStun SetDERPMap spawns a goroutine that calls ReSTUN, which logs via the test logger. If the test returns before that goroutine logs, the goroutine races with testing cleanup. Use tstest.WhileTestRunningLogger so the goroutine's logf call becomes a no-op once the test finishes. Fixes #19829 Change-Id: I1097f98e40ffd1c5dd7fb7a715c918255853e3c6 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-21 08:51:50 -07:00
Simon Law	7ebca58042	net/traffic,ipn/ipnlocal: extract traffic steering utilities (#19682) The traffic package contains helpers for evaluating traffic steering scores and picking appropriate nodes. These were extracted from ipnlocal.suggestExitNodeUsingTrafficSteering so they can be reused by the new routecheck package to probe exit nodes in priority order. Updates #17366 Updates tailscale/corp#33033 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-21 08:28:27 -07:00
Fran Bull	dbe92f98b5	feature/conn25: set assignment expiry based on dns response TTL Updates tailscale/corp#39975 Signed-off-by: Fran Bull <fran@tailscale.com>	2026-05-21 07:25:29 -07:00
Brad Fitzpatrick	f3a117e813	net/tsdial: run happy eyeballs across A and AAAA in UserDial When tailscaled is running in userspace-networking mode behind an exit node (e.g. as a SOCKS5 proxy), it resolves a hostname and then dials a single resolved IP through the tunnel. If the name has both A and AAAA, Go's net.Resolver merges them and we pick ips[0], which on an IPv6-native host is usually AAAA. If the exit node has no IPv6 egress (or vice versa), the dial fails silently through the tunnel and the user sees a hang. Resolve all candidates and race connect attempts across address families with a 300ms happy-eyeballs delay, matching Go's net.Dialer default and the existing pattern in net/dnscache (commit `ee0a03b14`). First success wins; losers are cancelled and any conns they produce are closed. A failBoost channel wakes the launcher when a connect fails fast (e.g. ICMP "no route" via the tunnel) so we don't sit on the 300ms timer when the answer is already known. userDialResolve is refactored into userDialResolveAll (returns the full candidate list) plus a thin single-IP wrapper for callers like UserDialPlan that don't race. UserDial's per-IP dispatch (netstack vs peer dialer vs SystemDial vs std) is extracted to dialOneUser so each candidate can route correctly on its own merits. Also fix serveDial in localapi to pass the original hostname to UserDial rather than a pre-resolved IP, so the race can fire. This fix is single-ended: it works against any exit node, including old ones, with no protocol changes. The trade-off versus filtering on the exit-node side via PeerAPI DoH is that every dial through an unreachable-family exit node costs one failed connect attempt per cache window, rather than zero, which is acceptable given the simplicity. Fixes #19792 Fixes #13257 Change-Id: I9d7645d0034caf3ee22ecdd8070798353f77e94b Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-20 18:35:55 -07:00
James Tucker	36c52ef383	tstest/integration/testcontrol: fix serveMap read-modify-write race serveMap cloned s.nodes[nk], mutated the clone outside the mutex, then wrote it back via updateNodeLocked. A concurrent UpdateNode, SetNodeCapMap, or other writer landing between the clone and the writeback would be silently clobbered. Mutate the live node under the mutex instead. Surfaces in tsnet's TestListenService as a flaky ErrUntaggedServiceHost panic: the test calls control.UpdateNode to attach a tag, a concurrent updateRoutine map request from the host races, and the host's next netmap arrives with Tags=[]. Updates #19822 Change-Id: I6c5ebd5e5bf79a40316f53f627157230773cb469 Signed-off-by: James Tucker <james@tailscale.com>	2026-05-20 18:29:58 -07:00
Aria Stewart	61277e3ad4	Construct IPv6 ingress URLs correctly Fixes #19338 Signed-off-by: Aria Stewart <aredridel@dinhe.net>	2026-05-20 17:21:35 -07:00
M. J. Fromberger	c09407002f	ipn/ipnlocal/netmapcache: add UpdateSelfOnly method (#19818) Some netmap updates are guaranteed to affect only the "static" parts of the netmap, and so should not require us to walk through all the peers and user profiles when updating the cache. To support this, the new UpdateSelfOnly method updates only the Self node and other tailnet settings that are not dependent on the peers and profiles. Use this when updating the cache on DERP home changes. Updates #12542 Change-Id: Ifed522b29d579fb76e010b4ff738cc4e0a72d27f Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>	2026-05-20 16:29:04 -07:00
Simon Law	93dbd33ef7	ipn/ipnlocal: stub system interfaces for TestShouldUseOneCGNATRoute (#19807) The TestShouldUseOneCGNATRoute test fails when the underlying system interfaces don’t match the underlying assumption of the test. That assumption was that there would only ever be one CGNAT interface: the Tailscale one. This breaks on Linux when border0 is installed because border0 also creates an interface with a CGNAT route. This patch stubs netmon.RegisterInterfaceGetter to replace the system interfaces and netmon.SetTailscaleInterfaceProps to identify the test data that defines the Tailscale interface. This patch also tests the control knob override for CGNAT for every combination of operating system and system interfaces, instead of just a couple of combinations. Fixes #19731 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-20 16:00:14 -07:00
Brad Fitzpatrick	04ae61fe4b	tstest/integration/jswasmtest: add headless-Chromium tests for @tailscale/connect Add Go tests that drive a real headless Chromium (via chromedp) against the built cmd/tsconnect/pkg/ artifact and verify the @tailscale/connect public API surface end-to-end. The package has not been republished in three years, in part because no test exercises the produced artifact at runtime — only tsc --noEmit and a Go build run in CI. TestCreateIPN loads pkg.js into the browser, calls createIPN with a junk auth key, and asserts that pkg.createIPN / pkg.runSSHSession are functions and that createIPN() returns an IPN with the documented run/login/logout/ssh/fetch methods. No control-plane traffic. TestFetchTailnetPeer stands up a full local tailnet (testcontrol + DERP + a tsnet.Server peer) and verifies that the browser-side WASM client can join over WebSocket-noise to the same control, connect to DERP over WSS, and then ipn.fetch() an HTTP service hosted on the tsnet peer through the tailnet. The test asserts the response body matches a known string. Browser state transitions are logged: NoState -> NeedsLogin -> Starting -> Running. Tests are opt-in via --run-headless-browser-tests (matching the existing --run-vm-tests pattern in tstest/natlab/vmtest) so they never fire in casual `go test ./...` runs. When the flag is set, a test is skipped if cmd/tsconnect/pkg/ has not been built, and fails with t.Error if no chromium binary is found on $PATH (honoring $CHROME_BIN as an override). findChromium also falls back to /Applications/Google Chrome.app and /Applications/Chromium.app on darwin, since macOS Chrome's executable lives inside an .app bundle and is not on $PATH by default. The .github/workflows/test.yml wasm job is extended to install google-chrome-stable and run the tests with the flag after build-pkg. 
To prevent silently testing a stale pkg/main.wasm (built from an older checkout than the rest of the test invocation), build-pkg now writes pkg/build-info.json recording the sha256 of the raw (pre-wasm-opt) go-build output. The test does its own `go build` of cmd/tsconnect/wasm with the same -tags/-trimpath/-ldflags (factored into a new cmd/tsconnect/wasmbuild package shared by both call sites) and t.Fatalfs with a "rebuild" instruction on mismatch. Cost is near-zero because the Go build cache from the prior build-pkg makes the rebuild a cache hit. The new wasmbuild package also replaces cmd/tsconnect's hardcoded -tags string with a minimal-feature-set computation. wasmbuild.Keep names the small set of feature/featuretags entries the browser client actually needs (netstack, logtail, dns, health, c2n, ipnbus); wasmbuild.Tags() emits a ts_omit_<f> for every other omittable feature in feature/featuretags.Features, with transitive deps expanded via featuretags.Requires. An init() panics if Keep references a feature unknown to feature/featuretags so a rename there fails loudly. Net effect on size: 32M raw / 9.4M brotli before this change, 25M raw / 4.4M brotli after — vs the last-published 1.39.98 at 21M / 3.8M. The transitive package-import graph is unchanged (176 tailscale.com/* packages either way): featuretags omits eliminate dead code via `const HasX = false`, not imports. Trimming the import graph would require a separate, larger refactor splitting interface packages by build tag. Writing TestFetchTailnetPeer surfaced several real issues, all fixed here: * cmd/tsconnect built the wasm with the nethttpomithttp2 tag, but control/ts2021 (since commit `1d93bdce2`, "control/controlclient: remove x/net/http2, use net/http", Oct 2025) requires HTTP/2 from net/http's bundled implementation. With nethttpomithttp2 set, the bundle is excluded and the wasm client cannot speak HTTP/2 to any control plane, including production. Drop the tag. 
Wasm size grows ~1 MB raw / ~300 KB brotli (more than offset by the feature pruning above). The last published @tailscale/connect (1.39.98, early 2023) pre-dates the regression, which is why no consumer has reported the breakage. * tstest/integration/testcontrol.Server's /ts2021 noise upgrade endpoint rejected anything but POST. WebSocket clients (the only transport available to browser-WASM) come in as GET. Allow both; the controlhttp AcceptHTTP path dispatches on the Upgrade header, so the websocket library still enforces GET for WS upgrades. This matches production, where the same controlhttpserver.AcceptHTTP routes purely on the Upgrade header without checking method. * derp/derphttp's urlString built the DERP URL from node.HostName only, dropping node.DERPPort. Non-WS clients use a separate code path (connectToHost) that honors DERPPort, but WebSocket-only clients (browser-WASM) went through urlString and so could not reach a DERP running on any port other than 443. Include the port when it differs from the scheme default. Also move addWebSocketSupport from cmd/derper (where it was main-only) to derp/derpserver.AddWebSocketSupport so tstest/integration.RunDERPAndSTUN can wrap its DERP handler with WebSocket support — without that, the test DERP would not accept the browser's wss connection. Fixes #9394 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Change-Id: Iff9cdee303e3b239924249b5bffb2fd04e02f391	2026-05-20 10:48:29 -07:00
Brad Fitzpatrick	95d874e9b4	cmd/testwrapper: surface race reports and skip retries when detected A data race in a package matters more than any individual test result. Two related problems: 1. Where go test's race detector text ("WARNING: DATA RACE" plus the goroutine stack traces) lands in JSON output is timing-dependent: it can be attributed to a test that ends up reporting PASS (e.g. when the racing goroutines outlive the test that spawned them and TSan prints during a different test's window). testwrapper's main loop only flushes the logs of failed tests, so the race report ends up stuck in a passing test's buffer and is silently dropped. The race builders just see a bare "FAIL\nFAIL\tpkg\ttime". 2. If the failing test in such a package happens to be marked flaky, testwrapper retries it. That is the worst possible response to a race: the flaky test might not even be the racy code, and a second run without the racy goroutines could "succeed" while hiding the real bug. Address both: scan every output line for the race detector's first-line marker. Track whether the package observed a race at all, on the pkgFinished testAttempt. When a race was seen, fold every per-test log buffer into the package-level logs (so the full report surfaces from the existing pkg-fail flush path), and drop any flaky-test retry plans for that package so we fail immediately instead of running another attempt. Two new tests: - TestRaceSuppressesFlakyRetry verifies that a flaky test alongside a racy test does NOT get retried. - TestRaceAttributedToPassingTest verifies that a race attributed by test2json to a passing test still surfaces in the output. 
Also add a corpus of captured raw test binary outputs under cmd/testwrapper/testdata/, with one subdirectory per scenario, documenting the six representative shapes that go test -race can emit (race in test body, race in goroutines that outlive a test, race forced into a later test, race in TestMain post-m.Run, and a parallel-tests split-attribution case via a "=== NAME" redirect line). See its README.md for details. Fixes #19603 Change-Id: Ifbfcd67fb3b1882c4907bd9cb2d68a8b5a91dd54 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-19 21:21:05 -07:00
Claus Lensbøl	ee0a03b140	net/dnscache: run happy eyeballs with more than one dest IP (#19770) If the context given to DialContext has a shorter lifetime than the OS TCP SYN timeout, and TCP SYNs are dropped from the path to the remote, DialContext would never fall back to try IPv6 after IPv4. Instead, use the normal happy eyeballs race if there is more than one address. This does remove the implicit prioritization of IPv4 over IPv6 in cases where there is only a single IPv4 remote address. Updates #13346 Signed-off-by: Claus Lensbøl <claus@tailscale.com>	2026-05-19 12:59:11 -04:00
Naman Sood	5d56cc8512	util/linuxfw: return error instead of nil pointer dereference Issue #19737 ran into a nil pointer dereference, the cause of which was fixed by #19761. If we end up on this code path with a nil table again, we should bubble that up as an error (which is logged by the health warning system) rather than failing catastrophically. Signed-off-by: Naman Sood <mail@nsood.in>	2026-05-19 10:01:07 -04:00
Brad Fitzpatrick	2b338dd6a8	wgengine, cmd/tailscaled, control/controlclient: remove Engine watchdog The Engine watchdog wrapped every wgengine.Engine method call in a goroutine with a 45s timeout and crashed the process on timeout. It was added years ago to surface deadlocks during development, but the underlying deadlocks have long since been fixed, and even when it did fire it produced obscure stack traces (from inside the watchdog goroutine, not the original caller) without buying much. Audit of userspaceEngine's methods shows none have cyclic locking or unbounded blocking now that ResetAndStop no longer loops waiting for DERPs to drain (`fa49009ee`). The watchdog is dead weight; remove it along with the TS_DEBUG_DISABLE_WATCHDOG escape hatch. Updates #19759 Change-Id: Iba9d718fe1f8718a6631296e336b138c31b99ff1 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-15 16:49:28 -07:00
Simon Law	5d1bf80597	feature/routecheck: add ts_omit_routecheck feature flag (#19638) RouteCheck, which checks that overlapping routers are reachable, is enabled by default for both tailscaled and tsnet. Updates #17366 Updates tailscale/corp#33033 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-15 15:50:50 -07:00
Noel O'Brien	894ff5d8ee	cmd/hello: split css and js into separate files (#19771) Move the inline CSS and JS into separate files to be more friendly to Content Security Policies. ServeHTTP is updated to serve these assets from the '/static/' path. Updates tailscale/corp#32398 Signed-off-by: Noel O'Brien <noel@tailscale.com>	2026-05-15 09:37:22 -07:00
Alex Chan	0cb432ed84	all: update more references to Tailnet/Network Lock Updates tailscale/corp#37904 Change-Id: I09e73b3248b9ddf86dafe33dfb621bd560f6596d Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-05-15 16:23:50 +01:00
Fernando Serboncini	c355618e73	wgengine/router/osrouter: skip netfilter add-ons when chain setup fails (#19757) linuxRouter has two blocks (connmark rules and the CGNAT drop rule) that gate on cfg.NetfilterMode, the requested config state. This may cause an error when setNetfilterModeLocked fails, since it may keep assuming this config is valid. We now gate both blocks on r.netfilterMode, matching the pattern used by SNAT, stateful, and loopback paths. Fixes #19737 Change-Id: Ia6003a082db99c376e662132d725661afbac0ee9 Signed-off-by: Fernando Serboncini <fserb@tailscale.com>	2026-05-15 09:32:30 -04:00
License Updater	1d3562b314	licenses: update license notices Signed-off-by: License Updater <noreply+license-updater@tailscale.com>	2026-05-14 21:04:41 -07:00
Brad Fitzpatrick	ef1bb5ac16	util/cibuild, cache_key_test: skip TestTsgoRevInCacheKey outside Tailscale CI cibuild.On() returns true for any CI environment that sets CI=true, including Alpine Linux's package build CI. TestTsgoRevInCacheKey was guarded by cibuild.On() (or use of tsgo), so it ran under Alpine's CI with stock Go, where go.toolchain.rev isn't blended into build cache keys, and unsurprisingly failed. Add cibuild.OnTailscaleCI, which keys off GITHUB_REPOSITORY_OWNER to distinguish tailscale/tailscale's own GitHub Actions CI from arbitrary downstream CI, and use it in TestTsgoRevInCacheKey. Fixes #19754 Change-Id: Id31cfe71903a235f1460dca1e2fdf334e3ba1ee5 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-14 15:55:05 -07:00
Brad Fitzpatrick	fa49009eee	wgengine: simplify ResetAndStop, drop drain loop Since `f343b496c3` ("wgengine, all: remove LazyWG, use wireguard-go callback API for on-demand peers"), Reconfig is fully synchronous: magicConn.UpdatePeers, wgdev.RemovePeer, router.Set, and dns.Set all return when the work is done, and the peer list is updated under wgLock before Reconfig returns. So after Reconfig with empty configs, len(st.Peers) is already 0. The old loop also waited for st.DERPs to drain to 0, but UpdatePeers only edits maps; active DERP connections idle out on their own timeout. The sole caller (LocalBackend.stopEngineAndWait) doesn't inspect st.DERPs anyway; it just hands the Status to setWgengineStatusLocked. So the drain-wait was for nothing observable and could theoretically (or at least appear to readers to) loop forever holding b.mu. Remove that reader confusion by removing the backoff loop entirely. Updates #19759 Change-Id: Ibfac3f0baabcad7604b713c934a8fc37932e0a50 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-14 15:45:38 -07:00
Brad Fitzpatrick	93440604e0	tstest/natlab/vmtest: add TestPeerRelay Add a VM-based natlab test that exercises the peer-relay feature (feature/relayserver) end-to-end across three Tailscale nodes whose network topology makes a direct A<->B UDP path impossible: both peers are behind HardNAT (FreeBSD/pfSense-style endpoint-dependent NAT) with no port-mapping services, while the relay node is behind One2OneNAT so its STUN-discovered WAN endpoint is reachable from both peers. The test enables the relay server via EditPrefs, then waits for an a->b PingDisco whose PingResult.PeerRelay is set (proving magicsock chose the peer-relay path, not DERP), and finally asserts that the relay's DebugPeerRelaySessions LocalAPI reports the session. The existing TestPeerRelayPing in tstest/integration runs three tailscaled processes on the loopback interface with no NATs; this new vmtest covers peer relay through real per-VM kernels and NATs. To wire control-server capabilities into vmtest, also add a PeerRelayGrants() EnvOption (sibling of AllOnline, SameTailnetUser) that flips testcontrol.Server.PeerRelayGrants so the wildcard packet filter grants tailcfg.PeerCapabilityRelay and PeerCapabilityRelayTarget; without those caps magicsock won't consider any peer a candidate relay. Updates #13038 Change-Id: Ib3440b83ec442da0d3b89ffa48ceea9398ea9062 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-14 14:47:29 -07:00
Andrew Lytvynov	9437a634e6	scripts/installer.sh: handle Zorin OS versions separately from Ubuntu (#19758) Their version scheme is different, even though the OS is based on Ubuntu. We need to check Zorin's version numbers to pick the right APT_KEY_TYPE. Updates #18925 Signed-off-by: Andrew Lytvynov <awly@tailscale.com>	2026-05-14 14:04:04 -07:00
M. J. Fromberger	4eb977413a	tstest/natlab/vmtest: add helpers for fatal step errors (#19753) In a lot of places, we construct an error to End a step, then immediately log it to the governing test as test fatal. Save ourselves a bit of boilerplate by putting methods on Step for that. There are a couple cases this doesn't cover, e.g., where we construct the Step outside a subtest that wants to fail individually, but it helps enough to pay for its lines. Updates #13038 Change-Id: I71f9900942962de16609b6b198d3ba13d6958a5f Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>	2026-05-14 09:24:47 -07:00
Claus Lensbøl	8203edc099	.github/workflows: change natlab test trigger label (#19750) The label "natlab" is a bit confusing and also used for other things. Instead, change the trigger label to "run-natlab-tests". Updates #13038 Signed-off-by: Claus Lensbøl <claus@tailscale.com>	2026-05-14 11:53:13 -04:00
Fernando Serboncini	2a06fb66d0	cmd/cloner: preserve nil-valued entries when cloning map (#19749) The codegen path for map-of-slice-of-pointer fields skipped nil-valued entries, which dropped the key from the map. This broke how dns.Config.Routes uses nil values as sentinels. Fixes #19730 Fixes #19732 Fixes #19746 Fixes #19744 Change-Id: Ic6400227f4ab21b3ca0e8c0eeecf9b83d145a9ab Signed-off-by: Fernando Serboncini <fserb@tailscale.com>	2026-05-14 10:30:59 -04:00
Mike O'Driscoll	48919f708b	util/linuxfw: fix nftables endianness and add connmark conditional check (#19725) Fix the following issues: 1. Endianness Bug: The nftables runner used hardcoded big-endian byte arrays for firewall mark values (0xff0000, etc.), breaking bitwise operations on little-endian systems (all x86/x64, ARM). This caused connmark save/restore rules to silently fail. Fixed by using binary.NativeEndian to generate correct byte order for the host system. 2. Connmark Restore Conditional Check: The connmark restore mechanism unconditionally overwrote packet marks, even when Tailscale hadn't set any mark bits in conntrack. This destroyed mark bits set by other systems (VPNs, policy routing, vendor flags), breaking coexistence. Fixed by adding a conditional check to only restore when (ct mark & 0xff0000) != 0, preventing the worst case of wiping all marks to zero. Changes: - util/linuxfw/linuxfw.go: Added nativeEndianUint32() helper and updated all mask functions to use native byte order instead of hardcoded bytes - util/linuxfw/nftables_runner.go: Added conditional check in makeConnmarkRestoreExprs() to only restore when ct mark has Tailscale bits set; added detailed comment about bit preservation limitations - util/linuxfw/iptables_runner.go: Added conditional check using -m connmark ! --mark to match nftables behavior - Tests updated: Fixed byte-level regression tests to expect little-endian byte sequences and verify the new conditional check Note: Perfect bit preservation in nftables remains challenging due to nftables expression VM limitations. The current implementation prevents the critical case of wiping marks with zero. Updates #3310 Fixes #11803 Related to #8555 Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>	2026-05-14 09:11:24 -04:00
James Tucker	e7415e6393	util/eventbus: unify Subscriber/SubscriberFunc cores; structural symmetry Brings Subscriber[T] in line with the same non-generic-core pattern already applied to SubscriberFunc[T] and Publisher[T]: - Renames subscriberFuncCore to subscriberCore and shares it between Subscriber[T] and SubscriberFunc[T]. Both typed facades hold a subscriberCore plus their respective per-T delivery state (Subscriber: chan T; SubscriberFunc: nothing, the user callback is captured in the dispatch closure). - The bus's outputs map and subscriber-interface itab key on subscriberCore for both subscriber kinds, so adding a new Subscribe[T] call site no longer pays a per-T itab, dictionary, or equality function for the subscriber-interface side. - Subscribe[T] now hoists the non-generic constructor portion into newSubscriberCore (timer setup, core allocation, cached type/typeName, unregister method-value), matching SubscribeFunc. The dispatch loop is intentionally NOT extracted to a non-generic helper for Subscriber[T], unlike SubscriberFunc[T]. The reason is the typed channel send 'case s.read <- t:' must appear lexically inside the select; the only way to lift it into a non-generic loop is to bridge typed and untyped via a per-event goroutine, which costs ~2.7x throughput on BenchmarkBasicThroughput. We keep dispatchTyped on the generic facade and accept the per-shape stencil cost as the cheaper alternative. 
Symbol-level effect on tailscaled (linux/amd64, measured via `go tool nm -size`):
Before: (Subscriber[T]).dispatch
  2 shape stencils: 1,682 + 1,549 = 3,231 B
  3 thin per-T wrappers: 124 B each = 372 B
  2 deferwrap1 helpers: 62 B each = 124 B
  total: 3,727 B
After: (Subscriber[T]).dispatchTyped
  2 shape stencils: 1,678 + 1,582 = 3,260 B
  0 per-T wrappers (replaced by closure stored on core)
  2 deferwrap1 helpers: 62 B each = 124 B
  total: 3,384 B
dispatch path .text delta: -343 B (-9.2%)
Per-shape stencils are ~1,600 B (.text body) + ~1,100 B (pclntab) = ~2,700 B each on production tailscaled. The shape count matches before/after (two distinct GC shapes for the Subscriber[T] event types in this binary). What changes is that the per-T thin wrappers are eliminated because Subscriber[T] no longer implements the subscriber interface directly.
Whole-binary section deltas:
  .text: -2,304 B (includes the dispatch savings plus other small downstream effects)
  .rodata: +512 B (additional closure-type metadata)
  .gopclntab: -2,981 B (fewer per-T compiled functions => less metadata)
Stripped tailscaled (linux/amd64): no change at the file level (the savings fall below the linker's section-alignment boundary). Unstripped builds shrink by ~2,900 B.
Behavior is unchanged:
  BenchmarkBasicThroughput: 2,161 ns/op, 0 B/op, 0 allocs/op
  BenchmarkBasicFuncThroughput: 2,493 ns/op, 144 B/op, 2 allocs/op
  BenchmarkSubsThroughput: 3,727 ns/op, 0 B/op, 0 allocs/op
Updates #12614 Change-Id: I97918ec68bd2cdb15958bbfd7687592b39663efe Signed-off-by: James Tucker <james@tailscale.com>	2026-05-13 17:36:30 -07:00
Brad Fitzpatrick	dc323b1351	derp/derpserver: collapse clients and clientsAtomic into one hashtriemap Server.clientsAtomic was introduced in `6b729795c3` as a lock-free mirror of Server.clients to skip Server.mu on the packet send hot path. This drops the non-concurrent map and makes all the existing callers of the old plain map just use the concurrent map, but still holding Server.mu. BenchmarkLookupDestHashTrie is unchanged at ~2ns/op. Fixes #19726 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Change-Id: I0894e4d86914d152b9b5fef969a3184bcb96f678	2026-05-13 16:57:26 -07:00
Nick Khyl	4d68493144	health: avoid publishing health.Change when warnable visibility remains unchanged Warnables with a non-zero TimeToVisible are only published on the eventbus when they remain unhealthy long enough to become visible. However, we still publish a health.Change when a warning that was never visible (and was never published to the eventbus) becomes healthy. This PR fixes that and reduces churn when there is no actual state change. In particular, it avoids unnecessary IPN bus notifications sent to GUI/CLI clients, captive portal detection, etc. Updates tailscale/corp#39759 (noticed while working on it) Signed-off-by: Nick Khyl <nickk@tailscale.com>	2026-05-13 17:02:35 -05:00
Adriano Sela Aviles	41286c2b56	ipn/ipnlocal,tsd: add NoiseRoundTripper to tsd.Sys Adds a new NoiseRoundTripper field to tsd.Sys to expose an http.RoundTripper to make requests over the control plane Noise connection. This will be used in PAM use cases soon. Updates tailscale/corp#41800 Signed-off-by: Adriano Sela Aviles <adriano@tailscale.com>	2026-05-13 14:56:28 -07:00
Nick Khyl	32f984f54c	net/dns: create a new hosts file if it doesn't exist on Windows A missing hosts file is not a fatal error. We should log it, but still proceed and create a new one instead of failing the DNS reconfiguration completely. Fixes #19733 Signed-off-by: Nick Khyl <nickk@tailscale.com>	2026-05-13 16:10:36 -05:00
Claus Lensbøl	bb47ea2c6b	tstest/natlab/vmtest: start migrating old natlab tests to vmtest (#19727 ) Instead of having two entry points for running natlab tests, start converting the connectivity tests to use the vmtest framework. Grid and pair tests have yet to be moved over. Updates #13038 Signed-off-by: Claus Lensbøl <claus@tailscale.com>	2026-05-13 16:44:53 -04:00
Fran Bull	3a6261b79b	feature/conn25: keep addrAssignments through pool reconfig Fixes tailscale/corp#40250 Signed-off-by: Fran Bull <fran@tailscale.com>	2026-05-13 11:00:47 -07:00
Simon Law	e4e59a2af0	wgengine/netstack: stop inject goroutine from leaking in Impl.Start (#19721 ) This patch fixes a data race in wgengine/netstack that surfaced while running both TestTCPForwardLimits and TestTCPForwardLimits_PerClient. Because these two tests both set up the TS_DEBUG_NETSTACK envknob, a race happens because netstack.Impl.Close leaked its inject goroutine. The inject goroutine also reads the TS_DEBUG_NETSTACK envknob, so if it is still running when the next test starts, then it will break. This patch also cleans up the tests a bit, ensuring that neither of them runs in T.Parallel. It also adds a T.Cleanup call to clear the envknob. Fixes #19720 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-13 08:13:40 -07:00
Simon Law	6467f0d067	ipn/ipnlocal: fix minor typo in shouldUseOneCGNATRoute (#19719 ) This fixes a log message where ipn/ipnlocal.shouldUseOneCGNATRoute would claim that an Android machine was actually macOS. Updates #cleanup Updates #19652 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-12 21:55:29 -07:00
Brad Fitzpatrick	6b729795c3	derp/derpserver: use hashtriemap for peer lookup Replace the process-global Server.mu lookup in the packet send hot path with a global hashtriemap mirror of local clientSet entries. The authoritative clients map remains guarded by Server.mu; clientsAtomic is only a lock-free fast path for active local clients. Misses, stale inactive client sets, duplicate accounting, and mesh forwarding still fall back to lookupDestUncached. This avoids taking Server.mu for the common local active-client send path, at the cost of adding one global concurrent map that mirrors Server.clients for local peers. The benchmark uses four destination peers. The before run sets TS_DEBUG_DERP_DISABLE_PEER_HASHTRIE=true to force the old mutex lookup path; the after run uses the hashtrie fast path. goos: linux goarch: amd64 pkg: tailscale.com/derp/derpserver cpu: Intel(R) Xeon(R) 6975P-C │ before │ after │ │ sec/op │ sec/op vs base │ LookupDestHashTrie-16 176.050n ± 1% 1.904n ± 6% -98.92% (p=0.000 n=10) │ before │ after │ │ B/op │ B/op vs base │ LookupDestHashTrie-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ ¹ all samples are equal │ before │ after │ │ allocs/op │ allocs/op vs base │ LookupDestHashTrie-16 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ ¹ all samples are equal Updates #3560 (very indirectly, historically) Updates #19713 (as an alternative to that PR) Change-Id: Ifb72e5c9854ad00e938cd24c6ab9c27312f297e8 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-12 16:08:16 -07:00
Adriano Sela Aviles	72578de033	ipn/{ipnlocal,localapi},client/local: add per-dst cap resolution for services Adds two new cap resolution methods alongside the existing PeerCaps: PeerCapsForService(src netip.Addr, svcName tailcfg.ServiceName) resolves the service name to its VIP addresses via the node's service IP mappings and returns caps scoped to that service. Exposed on /v0/whois via the svc_name query parameter and on client/local.Client as WhoIsForService. PeerCapsForIP(src, dst netip.Addr) resolves caps against an arbitrary destination IP. Exposed on /v0/whois via the svc_addr query parameter and on client/local.Client as WhoIsForIP. svc_name takes priority over svc_addr when both are present. Invalid values for either return 400. The existing PeerCaps/WhoIs path is unchanged: without a service parameter, WhoIs returns only host-level caps. Updates tailscale/corp#41632 Signed-off-by: Adriano Sela Aviles <adriano@tailscale.com>	2026-05-12 15:50:39 -07:00
DeedleFake	ad8ead9c94	cmd/tailscale/cli: add RunWithContext Fixes #12778 Change-Id: If9f8b299cef0cb68f93b344845b5c6a5b7554d2c Signed-off-by: DeedleFake <deedlefake@users.noreply.github.com>	2026-05-12 12:27:55 -07:00
M. J. Fromberger	9f48567bf1	ipn/ipnlocal,wgengine/magicsock: add basic counters for cached peer connectivity (#19699 ) Add new clientmetric counters for establishing contact with peers while using cached network map data. To do this, instrument the magicsock.Conn with a bit to indicate whether its peer data came from a cached netmap. If so, there are two conditions we will count as establishing connectivity to a peer: - Receipt of a CallMeMaybe from a peer via disco. - Establishing a valid endpoint address for a peer. In vmtest, add Env.ClientMetrics to scrape metrics from the specified node. Use this to check that counters were updated in caching tests. Updates https://github.com/tailscale/projects/issues/13 Updates #12639 Change-Id: Ie8cf3244ac8af4f5bcfe4d0d944078da2ba08990 Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>	2026-05-12 12:01:05 -07:00
James Tucker	120bfcf1cc	util/eventbus: extract non-generic SubscriberFunc constructor body and cache type name Two changes that share the same intent of reducing per-T duplication in code that doesn't actually depend on T: 1. Hoist the non-generic portion of newSubscriberFunc[T] into a newSubscriberFuncCore() helper. The hoisted work is the timer setup, the subscriberFuncCore allocation, and the unregister closure (which captures only the non-generic reflect.Type and subscribeState). The generic body now does only the two T-bound things it has to: compute reflect.TypeFor[T] and create the dispatch closure. Effect on the per-shape-stencil body of newSubscriberFunc[T]: before: 523 B per shape (in synthetic test) after: 293 B per shape (-230 B per shape; -56% on this body) 2. Cache reflect.Type.String() once at construction (in core.typeName) instead of recomputing it every time the dispatch closure runs. The dispatch closure also now takes the subscriberFuncCore directly rather than building an intermediate dispatchFuncState struct on every call. Effect on the dispatch closure body (newSubscriberFunc[T].func1): before: 581 B per shape after: 480 B per shape (-101 B per shape; -17%) Combined effect on tailscaled (linux/amd64): named-symbol savings via symcost: ~7 KB stripped binary delta: -8 KB (page-quantized) arm64 binary delta: 0 (page-quantized) cumulative reduction from baseline (5167ff412): linux/amd64: -110,592 bytes (-0.391%) linux/arm64: -131,072 bytes (-0.499%) Throughput is also improved by the typeName cache: BenchmarkBasic goes from 2018 ns/op to 1864 ns/op (-7.6%) because the dispatch hot path no longer allocates a string on every event. Updates #12614 Change-Id: Ib3a3d6796785e16506330ec034e1144580d467a3 Signed-off-by: James Tucker <james@tailscale.com>	2026-05-12 11:16:04 -07:00
Brad Fitzpatrick	758ebe9839	tstest/natlab/vmtest: use short paths for Unix sockets macOS limits Unix socket paths to 104 bytes. The Go test TempDir path (e.g. /var/folders/.../TestDirectConnection...679197086/001/) easily exceeds that, causing "bind: invalid argument". Create a short /tmp/vmtest* directory for all socket files (vnet, QMP, dgram) so the paths stay well under the limit on every platform. Updates #13038 Change-Id: I721d24561d1766aaa964692bc77f40a131aa9455 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-11 21:54:27 -07:00
Brad Fitzpatrick	f4c5613156	tstest/natlab/vmtest: don't require KVM; use TCG on macOS startCloudQEMU hardcoded -machine q35,accel=kvm and -cpu host, which fails on any host without KVM (notably macOS). Replace with a qemuAccelArgs helper that probes /dev/kvm and falls back to QEMU's TCG software emulation, matching the pattern already used by tstest/integration/nat. Also wire the helper into startGokrazyQEMU so gokrazy VMs pick up KVM when available. Updates #13038 Change-Id: I7745518db823279b1880957bb14ca2ffdaab4c50 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-11 19:18:17 -07:00
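
The KVM probe described in f4c5613156 can be pictured roughly as follows. This is only a sketch, not the code from the commit: the function name `accelArgs`, its `kvmPath` parameter, and the `-cpu max` fallback value are assumptions made for illustration. The commit message confirms only that `/dev/kvm` is probed, that `-machine q35,accel=kvm` and `-cpu host` were previously hardcoded, and that TCG is the fallback.

```go
package main

import (
	"fmt"
	"os"
)

// accelArgs is a hypothetical stand-in for the qemuAccelArgs helper named in
// the commit message. It returns QEMU flags that select KVM when the device
// node at kvmPath can be opened read-write, and TCG software emulation
// otherwise. The "-cpu max" fallback is an assumption; the commit message
// does not say which CPU model the real helper uses with TCG.
func accelArgs(kvmPath string) []string {
	f, err := os.OpenFile(kvmPath, os.O_RDWR, 0)
	if err != nil {
		// No usable KVM device (for example on macOS): fall back to TCG.
		return []string{"-machine", "q35,accel=tcg", "-cpu", "max"}
	}
	f.Close()
	return []string{"-machine", "q35,accel=kvm", "-cpu", "host"}
}

func main() {
	// On a Linux host with KVM access this prints the accel=kvm flags;
	// elsewhere it prints the TCG fallback.
	fmt.Println(accelArgs("/dev/kvm"))
}
```

Opening the device read-write, rather than only checking that the path exists, also covers hosts where `/dev/kvm` is present but the current user lacks permission to use it.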


10659 Commits