tailscale

mirror of https://github.com/tailscale/tailscale.git synced 2026-05-30 03:25:06 -04:00

Author	SHA1	Message	Date
Simon Law	47e86ff762	fixup! client/local,ipn/localapi: add /localapi/v0/routecheck endpoint Finish splitting off Client.RouteCheckProbe from Client.RouteCheck. Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-28 14:33:57 -07:00
Simon Law	23ffe88340	fixup! client/local,ipn/localapi: add /localapi/v0/routecheck endpoint @illotum, drop localapi/routecheck_disabled.go because these methods don’t need to be there if the client is built with ts_omit_routecheck. Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-28 14:03:28 -07:00
Simon Law	69e56f93a6	fixup! client/local,ipn/localapi: add /localapi/v0/routecheck endpoint @illotum points out that we should be using jsonv2. Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-28 14:03:26 -07:00
Simon Law	1d0adfd630	fixup! client/local,ipn/localapi: add /localapi/v0/routecheck endpoint Address @bradfitz’s concerns about routecheck.NodeSet. Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-28 11:35:16 -07:00
Simon Law	97680a49e6	fixup! client/local,ipn/localapi: add /localapi/v0/routecheck endpoint @bradfitz pointed out that I was holding localapi.Register upside down. Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-28 11:35:16 -07:00
Simon Law	15ebc87e86	fixup! client/local,ipn/localapi: add /localapi/v0/routecheck endpoint Address @cmol’s review Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-28 11:35:15 -07:00
Simon Law	9c8a315a70	client/local,ipn/localapi: add /localapi/v0/routecheck endpoint In order to support a `tailscale routecheck` command, we introduce the `/localapi/v0/routecheck` endpoint to the local API. This endpoint returns the most recent report collected by the routecheck client. If `force=true` is an argument in the query string, then this endpoint will actively probe before returning the report. Updates #17366 Updates tailscale/corp#33033 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-28 11:35:15 -07:00
Simon Law	8a624a7b22	fixup! net/routecheck: introduce new package for checking peer reachability @amalscale: close using a nil channel pointer. Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-28 11:34:48 -07:00
Simon Law	b5f10dda20	fixup! net/routecheck: introduce new package for checking peer reachability @amalscale: clarified routecheck.RoutersByPrefix documentation. Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-28 11:21:02 -07:00
Simon Law	4d980a77c5	fixup! net/routecheck: introduce new package for checking peer reachability @illotum points out that there was a race between atomic.Pointer.Load and Store. Of course, we should have used Swap. Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-27 22:05:05 -07:00
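The race can be shown with Go's `sync/atomic.Pointer`. In this hypothetical sketch (the `report` type and both functions are invented, not routecheck's code), a Load followed by a Store lets another writer slip in between the two calls. `Swap` replaces the value and returns the previous one in a single atomic step.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// report is a hypothetical stand-in for a routecheck report.
type report struct{ seq int }

// latest holds the most recently published report.
var latest atomic.Pointer[report]

// publishRacy loads the old value, then stores the new one. Another goroutine
// can Store in between, so the returned "previous" report may not be the one
// that was actually replaced.
func publishRacy(r *report) (prev *report) {
	prev = latest.Load()
	latest.Store(r)
	return prev
}

// publish atomically installs r and returns exactly the report it replaced.
func publish(r *report) (prev *report) {
	return latest.Swap(r)
}

func main() {
	fmt.Println(publish(&report{seq: 1}) == nil) // true: nothing published yet
	fmt.Println(publish(&report{seq: 2}).seq)    // 1
}
```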
Simon Law	2de851c428	fixup! net/routecheck: introduce new package for checking peer reachability @amalscale points out that we already have pingTimeout in magicsock, so I’ve extracted the default out as a tsconst. Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-27 15:37:11 -07:00
Simon Law	de32bf52d6	fixup! net/routecheck: introduce new package for checking peer reachability @bradfitz suggests signalling using channels is better than sync.Cond, because mutexes and channels don’t play nicely. Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-27 15:37:10 -07:00
Simon Law	dbf1bd88c0	fixup! net/routecheck: introduce new package for checking peer reachability Address @illotum’s observation that we’re “probing” WireGuard-only nodes in the wrong place. Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-27 15:37:08 -07:00
Simon Law	64aec43880	fixup! net/routecheck: introduce new package for checking peer reachability @illotum made me realize that routecheck.supportsIPVersions could be simpler. Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-22 23:39:51 -07:00
Simon Law	03c498f05d	fixup! net/routecheck: introduce new package for checking peer reachability Minor fixes for comments left by @amalscale, @cmol, and @illotum. Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-22 23:39:47 -07:00
Simon Law	7bde120d12	fixup! net/routecheck: introduce new package for checking peer reachability Fix deadlock in Client.waitForNetMap. Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-22 22:05:49 -07:00
Simon Law	3630c57e96	fixup! net/routecheck: introduce new package for checking peer reachability Replace tc (test case) with the recommended tt (table test). Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-22 20:56:47 -07:00
Simon Law	9bb543f004	fixup! net/routecheck: introduce new package for checking peer reachability Move ipn/routecheck to net/routecheck Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-22 19:14:08 -07:00
Simon Law	314cc9a595	fixup! net/routecheck: introduce new package for checking peer reachability Provide more details in the doc comments, add TODOs, and fix an error message. Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-22 19:14:08 -07:00
Simon Law	e5641ff585	fixup! net/routecheck: introduce new package for checking peer reachability Streamline the handling of ping responses and errors in Client.probe. Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-22 19:14:08 -07:00
Simon Law	ddfca38c94	fixup! net/routecheck: introduce new package for checking peer reachability routecheck.Extension.onNetMapToggle doesn’t need to query for a netmap when it has just been passed a fresh one. Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-22 19:14:07 -07:00
Simon Law	e51b352368	fixup! net/routecheck: introduce new package for checking peer reachability Clean up in routecheck.Extension.Shutdown and routecheck.Client.Close. Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-22 19:14:07 -07:00
Simon Law	881ea46bf4	net/routecheck: introduce new package for checking peer reachability The routecheck package parallels the netcheck package, where the former checks routes and routers while the latter checks networks. Like netcheck, it compiles reports for other systems to consume. Historically, the client has never known whether a peer is actually reachable. Most of the time this doesn’t matter, since the client will want to establish a WireGuard tunnel to any given destination. However, if the client needs to choose between two or more nodes, then it should try to choose a node that it can reach. Suggested exit nodes are one such example, where the client filters out any nodes that aren’t connected to the control plane. Sometimes an exit node will get disconnected from the control plane: when the network between the two is unreliable or when the exit node is too busy to keep its control connection alive. In these cases, Control disables the Node.Online flag for the exit node and broadcasts this across the tailnet. Arguably, the client should never have relied on this flag, since it only makes sense in the admin console. This patch implements an initial routecheck client that can probe every node that your client knows about. You should not ping-scan your visible tailnet; this method is for debugging only. This patch also introduces a new OnNetMapToggle hook, which fires when the netmap transitions from nil to non-nil, or vice versa. This happens either when the client receives its first MapResponse after connecting to the control plane, or when it clears the netmap while it is disconnecting. Routecheck uses this to wait for a valid netmap so it knows which peers to probe. Updates #17366 Updates tailscale/corp#33033 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-22 19:14:07 -07:00
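The OnNetMapToggle semantics (fire on nil to non-nil and non-nil to nil, but not when one netmap replaces another) can be sketched as follows. `NetMap`, `toggleNotifier`, and `setNetMap` are hypothetical names, not the real hook's API.

```go
package main

import "fmt"

// NetMap is a hypothetical stand-in for the real netmap type.
type NetMap struct{ Peers []string }

// toggleNotifier calls onToggle only when the netmap crosses between nil and
// non-nil; replacing one non-nil netmap with another is ignored.
type toggleNotifier struct {
	cur      *NetMap
	onToggle func(nm *NetMap)
}

func (t *toggleNotifier) setNetMap(nm *NetMap) {
	wasNil, isNil := t.cur == nil, nm == nil
	t.cur = nm
	if wasNil != isNil {
		t.onToggle(nm)
	}
}

func main() {
	var calls int
	t := &toggleNotifier{onToggle: func(*NetMap) { calls++ }}
	t.setNetMap(&NetMap{Peers: []string{"a"}}) // nil -> non-nil: fires
	t.setNetMap(&NetMap{Peers: []string{"b"}}) // non-nil -> non-nil: silent
	t.setNetMap(nil)                           // non-nil -> nil: fires
	fmt.Println(calls)                         // 2
}
```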
Simon Law	988615dbad	ipn/ipnlocal,tstest/integration: pause the control client consistently (#19846) There are two places where tailscaled transitions into a paused state: 1. when tailscaled’s controlclient is initially created, and 2. when `tailscale down`, or the GUI equivalent, commands it to. This patch unifies the implementation of both scenarios into LocalBackend.shouldPauseControlClientLocked to prevent the implementation from drifting. The flaky tstest/integration.TestNoControlConnWhenDown test exposed this mismatch, but only by accident. This patch also changes TestNode.MustDown so that it runs `tailscale down` and then waits for the testcontrol server to finish handling any associated /machine/map requests. Fixes #19831 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-22 17:58:44 -07:00
Adrian Dewhurst	5d8f401956	net/dns: fix handling non-IP single split DNS Fixes #19834 Change-Id: I4d48efed00cd080b14c6fd713ff21e53a5a6ee3c Signed-off-by: Adrian Dewhurst <adrian@tailscale.com>	2026-05-22 20:45:58 -04:00
Brad Fitzpatrick	5295e3e119	ipn/{ipnstate,ipnlocal}: add integer NodeID to PeerStatus In `aa5da2e5f2` we made the IPN bus include deltas, including the PeersRemoved, sending a slice of integer NodeIDs that were removed. But when updating xcode, I realized there was no way to map those integers to the stable node IDs used in other places. I was considering changing the just-added ipn.Notify.PeersRemoved from an IntID to a string StableID, but then it doesn't match the MapResponse wire protocol, which we've tried to match so far. Instead, just add the integer ID as well. Having both, callers can use whichever world they want. It's a little regrettable that we still have two worlds of IDs, but oh well. Neither is really suitable for a hypothetical future fully federated world of control servers anyway, so we'll need a third type later; for now, just live with the two we have. Updates #12542 Change-Id: Ib8fd48a265e1da1f8779152f141f624a7f7260e9 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-22 08:16:55 -07:00
Amal Bansode	e32b9bde1d	control/controlclient: fix deadlock in map session change queue processing (#19828) Holding an exclusive lock while writing to the unbuffered changequeue chan can deadlock when the run() path tries to grab the same lock before reading from the chan to drain it (on map session close). This causes the client to stop processing new map responses and TSMP disco key advertisements. There is a good probability of inducing this deadlock using the old code and new test added in this commit: TestUpdateDiscoForNodeCallback/test_deadlock. Also fix an unintentional regression in how the client responds to a mapResponse sleep command. `85bb5f84a5` moved the processing of mapResponses into a new goroutine, serialized via mapSession's changequeue. Thus, controlclient stopped sleeping in the same goroutine servicing mapResponses/control connections. This commit brings us back to sleeping synchronously in the same goroutine as controlclient. Updates #12639 Signed-off-by: Amal Bansode <amal@tailscale.com> Signed-off-by: Claus Lensbøl <claus@tailscale.com> Co-authored-by: Claus Lensbøl <claus@tailscale.com>	2026-05-22 07:13:18 -07:00
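The general fix for this class of deadlock is to read shared state under the lock, release the lock, and only then perform the blocking send. The sketch below is hypothetical (`session`, `enqueue`, and `consume` are invented, and it is not the actual mapSession code). Its consumer takes the same mutex before each read. That is safe here, but it would deadlock if `enqueue` sent while still holding `mu`.

```go
package main

import (
	"fmt"
	"sync"
)

// session is a hypothetical stand-in for a map session with a change queue.
type session struct {
	mu      sync.Mutex
	closed  bool
	changes chan string // unbuffered; drained by consume
}

// enqueue checks state under mu but sends only after unlocking, so a
// consumer that needs mu before reading can always make progress.
func (s *session) enqueue(change string) bool {
	s.mu.Lock()
	closed := s.closed
	s.mu.Unlock() // release mu before the send that may block
	if closed {
		return false
	}
	s.changes <- change
	return true
}

// consume takes mu before each read, as a shutdown path might.
func (s *session) consume(n int, done chan<- []string) {
	var all []string
	for i := 0; i < n; i++ {
		s.mu.Lock()
		closed := s.closed
		s.mu.Unlock()
		if closed {
			break
		}
		all = append(all, <-s.changes)
	}
	done <- all
}

func main() {
	s := &session{changes: make(chan string)}
	done := make(chan []string)
	go s.consume(3, done)
	for _, c := range []string{"a", "b", "c"} {
		s.enqueue(c)
	}
	fmt.Println(<-done) // [a b c]
}
```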
Simon Law	fd2405ca8f	tstest/integration: mark TestNoControlConnWhenDown as a flaky test (#19832) Updates #19831 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-21 17:36:09 -07:00
Simon Law	7dabebc691	net/traffic: switch rendezvous hashing from SHA256 to FNV-1a (#19821) In PR tailscale/corp#30448, we originally decided to break ties using SHA256 for our rendezvous hashing algorithm. Now that we’ve had some experience with it, we think that FNV-1a is a better choice. It distributes bits evenly, it’s much faster, and our use doesn’t need a cryptographically secure hash. The FNV designers recommend FNV-1a over the deprecated FNV-1. This PR makes the switch and updates the related tests, since changing the algorithm changes which stable pick gets selected. As of 2026-05, this is the best time to make this change, since there are almost no clients in the wild with traffic steering enabled. Updates #17366 Updates tailscale/corp#29964 Updates tailscale/corp#29966 Updates tailscale/corp#33033 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-21 10:11:59 -07:00
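Rendezvous (highest-random-weight) hashing scores every candidate by hashing it together with a stable key and picks the highest score. This makes the pick independent of candidate order, and removing a losing candidate leaves it unchanged. The sketch below uses Go's `hash/fnv` FNV-1a. `rendezvousPick` and the zero-byte separator are illustrative assumptions, not net/traffic's actual implementation.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// rendezvousPick hashes key+node with 64-bit FNV-1a for every node and
// returns the node with the highest score ("" for no nodes).
func rendezvousPick(key string, nodes []string) string {
	var best string
	var bestScore uint64
	for i, n := range nodes {
		h := fnv.New64a()
		h.Write([]byte(key))
		h.Write([]byte{0}) // separator so ("ab","c") differs from ("a","bc")
		h.Write([]byte(n))
		if score := h.Sum64(); i == 0 || score > bestScore {
			best, bestScore = n, score
		}
	}
	return best
}

func main() {
	fmt.Println(rendezvousPick("client-123", []string{"exit-a", "exit-b", "exit-c"}))
}
```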
Brad Fitzpatrick	aa5da2e5f2	ipn/ipnlocal, control/controlclient: process node adds/removes in constant time For large tailnets (~50k+ nodes) with frequent peer churn (ephemeral GitHub Actions workers etc.), tailscaled used to rebuild the full netmap and fan it out on the IPN bus on every MapResponse that added or removed a peer. There were two O(N) costs per delta: the full netmap rebuild + every Notify.NetMap encode to every bus watcher. This change tackles both: 1. Plumb O(1) peer add/remove through the delta path. PeersChanged and PeersRemoved no longer prevent the delta happy path; instead, they mutate the per-node-backend peer map in place. 2. Restrict ipn.Notify.NetMap emission to the platforms whose host GUIs still depend on it (Windows, macOS, iOS) and migrate in-tree consumers off it everywhere else: - Migrate reactive consumers (containerboot, kube agents, sniproxy, tsconsensus, etc.) off Notify.NetMap to the previously-added Notify.SelfChange signal so they no longer have to subscribe to the full netmap. - Add ipn.NotifyNoNetMap so GUI clients on "legacy-emit" platforms that have already migrated can opt out of the per-watcher NetMap encode. - Gate Notify.NetMap emission on the producer side by a compile-time GOOS check, so the supporting code is dead-code-eliminated on Linux and other geese where no GUI consumer needs it. Re-running BenchmarkGiantTailnet from tstest/largetailnet, which was added along with baseline numbers on unmodified main in `ad5436af0d`, shows that the per-delta cost (one peer add+remove pair) is now ~O(1) regardless of tailnet size N:

N          no-watcher (ms/op)         bus-watcher (ms/op)
           before    now   factor     before    now   factor
10000          32   0.11     300x        166   0.13    1300x
50000         222   0.11    2000x        865   0.13    6700x
100000        504   0.12    4100x       1765   0.13   13400x
250000       1551   0.12   12500x       4696   0.15   32400x

Updates #12542 Change-Id: I94e34b37331d1a8ec74c299deffadf4d061fda9e Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-21 09:26:19 -07:00
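Mutating a peer map in place is what makes each delta cost proportional to its own size rather than to N. The hypothetical sketch below (`NodeID`, `Peer`, `peerMap`, and `applyDelta` are invented names, not ipnlocal's types) applies PeersChanged and PeersRemoved directly to a map keyed by integer node ID.

```go
package main

import "fmt"

// NodeID is a hypothetical integer node ID, as used on the wire.
type NodeID int64

// Peer is a hypothetical, trimmed-down peer record.
type Peer struct {
	ID   NodeID
	Name string
}

// peerMap is a hypothetical per-node-backend peer store.
type peerMap map[NodeID]Peer

// applyDelta touches only the peers named in the delta: O(len(changed) +
// len(removed)), independent of how many peers the tailnet has.
func (m peerMap) applyDelta(changed []Peer, removed []NodeID) {
	for _, p := range changed {
		m[p.ID] = p
	}
	for _, id := range removed {
		delete(m, id)
	}
}

func main() {
	m := peerMap{1: {ID: 1, Name: "a"}, 2: {ID: 2, Name: "b"}}
	m.applyDelta([]Peer{{ID: 3, Name: "c"}}, []NodeID{1})
	fmt.Println(len(m), m[3].Name) // 2 c
}
```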
Brad Fitzpatrick	2703f91174	wgengine/magicsock: fix data race in TestSetDERPMapDoReStun SetDERPMap spawns a goroutine that calls ReSTUN, which logs via the test logger. If the test returns before that goroutine logs, the goroutine races with testing cleanup. Use tstest.WhileTestRunningLogger so the goroutine's logf call becomes a no-op once the test finishes. Fixes #19829 Change-Id: I1097f98e40ffd1c5dd7fb7a715c918255853e3c6 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-21 08:51:50 -07:00
Simon Law	7ebca58042	net/traffic,ipn/ipnlocal: extract traffic steering utilities (#19682) The traffic package contains helpers for evaluating traffic steering scores and picking appropriate nodes. These were extracted from ipnlocal.suggestExitNodeUsingTrafficSteering so they can be reused by the new routecheck package to probe exit nodes in priority order. Updates #17366 Updates tailscale/corp#33033 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-21 08:28:27 -07:00
Fran Bull	dbe92f98b5	feature/conn25: set assignment expiry based on dns response TTL Updates tailscale/corp#39975 Signed-off-by: Fran Bull <fran@tailscale.com>	2026-05-21 07:25:29 -07:00
Brad Fitzpatrick	f3a117e813	net/tsdial: run happy eyeballs across A and AAAA in UserDial When tailscaled is running in userspace-networking mode behind an exit node (e.g. as a SOCKS5 proxy), it resolves a hostname and then dials a single resolved IP through the tunnel. If the name has both A and AAAA, Go's net.Resolver merges them and we pick ips[0], which on an IPv6-native host is usually AAAA. If the exit node has no IPv6 egress (or vice versa), the dial fails silently through the tunnel and the user sees a hang. Resolve all candidates and race connect attempts across address families with a 300ms happy-eyeballs delay, matching Go's net.Dialer default and the existing pattern in net/dnscache (commit `ee0a03b14`). First success wins; losers are cancelled and any conns they produce are closed. A failBoost channel wakes the launcher when a connect fails fast (e.g. ICMP "no route" via the tunnel) so we don't sit on the 300ms timer when the answer is already known. userDialResolve is refactored into userDialResolveAll (returns the full candidate list) plus a thin single-IP wrapper for callers like UserDialPlan that don't race. UserDial's per-IP dispatch (netstack vs peer dialer vs SystemDial vs std) is extracted to dialOneUser so each candidate can route correctly on its own merits. Also fix serveDial in localapi to pass the original hostname to UserDial rather than a pre-resolved IP, so the race can fire. This fix is single-ended: it works against any exit node, including old ones, with no protocol changes. The trade-off versus filtering on the exit-node side via PeerAPI DoH is that every dial through an unreachable-family exit node costs one failed connect attempt per cache window, rather than zero, which is acceptable given the simplicity. Fixes #19792 Fixes #13257 Change-Id: I9d7645d0034caf3ee22ecdd8070798353f77e94b Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-20 18:35:55 -07:00
James Tucker	36c52ef383	tstest/integration/testcontrol: fix serveMap read-modify-write race serveMap cloned s.nodes[nk], mutated the clone outside the mutex, then wrote it back via updateNodeLocked. A concurrent UpdateNode, SetNodeCapMap, or other writer landing between the clone and the writeback would be silently clobbered. Mutate the live node under the mutex instead. Surfaces in tsnet's TestListenService as a flaky ErrUntaggedServiceHost panic: the test calls control.UpdateNode to attach a tag, a concurrent updateRoutine map request from the host races, and the host's next netmap arrives with Tags=[]. Updates #19822 Change-Id: I6c5ebd5e5bf79a40316f53f627157230773cb469 Signed-off-by: James Tucker <james@tailscale.com>	2026-05-20 18:29:58 -07:00
Aria Stewart	61277e3ad4	Construct IPv6 ingress URLs correctly Fixes #19338 Signed-off-by: Aria Stewart <aredridel@dinhe.net>	2026-05-20 17:21:35 -07:00
M. J. Fromberger	c09407002f	ipn/ipnlocal/netmapcache: add UpdateSelfOnly method (#19818) Some netmap updates are guaranteed to affect only the "static" parts of the netmap, and so should not require us to walk through all the peers and user profiles when updating the cache. To support this, the new UpdateSelfOnly method updates only the Self node and other tailnet settings that are not dependent on the peers and profiles. Use this when updating the cache on DERP home changes. Updates #12542 Change-Id: Ifed522b29d579fb76e010b4ff738cc4e0a72d27f Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>	2026-05-20 16:29:04 -07:00
Simon Law	93dbd33ef7	ipn/ipnlocal: stub system interfaces for TestShouldUseOneCGNATRoute (#19807) The TestShouldUseOneCGNATRoute test fails when the system interfaces don’t match the test’s underlying assumption. That assumption was that there would only ever be one CGNAT interface: the Tailscale one. This breaks on Linux when border0 is installed because border0 also creates an interface with a CGNAT route. This patch stubs netmon.RegisterInterfaceGetter to replace the system interfaces and netmon.SetTailscaleInterfaceProps to identify the test data that defines the Tailscale interface. This patch also tests the control knob override for CGNAT for every combination of operating system and system interfaces, instead of just a couple of combinations. Fixes #19731 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-20 16:00:14 -07:00
Brad Fitzpatrick	04ae61fe4b	tstest/integration/jswasmtest: add headless-Chromium tests for @tailscale/connect Add Go tests that drive a real headless Chromium (via chromedp) against the built cmd/tsconnect/pkg/ artifact and verify the @tailscale/connect public API surface end-to-end. The package has not been republished in three years, in part because no test exercises the produced artifact at runtime — only tsc --noEmit and a Go build run in CI. TestCreateIPN loads pkg.js into the browser, calls createIPN with a junk auth key, and asserts that pkg.createIPN / pkg.runSSHSession are functions and that createIPN() returns an IPN with the documented run/login/logout/ssh/fetch methods. No control-plane traffic. TestFetchTailnetPeer stands up a full local tailnet (testcontrol + DERP + a tsnet.Server peer) and verifies that the browser-side WASM client can join over WebSocket-noise to the same control, connect to DERP over WSS, and then ipn.fetch() an HTTP service hosted on the tsnet peer through the tailnet. The test asserts the response body matches a known string. Browser state transitions are logged: NoState -> NeedsLogin -> Starting -> Running. Tests are opt-in via --run-headless-browser-tests (matching the existing --run-vm-tests pattern in tstest/natlab/vmtest) so they never fire in casual `go test ./...` runs. When the flag is set, a test is skipped if cmd/tsconnect/pkg/ has not been built, and fails with t.Error if no chromium binary is found on $PATH (honoring $CHROME_BIN as an override). findChromium also falls back to /Applications/Google Chrome.app and /Applications/Chromium.app on darwin, since macOS Chrome's executable lives inside an .app bundle and is not on $PATH by default. The .github/workflows/test.yml wasm job is extended to install google-chrome-stable and run the tests with the flag after build-pkg. 
To prevent silently testing a stale pkg/main.wasm (built from an older checkout than the rest of the test invocation), build-pkg now writes pkg/build-info.json recording the sha256 of the raw (pre-wasm-opt) go-build output. The test does its own `go build` of cmd/tsconnect/wasm with the same -tags/-trimpath/-ldflags (factored into a new cmd/tsconnect/wasmbuild package shared by both call sites) and t.Fatalfs with a "rebuild" instruction on mismatch. Cost is near-zero because the Go build cache from the prior build-pkg makes the rebuild a cache hit. The new wasmbuild package also replaces cmd/tsconnect's hardcoded -tags string with a minimal-feature-set computation. wasmbuild.Keep names the small set of feature/featuretags entries the browser client actually needs (netstack, logtail, dns, health, c2n, ipnbus); wasmbuild.Tags() emits a ts_omit_<f> for every other omittable feature in feature/featuretags.Features, with transitive deps expanded via featuretags.Requires. An init() panics if Keep references a feature unknown to feature/featuretags so a rename there fails loudly. Net effect on size: 32M raw / 9.4M brotli before this change, 25M raw / 4.4M brotli after — vs the last-published 1.39.98 at 21M / 3.8M. The transitive package-import graph is unchanged (176 tailscale.com/* packages either way): featuretags omits eliminate dead code via `const HasX = false`, not imports. Trimming the import graph would require a separate, larger refactor splitting interface packages by build tag. Writing TestFetchTailnetPeer surfaced several real issues, all fixed here: * cmd/tsconnect built the wasm with the nethttpomithttp2 tag, but control/ts2021 (since commit `1d93bdce2`, "control/controlclient: remove x/net/http2, use net/http", Oct 2025) requires HTTP/2 from net/http's bundled implementation. With nethttpomithttp2 set, the bundle is excluded and the wasm client cannot speak HTTP/2 to any control plane, including production. Drop the tag. 
Wasm size grows ~1 MB raw / ~300 KB brotli (more than offset by the feature pruning above). The last published @tailscale/connect (1.39.98, early 2023) pre-dates the regression, which is why no consumer has reported the breakage. * tstest/integration/testcontrol.Server's /ts2021 noise upgrade endpoint rejected anything but POST. WebSocket clients (the only transport available to browser-WASM) come in as GET. Allow both; the controlhttp AcceptHTTP path dispatches on the Upgrade header, so the websocket library still enforces GET for WS upgrades. This matches production, where the same controlhttpserver.AcceptHTTP routes purely on the Upgrade header without checking method. * derp/derphttp's urlString built the DERP URL from node.HostName only, dropping node.DERPPort. Non-WS clients use a separate code path (connectToHost) that honors DERPPort, but WebSocket-only clients (browser-WASM) went through urlString and so could not reach a DERP running on any port other than 443. Include the port when it differs from the scheme default. Also move addWebSocketSupport from cmd/derper (where it was main-only) to derp/derpserver.AddWebSocketSupport so tstest/integration.RunDERPAndSTUN can wrap its DERP handler with WebSocket support — without that, the test DERP would not accept the browser's wss connection. Fixes #9394 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Change-Id: Iff9cdee303e3b239924249b5bffb2fd04e02f391	2026-05-20 10:48:29 -07:00
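The urlString fix can be sketched as follows: append the port only when it is set and differs from the scheme default. This hypothetical `derpURL` assumes an `https` scheme (default port 443), that a zero DERPPort means 443, and a `/derp` path. It is not the actual derphttp code.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strconv"
)

// derpURL builds the DERP URL from the host name, adding the port only when
// it is non-zero and not the https default.
func derpURL(hostName string, derpPort int) string {
	host := hostName
	if derpPort != 0 && derpPort != 443 {
		host = net.JoinHostPort(hostName, strconv.Itoa(derpPort))
	}
	u := url.URL{Scheme: "https", Host: host, Path: "/derp"}
	return u.String()
}

func main() {
	fmt.Println(derpURL("derp1.example.com", 0)) // https://derp1.example.com/derp
	fmt.Println(derpURL("127.0.0.1", 3340))      // https://127.0.0.1:3340/derp
}
```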
Brad Fitzpatrick	95d874e9b4	cmd/testwrapper: surface race reports and skip retries when detected A data race in a package matters more than any individual test result. Two related problems: 1. Where go test's race detector text ("WARNING: DATA RACE" plus the goroutine stack traces) lands in JSON output is timing-dependent: it can be attributed to a test that ends up reporting PASS (e.g. when the racing goroutines outlive the test that spawned them and TSan prints during a different test's window). testwrapper's main loop only flushes the logs of failed tests, so the race report ends up stuck in a passing test's buffer and is silently dropped. The race builders just see a bare "FAIL\nFAIL\tpkg\ttime". 2. If the failing test in such a package happens to be marked flaky, testwrapper retries it. That is the worst possible response to a race: the flaky test might not even be the racy code, and a second run without the racy goroutines could "succeed" while hiding the real bug. Address both: scan every output line for the race detector's first-line marker. Track whether the package observed a race at all, on the pkgFinished testAttempt. When a race was seen, fold every per-test log buffer into the package-level logs (so the full report surfaces from the existing pkg-fail flush path), and drop any flaky-test retry plans for that package so we fail immediately instead of running another attempt. Two new tests: - TestRaceSuppressesFlakyRetry verifies that a flaky test alongside a racy test does NOT get retried. - TestRaceAttributedToPassingTest verifies that a race attributed by test2json to a passing test still surfaces in the output. 
Also add a corpus of captured raw test binary outputs under cmd/testwrapper/testdata/, with one subdirectory per scenario, documenting the six representative shapes that go test -race can emit (race in test body, race in goroutines that outlive a test, race forced into a later test, race in TestMain post-m.Run, and a parallel-tests split-attribution case via a "=== NAME" redirect line). See its README.md for details. Fixes #19603 Change-Id: Ifbfcd67fb3b1882c4907bd9cb2d68a8b5a91dd54 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-19 21:21:05 -07:00
Claus Lensbøl	ee0a03b140	net/dnscache: run happy eyeballs with more than one dest IP (#19770) If the context given to DialContext has a shorter lifetime than the OS TCP SYN timeout, and TCP SYNs are dropped from the path to the remote, DialContext would never fall back to try IPv6 after IPv4. Instead, use the normal happy eyeballs race if there is more than one address. This does remove the implicit prioritization of IPv4 over IPv6 in cases where there is only a single IPv4 remote address. Updates #13346 Signed-off-by: Claus Lensbøl <claus@tailscale.com>	2026-05-19 12:59:11 -04:00
Naman Sood	5d56cc8512	util/linuxfw: return error instead of nil pointer dereference Issue #19737 ran into a nil pointer dereference, the cause of which was fixed by #19761. If we end up on this code path with a nil table again, we should bubble that up as an error (which is logged by the health warning system) rather than failing catastrophically. Signed-off-by: Naman Sood <mail@nsood.in>	2026-05-19 10:01:07 -04:00
Brad Fitzpatrick	2b338dd6a8	wgengine, cmd/tailscaled, control/controlclient: remove Engine watchdog The Engine watchdog wrapped every wgengine.Engine method call in a goroutine with a 45s timeout and crashed the process on timeout. It was added years ago to surface deadlocks during development, but the underlying deadlocks have long since been fixed, and even when it did fire it produced obscure stack traces (from inside the watchdog goroutine, not the original caller) without buying much. Audit of userspaceEngine's methods shows none have cyclic locking or unbounded blocking now that ResetAndStop no longer loops waiting for DERPs to drain (`fa49009ee`). The watchdog is dead weight; remove it along with the TS_DEBUG_DISABLE_WATCHDOG escape hatch. Updates #19759 Change-Id: Iba9d718fe1f8718a6631296e336b138c31b99ff1 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-15 16:49:28 -07:00
Simon Law	5d1bf80597	feature/routecheck: add ts_omit_routecheck feature flag (#19638) RouteCheck, which checks that overlapping routers are reachable, is enabled by default for both tailscaled and tsnet. Updates #17366 Updates tailscale/corp#33033 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-15 15:50:50 -07:00
Noel O'Brien	894ff5d8ee	cmd/hello: split css and js into separate files (#19771) Move the inline CSS and JS into separate files to be more friendly to Content Security Policies. ServeHTTP is updated to serve these assets from the '/static/' path. Updates tailscale/corp#32398 Signed-off-by: Noel O'Brien <noel@tailscale.com>	2026-05-15 09:37:22 -07:00
Alex Chan	0cb432ed84	all: update more references to Tailnet/Network Lock Updates tailscale/corp#37904 Change-Id: I09e73b3248b9ddf86dafe33dfb621bd560f6596d Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-05-15 16:23:50 +01:00
Fernando Serboncini	c355618e73	wgengine/router/osrouter: skip netfilter add-ons when chain setup fails (#19757) linuxRouter has two blocks (connmark rules and the CGNAT drop rule) that gate on cfg.NetfilterMode, the requested config state. If setNetfilterModeLocked fails, those blocks still assume the requested config is in effect, which can cause errors. We now gate both blocks on r.netfilterMode, matching the pattern used by SNAT, stateful, and loopback paths. Fixes #19737 Change-Id: Ia6003a082db99c376e662132d725661afbac0ee9 Signed-off-by: Fernando Serboncini <fserb@tailscale.com>	2026-05-15 09:32:30 -04:00
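The principle is to gate follow-up work on the state actually applied, not the state requested. The hypothetical sketch below shows it with invented types (`router`, `NetfilterMode`, `set`), not linuxRouter's code. It also assumes that a failed chain setup leaves netfilter off.

```go
package main

import (
	"errors"
	"fmt"
)

// NetfilterMode is a hypothetical, simplified netfilter setting.
type NetfilterMode int

const (
	NetfilterOff NetfilterMode = iota
	NetfilterOn
)

// router is hypothetical; netfilterMode is the mode actually in effect.
type router struct {
	netfilterMode NetfilterMode
	chainsWork    bool     // whether chain setup succeeds, for the demo
	addOns        []string // add-on rules currently installed
}

func (r *router) setNetfilterMode(want NetfilterMode) error {
	if want == NetfilterOn && !r.chainsWork {
		r.netfilterMode = NetfilterOff // assumption: failure leaves netfilter off
		return errors.New("chain setup failed")
	}
	r.netfilterMode = want
	return nil
}

// set gates add-ons on r.netfilterMode rather than on the requested mode.
func (r *router) set(requested NetfilterMode) error {
	err := r.setNetfilterMode(requested)
	r.addOns = nil
	if r.netfilterMode == NetfilterOn {
		r.addOns = append(r.addOns, "connmark", "cgnat-drop")
	}
	return err
}

func main() {
	r := &router{chainsWork: false}
	err := r.set(NetfilterOn)
	fmt.Println(err, len(r.addOns)) // chain setup failed 0
}
```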
License Updater	1d3562b314	licenses: update license notices Signed-off-by: License Updater <noreply+license-updater@tailscale.com>	2026-05-14 21:04:41 -07:00
Brad Fitzpatrick	ef1bb5ac16	util/cibuild, cache_key_test: skip TestTsgoRevInCacheKey outside Tailscale CI cibuild.On() returns true for any CI environment that sets CI=true, including Alpine Linux's package build CI. TestTsgoRevInCacheKey was guarded by cibuild.On() (or use of tsgo), so it ran under Alpine's CI with stock Go, where go.toolchain.rev isn't blended into build cache keys, and unsurprisingly failed. Add cibuild.OnTailscaleCI, which keys off GITHUB_REPOSITORY_OWNER to distinguish tailscale/tailscale's own GitHub Actions CI from arbitrary downstream CI, and use it in TestTsgoRevInCacheKey. Fixes #19754 Change-Id: Id31cfe71903a235f1460dca1e2fdf334e3ba1ee5 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-14 15:55:05 -07:00
Brad Fitzpatrick	fa49009eee	wgengine: simplify ResetAndStop, drop drain loop Since `f343b496c3` ("wgengine, all: remove LazyWG, use wireguard-go callback API for on-demand peers"), Reconfig is fully synchronous: magicConn.UpdatePeers, wgdev.RemovePeer, router.Set, and dns.Set all return when the work is done, and the peer list is updated under wgLock before Reconfig returns. So after Reconfig with empty configs, len(st.Peers) is already 0. The old loop also waited for st.DERPs to drain to 0, but UpdatePeers only edits maps; active DERP connections idle out on their own timeout. The sole caller (LocalBackend.stopEngineAndWait) doesn't inspect st.DERPs anyway; it just hands the Status to setWgengineStatusLocked. So the drain-wait was for nothing observable and could theoretically (or at least appear to readers to) loop forever holding b.mu. Remove that reader confusion by removing the backoff loop entirely. Updates #19759 Change-Id: Ibfac3f0baabcad7604b713c934a8fc37932e0a50 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-14 15:45:38 -07:00


10681 Commits