tailscale

mirror of https://github.com/tailscale/tailscale.git synced 2026-06-27 09:15:40 -04:00

Author	SHA1	Message	Date
Brad Fitzpatrick	15bb10dbce	tsnet: ban awsstore and kubestore as deps in TestDeps Commit `69c79cb9f` (Sep 2025) moved awsstore and kubestore registration behind condregister build tags so tsnet wouldn't pull in the AWS SDK and Kubernetes client by default. The accompanying TestDeps BadDeps entry was missed, so PR #19667 (which re-added those imports) wasn't caught by the test. Add the two packages to BadDeps so future regressions fail the test. Updates #19667 Updates #12614 Change-Id: I903b7c976e5e122cc0c0b956dc73740f5d474fac Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-06 14:57:47 -07:00
Tom Proctor	b74eeda055	cmd/testwrapper: print unit for package duration (#19663) Include the unit (s) when printing the time taken to test each package. Updates #cleanup Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>	2026-05-06 22:31:48 +01:00
kari-ts	c721189cef	ipn/ipnlocal: prefer one CGNAT route on Android (#19652) Android rebuilds its VpnService interface when the VPN route configuration changes, which tears down long-lived TCP connections through the tunnel. Use the same automatic OneCGNATRoute behavior as macOS on Android, and prefer the single CGNAT route when no other interface is using the CGNAT range, falling back to fine-grained peer routes otherwise. Updates tailscale/tailscale#19591 Signed-off-by: kari <kari@tailscale.com>	2026-05-05 19:11:17 -07:00
Brad Fitzpatrick	f844c8bc32	util/winutil/gp: deflake TestGroupPolicyReadLockClose The test goroutine read lockCnt immediately after Lock returned, racing with Close: close(lk.closing) wakes lockSlow's select, whose deferred Add(-2) on lockCnt can run before Close's CAS clears the LSB. When that happens, lockCnt is briefly 1 (3 - 2) instead of 0 (1 + 2 - 2 - 1), producing "lockCnt: got 1; want 0". Move the lockCnt assertion into the main test goroutine, after both Close has returned and the Lock goroutine has finished, so both updates have settled before we read. Fixes #19647 Change-Id: Ia67036ff73a1beb528cbd621460db9048f3066ad Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-05 14:02:35 -07:00
Jonathan Nobels	872d79089e	VERSION.txt: this is v1.99.0 (#19645) Signed-off-by: Jonathan Nobels <jonathan@tailscale.com> v1.99.0-pre	2026-05-05 15:07:20 -04:00
Evan Lowry	aa21b0c008	client/systray: fix recommended exit node not showing as selected (#19627) When an exit node was set before launching systray, the recommended row in exit nodes rendered as not selected even when the active exit node was at the same location. This looks to be two different things: - suggestExitNode takes its own suggestion into account, and not the user's active exit node. When a mullvad city is reached via the picker rather than the recommended row, the suggester's pick and prefs.ExitNodeID end up as distinct peers in the same city, resulting in an ID-only equality check missing the match. - Toggle state was constructed and mutated via .Check(), which for newly created elements may be cached (such as when launching systray, with an already active node). Fixes #19626 Signed-off-by: Evan Lowry <evan@tailscale.com>	2026-05-05 10:49:38 -03:00
Alex Chan	eac531da8e	cmd/tailscale/cli: unhide `--report-posture` flag in `up` This was originally hidden during the beta period in both `up` and `set`, then when device posture went GA we unhid the flag in `set` but not in `up`. This is confusing for users, because an error message can direct them to run `tailscale up` with this flag if they've set it previously, but the help text won't tell them what it does. Updates #5902 Updates #17972 Change-Id: I9a31946f4b3bb411feed0f5a6449d7ff9a5ba9d3 Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-05-05 10:12:36 +01:00
Brad Fitzpatrick	883d4fd2cd	wgengine/netstack, net/ping: stop using pro-bing and use our net/ping instead Fixes #19633 Fixes #13760 Change-Id: I0fa9423523a3a0fb1dfcde57de0f26e51723ff97 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-04 14:05:24 -07:00
Brad Fitzpatrick	81569e891f	tstest/iosdeps: update import list to mirror ipn-go-bridge The purpose of this package is to test the iOS dependency closure, but it had drifted from the actual import list of the ipn-go-bridge package in the corp repo (the Go side of the iOS / macOS app). Update the imports to match ipn-go-bridge's GOOS=ios import list, adding many missing packages including wgengine/netstack, feature/{taildrop,syspolicy,condregister}, the util/syspolicy/* subpackages, types/{key,lazy,logid,netmap}, tsd, safesocket, util/{eventbus,must,set}, and several net/* and ipn/* packages. Drop two now-stale BadDeps entries (for now!): database/sql/driver and github.com/google/uuid are reached via wgengine/netstack -> github.com/prometheus-community/pro-bing, which netstack imports on darwin || ios for ICMP user-ping, so the iOS app already ships them. But we should fix that later. Updates #19633 Change-Id: Ic50779fdb195685a2e8ccd7c513eee91b0feeaf8 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-04 14:05:24 -07:00
Brad Fitzpatrick	9bb7ca6116	cmd/vet/lowerell, drive/driveimpl: forbid variables named "l" or "I" Add a new vet checker that rejects variables, parameters, named return values, receivers, range/type-switch bindings, type parameters, struct fields, and constants named "l" (lowercase ell) or "I" (uppercase i). Both are hard to distinguish from the digit "1" and from each other in too many fonts. Rename the two pre-existing struct fields named "l" (both of type net.Listener) in drive/driveimpl/drive_test.go to "ln", matching the convention used elsewhere for net.Listener locals. Rename the test-fixture struct fields "I" (single int label) to "Int" in metrics/multilabelmap_test.go and util/deephash/deephash_test.go, preserving the "first letters of types" convention used alongside neighboring fields like I8/I16/U/U8. Also teach pkgdoc_test.go to skip testdata/ directories, which the go tool ignores; they are not real packages. Fixes #19631 Change-Id: I71ad2fa990705f7a070406ebcdb8cefa7487d849 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-04 14:03:28 -07:00
Andrew Lytvynov	0cf899610c	util/linuxfw/linuxfwtest: remove unused package (#19520) Added in 2022, this appears to be unused now. Updates #cleanup Signed-off-by: Andrew Lytvynov <awly@tailscale.com>	2026-05-04 12:33:12 -07:00
License Updater	ca2317439d	licenses: update license notices Signed-off-by: License Updater <noreply+license-updater@tailscale.com>	2026-05-04 10:34:27 -07:00
Jordan Whited	ce76f44df2	derp/derpserver: remove global rate limiter Which can be unfair around varying packet sizes. Updates tailscale/corp#40962 Signed-off-by: Jordan Whited <jordan@tailscale.com>	2026-05-04 09:41:14 -07:00
Fernando Serboncini	29122506be	misc/git_hook: propagate shared HOOK_VERSION (#19476) Move HOOK_VERSION into the githook package and export it as githook.HookVersion, so tailscale/corp can reference it via the shared-code bump instead of having to bump HOOK_VERSION by hand. New launcher.sh composes the wanted version from two sources: the shared HOOK_VERSION and an optional repo-local version, misc/git_hook/HOOK_VERSION, for repo-specific config bumps. Updates tailscale/corp#40381 Change-Id: I7cf16889ba53cb564cc2df7dfd7588748f542c55 Signed-off-by: Fernando Serboncini <fserb@tailscale.com>	2026-05-04 12:38:28 -04:00
George Jones	290a6cc03c	appc, feature/conn25: handle exact and wildcard domains correctly (#19202) Installed SplitDNS routes are always treated as wildcard domains, so the domains that we pass to the local resolver should be normalized and have any leading *. wildcard prefix removed. When looking at DNS responses to see if the domain matches, we need to consider both exact matches and wildcard matches. We now keep separate maps of exact-match domains and wildcard domains, and when we match we check to see if there's a match in the exact-match map, otherwise we check against the wildcard match map until we find a match, removing a label after each check. Rather than looking for matching self-hosted domains (domains serviced by the connector being run on the self-node), the apps that are being serviced by the connector on the self-node are tracked instead. When checking to see if a DNS response should be rewritten, it is ignored if any of the matching apps for the domain are in the self-hosted apps set. Fixes tailscale/corp#39272 Signed-off-by: George Jones <george@tailscale.com>	2026-05-01 17:33:21 -04:00
Fran Bull	bdf3419e7d	net/dns: add custom scheme resolvers If another part of the client code registers a custom scheme with the forwarder, the forwarder will check resolver addresses to see if they match the scheme. If they do, the corresponding custom scheme handler will be called to find the actual address for the resolver at this moment. If the handler returns the empty string then that resolver will be ignored. This is useful if you want to dynamically determine where to send certain DNS requests. It is being added to support new app connector (conn25) work that would like to make sure it sends DNS requests to the current connector peer in a high availability configuration. Updates tailscale/corp#39858 Signed-off-by: Fran Bull <fran@tailscale.com>	2026-05-01 14:01:10 -07:00
Rollie Ma	78126c5d9f	tailcfg: add node capability for services in desktop clients (#19605) Add a node capability to help determine if the desktop clients should show the services list/menu/section. Updates: https://github.com/tailscale/corp/issues/40900 Change-Id: Ie34b3362f921d710173b2a0dd190354352bb26f0 Signed-off-by: Rollie Ma <rollie@tailscale.com>	2026-05-01 12:07:33 -07:00
Tom Meadows	ee10f9881c	cmd/k8s-operator: add authkey reissuing to recorder reconciler (#19556) Also fixes a memory leak with the authKeyReissuing map on ProxyGroup reconciler authkey reissue. Updates #19311 Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>	2026-05-01 18:26:55 +01:00
Alex Chan	3ced30b0b6	tka: clarify that this limit is on disablement values not secrets Values get written into TKA state; secrets don't. Updates #cleanup Change-Id: Ief9831dcb1102f584a33b2e71b611b38ca463724 Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-05-01 18:25:39 +01:00
Andrew Lytvynov	f15a4f4416	client/web: move API permission checks into handlers (#19576) There are only a couple endpoints that check peer capabilities. Keeping permission checks with the code that assumes they were performed, rather than with the routing layer, feels easier to reason about. Check that the caller is actually a peer and pass their capabilities via a context value for handlers that want to check them. Along with this, simplify the helper handler wrappers that are not needed for most of the endpoints. Updates #40851 Signed-off-by: Andrew Lytvynov <awly@tailscale.com>	2026-05-01 09:01:53 -07:00
Brad Fitzpatrick	bbcb8650d4	cmd/tailscale/cli: fetch netmap via current-netmap debug action Stop opening an IPN bus subscription with NotifyInitialNetMap purely to read the current netmap once. Use the LocalAPI debug current-netmap action (added in `159cf8707`) instead, which returns the current netmap synchronously without subscribing to the bus. Updates #12542 Change-Id: I8aa2096d65aaea4dfe62634f03ce06b5470e0e51 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-01 07:53:51 -07:00
Brad Fitzpatrick	4c3ed5ab32	all: migrate code off Notify.NetMap to Notify.SelfChange Move tailscaled's in-tree reactive users off of IPN bus Notify.NetMap updates to the narrower Notify.SelfChange signal introduced earlier in this series. Consumers that need additional state (peers, DNS config, etc.) fetch it on demand via the LocalAPI. It is a step toward the larger goal of not fanning Notify.NetMap out to every bus watcher on Linux/non-GUI hosts. A future change stops sending Notify.NetMap entirely on Linux and non-GUI platforms. (Eventually, once macOS/iOS/Windows migrate to the upcoming new Notify APIs, we'll remove ipn.Notify.NetMap entirely.) Updates #12542 Change-Id: I51ea9d86bdca1909d6ac0e7d5bd3934a3a4e8516 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-01 06:51:40 -07:00
Claus Lensbøl	ff9c3f0e00	tstest/natlab/vmtest: add test loading netmap cache from disk (#19598) For testing the loading of netmap cache from disk, the cache needs to exist. The simple solution is to start two nodes and connect them to control, with the netmap caching capability set. Then cut the connection to control, restart the nodes, and ping between them. This tests that we can start from a cache and get to running state, but also that we are able to establish a connection between the nodes. For now this is not testing how the nodes are able to talk to each other (DERP vs direct). Updates #19597 Signed-off-by: Claus Lensbøl <claus@tailscale.com>	2026-05-01 09:46:19 -04:00
Brad Fitzpatrick	89a78dc9b7	client/local, ipn/localapi, ipn/ipnlocal: add PeerByID Add a narrow LocalAPI accessor and matching client/LocalBackend method to look up a single peer's current full [tailcfg.Node] by NodeID, in O(1) time on the daemon side, without fetching the entire netmap. Useful for callers that need the latest state of a single peer (e.g. in response to a peer-mutation event on the IPN bus) without paying for a full netmap fetch. Updates #12542 Change-Id: I1cb2d350e6ad846a5dabc1f5368dfc8121387f7c Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-01 06:20:46 -07:00
Alex Chan	cac94f51cc	ipn/ipnlocal: don't compact TKA state on startup Compacting on startup means nodes may compact at a different cadence based on whether they're long-running or restarting frequently. We already compact after every sync, which only occurs when the TKA state has changed. Waiting for TKA changes to trigger compaction on nodes means compaction will occur more consistently across a tailnet. Updates tailscale/corp#33537 Change-Id: Ia0aa6d9e5e362e9ab08450fde69772841790d5b5 Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-05-01 13:27:12 +01:00
Brad Fitzpatrick	a6c5d23742	ipn, ipn/ipnlocal: add Notify.SelfChange Add a new bus signal that lets reactive consumers (containerboot, kube agents, sniproxy, tsconsensus, etc.) react to self-node updates without having to subscribe to the full netmap. Today those consumers either watch Notify.NetMap (which on large tailnets is expensive to encode and ship per watcher) or poll. SelfChange is a cheap, narrow alternative: addresses, name, key expiry, capabilities, etc. Consumers that need additional state can react to SelfChange and then fetch the relevant bits on demand via existing LocalClient methods. Producer-side, every netmap-bearing setControlClientStatus call now also publishes SelfChange. Future changes will migrate individual in-tree consumers off Notify.NetMap to this signal, and eventually gate the legacy NetMap emission to platforms whose host GUIs still require it. Updates #12542 Change-Id: I4441650b0e085d663eb6bf26a03748b7d961ca49 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-30 14:47:03 -07:00
Brad Fitzpatrick	9f343fdc0c	client/local, ipn/localapi, all: add CertDomains and DNSConfig accessors Add two narrow LocalAPI accessors so callers don't have to subscribe to the IPN bus and pull a full *netmap.NetworkMap just to read DNS-shaped fields: - GET /localapi/v0/cert-domains returns DNS.CertDomains. - GET /localapi/v0/dns-config returns the full tailcfg.DNSConfig. Migrate in-tree callers off the netmap-on-the-bus pattern: - kube/certs.waitForCertDomain still wakes on the IPN bus but now queries CertDomains via LocalClient.CertDomains rather than reading n.NetMap.DNS.CertDomains. The kube LocalClient interface and FakeLocalClient gain a CertDomains method. - cmd/tailscale dns status calls LocalClient.DNSConfig directly instead of opening a NotifyInitialNetMap watcher. - cmd/tailscale configure kubeconfig switches from a netmap watcher + serviceDNSRecordFromNetMap to LocalClient.DNSConfig + serviceDNSRecordFromDNSConfig. This is part of a series moving callers away from depending on the netmap traveling on the IPN bus, so the bus payload can shrink in a later change. Updates #12542 Change-Id: Ie10204e141d085fbac183b4cfe497226b670ad6c Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-30 13:50:46 -07:00
Michael Ben-Ami	822299642b	feature/conn25: centralize config on Conn25 with atomic access We have two sources of truth for configuration state: the node view (from the netmap/policy) and prefs (the --advertise-connector option). These come with two independent update paths: onSelfChange for node view changes and profileStateChange for pref changes. Centralize config on Conn25 so that onSelfChange and profileStateChange can update their independent parts without bundling changes together. The old bundled approach required read-modify-write, which opened the door to potential TOCTOU bugs. The node view config is stored as an atomic.Pointer[config] and the prefs-derived field (advertise-connector) becomes an independent atomic.Bool. onSelfChange creates a fresh config and stores it atomically. profileStateChange sets the bool. This also establishes clearer lines of responsibility: - Configuration state lives on Conn25. Methods that need to read config (isConnectorDomain, mapDNSResponse, the IPMapper methods) are on Conn25, and use the atomics for synchronization. - "Active" state (address allocations, transit IP mappings) lives on client and connector, and use a mutex for synchronization on that state, without conflicting with configuration synchronization. It's fine for active state to be out of sync with config — e.g. a transit IP allocated for an app should still be tracked, and gracefully expired, even if the app is removed from the node view. Removing config responsibility from client/connector makes these cases clearer to handle. - In cases where the client or connector does need access to config-derived state, e.g. a client reconfiguring its IP pools from the IPSets in the config, we can use closures for the client or connector to get just the latest state it needs from the config. See getIPSets() in this commit. - As of this commit, the connector doesn't need config-derived state at all. 
Fixes tailscale/corp#40872 Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>	2026-04-30 16:29:56 -04:00
Brad Fitzpatrick	159cf8707a	ipn/ipnlocal, all: split LocalBackend.NetMap into NetMapNoPeers / NetMapWithPeers Add two narrower accessors alongside the existing [LocalBackend.NetMap], with docs that distinguish their semantics: - NetMapNoPeers: cheap (returns the cached *netmap.NetworkMap with a possibly-stale Peers slice). For callers that only read non-Peers fields like SelfNode, DNS, PacketFilter, capabilities. - NetMapWithPeers: documented as returning an up-to-date Peers slice. For callers that genuinely need to iterate Peers or call PeerByXxx. Mark the existing NetMap deprecated and point readers at the two new accessors. NetMap, NetMapNoPeers, and NetMapWithPeers all currently return the same value (b.currentNode().NetMap()): this commit is a no-op behaviorally, just a renaming and migration of in-tree callers. A subsequent change in the same series will switch NetMapWithPeers to actually rebuild the Peers slice from the live per-node-backend peers map (O(N) per call), at which point the distinction between the two new accessors becomes load-bearing. Migrate in-tree callers to the appropriate accessor based on what fields they read: - NetMapNoPeers (most common): localapi handlers, peerapi accept, GetCertPEMWithValidity, web client noise request, doctor DNS resolver check, tsnet CertDomains/TailscaleIPs, ssh/tailssh SSH-policy/cap reads, several LocalBackend internals (isLocalIP, allowExitNodeDNSProxyToServeName, pauseForNetwork nil-check, serve config). - NetMapWithPeers: writeNetmapToDiskLocked (persist full netmap to disk for fast restart), PeerByTailscaleIP lookup. Tests still call the legacy NetMap; they'll see the deprecation warning but otherwise behave identically. 
Also add two pieces of plumbing the next change in this series will need, but which are already useful on their own: - [client/local.GetDebugResultJSON]: a generic [Client.DebugResultJSON] that decodes directly into a target type T, avoiding the marshal/unmarshal roundtrip callers otherwise need. - localapi "current-netmap" debug action: returns the current netmap (with peers) as JSON. Documented as debug-only — the netmap.NetworkMap shape is internal and may change without notice. This commit is part of a series breaking up a larger change for review; on its own it is a no-op refactor. Updates #12542 Change-Id: Idbb30707414f8da3149c44ca0273262708375b02 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-30 11:14:06 -07:00
Brad Fitzpatrick	92179b1fc7	cmd/hello: split server into helloserver package Move the template, request handler, and HTTP/HTTPS server wiring out of package main and into a new cmd/hello/helloserver package so the server can be embedded in other binaries. The main package now only constructs a helloserver.Server with the production addresses and calls Run. While here, drop the -http, -https, and -test-ip flags along with the dev-mode template and fake-data fallbacks they enabled; the binary is only run in production. Updates tailscale/corp#32398 Change-Id: Id1d38b981733334cafc596021130f36e1c1eed67 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-30 08:40:55 -07:00
David Bond	644c3224e9	cmd/{containerboot,k8s-operator}: don't return pointers to maps (#19593) This commit modifies the usage of the `egressservices.Configs` type within containerboot and the k8s operator. Originally it was being passed around as a pointer, which is not required as maps are already pointers under the hood. Signed-off-by: David Bond <davidsbond93@gmail.com>	2026-04-30 16:11:00 +01:00
Brad Fitzpatrick	815bb291c9	cmd/tailscale/cli: allow tag without "tag:" prefix in 'tailscale up' If a user passes --advertise-tags=foo,bar (with no colons in any segment), automatically prepend "tag:" client-side so it goes on the wire as "tag:foo,tag:bar". Segments that already contain a colon are left untouched and must be fully-qualified ("tag:foo"), which keeps the door open for future colon-bearing syntax. This was originally added in `cd07437ad` (2020-10-28) and then reverted in `1be01ddc6` (2020-11-10) over forward-compatibility concerns. But on 2026-04-29 it was realized that this was always safe for future extensibility anyway (tags can't contain colons; tag:foo:bar is invalid, per the 2020 CheckTag restrictions). So if we wanted to add some hypothetical --advertise-tags=tagset:setfoo or "group:foo", we'd still have syntax to do so, as it can't conflict with tag:group:foo. Avery signed off on this on Slack: "Ok, I withdraw my objection to auto-qualifying tag names in advertise-tags and I hope I won't regret it :)" Updates #861 Change-Id: I06935b0d3ae909894c95c9c2e185b7d6a219ff32 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-30 07:13:48 -07:00
Brad Fitzpatrick	f343b496c3	wgengine, all: remove LazyWG, use wireguard-go callback API for on-demand peers Replace the UAPI text protocol-based wireguard configuration with wireguard-go's new direct callback API (SetPeerLookupFunc, SetPeerByIPPacketFunc, RemoveMatchingPeers, SetPrivateKey). Instead of computing a trimmed wireguard config ahead of time upon control plane updates and pushing it via UAPI, install callbacks so wireguard-go creates peers on demand when packets arrive. This removes all the LazyWG trimming machinery: idle peer tracking, activity maps, noteRecvActivity callbacks, the KeepFullWGConfig control knob, and the ts_omit_lazywg build tag. For incoming packets, PeerLookupFunc answers wireguard-go's questions about unknown public keys by looking up the peer in the full config. For outgoing packets, PeerByIPPacketFunc (installed from LocalBackend.lookupPeerByIP) maps destination IPs to node public keys using the existing nodeByAddr index. Updates tailscale/corp#12345 Change-Id: I4cba80979ac49a1231d00a01fdba5f0c2af95dd8 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-29 19:46:19 -07:00
Brad Fitzpatrick	b313bffbe7	control/tsp, tstest/integration/testcontrol: deflake TestMapAgainstTestControl The test was flaky under stress with "AddRawMapResponse N: node not connected" failures. The root cause was in testcontrol's addDebugMessage: it conflated "no streaming poll registered" with "wake-up channel buffer momentarily full". The single-slot updatesCh is just a lossy wake-up signal, but the streaming serveMap loop has fast paths (takeRawMapMessage and the hasPendingRawMapMessage continue) that don't drain it. A stale notification could remain buffered, causing the next sendUpdate to fail even though msgToSend had been queued and the streaming poll would still pick it up. Detect the real failure case (no streaming poll) by checking s.updates[nodeID] directly, and treat sendUpdate's buffer-full result as benign — the message is in msgToSend, which is the source of truth. Also plumb an optional health.Tracker through tsp.ClientOpts to the underlying ts2021.Client and supply one in the tests, eliminating the "## WARNING: (non-fatal) nil health.Tracker (being strict in CI)" stack dumps emitted by controlhttp.(Dialer).forceNoise443 under CI. Fixes #19583 Change-Id: Ib2334376585e8d6562f000a0b71dea0117acb0ff Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-29 16:11:00 -07:00
Claus Lensbøl	978b6a81b2	ipn/ipnlocal: always ReSTUN when starting up without a cache (#19586) `78627c1` introduced starting up and preserving the DERP server from cache, but also changed it so the initial ReSTUN would not fire when setting the DERPMap. Change this so when not working from a cache, the ReSTUN will always fire during startup. Updates #19585 Signed-off-by: Claus Lensbøl <claus@tailscale.com>	2026-04-29 18:56:57 -04:00
Jordan Whited	c0a9728fe2	derp/derpserver: fix Server.UpdateRateLimits docs As of `0e9f9e2bd` it is possible to have an infinite per-client limit, with a finite global limit. Updates tailscale/corp#40962 Signed-off-by: Jordan Whited <jordan@tailscale.com>	2026-04-29 14:43:12 -07:00
Jordan Whited	0e9f9e2bd8	derp/derpserver: support global rate limiting independent of per-client This commit enables the operator to set a global rate limit without any per-client limit. Updates tailscale/corp#40962 Signed-off-by: Jordan Whited <jordan@tailscale.com>	2026-04-29 14:15:53 -07:00
Brad Fitzpatrick	15cba0a3f6	tstest/natlab/vmtest: add TestDiscoKeyChange Add a vmtest that brings up two gokrazy nodes A and B behind two One2OneNAT networks (so direct UDP works in both directions and any slowness can't be blamed on NAT traversal), establishes a WireGuard tunnel A → B with TSMP, then rotates B's disco key four times and asserts that the data plane recovers in both directions after each rotation. All pings are TSMP (the data-plane ping; disco pings would not exercise the WireGuard tunnel itself). The five pings: 1. A → B (initial; brings up the tunnel; 30s budget) 2. B → A after rotate (LocalAPI rotate-disco-key debug action) 3. A → B after rotate (LocalAPI) 4. B → A after restart (SIGKILL; gokrazy supervisor respawns) 5. A → B after restart (SIGKILL) Each post-rotation ping gets a 15-second budget. Two unavoidable multi-second waits dominate today: - The rotate-then-a→b phase takes ~10s on main because of LazyWG. After B's WantRunning bounce, B's wgengine resets its sentActivityAt/recvActivityAt maps and trims A out of the wireguard-go config as an "idle peer"; B only re-adds A on inbound activity, by which point A's first few TSMP packets have been silently dropped at B's tundev. The bradfitz/rm_lazy_wg branch removes that trimming entirely (verified locally: this phase drops to <100ms there). - The restart phases take ~5s for wireguard-go's RekeyTimeout handshake retry. After SIGKILL+respawn the first WG handshake init from the restarted node sometimes goes into the void (likely the brief peer-removed window in the receiver's two-step maybeReconfigWireguardLocked reconfig during which the peer is absent from wireguard-go), and wg-go's 5s+jitter retransmit timer is the next opportunity to retry. That retry succeeds and the staged TSMP packet flushes. Intrinsic to the protocol's retransmit policy. Once LazyWG is removed and the first-handshake-after-reconfig race is fixed, the budget should drop to 5s. 
Supporting changes: ipn/ipnlocal: DebugRotateDiscoKey now toggles WantRunning off and back on after rotating the disco key. magicsock.Conn.RotateDiscoKey only resets local disco state; without also dropping wireguard-go session keys, peers keep encrypting with their stale per-peer session against us until their rekey timer fires (WireGuard has no data-plane signaling to invalidate sessions). Bouncing WantRunning runs the engine through Reconfig(empty) → authReconfig, which drops every peer's WG session so the next packet either way triggers a fresh handshake. ipn/ipnlocal, ipn/localapi: add a debug-only "peer-disco-keys" LocalAPI action ([LocalBackend.DebugPeerDiscoKeys]) that returns a map[NodePublic]DiscoPublic from the current netmap. Tests reach it via [local.Client.DebugResultJSON]. We do not surface disco keys via [ipnstate.PeerStatus] because adding a non-comparable [key.DiscoPublic] field there breaks reflect-based test helpers (e.g. TestFilterFormatAndSortExitNodes' use of cmp.Diff), and general LocalAPI clients have no need for disco keys. Since the debug LocalAPI is gated behind the ts_omit_debug build tag, this endpoint is automatically stripped from small binaries. cmd/tta: add /restart-tailscaled handler (Linux-only, via /proc walk) to drive the SIGKILL phase. On gokrazy the supervisor respawns tailscaled within a second. tstest/integration/testcontrol: add Server.AllOnline. When set, every peer entry in MapResponses is marked Online=true. Several disco-key handling fast paths in controlclient and wgengine (removeUnwantedDiscoUpdates, removeUnwantedDiscoUpdatesFromFullNetmapUpdate, the wgengine tsmpLearnedDisco fast path) only fire for online peers; without this flag, tests exercising disco-key rotation only hit the offline-peer code paths, which mask issues and are several seconds slower in this scenario. Finer-grained per-node online tracking can be added later.
tstest/natlab/vmtest: add Env.RotateDiscoKey, Env.RestartTailscaled, Env.PeerDiscoKey, Node.Name, an [AllOnline] EnvOption that plumbs through to testcontrol.Server.AllOnline, and an exported Env.Ping(from, to, type, timeout). Ping replaces the unexported helper so callers can specify both a ping type (PingDisco for warming peer state, PingTSMP for asserting end-to-end connectivity) and a deadline. PeerDiscoKey returns its LocalAPI error so callers inside tstest.WaitFor can retry transient failures rather than fataling the test. Updates #12639 Updates #13038 Change-Id: I3644f27fc30e52990ba25a3983498cc582ddb958 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-29 12:58:00 -07:00
Brad Fitzpatrick	22ff402da9	wgengine/magicsock: restore SetDERPMap signature, add SetDERPMapWithoutReSTUN Commit `78627c132f` changed the signature of magicsock.Conn.SetDERPMap to take an additional bool doReStun parameter. Avoid both the boolean parameter and the API signature change by restoring SetDERPMap to its original single-argument form and adding a new SetDERPMapWithoutReSTUN method for the cache-loading caller that wants to skip the post-set ReSTUN. Updates #19490 Change-Id: I97d9e82156bfc546ccf59756d1ea52f039b5de06 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-29 12:46:15 -07:00
Adriano Sela Aviles	1cd8bcc827	tailcfg: extend services model for client application actions Updates: tailscale/corp#40648 Signed-off-by: Adriano Sela Aviles <adriano@tailscale.com>	2026-04-29 11:33:13 -07:00
Brad Fitzpatrick	70f0b261b6	go.mod, gokrazy: bump to fork of gokrazy/gokrazy init process for syslog change When we switched to monogok in `371d6369cd`, we lost our gokrazy fork's change to let the syslog be configured from the Linux cmdline. That's sent upstream in gokrazy/gokrazy#275 but still in review. Meanwhile, revert to a fork, while still keeping monogok. Monogok was updated to support an alternate init package, which is now hosted temporarily at https://github.com/tailscale/ts-gokrazy This means we can rip the log polling loop out of pending PR #19568 and go back to using syslog. Updates #13038 Change-Id: I36931ee8eecc40d6165ad036c6181dfb07b86ba2 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-29 11:27:41 -07:00
Alex Valiushko	01d0bdd253	cmd/derper,derp: add metrics for rate limit hits (#19560 ) Expvars track count of rate limiters exceeding their threshold. Covers (1) global rate limiter and (2) total of local rate limiters. Also publish optional rate-limit metrics during ExpVar() call if -rate-config is specified. Fixes current rate-limit metrics being published outside of "derp" in /debug/vars. Updates tailscale/corp#38509 Change-Id: Ic7f5a1e890d0d7d3d7b679daa4b5f8926a6a6964 Signed-off-by: Alex Valiushko <alexvaliushko@tailscale.com>	2026-04-29 10:29:09 -07:00
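As a rough illustration of the expvar counters described in commit 01d0bdd253, the sketch below counts rate-limit hits in an `expvar.Map`. The counter names (`global_hits`, `local_hits_total`) and the helper functions are assumptions made up for this example; the real derper metric names and wiring may differ.

```go
package main

import (
	"expvar"
	"fmt"
)

// newRateLimitStats returns an unpublished expvar.Map holding hit counters.
// The key names used with it below are hypothetical, chosen for illustration.
func newRateLimitStats() *expvar.Map {
	return new(expvar.Map).Init()
}

// recordHit increments the counter for one kind of rate limiter.
func recordHit(stats *expvar.Map, kind string) {
	stats.Add(kind, 1)
}

// hits reads a counter back, returning 0 if it was never incremented.
func hits(stats *expvar.Map, kind string) int64 {
	if v, ok := stats.Get(kind).(*expvar.Int); ok {
		return v.Value()
	}
	return 0
}

func main() {
	stats := newRateLimitStats()
	recordHit(stats, "global_hits")
	recordHit(stats, "local_hits_total")
	recordHit(stats, "local_hits_total")
	// String renders the map as JSON, the same form served by /debug/vars.
	fmt.Println(stats.String())
}
```

Nesting such a map under an existing parent map (rather than calling `expvar.Publish` at the top level) is one way to keep the counters grouped with the rest of a subsystem's variables.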
Claus Lensbøl	be7cce74ba	wgengine/userspace: do not fall back to old key on tsmpLearned mismatch (#19575 ) The mismatch behaviour of falling back to a previous key could end up breaking connections when the netmap update took longer than the 2 seconds allowed in controlClient.auto for netmap updates, or if the controlClient context was canceled. This could end up breaking legitimate updates to the netmap for disco keys coming from control. Instead, log the event, and let the connection be reset to that of the key as that is safer. Issue found by @bradfitz. Updates #19574 Signed-off-by: Claus Lensbøl <claus@tailscale.com>	2026-04-29 13:23:04 -04:00
Brad Fitzpatrick	fd6ae2fad4	tstest/natlab/vmtest: serialize per-platform setup with sync.Once Two cloud-platform nodes (e.g. sr-a and sr-b in TestSiteToSite) boot in parallel via errgroup and both call ensureCompiled and the inline image preparation block, racing to Begin() the same shared Step (which is deduped by name in Env.Step). The second goroutine panics: panic: Step "Compile linux_amd64 binaries": Begin called in state running panic: Step "Prepare ubuntu-24.04 image": Begin called in state done ensureCompiled had a TOCTOU dedup attempt (released compileMu before doing the work, only added to the compiled set at the end), and image preparation had no dedup at all. Replace the compiled set with a per-key map[string]sync.Once for each of compile and image preparation, so concurrent callers serialize on the Once and only the first executes Begin/work/End. Fixes commit `02ffe5baa8`. Updates #13038 Change-Id: If710bcc9e0aafebf0ad5b61553bae11458d976d7 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-29 09:54:58 -07:00
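The per-key `sync.Once` deduplication described in commit fd6ae2fad4 can be sketched roughly as follows. The `onceMap` type and its method are hypothetical names for this example, and the sketch stores `*sync.Once` pointers (an assumption; a `sync.Once` held by value in a map is not addressable, so `Do` could not be called on it directly).

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// onceMap is a hypothetical helper that hands out one *sync.Once per key,
// so concurrent callers for the same key serialize on that Once.
type onceMap struct {
	mu sync.Mutex
	m  map[string]*sync.Once
}

// do runs f at most once per key. Callers that lose the race block inside
// Once.Do until the winner's f has returned, which avoids the
// check-then-act gap of "look in a set, unlock, do the work, add to set".
func (o *onceMap) do(key string, f func()) {
	o.mu.Lock()
	if o.m == nil {
		o.m = make(map[string]*sync.Once)
	}
	once, ok := o.m[key]
	if !ok {
		once = new(sync.Once)
		o.m[key] = once
	}
	o.mu.Unlock() // hold the map lock only for the lookup, not for the work
	once.Do(f)
}

// runConcurrently calls do for the same key from n goroutines and reports
// how many times the work function actually ran.
func runConcurrently(n int, key string) int64 {
	var om onceMap
	var ran atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			om.do(key, func() { ran.Add(1) })
		}()
	}
	wg.Wait()
	return ran.Load()
}

func main() {
	fmt.Println(runConcurrently(8, "compile linux_amd64")) // prints 1
}
```

Because every caller returns only after the single execution has finished, work that follows `do` can rely on the setup being complete, not merely started.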
Brad Fitzpatrick	02ffe5baa8	tstest/natlab/vmtest: add macOS VM snapshot caching for fast test starts Cache a pre-booted macOS VM snapshot on disk so subsequent test runs restore from the snapshot instead of cold-booting. The snapshot is keyed by the Tart base image digest and a code version constant (macOSSnapshotCodeVersion); bumping either invalidates the cache. Snapshot preparation (one-time): - Boot the Tart base image with a NAT NIC (--nat-nic flag) - Wait for SSH, compile and install cmd/tta as a LaunchDaemon - TTA polls the host via AF_VSOCK for an IP assignment; during prep the host replies "wait" - Disconnect NIC, save VM state via SIGINT Test fast path (cached, ~7s to agent connected): - APFS clone the snapshot, write test-specific config.json - Launch Host.app with --disconnected-nic --attach-network --assign-ip - VZ restores from SaveFile.vzvmsave (~5s with 4GB RAM) - TTA's vsock poll gets the IP config, sets static IP via ifconfig (bypasses DHCP entirely), switches driver addr to the IP directly (bypasses DNS), and resets the dial context so the reverse-dial reconnects immediately - TTA agent connects to test driver within ~2s of IP assignment Key optimizations: - 4GB RAM instead of 8GB: halves SaveFile.vzvmsave (1.4GB vs 2.4GB), halves restore time (5.5s vs 11s) - AF_VSOCK IP assignment: bypasses macOS DHCP (~5-7s saved) - Direct IP dial: bypasses DNS resolution for test-driver.tailscale - Dial context reset: cancels stale in-flight dials from snapshot - Kill instead of SIGINT for test VM cleanup (no state save needed) - Parallel VM launches Also: - Add TestDriverIPv4/TestDriverPort constants to vnet - Add --nat-nic and --assign-ip flags to Host.app - Fix SIGINT handler: retain DispatchSource globally, use dispatchMain() - Add vsock listener (port 51011) to Host.app for IP config protocol - Add disconnectNetwork() to VMController for clean snapshot state - Fix Makefile: set -o pipefail so xcodebuild failures aren't swallowed Updates #13038 Change-Id: Icbab73b57af7df3ae96136fb49cda2536310f31b Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-29 08:17:13 -07:00
M. J. Fromberger	7b53550fe6	control/controlclient: fix a nil-indirection bug in DERP key pruning (#19565 ) Upon deciding to update the LastSeen timestamp, we weren't checking that the field we are replacing into was non-nil. Rather than add an additional check, just allocate a fresh pointer for the updated time. Updates #19564 Change-Id: I589ebe65175fc7677c04a31dd6c4670e2531ee62 Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>	2026-04-29 07:57:38 -07:00
David Bond	a29e42135b	cmd/k8s-operator: add nodeSelector to `DNSConfig` resource (#19429 ) This commit modifies the `DNSConfig` resource to allow customisation of the `spec.nodeSelector` field in the nameserver pods. Closes: https://github.com/tailscale/tailscale/issues/19419 Signed-off-by: David Bond <davidsbond93@gmail.com>	2026-04-29 15:56:33 +01:00
Brad Fitzpatrick	4cec06b8f2	tstest/natlab/vmtest: add macOS VM screenshot streaming to web UI When --vmtest-web is set, Host.app is launched with --screenshot-port 0 to start a localhost HTTP server that captures the VZVirtualMachineView display. The Go test harness parses the SCREENSHOT_PORT=<port> line from stdout, then polls every 2 seconds for JPEG thumbnails and pushes them over WebSocket to the web dashboard. Clicking a screenshot thumbnail opens a full-resolution image proxied through the web UI's /screenshot/{node} endpoint. Screenshot events are excluded from the EventBus history (they're large and only the latest matters, stored in NodeStatus.Screenshot). Updates #13038 Change-Id: I9bc67ddd1cc72948b33c555d4be3d8db06a41f6d Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-29 07:48:26 -07:00
Claus Lensbøl	78627c132f	wgengine/magicsock,ipn/ipnlocal: store and load homeDERP from cache (#19491 ) With netmap caching, the home DERP of the self node was neither saved to the cache nor loaded from it, making nodes not stick to a DERP when starting without a connection to control. Instead, make sure that when a cache is available, load that cache, before looking for DERP servers. This is implemented by allowing a skip of ReSTUN in setting the DERP map (we must have a DERP map before setting the home DERP), so the DERP from cache will set itself and be sticky until a connection to control is established. Making DERP only change when connected to control is handled by existing code from `f072d017bd`. Updates #19490 Signed-off-by: Claus Lensbøl <claus@tailscale.com>	2026-04-29 10:24:09 -04:00
Alex Chan	1841a93ab2	ssh/tailssh: mark TestSSHRecordingCancelsSessionsOnUploadFailure as flaky (again) This test is still flaking on macOS, so mark it as such so we can track and investigate further. Updates #7707 Change-Id: I640da3c1068a90a9815caab2df9431bceb01f846 Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-04-29 14:22:09 +01:00

10591 Commits