tailscale

mirror of https://github.com/tailscale/tailscale.git synced 2026-05-29 19:18:57 -04:00

Author	SHA1	Message	Date
Brad Fitzpatrick	412c812d76	ipn/ipnlocal: use ACME ALPN for authorized Funnel non-CertDomain domains If a user explicitly adds a non-ts.net (not a CertDomain domain) domain like "foo.com" to their serve config as a web target that's also an allowed funnel domain (using raw "tailscale serve set-config"), then use the new ALPN cert fetching (from `b553969b`) to get certs for that domain. This is just plumbing; there's no new product functionality to actually enable this easily client-side, and it also has no visible product surface to enable it server-side. Updates tailscale/corp#41736 Change-Id: Ie2e421ac9611bce64bba3de6a454b2d505ea0e8a Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-28 13:33:45 -07:00
Alex Chan	9d126aec34	all: remove network lock references from private method names Updates tailscale/corp#37904 Change-Id: I312d46d958209ca3d1152d1877fb91a57c91798d Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-05-28 18:00:36 +01:00
Brendan Creane	8d90a6ab1e	ipn/ipnlocal: add HTTP/2 Content-Type tests for serve reverse proxy (#19905 ) Adds two tests exercising the HTTP/2-inbound -> plaintext HTTP/1.1 backend path through serve's reverseProxy and through the full serveWebHandler entry point (with a funnel serveHTTPContext). Updates #19866 Signed-off-by: Brendan Creane <bcreane@gmail.com>	2026-05-28 09:46:36 -07:00
Alex Chan	f4a280cdbd	all: update a few more references to network/tailnet lock Updates tailscale/corp#37904 Change-Id: I746b06328e080fa2b9ff28a2d099f95645aa3d0b Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-05-28 16:44:16 +01:00
Alex Chan	446ae97491	ipn: improve --exit-node hostname error during startup When parsing the `tailscale up --exit-node=ARG` argument, we try to resolve hostnames by searching the list of peers. However, at startup, the peer list is empty, causing hostname lookups to trivially fail with an unhelpful "invalid value" erorr. Improve the error message when the peer list is empty to inform the user that hostnames cannot be resolved during startup, and advise them to use the exit node's Tailscale IP address instead. Also, clarify that hostnames must be peer hostnames, not arbitrary hostnames. Fixes #19882 Change-Id: I9390a427c2863d657cf46c5e33b43cb3c5363764 Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-05-28 16:43:45 +01:00
Brad Fitzpatrick	c9fb05b6f5	ipn/ipnlocal: don't dup-suppress UserProfiles on IPNBus on profile switches Fixes #19889 Change-Id: I324a735c13772c0c79ed7392c0baa5064b34823b Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-27 14:47:02 -07:00
Brad Fitzpatrick	364b952d62	cmd/containerboot: track peers from IPN bus updates, stop using netmap.NetworkMap Some tests in another repo were broken by tailscale/tailscale#19607. This fixes them, by finishing off the rest of the migration away from netmap.NetworkMap on the IPN bus in containerboot. Containerboot used to rebuild a full NetworkMap-shaped view while reacting to IPN bus notifications. Now it insteads has its own netmapState type (immutable) of exactly what it needs to track, and sends those immutable values around, making cheap edits of new immutable values when an IPN bus edit arrives. This should make cmd/containerboot scale to much larger tailnets now too. Fixes #19852 Fixes tailscale/corp#42347 Updates #12542 Change-Id: I88adaf061f85f677f954a764935e6654329d75a6 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-27 14:12:48 -07:00
Brad Fitzpatrick	b553969b03	ipnlocal: try ACME TLS-ALPN for Funnel renewals Use TLS-ALPN-01 for Funnel certificate renewals only when the node already has a cached certificate, and fall back to DNS-01 with a fresh order if the ALPN path is unavailable or fails. Dynamically advertise acme-tls/1 only while an ACME challenge certificate is pending, and add client metrics for DNS-01 and TLS-ALPN-01 start/success/failure paths. Updates tailscale/corp#41736 Fixes tailscale/corp#42320 Change-Id: I5adc6ea129237f9ef592f84fc1a8953c80bc9d5c Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-27 09:30:23 -07:00
Brad Fitzpatrick	2c965ab540	types/netmap, ipn/ipnlocal, control/controlclient: rename NodeMutationAdd to NodeMutationUpsert NodeMutationAdd was a misleading name: a PeersChanged entry in a MapResponse can represent either a truly new peer or a full replacement for an existing peer that couldn't be expressed as a PeerChangedPatch. Calling it "Add" implied it was always a completely new node, which is wrong. (I'd changed my mind on the design of mapping add/delete events to NodeMutations halfway through #19607 and forgot to update the name, even though I'd updated half the docs) Rename it to NodeMutationUpsert to reflect the actual semantics: the node should be inserted or replaced in the peer map regardless of whether it already existed. Updates #19607 Updates #12542 Change-Id: Iebd3daddb3318cba02e115a1b184fcb3ee8f83d6 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-27 08:37:14 -07:00
Brad Fitzpatrick	a8f40a2ca5	ipn/ipnlocal: add missing bus notify of peers on full netmap The prior `aa5da2e5f2` ("process node adds/removes in constant time") commit missed a bus notification case, where new-style subscribers set NotifyNoNetmap and then the controlclient map routing sends a full update (rather than a delta). Those profiles + peers need to be put on the bus too. I noticed this only when porting the Android app over to use the new bus stuff. Updates #19607 Updates #12542 Change-Id: I82c35011d2c532222ca27f7d4e790522c31bd156 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-27 08:03:47 -07:00
Jordan Whited	e5a8cf3b18	control/controlknobs,feature/*,ipn/ipnlocal,tailcfg: add runtimemetrics Emit runtime metrics as clientmetrics when the NodeAttrEmitRuntimeMetrics NodeCapability is present. We start small with just 2 metrics: heap bytes and total process memory. Updates tailscale/corp#39434 Signed-off-by: Jordan Whited <jordan@tailscale.com>	2026-05-26 16:02:01 -07:00
Simon Law	da8cd5cc7f	ipn/ipnlocal: fix documentation typo, NodeAttrCacheNetworkMaps (#19851 ) Updates #cleanup Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-22 22:19:10 -07:00
Simon Law	988615dbad	ipn/ipnlocal,tstest/integration: pause the control client consistently (#19846 ) There are two places where tailscaled transitions into a paused state: 1. tailscaled’s controlclient is initially created, 2. tailscale down, or the GUI equivalent, commands it to. This patch unifies the implementation of both scenarios into LocalBackend.shouldPauseControlClientLocked to prevent the implementation from drifting. The flaky tstest/integration.TestNoControlConnWhenDown test exposed this mismatch, but only by accident. This patch also changes TestNode.MustDown so that it runs `tailscale down` and then waits for the testcontrol server to finish handling any associated /machine/map requests. Fixes #19831 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-22 17:58:44 -07:00
Brad Fitzpatrick	5295e3e119	ipn/{ipnstate,ipnlocal}: add integer NodeID to PeerStatus In `aa5da2e5f2` we made the IPN bus include deltas, including the PeersRemoved, sending a slice of integer NodeIDs that were removed. But when updating xcode, I realized there was no way to map those integers to the stable node IDs used in other places. I was consdering changing the just-added ipn.Notify.PeersRemoved from an IntID to a string StableID, but then it doesn't match the MapResponse wire protocol, which we've tried to match so far. Instead, just add the integer ID as well. Callers can use whichever world they want, having both. It's a little regrettable that we still have two worlds of IDs, but oh well. Neither is really suitable to a hypothetical future fully federated world of control servers anyway, so we'll need a third type later anyway, so just live with the two we have for now. Updates #12542 Change-Id: Ib8fd48a265e1da1f8779152f141f624a7f7260e9 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-22 08:16:55 -07:00
Simon Law	7dabebc691	net/traffic: switch rendezvous hashing from SHA256 to FNV-1a (#19821 ) In PR tailscale/corp#30448, we originally decided to break ties using SHA256 for our rendezvous hashing algorithm. Now that we’ve had some experience with it, we think that FNV-1a is a better choice. It distributes bits evenly, it’s much faster, and it doesn’t need to be cryptographically secure. The FNV designers recommend FNV-1a over the deprecated FNV-1. This PR makes the switch and updates the related tests, since changing the algorithm changes which stable pick gets selected. As of 2026-05, this is the best time to make this change, since there are almost no clients in the wild with traffic steering enabled. Updates #17366 Updates tailscale/corp#29964 Updates tailscale/corp#29966 Updates tailscale/corp#33033 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-21 10:11:59 -07:00
Brad Fitzpatrick	aa5da2e5f2	ipn/ipnlocal, control/controlclient: process node adds/removes in constant time For large tailnets (~50k+ nodes) with frequent peer churn (ephemeral GitHub Actions workers etc.), tailscaled used to rebuild the full netmap and fan it out on the IPN bus on every MapResponse that added or removed a peer. There were two O(N) costs per delta: the full netmap rebuild + every Notify.NetMap encode to every bus watcher. This change tackles both: 1. Plumb O(1) peer add/remove through the delta path. PeersChanged and PeersRemoved no longer prevent the delta happy path; instead, they mutate the per-node-backend peer map in place. 2. Restrict ipn.Notify.NetMap emission to the platforms whose host GUIs still depend on it (Windows, macOS, iOS) and migrate in-tree consumers off it everywhere else: - Migrate reactive consumers (containerboot, kube agents, sniproxy, tsconsensus, etc.) off Notify.NetMap to the previously-added Notify.SelfChange signal so they no longer have to subscribe to the full netmap. - Add ipn.NotifyNoNetMap so GUI clients on "legacy-emit" platforms that have already migrated can opt out of the per-watcher NetMap encode. - Gate Notify.NetMap emission on the producer side by a compile- time GOOS check, so the supporting code is dead-code-eliminated on Linux and other geese where no GUI consumer needs it. Re-running BenchmarkGiantTailnet from tstest/largetailnet, which was added along with baseline numbers on unmodified main in `ad5436af0d`, the per-delta cost (one peer add+remove pair) is now ~O(1) regardless of tailnet size N: N no-watcher (ms/op) bus-watcher (ms/op) before now factor before now factor 10000 32 0.11 300x 166 0.13 1300x 50000 222 0.11 2000x 865 0.13 6700x 100000 504 0.12 4100x 1765 0.13 13400x 250000 1551 0.12 12500x 4696 0.15 32400x Updates #12542 Change-Id: I94e34b37331d1a8ec74c299deffadf4d061fda9e Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-21 09:26:19 -07:00
Simon Law	7ebca58042	net/traffic,ipn/ipnlocal: extract traffic steering utilities (#19682 ) The traffic package contains helpers for evaluating traffic steering scores and picking appropriate nodes. These were extracted from ipnlocal.suggestExitNodeUsingTrafficSteering so they can be reused by the new routecheck package to probe exit nodes in priority order. Updates #17366 Updates tailscale/corp#33033 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-21 08:28:27 -07:00
Brad Fitzpatrick	f3a117e813	net/tsdial: run happy eyeballs across A and AAAA in UserDial When tailscaled is running in userspace-networking mode behind an exit node (e.g. as a SOCKS5 proxy), it resolves a hostname and then dials a single resolved IP through the tunnel. If the name has both A and AAAA, Go's net.Resolver merges them and we pick ips[0], which on an IPv6-native host is usually AAAA. If the exit node has no IPv6 egress (or vice versa), the dial fails silently through the tunnel and the user sees a hang. Resolve all candidates and race connect attempts across address families with a 300ms happy-eyeballs delay, matching Go's net.Dialer default and the existing pattern in net/dnscache (commit `ee0a03b14`). First success wins; losers are cancelled and any conns they produce are closed. A failBoost channel wakes the launcher when a connect fails fast (e.g. ICMP "no route" via the tunnel) so we don't sit on the 300ms timer when the answer is already known. userDialResolve is refactored into userDialResolveAll (returns the full candidate list) plus a thin single-IP wrapper for callers like UserDialPlan that don't race. UserDial's per-IP dispatch (netstack vs peer dialer vs SystemDial vs std) is extracted to dialOneUser so each candidate can route correctly on its own merits. Also fix serveDial in localapi to pass the original hostname to UserDial rather than a pre-resolved IP, so the race can fire. This fix is single-ended: it works against any exit node, including old ones, with no protocol changes. The trade-off versus filtering on the exit-node side via PeerAPI DoH is that every dial through an unreachable-family exit node costs one failed connect attempt per cache window, rather than zero, which is acceptable given the simplicity. Fixes #19792 Fixes #13257 Change-Id: I9d7645d0034caf3ee22ecdd8070798353f77e94b Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-20 18:35:55 -07:00
M. J. Fromberger	c09407002f	ipn/ipnlocal/netmapcache: add UpdateSelfOnly method (#19818 ) Some netmap updates are guaranteed to affect only the "static" parts of the netmap, and so should not require us to walk through all the peers and user profiles when updating the cache. To support this, the new UpdateSelfOnly method updates only the Self node and other tailnet settings that are not dependent on the peers and profiles. Use this when updating the cache on DERP home changes. Updates #12542 Change-Id: Ifed522b29d579fb76e010b4ff738cc4e0a72d27f Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>	2026-05-20 16:29:04 -07:00
Simon Law	93dbd33ef7	ipn/ipnlocal: stub system interfaces for TestShouldUseOneCGNATRoute (#19807 ) The TestShouldUseOneCGNATRoute test fails when the underlying system interfaces don’t match what the underlying assumptions of the test. That assumption was that there would only ever be one CGNAT interface: the Tailscale one. This breaks on Linux when border0 is installed because border0 also creates an interface with a CGNAT route. This patch stubs netmon.RegisterInterfaceGetter to replace the system interfaces and netmon.SetTailscaleInterfaceProps to identify the test data that defines the Tailscale interface. This patch also tests the control knob override for CGNAT for every combination of operating system and system interfaces, instead of just a couple of combinations. Fixes #19731 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-20 16:00:14 -07:00
Alex Chan	0cb432ed84	all: update more references to Tailnet/Network Lock Updates tailscale/corp#37904 Change-Id: I09e73b3248b9ddf86dafe33dfb621bd560f6596d Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-05-15 16:23:50 +01:00
Adriano Sela Aviles	41286c2b56	ipn/ipnlocal,tsd: add NoiseRoundTripper to tsd.Sys Adds a new NoiseRoundTripper field to tsd.Sys to expose an http.RoundTripper to make requests over the control plane Noise connection. This will be used in PAM use cases soon. Updates tailscale/corp#41800 Signed-off-by: Adriano Sela Aviles <adriano@tailscale.com>	2026-05-13 14:56:28 -07:00
Simon Law	6467f0d067	ipn/ipnlocal: fix minor typo in shouldUseOneCGNATRoute (#19719 ) This fixes a log message where ipn/ipnlocal.shouldUseOneCGNATRoute would claim that an android machines was actually macOS. Updates #cleanup Updates #19652 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-12 21:55:29 -07:00
Adriano Sela Aviles	72578de033	ipn/{ipnlocal,localapi},client/local: add per-dst cap resolution for services Adds two new cap resolution methods alongside the existing PeerCaps: PeerCapsForService(src netip.Addr, svcName tailcfg.ServiceName) resolves the service name to its VIP addresses via the node's service IP mappings and returns caps scoped to that service. Exposed on /v0/whois via the svc_name query parameter and on client/local.Client as WhoIsForService. PeerCapsForIP(src, dst netip.Addr) resolves caps against an arbitrary destination IP. Exposed on /v0/whois via the svc_addr query parameter and on client/local.Client as WhoIsForIP. svc_name takes priority over svc_addr when both are present. Invalid values for either return 400. The existing PeerCaps/WhoIs path is unchanged: without a service parameter, WhoIs returns only host-level caps. Updates tailscale/corp#41632 Signed-off-by: Adriano Sela Aviles <adriano@tailscale.com>	2026-05-12 15:50:39 -07:00
M. J. Fromberger	9f48567bf1	ipn/ipnlocal,wgengine/magicsock: add basic counters for cached peer connectivity (#19699 ) Add new clientmetric counters for establishing contact with peers while using cached network map data. To do this, instrument the magicsock.Conn with a bit to indicate whether its peer data came from a cached netmap. If so, there are two conditions we will count as establishing connectivity to a peer: - Receipt of a CallMeMaybe from a peer via disco. - Establishing a valid endpoint address for a peer. In vmtest, add Env.ClientMetrics to scrape metrics from the specified node. Use this to check that counters were updated in caching tests. Updates https://github.com/tailscale/projects/issues/13 Updates #12639 Change-Id: Ie8cf3244ac8af4f5bcfe4d0d944078da2ba08990 Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>	2026-05-12 12:01:05 -07:00
Alex Chan	d6ffc0d986	tka,ipn: reduce boilerplate in Tailnet Lock tests The `CreateStateForTest` helper reduces boilerplate in cases where the test only cares about the trusted keys and not the disablement values (and makes it more obvious where the disablement values are meaningful). The `setupChonkStorage` helper reduces the boilerplate when creating on-disk TKA storage in tests. The `fakeLocalBackend` helper reduces the boilerplate when setting up a `LocalBackend` instance in the IPN tests. Updates #cleanup Change-Id: Iacfba1be5f7fab208eec11e4369d63c7d7519da5 Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-05-07 21:49:27 +01:00
kari-ts	c721189cef	ipn/ipnlocal: prefer one CGNAT route on Android (#19652 ) Android rebuilds its VpnService interface when the VPN route configuration changes, which tears down long lived TCP connections through the tunnel. Use the same automatic OneCGNATRoute behavior as macOS on Android, and prefer the single CGNAT route when no other interface is using the CGNAT, falling back to fine grained peer routes otherwise. Updates tailscale/tailscale#19591 Signed-off-by: kari <kari@tailscale.com>	2026-05-05 19:11:17 -07:00
Brad Fitzpatrick	89a78dc9b7	client/local, ipn/localapi, ipn/ipnlocal: add PeerByID Add a narrow LocalAPI accessor and matching client/LocalBackend method to look up a single peer's current full [tailcfg.Node] by NodeID, in O(1) time on the daemon side, without fetching the entire netmap. Useful for callers that need the latest state of a single peer (e.g. in response to a peer-mutation event on the IPN bus) without paying for a full netmap fetch. Updates #12542 Change-Id: I1cb2d350e6ad846a5dabc1f5368dfc8121387f7c Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-01 06:20:46 -07:00
Alex Chan	cac94f51cc	ipn/ipnlocal: don't compact TKA state on startup Compacting on startup means nodes may compact at a different cadence based on whether they're long-running or restarting frequently. We already compact after every sync, which only occurs when the TKA state has changed. Waiting for TKA changes to trigger compaction on nodes means compaction will occur more consistently across a tailnet. Updates tailscale/corp#33537 Change-Id: Ia0aa6d9e5e362e9ab08450fde69772841790d5b5 Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-05-01 13:27:12 +01:00
Brad Fitzpatrick	a6c5d23742	ipn, ipn/ipnlocal: add Notify.SelfChange Add a new bus signal that lets reactive consumers (containerboot, kube agents, sniproxy, tsconsensus, etc.) react to self-node updates without having to subscribe to the full netmap. Today those consumers either watch Notify.NetMap (which on large tailnets is expensive to encode and ship per watcher) or poll. SelfChange is a cheap, narrow alternative: addresses, name, key expiry, capabilities, etc. Consumers that need additional state can react to SelfChange and then fetch the relevant bits on demand via existing LocalClient methods. Producer-side, every netmap-bearing setControlClientStatus call now also publishes SelfChange. Future changes will migrate individual in-tree consumers off Notify.NetMap to this signal, and eventually gate the legacy NetMap emission to platforms whose host GUIs still require it. Updates #12542 Change-Id: I4441650b0e085d663eb6bf26a03748b7d961ca49 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-30 14:47:03 -07:00
Brad Fitzpatrick	9f343fdc0c	client/local, ipn/localapi, all: add CertDomains and DNSConfig accessors Add two narrow LocalAPI accessors so callers don't have to subscribe to the IPN bus and pull a full *netmap.NetworkMap just to read DNS-shaped fields: - GET /localapi/v0/cert-domains returns DNS.CertDomains. - GET /localapi/v0/dns-config returns the full tailcfg.DNSConfig. Migrate in-tree callers off the netmap-on-the-bus pattern: - kube/certs.waitForCertDomain still wakes on the IPN bus but now queries CertDomains via LocalClient.CertDomains rather than reading n.NetMap.DNS.CertDomains. The kube LocalClient interface and FakeLocalClient gain a CertDomains method. - cmd/tailscale dns status calls LocalClient.DNSConfig directly instead of opening a NotifyInitialNetMap watcher. - cmd/tailscale configure kubeconfig switches from a netmap watcher + serviceDNSRecordFromNetMap to LocalClient.DNSConfig + serviceDNSRecordFromDNSConfig. This is part of a series moving callers away from depending on the netmap traveling on the IPN bus, so the bus payload can shrink in a later change. Updates #12542 Change-Id: Ie10204e141d085fbac183b4cfe497226b670ad6c Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-30 13:50:46 -07:00
Brad Fitzpatrick	159cf8707a	ipn/ipnlocal, all: split LocalBackend.NetMap into NetMapNoPeers / NetMapWithPeers Add two narrower accessors alongside the existing [LocalBackend.NetMap], with docs that distinguish their semantics: - NetMapNoPeers: cheap (returns the cached *netmap.NetworkMap with a possibly-stale Peers slice). For callers that only read non-Peers fields like SelfNode, DNS, PacketFilter, capabilities. - NetMapWithPeers: documented as returning an up-to-date Peers slice. For callers that genuinely need to iterate Peers or call PeerByXxx. Mark the existing NetMap deprecated and point readers at the two new accessors. NetMap, NetMapNoPeers, and NetMapWithPeers all currently return the same value (b.currentNode().NetMap()): this commit is a no-op behaviorally, just a renaming and migration of in-tree callers. A subsequent change in the same series will switch NetMapWithPeers to actually rebuild the Peers slice from the live per-node-backend peers map (O(N) per call), at which point the distinction between the two new accessors becomes load-bearing. Migrate in-tree callers to the appropriate accessor based on what fields they read: - NetMapNoPeers (most common): localapi handlers, peerapi accept, GetCertPEMWithValidity, web client noise request, doctor DNS resolver check, tsnet CertDomains/TailscaleIPs, ssh/tailssh SSH-policy/cap reads, several LocalBackend internals (isLocalIP, allowExitNodeDNSProxyToServeName, pauseForNetwork nil-check, serve config). - NetMapWithPeers: writeNetmapToDiskLocked (persist full netmap to disk for fast restart), PeerByTailscaleIP lookup. Tests still call the legacy NetMap; they'll see the deprecation warning but otherwise behave identically. Also add two pieces of plumbing the next change in this series will need, but which are already useful on their own: - [client/local.GetDebugResultJSON]: a generic [Client.DebugResultJSON] that decodes directly into a target type T, avoiding the marshal/unmarshal roundtrip callers otherwise need. - localapi "current-netmap" debug action: returns the current netmap (with peers) as JSON. Documented as debug-only — the netmap.NetworkMap shape is internal and may change without notice. This commit is part of a series breaking up a larger change for review; on its own it is a no-op refactor. Updates #12542 Change-Id: Idbb30707414f8da3149c44ca0273262708375b02 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-30 11:14:06 -07:00
Brad Fitzpatrick	f343b496c3	wgengine, all: remove LazyWG, use wireguard-go callback API for on-demand peers Replace the UAPI text protocol-based wireguard configuration with wireguard-go's new direct callback API (SetPeerLookupFunc, SetPeerByIPPacketFunc, RemoveMatchingPeers, SetPrivateKey). Instead of computing a trimmed wireguard config ahead of time upon control plane updates and pushing it via UAPI, install callbacks so wireguard-go creates peers on demand when packets arrive. This removes all the LazyWG trimming machinery: idle peer tracking, activity maps, noteRecvActivity callbacks, the KeepFullWGConfig control knob, and the ts_omit_lazywg build tag. For incoming packets, PeerLookupFunc answers wireguard-go's questions about unknown public keys by looking up the peer in the full config. For outgoing packets, PeerByIPPacketFunc (installed from LocalBackend.lookupPeerByIP) maps destination IPs to node public keys using the existing nodeByAddr index. Updates tailscale/corp#12345 Change-Id: I4cba80979ac49a1231d00a01fdba5f0c2af95dd8 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-29 19:46:19 -07:00
Claus Lensbøl	978b6a81b2	ipn/ipnlocal: always ReSTUN when starting up without a cache (#19586 ) `78627c1` introduced starting up and preserving the DERP server from cache, but also changed it so the initial ReSTUN would not fire when setting the DERPMap. Change this so when not working from a cache, the ReSTUN will always fire during startup. Updates #19585 Signed-off-by: Claus Lensbøl <claus@tailscale.com>	2026-04-29 18:56:57 -04:00
Brad Fitzpatrick	15cba0a3f6	tstest/natlab/vmtest: add TestDiscoKeyChange Add a vmtest that brings up two gokrazy nodes A and B behind two One2OneNAT networks (so direct UDP works in both directions and any slowness can't be blamed on NAT traversal), establishes a WireGuard tunnel A → B with TSMP, then rotates B's disco key four times and asserts that the data plane recovers in both directions after each rotation. All pings are TSMP (the data-plane ping; disco pings would not exercise the WireGuard tunnel itself). The five pings: 1. A → B (initial; brings up the tunnel; 30s budget) 2. B → A after rotate (LocalAPI rotate-disco-key debug action) 3. A → B after rotate (LocalAPI) 4. B → A after restart (SIGKILL; gokrazy supervisor respawns) 5. A → B after restart (SIGKILL) Each post-rotation ping gets a 15-second budget. Two unavoidable multi-second waits dominate today: - The rotate-then-a→b phase takes ~10s on main because of LazyWG. After B's WantRunning bounce, B's wgengine resets its sentActivityAt/recvActivityAt maps and trims A out of the wireguard-go config as an "idle peer"; B only re-adds A on inbound activity, by which point A's first few TSMP packets have been silently dropped at B's tundev. The bradfitz/rm_lazy_wg branch removes that trimming entirely (verified locally: this phase drops to <100ms there). - The restart phases take ~5s for wireguard-go's RekeyTimeout handshake retry. After SIGKILL+respawn the first WG handshake init from the restarted node sometimes goes into the void (likely the brief peer-removed window in the receiver's two-step maybeReconfigWireguardLocked reconfig during which the peer is absent from wireguard-go), and wg-go's 5s+jitter retransmit timer is the next opportunity to retry. That retry succeeds and the staged TSMP packet flushes. Intrinsic to the protocol's retransmit policy. Once LazyWG is removed and the first-handshake-after-reconfig race is fixed, the budget should drop to 5s. Supporting changes: ipn/ipnlocal: DebugRotateDiscoKey now toggles WantRunning off and back on after rotating the disco key. magicsock.Conn.RotateDiscoKey only resets local disco state; without also dropping wireguard-go session keys, peers keep encrypting with their stale per-peer session against us until their rekey timer fires (WireGuard has no data-plane signaling to invalidate sessions). Bouncing WantRunning runs the engine through Reconfig(empty) → authReconfig, which drops every peer's WG session so the next packet either way triggers a fresh handshake. ipn/ipnlocal, ipn/localapi: add a debug-only "peer-disco-keys" LocalAPI action ([LocalBackend.DebugPeerDiscoKeys]) that returns a map[NodePublic]DiscoPublic from the current netmap. Tests reach it via [local.Client.DebugResultJSON]. We do not surface disco keys via [ipnstate.PeerStatus] because adding a non-comparable [key.DiscoPublic] field there breaks reflect-based test helpers (e.g. TestFilterFormatAndSortExitNodes' use of cmp.Diff), and general LocalAPI clients have no need for disco keys. Since the debug LocalAPI is gated behind the ts_omit_debug build tag, this endpoint is automatically stripped from small binaries. cmd/tta: add /restart-tailscaled handler (Linux-only, via /proc walk) to drive the SIGKILL phase. On gokrazy the supervisor respawns tailscaled within a second. tstest/integration/testcontrol: add Server.AllOnline. When set, every peer entry in MapResponses is marked Online=true. Several disco-key handling fast paths in controlclient and wgengine (removeUnwantedDiscoUpdates, removeUnwantedDiscoUpdatesFromFull NetmapUpdate, the wgengine tsmpLearnedDisco fast path) only fire for online peers; without this flag, tests exercising disco-key rotation only hit the offline-peer code paths, which mask issues and are several seconds slower in this scenario. Finer-grained per-node online tracking can be added later. tstest/natlab/vmtest: add Env.RotateDiscoKey, Env.RestartTailscaled, Env.PeerDiscoKey, Node.Name, an [AllOnline] EnvOption that plumbs through to testcontrol.Server.AllOnline, and an exported Env.Ping(from, to, type, timeout). Ping replaces the unexported helper so callers can specify both a ping type (PingDisco for warming peer state, PingTSMP for asserting end-to-end connectivity) and a deadline. PeerDiscoKey returns its LocalAPI error so callers inside tstest.WaitFor can retry transient failures rather than fataling the test. Updates #12639 Updates #13038 Change-Id: I3644f27fc30e52990ba25a3983498cc582ddb958 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-29 12:58:00 -07:00
Brad Fitzpatrick	22ff402da9	wgengine/magicsock: restore SetDERPMap signature, add SetDERPMapWithoutReSTUN Commit `78627c132f` changed the signature of magicsock.Conn.SetDERPMap to take an additional bool doReStun parameter. Avoid both the boolean parameter and the API signature change by restoring SetDERPMap to its original single-argument form and adding a new SetDERPMapWithoutReSTUN method for the cache-loading caller that wants to skip the post-set ReSTUN. Updates #19490 Change-Id: I97d9e82156bfc546ccf59756d1ea52f039b5de06 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-29 12:46:15 -07:00
Claus Lensbøl	78627c132f	wgengine/magicsock,ipn/ipnlocal: store and load homeDERP from cache (#19491 ) With netmap caching, the home DERP of the self node was neither saved to the cache or loaded from it, making nodes not stick to a DERP when starting without a connection to control. Instead, make sure that when a cache is available, load that cache, before looking for DERP servers. This is implemented by allowing a skip of ReSTUN in setting the DERP map (we must have a DERP map before setting the home DERP), so the DERP from cache will set itself and be sticky until a connection to control is established. Making DERP only change when connected to control is handled by existing code from `f072d017bd`. Updates #19490 Signed-off-by: Claus Lensbøl <claus@tailscale.com>	2026-04-29 10:24:09 -04:00
Alex Chan	bb91bb842c	all: remove everything related to non-seamless key renewal Seamless key renewal has been the default in all clients since 1.90. We retained the ability to disable it from the control plane as a precaution, but we haven't seen any issues that require us to disable it. We're now removing all the code for non-seamless key renewal, because we don't expect to turn it on again, and indeed it's been untested in the field for three releases so might contain latent bugs! Updates tailscale/corp#33042 Change-Id: I4b80bf07a3a50298d1c303743484169accc8844b Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-04-29 10:03:26 +01:00
Brad Fitzpatrick	ad5436af0d	tstest/largetailnet, tstest/integration/testcontrol: add in-process large-tailnet benchmark Add a Go benchmark that exercises a single tailnet client (a [tsnet.Server] running in the test process) against a synthetic large initial netmap and a stream of caller-driven peer add/remove deltas, all in-process. The harness is split in two parts: - tstest/largetailnet, a reusable package containing a [Streamer] that hijacks the map long-poll on a [testcontrol.Server] via the new AltMapStream hook, sends one initial MapResponse with N synthetic peers, and forwards caller-supplied delta MapResponses on the same stream. Helpers like MakePeer / AllocPeer build synthetic peers with unique IDs and addresses derived from the Tailscale ULA range. - tstest/largetailnet/largetailnet_test.go, BenchmarkGiantTailnet (headless tailscaled workload, no IPN bus subscriber) and BenchmarkGiantTailnetBusWatcher (GUI-client workload with one Notify subscriber attached). Both are gated on --actually-test-giant-tailnet (skipped by default), stand up an in-process testcontrol + tsnet.Server, let Up block until the initial N-peer netmap has been processed, then ResetTimer and run add+remove pairs via b.Loop. Per-delta sync is via a test-only [ipnlocal.LocalBackend.AwaitNodeKeyForTest] channel that closes once the just-added peer key appears in the netmap (no-watcher variant) or via bus-Notify drain (bus-watcher variant). To support the hijack, [testcontrol.Server] grows an AltMapStream hook and a small MapStreamWriter interface for benchmarks/stress tests that need to drive a controlled MapResponse sequence; the normal serveMap path is untouched when AltMapStream is nil. The streamer answers non-streaming "lite" map polls (which controlclient issues before the streaming long-poll to push HostInfo) with an empty MapResponse and returns immediately, so the streaming poll that follows is the one that gets the initial netmap. The benchmark is intended for before/after comparisons of netmap- and delta-handling changes targeted at large tailnets. CPU profiles on unmodified main show the expected O(N) hotspots: setControlClientStatusLocked / authReconfigLocked / userspaceEngine.Reconfig / setNetMapLocked, plus JSON encoding of the full Notify.NetMap to bus watchers (which dominates the BusWatcher variant). Median ms/op over 10 runs on unmodified main, by tailnet size N: N no-watcher bus-watcher 10000 32 166 50000 222 865 100000 504 1765 250000 1551 4696 Recommended invocation: go test ./tstest/largetailnet/ -run=^$ \ -bench='BenchmarkGiantTailnet(BusWatcher)?$' \ -benchtime=2000x -timeout=10m \ --actually-test-giant-tailnet \ --giant-tailnet-n=250000 \ -cpuprofile=/tmp/giant.cpu.pprof Updates #12542 Change-Id: I4f5b2bb271a36ba853d5a0ffe82054ef2b15c585 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-27 11:47:12 -07:00
Brad Fitzpatrick	0e10a3f580	net/tsdial, ipn/localapi, client/local: let clients dial non-Tailscale addresses directly Add a tsdial.Dialer.UserDialPlan method that resolves an address and reports whether the dialer would route it via Tailscale. The LocalAPI /dial handler now uses this to skip proxying for addresses that aren't Tailscale routes (e.g. localhost), returning a Dial-Self response with the resolved address so the client can dial it directly. This avoids an unnecessary round-trip through the daemon for local connections. The client's UserDial handles the new response by dialing the resolved address itself, and the server passes the pre-resolved IP:port for Tailscale dials to avoid redundant DNS lookups. Thanks to giacomo and Moyao for pointing this out! Updates tailscale/corp#39702 Change-Id: I78d640f11ccd92f43ddd505cbb0db8fee19f43a6 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-04-27 09:33:27 -07:00
Evan Lowry	3a05c450ce	posture: add HealthTracker for serial number retrieval (#19181 ) Device posture checking can fail while enabled if tailscaled does not have access to smbios. Previously, this was only observable by looking in the tailscaled logs. Fixes tailscale/corp#39314 Signed-off-by: Evan Lowry <evan@tailscale.com>	2026-04-25 15:42:47 -03:00
kari-ts	aa740cb393	ipnlocal/drive: reduce noisey per-peer remote logs (#19493 ) This drops the per peer "appending remote" log while constructing the remote list, which can get noisy on big tailnets, and keeps logs around remote availability checks, including whether a peer is missing, offline, lacks PeerAPI reachability, lacks sharing permission, or is available. Updates tailscale/corp#40580 Signed-off-by: kari-ts <kari@tailscale.com>	2026-04-24 08:26:33 -07:00
James 'zofrex' Sanderson	36f094ea3b	ipn/ipnlocal: deflake TestStateMachine{,Seamless} (#19475 ) Remove the remaining known sources of flakiness in TestStateMachine and TestStateMachineSeamless. Updates tailscale/corp#36230 Updates #19377 Signed-off-by: James Sanderson <jsanderson@tailscale.com>	2026-04-22 10:22:47 +01:00
James 'zofrex' Sanderson	ffae275d4d	ipn/ipnlocal,tailcfg: add /debug/tka c2n endpoint (#19198 ) Updates tailscale/corp#35015 Signed-off-by: James Sanderson <jsanderson@tailscale.com>	2026-04-20 16:00:03 +01:00
James 'zofrex' Sanderson	ec86f0ff93	ipn/ipnlocal: make TestStateMachine less flaky (#19434 ) TestStateMachine & TestStateMachineSeamless both flake a lot asserting the "Shutdown" call on cc after a Logout. This is because Shutdown is called on a goroutine to avoid a deadlock if it's called while holding the LocalBackend lock (#18052). This fixes that cause of flakes by waiting for LocalBackend's goroutine tracker to have no goroutines running (so the goroutine that calls Shutdown must have finished). This does not make TestStateMachine non-flaky because it can flake later in the test, too: the assertion on "unpause" after clearing the netmap between "Start4" and "Start4 -> netmap" sometimes fails. Updates tailscale/corp#36230 Updates #19377 Updates #18052 Signed-off-by: James Sanderson <jsanderson@tailscale.com>	2026-04-20 15:58:21 +01:00
Alex Chan	cf76202aa3	ipn/ipnlocal: log the local and remote TKA HEADs during sync Update this log message to show both the local and remote TKA HEAD; this is useful for debugging issues on nodes that have fallen behind the remote TKA HEAD. Updates tailscale/corp#39455 Change-Id: Ia62ce15756180d2fbac4a898fb94d6143df08b54 Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-04-19 16:52:48 +01:00
Scott Graham	cb5a53c424	ipn/ipnlocal: preserve b.loginFlags in auto-login cc.Login calls LocalBackend stores loginFlags at construction so that per-instance properties (e.g. LoginEphemeral set by tsnet.Server.Ephemeral) persist for the session. StartLoginInteractiveAs already merges b.loginFlags into its cc.Login call, but the two auto-login call sites pass bare controlclient.LoginDefault, silently dropping any stored flags. Merge b.loginFlags at both auto-login call sites to match the existing StartLoginInteractiveAs pattern. LoginDefault is zero so this is a no-op when loginFlags is empty, and restores the documented behavior when it isn't. Fixes #15852 Signed-off-by: Scott Graham <scott.github@h4ck3r.net>	2026-04-17 23:31:18 -05:00
Michael Ben-Ami	1dc08f4d41	appc,feature/conn25: prevent clients from forwarding DNS requests and modifying DNS responses for domains they are also connectors for For Connectors 2025, determine if a client is configured as a connector and what domains it is a connector for. When acting as a client, don't install Split DNS routes to other connectors for those domains, and don't alter DNS responses for those domains. The responses are forwarded back to the original client, which in turn does the alteration, swapping the real IP for a Magic IP. A client is also a connector for a domain if it has tags that overlap with tags in the configured policy, and --advertise-connector=true in the prefs (not in the self-node Hostinfo from the netmap). We use the prefs as the source of truth because control only gets a copy from the prefs, and may drift. And the AppConnector field is currently zeroed out in the self-node Hostinfo from control. The extension adds a ProfileStateChange hook to process prefs changes, and the config type is split into prefs and nodeview sub-configs. Fixes tailscale/corp#39317 Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>	2026-04-16 09:41:54 -04:00
Alex Chan	4f47c3c93d	ipn/ipnlocal: log AUM hash on startup as base32, not hex Before: tka initialized at head 325557575a59525354484e4a534f494b4c4e56575435583737564b5036584c4d4c335534554255344c344c36484c5a444a323341 After: tka initialized at head 2UWWZYRSTHNJSOIKLNVWT5X77VKP6XLML3U4UBU4L4L6HLZDJ23A Printing the AUM hash as hex makes it difficult to compare to other AUM hashes; stringifying it will make it consistent with other printing. Updates #cleanup Change-Id: Ic1e23a9ce6a71a53cff7d2190f9fa06eb838ab89 Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-04-16 13:45:29 +01:00
M. J. Fromberger	1e4934659b	ipn/ipnlocal: discard cached netmaps upon panic during SetNetworkMap (#19414 ) For debugging purposes, unstable builds will sometimes intentionally panic for unexpected behaviours. We observed such a panic after loading a cached netmap, but because we had a valid cached map, the client was unable to recover on its own and the operator had to manually reset the cache. As a defensive hedge, when netmap caching is enabled, check for a panic during installation of a net network map: If one occurs, discard any cached netmaps before letting the panic unwind, so that we do not lose the panic itself, but reduce the need for manual intervention. Updates #12639 Updates tailscale/corp#27300 Change-Id: I0436889c6bdc2fa728c9cb83630cd7b00a72ce68 Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>	2026-04-15 11:07:42 -07:00

1 2 3 4 5 ...

2061 Commits