tailscale

mirror of https://github.com/tailscale/tailscale.git synced 2026-06-24 07:52:47 -04:00

Author	SHA1	Message	Date
Simon Law	00b9e8d8ce	ipn: add fmt.Stringer support to NotifyWatchOpt (#20072) This patch adds support for the fmt.Stringer interface to the ipn.NotifyWatchOpt enum. This is useful when debugging these bitmasks. For example: fmt.Printf("%s", ipn.NotifyPeerChanges | ipn.NotifyNoNetMap) // Output: (ipn.NotifyPeerChanges | ipn.NotifyNoNetMap) Fixes #20066 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-06-18 10:27:16 -07:00
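A minimal sketch of how a bitmask type can implement fmt.Stringer, in the spirit of the commit above. The type WatchOpt, its two flags, and the output format are hypothetical stand-ins; the real ipn.NotifyWatchOpt implementation and its exact output (which the commit shows with parentheses and package-qualified names) may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// WatchOpt is a hypothetical bitmask type standing in for ipn.NotifyWatchOpt.
type WatchOpt uint64

const (
	OptPeerChanges WatchOpt = 1 << iota // hypothetical flag
	OptNoNetMap                         // hypothetical flag
)

// names lists each known single-bit flag with its label, in bit order.
var names = []struct {
	bit  WatchOpt
	name string
}{
	{OptPeerChanges, "OptPeerChanges"},
	{OptNoNetMap, "OptNoNetMap"},
}

// String implements fmt.Stringer. Known bits are printed by name and any
// unknown leftover bits are printed as one hexadecimal value.
func (o WatchOpt) String() string {
	if o == 0 {
		return "0"
	}
	var parts []string
	rest := o
	for _, n := range names {
		if o&n.bit != 0 {
			parts = append(parts, n.name)
			rest &^= n.bit
		}
	}
	if rest != 0 {
		// Convert to uint64 first so fmt does not call String recursively.
		parts = append(parts, fmt.Sprintf("%#x", uint64(rest)))
	}
	return strings.Join(parts, " | ")
}

func main() {
	fmt.Printf("%s\n", OptPeerChanges|OptNoNetMap)
}
```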
Alex Chan	c3c2aa7093	all: don't repeat the the word "the" unnecessarily Updates #cleanup Change-Id: Ic1f430cd5dbf6cc1a385c59074a5d5cabe6fca57 Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-06-18 16:32:08 +01:00
Brad Fitzpatrick	8f210454dd	wgengine/netlog: stop using netmap.NetworkMap type, use LocalBackend The Logger previously took a *netmap.NetworkMap at Startup and on every ReconfigNetworkMap call, denormalizing it into per-IP and self lookup maps. That denormalization is O(n) over all peers and ran on every netmap update, contributing to the broader quadratic behavior we want to eliminate when a single peer is added or removed. Instead, this makes netlog ask LocalBackend (well, nodeBackend) for the info it needs, letting us remove the netmap.NetworkMap type entirely from the netlog package. This is a dependency to removing the netmap.NetworkMap type from upstream callers, like wgengine.Engine in general. Updates #12542 Change-Id: Ib5f2de96e788a667332c0a6f7ac833b3d0053b5c Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-06-17 15:11:57 -07:00
Bobi Gunardi	ca20611d11	util: add parse fallback helpers (#20022) util/def: add def.Bool and def.Duration default parse helpers Replace multiple instances of def.Bool and def.Duration with a new util/def package. Updates #20018 Co-authored-by: Bobby <boby@codelabs.co.id> Co-authored-by: Simon Law <sfllaw@tailscale.com> Signed-off-by: Bobby <boby@codelabs.co.id> Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-06-15 15:58:51 -07:00
Simon Law	eddd019ee4	ipn/ipnlocal: protect populatePeerStatusLocked from nil Hostinfo (#20150) ipnlocal.LocalBackend.populatePeerStatusLocked assumed that Hostinfo was always valid, but that’s not always true, especially in tests. ipnlocal.peerAPIPorts suffered from a similar assumption. This patch checks for NodeView.Valid and Hostinfo.Valid; assuming the zero value as a safe default. Updates #8948 Updates #12542 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-06-15 13:14:12 -07:00
Brad Fitzpatrick	6596d237a3	ipn/ipnlocal: add wireguard session state metrics + publish on IPN bus Updates #19989 Updates tailscale/corp#42874 Change-Id: I843ed95bc7b0f5cd38ba1467332c6b022901e254 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-06-15 11:41:18 -07:00
Brad Fitzpatrick	ae743642d9	ipn/ipnlocal: revert earlier change, force Reconfig + SetNetworkMap new/removed peers The earlier `aa5da2e5f2` made peer adds and removes through a netmap delta path that mutates only nodeBackend, on the assumption that PeerForIP, lookupPeerByIP, the engine's wireguard config (e.lastCfgFull), the engine BART, wgdev's PeerLookupFunc closure, and the engine's cached netmap (e.netMap) would all stay correct without further updates. They don't. I'd totally forgotten that Engine.PeerForIP has its own alternate IP-to-peer lookup codepath. Concretely, all of these failed for a peer that arrived via [tailcfg.MapResponse.PeersChanged] (and never via a full [tailcfg.MapResponse.Peers] list): - [wgengine.Engine.PeerForIP] read from e.netMap and e.lastCfgFull (neither updated on the delta path) and so missed the new peer. The rando non-data-plane callers (Ping, TSMP, pendopen, debug endpoints, tsdial.Dialer.UseNetstackForIP for tsnet and onlyNetstack tailscaled) all returned "no matching peer". - The engine BART (built from e.lastCfgFull) missed the new peer's subnet routes / exit-node default routes. - wgdev's [device.PeerLookupFunc] closure (rebuilt only inside wgcfg.ReconfigDevice) didn't have the new peer's noise key, so outbound encryption to the new peer dropped the packet even when SetPeerByIPPacketFunc returned the right NodePublic. - And nothing in the delta path triggered NodeMutationRemove to flow through to authReconfig either, so the same stale state pointed at removed peers indefinitely. So just (functionally) revert it for now, to have something easily cherry-pickable to the 1.100 release branch. Proper fixes can come later for the next release. This also adds three new tests: - TestPingPeerLearnedViaDelta runs disco and TSMP subtests over a delta-added peer with only self addresses. disco exercises the cold PeerForIP path (magicsock); TSMP exercises the full data path through wgdev encryption. Both fail without this fix. 
- TestPingSubnetRouteOfDeltaPeer exercises a subnet-router peer arriving via delta. With s1 in --accept-routes mode, an IP inside the advertised CIDR must resolve to s2 and a TSMP ping must round-trip. Hits the BART + lastCfgFull + wgdev staleness in one go. - TestPingSelfReturnsIsLocalIP is a regression guard for the IsSelf early-out in Engine.Ping. Passes on main today; included here so future refactors of PeerForIP can't regress self handling without test breakage. Updates tailscale/corp#43394 Change-Id: I7a049271359bd73e7147ae9e2554e85614c2b8d2 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-06-15 11:41:01 -07:00
M. J. Fromberger	f002f6bb3a	ipn/ipnlocal: remove logs for peer delta cache updates (#20145) Added in #20111, but it is too noisy under real load to be useful. Updates #12542 Change-Id: Ib99a8966ade0bfa4281fccc057249819cdcdfe83 Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>	2026-06-15 10:00:03 -07:00
Brendan Creane	c48f953840	cmd/tailscale/cli, ipn/conffile: accept legacy serve config in set-config (#20056 ) tailscale serve set-config now also accepts the legacy raw ipn.ServeConfig format (as emitted by `tailscale serve status --json` and consumed via TS_SERVE_CONFIG, which has no "version" field), so the common serve-status-edit-set workflow stops failing. Only the services-oriented content is applied; any node-level fields are skipped with a warning to stderr pointing users at get-config to migrate. Fixes tailscale/corp#39793 Signed-off-by: Brendan Creane <bcreane@gmail.com>	2026-06-12 18:52:17 -07:00
M. J. Fromberger	9cb071666c	ipn/ipnlocal: update netmap cache after peer deltas are applied (#20111 ) Add an UpdatePeers method to the cache. This allows us to support netmap peer deltas, by allowing just the peers to be updated in an existing cache. As a safety check, reject an update if there was no base netmap data to apply a change to. Then, when processing peer mutations in the backend, capture any changes that should be applied to the cache and update it, if one is enabled. Updates #12542 Change-Id: I2f8790a8fdc5e85fce6700ba4821a8cb10dddffa Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>	2026-06-12 09:41:00 -07:00
Raj Singh	241456ab57	ipn/ipnlocal: add metrics for inbound and outbound bytes on Serve connections (#19991 ) Adds tailscaled_serve_{inbound,outbound}_bytes_total, labeled by Tailscale Service name, by wrapping the peer-facing conn in tcpHandlerForVIPService. Per-service counters persist for the process lifetime rather than being evicted on serve-config changes. Fixes #19572 Signed-off-by: Raj Singh <raj@tailscale.com> Co-authored-by: Ethan Smith <ethan.smith@grafana.com>	2026-06-12 05:49:00 -05:00
Avery Pennarun	6a822dcc36	control/controlclient: continue map poll during key expiry to receive extensions When a client's node key expires and the user clicks "Login" (or runs `tailscale up`), the Login() method was cancelling the map poll context. This caused key extension notifications from the server to be lost, leaving clients stuck in NeedsLogin state even after an admin extended their key. The fix has three parts: 1. Login(): Don't cancel mapCtx if we have valid credentials (loggedIn=true) or a valid node key. This allows the map poll to continue receiving server notifications while the auth flow proceeds in parallel. 2. mapRoutine(): Poll when we have a node key, even if !loggedIn. This handles the tsnet restart scenario where control returns an AuthURL (so loggedIn=false) but we still have a valid node key that can receive map updates. 3. sendStatus()/UpdateFullNetmap(): Forward netmaps when we have a node key, not just when loggedIn. This ensures the backend sees key expiry changes even when the auth flow hasn't completed. "First successful flow wins": if a key extension arrives via map poll, the client recovers automatically. If the auth flow completes first, that works too. Either way, the client is no longer stuck. This aligns with the SeamlessKeyRenewal philosophy: maintain connectivity paths while authentication proceeds, allowing server-initiated recovery. Fixes #19326 Change-Id: I26dbbc1fa7c1159ba075362e44d02814355d6b44 Signed-off-by: Avery Pennarun <apenwarr@tailscale.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-11 19:21:09 +01:00
Brad Fitzpatrick	1deb6a8449	ipn: add no-disconnect in-process bus subscribers Add NotifyInProcessNoDisconnect for in-process IPN bus subscribers that must apply every bus update. When such a subscriber falls behind, block Notify production instead of sending the terminal fell-behind message and closing the watch. This is intentionally not available over LocalAPI, where a slow or stuck out-of-process client should still be disconnected rather than allowed to stall tailscaled. In-process callers that use the bit must keep their callbacks fast and must not call back into LocalBackend from the callback. Updates #20062 Change-Id: I730ad61a07475243bb226fba2262c1a3ded211ae Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-06-09 12:51:38 -07:00
Brad Fitzpatrick	edcc2c94d9	ipn: enforce lossless IPN bus delta streams New-style IPN bus subscribers consume stateful delta streams. Reject NotifyRateLimit when it is combined with those subscription bits so tailscaled cannot merge or delay messages that clients need to apply in order. Also stop silently dropping notifications when a watcher falls behind. Remove the watcher, replace its stale queue with one terminal ErrMessage notification, and close the watch. Updates #20062 Change-Id: Id9d402ea76f4011cd23f122adf62f30dd4b6f90b Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-06-09 11:12:20 -07:00
Alex Chan	65a117184b	all: rename NetworkLock functions/types to TailnetLock To avoid breaking downstream code, add deprecated aliases for all the old names. Updates tailscale/corp#37904 Change-Id: I86d0b0d7da371946440b181c665448f91c3ef8d2 Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-06-08 13:14:28 +01:00
Mike O'Driscoll	6a709216b9	ipn/ipnlocal,wgengine/magicsock: re-report NetInfo to new control client (#20025 ) magicsock de-duplicates NetInfo callbacks against c.netInfoLast, a cache that lives on the long-lived magicsock.Conn. That cache survives a control client swap (interactive login or profile switch), where only the control client (and its own per-client NetInfo dedup) is replaced. As a result, the first netcheck after the swap produces a structurally-identical NetInfo (same PreferredDERP, same NAT shape), magicsock suppresses it as unchanged, and the new control session never learns our home DERP. Peers can't reach the node over DERP until some unrelated NetInfo field happens to change. Add Conn.ResetNetInfoLast to clear the dedup cache, and call it from LocalBackend.setControlClientLocked whenever a control client is installed, so the next netcheck re-reports the current NetInfo to the new client. netInfoLast is only a dedup/optimization cache (all readers nil-guard, and it is recomputed by every netcheck), so clearing it can only add a delivery, never lose or misroute one; it is scoped to control-client lifecycle events, not steady-state operation. Updates #17887 Fixes #20024 Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>	2026-06-05 13:36:00 -04:00
Harry Harpham	fa542426e5	ipn,ipn/localapi: require local admin to serve Unix domain sockets This resolves a local privilege escalation (LPE). Prior to this change, a non-admin user could utilize serve to access local Unix sockets they otherwise should not be able to access. For example, tailscale serve --http 80 unix:/var/run/docker.sock would give the user access to the Docker socket (usually root only). This works because tailscaled has root access and implements the proxy to the socket (see also: 'the confused deputy problem'). We resolve the problem by refusing to serve Unix targets altogether unless instructed to by a root user. Thanks to Tim Sageser (dtrsecurity) for this report. Fixes tailscale/corp#41998 Signed-off-by: Harry Harpham <harry@tailscale.com>	2026-06-03 09:45:02 -06:00
Brad Fitzpatrick	c91b7188e8	ipn/localapi,tstest/natlab: fix debug derp TLS check for sha256-raw CertName serveDebugDERPRegion built its TLS config with ServerName: cmp.Or(derpNode.CertName, derpNode.HostName), which for a "sha256-raw:<hex>" CertName passed the raw fingerprint to Go's stock verifier as a hostname; the handshake always failed with a hostname mismatch. This is the second half of #15579; the first half (tailscaled itself failing with "unexpected multiple certs presented") was fixed in Extract a tlsConfigForNode helper that mirrors derphttp.Client.tlsClient so that sha256-raw and domain-fronting CertName values are dispatched to tlsdial.SetConfigExpectedCertHash and tlsdial.SetConfigExpectedCert respectively, falling back to HostName when CertName is empty. The core fix here was originally written by @imnuke in #19965; that PR also added a unit test in ipn/localapi/debugderp_test.go which is replaced in this commit by a new vmtest that exercises the whole stack: vnet now serves a self-signed cert valid for each fake DERP node's HostName and exposes its SHA-256 fingerprint, and vmtest grows a new SelfSignedDERPCertPinning EnvOption that swaps the test DERP map's nodes to CertName="sha256-raw:<hex>" with InsecureForTests cleared. TestSelfSignedDERPHashPinning then stands up two hard-NAT'd nodes, has them communicate over DERP, and calls DebugDERPRegion on each. Before this fix the test fails with the exact x509 hostname-mismatch error from the original bug; after, it passes. Updates #15579 Change-Id: I61f38ffebc7ac5abc962639db1ae88f5cd8633b1 Co-authored-by: Nuke <nuke@imnuke.dev> Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-06-02 12:02:40 -07:00
Brad Fitzpatrick	52400dc6f4	ipn/ipnlocal: add back a watchdog after earlier removal from engine Commit `2b338dd6a8` removed watchdogEngine because it was weird (so many methods) and increasingly unnecessary after we'd cleaned up and simplified so much of the locking. This adds back a watchdog, but an easier to maintain one that's more idiomatic. Updates #19759 Change-Id: I86c458473e126c0809f37696446ce7acf4cc4eb9 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-06-02 11:57:12 -07:00
Brad Fitzpatrick	a6ab7efa4f	ipn/ipnlocal, cmd/tailscale/cli: auto-renew TLS certs and warn while pending The Tailscale daemon only refreshed TLS certs as a side effect of inbound TLS handshakes or "tailscale cert" CLI calls. A node that doesn't see inbound traffic during the renewal window silently rolls past expiry. Add a once-per-hour background loop on LocalBackend that enumerates Serve and Funnel HTTPS hostnames (filtered against the netmap's CertDomains so we don't poke ACME for other nodes' service hostnames) and calls the existing GetCertPEM path. The renewal decision (ARI window, then 2/3 expiry fallback) is unchanged; the loop just guarantees it runs. For visibility during initial issuance or restart with a long-expired cached cert, add a "tls-cert-pending" health Warnable that's set while ACME is in flight and no usable cached cert exists. Async renewal of a still-valid cert intentionally doesn't fire it. And then make the CLI "cert" subcommand print out a warning if it's blocking due to a cert fetch in flight, using that health info. Fixes #19911 Fixes #19912 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Change-Id: I144e46c40e957b2e879587decace32a523a6eade	2026-06-01 16:31:54 -07:00
Simon Law	92bfda580c	cmd/tailscale/cli: fix time in `tailscale routecheck` (#19956) When running `tailscale netcheck`, the reported timestamp used to be in UTC and formatted according to RFC 3339 with a `T` to separate the date from the time: sfllaw@h2co3:~$ tailscale netcheck | head -n3 Report: * Time: 2026-06-01T21:12:32.252620138Z This is machine-readable time leaking out to the user interface. Times in normal commands are formatted for humans to read: sfllaw@h2co3:~$ date Mon 01 Jun 2026 02:39:14 PM PDT sfllaw@h2co3:~$ journalctl -t tailscaled | tail -n1 Jun 01 14:35:21 h2co3 tailscaled[3328921]: wgengine: sending TSMP disco key advertisement to 100.90.144.102 sfllaw@h2co3:~$ timedatectl show Timezone=America/Los_Angeles LocalRTC=no CanNTP=yes NTP=yes NTPSynchronized=yes TimeUSec=Mon 2026-06-01 14:38:32 PDT RTCTimeUSec=Mon 2026-06-01 14:38:32 PDT sfllaw@h2co3:~$ uptime --since 2026-05-15 07:37:45 This PR makes the times printed by the CLI commands consistent: - For `tailscale routecheck`, it now prints local time as `2026-05-15 07:37:45-07:00`. - For `netlogfmt`, it has always printed local time with a space, but now includes the time zone. - All machine-readable outputs continue to be standard RFC 3339 in UTC, i.e. `--format=json`. As part of a general cleanup, this PR also adds standard common time.Format layouts as tstime constants. Fixes #19928 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-06-01 16:12:08 -07:00
Simon Law	28801674a6	net/routecheck: introduce new package for checking peer reachability (#19639 ) The routecheck package parallels the netcheck package, where the former checks routes and routers while the latter checks networks. Like netcheck, it compiles reports for other systems to consume. Historically, the client has never known whether a peer is actually reachable. Most of the time this doesn’t matter, since the client will want to establish a WireGuard tunnel to any given destination. However, if the client needs to choose between two or more nodes, then it should try to choose a node that it can reach. Suggested exit nodes are one such example, where the client filters out any nodes that aren’t connected to the control plane. Sometimes an exit node will get disconnected from the control plane: when the network between the two is unreliable or when the exit node is too busy to keep its control connection alive. In these cases, Control disables the Node.Online flag for the exit node and broadcasts this across the tailnet. Arguably, the client should never have relied on this flag, since it only makes sense in the admin console. This patch implements an initial routecheck client that can probe every node that your client knows about. You should not ping scan your visible tailnet, this method is for debugging only. This patch also introduces a new OnNetMapToggle hook, which fires when the netmap transitions from nil to non-nil, or vice versa. This happens either when the client receives its first MapResponse after connecting to the control plane, or when it clears the netmap while it is disconnecting. Routecheck uses this to wait for a valid netmap so it knows which peers to probe. Updates #17366 Updates tailscale/corp#33033 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-06-01 10:33:08 -07:00
Brad Fitzpatrick	2ba426802f	ipn/ipnlocal: fix 'tailscale status --peers=false' missing user profile Fixes #19894 Change-Id: I310504987170e0742480c8a02706eb0dbf4ec3dc Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-31 20:34:43 -07:00
Brad Fitzpatrick	3e34e721e8	tsnet: add opt-in SSH support (Server.ListenSSH) This adds tsnet.Server.ListenSSH which, if the SSH feature is linked, returns a net.Listener whose Accept yields *tailssh.Session values (as net.Conn). This lets tsnet apps accept incoming SSH connections to implement custom TUI applications. Basic apps can use net.Conn directly (Read/Write/Close). Rich apps import ssh/tailssh and type-assert for peer identity, PTY, signals, etc. If feature/ssh isn't imported, ListenSSH returns an error. Includes a demo guess-the-number game in tsnet/example/ssh-game. Updates tailscale/corp#37839 Change-Id: I4e7c3c96afb030cdf4da8f2d8b2253820628129a Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-30 14:17:50 -07:00
Fran Bull	c9333854fb	appc,feature/conn25: use custom scheme resolvers for conn25 Currently we are picking a peer for the split dns routes when we get a netmap. Use the new custom scheme resolvers, installed per app in the config in the netmap, to allow us to choose which connector peer should handle a DNS request at the time the request is made. Fixes tailscale/corp#39858 Signed-off-by: Fran Bull <fran@tailscale.com>	2026-05-29 12:23:47 -07:00
kari-ts	7355116c05	ipn/store: make WriteState(id, nil) delete key instead of adding nil entry (#19920 ) All StateStore implementations store a nil value in the cache map when WriteState is called with a nil byte slice instead of deleting the key. This causes ReadState to return (nil, nil) instead of (nil, ErrStateNotExist), since the key is still present in the map. This breaks reset-auth in Windows, Linux, and Android, and the node can't log back in without manually editing the state file. (macOS uses a different state store) DeleteProfile, DeleteAllProfilesForUser, setUnattendedModeAsConfigured are impacted but don't seem to break because the deleted keys are not reread. This deletes the key from the cache instead. Fixes tailscale/corp#42477 Signed-off-by: kari-ts <kari@tailscale.com>	2026-05-29 11:22:14 -07:00
Brad Fitzpatrick	412c812d76	ipn/ipnlocal: use ACME ALPN for authorized Funnel non-CertDomain domains If a user explicitly adds a non-ts.net (not a CertDomain domain) domain like "foo.com" to their serve config as a web target that's also an allowed funnel domain (using raw "tailscale serve set-config"), then use the new ALPN cert fetching (from `b553969b`) to get certs for that domain. This is just plumbing; there's no new product functionality to actually enable this easily client-side, and it also has no visible product surface to enable it server-side. Updates tailscale/corp#41736 Change-Id: Ie2e421ac9611bce64bba3de6a454b2d505ea0e8a Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-28 13:33:45 -07:00
Alex Chan	9d126aec34	all: remove network lock references from private method names Updates tailscale/corp#37904 Change-Id: I312d46d958209ca3d1152d1877fb91a57c91798d Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-05-28 18:00:36 +01:00
Brendan Creane	8d90a6ab1e	ipn/ipnlocal: add HTTP/2 Content-Type tests for serve reverse proxy (#19905) Adds two tests exercising the HTTP/2-inbound -> plaintext HTTP/1.1 backend path through serve's reverseProxy and through the full serveWebHandler entry point (with a funnel serveHTTPContext). Updates #19866 Signed-off-by: Brendan Creane <bcreane@gmail.com>	2026-05-28 09:46:36 -07:00
Alex Chan	f4a280cdbd	all: update a few more references to network/tailnet lock Updates tailscale/corp#37904 Change-Id: I746b06328e080fa2b9ff28a2d099f95645aa3d0b Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-05-28 16:44:16 +01:00
Alex Chan	446ae97491	ipn: improve --exit-node hostname error during startup When parsing the `tailscale up --exit-node=ARG` argument, we try to resolve hostnames by searching the list of peers. However, at startup, the peer list is empty, causing hostname lookups to trivially fail with an unhelpful "invalid value" error. Improve the error message when the peer list is empty to inform the user that hostnames cannot be resolved during startup, and advise them to use the exit node's Tailscale IP address instead. Also, clarify that hostnames must be peer hostnames, not arbitrary hostnames. Fixes #19882 Change-Id: I9390a427c2863d657cf46c5e33b43cb3c5363764 Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-05-28 16:43:45 +01:00
Brad Fitzpatrick	c9fb05b6f5	ipn/ipnlocal: don't dup-suppress UserProfiles on IPNBus on profile switches Fixes #19889 Change-Id: I324a735c13772c0c79ed7392c0baa5064b34823b Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-27 14:47:02 -07:00
Brad Fitzpatrick	364b952d62	cmd/containerboot: track peers from IPN bus updates, stop using netmap.NetworkMap Some tests in another repo were broken by tailscale/tailscale#19607. This fixes them, by finishing off the rest of the migration away from netmap.NetworkMap on the IPN bus in containerboot. Containerboot used to rebuild a full NetworkMap-shaped view while reacting to IPN bus notifications. Now it instead has its own netmapState type (immutable) of exactly what it needs to track, and sends those immutable values around, making cheap edits of new immutable values when an IPN bus edit arrives. This should make cmd/containerboot scale to much larger tailnets now too. Fixes #19852 Fixes tailscale/corp#42347 Updates #12542 Change-Id: I88adaf061f85f677f954a764935e6654329d75a6 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-27 14:12:48 -07:00
Brad Fitzpatrick	b553969b03	ipnlocal: try ACME TLS-ALPN for Funnel renewals Use TLS-ALPN-01 for Funnel certificate renewals only when the node already has a cached certificate, and fall back to DNS-01 with a fresh order if the ALPN path is unavailable or fails. Dynamically advertise acme-tls/1 only while an ACME challenge certificate is pending, and add client metrics for DNS-01 and TLS-ALPN-01 start/success/failure paths. Updates tailscale/corp#41736 Fixes tailscale/corp#42320 Change-Id: I5adc6ea129237f9ef592f84fc1a8953c80bc9d5c Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-27 09:30:23 -07:00
Brad Fitzpatrick	2c965ab540	types/netmap, ipn/ipnlocal, control/controlclient: rename NodeMutationAdd to NodeMutationUpsert NodeMutationAdd was a misleading name: a PeersChanged entry in a MapResponse can represent either a truly new peer or a full replacement for an existing peer that couldn't be expressed as a PeerChangedPatch. Calling it "Add" implied it was always a completely new node, which is wrong. (I'd changed my mind on the design of mapping add/delete events to NodeMutations halfway through #19607 and forgot to update the name, even though I'd updated half the docs) Rename it to NodeMutationUpsert to reflect the actual semantics: the node should be inserted or replaced in the peer map regardless of whether it already existed. Updates #19607 Updates #12542 Change-Id: Iebd3daddb3318cba02e115a1b184fcb3ee8f83d6 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-27 08:37:14 -07:00
Brad Fitzpatrick	a8f40a2ca5	ipn/ipnlocal: add missing bus notify of peers on full netmap The prior `aa5da2e5f2` ("process node adds/removes in constant time") commit missed a bus notification case, where new-style subscribers set NotifyNoNetMap and then the controlclient map routine sends a full update (rather than a delta). Those profiles + peers need to be put on the bus too. I noticed this only when porting the Android app over to use the new bus stuff. Updates #19607 Updates #12542 Change-Id: I82c35011d2c532222ca27f7d4e790522c31bd156 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-27 08:03:47 -07:00
Jordan Whited	e5a8cf3b18	control/controlknobs,feature/*,ipn/ipnlocal,tailcfg: add runtimemetrics Emit runtime metrics as clientmetrics when the NodeAttrEmitRuntimeMetrics NodeCapability is present. We start small with just 2 metrics: heap bytes and total process memory. Updates tailscale/corp#39434 Signed-off-by: Jordan Whited <jordan@tailscale.com>	2026-05-26 16:02:01 -07:00
Simon Law	da8cd5cc7f	ipn/ipnlocal: fix documentation typo, NodeAttrCacheNetworkMaps (#19851) Updates #cleanup Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-22 22:19:10 -07:00
Simon Law	988615dbad	ipn/ipnlocal,tstest/integration: pause the control client consistently (#19846 ) There are two places where tailscaled transitions into a paused state: 1. tailscaled’s controlclient is initially created, 2. tailscale down, or the GUI equivalent, commands it to. This patch unifies the implementation of both scenarios into LocalBackend.shouldPauseControlClientLocked to prevent the implementation from drifting. The flaky tstest/integration.TestNoControlConnWhenDown test exposed this mismatch, but only by accident. This patch also changes TestNode.MustDown so that it runs `tailscale down` and then waits for the testcontrol server to finish handling any associated /machine/map requests. Fixes #19831 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-22 17:58:44 -07:00
Brad Fitzpatrick	5295e3e119	ipn/{ipnstate,ipnlocal}: add integer NodeID to PeerStatus In `aa5da2e5f2` we made the IPN bus include deltas, including the PeersRemoved, sending a slice of integer NodeIDs that were removed. But when updating xcode, I realized there was no way to map those integers to the stable node IDs used in other places. I was considering changing the just-added ipn.Notify.PeersRemoved from an IntID to a string StableID, but then it doesn't match the MapResponse wire protocol, which we've tried to match so far. Instead, just add the integer ID as well. Callers can use whichever world they want, having both. It's a little regrettable that we still have two worlds of IDs, but oh well. Neither is really suitable to a hypothetical future fully federated world of control servers anyway, so we'll need a third type later anyway, so just live with the two we have for now. Updates #12542 Change-Id: Ib8fd48a265e1da1f8779152f141f624a7f7260e9 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-22 08:16:55 -07:00
Simon Law	7dabebc691	net/traffic: switch rendezvous hashing from SHA256 to FNV-1a (#19821 ) In PR tailscale/corp#30448, we originally decided to break ties using SHA256 for our rendezvous hashing algorithm. Now that we’ve had some experience with it, we think that FNV-1a is a better choice. It distributes bits evenly, it’s much faster, and it doesn’t need to be cryptographically secure. The FNV designers recommend FNV-1a over the deprecated FNV-1. This PR makes the switch and updates the related tests, since changing the algorithm changes which stable pick gets selected. As of 2026-05, this is the best time to make this change, since there are almost no clients in the wild with traffic steering enabled. Updates #17366 Updates tailscale/corp#29964 Updates tailscale/corp#29966 Updates tailscale/corp#33033 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-21 10:11:59 -07:00
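For readers unfamiliar with the technique, here is a generic sketch of rendezvous (highest-random-weight) hashing using 64-bit FNV-1a from Go's standard library. It is not the net/traffic implementation: the score and pick functions, the separator byte, and the lexical tie-break are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// score hashes the (key, node) pair with 64-bit FNV-1a.
func score(key, node string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	h.Write([]byte{0}) // separator so ("ab","c") and ("a","bc") hash differently
	h.Write([]byte(node))
	return h.Sum64()
}

// pick returns the node with the highest score for key. Equal scores are
// broken by the lexically smaller node name. It returns "" for no nodes.
func pick(key string, nodes []string) string {
	var best string
	var bestScore uint64
	for i, n := range nodes {
		s := score(key, n)
		if i == 0 || s > bestScore || (s == bestScore && n < best) {
			best, bestScore = n, s
		}
	}
	return best
}

func main() {
	nodes := []string{"node-a", "node-b", "node-c"}
	fmt.Println(pick("client-1", nodes))
}
```

Because each node is scored independently, the pick does not depend on list order, and removing a node that was not the winner leaves the pick unchanged. Swapping the hash function changes the scores, which is why the commit notes that the stable pick changes.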
Brad Fitzpatrick	aa5da2e5f2	ipn/ipnlocal, control/controlclient: process node adds/removes in constant time For large tailnets (~50k+ nodes) with frequent peer churn (ephemeral GitHub Actions workers etc.), tailscaled used to rebuild the full netmap and fan it out on the IPN bus on every MapResponse that added or removed a peer. There were two O(N) costs per delta: the full netmap rebuild + every Notify.NetMap encode to every bus watcher. This change tackles both: 1. Plumb O(1) peer add/remove through the delta path. PeersChanged and PeersRemoved no longer prevent the delta happy path; instead, they mutate the per-node-backend peer map in place. 2. Restrict ipn.Notify.NetMap emission to the platforms whose host GUIs still depend on it (Windows, macOS, iOS) and migrate in-tree consumers off it everywhere else: - Migrate reactive consumers (containerboot, kube agents, sniproxy, tsconsensus, etc.) off Notify.NetMap to the previously-added Notify.SelfChange signal so they no longer have to subscribe to the full netmap. - Add ipn.NotifyNoNetMap so GUI clients on "legacy-emit" platforms that have already migrated can opt out of the per-watcher NetMap encode. - Gate Notify.NetMap emission on the producer side by a compile- time GOOS check, so the supporting code is dead-code-eliminated on Linux and other geese where no GUI consumer needs it. Re-running BenchmarkGiantTailnet from tstest/largetailnet, which was added along with baseline numbers on unmodified main in `ad5436af0d`, the per-delta cost (one peer add+remove pair) is now ~O(1) regardless of tailnet size N: N no-watcher (ms/op) bus-watcher (ms/op) before now factor before now factor 10000 32 0.11 300x 166 0.13 1300x 50000 222 0.11 2000x 865 0.13 6700x 100000 504 0.12 4100x 1765 0.13 13400x 250000 1551 0.12 12500x 4696 0.15 32400x Updates #12542 Change-Id: I94e34b37331d1a8ec74c299deffadf4d061fda9e Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-21 09:26:19 -07:00
Simon Law	7ebca58042	net/traffic,ipn/ipnlocal: extract traffic steering utilities (#19682 ) The traffic package contains helpers for evaluating traffic steering scores and picking appropriate nodes. These were extracted from ipnlocal.suggestExitNodeUsingTrafficSteering so they can be reused by the new routecheck package to probe exit nodes in priority order. Updates #17366 Updates tailscale/corp#33033 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-21 08:28:27 -07:00
Brad Fitzpatrick	f3a117e813	net/tsdial: run happy eyeballs across A and AAAA in UserDial When tailscaled is running in userspace-networking mode behind an exit node (e.g. as a SOCKS5 proxy), it resolves a hostname and then dials a single resolved IP through the tunnel. If the name has both A and AAAA, Go's net.Resolver merges them and we pick ips[0], which on an IPv6-native host is usually AAAA. If the exit node has no IPv6 egress (or vice versa), the dial fails silently through the tunnel and the user sees a hang. Resolve all candidates and race connect attempts across address families with a 300ms happy-eyeballs delay, matching Go's net.Dialer default and the existing pattern in net/dnscache (commit `ee0a03b14`). First success wins; losers are cancelled and any conns they produce are closed. A failBoost channel wakes the launcher when a connect fails fast (e.g. ICMP "no route" via the tunnel) so we don't sit on the 300ms timer when the answer is already known. userDialResolve is refactored into userDialResolveAll (returns the full candidate list) plus a thin single-IP wrapper for callers like UserDialPlan that don't race. UserDial's per-IP dispatch (netstack vs peer dialer vs SystemDial vs std) is extracted to dialOneUser so each candidate can route correctly on its own merits. Also fix serveDial in localapi to pass the original hostname to UserDial rather than a pre-resolved IP, so the race can fire. This fix is single-ended: it works against any exit node, including old ones, with no protocol changes. The trade-off versus filtering on the exit-node side via PeerAPI DoH is that every dial through an unreachable-family exit node costs one failed connect attempt per cache window, rather than zero, which is acceptable given the simplicity. Fixes #19792 Fixes #13257 Change-Id: I9d7645d0034caf3ee22ecdd8070798353f77e94b Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-20 18:35:55 -07:00
M. J. Fromberger	c09407002f	ipn/ipnlocal/netmapcache: add UpdateSelfOnly method (#19818 ) Some netmap updates are guaranteed to affect only the "static" parts of the netmap, and so should not require us to walk through all the peers and user profiles when updating the cache. To support this, the new UpdateSelfOnly method updates only the Self node and other tailnet settings that are not dependent on the peers and profiles. Use this when updating the cache on DERP home changes. Updates #12542 Change-Id: Ifed522b29d579fb76e010b4ff738cc4e0a72d27f Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>	2026-05-20 16:29:04 -07:00
Simon Law	93dbd33ef7	ipn/ipnlocal: stub system interfaces for TestShouldUseOneCGNATRoute (#19807 ) The TestShouldUseOneCGNATRoute test fails when the system interfaces don’t match the underlying assumption of the test. That assumption was that there would only ever be one CGNAT interface: the Tailscale one. This breaks on Linux when border0 is installed because border0 also creates an interface with a CGNAT route. This patch stubs netmon.RegisterInterfaceGetter to replace the system interfaces and netmon.SetTailscaleInterfaceProps to identify the test data that defines the Tailscale interface. This patch also tests the control knob override for CGNAT for every combination of operating system and system interfaces, instead of just a couple of combinations. Fixes #19731 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-20 16:00:14 -07:00
Alex Chan	0cb432ed84	all: update more references to Tailnet/Network Lock Updates tailscale/corp#37904 Change-Id: I09e73b3248b9ddf86dafe33dfb621bd560f6596d Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-05-15 16:23:50 +01:00
Adriano Sela Aviles	41286c2b56	ipn/ipnlocal,tsd: add NoiseRoundTripper to tsd.Sys Adds a new NoiseRoundTripper field to tsd.Sys to expose an http.RoundTripper to make requests over the control plane Noise connection. This will be used in PAM use cases soon. Updates tailscale/corp#41800 Signed-off-by: Adriano Sela Aviles <adriano@tailscale.com>	2026-05-13 14:56:28 -07:00
Simon Law	6467f0d067	ipn/ipnlocal: fix minor typo in shouldUseOneCGNATRoute (#19719 ) This fixes a log message where ipn/ipnlocal.shouldUseOneCGNATRoute would claim that an Android machine was actually macOS. Updates #cleanup Updates #19652 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-12 21:55:29 -07:00
Adriano Sela Aviles	72578de033	ipn/{ipnlocal,localapi},client/local: add per-dst cap resolution for services Adds two new cap resolution methods alongside the existing PeerCaps: PeerCapsForService(src netip.Addr, svcName tailcfg.ServiceName) resolves the service name to its VIP addresses via the node's service IP mappings and returns caps scoped to that service. Exposed on /v0/whois via the svc_name query parameter and on client/local.Client as WhoIsForService. PeerCapsForIP(src, dst netip.Addr) resolves caps against an arbitrary destination IP. Exposed on /v0/whois via the svc_addr query parameter and on client/local.Client as WhoIsForIP. svc_name takes priority over svc_addr when both are present. Invalid values for either return 400. The existing PeerCaps/WhoIs path is unchanged: without a service parameter, WhoIs returns only host-level caps. Updates tailscale/corp#41632 Signed-off-by: Adriano Sela Aviles <adriano@tailscale.com>	2026-05-12 15:50:39 -07:00
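
Note on commit 7dabebc691 (rendezvous hashing with FNV-1a): the commit message names the technique but does not show it. The following is a minimal sketch of rendezvous (highest-random-weight) hashing using 64-bit FNV-1a from Go's standard hash/fnv package. It is not the code in net/traffic. The functions score and pick, the choice of 64-bit width, the zero-byte separator, the lexical tie-break, and the key and node names are all hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// score hashes a (key, node) pair with 64-bit FNV-1a.
// Hypothetical helper: the real input encoding is not given in the commit.
func score(key, node string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	h.Write([]byte{0}) // separator, so ("ab","c") and ("a","bc") hash differently
	h.Write([]byte(node))
	return h.Sum64()
}

// pick returns the node with the highest score for key, or "" if nodes is
// empty. Equal scores fall back to the lexically smaller node name, so the
// result does not depend on the order of nodes.
func pick(key string, nodes []string) string {
	var best string
	var bestScore uint64
	for i, n := range nodes {
		s := score(key, n)
		if i == 0 || s > bestScore || (s == bestScore && n < best) {
			best, bestScore = n, s
		}
	}
	return best
}

func main() {
	// Assumed example data: one client key and three candidate nodes.
	nodes := []string{"exit-a", "exit-b", "exit-c"}
	fmt.Println(pick("client-1", nodes))
}
```

Because each node is scored independently, removing a node that was not the winner leaves the pick unchanged. Changing the hash function changes every score, which is consistent with the commit's remark that switching algorithms changes which stable pick gets selected.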

2087 Commits