Commit Graph

2091 Commits

Author SHA1 Message Date
Brad Fitzpatrick
1d69894084 ipn/ipnlocal, drive: stop using netmap.NetworkMap in Taildrive too
This applies the same treatment from PR #20162 (netlog) and
PR #20171 (wglog) to the local Taildrive filesystem wiring, ending the
per-netmap-update O(n) rebuild of the drive remotes list.

This moves the O(n peers) taildrive-remote list rebuild from every
peer change (which previously happened regardless of whether you were
even using taildrive) to instead happen only as needed.

That running on every netmap update and was a contributor to the
broader quadratic behavior we want to eliminate when a single peer is
added or removed.

Instead, this introduces drive.RemoteSource, a small interface the
Taildrive filesystem pulls from lazily on incoming WebDAV requests,
and caches by a generation counter. ipn/ipnlocal installs a
driveRemoteSource once at NewLocalBackend time and bumps
LocalBackend.driveGen on the three events that can actually flip the
drive-capable peer set: full netmap installs (domain + self caps),
UpdateNetmapDelta (peer add/remove or per-peer address changes), and
updatePacketFilter (since PeerCapability values are derived from the
packet filter rules, not from peer.CapMap).

The hook itself is kept but narrowed: it no longer takes a
*netmap.NetworkMap and its only remaining job is to re-notify IPN bus
listeners of the current local shares list on full installs.

This is a dependency to removing the netmap.NetworkMap type from
upstream callers, like wgengine.Engine in general.

(Also add a bunch more tests)

Updates #12542

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Change-Id: I7e3d2f5b4a9c8e1d6f0a3b7c9e2d4f8a1b6c5e9d
2026-06-23 10:41:50 -07:00
Brad Fitzpatrick
988b0905bb wgengine/wglog: stop using netmap.NetworkMap here too
This applies the same treatment from 8f210454dd (netlog) to wglog,
ending use of netmap.NetworkMap and instead getting the canonical data
from LocalBackend/nodeBackend.

This is a dependency to removing the netmap.NetworkMap from
upstream callers, like wgengine.Engine in general.

Updates #12542

Change-Id: Icb5af0799322def048a6f594b49f7d11273f025d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-06-23 09:06:37 -07:00
Brad Fitzpatrick
af2f228a18 ipn/ipnlocal, types/netmap, tsnet: filter unsigned peers on delta path
aa5da2e5f2 (in the 1.99.x dev series, unstable) introduced some bugs,
only some of which were later fixed. This fixed another. As of that
change, tkaFilterNetmapLocked ran only on full netmaps through
LocalBackend.setClientStatusLocked and not peer upserts via new or
changed peers. The later ae743642d9 fixed a regression in the
Engine layer but didn't fix the tkaFilter code from re-running on
upserts.

This add a tkaFilterDeltaMutsLocked pass before
nodeBackend.UpdateNetmapDelta. For each NodeMutationUpsert whose
peer fails the same signature check tkaFilterNetmapLocked applies,
rewrite the upsert in place into a NodeMutationRemove targeting the
same node ID, so magicsock's per-mutation dispatch and
nodeBackend.peers both drop the peer, matching the prior full-netmap
semantics.

New tsnet tests added:

  - TestTailnetLockFiltersUnsignedDeltaPeer covers the new-peer
    case.
  - TestTailnetLockFiltersUnsignedDeltaPeerReplacement covers the
    existing-peer-replacement case, to an empty signature.
  - TestTailnetLockFiltersDeltaPeerWithInvalidSignature like above
    but with a bogus signature.

Updates #12542
Updates tailscale/corp#43767

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Change-Id: Ib35d0391541fee654867c26489847dbc5b7e2ae8
2026-06-23 08:12:36 -07:00
Anton Tolchanov
f442cda999 ipn/ipnlocal: consider all DERP regions for exit node recommendations
When recommending an exit node, suggestExitNodeLocked ranks candidates by
the latency to their home DERP region, taken from the most recent netcheck
report. But netcheck alternates between full reports, which probe every
region, and incremental reports, which only re-probe the home region and a
handful of the fastest regions. When the most recent report is incremental,
the suggestion fell back to a random for exit nodes that are far away.

Now we rank candidates against the best recent latency, tracked by the
`netcheck.Client` - the same data that is used to pick the preferred
DERP. It uses a history of measurements which includes a full netcheck
report, so should cover all DERP regions.

Updates tailscale/corp#17516

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
2026-06-22 12:28:09 +02:00
Simon Law
00b9e8d8ce ipn: add fmt.Stringer support to NotifyWatchOpt (#20072)
This patch adds support for the fmt.Stringer interface to the
ipn.NotifyWatchOpt enum. This is useful when debugging these bitmasks.

For example:

	fmt.Printf("%s", ipn.NotifyPeerChanges | ipn.NotifyNoNetMap)
	// Output: (ipn.NotifyPeerChanges | ipn.NotifyNoNetMap)

Fixes #20066

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-06-18 10:27:16 -07:00
Alex Chan
c3c2aa7093 all: don't repeat the the word "the" unnecessarily
Updates #cleanup

Change-Id: Ic1f430cd5dbf6cc1a385c59074a5d5cabe6fca57
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-06-18 16:32:08 +01:00
Brad Fitzpatrick
8f210454dd wgengine/netlog: stop using netmap.NetworkMap type, use LocalBackend
The Logger previously took a *netmap.NetworkMap at Startup and on every
ReconfigNetworkMap call, denormalizing it into per-IP and self lookup
maps. That denormalization is O(n) over all peers and ran on every
netmap update, contributing to the broader quadratic behavior we want
to eliminate when a single peer is added or removed.

Instead, this makes netlog ask LocalBackend (well, nodeBackend) for
the info it needs, letting us remove the netmap.NetworkMap type
entirely from the netlog package.

This is a dependency to removing the netmap.NetworkMap type from
upstream callers, like wgengine.Engine in general.

Updates #12542

Change-Id: Ib5f2de96e788a667332c0a6f7ac833b3d0053b5c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-06-17 15:11:57 -07:00
Bobi Gunardi
ca20611d11 util: add parse fallback helpers (#20022)
util/def: add def.Bool and def.Duration default parse helpers

Replace multiple instances of def.Bool and def.Duration with a new util/def
package.

Updates #20018

Co-authored-by: Bobby <boby@codelabs.co.id>
Co-authored-by: Simon Law <sfllaw@tailscale.com>
Signed-off-by: Bobby <boby@codelabs.co.id>
Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-06-15 15:58:51 -07:00
Simon Law
eddd019ee4 ipn/ipnlocal: protect populatePeerStatusLocked from nil Hostinfo (#20150)
ipnlocal.LocalBackend.populatePeerStatusLocked assumed that Hostinfo
was always valid, but that’s not always true, especially in tests.
ipnlocal.peerAPIPorts suffered from a similar assumption.

This patch checks for NodeView.Valid and Hostinfo.Valid; assuming the
zero value as a safe default.

Updates #8948
Updates #12542

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-06-15 13:14:12 -07:00
Brad Fitzpatrick
6596d237a3 ipn/ipnlocal: add wireguard session state metrics + publish on IPN bus
Updates #19989
Updates tailscale/corp#42874

Change-Id: I843ed95bc7b0f5cd38ba1467332c6b022901e254
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-06-15 11:41:18 -07:00
Brad Fitzpatrick
ae743642d9 ipn/ipnlocal: revert earlier change, force Reconfig + SetNetworkMap new/removed peers
The earlier aa5da2e5f2 made peer adds and removes through a netmap
delta path that mutates only nodeBackend, on the assumption that
PeerForIP, lookupPeerByIP, the engine's wireguard config
(e.lastCfgFull), the engine BART, wgdev's PeerLookupFunc closure, and
the engine's cached netmap (e.netMap) would all stay correct without
further updates.  They don't. I'd totally forgotten that
Engine.PeerForIP has its own alternate IP-to-peer lookup codepath.

Concretely, all of these failed for a peer that arrived via
[tailcfg.MapResponse.PeersChanged] (and never via a full
[tailcfg.MapResponse.Peers] list):

  - [wgengine.Engine.PeerForIP] read from e.netMap and e.lastCfgFull
    (neither updated on the delta path) and so missed the new
    peer. The rando non-data-plane callers (Ping, TSMP, pendopen,
    debug endpoints, tsdial.Dialer.UseNetstackForIP for tsnet and
    onlyNetstack tailscaled) all returned "no matching peer".

  - The engine BART (built from e.lastCfgFull) missed the new peer's
    subnet routes / exit-node default routes.

  - wgdev's [device.PeerLookupFunc] closure (rebuilt only inside
    wgcfg.ReconfigDevice) didn't have the new peer's noise key, so
    outbound encryption to the new peer dropped the packet even when
    SetPeerByIPPacketFunc returned the right NodePublic.

  - And nothing in the delta path triggered NodeMutationRemove to
    flow through to authReconfig either, so the same stale state
    pointed at removed peers indefinitely.

So just (functionally) revert it for now, to have something easily
cherry-pickable to the 1.100 release branch. Proper fixes can come later
for the next release.

This also adds three new tests:

  - TestPingPeerLearnedViaDelta runs disco and TSMP subtests over a
    delta-added peer with only self addresses. disco exercises the
    cold PeerForIP path (magicsock); TSMP exercises the full data path
    through wgdev encryption. Both fail without this fix.

  - TestPingSubnetRouteOfDeltaPeer exercises a subnet-router peer
    arriving via delta. With s1 in --accept-routes mode, an IP
    inside the advertised CIDR must resolve to s2 and a TSMP ping
    must round-trip. Hits the BART + lastCfgFull + wgdev staleness
    in one go.

  - TestPingSelfReturnsIsLocalIP is a regression guard for the
    IsSelf early-out in Engine.Ping. Passes on main today; included
    here so future refactors of PeerForIP can't regress self
    handling without test breakage.

Updates tailscale/corp#43394

Change-Id: I7a049271359bd73e7147ae9e2554e85614c2b8d2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-06-15 11:41:01 -07:00
M. J. Fromberger
f002f6bb3a ipn/ipnlocal: remove logs for peer delta cache updates (#20145)
Added in #20111, but it is too noisy under real load to be useful.

Updates #12542

Change-Id: Ib99a8966ade0bfa4281fccc057249819cdcdfe83
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
2026-06-15 10:00:03 -07:00
Brendan Creane
c48f953840 cmd/tailscale/cli, ipn/conffile: accept legacy serve config in set-config (#20056)
tailscale serve set-config now also accepts the legacy raw ipn.ServeConfig
format (as emitted by `tailscale serve status --json` and consumed via
TS_SERVE_CONFIG, which has no "version" field), so the common
serve-status-edit-set workflow stops failing. Only the services-oriented
content is applied; any node-level fields are skipped with a warning to
stderr pointing users at get-config to migrate.

Fixes tailscale/corp#39793

Signed-off-by: Brendan Creane <bcreane@gmail.com>
2026-06-12 18:52:17 -07:00
M. J. Fromberger
9cb071666c ipn/ipnlocal: update netmap cache after peer deltas are applied (#20111)
Add an UpdatePeers method to the cache. This allows us to support netmap peer deltas,
by allowing just the peers to be updated in an existing cache. As a safety check, reject
an update if there was no base netmap data to apply a change to.

Then, when processing peer mutations in the backend, capture any changes that should
be applied to the cache and update it, if one is enabled.

Updates #12542

Change-Id: I2f8790a8fdc5e85fce6700ba4821a8cb10dddffa
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
2026-06-12 09:41:00 -07:00
Raj Singh
241456ab57 ipn/ipnlocal: add metrics for inbound and outbound bytes on Serve connections (#19991)
Adds tailscaled_serve_{inbound,outbound}_bytes_total, labeled by Tailscale
Service name, by wrapping the peer-facing conn in tcpHandlerForVIPService.
Per-service counters persist for the process lifetime rather than being
evicted on serve-config changes.

Fixes #19572

Signed-off-by: Raj Singh <raj@tailscale.com>
Co-authored-by: Ethan Smith <ethan.smith@grafana.com>
2026-06-12 05:49:00 -05:00
Avery Pennarun
6a822dcc36 control/controlclient: continue map poll during key expiry to receive extensions
When a client's node key expires and the user clicks "Login" (or runs
`tailscale up`), the Login() method was cancelling the map poll context.
This caused key extension notifications from the server to be lost,
leaving clients stuck in NeedsLogin state even after an admin extended
their key.

The fix has three parts:

1. Login(): Don't cancel mapCtx if we have valid credentials (loggedIn=true)
   or a valid node key. This allows the map poll to continue receiving
   server notifications while the auth flow proceeds in parallel.

2. mapRoutine(): Poll when we have a node key, even if !loggedIn. This
   handles the tsnet restart scenario where control returns an AuthURL
   (so loggedIn=false) but we still have a valid node key that can
   receive map updates.

3. sendStatus()/UpdateFullNetmap(): Forward netmaps when we have a node
   key, not just when loggedIn. This ensures the backend sees key expiry
   changes even when the auth flow hasn't completed.

"First successful flow wins": if a key extension arrives via map poll,
the client recovers automatically. If the auth flow completes first,
that works too. Either way, the client is no longer stuck.

This aligns with the SeamlessKeyRenewal philosophy: maintain connectivity
paths while authentication proceeds, allowing server-initiated recovery.

Fixes #19326

Change-Id: I26dbbc1fa7c1159ba075362e44d02814355d6b44
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-11 19:21:09 +01:00
Brad Fitzpatrick
1deb6a8449 ipn: add no-disconnect in-process bus subscribers
Add NotifyInProcessNoDisconnect for in-process IPN bus subscribers that
must apply every bus update. When such a subscriber falls behind, block
Notify production instead of sending the terminal fell-behind message and
closing the watch.

This is intentionally not available over LocalAPI, where a slow or stuck
out-of-process client should still be disconnected rather than allowed to
stall tailscaled. In-process callers that use the bit must keep their
callbacks fast and must not call back into LocalBackend from the callback.

Updates #20062

Change-Id: I730ad61a07475243bb226fba2262c1a3ded211ae
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-06-09 12:51:38 -07:00
Brad Fitzpatrick
edcc2c94d9 ipn: enforce lossless IPN bus delta streams
New-style IPN bus subscribers consume stateful delta streams. Reject
NotifyRateLimit when it is combined with those subscription bits so
tailscaled cannot merge or delay messages that clients need to apply in
order.

Also stop silently dropping notifications when a watcher falls behind.
Remove the watcher, replace its stale queue with one terminal ErrMessage
notification, and close the watch.

Updates #20062

Change-Id: Id9d402ea76f4011cd23f122adf62f30dd4b6f90b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-06-09 11:12:20 -07:00
Alex Chan
65a117184b all: rename NetworkLock functions/types to TailnetLock
To avoid breaking downstream code, add deprecated aliases for all the
old names.

Updates tailscale/corp#37904

Change-Id: I86d0b0d7da371946440b181c665448f91c3ef8d2
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-06-08 13:14:28 +01:00
Mike O'Driscoll
6a709216b9 ipn/ipnlocal,wgengine/magicsock: re-report NetInfo to new control client (#20025)
magicsock de-duplicates NetInfo callbacks against c.netInfoLast, a cache
that lives on the long-lived magicsock.Conn. That cache survives a control
client swap (interactive login or profile switch), where only the control
client (and its own per-client NetInfo dedup) is replaced. As a result, the
first netcheck after the swap produces a structurally-identical NetInfo
(same PreferredDERP, same NAT shape), magicsock suppresses it as unchanged,
and the new control session never learns our home DERP. Peers can't reach
the node over DERP until some unrelated NetInfo field happens to change.

Add Conn.ResetNetInfoLast to clear the dedup cache, and call it from
LocalBackend.setControlClientLocked whenever a control client is installed,
so the next netcheck re-reports the current NetInfo to the new client.

netInfoLast is only a dedup/optimization cache (all readers nil-guard, and
it is recomputed by every netcheck), so clearing it can only add a delivery,
never lose or misroute one; it is scoped to control-client lifecycle events,
not steady-state operation.

Updates #17887
Fixes #20024

Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
2026-06-05 13:36:00 -04:00
Harry Harpham
fa542426e5 ipn,ipn/localapi: require local admin to serve Unix domain sockets
This resolves a local privilege escalation (LPE). Prior to this change,
a non-admin user could utilize serve to access local Unix sockets they
otherwise should not be able to access. For example,

  tailscale serve --http 80 unix:/var/run/docker.sock

would give the user access to the Docker socket (usually root only).
This works because tailscaled has root access and implements the proxy
to the socket (see also: 'the confused deputy problem').

We resolve the problem by refusing to serve Unix targets altogether
unless instructed to by a root user.

Thanks to Tim Sageser (dtrsecurity) for this report.

Fixes tailscale/corp#41998

Signed-off-by: Harry Harpham <harry@tailscale.com>
2026-06-03 09:45:02 -06:00
Brad Fitzpatrick
c91b7188e8 ipn/localapi,tstest/natlab: fix debug derp TLS check for sha256-raw CertName
serveDebugDERPRegion built its TLS config with
ServerName: cmp.Or(derpNode.CertName, derpNode.HostName), which for a
"sha256-raw:<hex>" CertName passed the raw fingerprint to Go's stock
verifier as a hostname; the handshake always failed with a hostname
mismatch. This is the second half of #15579; the first half (tailscaled
itself failing with "unexpected multiple certs presented") was fixed in

Extract a tlsConfigForNode helper that mirrors derphttp.Client.tlsClient
so that sha256-raw and domain-fronting CertName values are dispatched
to tlsdial.SetConfigExpectedCertHash and tlsdial.SetConfigExpectedCert
respectively, falling back to HostName when CertName is empty.

The core fix here was originally written by @imnuke in #19965; that PR
also added a unit test in ipn/localapi/debugderp_test.go which is
replaced in this commit by a new vmtest that exercises the whole stack:
vnet now serves a self-signed cert valid for each fake DERP node's
HostName and exposes its SHA-256 fingerprint, and vmtest grows a new
SelfSignedDERPCertPinning EnvOption that swaps the test DERP map's
nodes to CertName="sha256-raw:<hex>" with InsecureForTests cleared.
TestSelfSignedDERPHashPinning then stands up two hard-NAT'd nodes, has
them communicate over DERP, and calls DebugDERPRegion on each. Before
this fix the test fails with the exact x509 hostname-mismatch error
from the original bug; after, it passes.

Updates #15579

Change-Id: I61f38ffebc7ac5abc962639db1ae88f5cd8633b1
Co-authored-by: Nuke <nuke@imnuke.dev>
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-06-02 12:02:40 -07:00
Brad Fitzpatrick
52400dc6f4 ipn/ipnlocal: add back a watchdog after earlier removal from engine
Commit 2b338dd6a8 removed watchdogEngine because it was weird
(so many methods) and increasingly unnecessary after we'd cleaned up
and simplified so much of the locking.

This adds back a watchdog, but an easier to maintain one that's more
idiomatic.

Updates #19759

Change-Id: I86c458473e126c0809f37696446ce7acf4cc4eb9
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-06-02 11:57:12 -07:00
Brad Fitzpatrick
a6ab7efa4f ipn/ipnlocal, cmd/tailscale/cli: auto-renew TLS certs and warn while pending
The Tailscale daemon only refreshed TLS certs as a side effect of inbound
TLS handshakes or "tailscale cert" CLI calls. A node that doesn't see
inbound traffic during the renewal window silently rolls past expiry.

Add a once-per-hour background loop on LocalBackend that enumerates Serve
and Funnel HTTPS hostnames (filtered against the netmap's CertDomains so
we don't poke ACME for other nodes' service hostnames) and calls the
existing GetCertPEM path. The renewal decision (ARI window, then 2/3
expiry fallback) is unchanged; the loop just guarantees it runs.

For visibility during initial issuance or restart with a long-expired
cached cert, add a "tls-cert-pending" health Warnable that's set while
ACME is in flight and no usable cached cert exists. Async renewal of a
still-valid cert intentionally doesn't fire it. And then make the CLI "cert"
subcommand print out a warning if it's blocking due to a cert fetch
in flight, using that health info.

Fixes #19911
Fixes #19912

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Change-Id: I144e46c40e957b2e879587decace32a523a6eade
2026-06-01 16:31:54 -07:00
Simon Law
92bfda580c cmd/tailscale/cli: fix time in tailscale routecheck (#19956)
When running `tailscale netcheck`, the reported timestamp used to be
in UTC and formatted according to RFC 3339 with a `T` to separate the
date from the time:

	sfllaw@h2co3:~$ tailscale netcheck | head -n3

	Report:
		* Time: 2026-06-01T21:12:32.252620138Z

This is machine-readable time leaking out to the user interface. Times
in normal commands are formatted for humans to read:

	sfllaw@h2co3:~$ date
	Mon 01 Jun 2026 02:39:14 PM PDT
	sfllaw@h2co3:~$ journalctl -t tailscaled | tail -n1
	Jun 01 14:35:21 h2co3 tailscaled[3328921]: wgengine: sending TSMP disco key advertisement to 100.90.144.102
	sfllaw@h2co3:~$ timedatectl show
	Timezone=America/Los_Angeles
	LocalRTC=no
	CanNTP=yes
	NTP=yes
	NTPSynchronized=yes
	TimeUSec=Mon 2026-06-01 14:38:32 PDT
	RTCTimeUSec=Mon 2026-06-01 14:38:32 PDT
	sfllaw@h2co3:~$ uptime --since
	2026-05-15 07:37:45

This PR makes the times printed by the CLI commands consistent:

- For `tailscale routecheck`, it now prints local time as
  `2026-05-15 07:37:45-07:00`.
- For `netlogfmt`, it has always printed local time with a space,
  but now includes the time zone.
- All machine-readable outputs continue to be standard RFC 3339 in
  UTC, i.e. `--format=json`.

As part of a general cleanup, this PR also adds standard common
time.Format layouts as tstime constants.

Fixes #19928

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-06-01 16:12:08 -07:00
Simon Law
28801674a6 net/routecheck: introduce new package for checking peer reachability (#19639)
The routecheck package parallels the netcheck package, where the
former checks routes and routers while the latter checks networks.
Like netcheck, it compiles reports for other systems to consume.

Historically, the client has never known whether a peer is actually
reachable. Most of the time this doesn’t matter, since the client will
want to establish a WireGuard tunnel to any given destination.
However, if the client needs to choose between two or more nodes,
then it should try to choose a node that it can reach.

Suggested exit nodes are one such example, where the client filters
out any nodes that aren’t connected to the control plane. Sometimes an
exit node will get disconnected from the control plane: when the
network between the two is unreliable or when the exit node is too
busy to keep its control connection alive. In these cases, Control
disables the Node.Online flag for the exit node and broadcasts this
across the tailnet. Arguably, the client should never have relied on
this flag, since it only makes sense in the admin console.

This patch implements an initial routecheck client that can probe
every node that your client knows about. You should not ping scan your
visible tailnet, this method is for debugging only.

This patch also introduces a new OnNetMapToggle hook, which fires when
the netmap transitions from nil to non-nil, or vice versa. This
happens either when the client receives its first MapResponse after
connecting to the control plane, or when it clears the netmap while it
is disconnecting. Routecheck uses this to wait for a valid netmap
so it knows which peers to probe.

Updates #17366
Updates tailscale/corp#33033

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-06-01 10:33:08 -07:00
Brad Fitzpatrick
2ba426802f ipn/ipnlocal: fix 'tailscale status --peers=false' missing user profile
Fixes #19894

Change-Id: I310504987170e0742480c8a02706eb0dbf4ec3dc
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-31 20:34:43 -07:00
Brad Fitzpatrick
3e34e721e8 tsnet: add opt-in SSH support (Server.ListenSSH)
This adds tsnet.Server.ListenSSH which, if the SSH feature is linked,
returns a net.Listener whose Accept yields *tailssh.Session values (as
net.Conn). This lets tsnet apps accept incoming SSH connections to
implement custom TUI applications.

Basic apps can use net.Conn directly (Read/Write/Close). Rich apps
import ssh/tailssh and type-assert for peer identity, PTY, signals,
etc. If feature/ssh isn't imported, ListenSSH returns an error.

Includes a demo guess-the-number game in tsnet/example/ssh-game.

Updates tailscale/corp#37839

Change-Id: I4e7c3c96afb030cdf4da8f2d8b2253820628129a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-30 14:17:50 -07:00
Fran Bull
c9333854fb appc,feature/conn25: use custom scheme resolvers for conn25
Currently we are picking a peer for the split dns routes when we get a
netmap. Use the new custom scheme resolvers, installed per app in the
config in the netmap, to allow us to choose which connector peer should
handle a DNS request at the time the request is made.

Fixes tailscale/corp#39858

Signed-off-by: Fran Bull <fran@tailscale.com>
2026-05-29 12:23:47 -07:00
kari-ts
7355116c05 ipn/store: make WriteState(id, nil) delete key instead of adding nil entry (#19920)
All StateStore implementations store a nil value in the cache map when WriteState is called with a nil byte slice instead of deleting the key. This causes ReadState to return (nil, nil) instead of (nil, ErrStateNotExist), since the key is still present in the map.

This breaks reset-auth in Windows, Linux, and Android, and the node can't log back in without manually editing the state file. (macOS uses a different state store)
DeleteProfile, DeleteAllProfilesForUser, setUnattendedModeAsConfigured are impacted but don't seem to break because the deleted keys are not reread.

This deletes the key from the cache instead.

Fixes tailscale/corp#42477

Signed-off-by: kari-ts <kari@tailscale.com>
2026-05-29 11:22:14 -07:00
Brad Fitzpatrick
412c812d76 ipn/ipnlocal: use ACME ALPN for authorized Funnel non-CertDomain domains
If a user explicitly adds a non-ts.net (not a CertDomain domain) domain
like "foo.com" to their serve config as a web target that's also an allowed
funnel domain (using raw "tailscale serve set-config"), then use the new
ALPN cert fetching (from b553969b) to get certs for that domain.

This is just plumbing; there's no new product functionality to
actually enable this easily client-side, and it also has no visible
product surface to enable it server-side.

Updates tailscale/corp#41736

Change-Id: Ie2e421ac9611bce64bba3de6a454b2d505ea0e8a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-28 13:33:45 -07:00
Alex Chan
9d126aec34 all: remove network lock references from private method names
Updates tailscale/corp#37904

Change-Id: I312d46d958209ca3d1152d1877fb91a57c91798d
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-05-28 18:00:36 +01:00
Brendan Creane
8d90a6ab1e ipn/ipnlocal: add HTTP/2 Content-Type tests for serve reverse proxy (#19905)
Adds two tests exercising the HTTP/2-inbound -> plaintext HTTP/1.1 backend
path through serve's reverseProxy and through the full serveWebHandler
entry point (with a funnel serveHTTPContext).

Updates #19866

Signed-off-by: Brendan Creane <bcreane@gmail.com>
2026-05-28 09:46:36 -07:00
Alex Chan
f4a280cdbd all: update a few more references to network/tailnet lock
Updates tailscale/corp#37904

Change-Id: I746b06328e080fa2b9ff28a2d099f95645aa3d0b
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-05-28 16:44:16 +01:00
Alex Chan
446ae97491 ipn: improve --exit-node hostname error during startup
When parsing the `tailscale up --exit-node=ARG` argument, we try to
resolve hostnames by searching the list of peers. However, at startup,
the peer list is empty, causing hostname lookups to trivially fail with
an unhelpful "invalid value" erorr.

Improve the error message when the peer list is empty to inform the user
that hostnames cannot be resolved during startup, and advise them to use
the exit node's Tailscale IP address instead.

Also, clarify that hostnames must be peer hostnames, not arbitrary
hostnames.

Fixes #19882

Change-Id: I9390a427c2863d657cf46c5e33b43cb3c5363764
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-05-28 16:43:45 +01:00
Brad Fitzpatrick
c9fb05b6f5 ipn/ipnlocal: don't dup-suppress UserProfiles on IPNBus on profile switches
Fixes #19889

Change-Id: I324a735c13772c0c79ed7392c0baa5064b34823b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-27 14:47:02 -07:00
Brad Fitzpatrick
364b952d62 cmd/containerboot: track peers from IPN bus updates, stop using netmap.NetworkMap
Some tests in another repo were broken by tailscale/tailscale#19607.
This fixes them, by finishing off the rest of the migration away from
netmap.NetworkMap on the IPN bus in containerboot.

Containerboot used to rebuild a full NetworkMap-shaped view while
reacting to IPN bus notifications. Now it insteads has its own
netmapState type (immutable) of exactly what it needs to track, and
sends those immutable values around, making cheap edits of new
immutable values when an IPN bus edit arrives.

This should make cmd/containerboot scale to much larger tailnets now too.

Fixes #19852
Fixes tailscale/corp#42347
Updates #12542

Change-Id: I88adaf061f85f677f954a764935e6654329d75a6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-27 14:12:48 -07:00
Brad Fitzpatrick
b553969b03 ipnlocal: try ACME TLS-ALPN for Funnel renewals
Use TLS-ALPN-01 for Funnel certificate renewals only when the node
already has a cached certificate, and fall back to DNS-01 with a fresh
order if the ALPN path is unavailable or fails.

Dynamically advertise acme-tls/1 only while an ACME challenge
certificate is pending, and add client metrics for DNS-01 and
TLS-ALPN-01 start/success/failure paths.

Updates tailscale/corp#41736
Fixes tailscale/corp#42320

Change-Id: I5adc6ea129237f9ef592f84fc1a8953c80bc9d5c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-27 09:30:23 -07:00
Brad Fitzpatrick
2c965ab540 types/netmap, ipn/ipnlocal, control/controlclient: rename NodeMutationAdd to NodeMutationUpsert
NodeMutationAdd was a misleading name: a PeersChanged entry in a
MapResponse can represent either a truly new peer or a full
replacement for an existing peer that couldn't be expressed as a
PeerChangedPatch. Calling it "Add" implied it was always a completely
new node, which is wrong.  (I'd changed my mind on the design of
mapping add/delete events to NodeMutations halfway through #19607 and
forgot to update the name, even though I'd updated half the docs)

Rename it to NodeMutationUpsert to reflect the actual semantics: the
node should be inserted or replaced in the peer map regardless of
whether it already existed.

Updates #19607
Updates #12542

Change-Id: Iebd3daddb3318cba02e115a1b184fcb3ee8f83d6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-27 08:37:14 -07:00
Brad Fitzpatrick
a8f40a2ca5 ipn/ipnlocal: add missing bus notify of peers on full netmap
The prior aa5da2e5f2 ("process node adds/removes in constant
time") commit missed a bus notification case, where new-style
subscribers set NotifyNoNetmap and then the controlclient map routing
sends a full update (rather than a delta). Those profiles + peers
need to be put on the bus too.

I noticed this only when porting the Android app over to use the
new bus stuff.

Updates #19607
Updates #12542

Change-Id: I82c35011d2c532222ca27f7d4e790522c31bd156
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-27 08:03:47 -07:00
Jordan Whited
e5a8cf3b18 control/controlknobs,feature/*,ipn/ipnlocal,tailcfg: add runtimemetrics
Emit runtime metrics as clientmetrics when the
NodeAttrEmitRuntimeMetrics NodeCapability is present.

We start small with just 2 metrics: heap bytes and total process memory.

Updates tailscale/corp#39434

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2026-05-26 16:02:01 -07:00
Simon Law
da8cd5cc7f ipn/ipnlocal: fix documentation typo, NodeAttrCacheNetworkMaps (#19851)
Updates #cleanup

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-05-22 22:19:10 -07:00
Simon Law
988615dbad ipn/ipnlocal,tstest/integration: pause the control client consistently (#19846)
There are two places where tailscaled transitions into a paused state:
1. tailscaled’s controlclient is initially created,
2. tailscale down, or the GUI equivalent, commands it to.

This patch unifies the implementation of both scenarios into
LocalBackend.shouldPauseControlClientLocked to prevent the
implementation from drifting.

The flaky tstest/integration.TestNoControlConnWhenDown test exposed
this mismatch, but only by accident. This patch also changes
TestNode.MustDown so that it runs `tailscale down` and then waits for
the testcontrol server to finish handling any associated /machine/map
requests.

Fixes #19831

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-05-22 17:58:44 -07:00
Brad Fitzpatrick
5295e3e119 ipn/{ipnstate,ipnlocal}: add integer NodeID to PeerStatus
In aa5da2e5f2 we made the IPN bus include deltas, including the
PeersRemoved, sending a slice of integer NodeIDs that were
removed. But when updating xcode, I realized there was no way to map
those integers to the stable node IDs used in other places.

I was consdering changing the just-added ipn.Notify.PeersRemoved from
an IntID to a string StableID, but then it doesn't match the MapResponse
wire protocol, which we've tried to match so far.

Instead, just add the integer ID as well. Callers can use whichever
world they want, having both. It's a little regrettable that we still
have two worlds of IDs, but oh well. Neither is really suitable to a
hypothetical future fully federated world of control servers anyway,
so we'll need a third type later anyway, so just live with the two we
have for now.

Updates #12542

Change-Id: Ib8fd48a265e1da1f8779152f141f624a7f7260e9
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-22 08:16:55 -07:00
Simon Law
7dabebc691 net/traffic: switch rendezvous hashing from SHA256 to FNV-1a (#19821)
In PR tailscale/corp#30448, we originally decided to break ties using
SHA256 for our rendezvous hashing algorithm. Now that we’ve had some
experience with it, we think that FNV-1a is a better choice. It
distributes bits evenly, it’s much faster, and it doesn’t need to be
cryptographically secure. The FNV designers recommend FNV-1a over the
deprecated FNV-1.

This PR makes the switch and updates the related tests, since changing
the algorithm changes which stable pick gets selected. As of 2026-05,
this is the best time to make this change, since there are almost no
clients in the wild with traffic steering enabled.

Updates #17366
Updates tailscale/corp#29964
Updates tailscale/corp#29966
Updates tailscale/corp#33033

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-05-21 10:11:59 -07:00
Brad Fitzpatrick
aa5da2e5f2 ipn/ipnlocal, control/controlclient: process node adds/removes in constant time
For large tailnets (~50k+ nodes) with frequent peer churn (ephemeral
GitHub Actions workers etc.), tailscaled used to rebuild the full
netmap and fan it out on the IPN bus on every MapResponse that
added or removed a peer. There were two O(N) costs per delta: the
full netmap rebuild + every Notify.NetMap encode to every bus watcher.

This change tackles both:

  1. Plumb O(1) peer add/remove through the delta path. PeersChanged
     and PeersRemoved no longer prevent the delta happy path; instead,
     they mutate the per-node-backend peer map in place.

  2. Restrict ipn.Notify.NetMap emission to the platforms whose host
     GUIs still depend on it (Windows, macOS, iOS) and migrate
     in-tree consumers off it everywhere else:

     - Migrate reactive consumers (containerboot, kube agents,
       sniproxy, tsconsensus, etc.) off Notify.NetMap to the
       previously-added Notify.SelfChange signal so they no longer
       have to subscribe to the full netmap.
     - Add ipn.NotifyNoNetMap so GUI clients on "legacy-emit" platforms
       that have already migrated can opt out of the per-watcher
       NetMap encode.
     - Gate Notify.NetMap emission on the producer side by a compile-
       time GOOS check, so the supporting code is dead-code-eliminated
       on Linux and other geese where no GUI consumer needs it.

Re-running BenchmarkGiantTailnet from tstest/largetailnet, which was
added along with baseline numbers on unmodified main in ad5436af0d,
the per-delta cost (one peer add+remove pair) is now ~O(1) regardless
of tailnet size N:

    N         no-watcher (ms/op)            bus-watcher (ms/op)
              before    now     factor      before    now     factor
     10000        32   0.11       300x         166   0.13      1300x
     50000       222   0.11      2000x         865   0.13      6700x
    100000       504   0.12      4100x        1765   0.13     13400x
    250000      1551   0.12     12500x        4696   0.15     32400x

Updates #12542

Change-Id: I94e34b37331d1a8ec74c299deffadf4d061fda9e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-21 09:26:19 -07:00
Simon Law
7ebca58042 net/traffic,ipn/ipnlocal: extract traffic steering utilities (#19682)
The traffic package contains helpers for evaluating traffic steering
scores and picking appropriate nodes. These were extracted from
ipnlocal.suggestExitNodeUsingTrafficSteering so they can be reused by
the new routecheck package to probe exit nodes in priority order.

Updates #17366
Updates tailscale/corp#33033

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-05-21 08:28:27 -07:00
Brad Fitzpatrick
f3a117e813 net/tsdial: run happy eyeballs across A and AAAA in UserDial
When tailscaled is running in userspace-networking mode behind an
exit node (e.g. as a SOCKS5 proxy), it resolves a hostname and then
dials a single resolved IP through the tunnel. If the name has both
A and AAAA, Go's net.Resolver merges them and we pick ips[0], which
on an IPv6-native host is usually AAAA. If the exit node has no IPv6
egress (or vice versa), the dial fails silently through the tunnel
and the user sees a hang.

Resolve all candidates and race connect attempts across address
families with a 300ms happy-eyeballs delay, matching Go's net.Dialer
default and the existing pattern in net/dnscache (commit ee0a03b14).
First success wins; losers are cancelled and any conns they produce
are closed. A failBoost channel wakes the launcher when a connect
fails fast (e.g. ICMP "no route" via the tunnel) so we don't sit on
the 300ms timer when the answer is already known.

userDialResolve is refactored into userDialResolveAll (returns the
full candidate list) plus a thin single-IP wrapper for callers like
UserDialPlan that don't race. UserDial's per-IP dispatch (netstack
vs peer dialer vs SystemDial vs std) is extracted to dialOneUser so
each candidate can route correctly on its own merits.

Also fix serveDial in localapi to pass the original hostname to
UserDial rather than a pre-resolved IP, so the race can fire.

This fix is single-ended: it works against any exit node, including
old ones, with no protocol changes. The trade-off versus filtering
on the exit-node side via PeerAPI DoH is that every dial through an
unreachable-family exit node costs one failed connect attempt per
cache window, rather than zero, which is acceptable given the
simplicity.

Fixes #19792
Fixes #13257

Change-Id: I9d7645d0034caf3ee22ecdd8070798353f77e94b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-20 18:35:55 -07:00
M. J. Fromberger
c09407002f ipn/ipnlocal/netmapcache: add UpdateSelfOnly method (#19818)
Some netmap updates are guaranteed to affect only the "static" parts of the
netmap, and so should not require us to walk through all the peers and user
profiles when updating the cache. To support this, the new UpdateSelfOnly
method updates only the Self node and other tailnet settings that are not
dependent on the peers and profiles.

Use this when updating the cache on DERP home changes.

Updates #12542

Change-Id: Ifed522b29d579fb76e010b4ff738cc4e0a72d27f
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
2026-05-20 16:29:04 -07:00
Simon Law
93dbd33ef7 ipn/ipnlocal: stub system interfaces for TestShouldUseOneCGNATRoute (#19807)
The TestShouldUseOneCGNATRoute test fails when the underlying system
interfaces don’t match what the underlying assumptions of the test.
That assumption was that there would only ever be one CGNAT interface:
the Tailscale one.

This breaks on Linux when border0 is installed because border0 also
creates an interface with a CGNAT route.

This patch stubs netmon.RegisterInterfaceGetter to replace the system
interfaces and netmon.SetTailscaleInterfaceProps to identify the test
data that defines the Tailscale interface.

This patch also tests the control knob override for CGNAT for every
combination of operating system and system interfaces, instead of just
a couple of combinations.

Fixes #19731

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-05-20 16:00:14 -07:00