Commit Graph

10583 Commits

Author SHA1 Message Date
Brad Fitzpatrick
81569e891f tstest/iosdeps: update import list to mirror ipn-go-bridge
The purpose of this package is to test the iOS dependency closure, but
it had drifted from the actual import list of the ipn-go-bridge package
in the corp repo (the Go side of the iOS / macOS app).

Update the imports to match ipn-go-bridge's GOOS=ios import list,
adding many missing packages including wgengine/netstack,
feature/{taildrop,syspolicy,condregister}, the util/syspolicy/*
subpackages, types/{key,lazy,logid,netmap}, tsd, safesocket,
util/{eventbus,must,set}, and several net/* and ipn/* packages.

Drop two now-stale BadDeps entries (for now!): database/sql/driver and
github.com/google/uuid are reached via wgengine/netstack ->
github.com/prometheus-community/pro-bing, which netstack imports on
darwin || ios for ICMP user-ping, so the iOS app already ships them.
But we should fix that later.

Updates #19633

Change-Id: Ic50779fdb195685a2e8ccd7c513eee91b0feeaf8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-04 14:05:24 -07:00
Brad Fitzpatrick
9bb7ca6116 cmd/vet/lowerell, drive/driveimpl: forbid variables named "l" or "I"
Add a new vet checker that rejects variables, parameters, named
return values, receivers, range/type-switch bindings, type
parameters, struct fields, and constants named "l" (lowercase ell)
or "I" (uppercase i). Both are hard to distinguish from the digit
"1" and from each other in too many fonts.

Rename the two pre-existing struct fields named "l" (both of type
net.Listener) in drive/driveimpl/drive_test.go to "ln", matching the
convention used elsewhere for net.Listener locals.

Rename the test-fixture struct fields "I" (single int label) to
"Int" in metrics/multilabelmap_test.go and util/deephash/deephash_test.go,
preserving the "first letters of types" convention used alongside
neighboring fields like I8/I16/U/U8.

Also teach pkgdoc_test.go to skip testdata/ directories, which
the go tool ignores; they are not real packages.
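
The kind of check described above can be sketched with the standard go/ast package. This is a minimal illustration, not the cmd/vet/lowerell implementation: the function names are hypothetical, and it only looks at fields, var/const specs, short variable declarations, and range bindings. As a simplification it also flags interface method names, which the real checker may not.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// confusable reports whether name is one of the two forbidden identifiers.
func confusable(name string) bool {
	return name == "l" || name == "I"
}

// findConfusable (hypothetical helper) parses src and returns the source
// positions of declarations that introduce an identifier named "l" or "I".
func findConfusable(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return nil, err
	}
	var found []string
	report := func(e ast.Expr) {
		if id, ok := e.(*ast.Ident); ok && confusable(id.Name) {
			found = append(found, fset.Position(id.Pos()).String())
		}
	}
	ast.Inspect(f, func(n ast.Node) bool {
		switch n := n.(type) {
		case *ast.Field: // params, results, receivers, struct fields, type params
			for _, id := range n.Names {
				report(id)
			}
		case *ast.ValueSpec: // var and const declarations
			for _, id := range n.Names {
				report(id)
			}
		case *ast.AssignStmt: // short variable declarations and type-switch bindings
			if n.Tok == token.DEFINE {
				for _, lhs := range n.Lhs {
					report(lhs)
				}
			}
		case *ast.RangeStmt: // range key and value bindings
			if n.Tok == token.DEFINE {
				report(n.Key)
				report(n.Value)
			}
		}
		return true
	})
	return found, nil
}

func main() {
	src := "package p\n\nfunc f(l []int) int {\n\tI := len(l)\n\treturn I\n}\n"
	found, err := findConfusable(src)
	if err != nil {
		panic(err)
	}
	for _, pos := range found {
		fmt.Println("confusable identifier declared at", pos)
	}
}
```

Only declaring positions are reported, so a use such as "return I" is not flagged a second time.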

Fixes #19631

Change-Id: I71ad2fa990705f7a070406ebcdb8cefa7487d849
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-04 14:03:28 -07:00
Andrew Lytvynov
0cf899610c util/linuxfw/linuxfwtest: remove unused package (#19520)
Added in 2022, this appears to be unused now.

Updates #cleanup

Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
2026-05-04 12:33:12 -07:00
License Updater
ca2317439d licenses: update license notices
Signed-off-by: License Updater <noreply+license-updater@tailscale.com>
2026-05-04 10:34:27 -07:00
Jordan Whited
ce76f44df2 derp/derpserver: remove global rate limiter
Which can be unfair around varying packet sizes.

Updates tailscale/corp#40962

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2026-05-04 09:41:14 -07:00
Fernando Serboncini
29122506be misc/git_hook: propagate shared HOOK_VERSION (#19476)
Move HOOK_VERSION into the githook package and export it as
githook.HookVersion, so tailscale/corp can reference it via
the shared-code bump instead of having to bump HOOK_VERSION
by hand.

New launcher.sh composes the wanted version from 2 sources:
the shared HOOK_VERSION and an optional repo local version,
misc/git_hook/HOOK_VERSION, for repo-specific config bumps.

Updates tailscale/corp#40381

Change-Id: I7cf16889ba53cb564cc2df7dfd7588748f542c55

Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
2026-05-04 12:38:28 -04:00
George Jones
290a6cc03c appc, feature/conn25: handle exact and wildcard domains correctly (#19202)
Installed SplitDNS routes are always treated as wildcard domains,
so the domains that we pass to the local resolver should be normalized
and have any leading *. wildcard prefix removed.

When looking at DNS responses to see if the domain matches, we need to
consider both exact matches and wildcard matches. We now keep separate
maps of exact-match domains and wildcard domains, and when we match we
check to see if there's a match in the exact-match map, otherwise we
check against the wildcard match map until we find a match, removing
a label after each check.
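
The lookup order described above (exact map first, then the wildcard map while stripping one label per step) might look roughly like the following. This is a sketch under assumptions: the type and function names are hypothetical, and it reads the description literally, so the bare domain itself also matches its own wildcard entry.

```go
package main

import (
	"fmt"
	"strings"
)

// matcher is a hypothetical holder for the two maps described above.
type matcher struct {
	exact    map[string]bool
	wildcard map[string]bool
}

// normalize lowercases d and strips a trailing dot and a leading "*." prefix.
func normalize(d string) string {
	d = strings.ToLower(strings.TrimSuffix(d, "."))
	return strings.TrimPrefix(d, "*.")
}

// newMatcher sorts configured domains into the exact or wildcard map.
func newMatcher(domains []string) *matcher {
	m := &matcher{exact: map[string]bool{}, wildcard: map[string]bool{}}
	for _, d := range domains {
		if strings.HasPrefix(d, "*.") {
			m.wildcard[normalize(d)] = true
		} else {
			m.exact[normalize(d)] = true
		}
	}
	return m
}

// matches checks the exact map first, then walks up the name one label
// at a time, checking the wildcard map at each step.
func (m *matcher) matches(name string) bool {
	name = normalize(name)
	if m.exact[name] {
		return true
	}
	for name != "" {
		if m.wildcard[name] {
			return true
		}
		_, rest, ok := strings.Cut(name, ".")
		if !ok {
			break
		}
		name = rest
	}
	return false
}

func main() {
	m := newMatcher([]string{"example.com", "*.corp.example.org"})
	for _, name := range []string{"example.com", "a.example.com", "a.b.corp.example.org"} {
		fmt.Println(name, m.matches(name))
	}
}
```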

Rather than looking for matching self-hosted domains (domains serviced
by the connector being run on the self-node), the apps that are being
serviced by the connector on the self-node are tracked instead. When
checking to see if a DNS response should be rewritten, it is ignored
if any of the matching apps for the domain are in the self-hosted apps set.

Fixes tailscale/corp#39272

Signed-off-by: George Jones <george@tailscale.com>
2026-05-01 17:33:21 -04:00
Fran Bull
bdf3419e7d net/dns: add custom scheme resolvers
If another part of the client code registers a custom scheme with the
forwarder, the forwarder will check resolver addresses to see if they
match the scheme. If they do, the corresponding custom scheme handler
will be called to find the actual address for the resolver at this
moment. If the handler returns the empty string then that resolver will
be ignored.
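
As a rough sketch of that flow (not the net/dns forwarder's actual API; the types, method names, and the "peer" scheme below are all hypothetical), a scheme registry that resolves or drops resolver addresses could look like this:

```go
package main

import (
	"fmt"
	"strings"
)

// schemeHandler (hypothetical) maps a custom-scheme resolver address to
// the actual address to use right now, or "" to skip that resolver.
type schemeHandler func(addr string) string

// forwarder is a hypothetical stand-in holding registered scheme handlers.
type forwarder struct {
	handlers map[string]schemeHandler
}

// registerScheme associates a handler with a scheme such as "peer".
func (f *forwarder) registerScheme(scheme string, h schemeHandler) {
	if f.handlers == nil {
		f.handlers = map[string]schemeHandler{}
	}
	f.handlers[scheme] = h
}

// resolveAddrs passes ordinary addresses through unchanged, rewrites
// addresses whose scheme has a handler, and drops those resolved to "".
func (f *forwarder) resolveAddrs(addrs []string) []string {
	var out []string
	for _, a := range addrs {
		scheme, _, hasScheme := strings.Cut(a, "://")
		h, registered := f.handlers[scheme]
		if !hasScheme || !registered {
			out = append(out, a)
			continue
		}
		if resolved := h(a); resolved != "" {
			out = append(out, resolved)
		}
	}
	return out
}

func main() {
	var f forwarder
	f.registerScheme("peer", func(addr string) string {
		if addr == "peer://primary" {
			return "100.64.0.1:53" // example address only
		}
		return "" // no current peer: ignore this resolver
	})
	fmt.Println(f.resolveAddrs([]string{"8.8.8.8", "peer://primary", "peer://down"}))
}
```

Because the handler runs on every lookup, the address it returns can change between queries, which is the dynamic behavior the commit message is after.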

This is useful if you want to dynamically determine where to send
certain DNS requests. It is being added to support new app connector
(conn25) work that would like to make sure it sends DNS requests to the
current connector peer in a high availability configuration.

Updates tailscale/corp#39858

Signed-off-by: Fran Bull <fran@tailscale.com>
2026-05-01 14:01:10 -07:00
Rollie Ma
78126c5d9f tailcfg: add node capability for services in desktop clients (#19605)
Add a node capability to help determine if the desktop clients should
show services list/menu/section.

Updates: https://github.com/tailscale/corp/issues/40900

Change-Id: Ie34b3362f921d710173b2a0dd190354352bb26f0

Signed-off-by: Rollie Ma <rollie@tailscale.com>
2026-05-01 12:07:33 -07:00
Tom Meadows
ee10f9881c cmd/k8s-operator: add authkey reissuing to recorder reconciler (#19556)
Also fixes a memory leak with the authKeyReissuing map on ProxyGroup
reconciler authkey reissue.

Updates #19311

Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
2026-05-01 18:26:55 +01:00
Alex Chan
3ced30b0b6 tka: clarify that this limit is on disablement *values* not *secrets*
Values get written into TKA state; secrets don't.

Updates #cleanup

Change-Id: Ief9831dcb1102f584a33b2e71b611b38ca463724
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-05-01 18:25:39 +01:00
Andrew Lytvynov
f15a4f4416 client/web: move API permission checks into handlers (#19576)
There are only a couple endpoints that check peer capabilities. Keeping
permission checks with the code that assumes they were performed, rather
than with the routing layer, feels easier to reason about.

Check that the caller is actually a peer and pass their capabilities via
a context value for handlers that want to check them.

Along with this, simplify the helper handler wrappers that are not
needed for most of the endpoints.

Updates #40851

Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
2026-05-01 09:01:53 -07:00
Brad Fitzpatrick
bbcb8650d4 cmd/tailscale/cli: fetch netmap via current-netmap debug action
Stop opening an IPN bus subscription with NotifyInitialNetMap purely to
read the current netmap once. Use the LocalAPI debug current-netmap
action (added in 159cf8707) instead, which returns the current netmap
synchronously without subscribing to the bus.

Updates #12542

Change-Id: I8aa2096d65aaea4dfe62634f03ce06b5470e0e51
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-01 07:53:51 -07:00
Brad Fitzpatrick
4c3ed5ab32 all: migrate code off Notify.NetMap to Notify.SelfChange
Move tailscaled's in-tree reactive users off of IPN bus Notify.NetMap
updates to the narrower Notify.SelfChange signal introduced earlier in
this series. Consumers that need additional state (peers, DNS config,
etc.) fetch it on demand via the LocalAPI.

It is a step toward the larger goal of not fanning Notify.NetMap out
to every bus watcher on Linux/non-GUI hosts.

A future change stops sending Notify.NetMap entirely on Linux and
non-GUI platforms. (Eventually, once macOS/iOS/Windows migrate to the
upcoming new Notify APIs, we'll remove ipn.Notify.NetMap entirely.)

Updates #12542

Change-Id: I51ea9d86bdca1909d6ac0e7d5bd3934a3a4e8516
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-01 06:51:40 -07:00
Claus Lensbøl
ff9c3f0e00 tstest/natlab/vmtest: add test loading netmap cache from disk (#19598)
For testing the loading of netmap cache from disk, the cache needs to
exist. The simple solution is to start two nodes and connect them to
control, with the netmap caching capability set. Then cut the connection
to control, restart the nodes, and ping between them.

This tests that we can start from a cache and get to running state, but
also that we are able to establish a connection between the nodes.

For now this is not testing how the nodes are able to talk to each other
(DERP vs direct).

Updates #19597

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2026-05-01 09:46:19 -04:00
Brad Fitzpatrick
89a78dc9b7 client/local, ipn/localapi, ipn/ipnlocal: add PeerByID
Add a narrow LocalAPI accessor and matching client/LocalBackend method
to look up a single peer's current full [tailcfg.Node] by NodeID, in
O(1) time on the daemon side, without fetching the entire netmap.

Useful for callers that need the latest state of a single peer (e.g.
in response to a peer-mutation event on the IPN bus) without paying
for a full netmap fetch.

Updates #12542

Change-Id: I1cb2d350e6ad846a5dabc1f5368dfc8121387f7c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-01 06:20:46 -07:00
Alex Chan
cac94f51cc ipn/ipnlocal: don't compact TKA state on startup
Compacting on startup means nodes may compact at a different cadence
based on whether they're long-running or restarting frequently.

We already compact after every sync, which only occurs when the TKA
state has changed. Waiting for TKA changes to trigger compaction on
nodes means compaction will occur more consistently across a tailnet.

Updates tailscale/corp#33537

Change-Id: Ia0aa6d9e5e362e9ab08450fde69772841790d5b5
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-05-01 13:27:12 +01:00
Brad Fitzpatrick
a6c5d23742 ipn, ipn/ipnlocal: add Notify.SelfChange
Add a new bus signal that lets reactive consumers (containerboot, kube
agents, sniproxy, tsconsensus, etc.) react to self-node updates without
having to subscribe to the full netmap. Today those consumers either
watch Notify.NetMap (which on large tailnets is expensive to encode and
ship per watcher) or poll. SelfChange is a cheap, narrow alternative:
addresses, name, key expiry, capabilities, etc.

Consumers that need additional state can react to SelfChange and then
fetch the relevant bits on demand via existing LocalClient methods.

Producer-side, every netmap-bearing setControlClientStatus call now
also publishes SelfChange. Future changes will migrate individual
in-tree consumers off Notify.NetMap to this signal, and eventually
gate the legacy NetMap emission to platforms whose host GUIs still
require it.

Updates #12542

Change-Id: I4441650b0e085d663eb6bf26a03748b7d961ca49
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-30 14:47:03 -07:00
Brad Fitzpatrick
9f343fdc0c client/local, ipn/localapi, all: add CertDomains and DNSConfig accessors
Add two narrow LocalAPI accessors so callers don't have to subscribe to
the IPN bus and pull a full *netmap.NetworkMap just to read DNS-shaped
fields:

  - GET /localapi/v0/cert-domains returns DNS.CertDomains.
  - GET /localapi/v0/dns-config returns the full tailcfg.DNSConfig.

Migrate in-tree callers off the netmap-on-the-bus pattern:

  - kube/certs.waitForCertDomain still wakes on the IPN bus but now
    queries CertDomains via LocalClient.CertDomains rather than
    reading n.NetMap.DNS.CertDomains. The kube LocalClient interface
    and FakeLocalClient gain a CertDomains method.
  - cmd/tailscale dns status calls LocalClient.DNSConfig directly
    instead of opening a NotifyInitialNetMap watcher.
  - cmd/tailscale configure kubeconfig switches from a netmap watcher
    + serviceDNSRecordFromNetMap to LocalClient.DNSConfig +
    serviceDNSRecordFromDNSConfig.

This is part of a series moving callers away from depending on the
netmap traveling on the IPN bus, so the bus payload can shrink in a
later change.

Updates #12542

Change-Id: Ie10204e141d085fbac183b4cfe497226b670ad6c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-30 13:50:46 -07:00
Michael Ben-Ami
822299642b feature/conn25: centralize config on Conn25 with atomic access
We have two sources of truth for configuration state: the node view
(from the netmap/policy) and prefs (the --advertise-connector option).
These come with two independent update paths: onSelfChange for node view
changes and profileStateChange for pref changes.

Centralize config on Conn25 so that onSelfChange and profileStateChange
can update their independent parts without bundling changes together.
The old bundled approach required read-modify-write, which opened the
door to potential TOCTOU bugs. The node view config is
stored as an atomic.Pointer[config] and the prefs-derived field
(advertise-connector) becomes an independent atomic.Bool. onSelfChange
creates a fresh config and stores it atomically. profileStateChange sets
the bool.
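
A minimal sketch of that layout, assuming a simplified config with a single made-up field (the real Conn25 and config types are not shown in this log, so every name below is illustrative):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// config is a hypothetical, simplified node-view config.
type config struct {
	apps []string
}

// conn25 mirrors the described split: one atomic pointer for the node
// view config and an independent atomic bool for the pref-derived flag.
type conn25 struct {
	cfg                atomic.Pointer[config]
	advertiseConnector atomic.Bool
}

// onSelfChange builds a fresh config and publishes it atomically.
// No read-modify-write of the previous value is needed.
func (c *conn25) onSelfChange(apps []string) {
	c.cfg.Store(&config{apps: append([]string(nil), apps...)})
}

// profileStateChange updates only the pref-derived flag.
func (c *conn25) profileStateChange(advertise bool) {
	c.advertiseConnector.Store(advertise)
}

// snapshot reads both parts; either may be updated concurrently
// without affecting the other.
func (c *conn25) snapshot() (apps []string, advertise bool) {
	if cfg := c.cfg.Load(); cfg != nil {
		apps = cfg.apps
	}
	return apps, c.advertiseConnector.Load()
}

func main() {
	var c conn25
	c.profileStateChange(true)
	c.onSelfChange([]string{"app-a", "app-b"})
	apps, advertise := c.snapshot()
	fmt.Println(apps, advertise)
}
```

Since each update path writes only its own atomic, neither has to load, merge, and store a combined struct, which is where the TOCTOU window came from.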

This also establishes clearer lines of responsibility:

 - Configuration state lives on Conn25. Methods that need to read
   config (isConnectorDomain, mapDNSResponse, the IPMapper methods)
   are on Conn25, and use the atomics for synchronization.

 - "Active" state (address allocations, transit IP mappings) lives on
   client and connector, and uses a mutex for synchronization on that
   state, without conflicting with configuration synchronization.
   It's fine for active state to be out of sync with config — e.g. a
   transit IP allocated for an app should still be tracked, and gracefully
   expired, even if the app is removed from the node view.
   Removing config responsibility from client/connector makes these
   cases clearer to handle.

 - In cases where the client or connector does need access to
   config-derived state, e.g. a client reconfiguring its IP pools from
   the IPSets in the config, we can use closures for the
   client or connector to get just the latest state it needs from the
   config. See getIPSets() in this commit.

 - As of this commit, the connector doesn't need config-derived state at
   all.

Fixes tailscale/corp#40872

Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>
2026-04-30 16:29:56 -04:00
Brad Fitzpatrick
159cf8707a ipn/ipnlocal, all: split LocalBackend.NetMap into NetMapNoPeers / NetMapWithPeers
Add two narrower accessors alongside the existing
[LocalBackend.NetMap], with docs that distinguish their semantics:

  - NetMapNoPeers: cheap (returns the cached *netmap.NetworkMap with
    a possibly-stale Peers slice). For callers that only read non-Peers
    fields like SelfNode, DNS, PacketFilter, capabilities.
  - NetMapWithPeers: documented as returning an up-to-date Peers slice.
    For callers that genuinely need to iterate Peers or call
    PeerByXxx.

Mark the existing NetMap deprecated and point readers at the two new
accessors. NetMap, NetMapNoPeers, and NetMapWithPeers all currently
return the same value (b.currentNode().NetMap()): this commit is a
no-op behaviorally, just a renaming and migration of in-tree callers.
A subsequent change in the same series will switch
NetMapWithPeers to actually rebuild the Peers slice from the live
per-node-backend peers map (O(N) per call), at which point the
distinction between the two new accessors becomes load-bearing.

Migrate in-tree callers to the appropriate accessor based on what
fields they read:

  - NetMapNoPeers (most common): localapi handlers, peerapi accept,
    GetCertPEMWithValidity, web client noise request, doctor DNS
    resolver check, tsnet CertDomains/TailscaleIPs, ssh/tailssh
    SSH-policy/cap reads, several LocalBackend internals
    (isLocalIP, allowExitNodeDNSProxyToServeName, pauseForNetwork
    nil-check, serve config).
  - NetMapWithPeers: writeNetmapToDiskLocked (persist full netmap to
    disk for fast restart), PeerByTailscaleIP lookup.

Tests still call the legacy NetMap; they'll see the deprecation
warning but otherwise behave identically.

Also add two pieces of plumbing the next change in this series will
need, but which are already useful on their own:

  - [client/local.GetDebugResultJSON]: a generic [Client.DebugResultJSON]
    that decodes directly into a target type T, avoiding the
    marshal/unmarshal roundtrip callers otherwise need.
  - localapi "current-netmap" debug action: returns the current
    netmap (with peers) as JSON. Documented as debug-only — the
    netmap.NetworkMap shape is internal and may change without notice.

This commit is part of a series breaking up a larger change for
review; on its own it is a no-op refactor.

Updates #12542

Change-Id: Idbb30707414f8da3149c44ca0273262708375b02
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-30 11:14:06 -07:00
Brad Fitzpatrick
92179b1fc7 cmd/hello: split server into helloserver package
Move the template, request handler, and HTTP/HTTPS server wiring out
of package main and into a new cmd/hello/helloserver package so the
server can be embedded in other binaries. The main package now only
constructs a helloserver.Server with the production addresses and
calls Run.

While here, drop the -http, -https, and -test-ip flags along with the
dev-mode template and fake-data fallbacks they enabled; the binary is
only run in production.

Updates tailscale/corp#32398

Change-Id: Id1d38b981733334cafc596021130f36e1c1eed67
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-30 08:40:55 -07:00
David Bond
644c3224e9 cmd/{containerboot,k8s-operator}: don't return pointers to maps (#19593)
This commit modifies the usage of the `egressservices.Configs` type
within containerboot and the k8s operator.

Originally it was being thrown around as a pointer which is not required
as maps are already pointers under the hood.
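
To illustrate the point, here is a small sketch with a made-up map type standing in for egressservices.Configs (its real element type is not shown here):

```go
package main

import "fmt"

// configs is a hypothetical stand-in for a map-typed configuration.
type configs map[string]string

// setEntry mutates the caller's map even though c is passed by value,
// because a map value refers to shared underlying storage.
func setEntry(c configs, name, target string) {
	c[name] = target
}

// loadConfigs returns the map directly rather than a *configs.
func loadConfigs() configs {
	return configs{"svc-a": "10.0.0.1"}
}

func main() {
	c := loadConfigs()
	setEntry(c, "svc-b", "10.0.0.2")
	fmt.Println(len(c), c["svc-b"])
}
```

One caveat: writing to a nil map panics, and reassigning the map variable inside a callee is not visible to the caller, so a pointer is still needed in the rare case where a function must replace the map itself.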

Signed-off-by: David Bond <davidsbond93@gmail.com>
2026-04-30 16:11:00 +01:00
Brad Fitzpatrick
815bb291c9 cmd/tailscale/cli: allow tag without "tag:" prefix in 'tailscale up'
If a user passes --advertise-tags=foo,bar (with no colons in any
segment), automatically prepend "tag:" client-side so it goes on the
wire as "tag:foo,tag:bar". Segments that already contain a colon are
left untouched and must be fully-qualified ("tag:foo"), which keeps
the door open for future colon-bearing syntax.
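
The rule can be sketched as below. The function name is hypothetical, and the sketch assumes the rule is applied per segment; the commit message could also be read as all-or-nothing across the whole flag value.

```go
package main

import (
	"fmt"
	"strings"
)

// qualifyTags (hypothetical helper) prepends "tag:" to each comma-separated
// segment that contains no colon, and leaves other segments untouched.
func qualifyTags(s string) string {
	if s == "" {
		return ""
	}
	parts := strings.Split(s, ",")
	for i, p := range parts {
		if p != "" && !strings.Contains(p, ":") {
			parts[i] = "tag:" + p
		}
	}
	return strings.Join(parts, ",")
}

func main() {
	fmt.Println(qualifyTags("foo,bar"))
	fmt.Println(qualifyTags("tag:foo,bar"))
}
```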

This was originally added in cd07437ad (2020-10-28) and then reverted
in 1be01ddc6 (2020-11-10) over forward-compatibility concerns. But
then it was realized on 2026-04-29 that this was always safe for
future extensibility anyway (tags can't contain colons; tag:foo:bar is
invalid, per the 2020 CheckTag restrictions). So if we wanted to add
some hypothetical --advertise-tags=tagset:setfoo or "group:foo", we'd
still have syntax to do so, as it can't conflict with tag:group:foo.

Avery signed off on this on Slack: "Ok, I withdraw my objection to
auto-qualifying tag names in advertise-tags and I hope I won't regret
it :)"

Updates #861

Change-Id: I06935b0d3ae909894c95c9c2e185b7d6a219ff32
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-30 07:13:48 -07:00
Brad Fitzpatrick
f343b496c3 wgengine, all: remove LazyWG, use wireguard-go callback API for on-demand peers
Replace the UAPI text protocol-based wireguard configuration with
wireguard-go's new direct callback API (SetPeerLookupFunc,
SetPeerByIPPacketFunc, RemoveMatchingPeers, SetPrivateKey).

Instead of computing a trimmed wireguard config ahead of time upon
control plane updates and pushing it via UAPI, install callbacks so
wireguard-go creates peers on demand when packets arrive. This removes
all the LazyWG trimming machinery: idle peer tracking, activity maps,
noteRecvActivity callbacks, the KeepFullWGConfig control knob, and the
ts_omit_lazywg build tag.

For incoming packets, PeerLookupFunc answers wireguard-go's questions
about unknown public keys by looking up the peer in the full config.
For outgoing packets, PeerByIPPacketFunc (installed from
LocalBackend.lookupPeerByIP) maps destination IPs to node public keys
using the existing nodeByAddr index.

Updates tailscale/corp#12345

Change-Id: I4cba80979ac49a1231d00a01fdba5f0c2af95dd8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-29 19:46:19 -07:00
Brad Fitzpatrick
b313bffbe7 control/tsp, tstest/integration/testcontrol: deflake TestMapAgainstTestControl
The test was flaky under stress with "AddRawMapResponse N: node not
connected" failures. The root cause was in testcontrol's addDebugMessage:
it conflated "no streaming poll registered" with "wake-up channel buffer
momentarily full". The single-slot updatesCh is just a lossy wake-up
signal, but the streaming serveMap loop has fast paths
(takeRawMapMessage and the hasPendingRawMapMessage continue) that don't
drain it. A stale notification could remain buffered, causing the next
sendUpdate to fail even though msgToSend had been queued and the
streaming poll would still pick it up.

Detect the real failure case (no streaming poll) by checking
s.updates[nodeID] directly, and treat sendUpdate's buffer-full result as
benign — the message is in msgToSend, which is the source of truth.
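
The distinction can be sketched as follows. This is not the testcontrol code: the struct, field, and method names are simplified stand-ins, and node IDs are plain ints. The point is that a full one-slot wake-up channel is not an error once the message is queued.

```go
package main

import (
	"fmt"
	"sync"
)

// server is a simplified, hypothetical model of the state described above.
type server struct {
	mu        sync.Mutex
	updates   map[int]chan struct{} // per-node wake-up channel, buffer size 1
	msgToSend map[int][]string      // queued messages: the source of truth
}

// addMessage queues msg for nodeID. It fails only when no streaming poll
// is registered; a full wake-up buffer is treated as benign.
func (s *server) addMessage(nodeID int, msg string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	ch, ok := s.updates[nodeID]
	if !ok {
		return false // the real failure: node not connected
	}
	s.msgToSend[nodeID] = append(s.msgToSend[nodeID], msg)
	select {
	case ch <- struct{}{}:
	default:
		// A wake-up is already pending; the poller will still find
		// msg in msgToSend when it next runs.
	}
	return true
}

func main() {
	s := &server{
		updates:   map[int]chan struct{}{1: make(chan struct{}, 1)},
		msgToSend: map[int][]string{},
	}
	fmt.Println(s.addMessage(1, "first"))  // fills the wake-up slot
	fmt.Println(s.addMessage(1, "second")) // slot full, still queued
	fmt.Println(s.addMessage(2, "third"))  // node 2 has no poll registered
}
```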

Also plumb an optional *health.Tracker through tsp.ClientOpts to the
underlying ts2021.Client and supply one in the tests, eliminating the
"## WARNING: (non-fatal) nil health.Tracker (being strict in CI)" stack
dumps emitted by controlhttp.(*Dialer).forceNoise443 under CI.

Fixes #19583

Change-Id: Ib2334376585e8d6562f000a0b71dea0117acb0ff
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-29 16:11:00 -07:00
Claus Lensbøl
978b6a81b2 ipn/ipnlocal: always ReSTUN when starting up without a cache (#19586)
78627c1 introduced starting up and preserving the DERP server from
cache, but also changed it so the initial ReSTUN would not fire when
setting the DERPMap.

Change this so when not working from a cache, the ReSTUN will always
fire during startup.

Updates #19585

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2026-04-29 18:56:57 -04:00
Jordan Whited
c0a9728fe2 derp/derpserver: fix Server.UpdateRateLimits docs
As of 0e9f9e2bd it is possible to have an infinite per-client limit,
with finite global.

Updates tailscale/corp#40962

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2026-04-29 14:43:12 -07:00
Jordan Whited
0e9f9e2bd8 derp/derpserver: support global rate limiting independent of per-client
This commit enables the operator to set a global rate limit without any
per-client.

Updates tailscale/corp#40962

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2026-04-29 14:15:53 -07:00
Brad Fitzpatrick
15cba0a3f6 tstest/natlab/vmtest: add TestDiscoKeyChange
Add a vmtest that brings up two gokrazy nodes A and B behind two
One2OneNAT networks (so direct UDP works in both directions and any
slowness can't be blamed on NAT traversal), establishes a WireGuard
tunnel A → B with TSMP, then rotates B's disco key four times and
asserts that the data plane recovers in both directions after each
rotation. All pings are TSMP (the data-plane ping; disco pings would
not exercise the WireGuard tunnel itself).

The five pings:

  1. A → B  (initial; brings up the tunnel; 30s budget)
  2. B → A  after rotate (LocalAPI rotate-disco-key debug action)
  3. A → B  after rotate (LocalAPI)
  4. B → A  after restart (SIGKILL; gokrazy supervisor respawns)
  5. A → B  after restart (SIGKILL)

Each post-rotation ping gets a 15-second budget. Two unavoidable
multi-second waits dominate today:

  - The rotate-then-a→b phase takes ~10s on main because of LazyWG.
    After B's WantRunning bounce, B's wgengine resets its
    sentActivityAt/recvActivityAt maps and trims A out of the
    wireguard-go config as an "idle peer"; B only re-adds A on
    inbound activity, by which point A's first few TSMP packets
    have been silently dropped at B's tundev. The
    bradfitz/rm_lazy_wg branch removes that trimming entirely
    (verified locally: this phase drops to <100ms there).

  - The restart phases take ~5s for wireguard-go's RekeyTimeout
    handshake retry. After SIGKILL+respawn the first WG handshake
    init from the restarted node sometimes goes into the void
    (likely the brief peer-removed window in the receiver's
    two-step maybeReconfigWireguardLocked reconfig during which
    the peer is absent from wireguard-go), and wg-go's 5s+jitter
    retransmit timer is the next opportunity to retry. That retry
    succeeds and the staged TSMP packet flushes. Intrinsic to the
    protocol's retransmit policy.

Once LazyWG is removed and the first-handshake-after-reconfig race
is fixed, the budget should drop to 5s.

Supporting changes:

  ipn/ipnlocal: DebugRotateDiscoKey now toggles WantRunning off and
  back on after rotating the disco key. magicsock.Conn.RotateDiscoKey
  only resets local disco state; without also dropping wireguard-go
  session keys, peers keep encrypting with their stale per-peer
  session against us until their rekey timer fires (WireGuard has no
  data-plane signaling to invalidate sessions). Bouncing WantRunning
  runs the engine through Reconfig(empty) → authReconfig, which
  drops every peer's WG session so the next packet either way
  triggers a fresh handshake.

  ipn/ipnlocal, ipn/localapi: add a debug-only "peer-disco-keys"
  LocalAPI action ([LocalBackend.DebugPeerDiscoKeys]) that returns
  a map[NodePublic]DiscoPublic from the current netmap. Tests reach
  it via [local.Client.DebugResultJSON]. We do not surface disco
  keys via [ipnstate.PeerStatus] because adding a non-comparable
  [key.DiscoPublic] field there breaks reflect-based test helpers
  (e.g. TestFilterFormatAndSortExitNodes' use of cmp.Diff), and
  general LocalAPI clients have no need for disco keys. Since the
  debug LocalAPI is gated behind the ts_omit_debug build tag, this
  endpoint is automatically stripped from small binaries.

  cmd/tta: add /restart-tailscaled handler (Linux-only, via /proc walk)
  to drive the SIGKILL phase. On gokrazy the supervisor respawns
  tailscaled within a second.

  tstest/integration/testcontrol: add Server.AllOnline. When set,
  every peer entry in MapResponses is marked Online=true. Several
  disco-key handling fast paths in controlclient and wgengine
  (removeUnwantedDiscoUpdates, removeUnwantedDiscoUpdatesFromFull
  NetmapUpdate, the wgengine tsmpLearnedDisco fast path) only fire
  for online peers; without this flag, tests exercising disco-key
  rotation only hit the offline-peer code paths, which mask issues
  and are several seconds slower in this scenario. Finer-grained
  per-node online tracking can be added later.

  tstest/natlab/vmtest: add Env.RotateDiscoKey,
  Env.RestartTailscaled, Env.PeerDiscoKey, Node.Name, an
  [AllOnline] EnvOption that plumbs through to
  testcontrol.Server.AllOnline, and an exported
  Env.Ping(from, to, type, timeout). Ping replaces the unexported
  helper so callers can specify both a ping type (PingDisco for
  warming peer state, PingTSMP for asserting end-to-end
  connectivity) and a deadline. PeerDiscoKey returns its LocalAPI
  error so callers inside tstest.WaitFor can retry transient
  failures rather than fataling the test.

Updates #12639
Updates #13038

Change-Id: I3644f27fc30e52990ba25a3983498cc582ddb958
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-29 12:58:00 -07:00
Brad Fitzpatrick
22ff402da9 wgengine/magicsock: restore SetDERPMap signature, add SetDERPMapWithoutReSTUN
Commit 78627c132f changed the signature of magicsock.Conn.SetDERPMap to
take an additional bool doReStun parameter. Avoid both the boolean
parameter and the API signature change by restoring SetDERPMap to its
original single-argument form and adding a new SetDERPMapWithoutReSTUN
method for the cache-loading caller that wants to skip the post-set
ReSTUN.

Updates #19490

Change-Id: I97d9e82156bfc546ccf59756d1ea52f039b5de06
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-29 12:46:15 -07:00
Adriano Sela Aviles
1cd8bcc827 tailcfg: extend services model for client application actions
Updates: tailscale/corp#40648
Signed-off-by: Adriano Sela Aviles <adriano@tailscale.com>
2026-04-29 11:33:13 -07:00
Brad Fitzpatrick
70f0b261b6 go.mod, gokrazy: bump to fork of gokrazy/gokrazy init process for syslog change
When we switched to monogok in 371d6369cd, we lost our gokrazy fork's
change to let the syslog be configured from the Linux cmdline.

That's sent upstream in gokrazy/gokrazy#275 but still in review. Meanwhile,
revert to a fork, while still keeping monogok. Monogok was updated to
support an alternate init package, which is now hosted temporarily at
https://github.com/tailscale/ts-gokrazy

This means we can rip the log polling loop out of pending PR #19568
and go back to using syslog.

Updates #13038

Change-Id: I36931ee8eecc40d6165ad036c6181dfb07b86ba2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-29 11:27:41 -07:00
Alex Valiushko
01d0bdd253 cmd/derper,derp: add metrics for rate limit hits (#19560)
Expvars track count of rate limiters exceeding their threshold.
Covers (1) global rate limiter and (2) total of local rate limiters.

Also publish optional rate-limit metrics during ExpVar() call
if -rate-config is specified. Fixes current rate-limit metrics
being published outside of "derp" in /debug/vars.

Updates tailscale/corp#38509

Change-Id: Ic7f5a1e890d0d7d3d7b679daa4b5f8926a6a6964
Signed-off-by: Alex Valiushko <alexvaliushko@tailscale.com>
2026-04-29 10:29:09 -07:00
Claus Lensbøl
be7cce74ba wgengine/userspace: do not fall back to old key on tsmpLearned mismatch (#19575)
The mismatch behaviour of falling back to a previous key could end up
breaking connections when the netmap update took longer than the 2
seconds allowed in controlClient.auto for netmap updates, or if the
controlClient context was canceled. This could end up breaking
legitimate updates to the netmap for disco keys coming from control.

Instead, log the event, and let the connection be reset to that of the
key as that is safer.

Issue found by @bradfitz.

Updates #19574

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2026-04-29 13:23:04 -04:00
Brad Fitzpatrick
fd6ae2fad4 tstest/natlab/vmtest: serialize per-platform setup with sync.Once
Two cloud-platform nodes (e.g. sr-a and sr-b in TestSiteToSite) boot in
parallel via errgroup and both call ensureCompiled and the inline image
preparation block, racing to Begin() the same shared *Step (which is
deduped by name in Env.Step). The second goroutine panics:

    panic: Step "Compile linux_amd64 binaries": Begin called in state running
    panic: Step "Prepare ubuntu-24.04 image": Begin called in state done

ensureCompiled had a TOCTOU dedup attempt (released compileMu before
doing the work, only added to the compiled set at the end), and image
preparation had no dedup at all.

Replace the compiled set with a per-key map[string]*sync.Once for each
of compile and image preparation, so concurrent callers serialize on
the Once and only the first executes Begin/work/End.

Fixes commit 02ffe5baa8.

Updates #13038

Change-Id: If710bcc9e0aafebf0ad5b61553bae11458d976d7
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-29 09:54:58 -07:00
Brad Fitzpatrick
02ffe5baa8 tstest/natlab/vmtest: add macOS VM snapshot caching for fast test starts
Cache a pre-booted macOS VM snapshot on disk so subsequent test runs
restore from the snapshot instead of cold-booting. The snapshot is keyed
by the Tart base image digest and a code version constant
(macOSSnapshotCodeVersion); bumping either invalidates the cache.

Snapshot preparation (one-time):
- Boot the Tart base image with a NAT NIC (--nat-nic flag)
- Wait for SSH, compile and install cmd/tta as a LaunchDaemon
- TTA polls the host via AF_VSOCK for an IP assignment; during prep
  the host replies "wait"
- Disconnect NIC, save VM state via SIGINT

Test fast path (cached, ~7s to agent connected):
- APFS clone the snapshot, write test-specific config.json
- Launch Host.app with --disconnected-nic --attach-network --assign-ip
- VZ restores from SaveFile.vzvmsave (~5s with 4GB RAM)
- TTA's vsock poll gets the IP config, sets static IP via ifconfig
  (bypasses DHCP entirely), switches driver addr to the IP directly
  (bypasses DNS), and resets the dial context so the reverse-dial
  reconnects immediately
- TTA agent connects to test driver within ~2s of IP assignment

Key optimizations:
- 4GB RAM instead of 8GB: halves SaveFile.vzvmsave (1.4GB vs 2.4GB),
  halves restore time (5.5s vs 11s)
- AF_VSOCK IP assignment: bypasses macOS DHCP (~5-7s saved)
- Direct IP dial: bypasses DNS resolution for test-driver.tailscale
- Dial context reset: cancels stale in-flight dials from snapshot
- Kill instead of SIGINT for test VM cleanup (no state save needed)
- Parallel VM launches

Also:
- Add TestDriverIPv4/TestDriverPort constants to vnet
- Add --nat-nic and --assign-ip flags to Host.app
- Fix SIGINT handler: retain DispatchSource globally, use dispatchMain()
- Add vsock listener (port 51011) to Host.app for IP config protocol
- Add disconnectNetwork() to VMController for clean snapshot state
- Fix Makefile: set -o pipefail so xcodebuild failures aren't swallowed

Updates #13038

Change-Id: Icbab73b57af7df3ae96136fb49cda2536310f31b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-29 08:17:13 -07:00
M. J. Fromberger
7b53550fe6 control/controlclient: fix a nil-indirection bug in DERP key pruning (#19565)
Upon deciding to update the LastSeen timestamp, we weren't checking that the
field we were writing through was non-nil. Rather than add an additional check,
just allocate a fresh pointer for the updated time.
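
As an illustration of that fix, here is a small sketch assuming a record
with an optional timestamp pointer. The peer type and the touch and demo
functions are hypothetical, not the controlclient code:

```go
package main

import (
	"fmt"
	"time"
)

// peer is a hypothetical record whose timestamp is optional.
type peer struct {
	LastSeen *time.Time // nil means "never seen"
}

// touch records now as the last-seen time. It stores a pointer to a
// fresh copy instead of writing through p.LastSeen, which may be nil.
func touch(p *peer, now time.Time) {
	// The buggy form would be: *p.LastSeen = now
	// That panics with a nil dereference when LastSeen is nil.
	p.LastSeen = &now
}

// demo updates a peer that has never been seen and returns the stored
// time as Unix seconds.
func demo() int64 {
	var p peer // LastSeen starts out nil
	touch(&p, time.Unix(1700000000, 0))
	return p.LastSeen.Unix()
}

func main() {
	fmt.Println(demo())
}
```

Allocating a fresh pointer also avoids mutating a time value that other
holders of the old pointer might still be reading.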

Updates #19564

Change-Id: I589ebe65175fc7677c04a31dd6c4670e2531ee62
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
2026-04-29 07:57:38 -07:00
David Bond
a29e42135b cmd/k8s-operator: add nodeSelector to DNSConfig resource (#19429)
This commit modifies the `DNSConfig` resource to allow customisation of
the `spec.nodeSelector` field in the nameserver pods.

Closes: https://github.com/tailscale/tailscale/issues/19419

Signed-off-by: David Bond <davidsbond93@gmail.com>
2026-04-29 15:56:33 +01:00
Brad Fitzpatrick
4cec06b8f2 tstest/natlab/vmtest: add macOS VM screenshot streaming to web UI
When --vmtest-web is set, Host.app is launched with --screenshot-port 0
to start a localhost HTTP server that captures the VZVirtualMachineView
display. The Go test harness parses the SCREENSHOT_PORT=<port> line from
stdout, then polls every 2 seconds for JPEG thumbnails and pushes them
over WebSocket to the web dashboard.
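
A sketch of how such a line might be picked out of a child process's
stdout, assuming the line has exactly the form SCREENSHOT_PORT=<port>.
The function names findPort and findPortInString are hypothetical:

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// findPort scans r line by line for "SCREENSHOT_PORT=<port>" and
// returns the port from the first such line.
func findPort(r io.Reader) (int, error) {
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		rest, ok := strings.CutPrefix(strings.TrimSpace(sc.Text()), "SCREENSHOT_PORT=")
		if !ok {
			continue // unrelated output
		}
		port, err := strconv.Atoi(rest)
		if err != nil || port < 1 || port > 65535 {
			return 0, fmt.Errorf("bad port %q", rest)
		}
		return port, nil
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("no SCREENSHOT_PORT line found")
}

// findPortInString is a convenience wrapper for trying findPort on
// literal text.
func findPortInString(s string) (int, error) {
	return findPort(strings.NewReader(s))
}

func main() {
	port, err := findPortInString("starting VM\nSCREENSHOT_PORT=54321\n")
	fmt.Println(port, err)
}
```

Passing port 0 lets the OS choose a free port, which is why the
harness has to read the chosen value back rather than know it up front.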

Clicking a screenshot thumbnail opens a full-resolution image proxied
through the web UI's /screenshot/{node} endpoint.

Screenshot events are excluded from the EventBus history (they're large
and only the latest matters, stored in NodeStatus.Screenshot).

Updates #13038

Change-Id: I9bc67ddd1cc72948b33c555d4be3d8db06a41f6d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-29 07:48:26 -07:00
Claus Lensbøl
78627c132f wgengine/magicsock,ipn/ipnlocal: store and load homeDERP from cache (#19491)
With netmap caching, the home DERP of the self node was neither saved to
the cache nor loaded from it, so nodes did not stick to a DERP when
starting without a connection to control.

Instead, when a cache is available, load that cache before looking for
DERP servers. This is implemented by allowing a skip
of ReSTUN in setting the DERP map (we must have a DERP map before
setting the home DERP), so the DERP from cache will set itself and be
sticky until a connection to control is established.

Making DERP only change when connected to control is handled by existing
code from f072d017bd.

Updates #19490

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2026-04-29 10:24:09 -04:00
Alex Chan
1841a93ab2 ssh/tailssh: mark TestSSHRecordingCancelsSessionsOnUploadFailure as flaky (again)
This test is still flaking on macOS, so mark it as such so we can track
and investigate further.

Updates #7707

Change-Id: I640da3c1068a90a9815caab2df9431bceb01f846
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-04-29 14:22:09 +01:00
Alex Chan
bb91bb842c all: remove everything related to non-seamless key renewal
Seamless key renewal has been the default in all clients since 1.90.
We retained the ability to disable it from the control plane as a
precaution, but we haven't seen any issues that require us to disable it.

We're now removing all the code for non-seamless key renewal, because we
don't expect to turn it on again, and indeed it's been untested in the
field for three releases so might contain latent bugs!

Updates tailscale/corp#33042

Change-Id: I4b80bf07a3a50298d1c303743484169accc8844b
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-04-29 10:03:26 +01:00
Noel O'Brien
40088602c9 cmd/hello: remove hello.ipn.dev (#19567)
Fixes #19566

Signed-off-by: Noel O'Brien <noel@tailscale.com>
2026-04-28 17:54:29 -07:00
Brad Fitzpatrick
b2d4ba04b6 tstest/natlab/vmtest: add macOS VM support using Tart base images
Add macOS VM support to the vmtest framework using Tart's pre-built
macOS images (ghcr.io/cirruslabs/macos-tahoe-base) instead of building
from IPSW. The Tart image has SIP disabled and SSH enabled.

At test time, the Tart base image's disk, NVRAM, and hardware identity
are APFS-cloned into a tailmac-compatible directory layout, and the VM
is booted headlessly via tailmac's Host.app (Virtualization.framework)
with its NIC connected to vnet's dgram socket.

New features:
- tailmac.go: ensureTartImage (auto-pull), cloneTartToTailmac (format
  conversion), startTailMacVM (launch + cleanup)
- NoAgent() node option for VMs without TTA installed
- LANPing() for ICMP reachability testing via TTA's /ping endpoint
- IsMacOS field on OSImage, with GOOS/GOARCH support
- Dgram socket listener in Start() for macOS VMs
- Fix ReadFromUnix error spam on dgram socket close in vnet

TestMacOSAndLinuxCanPing verifies a macOS Tart VM and a gokrazy Linux
VM can ping each other on the same vnet LAN.

Updates #13038

Change-Id: I5e73a27878abf009f780fdf11a346fc857711cff
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-28 12:51:40 -07:00
Brad Fitzpatrick
ec7b11d986 tstest/natlab/vmtest, cmd/tta: add TestTaildrop
Add a vmtest that brings up two Ubuntu nodes, each behind its own
EasyNAT, joined to the tailnet. The sender pushes a small file via
"tailscale file cp" and the receiver fetches it via "tailscale file
get --wait", asserting that the filename and contents round-trip
unchanged.

To make Taildrop work in vmtest, three small pieces were needed:

The Linux/FreeBSD cloud-init now starts tailscaled with --statedir as
well as --state=mem:, so the daemon has a VarRoot to host Taildrop's
incoming-files directory. State itself remains in-memory (so nothing
persists across reboots); only the var-root scratch space is on disk.

vmtest.New grows a variadic EnvOption parameter and a SameTailnetUser
helper. When the option is passed, Start sets AllNodesSameUser=true
on the embedded testcontrol.Server. Cross-node Taildrop requires the
sender and receiver to share a Tailnet user (or have an explicit
PeerCapabilityFileSharingTarget granted between them, which we don't
plumb here), so TestTaildrop opts in. Existing tests don't.

cmd/tta gains /taildrop-send and /taildrop-recv handlers that wrap
"tailscale file cp" and "tailscale file get --wait", plus
Env.SendTaildropFile and Env.RecvTaildropFile helpers in vmtest that
drive them.

Updates #13038

Change-Id: I8f5f70f88106e6e2ee07780dd46fe00f8efcfdf1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-28 12:27:55 -07:00
Brad Fitzpatrick
4b8e0ede6d tstest/natlab/{vmtest,vnet}, cmd/tta: add TestMullvadExitNode
Add a vmtest that brings up a Tailscale client, an Ubuntu VM acting
as a Mullvad-style plain-WireGuard exit node, and a non-Tailscale
webserver, each on its own NAT'd vnet network with a distinct WAN
IP. The test exercises Tailscale's IsWireGuardOnly peer code path:
the way the control plane wires Mullvad exit nodes into a client's
netmap, including the per-client SelfNodeV4MasqAddrForThisPeer
source-IP rewrite that lets a Tailscale CGNAT IP egress through a
plain-WireGuard tunnel that has no idea what Tailscale is.

The mullvad VM doesn't run wireguard-tools or kernel WireGuard;
instead, a new TTA endpoint /wg-server-up creates a real Linux TUN
named wg0, drives it with wireguard-go (already vendored), and
configures the kernel side (ip addr/up, ip_forward, iptables NAT
MASQUERADE) so decrypted traffic from the peer egresses with the
mullvad VM's WAN IP. Userspace vs kernel WireGuard makes no
difference on the wire — what's being tested is Tailscale's
plain-WireGuard exit-node code path, not the kernel module — and
this lets the test avoid downloading and installing .deb packages
inside the VM.

Adds Env.BringUpMullvadWGServer (calls /wg-server-up, returns the
generated WG public key as a key.NodePublic), Env.SetExitNodeIP
(EditPrefs ExitNodeIP directly, for exit nodes whose IPs aren't
discoverable via TTA), Env.ControlServer (exposes the underlying
testcontrol.Server so tests can UpdateNode / SetMasqueradeAddresses
to inject custom peers), and Env.Status (fetches a node's tailscale
status, used to read the client's pubkey so we can pin it as the
WG server's only allowed peer).

The test verifies that the webserver's echoed source IP is the
client's WAN with no exit node selected, the mullvad VM's WAN with
the WG-only peer selected as exit, and the client's WAN again after
clearing.

Updates #13038

Change-Id: I5bac4e0d832f05929f12cb77fa9946d7f5fb5ef1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-28 11:31:48 -07:00
Andrew Lytvynov
da0a277565 client/web: fail /api/routes requests with empty flags (#19548)
If both ExitNode and AdvertiseRoutes flags are empty, then the request
is invalid and should fail. Previously it would wipe out any existing
values configured for these prefs because of the assumption in the
handler that exactly one of them is set.
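
The check can be sketched as follows. The routesUpdate type and its two
boolean fields are assumptions made for illustration; the real handler's
request shape may differ:

```go
package main

import (
	"errors"
	"fmt"
)

// routesUpdate is a hypothetical request shape with one flag per pref
// group the caller wants to change.
type routesUpdate struct {
	SetExitNode bool // caller wants to change exit-node prefs
	SetRoutes   bool // caller wants to change advertised routes
}

var errNothingSet = errors.New("request sets neither exit node nor routes")

// validate rejects a request that selects nothing, so the handler
// never falls through to a branch that overwrites existing prefs.
func validate(u routesUpdate) error {
	if !u.SetExitNode && !u.SetRoutes {
		return errNothingSet
	}
	return nil
}

func main() {
	fmt.Println(validate(routesUpdate{}))
	fmt.Println(validate(routesUpdate{SetRoutes: true}))
}
```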

Updates https://github.com/tailscale/corp/issues/40851

Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
2026-04-28 11:16:47 -07:00
Brad Fitzpatrick
f7f8b0a0a5 cmd/tailscale/cli: drive "file cp" progress and offline warning from peerAPI
The Online bit in PeerStatus comes from control's last-known state and
can lag reality, so gating "tailscale file cp" on it is both unreliable
and pushes correctness onto the server. Just try the push directly.

In runCp, when the target's PeerStatus says it's offline, no longer
fail upfront; getTargetStableID returns the StableID anyway. Replace
the static "is offline" warning with a 3-second timer armed for the
first file: if the timer fires before peerAPI bytes have flowed, we
print a warning to stderr. The wording depends on whether control
reported the peer offline ("is reportedly offline; trying anyway") or
online ("is not replying; trying anyway"). The warning is printed with
a leading vt100 clear-line and a trailing newline so it doesn't get
painted over by the progress redraw and so the next progress redraw
lands on a fresh line below it.
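
A simplified, blocking sketch of the "warn only if nothing has flowed
yet" timer. The real code arms the timer alongside the transfer; here
the wait is reduced to a single select, and the names waitOrWarn and
demo are hypothetical:

```go
package main

import (
	"fmt"
	"time"
)

// waitOrWarn blocks until progress is signaled or d elapses, whichever
// comes first. It calls warn only in the timeout case and reports
// whether it did.
func waitOrWarn(d time.Duration, progress <-chan struct{}, warn func()) bool {
	t := time.NewTimer(d)
	defer t.Stop() // disarm the timer if progress arrived first
	select {
	case <-progress:
		return false
	case <-t.C:
		warn()
		return true
	}
}

// demo exercises both outcomes: a peer that never replies and a peer
// whose bytes are already flowing.
func demo() (warnedWhenSilent, warnedWhenFlowing bool) {
	silent := make(chan struct{}) // never signaled
	warnedWhenSilent = waitOrWarn(10*time.Millisecond, silent, func() {})

	flowing := make(chan struct{})
	close(flowing) // progress has already happened
	warnedWhenFlowing = waitOrWarn(time.Hour, flowing, func() {})
	return warnedWhenSilent, warnedWhenFlowing
}

func main() {
	fmt.Println(demo())
}
```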

Both the timer disarm and the progress display now read from
tailscaled's OutgoingFile.Sent (subscribed via WatchIPNBus) instead of
the local-body counter. That's the difference between bytes-acked-by-
local-tailscaled (what countingReader.n was measuring; useless for
detecting an unreachable peer because for small files net/http buffers
the entire body into the unix-socket conn before the peerAPI dial has
even started) and bytes-pulled-toward-peerAPI (what tailscaled is
actually doing, reflected in OutgoingFile.Sent). The previous code
reported 100% within milliseconds for a 3 KiB file even when the peer
was unreachable.

Add --update-interval (default 250ms) to control the progress repaint
cadence; zero or negative disables the progress display entirely. The
printer now also stops repainting once it observes Sent at full size
with a near-zero rate for >2s, so a stuck transfer doesn't keep
clobbering whatever the rest of runCp is trying to print.

Updates #18740

Change-Id: I189bd1c2cd8e094d372c4fee23114b1d2f8024b4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-28 11:03:58 -07:00
Brad Fitzpatrick
88cb6f58f8 tool/updateflakes, cmd/nardump: replace update-flake.sh with Go tool
Consolidate go.mod.sri and go.toolchain.rev.sri into a single
flakehashes.json file at the repo root, owned by a new Go program at
tool/updateflakes. The JSON is consumed by flake.nix via
builtins.fromJSON and by any future Go code via the FlakeHashes
struct that defines its schema.

Each block records its input fingerprint alongside the SRI it
produced: the goModSum (a sha256 over go.mod and go.sum) for the
vendor block, and the literal rev string from go.toolchain.rev for
the toolchain block. updateflakes regenerates a block only when its
recorded fingerprint disagrees with the current input.

Doing the gating by content rather than file mtimes avoids the usual
mtime hazards across git checkouts, clones, and merges. It also
means re-runs with no input changes are essentially free, and a
re-run that touches only one input pays only for that one block.
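
The content-based gating can be sketched like this. The block type, its
field names, and the fingerprint and refresh functions are hypothetical,
and the hash layout here is an assumption rather than the format that
updateflakes uses:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// block pairs an input fingerprint with the output produced from it.
// (Hypothetical shape for illustration.)
type block struct {
	Fingerprint string
	SRI         string
}

// fingerprint hashes the inputs in order. Each input is length-prefixed
// so that different splits of the same bytes hash differently.
func fingerprint(inputs ...[]byte) string {
	h := sha256.New()
	for _, in := range inputs {
		fmt.Fprintf(h, "%d:", len(in))
		h.Write(in)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// refresh calls regenerate only when the recorded fingerprint differs
// from the current one. It reports whether regenerate ran.
func refresh(b *block, current string, regenerate func() (string, error)) (bool, error) {
	if b.Fingerprint == current {
		return false, nil // inputs unchanged; keep the recorded output
	}
	sri, err := regenerate()
	if err != nil {
		return false, err
	}
	b.Fingerprint, b.SRI = current, sri
	return true, nil
}

func main() {
	var b block
	fp := fingerprint([]byte("go.mod contents"), []byte("go.sum contents"))
	regen := func() (string, error) { return "sha256-placeholder", nil }
	ran, _ := refresh(&b, fp, regen)
	fmt.Println("first run regenerated:", ran)
	ran, _ = refresh(&b, fp, regen)
	fmt.Println("second run regenerated:", ran)
}
```

Because the decision depends only on file contents, a fresh clone with
new mtimes but identical inputs takes the cheap path.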

The two blocks have no shared state -- vendor invokes go mod vendor
into one tempdir, toolchain fetches and extracts a tarball into
another -- so they run concurrently via errgroup. Cold time is
bounded by the slower of the two rather than their sum.

Also takes the opportunity to fold the toolchain fetch into a single
curl|tar pipeline (no intermediate .tar.gz on disk).

Split cmd/nardump into a thin package main and a new package nardump
library at cmd/nardump/nardump that holds the NAR encoder and SRI
helper. tool/updateflakes imports the library directly rather than
building and exec'ing the nardump binary at runtime. The library
uses fs.ReadLink (Go 1.25+) instead of os.Readlink, so it no longer
requires the caller to chdir into the FS root for symlink targets to
resolve. WriteNAR now wraps its writer in a bufio.Writer internally
(unless the caller already passed one) and flushes on return, so
callers don't pay for tiny writes against slow underlying writers.
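
The buffering behavior can be sketched as below. This is not the nardump
code: writeParts, countingWriter, and demo are hypothetical names, and
the payload strings are placeholders:

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// countingWriter counts Write calls to show the effect of buffering.
type countingWriter struct {
	buf    bytes.Buffer
	writes int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.writes++
	return c.buf.Write(p)
}

// writeParts writes each part to w through a bufio.Writer, reusing w
// when the caller already passed one, and flushes before returning.
func writeParts(w io.Writer, parts ...string) (err error) {
	bw, ok := w.(*bufio.Writer)
	if !ok {
		bw = bufio.NewWriter(w)
	}
	defer func() {
		// Report a flush failure unless an earlier error is pending.
		if ferr := bw.Flush(); err == nil {
			err = ferr
		}
	}()
	for _, p := range parts {
		if _, err = io.WriteString(bw, p); err != nil {
			return err
		}
	}
	return nil
}

// demo writes three small parts and returns the output together with
// the number of Write calls that reached the underlying writer.
func demo() (out string, writes int, err error) {
	var cw countingWriter
	err = writeParts(&cw, "nix-archive-1", "(", ")")
	return cw.buf.String(), cw.writes, err
}

func main() {
	fmt.Println(demo())
}
```

The explicit type assertion mirrors the description above;
bufio.NewWriter on its own also returns an existing bufio.Writer
unchanged when its buffer is already large enough.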

The cache-busting line in flake.nix and shell.nix is known to live
at end of file, so updateCacheBust walks the lines in reverse.

make tidy timings on this machine, before: ~14s every run.
After:

  warm (no input changes):       0.05s
  vendor block stale only:       1.4s
  toolchain block stale only:    5.0s
  cold (no flakehashes.json):    5.0s

Updates #6845

Change-Id: I0340608798f1614abf147a491bf7c68a198a0db4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-28 10:18:32 -07:00