Commit Graph

1910 Commits

Author SHA1 Message Date
Brad Fitzpatrick
988b0905bb wgengine/wglog: stop using netmap.NetworkMap here too
This applies the same treatment from 8f210454dd (netlog) to wglog,
ending use of netmap.NetworkMap and instead getting the canonical data
from LocalBackend/nodeBackend.

This is a dependency to removing the netmap.NetworkMap from
upstream callers, like wgengine.Engine in general.

Updates #12542

Change-Id: Icb5af0799322def048a6f594b49f7d11273f025d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-06-23 09:06:37 -07:00
Brad Fitzpatrick
e0677ccc76 net/tstun, wgengine/filter: track UDP flow state for injected packets
Outbound packets produced by netstack (used by tailscaled with
--tun userspace-networking, by tsnet, and by the SOCKS5/HTTP proxies)
enter the wrapper via InjectOutbound{,PacketBuffer} and take the
injectedRead path, which bypasses Filter.RunOut.

RunOut's side effect for UDP/SCTP is to insert the reverse-flow tuple
into the connection-tracking LRU so that Filter.RunIn admits inbound
replies that no explicit ACL rule covers. Skipping it on the injected
path meant a netstack-side dial of UDP would send fine but the reply
would be dropped as "no matching rule". The kernel-TUN path was
already fine because it goes through RunOut.

Fixes #14229
Fixes #20064

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Change-Id: I816ef55c493a12ff4f561cd89c095559b5c2743b
2026-06-22 15:57:37 -07:00
Anton Tolchanov
f442cda999 ipn/ipnlocal: consider all DERP regions for exit node recommendations
When recommending an exit node, suggestExitNodeLocked ranks candidates by
the latency to their home DERP region, taken from the most recent netcheck
report. But netcheck alternates between full reports, which probe every
region, and incremental reports, which only re-probe the home region and a
handful of the fastest regions. When the most recent report is incremental,
the suggestion fell back to a random for exit nodes that are far away.

Now we rank candidates against the best recent latency, tracked by the
`netcheck.Client` - the same data that is used to pick the preferred
DERP. It uses a history of measurements which includes a full netcheck
report, so should cover all DERP regions.

Updates tailscale/corp#17516

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
2026-06-22 12:28:09 +02:00
Jordan Whited
54005752a5 wgengine/magicsock: suppress TSMP disco advert when bestAddr is peer relay
Updates #20156

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2026-06-18 11:43:00 -07:00
Jordan Whited
be2f554dd3 control/controlknobs,wgengine/magicsock: disable TSMP disco advert if netmap caching is disabled
Updates #20081

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2026-06-17 18:45:38 -07:00
Brad Fitzpatrick
8f210454dd wgengine/netlog: stop using netmap.NetworkMap type, use LocalBackend
The Logger previously took a *netmap.NetworkMap at Startup and on every
ReconfigNetworkMap call, denormalizing it into per-IP and self lookup
maps. That denormalization is O(n) over all peers and ran on every
netmap update, contributing to the broader quadratic behavior we want
to eliminate when a single peer is added or removed.

Instead, this makes netlog ask LocalBackend (well, nodeBackend) for
the info it needs, letting us remove the netmap.NetworkMap type
entirely from the netlog package.

This is a dependency to removing the netmap.NetworkMap type from
upstream callers, like wgengine.Engine in general.

Updates #12542

Change-Id: Ib5f2de96e788a667332c0a6f7ac833b3d0053b5c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-06-17 15:11:57 -07:00
Naman Sood
47333e9487 feature/conn25: recreate transit IP mappings when connector loses them
Mappings from transit IPs to real IPs are stored ephemerally in the
connector, so they're lost on restart. When we send a packet to the
connector with a transit IP it does not recognize, it sends us a TSMP
message saying so (see #19883). If we (the client) know of such a
mapping, we now re-send it to the connector so that a connection can
proceed.

Fixes tailscale/corp#34256.

Signed-off-by: Naman Sood <mail@nsood.in>
2026-06-17 13:50:51 -04:00
Brad Fitzpatrick
6596d237a3 ipn/ipnlocal: add wireguard session state metrics + publish on IPN bus
Updates #19989
Updates tailscale/corp#42874

Change-Id: I843ed95bc7b0f5cd38ba1467332c6b022901e254
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-06-15 11:41:18 -07:00
Steve Avery
4c4ec3d468 net/packet,wgengine/filter: handle IPv6 fragment extension header
decode6 didn't parse the IPv6 Fragment extension header (Next Header 44),
so any source-fragmented IPv6 packet was classified as an unknown protocol
and matched no ACL rule. The filter then silently dropped it and counted it
as an "acl" drop, even on allow-all tailnets, blackholing large UDP (DNS,
WebRTC, etc.) over a tailnet's IPv6 addresses. IPv4 fragments were already
handled by decode4.

Parse the fragment header the same way: read the first fragment's transport
ports so the filter matches it like an unfragmented packet, pass later
fragments through as ipproto.Fragment, and reject overlapping-fragment
offsets (RFC 1858) and first fragments too short to hold the transport
header as unknown.

Fixes #20083

Signed-off-by: Steve Avery <hello@stevenavery.com>
2026-06-15 11:18:00 -07:00
Alex Valiushko
7d18a06292 go.mod,wgengine/magicsock: pull wireguard-go fix for roaming endpoints (#20118)
Bumps wireguard-go pin to include the roaming endpoints fix, and
two internal enhancements.

Pulls stock wireguard-go for non-tailscale simulation in tests,
to use its endpoint discovery mechanism.

Updates #20082

Change-Id: I2ff282cb7fe4ab099ce5e780a1d40ae86a6a6964
Signed-off-by: Alex Valiushko <alexvaliushko@tailscale.com>
2026-06-12 10:50:35 -07:00
Michael Ben-Ami
a9ea6336fa wgengine: delete Conn25 packet hooks
Package features/conn25 wires up the hooks directly on the tun wrapper
without needing to go through the userspace engine, so this codepath is
unused and not needed.

Updates #cleanup

Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>
2026-06-12 13:43:55 -04:00
M. J. Fromberger
b23089a5ef wgengine/magicsock: update netmap cache flag on receipt of a delta (#20117)
Since deltas are only (at present) received from the control plane, processing
a delta signifies we are no longer operating on a netmap fully loaded from
cache, even if most of the netmap is still in the same configuration.

Updates #12542

Change-Id: I84132c4bf2dde6e5c1c57144645edb986b051dca
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
2026-06-12 09:05:12 -07:00
Claus Lensbøl
92ab4866d5 wgengine/magicsock: increase discoKeyAdvertisementInterval to 2 minutes (#20084)
The 1 minute timeout was hitting timers inside wireguard-go, leading
stale connections hanging forever. Increasing the timeout to 2 minutes
makes a small subset of cached connections establish direct connections
slightly slower.

Updates to wireguard-go will allow a better hook for when to send these
messages in the future. This change only makes fixes the error mode but
if we have better triggers coming in wireguard-go, we should be using
those.

Updates #20081

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2026-06-10 16:25:02 -04:00
Claus Lensbøl
2690d58e47 wgengine/magicsock,tstest/natlab/vmtest: only send callMeMaybe with endpoints (#20088)
9be21088f4 changed sending disco pings so
a callMeMaybe would be not be gated by endpoints existing if the node
was running off of a cached netmap.

This commit partly reverts that change, but keeps in a few bug fixes in
that commit and the tests that was introduced and now skipped.

The behaviour prior to 9be21088f4 is
retained.

Updates #20085

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2026-06-10 16:19:51 -04:00
M. J. Fromberger
eda975a9e4 wgengine/magicsock: emit first-netmap latency for uncached resets too (#20029)
This is a refinement of #19916. Previously, we would only emit a latency log
when going from a cached netmap to an uncached one (i.e., from the control
plane). We would like to know the latency in both conditions, though, so
instead use the validity of the previous self state.

Updates #12639
Updates tailscale/projects#27

Change-Id: I6bbeb5d3162f1f98cdb3dcd244f67ef31c170957
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
2026-06-05 14:13:16 -07:00
Mike O'Driscoll
6a709216b9 ipn/ipnlocal,wgengine/magicsock: re-report NetInfo to new control client (#20025)
magicsock de-duplicates NetInfo callbacks against c.netInfoLast, a cache
that lives on the long-lived magicsock.Conn. That cache survives a control
client swap (interactive login or profile switch), where only the control
client (and its own per-client NetInfo dedup) is replaced. As a result, the
first netcheck after the swap produces a structurally-identical NetInfo
(same PreferredDERP, same NAT shape), magicsock suppresses it as unchanged,
and the new control session never learns our home DERP. Peers can't reach
the node over DERP until some unrelated NetInfo field happens to change.

Add Conn.ResetNetInfoLast to clear the dedup cache, and call it from
LocalBackend.setControlClientLocked whenever a control client is installed,
so the next netcheck re-reports the current NetInfo to the new client.

netInfoLast is only a dedup/optimization cache (all readers nil-guard, and
it is recomputed by every netcheck), so clearing it can only add a delivery,
never lose or misroute one; it is scoped to control-client lifecycle events,
not steady-state operation.

Updates #17887
Fixes #20024

Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
2026-06-05 13:36:00 -04:00
Brad Fitzpatrick
52400dc6f4 ipn/ipnlocal: add back a watchdog after earlier removal from engine
Commit 2b338dd6a8 removed watchdogEngine because it was weird
(so many methods) and increasingly unnecessary after we'd cleaned up
and simplified so much of the locking.

This adds back a watchdog, but an easier to maintain one that's more
idiomatic.

Updates #19759

Change-Id: I86c458473e126c0809f37696446ce7acf4cc4eb9
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-06-02 11:57:12 -07:00
M. J. Fromberger
a3bec699dc wgengine/magicsock,types/logger: add latency logs for initial peer contacts (#19916)
In order to allow us to measure the performance effects of client-side netmap
caching, both with and without the feature enabled, add logs to record how long
it takes after a client restart or profile switch for the node to establish
contact with peers, relative to the first uncached netmap.

We do this by keeping track of a timestamp when the connection is constructed,
and logging a record for "new" peer contacts that records how long (in
microseconds) it took from the time the peer was recorded as a candidate.  The
message includes whether the contact was via DERP or direct, and whether a
cached netmap was in use at the time.

This builds on and extends the counters from #19699, but here we include new
contacts whether or not a cached netmap is in use, so that we can establish a
baseline for comparison.

Updates #12639
Updates tailscale/projects#27

Change-Id: I4f6d050e221f3881848d05a0425c4a5d1a59294c
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
2026-06-02 07:34:17 -07:00
Simon Law
28801674a6 net/routecheck: introduce new package for checking peer reachability (#19639)
The routecheck package parallels the netcheck package, where the
former checks routes and routers while the latter checks networks.
Like netcheck, it compiles reports for other systems to consume.

Historically, the client has never known whether a peer is actually
reachable. Most of the time this doesn’t matter, since the client will
want to establish a WireGuard tunnel to any given destination.
However, if the client needs to choose between two or more nodes,
then it should try to choose a node that it can reach.

Suggested exit nodes are one such example, where the client filters
out any nodes that aren’t connected to the control plane. Sometimes an
exit node will get disconnected from the control plane: when the
network between the two is unreliable or when the exit node is too
busy to keep its control connection alive. In these cases, Control
disables the Node.Online flag for the exit node and broadcasts this
across the tailnet. Arguably, the client should never have relied on
this flag, since it only makes sense in the admin console.

This patch implements an initial routecheck client that can probe
every node that your client knows about. You should not ping scan your
visible tailnet, this method is for debugging only.

This patch also introduces a new OnNetMapToggle hook, which fires when
the netmap transitions from nil to non-nil, or vice versa. This
happens either when the client receives its first MapResponse after
connecting to the control plane, or when it clears the netmap while it
is disconnecting. Routecheck uses this to wait for a valid netmap
so it knows which peers to probe.

Updates #17366
Updates tailscale/corp#33033

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-06-01 10:33:08 -07:00
James Tucker
25b8ed8d9e control/controlknobs,net/{batching,tstun},wgengine: add nodecaps to disable UDP & TUN GRO/GSO
Add four control-plane node attributes that let us disable UDP GSO/GRO
on the magicsock UDP socket and UDP/TCP GRO on the Tailscale TUN
device.

These complement the pre-existing TS_DEBUG_DISABLE_UDP_{GRO,GSO} and
TS_TUN_DISABLE_{UDP,TCP}_GRO envknobs. They exist so we can mitigate
upstream Linux kernel regressions on a deployed fleet without
requiring a client release, after two incidents (#13041, #19777) where
buggy kernel patches landed upstream and the fix took an excessively
long time to reach downstream distros.

Knob changes are reacted to in setNetworkMapInternal / SetNetworkMap via
a comparison against a cached "last applied" value and only an actual
transition triggers work: magicsock Rebind()+ReSTUN for UDP,
ApplyGROKnobs for TUN. The TUN side is gated by buildfeatures.HasGRO and
is one-way (wireguard-go GRO disablement is sticky); re-enabling
requires a client restart.

Updates #13041
Updates #19777

Change-Id: I802993070afa659cc06809bb0bfbb7f8a0cdb273
Signed-off-by: James Tucker <james@tailscale.com>
2026-05-27 17:10:14 -07:00
Claus Lensbøl
9be21088f4 wgengine/{,magicsock},tstest/natlab/vmtest: send disco on cached netmap (#19878)
Originally found when adding tests for working with cached netmaps, and
finding the added tests to be flakey.

When working off of a cached netmap, if a node exists in the cached
netmap but does not yet have any endpoints, DERP connections are
available but not direct ones. By sending callMeMaybe to nodes
without endpoints in the cached netmap, we can establish direct
connections for this edge case.

Aditionally, ensure that TSMP disco advert messages are not sent if the
endpoint does not have a valid address yet.

Fixes #19843
Updates #19597

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2026-05-27 13:05:12 -04:00
Brad Fitzpatrick
aa5da2e5f2 ipn/ipnlocal, control/controlclient: process node adds/removes in constant time
For large tailnets (~50k+ nodes) with frequent peer churn (ephemeral
GitHub Actions workers etc.), tailscaled used to rebuild the full
netmap and fan it out on the IPN bus on every MapResponse that
added or removed a peer. There were two O(N) costs per delta: the
full netmap rebuild + every Notify.NetMap encode to every bus watcher.

This change tackles both:

  1. Plumb O(1) peer add/remove through the delta path. PeersChanged
     and PeersRemoved no longer prevent the delta happy path; instead,
     they mutate the per-node-backend peer map in place.

  2. Restrict ipn.Notify.NetMap emission to the platforms whose host
     GUIs still depend on it (Windows, macOS, iOS) and migrate
     in-tree consumers off it everywhere else:

     - Migrate reactive consumers (containerboot, kube agents,
       sniproxy, tsconsensus, etc.) off Notify.NetMap to the
       previously-added Notify.SelfChange signal so they no longer
       have to subscribe to the full netmap.
     - Add ipn.NotifyNoNetMap so GUI clients on "legacy-emit" platforms
       that have already migrated can opt out of the per-watcher
       NetMap encode.
     - Gate Notify.NetMap emission on the producer side by a compile-
       time GOOS check, so the supporting code is dead-code-eliminated
       on Linux and other geese where no GUI consumer needs it.

Re-running BenchmarkGiantTailnet from tstest/largetailnet, which was
added along with baseline numbers on unmodified main in ad5436af0d,
the per-delta cost (one peer add+remove pair) is now ~O(1) regardless
of tailnet size N:

    N         no-watcher (ms/op)            bus-watcher (ms/op)
              before    now     factor      before    now     factor
     10000        32   0.11       300x         166   0.13      1300x
     50000       222   0.11      2000x         865   0.13      6700x
    100000       504   0.12      4100x        1765   0.13     13400x
    250000      1551   0.12     12500x        4696   0.15     32400x

Updates #12542

Change-Id: I94e34b37331d1a8ec74c299deffadf4d061fda9e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-21 09:26:19 -07:00
Brad Fitzpatrick
2703f91174 wgengine/magicsock: fix data race in TestSetDERPMapDoReStun
SetDERPMap spawns a goroutine that calls ReSTUN, which logs via the
test logger. If the test returns before that goroutine logs, the
goroutine races with testing cleanup.

Use tstest.WhileTestRunningLogger so the goroutine's logf call becomes
a no-op once the test finishes.

Fixes #19829

Change-Id: I1097f98e40ffd1c5dd7fb7a715c918255853e3c6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-21 08:51:50 -07:00
Brad Fitzpatrick
2b338dd6a8 wgengine, cmd/tailscaled, control/controlclient: remove Engine watchdog
The Engine watchdog wrapped every wgengine.Engine method call in a
goroutine with a 45s timeout and crashed the process on timeout. It
was added years ago to surface deadlocks during development, but the
underlying deadlocks have long since been fixed, and even when it did
fire it produced obscure stack traces (from inside the watchdog
goroutine, not the original caller) without buying much.

Audit of userspaceEngine's methods shows none have cyclic locking or
unbounded blocking now that ResetAndStop no longer loops waiting for
DERPs to drain (fa49009ee). The watchdog is dead weight; remove it
along with the TS_DEBUG_DISABLE_WATCHDOG escape hatch.

Updates #19759

Change-Id: Iba9d718fe1f8718a6631296e336b138c31b99ff1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-15 16:49:28 -07:00
Fernando Serboncini
c355618e73 wgengine/router/osrouter: skip netfilter add-ons when chain setup fails (#19757)
linuxRouter has two blocks (connmark rules and the CGNAT drop rule) that
gate on cfg.NetfilterMode, the requested config state. This may cause an
error when setNetfilterModeLocked fails, since it may keep assuming this
config is valid.

We now gate both blocks on r.netfilterMode, matching the pattern used by
SNAT, stateful, and loopback paths.

Fixes #19737

Change-Id: Ia6003a082db99c376e662132d725661afbac0ee9

Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
2026-05-15 09:32:30 -04:00
Brad Fitzpatrick
fa49009eee wgengine: simplify ResetAndStop, drop drain loop
Since f343b496c3 ("wgengine, all: remove LazyWG, use wireguard-go
callback API for on-demand peers"), Reconfig is fully synchronous:
magicConn.UpdatePeers, wgdev.RemovePeer, router.Set, and dns.Set all
return when the work is done, and the peer list is updated under
wgLock before Reconfig returns. So after Reconfig with empty configs,
len(st.Peers) is already 0.

The old loop also waited for st.DERPs to drain to 0, but UpdatePeers
only edits maps; active DERP connections idle out on their own
timeout. The sole caller (LocalBackend.stopEngineAndWait) doesn't
inspect st.DERPs anyway; it just hands the Status to
setWgengineStatusLocked. So the drain-wait was for nothing observable
and could theoretically (or at least appear to readers to) loop
forever holding b.mu. Remove that reader confusion by removing
the backoff loop entirely.

Updates #19759

Change-Id: Ibfac3f0baabcad7604b713c934a8fc37932e0a50
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-14 15:45:38 -07:00
Fernando Serboncini
2a06fb66d0 cmd/cloner: preserve nil-valued entries when cloning map (#19749)
The codegen path for map-of-slice-of-pointer fields, skipped
nil-valued entries. That dropped the key from the map.

This broke how dns.Config.Routes uses nil values sentinels.

Fixes #19730
Fixes #19732
Fixes #19746
Fixes #19744

Change-Id: Ic6400227f4ab21b3ca0e8c0eeecf9b83d145a9ab

Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
2026-05-14 10:30:59 -04:00
Simon Law
e4e59a2af0 wgengine/netstack: stop inject goroutine from leaking in Impl.Start (#19721)
This patch fixes a data race in wgengine/netstack that surfaced while
running both TestTCPForwardLimits and TestTCPForwardLimits_PerClient.
Because these two tests both setup the TS_DEBUG_NETSTACK envknob, a
race happens because netstack.Impl.Close leaked its inject goroutine.
The inject goroutine also reads the TS_DEBUG_NETSTACK envknob, so if
it is still running when the next test starts, then it will break.

This patch also cleans up the tests a bit, ensuring that neither of
them run in T.Parallel. It also adds a T.Cleanup call to clear the
envknob.

Fixes #19720

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-05-13 08:13:40 -07:00
M. J. Fromberger
9f48567bf1 ipn/ipnlocal,wgengine/magicsock: add basic counters for cached peer connectivity (#19699)
Add new clientmetric counters for establishing contact with peers while using
cached network map data. To do this, instrument the magicsock.Conn with a bit
to indicate whether its peer data came from a cached netmap. If so, there are
two conditions we will count as establishing connectivity to a peer:

  - Receipt of a CallMeMaybe from a peer via disco.
  - Establishing a valid endpoint address for a peer.

In vmtest, add Env.ClientMetrics to scrape metrics from the specified node.
Use this to check that counters were updated in caching tests.

Updates https://github.com/tailscale/projects/issues/13
Updates #12639

Change-Id: Ie8cf3244ac8af4f5bcfe4d0d944078da2ba08990
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
2026-05-12 12:01:05 -07:00
Brad Fitzpatrick
d06cc56987 wgengine/magicsock: add more docs, checks to Test32bitAlignment
Per recent chat with @raggi about all this, I went and looked at this
test again.

Updates #cleanup

Change-Id: Icb7d87b1ed2cebf481ee4e358a3aa603e63fb8a4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-06 15:29:44 -07:00
Brad Fitzpatrick
883d4fd2cd wgengine/netstack, net/ping: stop using pro-bing and use our net/ping instead
Fixes #19633
Fixes #13760

Change-Id: I0fa9423523a3a0fb1dfcde57de0f26e51723ff97
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-04 14:05:24 -07:00
Brad Fitzpatrick
f343b496c3 wgengine, all: remove LazyWG, use wireguard-go callback API for on-demand peers
Replace the UAPI text protocol-based wireguard configuration with
wireguard-go's new direct callback API (SetPeerLookupFunc,
SetPeerByIPPacketFunc, RemoveMatchingPeers, SetPrivateKey).

Instead of computing a trimmed wireguard config ahead of time upon
control plane updates and pushing it via UAPI, install callbacks so
wireguard-go creates peers on demand when packets arrive. This removes
all the LazyWG trimming machinery: idle peer tracking, activity maps,
noteRecvActivity callbacks, the KeepFullWGConfig control knob, and the
ts_omit_lazywg build tag.

For incoming packets, PeerLookupFunc answers wireguard-go's questions
about unknown public keys by looking up the peer in the full config.
For outgoing packets, PeerByIPPacketFunc (installed from
LocalBackend.lookupPeerByIP) maps destination IPs to node public keys
using the existing nodeByAddr index.

Updates tailscale/corp#12345

Change-Id: I4cba80979ac49a1231d00a01fdba5f0c2af95dd8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-29 19:46:19 -07:00
Brad Fitzpatrick
22ff402da9 wgengine/magicsock: restore SetDERPMap signature, add SetDERPMapWithoutReSTUN
Commit 78627c132f changed the signature of magicsock.Conn.SetDERPMap to
take an additional bool doReStun parameter. Avoid both the boolean
parameter and the API signature change by restoring SetDERPMap to its
original single-argument form and adding a new SetDERPMapWithoutReSTUN
method for the cache-loading caller that wants to skip the post-set
ReSTUN.

Updates #19490

Change-Id: I97d9e82156bfc546ccf59756d1ea52f039b5de06
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-29 12:46:15 -07:00
Claus Lensbøl
be7cce74ba wgengine/userspace: do not fall back to old key on tsmpLearned mismatch (#19575)
The mismatch behaviour of falling back to a previous key could end up
breaking connections when the netmap update took longer than the 2
seconds allowed in controlClient.auto for netmap updates, or if the
controlClient context was canceled. This could end up breaking
legitimate updates to the netmap for disco keys coming from control.

Instead, log the event, and let the connection be reset to that of the
key as that is safer.

Issue found by @bradfitz.

Updates #19574

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2026-04-29 13:23:04 -04:00
Claus Lensbøl
78627c132f wgengine/magicsock,ipn/ipnlocal: store and load homeDERP from cache (#19491)
With netmap caching, the home DERP of the self node was neither saved to
the cache or loaded from it, making nodes not stick to a DERP when
starting without a connection to control.

Instead, make sure that when a cache is available, load that cache,
before looking for DERP servers. This is implemented by allowing a skip
of ReSTUN in setting the DERP map (we must have a DERP map before
setting the home DERP), so the DERP from cache will set itself and be
sticky until a connection to control is established.

Making DERP only change when connected to control is handled by existing
code from f072d017bd.

Updates #19490

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2026-04-29 10:24:09 -04:00
Mike O'Driscoll
33342aec32 The connmark save/restore rules in mangle/PREROUTING restore the Tailscale bypass fwmark (0x80000) onto reply packets so that rp_filter's reverse-path check routes through the main table instead of table 52. However, the kernel only uses the packet's fwmark during the rp_filter lookup when net.ipv4.conf.all.src_valid_mark=1. (#19537)
On systems where this sysctl defaults to 0 (including GCP VMs), rp_filter performs its lookup with fwmark=0, hits rule 5270 then table 52 and routes to 0.0.0.0/0 dev tailscale0, and drops every reply packet arriving on the physical interface as a martian. This breaks all connectivity when using an exit node: DERP, DNS, control plane, and even the cloud metadata service.

Set src_valid_mark=1 when enabling the connmark rules so the rp_filter workaround actually works in these cases.

Updates #3310
Updates tailscale/corp#37846

Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
2026-04-27 13:52:45 -04:00
James Tucker
1b40911611 wgengine/netstack: absorb all quad-100 traffic locally, never leak to peers
Previously, handleLocalPackets intercepted traffic to the Tailscale
service IP (100.100.100.100 / fd7a:115c:a1e0::53) only for an allow-list
of ports: TCP 53/80/8080 and UDP 53. Any other port returned
filter.Accept, letting the packet fall through to the ACL filter and
wireguard-go, which would attempt a peer lookup. No peer owns the
quad-100 AllowedIP, so after ~5s pendopen.go would log:

    open-conn-track: timeout opening ...; no associated peer node

This is the common "conntrack error no peer found for 100.100.100.100:853"
log spam seen in the wild (e.g. from systemd-resolved or another
resolver speculatively trying DoT on quad-100). It also leaks quad-100
packets onto the tailnet.

Remove the port allow-list so handleLocalPackets absorbs every quad-100
packet into netstack regardless of IP protocol or port. Traffic never
reaches the conntrack / peer-routing layers.

With the allow-list gone, acceptTCP needs a corresponding guard: on a
quad-100 TCP port we don't serve, execution used to fall through to the
isTailscaleIP case (quad-100 is in the tailscale IP range), which
rewrote the dial target to 127.0.0.1:<port> and forwardTCP'd the
connection to whatever happened to be listening on the host's loopback
at that port. Add a hittingServiceIP case that RSTs cleanly instead,
placed before the isTailscaleIP fallthrough.

TestQuad100UnservedTCPPortDoesNotForward is a new integration test that
injects a TCP SYN to 100.100.100.100:853 via handleLocalPackets, stubs
forwardDialFunc, and asserts the dialer is not invoked; it catches
regressions of the acceptTCP recursion/loopback-redirection case.

Fixes #15796
Fixes #19421
Updates #3261
Updates #11305

Signed-off-by: James Tucker <james@tailscale.com>
2026-04-24 12:42:16 -07:00
Claus Lensbøl
ee76a7d3f8 wgengine/magicsock: do not send TSMP disco when connected (#19497)
When there is an active connection between devices, do not send new
disco keys via TSMP.

Updates #12639

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2026-04-23 12:23:57 -04:00
Brad Fitzpatrick
311dd3839d wgengine/magicsock: replace peers slice with peersByID map; add Upsert/RemovePeer
Replace Conn.peers (sorted views.Slice) with peersByID, a
map[tailcfg.NodeID]tailcfg.NodeView. The only caller that needed
the sorted slice (the disco message receive path's binary search)
becomes a single map lookup. Drop nodesEqual.

Add Conn.UpsertPeer / Conn.RemovePeer for O(1) single-peer endpoint
work. RemovePeer also performs a targeted single-disco-key cleanup
(previously that scan was O(discoInfo)).

Extract the shared per-peer upsert body as upsertPeerLocked; still
used by SetNetworkMap's bulk path. SetNetworkMap is documented as
the bulk / initial / self-change path; UpsertPeer and RemovePeer
are preferred for single-peer changes.

Make the relay server set update O(1) per peer: add serverUpsertCh
/ serverRemoveCh to relayManager with matching run-loop handlers.
UpsertPeer / RemovePeer evaluate the per-peer relay predicate
locally and dispatch upsert or remove. The full-rebuild
updateRelayServersSet stays for the initial netmap, filter
changes, and fallback.

Move the hasPeerRelayServers atomic from Conn onto relayManager,
next to the serversByNodeKey map it summarizes. The run loop is
now the single writer and needs no back-pointer to Conn;
endpoint's two hot-path readers take one extra hop to
de.c.relayManager.hasPeerRelayServers but the cost is the same
atomic load.

No callers use UpsertPeer/RemovePeer yet; a subsequent change will
plumb per-peer add/remove through the incremental map update path.

Updates #12542

Change-Id: If6a3442fe29ccbd77890ea61b754a4d1ad6ef225
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-22 15:07:11 -07:00
Alex Valiushko
d3ba1480f5 magicsock: invalidate endpoint on trust timeout (#19415)
Endpoint's best address was cleared on trustBestAddrUntil expiry
only if it was a udprelay connection. This generalizes invalidation
to also cover direct UDP.

Trust deadline is checked in two cases:

On disco ping timeout from the endpoint's best address.
Traffic goes DERP-only, heartbeats to the old address stop.
The discovery pings are still in flight, handled by the following.

On disco ping success from an alternative. BestAddr switches to the
working path, trust refreshed, eager discovery stops. The still
in flight pongs are handled by betterAddr().

Updates #19407


Change-Id: Ic41ed18edb4a6e4350a2d49271ba01566a6a6964

Signed-off-by: Alex Valiushko <alexvaliushko@tailscale.com>
2026-04-15 19:22:07 -07:00
Avery Pennarun
effbe67fe3 wgengine/magicsock: remove pickPort, use port 0 to avoid TOCTOU race
pickPort would bind a UDP socket on :0 to get a free port, close
the socket, then hope to rebind to the same port in NewConn. This
is a TOCTOU race that can cause flaky test failures when another
process grabs the port in between.

Instead, pass Port: 0 to NewConn and let the OS assign the port
atomically, then read back the assigned port via conn.LocalPort().

Fixes #19409

Change-Id: Ie44b599fb93c361e29a05f2171ad747c46f82b7a
Co-authored-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
2026-04-14 18:08:47 -07:00
Naman Sood
6301a6ce4b util/linuxfw,wgengine/router: allow incoming CGNAT range traffic with nodeattr
Clients with the newly added node attribute
`"disable-linux-cgnat-drop-rule"` will not automatically drop inbound
traffic on non-Tailscale network interfaces with the source IP in the
CGNAT IP range. This is an initial proof-of-concept for enabling
connectivity with off-Tailnet CGNAT endpoints.

Fixes tailscale/corp#36270.

Signed-off-by: Naman Sood <mail@nsood.in>
2026-04-14 16:45:06 -04:00
Fernando Serboncini
5834058269 wgengine: replace reflect.DeepEqual with typed Equal for maybeReconfigInputs (#19365)
reflect.DeepEqual is expensive and allocates heavily. Replace it with
a field-by-field comparison that does zero allocations.

Adds tests and benchmarks for the new Equal method.

Fixes #19363

Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
2026-04-14 13:16:21 -04:00
Brad Fitzpatrick
6aa10576c9 wgengine/magicsock: deflake TestTwoDevicePing compare-metrics-stats
The compare-metrics-stats subtest reset two independent counting
systems (physical connection counters and expvar.Int user metrics)
non-atomically. Background WireGuard keepalives arriving between the
resets could increment one system but not the other, causing
off-by-one packet/byte mismatches in either direction.

Replace the reset-then-compare pattern with snapshot-and-delta:
snapshot both systems before pings, snapshot again after, and compare
the deltas. This eliminates the non-atomic reset window entirely.
As a belt-and-suspenders safety net, tolerate a difference of exactly
one packet (and corresponding bytes) from a stray keepalive that
could still arrive in the narrow window between the two snapshots.

flakestress passes with ~5900 runs (~2800 without -race, ~3100 with
-race) but it also passed previously too. This is an annoying one to
repro.

Fixes #11762

Change-Id: I3447ad67e71c8146e85eed38b7a665033ef9e284
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-14 06:57:24 -07:00
Brad Fitzpatrick
9fbe4b3ed2 all: fix six tests that failed with -count=2
Avery found a bunch of tests that fail with -count=2.

Updates tailscale/corp#40176 (tracks making our CI detect them)

Change-Id: Ie3e4398070dd92e4fe0146badddf1254749cca20
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Co-authored-by: Avery Pennarun <apenwarr@tailscale.com>
2026-04-13 18:52:57 -07:00
Brad Fitzpatrick
50b8cfbde2 wgengine/netstack: fix data race on in-flight connection test globals
The maxInFlightConnectionAttemptsForTest and
maxInFlightConnectionAttemptsPerClientForTest globals were plain ints
read by background gVisor TCP handler goroutines (via
wrapTCPProtocolHandler) and written by tstest.Replace cleanup in
TestTCPForwardLimits_PerClient. When a gVisor goroutine outlived the
test cleanup window, the race detector caught the unsynchronized
access.

The race-prone code was introduced in c5abbcd4b4 (2024-02-26,
"wgengine/netstack: add a per-client limit for in-flight TCP
forwards") which added both the plain int globals and the
TestTCPForwardLimits_PerClient test that writes them via
tstest.Replace. It is not obvious why this has only recently started
being detected as a data race; likely some combination of gVisor
version bumps, Go toolchain scheduler changes, and additional
TCP-injecting subtests (e.g. 03461ea7f, 2026-01-30) increased
goroutine churn enough to hit the window.

Change both globals to atomic.Int32 and replace tstest.Replace (which
does non-atomic *target = old on cleanup) with explicit Store/Cleanup
pairs.

Fixes #19118

Change-Id: Id26ba6fbfb2e4ade319976db80af8e16c7c8778e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-13 15:24:35 -07:00
Amal Bansode
b4c0d67f8b wgengine/router/osrouter: fix privileged tests missing fake netfilter runner
These test failures were never caught by CI because the package in question
was missing from our privileged tests list. tailscale/corp#40007 covers improving
our process around this.

Fixes #19316

Signed-off-by: Amal Bansode <amal@tailscale.com>
2026-04-10 14:51:55 -07:00
Brad Fitzpatrick
5e81840b57 tstest: add RequireRoot helper
Start using a common helper for tests to declare that they require root.

This is step 1. A later step will then make this helper track which tests were
skipped so a subsequent pass will run these test as root.

Updates tailscale/corp#40007

Change-Id: I4979e1def0fa3691d38c83f48c89aaa443e7f62e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-10 10:48:50 -07:00
Tom Meadows
5341b26328 wgengine/netstack: allow UDP listeners to receive traffic on Service VIP addresses (#18972)
Fixes UDP listeners on VIP Service addresses not receiving inbound traffic.

- Modified shouldProcessInbound to check for registered UDP transport endpoints when processing packets to service VIPs
- Uses FindTransportEndpoint to determine if a UDP listener exists for the destination VIP/port
- Supports both IPv4 and IPv6

The aim was to mirror the existing TCP logic, providing feature parity for UDP-based services on VIP Services.

Fixes #18971

Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
2026-04-08 10:53:50 +01:00
Brad Fitzpatrick
a182b864ac tsd, all: add Sys.ExtraRootCAs, plumb through TLS dial paths
Add ExtraRootCAs *x509.CertPool to tsd.System and plumb it through
the control client, noise transport, DERP, and wgengine layers so
that platforms like Android can inject user-installed CA certificates
into Go's TLS verification.

tlsdial.Config now honors base.RootCAs as additional trusted roots,
tried after system roots and before the baked-in LetsEncrypt fallback.
SetConfigExpectedCert gets the same treatment for domain-fronted DERP.

The Android client will set sys.ExtraRootCAs with a pool built from
x509.SystemCertPool + user-installed certs obtained via the Android
KeyStore API, replacing the current SSL_CERT_DIR environment variable
approach.

Updates #8085

Change-Id: Iecce0fd140cd5aa0331b124e55a7045e24d8e0c2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-07 18:10:54 -07:00