Add four control-plane node attributes that let us disable UDP GSO/GRO
on the magicsock UDP socket and UDP/TCP GRO on the Tailscale TUN
device.
These complement the pre-existing TS_DEBUG_DISABLE_UDP_{GRO,GSO} and
TS_TUN_DISABLE_{UDP,TCP}_GRO envknobs. They exist so we can mitigate
upstream Linux kernel regressions on a deployed fleet without
requiring a client release, after two incidents (#13041, #19777) where
buggy kernel patches landed upstream and the fix took an excessively
long time to reach downstream distros.
Knob changes are reacted to in setNetworkMapInternal / SetNetworkMap via
a comparison against a cached "last applied" value and only an actual
transition triggers work: magicsock Rebind()+ReSTUN for UDP,
ApplyGROKnobs for TUN. The TUN side is gated by buildfeatures.HasGRO and
is one-way (wireguard-go GRO disablement is sticky); re-enabling
requires a client restart.
Updates #13041
Updates #19777
Change-Id: I802993070afa659cc06809bb0bfbb7f8a0cdb273
Signed-off-by: James Tucker <james@tailscale.com>
bind() on NETLINK_ROUTE sockets does not work on Android 11+ (https://developer.android.com/identity/user-data-ids#mac-11-plus) . Since system/bin/ip uses bind(), likelyHomeRouterIPHelper() always fails on Andoroid 11+, so that GatewayAndSelfIP never caches the result, causing repeated ip process spawns on every periodic ReSTUN.
This replaces the system/bin/ip fallback with a cached gateway IP pushed from Android’s ConnectivityManager via LinkProperties.getRoutes(). This is the same patterm used by UpdateLastKnownDefaultRouteInterface for the interface name (see https://github.com/tailscale/tailscale/pull/11784/). We keep the proc/net/route path as a fallback for early startup before NetworkChangeCallback has fired.
Updates tailscale/tailscale#18622
Updates tailscale/tailscale#13352
Signed-off-by: kari-ts <kari@tailscale.com>
It is sometimes useful when diagnosing subtle and specific performance
problems to rule out GRO/GSO independently and/or toggle them to
influence packet pacing.
Updates #17835
Updates tailscale/corp#31164
Signed-off-by: James Tucker <james@tailscale.com>
Add support for configuring egress to destinations reachable via 4via6
subnet routes. This change affects standalone egress proxy only- egress
ProxyGroup needs IPv6 support before being able to support 4via6. Egress may
be configured using either the synthesized 4via6 address or the MagicDNS
name (in the form
<IPv4-address-with-hyphens-instead-of-dots>-via-<siteid>[.*]).
Also update the Connector to validate and advertise 4via6 subnet routes.
Export net/netutil.ValidateViaPrefix so it can be reused by the Connector
validation logic.
Updates #19334
Signed-off-by: Becky Pauley <becky@tailscale.com>
In PR tailscale/corp#30448, we originally decided to break ties using
SHA256 for our rendezvous hashing algorithm. Now that we’ve had some
experience with it, we think that FNV-1a is a better choice. It
distributes bits evenly, it’s much faster, and it doesn’t need to be
cryptographically secure. The FNV designers recommend FNV-1a over the
deprecated FNV-1.
This PR makes the switch and updates the related tests, since changing
the algorithm changes which stable pick gets selected. As of 2026-05,
this is the best time to make this change, since there are almost no
clients in the wild with traffic steering enabled.
Updates #17366
Updates tailscale/corp#29964
Updates tailscale/corp#29966
Updates tailscale/corp#33033
Signed-off-by: Simon Law <sfllaw@tailscale.com>
The traffic package contains helpers for evaluating traffic steering
scores and picking appropriate nodes. These were extracted from
ipnlocal.suggestExitNodeUsingTrafficSteering so they can be reused by
the new routecheck package to probe exit nodes in priority order.
Updates #17366
Updates tailscale/corp#33033
Signed-off-by: Simon Law <sfllaw@tailscale.com>
When tailscaled is running in userspace-networking mode behind an
exit node (e.g. as a SOCKS5 proxy), it resolves a hostname and then
dials a single resolved IP through the tunnel. If the name has both
A and AAAA, Go's net.Resolver merges them and we pick ips[0], which
on an IPv6-native host is usually AAAA. If the exit node has no IPv6
egress (or vice versa), the dial fails silently through the tunnel
and the user sees a hang.
Resolve all candidates and race connect attempts across address
families with a 300ms happy-eyeballs delay, matching Go's net.Dialer
default and the existing pattern in net/dnscache (commit ee0a03b14).
First success wins; losers are cancelled and any conns they produce
are closed. A failBoost channel wakes the launcher when a connect
fails fast (e.g. ICMP "no route" via the tunnel) so we don't sit on
the 300ms timer when the answer is already known.
userDialResolve is refactored into userDialResolveAll (returns the
full candidate list) plus a thin single-IP wrapper for callers like
UserDialPlan that don't race. UserDial's per-IP dispatch (netstack
vs peer dialer vs SystemDial vs std) is extracted to dialOneUser so
each candidate can route correctly on its own merits.
Also fix serveDial in localapi to pass the original hostname to
UserDial rather than a pre-resolved IP, so the race can fire.
This fix is single-ended: it works against any exit node, including
old ones, with no protocol changes. The trade-off versus filtering
on the exit-node side via PeerAPI DoH is that every dial through an
unreachable-family exit node costs one failed connect attempt per
cache window, rather than zero, which is acceptable given the
simplicity.
Fixes#19792Fixes#13257
Change-Id: I9d7645d0034caf3ee22ecdd8070798353f77e94b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
If the context given to DialContext has a shorter lifetime than the OS
TCP SYN timeout, and TCP SYNs are dropped from the path to the remote,
DialContext would never fall back to try IPv6 after IPv4.
Instead, use the normal happy eyeballs race if there is more than one
address. This does remove the implicit prioritization of IPv4 over IPv6
in cases where there is only a single IPv4 remote address.
Updates #13346
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
The codegen path for map-of-slice-of-pointer fields, skipped
nil-valued entries. That dropped the key from the map.
This broke how dns.Config.Routes uses nil values sentinels.
Fixes#19730Fixes#19732Fixes#19746Fixes#19744
Change-Id: Ic6400227f4ab21b3ca0e8c0eeecf9b83d145a9ab
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
A missing hosts file is not a fatal error. We should log it, but still proceed
and create a new one instead of failing the DNS reconfiguration completely.
Fixes#19733
Signed-off-by: Nick Khyl <nickk@tailscale.com>
If another part of the client code registers a custom scheme with the
forwarder, the forwarder will check resolver addresses to see if they
match the scheme. If they do, the corresponding custom scheme handler
will be called to find the actual address for the resolver at this
moment. If the handler returns the empty string then that resolver will
be ignored.
This is useful if you want to dynamically determine where to send
certain DNS requests. It is being added to support new app connector
(conn25) work that would like to make sure it sends DNS requests to the
current connector peer in a high availability configuration.
Updates tailscale/corp#39858
Signed-off-by: Fran Bull <fran@tailscale.com>
Replace the UAPI text protocol-based wireguard configuration with
wireguard-go's new direct callback API (SetPeerLookupFunc,
SetPeerByIPPacketFunc, RemoveMatchingPeers, SetPrivateKey).
Instead of computing a trimmed wireguard config ahead of time upon
control plane updates and pushing it via UAPI, install callbacks so
wireguard-go creates peers on demand when packets arrive. This removes
all the LazyWG trimming machinery: idle peer tracking, activity maps,
noteRecvActivity callbacks, the KeepFullWGConfig control knob, and the
ts_omit_lazywg build tag.
For incoming packets, PeerLookupFunc answers wireguard-go's questions
about unknown public keys by looking up the peer in the full config.
For outgoing packets, PeerByIPPacketFunc (installed from
LocalBackend.lookupPeerByIP) maps destination IPs to node public keys
using the existing nodeByAddr index.
Updates tailscale/corp#12345
Change-Id: I4cba80979ac49a1231d00a01fdba5f0c2af95dd8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The darwinConfigurator writes split DNS resolver files to
/etc/resolver/$SUFFIX using os.WriteFile with string concatenation.
A crafted MatchDomain value containing path traversal sequences
(e.g. "../evil") could write files outside the resolver directory.
Use os.OpenRoot to confine all file operations in SetDNS and
removeResolverFiles to the resolver directory. os.Root rejects any
path component that escapes the root, returning an error instead of
following the traversal.
Also parametrize the resolver directory path on the struct to enable
testing with t.TempDir(), and add tests.
As far as I can tell, this would require a malicious controlplane to
exploit, but still worth fixing.
Updates tailscale/corp#39751
Signed-off-by: Andrew Dunham <andrew@tailscale.com>
Add a tsdial.Dialer.UserDialPlan method that resolves an address and
reports whether the dialer would route it via Tailscale. The LocalAPI
/dial handler now uses this to skip proxying for addresses that aren't
Tailscale routes (e.g. localhost), returning a Dial-Self response with
the resolved address so the client can dial it directly. This avoids
an unnecessary round-trip through the daemon for local connections.
The client's UserDial handles the new response by dialing the resolved
address itself, and the server passes the pre-resolved IP:port for
Tailscale dials to avoid redundant DNS lookups.
Thanks to giacomo and Moyao for pointing this out!
Updates tailscale/corp#39702
Change-Id: I78d640f11ccd92f43ddd505cbb0db8fee19f43a6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The cloner's codegen for map[K][]*V fields was doing a shallow
append (copying pointer values) instead of cloning each element.
This meant that cloned structs aliased the original's pointed-to
values through the map's slice entries.
Mirror the existing standalone-slice logic that checks
ContainsPointers(sliceType.Elem()) and generates per-element
cloning for pointer, interface, and struct types.
Regenerate net/dns and tailcfg which both had affected
map[...][]*dnstype.Resolver fields.
Fixes#19284
Signed-off-by: Andrew Dunham <andrew@tailscale.com>
The test had two problems:
1. runFileWatcher passed hardcoded "/etc/" to the inotify watcher,
but the test filesystem uses a temp directory prefix. The watcher
was watching the real /etc/, never seeing the test's file writes.
2. The test's watchFile used gonotify.NewDirWatcher which creates
goroutines that block on real inotify syscalls. These don't work
inside synctest's fake-time bubble. The test only passed standalone
by accident: gonotify walks /etc/ on startup producing fake events
that happened to trigger trample detection at the right time.
Fix the path issue by adding ActualPath to the wholeFileFS interface,
which translates logical paths (like "/etc/resolv.conf") to real
filesystem paths (respecting any test prefix). Use it in
runFileWatcher so the inotify watch targets the correct directory.
Replace gonotify in the test with a one-shot timer that synctest can
advance through fake time, reliably triggering the trample check.
Fixes#19400
Change-Id: Idb252881ec24d0ab3b3c1d154dbdaf532db837d4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Avery found a bunch of tests that fail with -count=2.
Updates tailscale/corp#40176 (tracks making our CI detect them)
Change-Id: Ie3e4398070dd92e4fe0146badddf1254749cca20
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Co-authored-by: Avery Pennarun <apenwarr@tailscale.com>
Add ExtraRootCAs *x509.CertPool to tsd.System and plumb it through
the control client, noise transport, DERP, and wgengine layers so
that platforms like Android can inject user-installed CA certificates
into Go's TLS verification.
tlsdial.Config now honors base.RootCAs as additional trusted roots,
tried after system roots and before the baked-in LetsEncrypt fallback.
SetConfigExpectedCert gets the same treatment for domain-fronted DERP.
The Android client will set sys.ExtraRootCAs with a pool built from
x509.SystemCertPool + user-installed certs obtained via the Android
KeyStore API, replacing the current SSL_CERT_DIR environment variable
approach.
Updates #8085
Change-Id: Iecce0fd140cd5aa0331b124e55a7045e24d8e0c2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Investigating battery costs on a busy tailnet I noticed a large number
of nodes regularly reconnecting to control and DERP. In one case I was
able to analyze closely `pmset` reported the every-minute wake-ups being
triggered by bluetooth. The node was by side effect reconnecting to
control constantly, and this was at times visible to peers as well.
Three changes here improve the situation:
- Short time jumps (less than 10 minutes) no longer produce "major
network change" events, and so do not trigger full rebind/reconnect.
- Many "incidental" fields on interfaces are ignored, like MTU, flags
and so on - if the route is still good, the rest should be manageable.
- Additional log output will provide more detail about the cause of
major network change events.
Updates #3363
Signed-off-by: James Tucker <james@tailscale.com>
The cloner and viewer code generators didn't handle named types
with basic underlying types (map/slice) that have their own Clone
or View methods. For example, a type like:
type Map map[string]any
func (m Map) Clone() Map { ... }
func (m Map) View() MapView { ... }
When used as a struct field, the cloner would descend into the
underlying map[string]any and fail because it can't clone the any
(interface{}) value type. Similarly, the viewer would try to create
a MapFnOf view and fail.
Fix the cloner to check for a Clone method on the named type
before falling through to the underlying type handling.
Fix the viewer to check for a View method on named map/slice types,
so the type author can provide a purpose-built safe view that
doesn't leak raw any values. Named map/slice types without a View
method fall through to normal handling, which correctly rejects
types like map[string]any as unsupported.
Updates tailscale/corp#39502 (needed by tailscale/corp#39594)
Change-Id: Iaef0192a221e02b4b8e409c99ef8398090327744
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Add a new vet analyzer that checks t.Run subtest names don't contain
characters requiring quoting when re-running via "go test -run". This
enforces the style guide rule: don't use spaces or punctuation in
subtest names.
The analyzer flags:
- Direct t.Run calls with string literal names containing spaces,
regex metacharacters, quotes, or other problematic characters
- Table-driven t.Run(tt.name, ...) calls where tt ranges over a
slice/map literal with bad name field values
Also fix all 978 existing violations across 81 test files, replacing
spaces with hyphens and shortening long sentence-like names to concise
hyphenated forms.
Updates #19242
Change-Id: Ib0ad96a111bd8e764582d1d4902fe2599454ab65
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Add a new tailcfg.NodeCapability (NodeAttrCacheNetworkMaps) to control whether
a node with support for caching network maps will attempt to do so. Update the
capability version to reflect this change (mainly as a safety measure, as the
control plane does not currently need to know about it).
Use the presence (or absence) of the node attribute to decide whether to create
and update a netmap cache for each profile. If caching is disabled, discard the
cached data; this allows us to use the presence of a cached netmap as an
indicator it should be used (unless explicitly overridden). Add a test that
verifies the attribute is respected. Reverse the sense of the environment knob
to be true by default, with an override to disable caching at the client
regardless what the node attribute says.
Move the creation/update of the netmap cache (when enabled) until after
successfully applying the network map, to reduce the possibility that we will
cache (and thus reuse after a restart) a network map that fails to correctly
configure the client.
Updates #12639
Change-Id: I1df4dd791fdb485c6472a9f741037db6ed20c47e
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
On Linux batching.Conn will now write a vector of
coalesced buffers via sendmmsg(2) instead of copying
fragments into a single buffer.
Scatter-gather I/O has been available on Linux since the
earliest days (reworked in 2.6.24). Kernel passes fragments
to the driver if it supports it, otherwise linearizes
upon receiving the data.
Removing the copy overhead from userspace yields up to 4-5%
packet and bitrate improvement on Linux with GSO enabled:
46Gb/s 4.4m pps vs 44Gb/s 4.2m pps w/32 Peer Relay client flows.
Updates tailscale/corp#36989
Change-Id: Idb2248d0964fb011f1c8f957ca555eab6a6a6964
Signed-off-by: Alex Valiushko <alexvaliushko@tailscale.com>
By polling RTM_GETSTATS via netlink. RTM_GETSTATS is a relatively
efficient and targeted (single device) polling method available since
Linux v4.7.
The tundevstats "feature" can be extended to other platforms in the
future, and it's trivial to add new rtnl_link_stats64 counters on
Linux.
Updates tailscale/corp#38181
Signed-off-by: Jordan Whited <jordan@tailscale.com>
When we don't care about the payload value and are just checking whether
a set contains an IP/prefix, we can use `bart.Lite` for the same lookup
times but a lower memory footprint.
Fixes#19075
Change-Id: Ia709e8b718666cc61ea56eac1066467ae0b6e86c
Signed-off-by: Alex Chan <alexc@tailscale.com>
When racing multiple upstream DNS resolvers, a REFUSED (RCode 5) response
from a broken or misconfigured resolver could win the race and be returned
to the client before healthier resolvers had a chance to respond with a
valid answer. This caused complete DNS failure in cases where, e.g., a
broken upstream resolver returned REFUSED quickly while a working resolver
(such as 1.1.1.1) was still responding.
Previously, only SERVFAIL (RCode 2) was treated as a soft error. REFUSED
responses were returned as successful bytes and could win the race
immediately. This change also treats REFUSED as a soft error in the UDP
and TCP forwarding paths, so the race continues until a better answer
arrives. If all resolvers refuse, the first REFUSED response is returned
to the client.
Additionally, SERVFAIL responses from upstream resolvers are now returned
verbatim to the client rather than replaced with a locally synthesized
packet. Synthesized SERVFAIL responses were authoritative and guaranteed
to include a question section echoing the original query; upstream
responses carry no such guarantees but may include extended error
information (e.g. RFC 8914 extended DNS errors) that would otherwise
be lost.
Fixes#19024
Signed-off-by: Brendan Creane <bcreane@gmail.com>
When a client starts up without being able to connect to control, it
sends its discoKey to other nodes it wants to communicate with over
TSMP. This disco key will be a newer key than the one control knows
about.
If the client that can connect to control gets a full netmap, ensure
that the disco key for the node not connected to control is not
overwritten with the stale key control knows about.
This is implemented through keeping track of mapSession and use that for
the discokey injection if it is available. This ensures that we are not
constantly resetting the wireguard connection when getting the wrong
keys from control.
This is implemented as:
- If the key is received via TSMP:
- Set lastSeen for the peer to now()
- Set online for the peer to false
- When processing new keys, only accept keys where either:
- Peer is online
- lastSeen is newer than existing last seen
If mapSession is not available, as in we are not yet connected to
control, punt down the disco key injection to magicsock.
Ideally, we will want to have mapSession be long lived at some point in
the near future so we only need to inject keys in one location and then
also use that for testing and loading the cache, but that is a yak for
another PR.
Updates #12639
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
Two methods could deadlock during shutdown when closing the wrapper.
Ensure that the writers are aware of the wrapper being closed.
Fixes#19037
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
These were previously swappable for historical reasons that are no
longer relevant.
Removing the indirection enables future inlining optimizations if we
simplify further.
Updates tailscale/corp#38703
Signed-off-by: Jordan Whited <jordan@tailscale.com>
For the purpose of improved observability of UDP socket receive buffer
overflows on Linux.
Updates tailscale/corp#37679
Signed-off-by: Jordan Whited <jordan@tailscale.com>
After switching from cellular to wifi without ipv6, ForeachInterface still sees rmnet prefixes, so HaveV6 stays true, and magicsock keeps attempting ipv6 connections that either route through cellular or time out for users on wifi without ipv6
This:
-Adds SetAndroidBindToNetworkFunc, a callback to bind the socket to the selected Android Network object
Updates tailscale/tailscale#6152
Signed-off-by: kari-ts <kari@tailscale.com>
ReadFromUDPAddrPort worked if UDP GRO was unsupported, but we don't
actually want attempted usage, nor does any exist today. Future work
on tailscale/corp#37679 would have required more complexity in this
method, vs clarifying the API intents.
Updates tailscale/corp#37679
Signed-off-by: Jordan Whited <jordan@tailscale.com>
I omitted a lot of the min/max modernizers because they didn't
result in more clear code.
Some of it's older "for x := range 123".
Also: errors.AsType, any, fmt.Appendf, etc.
Updates #18682
Change-Id: I83a451577f33877f962766a5b65ce86f7696471c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Fix its/it's, who's/whose, wether/whether, missing apostrophes
in contractions, and other misspellings across the codebase.
Updates #cleanup
Change-Id: I20453b81a7aceaa14ea2a551abba08a2e7f0a1d8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
OpenWrt is changing to using alpine like `apk` for package installation
over its previous opkg. Additionally, they are not using the same repo
files as alpine making installation fail.
Add support for the new repository files and ensure that the required
package detection system uses apk.
Updates #18535
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
Go 1.26's url.Parser is stricter and made our tests elsewhere fail
with this scheme because when these listen addresses get shoved
into a URL, it can't parse back out.
I verified this makes tests elsewhere pass with Go 1.26.
Updates #18682
Change-Id: I04dd3cee591aa85a9417a0bbae2b6f699d8302fa
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
runtime.NumCPU() returns the number of CPUs on the host, which in
containerized environments is the node's CPU count rather than the
container's CPU limit. This causes excessive memory allocation in
pods with low CPU requests running on large nodes, as each socket's
packetReadLoop allocates significant buffer memory.
Use runtime.GOMAXPROCS(0) instead, which is container-aware since
Go 1.25 and respects CPU limits set via cgroups.
Fixes#18774
Signed-off-by: Daniel Pañeda <daniel.paneda@clickhouse.com>
PR #18860 adds firewall rules in the mangle table to save outbound packet
marks to conntrack and restore them on reply packets before the routing
decision. When reply packets have their marks restored, the kernel uses
the correct routing table (based on the mark) and the packets pass the
rp_filter check.
This makes the risk check and reverse path filtering warnings unnecessary.
Updates #3310Fixestailscale/corp#37846
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
To be less spammy in stable, add a nob that disables the creation and
processing of TSMPDiscoKeyAdvertisements until we have a proper rollout
mechanism.
Updates #12639
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
Windows and macOS are not covered by this change, as neither have safely
distinct names to make it easy to do so. This covers the requested case
on Linux.
Updates #18824
Signed-off-by: James Tucker <james@tailscale.com>
When an exit node has been set and a new default route is added,
create a new rtable in the default rdomain and add the current
default route via its physical interface. When control() is
requesting a connection not go through the exit-node default route,
we can use the SO_RTABLE socket option to force it through the new
rtable we created.
Updates #17321
Signed-off-by: joshua stein <jcs@jcs.org>