Commit Graph

1403 Commits

Author SHA1 Message Date
Alex Chan
c3c2aa7093 all: don't repeat the the word "the" unnecessarily
Updates #cleanup

Change-Id: Ic1f430cd5dbf6cc1a385c59074a5d5cabe6fca57
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-06-18 16:32:08 +01:00
BeckyPauley
35a1a413f9 cmd/{containerboot,k8s-operator}: add 4via6 support in singleton egress (#19983)
Add support for configuring egress to destinations reachable via 4via6
subnet routes, using either the synthesized 4via6 address or the MagicDNS
name (in the form <IPv4-with-hyphens>-via-<siteID>[.*]).

Also update the Connector to validate and advertise 4via6 subnet routes.

Export net/netutil.ValidateViaPrefix so it can be reused by the Connector
validation logic.

This change only affects standalone egress proxies — ProxyGroup egress
requires IPv6 support before it can use 4via6.

Updates #19334

Change-Id: I6faecd6eb61ab55fc0cd97fe417af6b6a12fe7fc

Signed-off-by: Becky Pauley <becky@tailscale.com>
2026-06-18 16:13:10 +01:00
Naman Sood
47333e9487 feature/conn25: recreate transit IP mappings when connector loses them
Mappings from transit IPs to real IPs are stored ephemerally in the
connector, so they're lost on restart. When we send a packet to the
connector with a transit IP it does not recognize, it sends us a TSMP
message saying so (see #19883). If we (the client) know of such a
mapping, we now re-send it to the connector so that a connection can
proceed.

Fixes tailscale/corp#34256.

Signed-off-by: Naman Sood <mail@nsood.in>
2026-06-17 13:50:51 -04:00
James Tucker
26b2ed0a6a net/packet: clarify minFragBlks reuse for IPv6 and test chained ext header
Follow-up cleanups to the IPv6 fragment extension header support added in
the previous commit:

- Document that minFragBlks is sized for IPv4 but intentionally reused by
  decode6 for IPv6 fragments, where it is conservative (IPv6 fragments
  carry no per-fragment IP header) and only ever rejects more later
  fragments as Unknown, never fewer.

- Add a TestDecode case for a first fragment reachable only through a
  chained extension header (base Next Header = Hop-by-Hop Options, which
  chains to Fragment). decode6 only parses the Fragment header when it is
  the base header's immediate Next Header, so this must classify as
  Unknown. The test locks in that scoping decision.

Updates #20083
Updates #20140

Change-Id: Ibece03c6baf2385b0cc399f179819b08cbe921cc
Signed-off-by: James Tucker <james@tailscale.com>
2026-06-16 10:16:06 -07:00
Steve Avery
4c4ec3d468 net/packet,wgengine/filter: handle IPv6 fragment extension header
decode6 didn't parse the IPv6 Fragment extension header (Next Header 44),
so any source-fragmented IPv6 packet was classified as an unknown protocol
and matched no ACL rule. The filter then silently dropped it and counted it
as an "acl" drop, even on allow-all tailnets, blackholing large UDP (DNS,
WebRTC, etc.) over a tailnet's IPv6 addresses. IPv4 fragments were already
handled by decode4.

Parse the fragment header the same way: read the first fragment's transport
ports so the filter matches it like an unfragmented packet, pass later
fragments through as ipproto.Fragment, and reject overlapping-fragment
offsets (RFC 1858) and first fragments too short to hold the transport
header as unknown.

Fixes #20083

Signed-off-by: Steve Avery <hello@stevenavery.com>
2026-06-15 11:18:00 -07:00
Alex Chan
abe5fbbf49 all: make this spelling mistake non-existant
Updates #cleanup

Change-Id: I088aa91218354f6208190c8f6673f9c5a98e65fc
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-06-11 10:37:50 +01:00
BeckyPauley
60b935e30f net/dns/resolver: remove deprecated 4via6 magic-dns formats (#20057)
This removes deprecated magic-dns formats for 4via6 subnet routers.

These are superseded by the current format: Q-R-S-T-via-X.

Fixes #20053

Change-Id: I0eed1f057f856f248c4dc8ce3b751f6c7edcfbfd

Signed-off-by: Becky Pauley <becky@tailscale.com>
2026-06-09 18:10:47 +01:00
Doug Bryant
2767100bc2 net/netmon: skip RTM_MISS route messages on darwin (#20050)
macOS 26.4 emits RTM_MISS on the routing socket for every failed route
lookup. skipRouteMessage never inspected the message type, so each miss
woke the monitor as a link change and triggered a netcheck. On networks
without an IPv6 default route the netcheck's IPv6 DERP probes fail and
emit more RTM_MISS messages, sustaining the loop indefinitely: netchecks
run at roughly 40x the intended rate, with sustained probe traffic and
corresponding CPU and battery cost.

RTM_MISS scales with traffic volume, not network state, and is never
the leading signal for a topology change: route withdrawals emit
RTM_DELETE synchronously before any subsequent lookup can miss, so
ignoring it loses no signal. Other routing daemons (bird, dhcpcd, frr)
ignore it as well.

Same fix as coder/tailscale@e956a95074.

Fixes #19324

Signed-off-by: Doug Bryant <dougbryant@anthropic.com>
2026-06-08 10:45:13 -07:00
BeckyPauley
98f1ac0880 cmd/k8s-operator, net/netutil: revert 4via6 changes (#19990)
Reverts support 4via6 in egress proxy and connector (#19863)

Updates #19334

Signed-off-by: Becky Pauley <becky@tailscale.com>
2026-06-03 20:20:36 +01:00
Brendan Creane
b26dadf1b5 net/dns/resolver: skip DNS health warning when doing split DNS (#19959)
When MagicDNS is enabled but no global upstream resolvers are configured,
the forwarder only handles specific suffixes and defers other names to the
system resolver. A query it has no resolver for is expected in that case, so
don't raise the dns-forward-failing warning unless a default "." route makes
Tailscale the default resolver.

Fixes #19931

Signed-off-by: Brendan Creane <bcreane@gmail.com>
2026-06-03 09:14:48 -07:00
Brad Fitzpatrick
52400dc6f4 ipn/ipnlocal: add back a watchdog after earlier removal from engine
Commit 2b338dd6a8 removed watchdogEngine because it was weird
(so many methods) and increasingly unnecessary after we'd cleaned up
and simplified so much of the locking.

This adds back a watchdog, but an easier to maintain one that's more
idiomatic.

Updates #19759

Change-Id: I86c458473e126c0809f37696446ce7acf4cc4eb9
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-06-02 11:57:12 -07:00
Achille Roussel
7f3bbc9865 net/netutil: add NewDefaultTransport to avoid http.DefaultTransport panics
Several packages built their HTTP transports with

    http.DefaultTransport.(*http.Transport).Clone()

The standard library only documents http.DefaultTransport as an
http.RoundTripper, so an application is free to replace it with a
RoundTripper that is not a *http.Transport (e.g. an instrumented or
tracing wrapper). When such an application embeds tsnet.Server, the
unchecked type assertion panics as soon as tsnet brings up its control
connection, DNS bootstrap, or log uploader.

Add netutil.NewDefaultTransport, which returns a clone of the global
when it is still the standard *http.Transport (preserving existing
behavior) and otherwise returns a fresh transport mirroring the stdlib
defaults. Route every clone site through it.

Updates #19937

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Achille Roussel <achille.roussel@gmail.com>
2026-06-01 12:28:36 -07:00
Simon Law
2d6844c565 cmd/tailscale/cli: add routecheck command (#19641)
Introduce a new `tailscale routecheck` command which prints a report
of high-availability routers that are reachable.

This command rhymes with the `tailscale netcheck` command and but
instead of reporting on local network conditions, `routecheck` reports
on remote connectivity.

Updates #17366
Updates tailscale/corp#33033

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-06-01 11:50:24 -07:00
Naman Sood
da51072b98 feature/conn25: send TSMP message to client for no IP mapping on connector
When a connector receives a packet from a client on a transit IP that it
can't find a real IP mapping for, it drops the packet. This commit
starts notifying the client of this dropping over TSMP, so the client
can tell the connector to re-establish the transit IP-real IP binding.

Updates tailscale/corp#34256.

Signed-off-by: Naman Sood <mail@nsood.in>
2026-06-01 14:46:27 -04:00
Simon Law
2ee9eacb94 client/local,ipn/localapi: add /localapi/v0/routecheck endpoint (#19640)
In order to support a `tailscale routecheck` command, we introduce the
`/localapi/v0/routecheck` endpoint to the local API. This endpoint
returns the most recent report collected by the routecheck client.
If `force=true` is an argument in the query string, then this endpoint
will actively probe before returning the report.

Updates #17366
Updates tailscale/corp#33033

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-06-01 11:06:14 -07:00
Simon Law
28801674a6 net/routecheck: introduce new package for checking peer reachability (#19639)
The routecheck package parallels the netcheck package, where the
former checks routes and routers while the latter checks networks.
Like netcheck, it compiles reports for other systems to consume.

Historically, the client has never known whether a peer is actually
reachable. Most of the time this doesn’t matter, since the client will
want to establish a WireGuard tunnel to any given destination.
However, if the client needs to choose between two or more nodes,
then it should try to choose a node that it can reach.

Suggested exit nodes are one such example, where the client filters
out any nodes that aren’t connected to the control plane. Sometimes an
exit node will get disconnected from the control plane: when the
network between the two is unreliable or when the exit node is too
busy to keep its control connection alive. In these cases, Control
disables the Node.Online flag for the exit node and broadcasts this
across the tailnet. Arguably, the client should never have relied on
this flag, since it only makes sense in the admin console.

This patch implements an initial routecheck client that can probe
every node that your client knows about. You should not ping scan your
visible tailnet, this method is for debugging only.

This patch also introduces a new OnNetMapToggle hook, which fires when
the netmap transitions from nil to non-nil, or vice versa. This
happens either when the client receives its first MapResponse after
connecting to the control plane, or when it clears the netmap while it
is disconnecting. Routecheck uses this to wait for a valid netmap
so it knows which peers to probe.

Updates #17366
Updates tailscale/corp#33033

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-06-01 10:33:08 -07:00
Jordan Whited
8a294e3c34 net/batching: reset Buffers len in WriteBatchTo
In case we land on this branch during a goto retry. Also, protect
Geneve offset from mutation across retries.

Fixes #19927

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2026-05-31 06:12:53 -07:00
Simon Law
5d935c8900 net/traffic: add fuzz test for sorting nodes by traffic score (#19893)
In PR #19682, we introduced the traffic package which provides a
traffic.Scores.SortNodes method that uses rendezvous hashing to
break ties by equally distribute the “best” node for any given client.

This PR adds a fuzzer to make sure this algorithm is not wildly unfair.

Updates #17366
Updates tailscale/corp#33033

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-05-29 11:55:49 -07:00
Jordan Whited
8b58bd6c64 net/batching: implement NodeAttrNeverGSOEqualTail
This NodeCapability works around the UDP GSO bugs introduced by
torvalds/linux@b10b446 (v7.0-rc1). These bugs were later fixed by
torvalds/linux@78effd8 and torvalds/linux@5f17ae0 (v7.1-rc5). These
Linux kernel bugs cause mangled UDP headers and UDP checksums, resulting
in high levels of packet loss.

The aforementioned bugs have already made their way downstream into
various distros, e.g. Ubuntu 26.04 LTS. Impacted users are now dealing
with poor UDP performance in tailscaled, and in any other software that
makes use of UDP GSO.

Not all users of the affected kernels are impacted as the relevant
kernel code path sits between kernel and netdev driver, and behaviors
vary by driver/device capability.

We cannot detect impact at runtime, as this would require gathering all
netdevs, and performing loopback tests. This is invasive and in many
cases impossible.

So, we are left to choose between disabling UDP GSO for all users on
affected kernels, whether they experience real impact or not, or try
and work around the bugs. Disabling UDP GSO for a user that is not
impacted can cut max throughput in half, and consume more CPU cycles.

This commit attempts to workaround the bugs by avoiding UDP GSO when
batches are small, and injecting a 1-byte sentinel tail payload when
they are large. This tail payload is smaller than "GSO size", which
sidesteps the primary trigger of all fragments in a batch being
equal in length.

The end result is slightly increased payload and packet overhead, but
functional UDP GSO for all Linux 7.0-7.1.4 users, regardless of
netdev/driver.

Updates #19777

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2026-05-29 11:36:35 -07:00
James Tucker
25b8ed8d9e control/controlknobs,net/{batching,tstun},wgengine: add nodecaps to disable UDP & TUN GRO/GSO
Add four control-plane node attributes that let us disable UDP GSO/GRO
on the magicsock UDP socket and UDP/TCP GRO on the Tailscale TUN
device.

These complement the pre-existing TS_DEBUG_DISABLE_UDP_{GRO,GSO} and
TS_TUN_DISABLE_{UDP,TCP}_GRO envknobs. They exist so we can mitigate
upstream Linux kernel regressions on a deployed fleet without
requiring a client release, after two incidents (#13041, #19777) where
buggy kernel patches landed upstream and the fix took an excessively
long time to reach downstream distros.

Knob changes are reacted to in setNetworkMapInternal / SetNetworkMap via
a comparison against a cached "last applied" value and only an actual
transition triggers work: magicsock Rebind()+ReSTUN for UDP,
ApplyGROKnobs for TUN. The TUN side is gated by buildfeatures.HasGRO and
is one-way (wireguard-go GRO disablement is sticky); re-enabling
requires a client restart.

Updates #13041
Updates #19777

Change-Id: I802993070afa659cc06809bb0bfbb7f8a0cdb273
Signed-off-by: James Tucker <james@tailscale.com>
2026-05-27 17:10:14 -07:00
kari-ts
1a17ec1988 net/netmon: in Android, replace system/bin/ip call with cached LinkProperties gateway (#19804)
bind() on NETLINK_ROUTE sockets does not work on Android 11+ (https://developer.android.com/identity/user-data-ids#mac-11-plus) . Since system/bin/ip uses bind(), likelyHomeRouterIPHelper() always fails on Andoroid 11+, so that GatewayAndSelfIP never caches the result, causing repeated ip process spawns on every periodic ReSTUN.

This replaces the system/bin/ip fallback with a cached gateway IP pushed from Android’s ConnectivityManager via LinkProperties.getRoutes(). This is the same patterm used by UpdateLastKnownDefaultRouteInterface for the interface name (see https://github.com/tailscale/tailscale/pull/11784/). We keep the proc/net/route path as a fallback for early startup before NetworkChangeCallback has fired.

Updates tailscale/tailscale#18622
Updates tailscale/tailscale#13352

Signed-off-by: kari-ts <kari@tailscale.com>
2026-05-27 15:42:48 -07:00
James Tucker
dea49bb4da net/batching: add envknobs to disable UDP GRO & GSO
It is sometimes useful when diagnosing subtle and specific performance
problems to rule out GRO/GSO independently and/or toggle them to
influence packet pacing.

Updates #17835
Updates tailscale/corp#31164

Signed-off-by: James Tucker <james@tailscale.com>
2026-05-27 12:05:00 -07:00
BeckyPauley
0ed6da2826 cmd/k8s-operator, net/netutil: support 4via6 in egress proxy and connector (#19863)
Add support for configuring egress to destinations reachable via 4via6
subnet routes. This change affects standalone egress proxy only- egress
ProxyGroup needs IPv6 support before being able to support 4via6. Egress may
be configured using either the synthesized 4via6 address or the MagicDNS
name (in the form
<IPv4-address-with-hyphens-instead-of-dots>-via-<siteid>[.*]).

Also update the Connector to validate and advertise 4via6 subnet routes.
Export net/netutil.ValidateViaPrefix so it can be reused by the Connector
validation logic.

Updates #19334

Signed-off-by: Becky Pauley <becky@tailscale.com>
2026-05-27 10:54:35 +01:00
Adrian Dewhurst
5d8f401956 net/dns: fix handling non-IP single split DNS
Fixes #19834

Change-Id: I4d48efed00cd080b14c6fd713ff21e53a5a6ee3c
Signed-off-by: Adrian Dewhurst <adrian@tailscale.com>
2026-05-22 20:45:58 -04:00
Simon Law
7dabebc691 net/traffic: switch rendezvous hashing from SHA256 to FNV-1a (#19821)
In PR tailscale/corp#30448, we originally decided to break ties using
SHA256 for our rendezvous hashing algorithm. Now that we’ve had some
experience with it, we think that FNV-1a is a better choice. It
distributes bits evenly, it’s much faster, and it doesn’t need to be
cryptographically secure. The FNV designers recommend FNV-1a over the
deprecated FNV-1.

This PR makes the switch and updates the related tests, since changing
the algorithm changes which stable pick gets selected. As of 2026-05,
this is the best time to make this change, since there are almost no
clients in the wild with traffic steering enabled.

Updates #17366
Updates tailscale/corp#29964
Updates tailscale/corp#29966
Updates tailscale/corp#33033

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-05-21 10:11:59 -07:00
Simon Law
7ebca58042 net/traffic,ipn/ipnlocal: extract traffic steering utilities (#19682)
The traffic package contains helpers for evaluating traffic steering
scores and picking appropriate nodes. These were extracted from
ipnlocal.suggestExitNodeUsingTrafficSteering so they can be reused by
the new routecheck package to probe exit nodes in priority order.

Updates #17366
Updates tailscale/corp#33033

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-05-21 08:28:27 -07:00
Brad Fitzpatrick
f3a117e813 net/tsdial: run happy eyeballs across A and AAAA in UserDial
When tailscaled is running in userspace-networking mode behind an
exit node (e.g. as a SOCKS5 proxy), it resolves a hostname and then
dials a single resolved IP through the tunnel. If the name has both
A and AAAA, Go's net.Resolver merges them and we pick ips[0], which
on an IPv6-native host is usually AAAA. If the exit node has no IPv6
egress (or vice versa), the dial fails silently through the tunnel
and the user sees a hang.

Resolve all candidates and race connect attempts across address
families with a 300ms happy-eyeballs delay, matching Go's net.Dialer
default and the existing pattern in net/dnscache (commit ee0a03b14).
First success wins; losers are cancelled and any conns they produce
are closed. A failBoost channel wakes the launcher when a connect
fails fast (e.g. ICMP "no route" via the tunnel) so we don't sit on
the 300ms timer when the answer is already known.

userDialResolve is refactored into userDialResolveAll (returns the
full candidate list) plus a thin single-IP wrapper for callers like
UserDialPlan that don't race. UserDial's per-IP dispatch (netstack
vs peer dialer vs SystemDial vs std) is extracted to dialOneUser so
each candidate can route correctly on its own merits.

Also fix serveDial in localapi to pass the original hostname to
UserDial rather than a pre-resolved IP, so the race can fire.

This fix is single-ended: it works against any exit node, including
old ones, with no protocol changes. The trade-off versus filtering
on the exit-node side via PeerAPI DoH is that every dial through an
unreachable-family exit node costs one failed connect attempt per
cache window, rather than zero, which is acceptable given the
simplicity.

Fixes #19792
Fixes #13257

Change-Id: I9d7645d0034caf3ee22ecdd8070798353f77e94b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-20 18:35:55 -07:00
Claus Lensbøl
ee0a03b140 net/dnscache: run happy eyeballs with more than one dest IP (#19770)
If the context given to DialContext has a shorter lifetime than the OS
TCP SYN timeout, and TCP SYNs are dropped from the path to the remote,
DialContext would never fall back to try IPv6 after IPv4.

Instead, use the normal happy eyeballs race if there is more than one
address. This does remove the implicit prioritization of IPv4 over IPv6
in cases where there is only a single IPv4 remote address.

Updates #13346

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2026-05-19 12:59:11 -04:00
Fernando Serboncini
2a06fb66d0 cmd/cloner: preserve nil-valued entries when cloning map (#19749)
The codegen path for map-of-slice-of-pointer fields, skipped
nil-valued entries. That dropped the key from the map.

This broke how dns.Config.Routes uses nil values sentinels.

Fixes #19730
Fixes #19732
Fixes #19746
Fixes #19744

Change-Id: Ic6400227f4ab21b3ca0e8c0eeecf9b83d145a9ab

Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
2026-05-14 10:30:59 -04:00
Nick Khyl
32f984f54c net/dns: create a new hosts file if it doesn't exist on Windows
A missing hosts file is not a fatal error. We should log it, but still proceed
and create a new one instead of failing the DNS reconfiguration completely.

Fixes #19733

Signed-off-by: Nick Khyl <nickk@tailscale.com>
2026-05-13 16:10:36 -05:00
Brad Fitzpatrick
883d4fd2cd wgengine/netstack, net/ping: stop using pro-bing and use our net/ping instead
Fixes #19633
Fixes #13760

Change-Id: I0fa9423523a3a0fb1dfcde57de0f26e51723ff97
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-04 14:05:24 -07:00
Fran Bull
bdf3419e7d net/dns: add custom scheme resolvers
If another part of the client code registers a custom scheme with the
forwarder, the forwarder will check resolver addresses to see if they
match the scheme. If they do, the corresponding custom scheme handler
will be called to find the actual address for the resolver at this
moment. If the handler returns the empty string then that resolver will
be ignored.

This is useful if you want to dynamically determine where to send
certain DNS requests. It is being added to support new app connector
(conn25) work that would like to make sure it sends DNS requests to the
current connector peer in a high availability configuration.

Updates tailscale/corp#39858

Signed-off-by: Fran Bull <fran@tailscale.com>
2026-05-01 14:01:10 -07:00
Brad Fitzpatrick
f343b496c3 wgengine, all: remove LazyWG, use wireguard-go callback API for on-demand peers
Replace the UAPI text protocol-based wireguard configuration with
wireguard-go's new direct callback API (SetPeerLookupFunc,
SetPeerByIPPacketFunc, RemoveMatchingPeers, SetPrivateKey).

Instead of computing a trimmed wireguard config ahead of time upon
control plane updates and pushing it via UAPI, install callbacks so
wireguard-go creates peers on demand when packets arrive. This removes
all the LazyWG trimming machinery: idle peer tracking, activity maps,
noteRecvActivity callbacks, the KeepFullWGConfig control knob, and the
ts_omit_lazywg build tag.

For incoming packets, PeerLookupFunc answers wireguard-go's questions
about unknown public keys by looking up the peer in the full config.
For outgoing packets, PeerByIPPacketFunc (installed from
LocalBackend.lookupPeerByIP) maps destination IPs to node public keys
using the existing nodeByAddr index.

Updates tailscale/corp#12345

Change-Id: I4cba80979ac49a1231d00a01fdba5f0c2af95dd8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-29 19:46:19 -07:00
Andrew Dunham
33714211c8 net/dns: use os.Root to prevent path traversal in darwin resolver
The darwinConfigurator writes split DNS resolver files to
/etc/resolver/$SUFFIX using os.WriteFile with string concatenation.
A crafted MatchDomain value containing path traversal sequences
(e.g. "../evil") could write files outside the resolver directory.

Use os.OpenRoot to confine all file operations in SetDNS and
removeResolverFiles to the resolver directory. os.Root rejects any
path component that escapes the root, returning an error instead of
following the traversal.

Also parametrize the resolver directory path on the struct to enable
testing with t.TempDir(), and add tests.

As far as I can tell, this would require a malicious controlplane to
exploit, but still worth fixing.

Updates tailscale/corp#39751

Signed-off-by: Andrew Dunham <andrew@tailscale.com>
2026-04-28 11:08:22 -04:00
Brad Fitzpatrick
0e10a3f580 net/tsdial, ipn/localapi, client/local: let clients dial non-Tailscale addresses directly
Add a tsdial.Dialer.UserDialPlan method that resolves an address and
reports whether the dialer would route it via Tailscale. The LocalAPI
/dial handler now uses this to skip proxying for addresses that aren't
Tailscale routes (e.g. localhost), returning a Dial-Self response with
the resolved address so the client can dial it directly. This avoids
an unnecessary round-trip through the daemon for local connections.

The client's UserDial handles the new response by dialing the resolved
address itself, and the server passes the pre-resolved IP:port for
Tailscale dials to avoid redundant DNS lookups.

Thanks to giacomo and Moyao for pointing this out!

Updates tailscale/corp#39702

Change-Id: I78d640f11ccd92f43ddd505cbb0db8fee19f43a6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-27 09:33:27 -07:00
Andrew Dunham
d52ae45e9b cmd/cloner: deep-clone pointer elements in map-of-slice values
The cloner's codegen for map[K][]*V fields was doing a shallow
append (copying pointer values) instead of cloning each element.
This meant that cloned structs aliased the original's pointed-to
values through the map's slice entries.

Mirror the existing standalone-slice logic that checks
ContainsPointers(sliceType.Elem()) and generates per-element
cloning for pointer, interface, and struct types.

Regenerate net/dns and tailcfg which both had affected
map[...][]*dnstype.Resolver fields.

Fixes #19284

Signed-off-by: Andrew Dunham <andrew@tailscale.com>
2026-04-17 11:36:05 -04:00
Brad Fitzpatrick
49eb1b5d26 net/dns: fix TestDNSTrampleRecovery failure under flakestress
The test had two problems:

1. runFileWatcher passed hardcoded "/etc/" to the inotify watcher,
   but the test filesystem uses a temp directory prefix. The watcher
   was watching the real /etc/, never seeing the test's file writes.

2. The test's watchFile used gonotify.NewDirWatcher which creates
   goroutines that block on real inotify syscalls. These don't work
   inside synctest's fake-time bubble. The test only passed standalone
   by accident: gonotify walks /etc/ on startup producing fake events
   that happened to trigger trample detection at the right time.

Fix the path issue by adding ActualPath to the wholeFileFS interface,
which translates logical paths (like "/etc/resolv.conf") to real
filesystem paths (respecting any test prefix). Use it in
runFileWatcher so the inotify watch targets the correct directory.

Replace gonotify in the test with a one-shot timer that synctest can
advance through fake time, reliably triggering the trample check.

Fixes #19400

Change-Id: Idb252881ec24d0ab3b3c1d154dbdaf532db837d4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-14 06:55:35 -07:00
Brad Fitzpatrick
9fbe4b3ed2 all: fix six tests that failed with -count=2
Avery found a bunch of tests that fail with -count=2.

Updates tailscale/corp#40176 (tracks making our CI detect them)

Change-Id: Ie3e4398070dd92e4fe0146badddf1254749cca20
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Co-authored-by: Avery Pennarun <apenwarr@tailscale.com>
2026-04-13 18:52:57 -07:00
Brad Fitzpatrick
a182b864ac tsd, all: add Sys.ExtraRootCAs, plumb through TLS dial paths
Add ExtraRootCAs *x509.CertPool to tsd.System and plumb it through
the control client, noise transport, DERP, and wgengine layers so
that platforms like Android can inject user-installed CA certificates
into Go's TLS verification.

tlsdial.Config now honors base.RootCAs as additional trusted roots,
tried after system roots and before the baked-in LetsEncrypt fallback.
SetConfigExpectedCert gets the same treatment for domain-fronted DERP.

The Android client will set sys.ExtraRootCAs with a pool built from
x509.SystemCertPool + user-installed certs obtained via the Android
KeyStore API, replacing the current SSL_CERT_DIR environment variable
approach.

Updates #8085

Change-Id: Iecce0fd140cd5aa0331b124e55a7045e24d8e0c2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-07 18:10:54 -07:00
James Tucker
21695cdbf8 ipn/ipnlocal,net/netmon: make frequent darkwake more efficient
Investigating battery costs on a busy tailnet I noticed a large number
of nodes regularly reconnecting to control and DERP. In one case I was
able to analyze closely `pmset` reported the every-minute wake-ups being
triggered by bluetooth. The node was by side effect reconnecting to
control constantly, and this was at times visible to peers as well.

Three changes here improve the situation:
- Short time jumps (less than 10 minutes) no longer produce "major
  network change" events, and so do not trigger full rebind/reconnect.
- Many "incidental" fields on interfaces are ignored, like MTU, flags
  and so on - if the route is still good, the rest should be manageable.
- Additional log output will provide more detail about the cause of
  major network change events.

Updates #3363

Signed-off-by: James Tucker <james@tailscale.com>
2026-04-06 15:46:51 -07:00
Brad Fitzpatrick
86f42ea87b cmd/cloner, cmd/viewer: handle named map/slice types with Clone/View methods
The cloner and viewer code generators didn't handle named types
with basic underlying types (map/slice) that have their own Clone
or View methods. For example, a type like:

    type Map map[string]any
    func (m Map) Clone() Map { ... }
    func (m Map) View() MapView { ... }

When used as a struct field, the cloner would descend into the
underlying map[string]any and fail because it can't clone the any
(interface{}) value type. Similarly, the viewer would try to create
a MapFnOf view and fail.

Fix the cloner to check for a Clone method on the named type
before falling through to the underlying type handling.

Fix the viewer to check for a View method on named map/slice types,
so the type author can provide a purpose-built safe view that
doesn't leak raw any values. Named map/slice types without a View
method fall through to normal handling, which correctly rejects
types like map[string]any as unsupported.

Updates tailscale/corp#39502 (needed by tailscale/corp#39594)

Change-Id: Iaef0192a221e02b4b8e409c99ef8398090327744
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-05 20:20:32 -07:00
Brad Fitzpatrick
5ef3713c9f cmd/vet: add subtestnames analyzer; fix all existing violations
Add a new vet analyzer that checks t.Run subtest names don't contain
characters requiring quoting when re-running via "go test -run". This
enforces the style guide rule: don't use spaces or punctuation in
subtest names.

The analyzer flags:
- Direct t.Run calls with string literal names containing spaces,
  regex metacharacters, quotes, or other problematic characters
- Table-driven t.Run(tt.name, ...) calls where tt ranges over a
  slice/map literal with bad name field values

Also fix all 978 existing violations across 81 test files, replacing
spaces with hyphens and shortening long sentence-like names to concise
hyphenated forms.

Updates #19242

Change-Id: Ib0ad96a111bd8e764582d1d4902fe2599454ab65
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-05 15:52:51 -07:00
M. J. Fromberger
211ef67222 tailcfg,ipn/ipnlocal: regulate netmap caching via a node attribute (#19117)
Add a new tailcfg.NodeCapability (NodeAttrCacheNetworkMaps) to control whether
a node with support for caching network maps will attempt to do so. Update the
capability version to reflect this change (mainly as a safety measure, as the
control plane does not currently need to know about it).

Use the presence (or absence) of the node attribute to decide whether to create
and update a netmap cache for each profile. If caching is disabled, discard the
cached data; this allows us to use the presence of a cached netmap as an
indicator it should be used (unless explicitly overridden). Add a test that
verifies the attribute is respected. Reverse the sense of the environment knob
to be true by default, with an override to disable caching at the client
regardless what the node attribute says.

Move the creation/update of the netmap cache (when enabled) until after
successfully applying the network map, to reduce the possibility that we will
cache (and thus reuse after a restart) a network map that fails to correctly
configure the client.

Updates #12639

Change-Id: I1df4dd791fdb485c6472a9f741037db6ed20c47e
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
2026-04-01 15:02:53 -07:00
Alex Chan
4ace87a965 net,tsnet: fix the capitalisation of "Wireshark"
See https://www.wireshark.org/; there's no intercapped S.

Updates #cleanup

Change-Id: I7c89a3fc6fb0436d0ce0e25a620bde7e310e89d2
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-03-26 19:39:29 +00:00
Alex Valiushko
330a17b7d7 net/batching: use vectored writes on Linux (#19054)
On Linux batching.Conn will now write a vector of
coalesced buffers via sendmmsg(2) instead of copying
fragments into a single buffer.

Scatter-gather I/O has been available on Linux since the
earliest days (reworked in 2.6.24). Kernel passes fragments
to the driver if it supports it, otherwise linearizes
upon receiving the data.

Removing the copy overhead from userspace yields up to 4-5%
packet and bitrate improvement on Linux with GSO enabled:
46Gb/s 4.4m pps vs 44Gb/s 4.2m pps w/32 Peer Relay client flows.

Updates tailscale/corp#36989


Change-Id: Idb2248d0964fb011f1c8f957ca555eab6a6a6964

Signed-off-by: Alex Valiushko <alexvaliushko@tailscale.com>
2026-03-25 16:38:54 -07:00
Greg Steuck
954a2dfd31 net/dns: fix duplicate search line entries (OpenBSD, primarily)
Fixes #12360

Signed-off-by: Greg Steuck <greg@nest.cx>
2026-03-25 10:19:02 -07:00
Jordan Whited
f0ba1f3909 net/udprelay: remove experimental label from package docs
Update #cleanup

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2026-03-24 10:41:17 -07:00
Jordan Whited
9c36a71a90 feature/*,net/tstun: add tundev_txq_drops clientmetric on Linux
By polling RTM_GETSTATS via netlink. RTM_GETSTATS is a relatively
efficient and targeted (single device) polling method available since
Linux v4.7.

The tundevstats "feature" can be extended to other platforms in the
future, and it's trivial to add new rtnl_link_stats64 counters on
Linux.

Updates tailscale/corp#38181

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2026-03-24 09:44:58 -07:00
Alex Chan
1d0fde6fc2 all: use bart.Lite instead of bart.Table where appropriate
When we don't care about the payload value and are just checking whether
a set contains an IP/prefix, we can use `bart.Lite` for the same lookup
times but a lower memory footprint.

Fixes #19075

Change-Id: Ia709e8b718666cc61ea56eac1066467ae0b6e86c
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-03-24 14:45:23 +00:00
Brendan Creane
0b4c0f2080 net/dns/resolver: treat DNS REFUSED responses as soft errors in forwarder race (#19053)
When racing multiple upstream DNS resolvers, a REFUSED (RCode 5) response
from a broken or misconfigured resolver could win the race and be returned
to the client before healthier resolvers had a chance to respond with a
valid answer. This caused complete DNS failure in cases where, e.g., a
broken upstream resolver returned REFUSED quickly while a working resolver
(such as 1.1.1.1) was still responding.

Previously, only SERVFAIL (RCode 2) was treated as a soft error. REFUSED
responses were returned as successful bytes and could win the race
immediately. This change also treats REFUSED as a soft error in the UDP
and TCP forwarding paths, so the race continues until a better answer
arrives. If all resolvers refuse, the first REFUSED response is returned
to the client.

Additionally, SERVFAIL responses from upstream resolvers are now returned
verbatim to the client rather than replaced with a locally synthesized
packet. Synthesized SERVFAIL responses were authoritative and guaranteed
to include a question section echoing the original query; upstream
responses carry no such guarantees but may include extended error
information (e.g. RFC 8914 extended DNS errors) that would otherwise
be lost.

Fixes #19024

Signed-off-by: Brendan Creane <bcreane@gmail.com>
2026-03-23 10:40:05 -07:00