Commit Graph

10695 Commits

Author SHA1 Message Date
M. J. Fromberger
e8be3b7989 wgengine/magicsock,types/logger: add latency logs for initial peer contacts
In order to allow us to measure the performance effects of client-side netmap
caching, both with and without the feature enabled, add logs to record how long
it takes after a client restart or profile switch for the node to establish
contact with peers, relative to the first uncached netmap.

We do this by keeping track of a timestamp when the connection is constructed,
and logging a record for "new" peer contacts that records how long (in
microseconds) it took from the time the peer was recorded as a candidate.  The
message includes whether the contact was via DERP or direct, and whether a
cached netmap was in use at the time.

This builds on and extends the counters from #19699, but here we include new
contacts whether or not a cached netmap is in use, so that we can establish a
baseline for comparison.

Updates #12639
Updates tailscale/projects#27

Change-Id: I4f6d050e221f3881848d05a0425c4a5d1a59294c
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
2026-05-29 14:38:06 -07:00
Brad Fitzpatrick
412c812d76 ipn/ipnlocal: use ACME ALPN for authorized Funnel non-CertDomain domains
If a user explicitly adds a non-ts.net (not a CertDomain domain) domain
like "foo.com" to their serve config as a web target that's also an allowed
funnel domain (using raw "tailscale serve set-config"), then use the new
ALPN cert fetching (from b553969b) to get certs for that domain.

This is just plumbing; there's no new product functionality to
actually enable this easily client-side, and it also has no visible
product surface to enable it server-side.

Updates tailscale/corp#41736

Change-Id: Ie2e421ac9611bce64bba3de6a454b2d505ea0e8a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-28 13:33:45 -07:00
Tom Proctor
788a49eca5 .github/workflows: run vet on GitHub-hosted runners (#19913)
The github-ci-vm machine that runs our self-hosted CI for this repo is
only designed for the `vm` job in test.yml. That uses a different cache
dir which is causing github-ci-vm's small disk to fill up. Switch to
ubuntu 24.04 like the rest of our CI for this repo that doesn't require
anything special.

Updates tailscale/corp#40465

Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2026-05-28 21:30:46 +01:00
James Tucker
524a374f01 tsnet: wait for peer in netmap before pinging in setupTwoClientTest
If we dispatch a ping too early (after a later patch removes a 250ms
blockage) then the ping may be lost due to the peers not yet knowing
about each other. The ping is retained in order to setup and ensure a
wireguard session prior to test flow.

Updates #19822

Change-Id: I6cfea28931646a9387b6ffc2654e72cd846f4e55
Signed-off-by: James Tucker <james@tailscale.com>
Co-authored-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-28 11:27:54 -07:00
Brad Fitzpatrick
c086992f4f cmd/tailscale/cli: add whoami subcommand
Add a "tailscale whoami" subcommand that is equivalent to running
"tailscale whois $(tailscale ip -4)" but more ergonomic. It supports
the --json flag just like whois, and shares the WhoIsResponse
rendering code with whois.

Fixes #19907

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Change-Id: I8f33ba7a5608bab7dffa8213303beb5f345936d3
2026-05-28 10:49:17 -07:00
Alex Chan
9d126aec34 all: remove network lock references from private method names
Updates tailscale/corp#37904

Change-Id: I312d46d958209ca3d1152d1877fb91a57c91798d
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-05-28 18:00:36 +01:00
Brendan Creane
8d90a6ab1e ipn/ipnlocal: add HTTP/2 Content-Type tests for serve reverse proxy (#19905)
Adds two tests exercising the HTTP/2-inbound -> plaintext HTTP/1.1 backend
path through serve's reverseProxy and through the full serveWebHandler
entry point (with a funnel serveHTTPContext).

Updates #19866

Signed-off-by: Brendan Creane <bcreane@gmail.com>
2026-05-28 09:46:36 -07:00
Alex Chan
f4a280cdbd all: update a few more references to network/tailnet lock
Updates tailscale/corp#37904

Change-Id: I746b06328e080fa2b9ff28a2d099f95645aa3d0b
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-05-28 16:44:16 +01:00
Alex Chan
446ae97491 ipn: improve --exit-node hostname error during startup
When parsing the `tailscale up --exit-node=ARG` argument, we try to
resolve hostnames by searching the list of peers. However, at startup,
the peer list is empty, causing hostname lookups to trivially fail with
an unhelpful "invalid value" erorr.

Improve the error message when the peer list is empty to inform the user
that hostnames cannot be resolved during startup, and advise them to use
the exit node's Tailscale IP address instead.

Also, clarify that hostnames must be peer hostnames, not arbitrary
hostnames.

Fixes #19882

Change-Id: I9390a427c2863d657cf46c5e33b43cb3c5363764
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-05-28 16:43:45 +01:00
dragondscv
4b8115bb2c cmd/containerboot: clamp MSS to PMTU for proxy group pods (#19686)
Single-pod ingress/egress proxies already called ClampMSSToPMTU when
setting up forwarding rules, but the proxy group (HA) code paths in
egressservices.go and ingressservices.go did not. This caused TCP
connections through proxy group pods to suffer from MSS/MTU mismatch
issues in environments where path MTU discovery is not working.

Add ClampMSSToPMTU calls in the egress sync loop (alongside the existing
EnsureSNATForDst call) and in addDNATRuleForSvc (alongside the existing
EnsureDNATRuleForSvc call), mirroring what the single-pod forwarding
rules already do.

Also add MSS clamping assertions to TestSyncIngressConfigs and track
ClampMSSToPMTU calls in FakeNetfilterRunner.

Fixes issue #19812 https://github.com/tailscale/tailscale/issues/19812.
Tracking internal ticket TSS-86326.

Signed-off-by: Jay Tung <ltung@crusoeenergy.com>
Co-authored-by: Jay Tung <ltung@crusoeenergy.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-28 12:57:38 +01:00
Brad Fitzpatrick
782c73bf41 cmd/containerboot: fix data race in TestContainerBoot
Parallel subtests share *ipn.Notify pointers (e.g. runningNotify).
When multiple subtests reached the same phase concurrently, they
all wrote to the shared notify's InitialStatus field without
synchronization, triggering the race detector.

Fix by shallow-copying *ipn.Notify before setting InitialStatus,
so each test iteration works on its own copy.

Updates #19380

Change-Id: I9dd40037e02146166f006f4f7c1ddcc47adba191
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-27 18:40:03 -07:00
James Tucker
25b8ed8d9e control/controlknobs,net/{batching,tstun},wgengine: add nodecaps to disable UDP & TUN GRO/GSO
Add four control-plane node attributes that let us disable UDP GSO/GRO
on the magicsock UDP socket and UDP/TCP GRO on the Tailscale TUN
device.

These complement the pre-existing TS_DEBUG_DISABLE_UDP_{GRO,GSO} and
TS_TUN_DISABLE_{UDP,TCP}_GRO envknobs. They exist so we can mitigate
upstream Linux kernel regressions on a deployed fleet without
requiring a client release, after two incidents (#13041, #19777) where
buggy kernel patches landed upstream and the fix took an excessively
long time to reach downstream distros.

Knob changes are reacted to in setNetworkMapInternal / SetNetworkMap via
a comparison against a cached "last applied" value and only an actual
transition triggers work: magicsock Rebind()+ReSTUN for UDP,
ApplyGROKnobs for TUN. The TUN side is gated by buildfeatures.HasGRO and
is one-way (wireguard-go GRO disablement is sticky); re-enabling
requires a client restart.

Updates #13041
Updates #19777

Change-Id: I802993070afa659cc06809bb0bfbb7f8a0cdb273
Signed-off-by: James Tucker <james@tailscale.com>
2026-05-27 17:10:14 -07:00
Brad Fitzpatrick
94af1b00fb cmd/testwrapper, tstest: move test sharding out of test code
Previously, sharding required tests to opt in by calling tstest.Shard,
which used a process-global counter to assign each test to a shard.
This had two problems: most tests didn't call it, so they ran on every
shard (defeating the purpose), and shard assignments were unstable
(depended on call order, so adding a test could reshuffle others).

Remove tstest.Shard and tstest.SkipOnUnshardedCI entirely. Instead,
have testwrapper implement sharding automatically for all tests: when
TS_TEST_SHARD=N/M is set, it uses "go list -json" (no compilation) to
find test source files, scans them for top-level Test/Benchmark/
Example/Fuzz function names, and filters by fnv32a(name) % M == N-1.
The filtered names are passed as an anchored -run regex to go test.

Using go list instead of "go test -list" avoids linking the test binary
twice (Go's build cache does not cache test binary linking).

Fixes #19886

Change-Id: I62ab7b3d757324d4c5fd0b5de50c1e3742681791
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-27 16:53:17 -07:00
James Scott
db60aa8eca logtail: gate "logtail started" behind TS_DEBUG_LOGTAIL envknob (#19891)
Gates the unnecessary "logtail started" message behind
the debug envknob TS_DEBUG_LOGTAIL. This is extra log spam that isn't
needed unless we are debugging.

Updates tailscale/corp#40908

Signed-off-by: James Scott <jim@tailscale.com>
2026-05-27 15:48:44 -07:00
kari-ts
1a17ec1988 net/netmon: in Android, replace system/bin/ip call with cached LinkProperties gateway (#19804)
bind() on NETLINK_ROUTE sockets does not work on Android 11+ (https://developer.android.com/identity/user-data-ids#mac-11-plus) . Since system/bin/ip uses bind(), likelyHomeRouterIPHelper() always fails on Andoroid 11+, so that GatewayAndSelfIP never caches the result, causing repeated ip process spawns on every periodic ReSTUN.

This replaces the system/bin/ip fallback with a cached gateway IP pushed from Android’s ConnectivityManager via LinkProperties.getRoutes(). This is the same patterm used by UpdateLastKnownDefaultRouteInterface for the interface name (see https://github.com/tailscale/tailscale/pull/11784/). We keep the proc/net/route path as a fallback for early startup before NetworkChangeCallback has fired.

Updates tailscale/tailscale#18622
Updates tailscale/tailscale#13352

Signed-off-by: kari-ts <kari@tailscale.com>
2026-05-27 15:42:48 -07:00
Brad Fitzpatrick
c9fb05b6f5 ipn/ipnlocal: don't dup-suppress UserProfiles on IPNBus on profile switches
Fixes #19889

Change-Id: I324a735c13772c0c79ed7392c0baa5064b34823b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-27 14:47:02 -07:00
Brad Fitzpatrick
364b952d62 cmd/containerboot: track peers from IPN bus updates, stop using netmap.NetworkMap
Some tests in another repo were broken by tailscale/tailscale#19607.
This fixes them, by finishing off the rest of the migration away from
netmap.NetworkMap on the IPN bus in containerboot.

Containerboot used to rebuild a full NetworkMap-shaped view while
reacting to IPN bus notifications. Now it insteads has its own
netmapState type (immutable) of exactly what it needs to track, and
sends those immutable values around, making cheap edits of new
immutable values when an IPN bus edit arrives.

This should make cmd/containerboot scale to much larger tailnets now too.

Fixes #19852
Fixes tailscale/corp#42347
Updates #12542

Change-Id: I88adaf061f85f677f954a764935e6654329d75a6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-27 14:12:48 -07:00
Fran Bull
80dc7a8d07 feature/conn25: disallow addrs assignment overwriting.
We don't want addr assignments to be lost from the collection before
they can be returned to the IP pools, otherwise we will get orphan
addresses marked inUse in the pools that will never be returned.

Fixes tailscale/corp#39975

Signed-off-by: Fran Bull <fran@tailscale.com>
2026-05-27 13:54:40 -07:00
Patrick O'Doherty
8501be1990 go.mod: bump dependencies to resolve govulncheck warnings (#19884)
Bump the following:
  go get -u github.com/moby/spdystream@v0.5.1
  go get -u golang.org/x/crypto@v0.52.0
  go get -u golang.org/x/net@v0.55.0

to resolve open govulncheck warnings.

Updates #cleanup

Signed-off-by: Patrick O'Doherty <patrick@tailscale.com>
2026-05-27 12:24:59 -07:00
James Tucker
dea49bb4da net/batching: add envknobs to disable UDP GRO & GSO
It is sometimes useful when diagnosing subtle and specific performance
problems to rule out GRO/GSO independently and/or toggle them to
influence packet pacing.

Updates #17835
Updates tailscale/corp#31164

Signed-off-by: James Tucker <james@tailscale.com>
2026-05-27 12:05:00 -07:00
James Tucker
d1912167dc feature/taildrop: replace outgoing-file progress channel with synchronous reporter
serveFilePut tracked outgoing-file progress through an unbuffered
progressUpdates channel whose close was owned by the request goroutine
while writers were spread across manifest parsing, the
progresstracking.Reader callback, singleFilePut failure paths, and the
success path. That writer-closes mismatch made the
send-on-closed-channel panic effectively unfixable in place.

Replace it with a request-scoped outgoingProgress reporter. Transfer
code reports state by method call; the reporter coalesces hot-path
updates and is flushed once via defer in serveFilePut. With no
producer channel to close, the panic is structurally impossible.

Fixes #19115
Fixes #19817

Change-Id: I8f00d982d2c79880dfc1f8104c5eed06e94b5a6c
Signed-off-by: James Tucker <james@tailscale.com>
2026-05-27 12:00:34 -07:00
Brad Fitzpatrick
f277bfb09d release/dist/synology: add GOARM=7,softfloat mode for hi3535
Fixes #6860

Change-Id: I36f3101e75dab35d03e76693555ac93da893f8d5
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-27 10:54:15 -07:00
Claus Lensbøl
9be21088f4 wgengine/{,magicsock},tstest/natlab/vmtest: send disco on cached netmap (#19878)
Originally found when adding tests for working with cached netmaps, and
finding the added tests to be flakey.

When working off of a cached netmap, if a node exists in the cached
netmap but does not yet have any endpoints, DERP connections are
available but not direct ones. By sending callMeMaybe to nodes
without endpoints in the cached netmap, we can establish direct
connections for this edge case.

Aditionally, ensure that TSMP disco advert messages are not sent if the
endpoint does not have a valid address yet.

Fixes #19843
Updates #19597

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2026-05-27 13:05:12 -04:00
Brad Fitzpatrick
b553969b03 ipnlocal: try ACME TLS-ALPN for Funnel renewals
Use TLS-ALPN-01 for Funnel certificate renewals only when the node
already has a cached certificate, and fall back to DNS-01 with a fresh
order if the ALPN path is unavailable or fails.

Dynamically advertise acme-tls/1 only while an ACME challenge
certificate is pending, and add client metrics for DNS-01 and
TLS-ALPN-01 start/success/failure paths.

Updates tailscale/corp#41736
Fixes tailscale/corp#42320

Change-Id: I5adc6ea129237f9ef592f84fc1a8953c80bc9d5c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-27 09:30:23 -07:00
Jordan Whited
4aef023765 cmd/tailscaled,types/logger: remove TS_DEBUG_MEMORY and associated logger
Commit e5a8cf3b1 added feature/runtimemetrics, which emits heap bytes
and total process memory as clientmetrics when the
NodeAttrEmitRuntimeMetrics capability is set. That subsumes the job of
the TS_DEBUG_MEMORY envknob, whose only effect is to prefix every log
line with Go heap+stack and Maxrss via logger.RusagePrefixLog.

Updates tailscale/corp#39434

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2026-05-27 09:09:05 -07:00
Artem Leshchev
5652b6c9c0 cmd/k8s-operator: fix token exchange for identity federation (#19845)
tailscale-client-go-v2 natively supports identity federation authentication,
and in #19010 the required authentication provider is used, but the manual
token exchange was never removed, so we were exchanging JWT token to an auth
token, and then were trying to use that auth token for exchange once again.
This commit removes the legacy mechanism, fully relying on
tailscale-client-go-v2 to handle authentication.

Fixes #19844

Signed-off-by: Artem Leshchev <matshch@avride.ai>
2026-05-27 16:45:07 +01:00
License Updater
77010351f0 licenses: update license notices
Signed-off-by: License Updater <noreply+license-updater@tailscale.com>
2026-05-27 08:38:44 -07:00
Brad Fitzpatrick
2c965ab540 types/netmap, ipn/ipnlocal, control/controlclient: rename NodeMutationAdd to NodeMutationUpsert
NodeMutationAdd was a misleading name: a PeersChanged entry in a
MapResponse can represent either a truly new peer or a full
replacement for an existing peer that couldn't be expressed as a
PeerChangedPatch. Calling it "Add" implied it was always a completely
new node, which is wrong.  (I'd changed my mind on the design of
mapping add/delete events to NodeMutations halfway through #19607 and
forgot to update the name, even though I'd updated half the docs)

Rename it to NodeMutationUpsert to reflect the actual semantics: the
node should be inserted or replaced in the peer map regardless of
whether it already existed.

Updates #19607
Updates #12542

Change-Id: Iebd3daddb3318cba02e115a1b184fcb3ee8f83d6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-27 08:37:14 -07:00
Brad Fitzpatrick
a8f40a2ca5 ipn/ipnlocal: add missing bus notify of peers on full netmap
The prior aa5da2e5f2 ("process node adds/removes in constant
time") commit missed a bus notification case, where new-style
subscribers set NotifyNoNetmap and then the controlclient map routing
sends a full update (rather than a delta). Those profiles + peers
need to be put on the bus too.

I noticed this only when porting the Android app over to use the
new bus stuff.

Updates #19607
Updates #12542

Change-Id: I82c35011d2c532222ca27f7d4e790522c31bd156
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-27 08:03:47 -07:00
Jason Dillingham
0e2b3f31af cmd/k8s-operator: stabilize StaticEndpoints order in ProxyGroup reconciles (#19755)
findStaticEndpoints built its return slice by iterating nodes.Items in
the order returned by r.List, which is not guaranteed to be stable
across calls. When the resulting set of addresses already matched the
existing config Secret, the slice could still permute between
reconciles, making the marshalled config Secret differ byte-for-byte.
That tripped the DeepEqual check on the config Secret, which rewrote
the Secret, which fired a watch event, which re-enqueued the
ProxyGroup, looping forever.

Detect this case and return the existing currAddrs slice unchanged
when the resulting set is the same, preserving the "use the currently
used IPs first" intent without spurious writes.

Fixes #19700

Signed-off-by: Jason Dillingham <jasonmdillingham@gmail.com>
2026-05-27 14:28:04 +01:00
Erisa A
e2a0d45418 cmd/tailscale/cli: fix time parsing in debug daemon-logs (#19875)
Fixes #19874

Signed-off-by: Erisa A <erisa@tailscale.com>
2026-05-27 12:30:28 +01:00
BeckyPauley
0ed6da2826 cmd/k8s-operator, net/netutil: support 4via6 in egress proxy and connector (#19863)
Add support for configuring egress to destinations reachable via 4via6
subnet routes. This change affects standalone egress proxy only- egress
ProxyGroup needs IPv6 support before being able to support 4via6. Egress may
be configured using either the synthesized 4via6 address or the MagicDNS
name (in the form
<IPv4-address-with-hyphens-instead-of-dots>-via-<siteid>[.*]).

Also update the Connector to validate and advertise 4via6 subnet routes.
Export net/netutil.ValidateViaPrefix so it can be reused by the Connector
validation logic.

Updates #19334

Signed-off-by: Becky Pauley <becky@tailscale.com>
2026-05-27 10:54:35 +01:00
Jordan Whited
e5a8cf3b18 control/controlknobs,feature/*,ipn/ipnlocal,tailcfg: add runtimemetrics
Emit runtime metrics as clientmetrics when the
NodeAttrEmitRuntimeMetrics NodeCapability is present.

We start small with just 2 metrics: heap bytes and total process memory.

Updates tailscale/corp#39434

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2026-05-26 16:02:01 -07:00
Fran Bull
2eb45c2457 feature/conn25: extend assignment expiry on use
When we use assigned addresses in response to a DNS request, extend the
expiry on the assignment.

Updates tailscale/corp#39975

Signed-off-by: Fran Bull <fran@tailscale.com>
2026-05-26 07:28:47 -07:00
Michael Ben-Ami
5877809097 feature/conn25: unify FlowTable storage to prepare for expiry
Previously we had two maps keyed on a direction-specific tuple, with
distinct values containing the data (action) for that direction.
Values pointed at each other across maps to ensure they were removed
at the same time in the case of tuple overwrite, but LRU eviction
was per-map. So if LRU was turned on, it was possible for one
direction's data (action) to be evicted and leave the other direction
dangling.

NewFlow replaces the two direction-specific flow constructors, and
lookups return the direction-specific PacketAction directly.

Now the values in each map point to the same element, with data for both
directions in the element. A linked list also points to the elements to
implement LRU. The previous flowtrack.Cache is removed.

The single LRU structure will allow us to implement idle time expiration
by walking the list backward starting with the least recently used flow, and
stopping after a fixed number of flows, or at the first non-expired flow.

We add commented-out unused placeholder fields for tracking the
"last seen" timestamp, and an on-removal hook, to document the intent for
the follow-up expiry work.

Updates tailscale/corp#38630

Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>
2026-05-26 10:09:48 -04:00
Yago Raña Gayoso
26952d53fa scripts/installer.sh: update KDE Linux link (#19857)
Signed-off-by: Yago Raña Gayoso <yago.rana.gayoso@gmail.com>
2026-05-24 21:40:42 +01:00
Simon Law
da8cd5cc7f ipn/ipnlocal: fix documentation typo, NodeAttrCacheNetworkMaps (#19851)
Updates #cleanup

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-05-22 22:19:10 -07:00
Simon Law
988615dbad ipn/ipnlocal,tstest/integration: pause the control client consistently (#19846)
There are two places where tailscaled transitions into a paused state:
1. tailscaled’s controlclient is initially created,
2. tailscale down, or the GUI equivalent, commands it to.

This patch unifies the implementation of both scenarios into
LocalBackend.shouldPauseControlClientLocked to prevent the
implementation from drifting.

The flaky tstest/integration.TestNoControlConnWhenDown test exposed
this mismatch, but only by accident. This patch also changes
TestNode.MustDown so that it runs `tailscale down` and then waits for
the testcontrol server to finish handling any associated /machine/map
requests.

Fixes #19831

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-05-22 17:58:44 -07:00
Adrian Dewhurst
5d8f401956 net/dns: fix handling non-IP single split DNS
Fixes #19834

Change-Id: I4d48efed00cd080b14c6fd713ff21e53a5a6ee3c
Signed-off-by: Adrian Dewhurst <adrian@tailscale.com>
2026-05-22 20:45:58 -04:00
Brad Fitzpatrick
5295e3e119 ipn/{ipnstate,ipnlocal}: add integer NodeID to PeerStatus
In aa5da2e5f2 we made the IPN bus include deltas, including the
PeersRemoved, sending a slice of integer NodeIDs that were
removed. But when updating xcode, I realized there was no way to map
those integers to the stable node IDs used in other places.

I was consdering changing the just-added ipn.Notify.PeersRemoved from
an IntID to a string StableID, but then it doesn't match the MapResponse
wire protocol, which we've tried to match so far.

Instead, just add the integer ID as well. Callers can use whichever
world they want, having both. It's a little regrettable that we still
have two worlds of IDs, but oh well. Neither is really suitable to a
hypothetical future fully federated world of control servers anyway,
so we'll need a third type later anyway, so just live with the two we
have for now.

Updates #12542

Change-Id: Ib8fd48a265e1da1f8779152f141f624a7f7260e9
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-22 08:16:55 -07:00
Amal Bansode
e32b9bde1d control/controlclient: fix deadlock in map session change queue processing (#19828)
Holding an exclusive lock while writing to the unbuffered changequeue chan
is likely going to deadlock when the run() path may try to grab the same lock
before reading from the chan to drain it (on map session close). This causes
the client to stop processing new map responses and TSMP disco key advertisements.

There is a good probability of inducing this deadlock using the old code and new
test added in this commit: TestUpdateDiscoForNodeCallback/test_deadlock.

Also fix an unintentional regression in how the client responds to a mapResponse sleep
command. 85bb5f84a5 moved the processing of mapResponses into a new goroutine,
serialized via mapSession's changequeue. Thus, controlclient stopped sleeping in the
same goroutine servicing mapResponses/control connections. This commit brings us back
to sleeping synchronously in the same goroutine as controlclient.

Updates #12639

Signed-off-by: Amal Bansode <amal@tailscale.com>
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
Co-authored-by: Claus Lensbøl <claus@tailscale.com>
2026-05-22 07:13:18 -07:00
Simon Law
fd2405ca8f tstest/integration: mark TestNoControlConnWhenDown as a flaky test (#19832)
Updates #19831

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-05-21 17:36:09 -07:00
Simon Law
7dabebc691 net/traffic: switch rendezvous hashing from SHA256 to FNV-1a (#19821)
In PR tailscale/corp#30448, we originally decided to break ties using
SHA256 for our rendezvous hashing algorithm. Now that we’ve had some
experience with it, we think that FNV-1a is a better choice. It
distributes bits evenly, it’s much faster, and it doesn’t need to be
cryptographically secure. The FNV designers recommend FNV-1a over the
deprecated FNV-1.

This PR makes the switch and updates the related tests, since changing
the algorithm changes which stable pick gets selected. As of 2026-05,
this is the best time to make this change, since there are almost no
clients in the wild with traffic steering enabled.

Updates #17366
Updates tailscale/corp#29964
Updates tailscale/corp#29966
Updates tailscale/corp#33033

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-05-21 10:11:59 -07:00
Brad Fitzpatrick
aa5da2e5f2 ipn/ipnlocal, control/controlclient: process node adds/removes in constant time
For large tailnets (~50k+ nodes) with frequent peer churn (ephemeral
GitHub Actions workers etc.), tailscaled used to rebuild the full
netmap and fan it out on the IPN bus on every MapResponse that
added or removed a peer. There were two O(N) costs per delta: the
full netmap rebuild + every Notify.NetMap encode to every bus watcher.

This change tackles both:

  1. Plumb O(1) peer add/remove through the delta path. PeersChanged
     and PeersRemoved no longer prevent the delta happy path; instead,
     they mutate the per-node-backend peer map in place.

  2. Restrict ipn.Notify.NetMap emission to the platforms whose host
     GUIs still depend on it (Windows, macOS, iOS) and migrate
     in-tree consumers off it everywhere else:

     - Migrate reactive consumers (containerboot, kube agents,
       sniproxy, tsconsensus, etc.) off Notify.NetMap to the
       previously-added Notify.SelfChange signal so they no longer
       have to subscribe to the full netmap.
     - Add ipn.NotifyNoNetMap so GUI clients on "legacy-emit" platforms
       that have already migrated can opt out of the per-watcher
       NetMap encode.
     - Gate Notify.NetMap emission on the producer side by a compile-
       time GOOS check, so the supporting code is dead-code-eliminated
       on Linux and other geese where no GUI consumer needs it.

Re-running BenchmarkGiantTailnet from tstest/largetailnet, which was
added along with baseline numbers on unmodified main in ad5436af0d,
the per-delta cost (one peer add+remove pair) is now ~O(1) regardless
of tailnet size N:

    N         no-watcher (ms/op)            bus-watcher (ms/op)
              before    now     factor      before    now     factor
     10000        32   0.11       300x         166   0.13      1300x
     50000       222   0.11      2000x         865   0.13      6700x
    100000       504   0.12      4100x        1765   0.13     13400x
    250000      1551   0.12     12500x        4696   0.15     32400x

Updates #12542

Change-Id: I94e34b37331d1a8ec74c299deffadf4d061fda9e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-21 09:26:19 -07:00
Brad Fitzpatrick
2703f91174 wgengine/magicsock: fix data race in TestSetDERPMapDoReStun
SetDERPMap spawns a goroutine that calls ReSTUN, which logs via the
test logger. If the test returns before that goroutine logs, the
goroutine races with testing cleanup.

Use tstest.WhileTestRunningLogger so the goroutine's logf call becomes
a no-op once the test finishes.

Fixes #19829

Change-Id: I1097f98e40ffd1c5dd7fb7a715c918255853e3c6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-21 08:51:50 -07:00
Simon Law
7ebca58042 net/traffic,ipn/ipnlocal: extract traffic steering utilities (#19682)
The traffic package contains helpers for evaluating traffic steering
scores and picking appropriate nodes. These were extracted from
ipnlocal.suggestExitNodeUsingTrafficSteering so they can be reused by
the new routecheck package to probe exit nodes in priority order.

Updates #17366
Updates tailscale/corp#33033

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-05-21 08:28:27 -07:00
Fran Bull
dbe92f98b5 feature/conn25: set assignment expiry based on dns response TTL
Updates tailscale/corp#39975

Signed-off-by: Fran Bull <fran@tailscale.com>
2026-05-21 07:25:29 -07:00
Brad Fitzpatrick
f3a117e813 net/tsdial: run happy eyeballs across A and AAAA in UserDial
When tailscaled is running in userspace-networking mode behind an
exit node (e.g. as a SOCKS5 proxy), it resolves a hostname and then
dials a single resolved IP through the tunnel. If the name has both
A and AAAA, Go's net.Resolver merges them and we pick ips[0], which
on an IPv6-native host is usually AAAA. If the exit node has no IPv6
egress (or vice versa), the dial fails silently through the tunnel
and the user sees a hang.

Resolve all candidates and race connect attempts across address
families with a 300ms happy-eyeballs delay, matching Go's net.Dialer
default and the existing pattern in net/dnscache (commit ee0a03b14).
First success wins; losers are cancelled and any conns they produce
are closed. A failBoost channel wakes the launcher when a connect
fails fast (e.g. ICMP "no route" via the tunnel) so we don't sit on
the 300ms timer when the answer is already known.

userDialResolve is refactored into userDialResolveAll (returns the
full candidate list) plus a thin single-IP wrapper for callers like
UserDialPlan that don't race. UserDial's per-IP dispatch (netstack
vs peer dialer vs SystemDial vs std) is extracted to dialOneUser so
each candidate can route correctly on its own merits.

Also fix serveDial in localapi to pass the original hostname to
UserDial rather than a pre-resolved IP, so the race can fire.

This fix is single-ended: it works against any exit node, including
old ones, with no protocol changes. The trade-off versus filtering
on the exit-node side via PeerAPI DoH is that every dial through an
unreachable-family exit node costs one failed connect attempt per
cache window, rather than zero, which is acceptable given the
simplicity.

Fixes #19792
Fixes #13257

Change-Id: I9d7645d0034caf3ee22ecdd8070798353f77e94b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-20 18:35:55 -07:00
James Tucker
36c52ef383 tstest/integration/testcontrol: fix serveMap read-modify-write race
serveMap cloned s.nodes[nk], mutated the clone outside the mutex,
then wrote it back via updateNodeLocked. A concurrent UpdateNode,
SetNodeCapMap, or other writer landing between the clone and the
writeback would be silently clobbered. Mutate the live node under
the mutex instead.

Surfaces in tsnet's TestListenService as a flaky ErrUntaggedServiceHost
panic: the test calls control.UpdateNode to attach a tag, a concurrent
updateRoutine map request from the host races, and the host's next
netmap arrives with Tags=[].

Updates #19822

Change-Id: I6c5ebd5e5bf79a40316f53f627157230773cb469
Signed-off-by: James Tucker <james@tailscale.com>
2026-05-20 18:29:58 -07:00
Aria Stewart
61277e3ad4 Construct IPv6 ingress URLs correctly
Fixes #19338

Signed-off-by: Aria Stewart <aredridel@dinhe.net>
2026-05-20 17:21:35 -07:00