If a user explicitly adds a non-ts.net domain (one that is not a
CertDomain) like "foo.com" to their serve config as a web target, and
that domain is also an allowed funnel domain (using raw "tailscale serve
set-config"), then use the new ALPN cert fetching (from b553969b) to get
certs for that domain.
This is just plumbing; there's no new product functionality to
actually enable this easily client-side, and it also has no visible
product surface to enable it server-side.
Updates tailscale/corp#41736
Change-Id: Ie2e421ac9611bce64bba3de6a454b2d505ea0e8a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Adds two tests exercising the HTTP/2-inbound -> plaintext HTTP/1.1 backend
path through serve's reverseProxy and through the full serveWebHandler
entry point (with a funnel serveHTTPContext).
Updates #19866
Signed-off-by: Brendan Creane <bcreane@gmail.com>
When parsing the `tailscale up --exit-node=ARG` argument, we try to
resolve hostnames by searching the list of peers. However, at startup,
the peer list is empty, causing hostname lookups to trivially fail with
an unhelpful "invalid value" error.
Improve the error message when the peer list is empty to inform the user
that hostnames cannot be resolved during startup, and advise them to use
the exit node's Tailscale IP address instead.
Also, clarify that hostnames must be peer hostnames, not arbitrary
hostnames.
Fixes #19882
Change-Id: I9390a427c2863d657cf46c5e33b43cb3c5363764
Signed-off-by: Alex Chan <alexc@tailscale.com>
Some tests in another repo were broken by tailscale/tailscale#19607.
This fixes them, by finishing off the rest of the migration away from
netmap.NetworkMap on the IPN bus in containerboot.
Containerboot used to rebuild a full NetworkMap-shaped view while
reacting to IPN bus notifications. Now it instead has its own
netmapState type (immutable) of exactly what it needs to track, and
sends those immutable values around, making cheap edits of new
immutable values when an IPN bus edit arrives.
This should make cmd/containerboot scale to much larger tailnets now too.
Fixes #19852
Fixes tailscale/corp#42347
Updates #12542
Change-Id: I88adaf061f85f677f954a764935e6654329d75a6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Use TLS-ALPN-01 for Funnel certificate renewals only when the node
already has a cached certificate, and fall back to DNS-01 with a fresh
order if the ALPN path is unavailable or fails.
Dynamically advertise acme-tls/1 only while an ACME challenge
certificate is pending, and add client metrics for DNS-01 and
TLS-ALPN-01 start/success/failure paths.
Updates tailscale/corp#41736
Fixes tailscale/corp#42320
Change-Id: I5adc6ea129237f9ef592f84fc1a8953c80bc9d5c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
NodeMutationAdd was a misleading name: a PeersChanged entry in a
MapResponse can represent either a truly new peer or a full
replacement for an existing peer that couldn't be expressed as a
PeerChangedPatch. Calling it "Add" implied it was always a completely
new node, which is wrong. (I'd changed my mind on the design of
mapping add/delete events to NodeMutations halfway through #19607 and
forgot to update the name, even though I'd updated half the docs)
Rename it to NodeMutationUpsert to reflect the actual semantics: the
node should be inserted or replaced in the peer map regardless of
whether it already existed.
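
As a rough illustration of the upsert semantics described above (not the
actual NodeMutation code), the sketch below uses hypothetical nodeID and
node types and a plain map as the peer map:

```go
package main

import "fmt"

// nodeID and node are hypothetical stand-ins for the real peer types.
type nodeID int64

type node struct {
	ID   nodeID
	Name string
}

// upsert inserts n into peers, or fully replaces the existing entry that has
// the same ID. It reports whether an existing entry was replaced.
func upsert(peers map[nodeID]node, n node) (replaced bool) {
	_, replaced = peers[n.ID]
	peers[n.ID] = n
	return replaced
}

func main() {
	peers := map[nodeID]node{}
	fmt.Println(upsert(peers, node{ID: 1, Name: "a"})) // truly new peer
	fmt.Println(upsert(peers, node{ID: 1, Name: "b"})) // full replacement
	fmt.Println(len(peers), peers[1].Name)
}
```

The caller does not need to know which of the two cases it is in, which
is why "Add" was the wrong name.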
Updates #19607
Updates #12542
Change-Id: Iebd3daddb3318cba02e115a1b184fcb3ee8f83d6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The prior aa5da2e5f2 ("process node adds/removes in constant
time") commit missed a bus notification case, where new-style
subscribers set NotifyNoNetmap and then the controlclient map routing
sends a full update (rather than a delta). Those profiles + peers
need to be put on the bus too.
I noticed this only when porting the Android app over to use the
new bus stuff.
Updates #19607
Updates #12542
Change-Id: I82c35011d2c532222ca27f7d4e790522c31bd156
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Emit runtime metrics as clientmetrics when the
NodeAttrEmitRuntimeMetrics NodeCapability is present.
We start small with just 2 metrics: heap bytes and total process memory.
Updates tailscale/corp#39434
Signed-off-by: Jordan Whited <jordan@tailscale.com>
There are two places where tailscaled transitions into a paused state:
1. tailscaled’s controlclient is initially created,
2. tailscale down, or the GUI equivalent, commands it to.
This patch unifies the implementation of both scenarios into
LocalBackend.shouldPauseControlClientLocked to prevent the
implementation from drifting.
The flaky tstest/integration.TestNoControlConnWhenDown test exposed
this mismatch, but only by accident. This patch also changes
TestNode.MustDown so that it runs `tailscale down` and then waits for
the testcontrol server to finish handling any associated /machine/map
requests.
Fixes #19831
Signed-off-by: Simon Law <sfllaw@tailscale.com>
In aa5da2e5f2 we made the IPN bus include deltas, including the
PeersRemoved, sending a slice of integer NodeIDs that were
removed. But when updating xcode, I realized there was no way to map
those integers to the stable node IDs used in other places.
I was considering changing the just-added ipn.Notify.PeersRemoved from
an IntID to a string StableID, but then it doesn't match the MapResponse
wire protocol, which we've tried to match so far.
Instead, just add the integer ID as well. Callers can use whichever
world they want, having both. It's a little regrettable that we still
have two worlds of IDs, but oh well. Neither is really suitable to a
hypothetical future fully federated world of control servers, so
we'll need a third type later anyway. For now, just live with the two
we have.
Updates #12542
Change-Id: Ib8fd48a265e1da1f8779152f141f624a7f7260e9
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In PR tailscale/corp#30448, we originally decided to break ties using
SHA256 for our rendezvous hashing algorithm. Now that we’ve had some
experience with it, we think that FNV-1a is a better choice. It
distributes bits evenly, it’s much faster, and it doesn’t need to be
cryptographically secure. The FNV designers recommend FNV-1a over the
deprecated FNV-1.
This PR makes the switch and updates the related tests, since changing
the algorithm changes which stable pick gets selected. As of 2026-05,
this is the best time to make this change, since there are almost no
clients in the wild with traffic steering enabled.
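
For readers unfamiliar with the technique, here is a minimal sketch of a
rendezvous-hash tie-break using 64-bit FNV-1a from the standard library.
The function names, the string IDs, and the separator byte are
assumptions for illustration; the real code may hash different inputs.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// score hashes the (client, candidate) pair with 64-bit FNV-1a.
func score(client, candidate string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(client))
	h.Write([]byte{0}) // separator so ("ab","c") and ("a","bc") differ
	h.Write([]byte(candidate))
	return h.Sum64()
}

// pick returns the candidate with the highest score for client, or "" if
// there are no candidates. Equal scores fall back to string order, so the
// result does not depend on the order of the input slice.
func pick(client string, candidates []string) string {
	var best string
	var bestScore uint64
	for i, c := range candidates {
		s := score(client, c)
		if i == 0 || s > bestScore || (s == bestScore && c < best) {
			best, bestScore = c, s
		}
	}
	return best
}

func main() {
	tied := []string{"exit-a", "exit-b", "exit-c"}
	fmt.Println(pick("client-1", tied))
	fmt.Println(pick("client-2", tied))
}
```

Because each client computes the maximum over per-candidate scores,
removing a candidate that was not the winner never changes the pick.
Swapping the hash function changes the scores, which is why the stable
pick in the tests changes too.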
Updates #17366
Updates tailscale/corp#29964
Updates tailscale/corp#29966
Updates tailscale/corp#33033
Signed-off-by: Simon Law <sfllaw@tailscale.com>
For large tailnets (~50k+ nodes) with frequent peer churn (ephemeral
GitHub Actions workers etc.), tailscaled used to rebuild the full
netmap and fan it out on the IPN bus on every MapResponse that
added or removed a peer. There were two O(N) costs per delta: the
full netmap rebuild + every Notify.NetMap encode to every bus watcher.
This change tackles both:
1. Plumb O(1) peer add/remove through the delta path. PeersChanged
and PeersRemoved no longer prevent the delta happy path; instead,
they mutate the per-node-backend peer map in place.
2. Restrict ipn.Notify.NetMap emission to the platforms whose host
GUIs still depend on it (Windows, macOS, iOS) and migrate
in-tree consumers off it everywhere else:
- Migrate reactive consumers (containerboot, kube agents,
sniproxy, tsconsensus, etc.) off Notify.NetMap to the
previously-added Notify.SelfChange signal so they no longer
have to subscribe to the full netmap.
- Add ipn.NotifyNoNetMap so GUI clients on "legacy-emit" platforms
that have already migrated can opt out of the per-watcher
NetMap encode.
- Gate Notify.NetMap emission on the producer side by a compile-
time GOOS check, so the supporting code is dead-code-eliminated
on Linux and other geese where no GUI consumer needs it.
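
The compile-time gate in the last bullet can be sketched roughly as
follows. The constant name, the notify type, and the exact GOOS list are
assumptions for illustration, not the real producer code:

```go
package main

import (
	"fmt"
	"runtime"
)

// emitLegacyNetMap is a constant expression because runtime.GOOS is a
// constant. On platforms where it is false the compiler can drop the guarded
// branch entirely. The GOOS list here is illustrative.
const emitLegacyNetMap = runtime.GOOS == "windows" ||
	runtime.GOOS == "darwin" ||
	runtime.GOOS == "ios"

// notify is a hypothetical, heavily reduced stand-in for a bus notification.
type notify struct {
	SelfChange bool
	NetMap     *string // stand-in for the expensive full netmap payload
}

// buildNotify always sets the cheap signal and only attaches the full netmap
// on platforms that still need it.
func buildNotify(fullNetMap string) notify {
	n := notify{SelfChange: true}
	if emitLegacyNetMap {
		n.NetMap = &fullNetMap
	}
	return n
}

func main() {
	n := buildNotify("netmap")
	fmt.Println(runtime.GOOS, n.SelfChange, n.NetMap != nil)
}
```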
Re-running BenchmarkGiantTailnet from tstest/largetailnet, which was
added along with baseline numbers on unmodified main in ad5436af0d,
the per-delta cost (one peer add+remove pair) is now ~O(1) regardless
of tailnet size N:
         N  no-watcher (ms/op)      bus-watcher (ms/op)
            before   now  factor    before   now  factor
     10000      32  0.11    300x       166  0.13   1300x
     50000     222  0.11   2000x       865  0.13   6700x
    100000     504  0.12   4100x      1765  0.13  13400x
    250000    1551  0.12  12500x      4696  0.15  32400x
Updates #12542
Change-Id: I94e34b37331d1a8ec74c299deffadf4d061fda9e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The traffic package contains helpers for evaluating traffic steering
scores and picking appropriate nodes. These were extracted from
ipnlocal.suggestExitNodeUsingTrafficSteering so they can be reused by
the new routecheck package to probe exit nodes in priority order.
Updates #17366
Updates tailscale/corp#33033
Signed-off-by: Simon Law <sfllaw@tailscale.com>
When tailscaled is running in userspace-networking mode behind an
exit node (e.g. as a SOCKS5 proxy), it resolves a hostname and then
dials a single resolved IP through the tunnel. If the name has both
A and AAAA, Go's net.Resolver merges them and we pick ips[0], which
on an IPv6-native host is usually AAAA. If the exit node has no IPv6
egress (or vice versa), the dial fails silently through the tunnel
and the user sees a hang.
Resolve all candidates and race connect attempts across address
families with a 300ms happy-eyeballs delay, matching Go's net.Dialer
default and the existing pattern in net/dnscache (commit ee0a03b14).
First success wins; losers are cancelled and any conns they produce
are closed. A failBoost channel wakes the launcher when a connect
fails fast (e.g. ICMP "no route" via the tunnel) so we don't sit on
the 300ms timer when the answer is already known.
userDialResolve is refactored into userDialResolveAll (returns the
full candidate list) plus a thin single-IP wrapper for callers like
UserDialPlan that don't race. UserDial's per-IP dispatch (netstack
vs peer dialer vs SystemDial vs std) is extracted to dialOneUser so
each candidate can route correctly on its own merits.
Also fix serveDial in localapi to pass the original hostname to
UserDial rather than a pre-resolved IP, so the race can fire.
This fix is single-ended: it works against any exit node, including
old ones, with no protocol changes. The trade-off versus filtering
on the exit-node side via PeerAPI DoH is that every dial through an
unreachable-family exit node costs one failed connect attempt per
cache window, rather than zero, which is acceptable given the
simplicity.
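
A much-reduced sketch of the racing logic, assuming a hypothetical
dialFunc and using a string in place of a net.Conn. It shows the
staggered launch, the fail-fast wake-up, and cancellation of losers. It
is not the userDialResolveAll/dialOneUser code itself.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// dialFunc is a hypothetical per-address dialer; the string stands in for a
// net.Conn to keep the sketch small.
type dialFunc func(ctx context.Context, addr string) (string, error)

type result struct {
	conn string
	err  error
}

// raceDial tries addrs in order. It starts the next attempt when the delay
// timer fires or as soon as an in-flight attempt fails, whichever comes
// first, and returns the first successful connection.
func raceDial(ctx context.Context, addrs []string, delay time.Duration, dial dialFunc) (string, error) {
	if len(addrs) == 0 {
		return "", errors.New("no addresses to dial")
	}
	ctx, cancel := context.WithCancel(ctx)
	defer cancel() // cancels the losers once a winner is returned

	// Buffered so late attempts never block. Real code must also close any
	// connection a loser delivers after the winner; this sketch omits that.
	results := make(chan result, len(addrs))

	var timer <-chan time.Time // nil when nothing is left to launch
	next, pending := 0, 0
	launch := func() {
		addr := addrs[next]
		next++
		pending++
		go func() {
			c, err := dial(ctx, addr)
			results <- result{c, err}
		}()
		timer = nil
		if next < len(addrs) {
			timer = time.After(delay)
		}
	}

	launch()
	var lastErr error
	for pending > 0 {
		select {
		case r := <-results:
			pending--
			if r.err == nil {
				return r.conn, nil
			}
			lastErr = r.err
			if next < len(addrs) {
				launch() // fail fast: do not wait out the timer
			}
		case <-timer:
			launch()
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
	return "", lastErr
}

// demo races a failing IPv6 candidate against a working IPv4 one and reports
// the winner and whether it finished before the 300ms delay elapsed.
func demo() (conn string, fast bool, err error) {
	dial := func(ctx context.Context, addr string) (string, error) {
		if addr == "[2001:db8::1]:443" {
			return "", errors.New("no route to host")
		}
		return "conn to " + addr, nil
	}
	addrs := []string{"[2001:db8::1]:443", "192.0.2.1:443"}
	start := time.Now()
	conn, err = raceDial(context.Background(), addrs, 300*time.Millisecond, dial)
	return conn, time.Since(start) < 300*time.Millisecond, err
}

func main() {
	fmt.Println(demo())
}
```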
Fixes #19792
Fixes #13257
Change-Id: I9d7645d0034caf3ee22ecdd8070798353f77e94b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Some netmap updates are guaranteed to affect only the "static" parts of the
netmap, and so should not require us to walk through all the peers and user
profiles when updating the cache. To support this, the new UpdateSelfOnly
method updates only the Self node and other tailnet settings that are not
dependent on the peers and profiles.
Use this when updating the cache on DERP home changes.
Updates #12542
Change-Id: Ifed522b29d579fb76e010b4ff738cc4e0a72d27f
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
The TestShouldUseOneCGNATRoute test fails when the underlying system
interfaces don’t match the test's underlying assumption.
That assumption was that there would only ever be one CGNAT interface:
the Tailscale one.
This breaks on Linux when border0 is installed because border0 also
creates an interface with a CGNAT route.
This patch stubs netmon.RegisterInterfaceGetter to replace the system
interfaces and netmon.SetTailscaleInterfaceProps to identify the test
data that defines the Tailscale interface.
This patch also tests the control knob override for CGNAT for every
combination of operating system and system interfaces, instead of just
a couple of combinations.
Fixes #19731
Signed-off-by: Simon Law <sfllaw@tailscale.com>
Adds a new NoiseRoundTripper field to tsd.Sys
to expose an http.RoundTripper to make requests
over the control plane Noise connection.
This will be used in PAM use cases soon.
Updates tailscale/corp#41800
Signed-off-by: Adriano Sela Aviles <adriano@tailscale.com>
This fixes a log message where ipn/ipnlocal.shouldUseOneCGNATRoute
would claim that an Android machine was actually macOS.
Updates #cleanup
Updates #19652
Signed-off-by: Simon Law <sfllaw@tailscale.com>
Adds two new cap resolution methods alongside the existing PeerCaps:
PeerCapsForService(src netip.Addr, svcName tailcfg.ServiceName) resolves
the service name to its VIP addresses via the node's service IP mappings
and returns caps scoped to that service. Exposed on /v0/whois via the
svc_name query parameter and on client/local.Client as WhoIsForService.
PeerCapsForIP(src, dst netip.Addr) resolves caps against an arbitrary
destination IP. Exposed on /v0/whois via the svc_addr query parameter
and on client/local.Client as WhoIsForIP.
svc_name takes priority over svc_addr when both are present. Invalid
values for either return 400. The existing PeerCaps/WhoIs path is
unchanged: without a service parameter, WhoIs returns only host-level
caps.
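
The parameter precedence can be sketched like this. The capTarget type,
errBadRequest, and validServiceName are hypothetical, and the name check
is only a placeholder. Whether an invalid svc_addr is still rejected
when a valid svc_name is also present is not stated above; this sketch
ignores svc_addr in that case.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
	"net/url"
)

// errBadRequest is what a handler would translate into an HTTP 400.
var errBadRequest = errors.New("bad request")

// capTarget says which destination caps should be resolved against.
// The zero value means host-level caps only.
type capTarget struct {
	Service string
	Addr    netip.Addr
}

// validServiceName is a placeholder check, not the real validation.
func validServiceName(s string) bool { return s != "" }

// targetFromQuery applies the precedence: svc_name first, then svc_addr,
// otherwise no service scoping at all.
func targetFromQuery(q url.Values) (capTarget, error) {
	if q.Has("svc_name") {
		name := q.Get("svc_name")
		if !validServiceName(name) {
			return capTarget{}, errBadRequest
		}
		return capTarget{Service: name}, nil
	}
	if q.Has("svc_addr") {
		addr, err := netip.ParseAddr(q.Get("svc_addr"))
		if err != nil {
			return capTarget{}, errBadRequest
		}
		return capTarget{Addr: addr}, nil
	}
	return capTarget{}, nil
}

// parseRawQuery is a small helper for trying raw query strings.
func parseRawQuery(raw string) (capTarget, error) {
	q, err := url.ParseQuery(raw)
	if err != nil {
		return capTarget{}, errBadRequest
	}
	return targetFromQuery(q)
}

func main() {
	fmt.Println(parseRawQuery("svc_name=svc:db&svc_addr=100.64.0.1"))
	fmt.Println(parseRawQuery("svc_addr=100.64.0.1"))
	fmt.Println(parseRawQuery("svc_addr=not-an-ip"))
}
```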
Updates tailscale/corp#41632
Signed-off-by: Adriano Sela Aviles <adriano@tailscale.com>
Add new clientmetric counters for establishing contact with peers while using
cached network map data. To do this, instrument the magicsock.Conn with a bit
to indicate whether its peer data came from a cached netmap. If so, there are
two conditions we will count as establishing connectivity to a peer:
- Receipt of a CallMeMaybe from a peer via disco.
- Establishing a valid endpoint address for a peer.
In vmtest, add Env.ClientMetrics to scrape metrics from the specified node.
Use this to check that counters were updated in caching tests.
Updates https://github.com/tailscale/projects/issues/13
Updates #12639
Change-Id: Ie8cf3244ac8af4f5bcfe4d0d944078da2ba08990
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
The `CreateStateForTest` helper reduces boilerplate in cases where the test
only cares about the trusted keys and not the disablement values (and makes
it more obvious where the disablement values are meaningful).
The `setupChonkStorage` helper reduces the boilerplate when creating on-disk
TKA storage in tests.
The `fakeLocalBackend` helper reduces the boilerplate when setting up a
`LocalBackend` instance in the IPN tests.
Updates #cleanup
Change-Id: Iacfba1be5f7fab208eec11e4369d63c7d7519da5
Signed-off-by: Alex Chan <alexc@tailscale.com>
Android rebuilds its VpnService interface when the VPN route
configuration changes, which tears down long lived TCP connections
through the tunnel. Use the same automatic OneCGNATRoute behavior as
macOS on Android, and prefer the single CGNAT route when no other
interface is using the CGNAT, falling back to fine grained peer routes
otherwise.
Updates tailscale/tailscale#19591
Signed-off-by: kari <kari@tailscale.com>
Add a narrow LocalAPI accessor and matching client/LocalBackend method
to look up a single peer's current full [tailcfg.Node] by NodeID, in
O(1) time on the daemon side, without fetching the entire netmap.
Useful for callers that need the latest state of a single peer (e.g.
in response to a peer-mutation event on the IPN bus) without paying
for a full netmap fetch.
Updates #12542
Change-Id: I1cb2d350e6ad846a5dabc1f5368dfc8121387f7c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Compacting on startup means nodes may compact at a different cadence
based on whether they're long-running or restarting frequently.
We already compact after every sync, which only occurs when the TKA
state has changed. Waiting for TKA changes to trigger compaction on
nodes means compaction will occur more consistently across a tailnet.
Updates tailscale/corp#33537
Change-Id: Ia0aa6d9e5e362e9ab08450fde69772841790d5b5
Signed-off-by: Alex Chan <alexc@tailscale.com>
Add a new bus signal that lets reactive consumers (containerboot, kube
agents, sniproxy, tsconsensus, etc.) react to self-node updates without
having to subscribe to the full netmap. Today those consumers either
watch Notify.NetMap (which on large tailnets is expensive to encode and
ship per watcher) or poll. SelfChange is a cheap, narrow alternative:
addresses, name, key expiry, capabilities, etc.
Consumers that need additional state can react to SelfChange and then
fetch the relevant bits on demand via existing LocalClient methods.
Producer-side, every netmap-bearing setControlClientStatus call now
also publishes SelfChange. Future changes will migrate individual
in-tree consumers off Notify.NetMap to this signal, and eventually
gate the legacy NetMap emission to platforms whose host GUIs still
require it.
Updates #12542
Change-Id: I4441650b0e085d663eb6bf26a03748b7d961ca49
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Add two narrow LocalAPI accessors so callers don't have to subscribe to
the IPN bus and pull a full *netmap.NetworkMap just to read DNS-shaped
fields:
- GET /localapi/v0/cert-domains returns DNS.CertDomains.
- GET /localapi/v0/dns-config returns the full tailcfg.DNSConfig.
Migrate in-tree callers off the netmap-on-the-bus pattern:
- kube/certs.waitForCertDomain still wakes on the IPN bus but now
queries CertDomains via LocalClient.CertDomains rather than
reading n.NetMap.DNS.CertDomains. The kube LocalClient interface
and FakeLocalClient gain a CertDomains method.
- cmd/tailscale dns status calls LocalClient.DNSConfig directly
instead of opening a NotifyInitialNetMap watcher.
- cmd/tailscale configure kubeconfig switches from a netmap watcher
+ serviceDNSRecordFromNetMap to LocalClient.DNSConfig +
serviceDNSRecordFromDNSConfig.
This is part of a series moving callers away from depending on the
netmap traveling on the IPN bus, so the bus payload can shrink in a
later change.
Updates #12542
Change-Id: Ie10204e141d085fbac183b4cfe497226b670ad6c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Add two narrower accessors alongside the existing
[LocalBackend.NetMap], with docs that distinguish their semantics:
- NetMapNoPeers: cheap (returns the cached *netmap.NetworkMap with
a possibly-stale Peers slice). For callers that only read non-Peers
fields like SelfNode, DNS, PacketFilter, capabilities.
- NetMapWithPeers: documented as returning an up-to-date Peers slice.
For callers that genuinely need to iterate Peers or call
PeerByXxx.
Mark the existing NetMap deprecated and point readers at the two new
accessors. NetMap, NetMapNoPeers, and NetMapWithPeers all currently
return the same value (b.currentNode().NetMap()): this commit is a
no-op behaviorally, just a renaming and migration of in-tree callers.
A subsequent change in the same series will switch
NetMapWithPeers to actually rebuild the Peers slice from the live
per-node-backend peers map (O(N) per call), at which point the
distinction between the two new accessors becomes load-bearing.
Migrate in-tree callers to the appropriate accessor based on what
fields they read:
- NetMapNoPeers (most common): localapi handlers, peerapi accept,
GetCertPEMWithValidity, web client noise request, doctor DNS
resolver check, tsnet CertDomains/TailscaleIPs, ssh/tailssh
SSH-policy/cap reads, several LocalBackend internals
(isLocalIP, allowExitNodeDNSProxyToServeName, pauseForNetwork
nil-check, serve config).
- NetMapWithPeers: writeNetmapToDiskLocked (persist full netmap to
disk for fast restart), PeerByTailscaleIP lookup.
Tests still call the legacy NetMap; they'll see the deprecation
warning but otherwise behave identically.
Also add two pieces of plumbing the next change in this series will
need, but which are already useful on their own:
- [client/local.GetDebugResultJSON]: a generic [Client.DebugResultJSON]
that decodes directly into a target type T, avoiding the
marshal/unmarshal roundtrip callers otherwise need.
- localapi "current-netmap" debug action: returns the current
netmap (with peers) as JSON. Documented as debug-only — the
netmap.NetworkMap shape is internal and may change without notice.
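
The "decode directly into T" idea can be sketched with a small generic
helper. decodeJSON and decodeFrom are hypothetical names; the real
GetDebugResultJSON signature is not reproduced here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// decodeJSON decodes one JSON value from r straight into a T, so the caller
// never handles an intermediate `any` that would need re-marshaling.
func decodeJSON[T any](r io.Reader) (T, error) {
	var v T
	err := json.NewDecoder(r).Decode(&v)
	return v, err
}

// decodeFrom is a convenience wrapper for trying the helper on a string.
func decodeFrom[T any](s string) (T, error) {
	return decodeJSON[T](strings.NewReader(s))
}

func main() {
	m, err := decodeFrom[map[string]int](`{"peers": 3}`)
	fmt.Println(m["peers"], err)
}
```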
This commit is part of a series breaking up a larger change for
review; on its own it is a no-op refactor.
Updates #12542
Change-Id: Idbb30707414f8da3149c44ca0273262708375b02
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Replace the UAPI text protocol-based wireguard configuration with
wireguard-go's new direct callback API (SetPeerLookupFunc,
SetPeerByIPPacketFunc, RemoveMatchingPeers, SetPrivateKey).
Instead of computing a trimmed wireguard config ahead of time upon
control plane updates and pushing it via UAPI, install callbacks so
wireguard-go creates peers on demand when packets arrive. This removes
all the LazyWG trimming machinery: idle peer tracking, activity maps,
noteRecvActivity callbacks, the KeepFullWGConfig control knob, and the
ts_omit_lazywg build tag.
For incoming packets, PeerLookupFunc answers wireguard-go's questions
about unknown public keys by looking up the peer in the full config.
For outgoing packets, PeerByIPPacketFunc (installed from
LocalBackend.lookupPeerByIP) maps destination IPs to node public keys
using the existing nodeByAddr index.
Updates tailscale/corp#12345
Change-Id: I4cba80979ac49a1231d00a01fdba5f0c2af95dd8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
78627c1 introduced starting up and preserving the DERP server from
cache, but also changed it so the initial ReSTUN would not fire when
setting the DERPMap.
Change this so when not working from a cache, the ReSTUN will always
fire during startup.
Updates #19585
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
Add a vmtest that brings up two gokrazy nodes A and B behind two
One2OneNAT networks (so direct UDP works in both directions and any
slowness can't be blamed on NAT traversal), establishes a WireGuard
tunnel A → B with TSMP, then rotates B's disco key four times and
asserts that the data plane recovers in both directions after each
rotation. All pings are TSMP (the data-plane ping; disco pings would
not exercise the WireGuard tunnel itself).
The five pings:
1. A → B (initial; brings up the tunnel; 30s budget)
2. B → A after rotate (LocalAPI rotate-disco-key debug action)
3. A → B after rotate (LocalAPI)
4. B → A after restart (SIGKILL; gokrazy supervisor respawns)
5. A → B after restart (SIGKILL)
Each post-rotation ping gets a 15-second budget. Two unavoidable
multi-second waits dominate today:
- The rotate-then-a→b phase takes ~10s on main because of LazyWG.
After B's WantRunning bounce, B's wgengine resets its
sentActivityAt/recvActivityAt maps and trims A out of the
wireguard-go config as an "idle peer"; B only re-adds A on
inbound activity, by which point A's first few TSMP packets
have been silently dropped at B's tundev. The
bradfitz/rm_lazy_wg branch removes that trimming entirely
(verified locally: this phase drops to <100ms there).
- The restart phases take ~5s for wireguard-go's RekeyTimeout
handshake retry. After SIGKILL+respawn the first WG handshake
init from the restarted node sometimes goes into the void
(likely the brief peer-removed window in the receiver's
two-step maybeReconfigWireguardLocked reconfig during which
the peer is absent from wireguard-go), and wg-go's 5s+jitter
retransmit timer is the next opportunity to retry. That retry
succeeds and the staged TSMP packet flushes. Intrinsic to the
protocol's retransmit policy.
Once LazyWG is removed and the first-handshake-after-reconfig race
is fixed, the budget should drop to 5s.
Supporting changes:
ipn/ipnlocal: DebugRotateDiscoKey now toggles WantRunning off and
back on after rotating the disco key. magicsock.Conn.RotateDiscoKey
only resets local disco state; without also dropping wireguard-go
session keys, peers keep encrypting with their stale per-peer
session against us until their rekey timer fires (WireGuard has no
data-plane signaling to invalidate sessions). Bouncing WantRunning
runs the engine through Reconfig(empty) → authReconfig, which
drops every peer's WG session so the next packet either way
triggers a fresh handshake.
ipn/ipnlocal, ipn/localapi: add a debug-only "peer-disco-keys"
LocalAPI action ([LocalBackend.DebugPeerDiscoKeys]) that returns
a map[NodePublic]DiscoPublic from the current netmap. Tests reach
it via [local.Client.DebugResultJSON]. We do not surface disco
keys via [ipnstate.PeerStatus] because adding a non-comparable
[key.DiscoPublic] field there breaks reflect-based test helpers
(e.g. TestFilterFormatAndSortExitNodes' use of cmp.Diff), and
general LocalAPI clients have no need for disco keys. Since the
debug LocalAPI is gated behind the ts_omit_debug build tag, this
endpoint is automatically stripped from small binaries.
cmd/tta: add /restart-tailscaled handler (Linux-only, via /proc walk)
to drive the SIGKILL phase. On gokrazy the supervisor respawns
tailscaled within a second.
tstest/integration/testcontrol: add Server.AllOnline. When set,
every peer entry in MapResponses is marked Online=true. Several
disco-key handling fast paths in controlclient and wgengine
(removeUnwantedDiscoUpdates,
removeUnwantedDiscoUpdatesFromFullNetmapUpdate, the wgengine
tsmpLearnedDisco fast path) only fire
for online peers; without this flag, tests exercising disco-key
rotation only hit the offline-peer code paths, which mask issues
and are several seconds slower in this scenario. Finer-grained
per-node online tracking can be added later.
tstest/natlab/vmtest: add Env.RotateDiscoKey,
Env.RestartTailscaled, Env.PeerDiscoKey, Node.Name, an
[AllOnline] EnvOption that plumbs through to
testcontrol.Server.AllOnline, and an exported
Env.Ping(from, to, type, timeout). Ping replaces the unexported
helper so callers can specify both a ping type (PingDisco for
warming peer state, PingTSMP for asserting end-to-end
connectivity) and a deadline. PeerDiscoKey returns its LocalAPI
error so callers inside tstest.WaitFor can retry transient
failures rather than fataling the test.
Updates #12639
Updates #13038
Change-Id: I3644f27fc30e52990ba25a3983498cc582ddb958
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Commit 78627c132f changed the signature of magicsock.Conn.SetDERPMap to
take an additional bool doReStun parameter. Avoid both the boolean
parameter and the API signature change by restoring SetDERPMap to its
original single-argument form and adding a new SetDERPMapWithoutReSTUN
method for the cache-loading caller that wants to skip the post-set
ReSTUN.
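
The shape of the change, sketched on a hypothetical conn type (the real
magicsock.Conn does far more). Two named methods share one unexported
helper, so no caller has to pass an unexplained bool:

```go
package main

import "fmt"

// derpMap and conn are hypothetical stand-ins for the real types.
type derpMap struct{ Regions int }

type conn struct {
	derpMap *derpMap
	reSTUNs int // counts how often a ReSTUN was triggered
}

// SetDERPMap keeps the original single-argument form and always ReSTUNs.
func (c *conn) SetDERPMap(dm *derpMap) { c.setDERPMap(dm, true) }

// SetDERPMapWithoutReSTUN is for the cache-loading caller that wants to skip
// the post-set ReSTUN.
func (c *conn) SetDERPMapWithoutReSTUN(dm *derpMap) { c.setDERPMap(dm, false) }

// setDERPMap is the shared implementation; the bool stays internal.
func (c *conn) setDERPMap(dm *derpMap, reSTUN bool) {
	c.derpMap = dm
	if reSTUN {
		c.reSTUNs++
	}
}

func main() {
	c := &conn{}
	c.SetDERPMapWithoutReSTUN(&derpMap{Regions: 1})
	c.SetDERPMap(&derpMap{Regions: 2})
	fmt.Println(c.derpMap.Regions, c.reSTUNs)
}
```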
Updates #19490
Change-Id: I97d9e82156bfc546ccf59756d1ea52f039b5de06
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
With netmap caching, the home DERP of the self node was neither saved to
the cache nor loaded from it, making nodes not stick to a DERP when
starting without a connection to control.
Instead, when a cache is available, make sure to load it before looking
for DERP servers. This is implemented by allowing a skip
of ReSTUN in setting the DERP map (we must have a DERP map before
setting the home DERP), so the DERP from cache will set itself and be
sticky until a connection to control is established.
Making DERP only change when connected to control is handled by existing
code from f072d017bd.
Updates #19490
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
Seamless key renewal has been the default in all clients since 1.90.
We retained the ability to disable it from the control plane as a
precaution, but we haven't seen any issues that require us to disable it.
We're now removing all the code for non-seamless key renewal, because we
don't expect to turn it on again, and indeed it's been untested in the
field for three releases so might contain latent bugs!
Updates tailscale/corp#33042
Change-Id: I4b80bf07a3a50298d1c303743484169accc8844b
Signed-off-by: Alex Chan <alexc@tailscale.com>
Add a Go benchmark that exercises a single tailnet client (a [tsnet.Server]
running in the test process) against a synthetic large initial netmap and
a stream of caller-driven peer add/remove deltas, all in-process.
The harness is split in two parts:
- tstest/largetailnet, a reusable package containing a [Streamer]
that hijacks the map long-poll on a [testcontrol.Server] via the new
AltMapStream hook, sends one initial MapResponse with N synthetic
peers, and forwards caller-supplied delta MapResponses on the same
stream. Helpers like MakePeer / AllocPeer build synthetic peers with
unique IDs and addresses derived from the Tailscale ULA range.
- tstest/largetailnet/largetailnet_test.go, BenchmarkGiantTailnet
(headless tailscaled workload, no IPN bus subscriber) and
BenchmarkGiantTailnetBusWatcher (GUI-client workload with one
Notify subscriber attached). Both are gated on
--actually-test-giant-tailnet (skipped by default), stand up an
in-process testcontrol + tsnet.Server, let Up block until the
initial N-peer netmap has been processed, then ResetTimer and run
add+remove pairs via b.Loop. Per-delta sync is via a test-only
[ipnlocal.LocalBackend.AwaitNodeKeyForTest] channel that closes
once the just-added peer key appears in the netmap (no-watcher
variant) or via bus-Notify drain (bus-watcher variant).
To support the hijack, [testcontrol.Server] grows an AltMapStream hook
and a small MapStreamWriter interface for benchmarks/stress tests that
need to drive a controlled MapResponse sequence; the normal serveMap
path is untouched when AltMapStream is nil. The streamer answers
non-streaming "lite" map polls (which controlclient issues before the
streaming long-poll to push HostInfo) with an empty MapResponse and
returns immediately, so the streaming poll that follows is the one
that gets the initial netmap.
The benchmark is intended for before/after comparisons of netmap- and
delta-handling changes targeted at large tailnets. CPU profiles on
unmodified main show the expected O(N) hotspots:
setControlClientStatusLocked / authReconfigLocked /
userspaceEngine.Reconfig / setNetMapLocked, plus JSON encoding of the
full Notify.NetMap to bus watchers (which dominates the BusWatcher
variant).
Median ms/op over 10 runs on unmodified main, by tailnet size N:
         N  no-watcher  bus-watcher
     10000          32          166
     50000         222          865
    100000         504         1765
    250000        1551         4696
Recommended invocation:
    go test ./tstest/largetailnet/ -run=^$ \
        -bench='BenchmarkGiantTailnet(BusWatcher)?$' \
        -benchtime=2000x -timeout=10m \
        --actually-test-giant-tailnet \
        --giant-tailnet-n=250000 \
        -cpuprofile=/tmp/giant.cpu.pprof
Updates #12542
Change-Id: I4f5b2bb271a36ba853d5a0ffe82054ef2b15c585
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Add a tsdial.Dialer.UserDialPlan method that resolves an address and
reports whether the dialer would route it via Tailscale. The LocalAPI
/dial handler now uses this to skip proxying for addresses that aren't
Tailscale routes (e.g. localhost), returning a Dial-Self response with
the resolved address so the client can dial it directly. This avoids
an unnecessary round-trip through the daemon for local connections.
The client's UserDial handles the new response by dialing the resolved
address itself, and the server passes the pre-resolved IP:port for
Tailscale dials to avoid redundant DNS lookups.
Thanks to giacomo and Moyao for pointing this out!
Updates tailscale/corp#39702
Change-Id: I78d640f11ccd92f43ddd505cbb0db8fee19f43a6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Device posture checking can fail while enabled if tailscaled does not
have access to smbios. Previously, this was only observable by looking
in the tailscaled logs.
Fixes tailscale/corp#39314
Signed-off-by: Evan Lowry <evan@tailscale.com>
This drops the per peer "appending remote" log while constructing the remote list, which can get noisy on big tailnets, and keeps logs around remote availability checks, including whether a peer is missing, offline, lacks PeerAPI reachability, lacks sharing permission, or is available.
Updates tailscale/corp#40580
Signed-off-by: kari-ts <kari@tailscale.com>
Remove the remaining known sources of flakiness in TestStateMachine and
TestStateMachineSeamless.
Updates tailscale/corp#36230
Updates #19377
Signed-off-by: James Sanderson <jsanderson@tailscale.com>
TestStateMachine & TestStateMachineSeamless both flake a lot asserting the
"Shutdown" call on cc after a Logout. This is because Shutdown is called on
a goroutine to avoid a deadlock if it's called while holding the
LocalBackend lock (#18052).
This fixes that cause of flakes by waiting for LocalBackend's goroutine
tracker to have no goroutines running (so the goroutine that calls Shutdown
must have finished).
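
The waiting pattern looks roughly like this. The tracker type below is a
hypothetical, WaitGroup-based stand-in and is not LocalBackend's real
goroutine tracker:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// tracker is a hypothetical minimal goroutine tracker.
type tracker struct{ wg sync.WaitGroup }

// Go runs f on a new goroutine and tracks it until it returns.
func (t *tracker) Go(f func()) {
	t.wg.Add(1)
	go func() {
		defer t.wg.Done()
		f()
	}()
}

// Wait blocks until every tracked goroutine has finished.
func (t *tracker) Wait() { t.wg.Wait() }

// logoutThenAssert mimics the test flow: the "Shutdown" call happens on a
// tracked goroutine, and the assertion only runs after Wait returns.
func logoutThenAssert() bool {
	var tr tracker
	var shutdownCalled atomic.Bool
	tr.Go(func() { shutdownCalled.Store(true) }) // stand-in for cc.Shutdown
	tr.Wait()
	return shutdownCalled.Load()
}

func main() {
	fmt.Println(logoutThenAssert())
}
```

Without the Wait, the assertion races with the goroutine, which is the
flake described above.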
This does not make TestStateMachine non-flaky because it can flake later in
the test, too: the assertion on "unpause" after clearing the netmap between
"Start4" and "Start4 -> netmap" sometimes fails.
Updates tailscale/corp#36230
Updates #19377
Updates #18052
Signed-off-by: James Sanderson <jsanderson@tailscale.com>
Update this log message to show both the local and remote TKA HEAD; this
is useful for debugging issues on nodes that have fallen behind the
remote TKA HEAD.
Updates tailscale/corp#39455
Change-Id: Ia62ce15756180d2fbac4a898fb94d6143df08b54
Signed-off-by: Alex Chan <alexc@tailscale.com>
LocalBackend stores loginFlags at construction so that per-instance
properties (e.g. LoginEphemeral set by tsnet.Server.Ephemeral) persist
for the session. StartLoginInteractiveAs already merges b.loginFlags
into its cc.Login call, but the two auto-login call sites pass bare
controlclient.LoginDefault, silently dropping any stored flags.
Merge b.loginFlags at both auto-login call sites to match the existing
StartLoginInteractiveAs pattern. LoginDefault is zero so this is a
no-op when loginFlags is empty, and restores the documented behavior
when it isn't.
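
A small sketch of why the merge is safe. The flag type and bit values
below are illustrative only; the one property relied on from the text
above is that LoginDefault is zero.

```go
package main

import "fmt"

// loginFlags and the bit values here are hypothetical.
type loginFlags int

const (
	loginDefault     loginFlags = 0
	loginInteractive loginFlags = 1 << 0
	loginEphemeral   loginFlags = 1 << 1
)

// autoLoginFlags merges the flags stored at construction into the flags used
// for an automatic login. OR-ing with zero changes nothing, so this is a
// no-op when nothing was stored.
func autoLoginFlags(stored loginFlags) loginFlags {
	return loginDefault | stored
}

func main() {
	fmt.Println(autoLoginFlags(0) == loginDefault)
	fmt.Println(autoLoginFlags(loginEphemeral)&loginEphemeral != 0)
}
```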
Fixes #15852
Signed-off-by: Scott Graham <scott.github@h4ck3r.net>
modifying DNS responses for domains they are also connectors for
For Connectors 2025, determine if a client is configured as a
connector and what domains it is a connector for. When acting as a
client, don't install Split DNS routes to other connectors for those
domains, and don't alter DNS responses for those domains. The responses
are forwarded back to the original client, which in turn does the alteration,
swapping the real IP for a Magic IP.
A client is also a connector for a domain if it has tags that overlap
with tags in the configured policy, and --advertise-connector=true
in the prefs (not in the self-node Hostinfo from the netmap). We use the prefs
as the source of truth because control only gets a copy from the prefs, and
may drift. And the AppConnector field is currently zeroed out in the
self-node Hostinfo from control.
The extension adds a ProfileStateChange hook to process prefs changes,
and the config type is split into prefs and nodeview sub-configs.
Fixes tailscale/corp#39317
Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>
Before:
tka initialized at head 325557575a59525354484e4a534f494b4c4e56575435583737564b5036584c4d4c335534554255344c344c36484c5a444a323341
After:
tka initialized at head 2UWWZYRSTHNJSOIKLNVWT5X77VKP6XLML3U4UBU4L4L6HLZDJ23A
Printing the AUM hash as hex makes it difficult to compare to other AUM
hashes; stringifying it will make it consistent with other printing.
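
The "before" value appears to be the hex encoding of the base32 text
itself (0x32 0x55 0x57 ... is "2UW..."), which is what fmt produces when
%x is applied to a value with a String method. A sketch with a
hypothetical 32-byte hash type:

```go
package main

import (
	"encoding/base32"
	"encoding/hex"
	"fmt"
)

// demoHash is a hypothetical stand-in for a 32-byte hash with a String method.
type demoHash [32]byte

// String returns the unpadded base32 form of the hash.
func (h demoHash) String() string {
	return base32.StdEncoding.WithPadding(base32.NoPadding).EncodeToString(h[:])
}

// withHexVerb formats h using %x. Because demoHash implements fmt.Stringer,
// fmt hex-encodes the String() output rather than the raw bytes.
func withHexVerb(h demoHash) string { return fmt.Sprintf("%x", h) }

// withDefaultVerb formats h using %v, which is just String().
func withDefaultVerb(h demoHash) string { return fmt.Sprintf("%v", h) }

// hexOfText hex-encodes the bytes of a string, for comparison.
func hexOfText(s string) string { return hex.EncodeToString([]byte(s)) }

func main() {
	var h demoHash
	h[0] = 0xd5
	fmt.Println(withHexVerb(h))
	fmt.Println(withDefaultVerb(h))
	fmt.Println(withHexVerb(h) == hexOfText(h.String()))
}
```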
Updates #cleanup
Change-Id: Ic1e23a9ce6a71a53cff7d2190f9fa06eb838ab89
Signed-off-by: Alex Chan <alexc@tailscale.com>
For debugging purposes, unstable builds will sometimes intentionally panic for
unexpected behaviours. We observed such a panic after loading a cached netmap,
but because we had a valid cached map, the client was unable to recover on its
own and the operator had to manually reset the cache.
As a defensive hedge, when netmap caching is enabled, check for a panic during
installation of a new network map: if one occurs, discard any cached netmaps
before letting the panic unwind, so that we do not lose the panic itself, but
reduce the need for manual intervention.
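
One way to get "clean up on panic, but keep the panic" is a deferred
check of a success flag, with no recover involved. The function and
callback names below are hypothetical; this is a sketch of the pattern,
not the actual change.

```go
package main

import "fmt"

// installWithCacheGuard runs install. If install panics, discardCache runs
// from the deferred function while the original panic keeps unwinding, so
// the panic value and stack are not lost.
func installWithCacheGuard(install, discardCache func()) {
	ok := false
	defer func() {
		if !ok {
			discardCache()
		}
	}()
	install()
	ok = true
}

func main() {
	discarded := false
	func() {
		// Only the demo recovers, so it can print the result.
		defer func() { fmt.Println("recovered:", recover()) }()
		installWithCacheGuard(
			func() { panic("unexpected netmap state") },
			func() { discarded = true },
		)
	}()
	fmt.Println("cache discarded:", discarded)
}
```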
Updates #12639
Updates tailscale/corp#27300
Change-Id: I0436889c6bdc2fa728c9cb83630cd7b00a72ce68
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>