This patch adds support for the fmt.Stringer interface to the
ipn.NotifyWatchOpt enum. This is useful when debugging these bitmasks.
For example:
    fmt.Printf("%s", ipn.NotifyPeerChanges | ipn.NotifyNoNetMap)
    // Output: (ipn.NotifyPeerChanges | ipn.NotifyNoNetMap)
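As a minimal sketch of how a bitmask type can satisfy fmt.Stringer, the example below uses a hypothetical WatchOpt type and hypothetical bit names. It is not the ipn package's actual implementation, only an illustration of the pattern.

```go
package main

import (
	"fmt"
	"strings"
)

// WatchOpt is a hypothetical bitmask type standing in for ipn.NotifyWatchOpt.
type WatchOpt uint64

const (
	OptPeerChanges WatchOpt = 1 << iota // hypothetical bit
	OptNoNetMap                         // hypothetical bit
)

// names lists each known bit with its label, in a fixed print order.
var names = []struct {
	bit  WatchOpt
	name string
}{
	{OptPeerChanges, "OptPeerChanges"},
	{OptNoNetMap, "OptNoNetMap"},
}

// String implements fmt.Stringer by joining the names of all set bits.
// Bits without a known name are printed together as one hex value.
func (o WatchOpt) String() string {
	if o == 0 {
		return "0"
	}
	var parts []string
	for _, n := range names {
		if o&n.bit != 0 {
			parts = append(parts, n.name)
			o &^= n.bit // clear the bit so only unknown bits remain
		}
	}
	if o != 0 {
		// Convert to uint64 first so Sprintf does not call String again.
		parts = append(parts, fmt.Sprintf("%#x", uint64(o)))
	}
	if len(parts) == 1 {
		return parts[0]
	}
	return "(" + strings.Join(parts, " | ") + ")"
}

func main() {
	fmt.Printf("%s\n", OptPeerChanges|OptNoNetMap)
}
```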
Fixes #20066
Signed-off-by: Simon Law <sfllaw@tailscale.com>
The Logger previously took a *netmap.NetworkMap at Startup and on every
ReconfigNetworkMap call, denormalizing it into per-IP and self lookup
maps. That denormalization is O(n) over all peers and ran on every
netmap update, contributing to the broader quadratic behavior we want
to eliminate when a single peer is added or removed.
Instead, this makes netlog ask LocalBackend (well, nodeBackend) for
the info it needs, letting us remove the netmap.NetworkMap type
entirely from the netlog package.
This is a dependency to removing the netmap.NetworkMap type from
upstream callers, like wgengine.Engine in general.
Updates #12542
Change-Id: Ib5f2de96e788a667332c0a6f7ac833b3d0053b5c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
util/def: add def.Bool and def.Duration default parse helpers
Replace multiple instances of def.Bool and def.Duration with a new util/def
package.
Updates #20018
Co-authored-by: Bobby <boby@codelabs.co.id>
Co-authored-by: Simon Law <sfllaw@tailscale.com>
Signed-off-by: Bobby <boby@codelabs.co.id>
Signed-off-by: Simon Law <sfllaw@tailscale.com>
ipnlocal.LocalBackend.populatePeerStatusLocked assumed that Hostinfo
was always valid, but that’s not always true, especially in tests.
ipnlocal.peerAPIPorts suffered from a similar assumption.
This patch checks NodeView.Valid and Hostinfo.Valid, treating the
zero value as a safe default.
Updates #8948
Updates #12542
Signed-off-by: Simon Law <sfllaw@tailscale.com>
The earlier aa5da2e5f2 made peer adds and removes go through a netmap
delta path that mutates only nodeBackend, on the assumption that
PeerForIP, lookupPeerByIP, the engine's wireguard config
(e.lastCfgFull), the engine BART, wgdev's PeerLookupFunc closure, and
the engine's cached netmap (e.netMap) would all stay correct without
further updates. They don't. I'd totally forgotten that
Engine.PeerForIP has its own alternate IP-to-peer lookup codepath.
Concretely, all of these failed for a peer that arrived via
[tailcfg.MapResponse.PeersChanged] (and never via a full
[tailcfg.MapResponse.Peers] list):
- [wgengine.Engine.PeerForIP] read from e.netMap and e.lastCfgFull
(neither updated on the delta path) and so missed the new
peer. The rando non-data-plane callers (Ping, TSMP, pendopen,
debug endpoints, tsdial.Dialer.UseNetstackForIP for tsnet and
onlyNetstack tailscaled) all returned "no matching peer".
- The engine BART (built from e.lastCfgFull) missed the new peer's
subnet routes / exit-node default routes.
- wgdev's [device.PeerLookupFunc] closure (rebuilt only inside
wgcfg.ReconfigDevice) didn't have the new peer's noise key, so
outbound encryption to the new peer dropped the packet even when
SetPeerByIPPacketFunc returned the right NodePublic.
- And nothing in the delta path triggered NodeMutationRemove to
flow through to authReconfig either, so the same stale state
pointed at removed peers indefinitely.
So just (functionally) revert it for now, to have something easily
cherry-pickable to the 1.100 release branch. Proper fixes can come later
for the next release.
This also adds three new tests:
- TestPingPeerLearnedViaDelta runs disco and TSMP subtests over a
delta-added peer with only self addresses. disco exercises the
cold PeerForIP path (magicsock); TSMP exercises the full data path
through wgdev encryption. Both fail without this fix.
- TestPingSubnetRouteOfDeltaPeer exercises a subnet-router peer
arriving via delta. With s1 in --accept-routes mode, an IP
inside the advertised CIDR must resolve to s2 and a TSMP ping
must round-trip. Hits the BART + lastCfgFull + wgdev staleness
in one go.
- TestPingSelfReturnsIsLocalIP is a regression guard for the
IsSelf early-out in Engine.Ping. Passes on main today; included
here so future refactors of PeerForIP can't regress self
handling without test breakage.
Updates tailscale/corp#43394
Change-Id: I7a049271359bd73e7147ae9e2554e85614c2b8d2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Added in #20111, but it is too noisy under real load to be useful.
Updates #12542
Change-Id: Ib99a8966ade0bfa4281fccc057249819cdcdfe83
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
tailscale serve set-config now also accepts the legacy raw ipn.ServeConfig
format (as emitted by `tailscale serve status --json` and consumed via
TS_SERVE_CONFIG, which has no "version" field), so the common
serve-status-edit-set workflow stops failing. Only the services-oriented
content is applied; any node-level fields are skipped with a warning to
stderr pointing users at get-config to migrate.
Fixes tailscale/corp#39793
Signed-off-by: Brendan Creane <bcreane@gmail.com>
Add an UpdatePeers method to the cache. This allows us to support netmap peer deltas,
by allowing just the peers to be updated in an existing cache. As a safety check, reject
an update if there was no base netmap data to apply a change to.
Then, when processing peer mutations in the backend, capture any changes that should
be applied to the cache and update it, if one is enabled.
Updates #12542
Change-Id: I2f8790a8fdc5e85fce6700ba4821a8cb10dddffa
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Adds tailscaled_serve_{inbound,outbound}_bytes_total, labeled by Tailscale
Service name, by wrapping the peer-facing conn in tcpHandlerForVIPService.
Per-service counters persist for the process lifetime rather than being
evicted on serve-config changes.
Fixes #19572
Signed-off-by: Raj Singh <raj@tailscale.com>
Co-authored-by: Ethan Smith <ethan.smith@grafana.com>
When a client's node key expires and the user clicks "Login" (or runs
`tailscale up`), the Login() method was cancelling the map poll context.
This caused key extension notifications from the server to be lost,
leaving clients stuck in NeedsLogin state even after an admin extended
their key.
The fix has three parts:
1. Login(): Don't cancel mapCtx if we have valid credentials (loggedIn=true)
or a valid node key. This allows the map poll to continue receiving
server notifications while the auth flow proceeds in parallel.
2. mapRoutine(): Poll when we have a node key, even if !loggedIn. This
handles the tsnet restart scenario where control returns an AuthURL
(so loggedIn=false) but we still have a valid node key that can
receive map updates.
3. sendStatus()/UpdateFullNetmap(): Forward netmaps when we have a node
key, not just when loggedIn. This ensures the backend sees key expiry
changes even when the auth flow hasn't completed.
"First successful flow wins": if a key extension arrives via map poll,
the client recovers automatically. If the auth flow completes first,
that works too. Either way, the client is no longer stuck.
This aligns with the SeamlessKeyRenewal philosophy: maintain connectivity
paths while authentication proceeds, allowing server-initiated recovery.
Fixes #19326
Change-Id: I26dbbc1fa7c1159ba075362e44d02814355d6b44
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add NotifyInProcessNoDisconnect for in-process IPN bus subscribers that
must apply every bus update. When such a subscriber falls behind, block
Notify production instead of sending the terminal fell-behind message and
closing the watch.
This is intentionally not available over LocalAPI, where a slow or stuck
out-of-process client should still be disconnected rather than allowed to
stall tailscaled. In-process callers that use the bit must keep their
callbacks fast and must not call back into LocalBackend from the callback.
Updates #20062
Change-Id: I730ad61a07475243bb226fba2262c1a3ded211ae
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
New-style IPN bus subscribers consume stateful delta streams. Reject
NotifyRateLimit when it is combined with those subscription bits so
tailscaled cannot merge or delay messages that clients need to apply in
order.
Also stop silently dropping notifications when a watcher falls behind.
Remove the watcher, replace its stale queue with one terminal ErrMessage
notification, and close the watch.
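The fell-behind handling described above can be sketched with a bounded channel. The watcher type, the send method, and the terminalMsg text are hypothetical names for illustration, and the sketch assumes a single producer that owns the channel.

```go
package main

import "fmt"

// terminalMsg is a hypothetical stand-in for the terminal ErrMessage notification.
const terminalMsg = "error: watcher fell behind"

// watcher is a hypothetical bus subscriber with a bounded queue.
type watcher struct {
	ch chan string
}

// send queues msg for the watcher. If the queue is full, it discards the
// stale queue, enqueues one terminal message, closes the channel, and
// reports false so the caller can remove the watcher. It assumes a single
// producer; send must not be called again after it returns false.
func (w *watcher) send(msg string) bool {
	select {
	case w.ch <- msg:
		return true
	default:
		// Queue is full: the watcher has fallen behind.
	}
drain:
	for {
		select {
		case <-w.ch:
			// Drop a stale message.
		default:
			break drain
		}
	}
	w.ch <- terminalMsg
	close(w.ch)
	return false
}

func main() {
	w := &watcher{ch: make(chan string, 2)}
	fmt.Println(w.send("a"), w.send("b"), w.send("c"))
	for m := range w.ch {
		fmt.Println(m)
	}
}
```

The reader sees exactly one message after falling behind, which tells it to resubscribe rather than continue from a stream with silent gaps.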
Updates #20062
Change-Id: Id9d402ea76f4011cd23f122adf62f30dd4b6f90b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
To avoid breaking downstream code, add deprecated aliases for all the
old names.
Updates tailscale/corp#37904
Change-Id: I86d0b0d7da371946440b181c665448f91c3ef8d2
Signed-off-by: Alex Chan <alexc@tailscale.com>
magicsock de-duplicates NetInfo callbacks against c.netInfoLast, a cache
that lives on the long-lived magicsock.Conn. That cache survives a control
client swap (interactive login or profile switch), where only the control
client (and its own per-client NetInfo dedup) is replaced. As a result, the
first netcheck after the swap produces a structurally-identical NetInfo
(same PreferredDERP, same NAT shape), magicsock suppresses it as unchanged,
and the new control session never learns our home DERP. Peers can't reach
the node over DERP until some unrelated NetInfo field happens to change.
Add Conn.ResetNetInfoLast to clear the dedup cache, and call it from
LocalBackend.setControlClientLocked whenever a control client is installed,
so the next netcheck re-reports the current NetInfo to the new client.
netInfoLast is only a dedup/optimization cache (all readers nil-guard, and
it is recomputed by every netcheck), so clearing it can only add a delivery,
never lose or misroute one; it is scoped to control-client lifecycle events,
not steady-state operation.
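To illustrate why a long-lived dedup cache can hide a report from a newly installed consumer, here is a small sketch. The conn, netInfo, report, and resetLast names are hypothetical and much simpler than magicsock's real types.

```go
package main

import "fmt"

// netInfo is a hypothetical, comparable summary of network conditions.
type netInfo struct {
	PreferredDERP int
	NATShape      string
}

// conn is a hypothetical long-lived object that de-duplicates reports.
type conn struct {
	last *netInfo  // dedup cache; nil means nothing has been reported
	sent []netInfo // reports that were actually delivered
}

// report delivers ni unless it equals the last delivered value.
func (c *conn) report(ni netInfo) bool {
	if c.last != nil && *c.last == ni {
		return false // suppressed as unchanged
	}
	c.last = &ni
	c.sent = append(c.sent, ni)
	return true
}

// resetLast clears the dedup cache so the next report is always delivered.
func (c *conn) resetLast() { c.last = nil }

func main() {
	c := &conn{}
	ni := netInfo{PreferredDERP: 1, NATShape: "easy"}
	fmt.Println(c.report(ni)) // first report is delivered
	fmt.Println(c.report(ni)) // identical report is suppressed
	c.resetLast()             // e.g. after a new consumer is installed
	fmt.Println(c.report(ni)) // delivered again
}
```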
Updates #17887
Fixes #20024
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
This resolves a local privilege escalation (LPE). Prior to this change,
a non-admin user could utilize serve to access local Unix sockets they
otherwise should not be able to access. For example,
    tailscale serve --http 80 unix:/var/run/docker.sock
would give the user access to the Docker socket (usually root only).
This works because tailscaled has root access and implements the proxy
to the socket (see also: 'the confused deputy problem').
We resolve the problem by refusing to serve Unix targets altogether
unless instructed to by a root user.
Thanks to Tim Sageser (dtrsecurity) for this report.
Fixes tailscale/corp#41998
Signed-off-by: Harry Harpham <harry@tailscale.com>
serveDebugDERPRegion built its TLS config with
ServerName: cmp.Or(derpNode.CertName, derpNode.HostName), which for a
"sha256-raw:<hex>" CertName passed the raw fingerprint to Go's stock
verifier as a hostname; the handshake always failed with a hostname
mismatch. This is the second half of #15579; the first half (tailscaled
itself failing with "unexpected multiple certs presented") was fixed in
Extract a tlsConfigForNode helper that mirrors derphttp.Client.tlsClient
so that sha256-raw and domain-fronting CertName values are dispatched
to tlsdial.SetConfigExpectedCertHash and tlsdial.SetConfigExpectedCert
respectively, falling back to HostName when CertName is empty.
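The three-way dispatch on CertName can be sketched as a pure classification step. The pinKind type and classify function below are hypothetical; the real helper configures a tls.Config through tlsdial, which this sketch does not attempt to reproduce.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// pinKind is a hypothetical label for how a DERP node's cert is verified.
type pinKind int

const (
	verifyHostName pinKind = iota // CertName empty: verify against HostName
	verifyCertName                // CertName is a name to expect in the cert
	verifyCertHash                // CertName is "sha256-raw:<hex>"
)

// sampleHash is a syntactically valid 32-byte hex digest used for the demo.
var sampleHash = strings.Repeat("ab", 32)

// classify decides which verification mode applies and returns the value
// to verify against. It never treats a raw fingerprint as a hostname.
func classify(certName, hostName string) (pinKind, string, error) {
	if certName == "" {
		return verifyHostName, hostName, nil
	}
	if h, ok := strings.CutPrefix(certName, "sha256-raw:"); ok {
		raw, err := hex.DecodeString(h)
		if err != nil || len(raw) != 32 {
			return 0, "", fmt.Errorf("invalid sha256-raw CertName %q", certName)
		}
		return verifyCertHash, h, nil
	}
	return verifyCertName, certName, nil
}

func main() {
	fmt.Println(classify("", "derp1.example.com"))
	fmt.Println(classify("front.example.com", "derp1.example.com"))
	fmt.Println(classify("sha256-raw:"+sampleHash, "derp1.example.com"))
}
```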
The core fix here was originally written by @imnuke in #19965; that PR
also added a unit test in ipn/localapi/debugderp_test.go which is
replaced in this commit by a new vmtest that exercises the whole stack:
vnet now serves a self-signed cert valid for each fake DERP node's
HostName and exposes its SHA-256 fingerprint, and vmtest grows a new
SelfSignedDERPCertPinning EnvOption that swaps the test DERP map's
nodes to CertName="sha256-raw:<hex>" with InsecureForTests cleared.
TestSelfSignedDERPHashPinning then stands up two hard-NAT'd nodes, has
them communicate over DERP, and calls DebugDERPRegion on each. Before
this fix the test fails with the exact x509 hostname-mismatch error
from the original bug; after, it passes.
Updates #15579
Change-Id: I61f38ffebc7ac5abc962639db1ae88f5cd8633b1
Co-authored-by: Nuke <nuke@imnuke.dev>
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Commit 2b338dd6a8 removed watchdogEngine because it was weird
(so many methods) and increasingly unnecessary after we'd cleaned up
and simplified so much of the locking.
This adds back a watchdog, but an easier to maintain one that's more
idiomatic.
Updates #19759
Change-Id: I86c458473e126c0809f37696446ce7acf4cc4eb9
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The Tailscale daemon only refreshed TLS certs as a side effect of inbound
TLS handshakes or "tailscale cert" CLI calls. A node that doesn't see
inbound traffic during the renewal window silently rolls past expiry.
Add a once-per-hour background loop on LocalBackend that enumerates Serve
and Funnel HTTPS hostnames (filtered against the netmap's CertDomains so
we don't poke ACME for other nodes' service hostnames) and calls the
existing GetCertPEM path. The renewal decision (ARI window, then 2/3
expiry fallback) is unchanged; the loop just guarantees it runs.
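For readers unfamiliar with the 2/3 expiry fallback, the sketch below shows one plausible reading: renew once two thirds of the certificate's validity period has elapsed. The function and helper names are hypothetical, and the ARI check that runs first is not modeled.

```go
package main

import (
	"fmt"
	"time"
)

// day returns a fixed reference time plus n days; it is a demo helper.
func day(n int) time.Time {
	base := time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC)
	return base.Add(time.Duration(n) * 24 * time.Hour)
}

// renewAfterTwoThirds reports whether now is at or past two thirds of the
// way through the validity period [notBefore, notAfter].
func renewAfterTwoThirds(notBefore, notAfter, now time.Time) bool {
	lifetime := notAfter.Sub(notBefore)
	renewAt := notBefore.Add(lifetime * 2 / 3)
	return !now.Before(renewAt)
}

func main() {
	// A 90-day certificate becomes due for renewal on day 60.
	for _, d := range []int{30, 59, 60, 89} {
		fmt.Println(d, renewAfterTwoThirds(day(0), day(90), day(d)))
	}
}
```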
For visibility during initial issuance or restart with a long-expired
cached cert, add a "tls-cert-pending" health Warnable that's set while
ACME is in flight and no usable cached cert exists. Async renewal of a
still-valid cert intentionally doesn't fire it. And then make the CLI "cert"
subcommand print out a warning if it's blocking due to a cert fetch
in flight, using that health info.
Fixes #19911
Fixes #19912
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Change-Id: I144e46c40e957b2e879587decace32a523a6eade
When running `tailscale netcheck`, the reported timestamp used to be
in UTC and formatted according to RFC 3339 with a `T` to separate the
date from the time:
    sfllaw@h2co3:~$ tailscale netcheck | head -n3
    Report:
    * Time: 2026-06-01T21:12:32.252620138Z
This is machine-readable time leaking out to the user interface. Times
in normal commands are formatted for humans to read:
    sfllaw@h2co3:~$ date
    Mon 01 Jun 2026 02:39:14 PM PDT
    sfllaw@h2co3:~$ journalctl -t tailscaled | tail -n1
    Jun 01 14:35:21 h2co3 tailscaled[3328921]: wgengine: sending TSMP disco key advertisement to 100.90.144.102
    sfllaw@h2co3:~$ timedatectl show
    Timezone=America/Los_Angeles
    LocalRTC=no
    CanNTP=yes
    NTP=yes
    NTPSynchronized=yes
    TimeUSec=Mon 2026-06-01 14:38:32 PDT
    RTCTimeUSec=Mon 2026-06-01 14:38:32 PDT
    sfllaw@h2co3:~$ uptime --since
    2026-05-15 07:37:45
This PR makes the times printed by the CLI commands consistent:
- For `tailscale routecheck`, it now prints local time as
`2026-05-15 07:37:45-07:00`.
- For `netlogfmt`, it has always printed local time with a space,
but now includes the time zone.
- Machine-readable outputs, such as `--format=json`, continue to use
standard RFC 3339 in UTC.
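The difference between the human-readable and machine-readable forms can be sketched with time.Format. The humanLayout constant and the helper names are assumptions made for this example, not the actual tstime constants.

```go
package main

import (
	"fmt"
	"time"
)

// humanLayout is an assumed layout: date, a space, time, and numeric offset.
const humanLayout = "2006-01-02 15:04:05-07:00"

// pdtZone returns a fixed UTC-7 zone so the demo does not depend on the host.
func pdtZone() *time.Location { return time.FixedZone("PDT", -7*60*60) }

// sampleTime is the instant from the netcheck example in this message.
func sampleTime() time.Time {
	return time.Date(2026, 6, 1, 21, 12, 32, 252620138, time.UTC)
}

// formatHuman renders t in the given zone for display to a person.
func formatHuman(t time.Time, loc *time.Location) string {
	return t.In(loc).Format(humanLayout)
}

// formatMachine renders t as RFC 3339 in UTC for machine consumption.
func formatMachine(t time.Time) string {
	return t.UTC().Format(time.RFC3339Nano)
}

func main() {
	fmt.Println(formatHuman(sampleTime(), pdtZone()))
	fmt.Println(formatMachine(sampleTime()))
}
```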
As part of a general cleanup, this PR also adds standard common
time.Format layouts as tstime constants.
Fixes #19928
Signed-off-by: Simon Law <sfllaw@tailscale.com>
The routecheck package parallels the netcheck package, where the
former checks routes and routers while the latter checks networks.
Like netcheck, it compiles reports for other systems to consume.
Historically, the client has never known whether a peer is actually
reachable. Most of the time this doesn’t matter, since the client will
want to establish a WireGuard tunnel to any given destination.
However, if the client needs to choose between two or more nodes,
then it should try to choose a node that it can reach.
Suggested exit nodes are one such example, where the client filters
out any nodes that aren’t connected to the control plane. Sometimes an
exit node will get disconnected from the control plane: when the
network between the two is unreliable or when the exit node is too
busy to keep its control connection alive. In these cases, Control
disables the Node.Online flag for the exit node and broadcasts this
across the tailnet. Arguably, the client should never have relied on
this flag, since it only makes sense in the admin console.
This patch implements an initial routecheck client that can probe
every node that your client knows about. You should not ping scan your
visible tailnet; this method is for debugging only.
This patch also introduces a new OnNetMapToggle hook, which fires when
the netmap transitions from nil to non-nil, or vice versa. This
happens either when the client receives its first MapResponse after
connecting to the control plane, or when it clears the netmap while it
is disconnecting. Routecheck uses this to wait for a valid netmap
so it knows which peers to probe.
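A toggle hook of this kind fires only on transitions between nil and non-nil, not on every update. The following sketch uses hypothetical backend and netMap types to show that edge-triggered behavior; it does not mirror LocalBackend's real hook plumbing.

```go
package main

import "fmt"

// netMap is a hypothetical placeholder for a network map.
type netMap struct{}

// backend is a hypothetical holder of the current netmap and its hooks.
type backend struct {
	nm       *netMap
	onToggle []func(valid bool)
}

// setNetMap stores nm and runs the toggle hooks only when the netmap
// changes between nil and non-nil.
func (b *backend) setNetMap(nm *netMap) {
	was, is := b.nm != nil, nm != nil
	b.nm = nm
	if was == is {
		return // ordinary update or repeated clear: no toggle
	}
	for _, f := range b.onToggle {
		f(is)
	}
}

func main() {
	b := &backend{}
	b.onToggle = append(b.onToggle, func(valid bool) {
		fmt.Println("netmap valid:", valid)
	})
	b.setNetMap(&netMap{}) // fires with true
	b.setNetMap(&netMap{}) // no hook: still non-nil
	b.setNetMap(nil)       // fires with false
}
```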
Updates #17366
Updates tailscale/corp#33033
Signed-off-by: Simon Law <sfllaw@tailscale.com>
This adds tsnet.Server.ListenSSH which, if the SSH feature is linked,
returns a net.Listener whose Accept yields *tailssh.Session values (as
net.Conn). This lets tsnet apps accept incoming SSH connections to
implement custom TUI applications.
Basic apps can use net.Conn directly (Read/Write/Close). Rich apps
import ssh/tailssh and type-assert for peer identity, PTY, signals,
etc. If feature/ssh isn't imported, ListenSSH returns an error.
Includes a demo guess-the-number game in tsnet/example/ssh-game.
Updates tailscale/corp#37839
Change-Id: I4e7c3c96afb030cdf4da8f2d8b2253820628129a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Currently we are picking a peer for the split dns routes when we get a
netmap. Use the new custom scheme resolvers, installed per app in the
config in the netmap, to allow us to choose which connector peer should
handle a DNS request at the time the request is made.
Fixes tailscale/corp#39858
Signed-off-by: Fran Bull <fran@tailscale.com>
All StateStore implementations store a nil value in the cache map when WriteState is called with a nil byte slice instead of deleting the key. This causes ReadState to return (nil, nil) instead of (nil, ErrStateNotExist), since the key is still present in the map.
This breaks reset-auth on Windows, Linux, and Android, and the node can't log back in without manually editing the state file. (macOS uses a different state store.)
DeleteProfile, DeleteAllProfilesForUser, setUnattendedModeAsConfigured are impacted but don't seem to break because the deleted keys are not reread.
This deletes the key from the cache instead.
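The distinction between storing a nil value and deleting the key can be shown with a small in-memory store. The memStore type and errNotExist variable are hypothetical stand-ins, not the real StateStore implementations or ipn.ErrStateNotExist.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotExist is a hypothetical stand-in for ipn.ErrStateNotExist.
var errNotExist = errors.New("no state with given ID")

// memStore is a hypothetical in-memory store backed by a map.
type memStore struct {
	cache map[string][]byte
}

// ReadState returns errNotExist when the key is absent from the map.
func (s *memStore) ReadState(id string) ([]byte, error) {
	bs, ok := s.cache[id]
	if !ok {
		return nil, errNotExist
	}
	return bs, nil
}

// WriteState stores a copy of bs, or deletes the key when bs is nil.
// Storing nil instead would leave the key present, so ReadState would
// return (nil, nil) rather than (nil, errNotExist).
func (s *memStore) WriteState(id string, bs []byte) error {
	if s.cache == nil {
		s.cache = make(map[string][]byte)
	}
	if bs == nil {
		delete(s.cache, id)
		return nil
	}
	s.cache[id] = append([]byte(nil), bs...)
	return nil
}

func main() {
	var s memStore
	s.WriteState("profile", []byte("data"))
	s.WriteState("profile", nil)
	_, err := s.ReadState("profile")
	fmt.Println(errors.Is(err, errNotExist))
}
```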
Fixes tailscale/corp#42477
Signed-off-by: kari-ts <kari@tailscale.com>
If a user explicitly adds a non-ts.net (not a CertDomain domain) domain
like "foo.com" to their serve config as a web target that's also an allowed
funnel domain (using raw "tailscale serve set-config"), then use the new
ALPN cert fetching (from b553969b) to get certs for that domain.
This is just plumbing; there's no new product functionality to
actually enable this easily client-side, and it also has no visible
product surface to enable it server-side.
Updates tailscale/corp#41736
Change-Id: Ie2e421ac9611bce64bba3de6a454b2d505ea0e8a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Adds two tests exercising the HTTP/2-inbound -> plaintext HTTP/1.1 backend
path through serve's reverseProxy and through the full serveWebHandler
entry point (with a funnel serveHTTPContext).
Updates #19866
Signed-off-by: Brendan Creane <bcreane@gmail.com>
When parsing the `tailscale up --exit-node=ARG` argument, we try to
resolve hostnames by searching the list of peers. However, at startup,
the peer list is empty, causing hostname lookups to trivially fail with
an unhelpful "invalid value" error.
Improve the error message when the peer list is empty to inform the user
that hostnames cannot be resolved during startup, and advise them to use
the exit node's Tailscale IP address instead.
Also, clarify that hostnames must be peer hostnames, not arbitrary
hostnames.
Fixes #19882
Change-Id: I9390a427c2863d657cf46c5e33b43cb3c5363764
Signed-off-by: Alex Chan <alexc@tailscale.com>
Some tests in another repo were broken by tailscale/tailscale#19607.
This fixes them, by finishing off the rest of the migration away from
netmap.NetworkMap on the IPN bus in containerboot.
Containerboot used to rebuild a full NetworkMap-shaped view while
reacting to IPN bus notifications. Now it instead has its own
netmapState type (immutable) of exactly what it needs to track, and
sends those immutable values around, making cheap edits of new
immutable values when an IPN bus edit arrives.
This should make cmd/containerboot scale to much larger tailnets now too.
Fixes #19852
Fixes tailscale/corp#42347
Updates #12542
Change-Id: I88adaf061f85f677f954a764935e6654329d75a6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Use TLS-ALPN-01 for Funnel certificate renewals only when the node
already has a cached certificate, and fall back to DNS-01 with a fresh
order if the ALPN path is unavailable or fails.
Dynamically advertise acme-tls/1 only while an ACME challenge
certificate is pending, and add client metrics for DNS-01 and
TLS-ALPN-01 start/success/failure paths.
Updates tailscale/corp#41736
Fixes tailscale/corp#42320
Change-Id: I5adc6ea129237f9ef592f84fc1a8953c80bc9d5c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
NodeMutationAdd was a misleading name: a PeersChanged entry in a
MapResponse can represent either a truly new peer or a full
replacement for an existing peer that couldn't be expressed as a
PeerChangedPatch. Calling it "Add" implied it was always a completely
new node, which is wrong. (I'd changed my mind on the design of
mapping add/delete events to NodeMutations halfway through #19607 and
forgot to update the name, even though I'd updated half the docs)
Rename it to NodeMutationUpsert to reflect the actual semantics: the
node should be inserted or replaced in the peer map regardless of
whether it already existed.
Updates #19607
Updates #12542
Change-Id: Iebd3daddb3318cba02e115a1b184fcb3ee8f83d6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The prior aa5da2e5f2 ("process node adds/removes in constant
time") commit missed a bus notification case, where new-style
subscribers set NotifyNoNetMap and then the controlclient map routine
sends a full update (rather than a delta). Those profiles + peers
need to be put on the bus too.
I noticed this only when porting the Android app over to use the
new bus stuff.
Updates #19607
Updates #12542
Change-Id: I82c35011d2c532222ca27f7d4e790522c31bd156
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Emit runtime metrics as clientmetrics when the
NodeAttrEmitRuntimeMetrics NodeCapability is present.
We start small with just 2 metrics: heap bytes and total process memory.
Updates tailscale/corp#39434
Signed-off-by: Jordan Whited <jordan@tailscale.com>
There are two places where tailscaled transitions into a paused state:
1. when tailscaled’s controlclient is initially created,
2. when `tailscale down`, or the GUI equivalent, commands it to.
This patch unifies the implementation of both scenarios into
LocalBackend.shouldPauseControlClientLocked to prevent the
implementation from drifting.
The flaky tstest/integration.TestNoControlConnWhenDown test exposed
this mismatch, but only by accident. This patch also changes
TestNode.MustDown so that it runs `tailscale down` and then waits for
the testcontrol server to finish handling any associated /machine/map
requests.
Fixes #19831
Signed-off-by: Simon Law <sfllaw@tailscale.com>
In aa5da2e5f2 we made the IPN bus include deltas, including the
PeersRemoved, sending a slice of integer NodeIDs that were
removed. But when updating xcode, I realized there was no way to map
those integers to the stable node IDs used in other places.
I was considering changing the just-added ipn.Notify.PeersRemoved from
an IntID to a string StableID, but then it doesn't match the MapResponse
wire protocol, which we've tried to match so far.
Instead, just add the integer ID as well. Callers can use whichever
world they want, having both. It's a little regrettable that we still
have two worlds of IDs, but oh well. Neither is really suitable to a
hypothetical future fully federated world of control servers, so
we'll need a third type later anyway; just live with the two we
have for now.
Updates #12542
Change-Id: Ib8fd48a265e1da1f8779152f141f624a7f7260e9
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In PR tailscale/corp#30448, we originally decided to break ties using
SHA256 for our rendezvous hashing algorithm. Now that we’ve had some
experience with it, we think that FNV-1a is a better choice. It
distributes bits evenly, it’s much faster, and it doesn’t need to be
cryptographically secure. The FNV designers recommend FNV-1a over the
deprecated FNV-1.
This PR makes the switch and updates the related tests, since changing
the algorithm changes which stable pick gets selected. As of 2026-05,
this is the best time to make this change, since there are almost no
clients in the wild with traffic steering enabled.
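As a rough sketch of tie-breaking with rendezvous hashing over FNV-1a, the example below scores each tied candidate against a key and picks the highest score. The score and pick functions, the separator byte, and the use of string identifiers are assumptions for illustration, not the traffic steering code itself.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// score hashes the (key, node) pair with 64-bit FNV-1a.
func score(key, node string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	h.Write([]byte{0}) // separator so ("ab","c") differs from ("a","bc")
	h.Write([]byte(node))
	return h.Sum64()
}

// pick returns the candidate with the highest score for key, or "" if
// there are no candidates. The result does not depend on input order.
func pick(key string, nodes []string) string {
	var best string
	var bestScore uint64
	for i, n := range nodes {
		if s := score(key, n); i == 0 || s > bestScore {
			best, bestScore = n, s
		}
	}
	return best
}

func main() {
	tied := []string{"exit-a", "exit-b", "exit-c"}
	fmt.Println(pick("self-node", tied))
}
```

Because each candidate's score depends only on the key and that candidate, removing a losing candidate never changes the winner, which is what makes the pick stable.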
Updates #17366
Updates tailscale/corp#29964
Updates tailscale/corp#29966
Updates tailscale/corp#33033
Signed-off-by: Simon Law <sfllaw@tailscale.com>
For large tailnets (~50k+ nodes) with frequent peer churn (ephemeral
GitHub Actions workers etc.), tailscaled used to rebuild the full
netmap and fan it out on the IPN bus on every MapResponse that
added or removed a peer. There were two O(N) costs per delta: the
full netmap rebuild + every Notify.NetMap encode to every bus watcher.
This change tackles both:
1. Plumb O(1) peer add/remove through the delta path. PeersChanged
and PeersRemoved no longer prevent the delta happy path; instead,
they mutate the per-node-backend peer map in place.
2. Restrict ipn.Notify.NetMap emission to the platforms whose host
GUIs still depend on it (Windows, macOS, iOS) and migrate
in-tree consumers off it everywhere else:
- Migrate reactive consumers (containerboot, kube agents,
sniproxy, tsconsensus, etc.) off Notify.NetMap to the
previously-added Notify.SelfChange signal so they no longer
have to subscribe to the full netmap.
- Add ipn.NotifyNoNetMap so GUI clients on "legacy-emit" platforms
that have already migrated can opt out of the per-watcher
NetMap encode.
- Gate Notify.NetMap emission on the producer side by a compile-
time GOOS check, so the supporting code is dead-code-eliminated
on Linux and other geese where no GUI consumer needs it.
Re-running BenchmarkGiantTailnet from tstest/largetailnet, which was
added along with baseline numbers on unmodified main in ad5436af0d,
the per-delta cost (one peer add+remove pair) is now ~O(1) regardless
of tailnet size N:
         N  no-watcher (ms/op)      bus-watcher (ms/op)
            before   now  factor    before   now  factor
     10000      32  0.11    300x       166  0.13   1300x
     50000     222  0.11   2000x       865  0.13   6700x
    100000     504  0.12   4100x      1765  0.13  13400x
    250000    1551  0.12  12500x      4696  0.15  32400x
Updates #12542
Change-Id: I94e34b37331d1a8ec74c299deffadf4d061fda9e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The traffic package contains helpers for evaluating traffic steering
scores and picking appropriate nodes. These were extracted from
ipnlocal.suggestExitNodeUsingTrafficSteering so they can be reused by
the new routecheck package to probe exit nodes in priority order.
Updates #17366
Updates tailscale/corp#33033
Signed-off-by: Simon Law <sfllaw@tailscale.com>
When tailscaled is running in userspace-networking mode behind an
exit node (e.g. as a SOCKS5 proxy), it resolves a hostname and then
dials a single resolved IP through the tunnel. If the name has both
A and AAAA, Go's net.Resolver merges them and we pick ips[0], which
on an IPv6-native host is usually AAAA. If the exit node has no IPv6
egress (or vice versa), the dial fails silently through the tunnel
and the user sees a hang.
Resolve all candidates and race connect attempts across address
families with a 300ms happy-eyeballs delay, matching Go's net.Dialer
default and the existing pattern in net/dnscache (commit ee0a03b14).
First success wins; losers are cancelled and any conns they produce
are closed. A failBoost channel wakes the launcher when a connect
fails fast (e.g. ICMP "no route" via the tunnel) so we don't sit on
the 300ms timer when the answer is already known.
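The racing strategy can be sketched as follows, under several simplifying assumptions: connections are represented as strings, the dialFunc type and raceDial function are hypothetical, and the fail-fast wakeup is modeled by launching the next candidate as soon as a failure is received rather than through a dedicated channel.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
	"time"
)

// dialFunc is a hypothetical stand-in for dialing one candidate address.
type dialFunc func(ctx context.Context, addr string) (conn string, err error)

// raceDial starts candidates one at a time, launching the next one after
// delay or immediately after a failure. The first success wins; the
// deferred cancel stops the losers. Real conns from losers would also
// need to be closed.
func raceDial(ctx context.Context, addrs []string, delay time.Duration, dial dialFunc) (string, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	type result struct {
		conn string
		err  error
	}
	results := make(chan result, len(addrs)) // buffered so losers never block
	next, pending := 0, 0
	launchNext := func() {
		if next == len(addrs) {
			return
		}
		addr := addrs[next]
		next++
		pending++
		go func() {
			c, err := dial(ctx, addr)
			results <- result{c, err}
		}()
	}

	var firstErr error
	launchNext()
	for pending > 0 {
		var timer <-chan time.Time // nil (never fires) when nothing is left to launch
		if next < len(addrs) {
			timer = time.After(delay)
		}
		select {
		case r := <-results:
			pending--
			if r.err == nil {
				return r.conn, nil
			}
			if firstErr == nil {
				firstErr = r.err
			}
			launchNext() // fail fast: do not wait out the delay
		case <-timer:
			launchNext()
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
	if firstErr == nil {
		firstErr = errors.New("no candidate addresses")
	}
	return "", firstErr
}

// demo races an unreachable IPv6 candidate against a reachable IPv4 one.
func demo() (string, error) {
	dial := func(ctx context.Context, addr string) (string, error) {
		if strings.Contains(addr, ":") {
			return "", errors.New("no route to " + addr)
		}
		return "conn to " + addr, nil
	}
	addrs := []string{"2001:db8::1", "192.0.2.1"}
	return raceDial(context.Background(), addrs, 300*time.Millisecond, dial)
}

func main() {
	fmt.Println(demo())
}
```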
userDialResolve is refactored into userDialResolveAll (returns the
full candidate list) plus a thin single-IP wrapper for callers like
UserDialPlan that don't race. UserDial's per-IP dispatch (netstack
vs peer dialer vs SystemDial vs std) is extracted to dialOneUser so
each candidate can route correctly on its own merits.
Also fix serveDial in localapi to pass the original hostname to
UserDial rather than a pre-resolved IP, so the race can fire.
This fix is single-ended: it works against any exit node, including
old ones, with no protocol changes. The trade-off versus filtering
on the exit-node side via PeerAPI DoH is that every dial through an
unreachable-family exit node costs one failed connect attempt per
cache window, rather than zero, which is acceptable given the
simplicity.
Fixes #19792
Fixes #13257
Change-Id: I9d7645d0034caf3ee22ecdd8070798353f77e94b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Some netmap updates are guaranteed to affect only the "static" parts of the
netmap, and so should not require us to walk through all the peers and user
profiles when updating the cache. To support this, the new UpdateSelfOnly
method updates only the Self node and other tailnet settings that are not
dependent on the peers and profiles.
Use this when updating the cache on DERP home changes.
Updates #12542
Change-Id: Ifed522b29d579fb76e010b4ff738cc4e0a72d27f
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
The TestShouldUseOneCGNATRoute test fails when the underlying system
interfaces don’t match the underlying assumptions of the test.
That assumption was that there would only ever be one CGNAT interface:
the Tailscale one.
This breaks on Linux when border0 is installed because border0 also
creates an interface with a CGNAT route.
This patch stubs netmon.RegisterInterfaceGetter to replace the system
interfaces and netmon.SetTailscaleInterfaceProps to identify the test
data that defines the Tailscale interface.
This patch also tests the control knob override for CGNAT for every
combination of operating system and system interfaces, instead of just
a couple of combinations.
Fixes#19731
Signed-off-by: Simon Law <sfllaw@tailscale.com>
Adds a new NoiseRoundTripper field to tsd.Sys
to expose an http.RoundTripper to make requests
over the control plane Noise connection.
This will be used in PAM use cases soon.
Updates tailscale/corp#41800
Signed-off-by: Adriano Sela Aviles <adriano@tailscale.com>
This fixes a log message where ipn/ipnlocal.shouldUseOneCGNATRoute
would claim that an Android machine was actually macOS.
Updates #cleanup
Updates #19652
Signed-off-by: Simon Law <sfllaw@tailscale.com>
Adds two new cap resolution methods alongside the existing PeerCaps:
PeerCapsForService(src netip.Addr, svcName tailcfg.ServiceName) resolves
the service name to its VIP addresses via the node's service IP mappings
and returns caps scoped to that service. Exposed on /v0/whois via the
svc_name query parameter and on client/local.Client as WhoIsForService.
PeerCapsForIP(src, dst netip.Addr) resolves caps against an arbitrary
destination IP. Exposed on /v0/whois via the svc_addr query parameter
and on client/local.Client as WhoIsForIP.
svc_name takes priority over svc_addr when both are present. Invalid
values for either return 400. The existing PeerCaps/WhoIs path is
unchanged: without a service parameter, WhoIs returns only host-level
caps.
Updates tailscale/corp#41632
Signed-off-by: Adriano Sela Aviles <adriano@tailscale.com>