Commit Graph

2754 Commits

Author SHA1 Message Date
BeckyPauley
98f1ac0880 cmd/k8s-operator, net/netutil: revert 4via6 changes (#19990)
Reverts support for 4via6 in egress proxy and connector (#19863)

Updates #19334

Signed-off-by: Becky Pauley <becky@tailscale.com>
2026-06-03 20:20:36 +01:00
Kabir
01c59d84a0 cmd/tailscale/cli: show services in serve status (#19600)
The "tailscale serve status" human-readable output previously showed
only serve-based proxies, not services.

Fixes https://github.com/tailscale/corp/issues/34163

Change-Id: Ie48858a8d8afd7184979d0fe2ab21ebd6fd0d4a0

Signed-off-by: Kabir Sikand <kabir@tailscale.com>
2026-06-02 17:09:54 -04:00
Simon Law
b47dd932f3 cmd/tailscale/cli: use tstime constant for tailscale routecheck (#19957)
Updates #19928

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-06-01 17:42:18 -07:00
ferrumclaudepilgrim
3f70abdc6f cmd/tailscaled, version/distro: default to userspace-networking on Crostini
cros-garcon NULL-derefs on cold-boot netlink enumeration when
tailscale0 is present, preventing the Crostini container and
ChromeOS Terminal from starting cleanly. This is an upstream
ChromiumOS bug in cros-garcon; tailscaled can work around it
by defaulting to userspace-networking mode on Crostini.

Tailscale SSH continues to work via tailscaled's netstack.
Users can override with --tun=tailscale0 on ChromeOS builds
where cros-garcon is fixed.

Crostini is detected via /opt/google/cros-containers/bin/garcon,
which is present in every Crostini penguin container.

ssh/tailssh extends the existing Debian default-PATH case to
cover Crostini, since Crostini is Debian-based and benefits
from the same SSH PATH defaults.

RELNOTE: Crostini now defaults to userspace-networking.

Fixes #19488
Updates #12090

Signed-off-by: ferrumclaudepilgrim <ferrumclaudepilgrim@users.noreply.github.com>
2026-06-01 17:40:07 -07:00
Brad Fitzpatrick
a6ab7efa4f ipn/ipnlocal, cmd/tailscale/cli: auto-renew TLS certs and warn while pending
The Tailscale daemon only refreshed TLS certs as a side effect of inbound
TLS handshakes or "tailscale cert" CLI calls. A node that doesn't see
inbound traffic during the renewal window silently rolls past expiry.

Add a once-per-hour background loop on LocalBackend that enumerates Serve
and Funnel HTTPS hostnames (filtered against the netmap's CertDomains so
we don't poke ACME for other nodes' service hostnames) and calls the
existing GetCertPEM path. The renewal decision (ARI window, then 2/3
expiry fallback) is unchanged; the loop just guarantees it runs.

For visibility during initial issuance or restart with a long-expired
cached cert, add a "tls-cert-pending" health Warnable that's set while
ACME is in flight and no usable cached cert exists. Async renewal of a
still-valid cert intentionally doesn't fire it. And then make the CLI "cert"
subcommand print out a warning if it's blocking due to a cert fetch
in flight, using that health info.

Fixes #19911
Fixes #19912

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Change-Id: I144e46c40e957b2e879587decace32a523a6eade
2026-06-01 16:31:54 -07:00
Simon Law
92bfda580c cmd/tailscale/cli: fix time in tailscale routecheck (#19956)
When running `tailscale netcheck`, the reported timestamp used to be
in UTC and formatted according to RFC 3339 with a `T` to separate the
date from the time:

	sfllaw@h2co3:~$ tailscale netcheck | head -n3

	Report:
		* Time: 2026-06-01T21:12:32.252620138Z

This is machine-readable time leaking out to the user interface. Times
in normal commands are formatted for humans to read:

	sfllaw@h2co3:~$ date
	Mon 01 Jun 2026 02:39:14 PM PDT
	sfllaw@h2co3:~$ journalctl -t tailscaled | tail -n1
	Jun 01 14:35:21 h2co3 tailscaled[3328921]: wgengine: sending TSMP disco key advertisement to 100.90.144.102
	sfllaw@h2co3:~$ timedatectl show
	Timezone=America/Los_Angeles
	LocalRTC=no
	CanNTP=yes
	NTP=yes
	NTPSynchronized=yes
	TimeUSec=Mon 2026-06-01 14:38:32 PDT
	RTCTimeUSec=Mon 2026-06-01 14:38:32 PDT
	sfllaw@h2co3:~$ uptime --since
	2026-05-15 07:37:45

This PR makes the times printed by the CLI commands consistent:

- For `tailscale routecheck`, it now prints local time as
  `2026-05-15 07:37:45-07:00`.
- For `netlogfmt`, it has always printed local time with a space,
  but now includes the time zone.
- All machine-readable outputs continue to be standard RFC 3339 in
  UTC, i.e. `--format=json`.

As part of a general cleanup, this PR also adds standard common
time.Format layouts as tstime constants.
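
A minimal sketch of the two output styles described above, assuming a hypothetical layout constant named humanLayout (the commit does not name the tstime constants it adds). The human-readable form keeps the local offset and uses a space separator, while the machine-readable form stays RFC 3339 in UTC.

```go
package main

import (
	"fmt"
	"time"
)

// humanLayout is a hypothetical name for illustration; the real tstime
// constants may be named and defined differently.
const humanLayout = "2006-01-02 15:04:05-07:00"

// humanTime formats t for people: in the given zone, space-separated,
// with a numeric UTC offset.
func humanTime(t time.Time, loc *time.Location) string {
	return t.In(loc).Format(humanLayout)
}

// machineTime formats t for programs: RFC 3339 in UTC.
func machineTime(t time.Time) string {
	return t.UTC().Format(time.RFC3339Nano)
}

// exampleTime returns the sample timestamp used in the commit message,
// pinned to a fixed -07:00 zone so the result does not depend on the host.
func exampleTime() time.Time {
	return time.Date(2026, time.May, 15, 7, 37, 45, 0, time.FixedZone("PDT", -7*60*60))
}

func main() {
	t := exampleTime()
	fmt.Println(humanTime(t, t.Location()))
	fmt.Println(machineTime(t))
}
```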

Fixes #19928

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-06-01 16:12:08 -07:00
Achille Roussel
7f3bbc9865 net/netutil: add NewDefaultTransport to avoid http.DefaultTransport panics
Several packages built their HTTP transports with

    http.DefaultTransport.(*http.Transport).Clone()

The standard library only documents http.DefaultTransport as an
http.RoundTripper, so an application is free to replace it with a
RoundTripper that is not a *http.Transport (e.g. an instrumented or
tracing wrapper). When such an application embeds tsnet.Server, the
unchecked type assertion panics as soon as tsnet brings up its control
connection, DNS bootstrap, or log uploader.

Add netutil.NewDefaultTransport, which returns a clone of the global
when it is still the standard *http.Transport (preserving existing
behavior) and otherwise returns a fresh transport mirroring the stdlib
defaults. Route every clone site through it.
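
The approach can be sketched as follows. This is an illustration only, not the actual netutil.NewDefaultTransport: the function name newDefaultTransport, the tracingTransport wrapper, and the helper fallbackWhenWrapped are hypothetical, and the fallback values are the ones documented for http.DefaultTransport in the standard library.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"time"
)

// newDefaultTransport is a hypothetical stand-in for the helper described
// above. It uses a checked type assertion, so it cannot panic when an
// application has replaced http.DefaultTransport.
func newDefaultTransport() *http.Transport {
	if tr, ok := http.DefaultTransport.(*http.Transport); ok {
		return tr.Clone() // the global is still the standard transport
	}
	// Fallback mirroring the documented net/http defaults.
	return &http.Transport{
		Proxy: http.ProxyFromEnvironment,
		DialContext: (&net.Dialer{
			Timeout:   30 * time.Second,
			KeepAlive: 30 * time.Second,
		}).DialContext,
		ForceAttemptHTTP2:     true,
		MaxIdleConns:          100,
		IdleConnTimeout:       90 * time.Second,
		TLSHandshakeTimeout:   10 * time.Second,
		ExpectContinueTimeout: 1 * time.Second,
	}
}

// tracingTransport imitates an instrumented RoundTripper that is not a
// *http.Transport.
type tracingTransport struct{ next http.RoundTripper }

func (t tracingTransport) RoundTrip(r *http.Request) (*http.Response, error) {
	return t.next.RoundTrip(r)
}

// fallbackWhenWrapped temporarily replaces the global transport with the
// wrapper and returns what newDefaultTransport builds in that situation.
func fallbackWhenWrapped() *http.Transport {
	orig := http.DefaultTransport
	defer func() { http.DefaultTransport = orig }()
	http.DefaultTransport = tracingTransport{next: orig}
	return newDefaultTransport()
}

func main() {
	tr := fallbackWhenWrapped()
	fmt.Println("fallback MaxIdleConns:", tr.MaxIdleConns)
}
```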

Updates #19937

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Achille Roussel <achille.roussel@gmail.com>
2026-06-01 12:28:36 -07:00
Brad Fitzpatrick
0d92a69259 cmd/tailscale/cli: add "tailscale get" command
This adds @alexwlchan's proposed "tailscale get" command that reads
current preference values, complementing "tailscale set". It uses the
same flag names as set.

  tailscale get              # show all settings as a table
  tailscale get all          # same
  tailscale get accept-dns   # show a single value
  tailscale get --json       # output as JSON object
  tailscale get --set-flags  # output as tailscale set argv

Fixes #11389
Fixes tailscale/corp#38702

Change-Id: Ie366f27f11ccc56c76fff9a94ed8a9de9c835bd0
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-06-01 11:59:33 -07:00
Simon Law
2d6844c565 cmd/tailscale/cli: add routecheck command (#19641)
Introduce a new `tailscale routecheck` command which prints a report
of high-availability routers that are reachable.

This command rhymes with the `tailscale netcheck` command, but
instead of reporting on local network conditions, `routecheck` reports
on remote connectivity.

Updates #17366
Updates tailscale/corp#33033

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-06-01 11:50:24 -07:00
Brad Fitzpatrick
d961e44856 cmd/testwrapper: auto-retry every failing test
Previously, testwrapper only retried tests explicitly annotated with
flakytest.Mark. Authors don't pre-emptively mark tests that haven't
flaked yet, so the first flake of a brand-new test failed CI even
when a re-run would have passed.

testwrapper now retries every failing test within a per-test wall-clock
budget (default: 5 minute per-attempt timeout capped at 1.5x the first
failure duration, 10 minute total). A test that fails and then passes
on retry is reported as flaky; a test that never passes within the
budget remains a real failure (exit non-zero).

For flakeapp's existing log scraping, the wire format is preserved:
the "flakytest failures JSON:" line is now emitted only for tests
that ultimately flaked (passed on retry). Unmarked tests get a fake
issue URL of the form https://github.com/{owner}/{repo}/issues/UNKNOWN
where owner/repo is detected from GITHUB_REPOSITORY, the local git
remote, or falls back to tailscale/tailscale. A new "permanent test
failures JSON:" line is emitted for tests that never passed; flakeapp
ignores it for now (a follow-up can teach it to record real failures
separately).

flakytest.Mark stays as an opt-in API: still useful for tracking a
known-flaky test against a real issue and for TS_SKIP_FLAKY_TESTS.

Updates tailscale/corp#38960

Change-Id: I56dfc9b023486d239f60793a53e9690578ce8017
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-06-01 11:07:56 -07:00
Simon Law
2ee9eacb94 client/local,ipn/localapi: add /localapi/v0/routecheck endpoint (#19640)
In order to support a `tailscale routecheck` command, we introduce the
`/localapi/v0/routecheck` endpoint to the local API. This endpoint
returns the most recent report collected by the routecheck client.
If `force=true` is an argument in the query string, then this endpoint
will actively probe before returning the report.

Updates #17366
Updates tailscale/corp#33033

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-06-01 11:06:14 -07:00
Simon Law
28801674a6 net/routecheck: introduce new package for checking peer reachability (#19639)
The routecheck package parallels the netcheck package, where the
former checks routes and routers while the latter checks networks.
Like netcheck, it compiles reports for other systems to consume.

Historically, the client has never known whether a peer is actually
reachable. Most of the time this doesn’t matter, since the client will
want to establish a WireGuard tunnel to any given destination.
However, if the client needs to choose between two or more nodes,
then it should try to choose a node that it can reach.

Suggested exit nodes are one such example, where the client filters
out any nodes that aren’t connected to the control plane. Sometimes an
exit node will get disconnected from the control plane: when the
network between the two is unreliable or when the exit node is too
busy to keep its control connection alive. In these cases, Control
disables the Node.Online flag for the exit node and broadcasts this
across the tailnet. Arguably, the client should never have relied on
this flag, since it only makes sense in the admin console.

This patch implements an initial routecheck client that can probe
every node that your client knows about. You should not ping scan your
visible tailnet; this method is for debugging only.

This patch also introduces a new OnNetMapToggle hook, which fires when
the netmap transitions from nil to non-nil, or vice versa. This
happens either when the client receives its first MapResponse after
connecting to the control plane, or when it clears the netmap while it
is disconnecting. Routecheck uses this to wait for a valid netmap
so it knows which peers to probe.
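
The toggle semantics can be illustrated with a small sketch. The types NetMap and toggleWatcher and the method set are hypothetical stand-ins, not the real hook plumbing; the point is only that the callback fires on nil to non-nil transitions and back, not on every netmap update.

```go
package main

import "fmt"

// NetMap is a stand-in for the real network map type.
type NetMap struct{ Peers []string }

// toggleWatcher calls onToggle only when the netmap changes between
// nil and non-nil, in either direction.
type toggleWatcher struct {
	had      bool
	onToggle func(present bool)
}

// set records a new netmap value and fires the hook on a transition.
func (w *toggleWatcher) set(nm *NetMap) {
	has := nm != nil
	if has == w.had {
		return // ordinary update or repeated nil: no toggle
	}
	w.had = has
	w.onToggle(has)
}

func main() {
	w := &toggleWatcher{onToggle: func(present bool) {
		fmt.Println("netmap present:", present)
	}}
	w.set(&NetMap{Peers: []string{"a"}})      // first MapResponse: fires
	w.set(&NetMap{Peers: []string{"a", "b"}}) // update: does not fire
	w.set(nil)                                // disconnecting: fires
}
```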

Updates #17366
Updates tailscale/corp#33033

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-06-01 10:33:08 -07:00
Brad Fitzpatrick
c086992f4f cmd/tailscale/cli: add whoami subcommand
Add a "tailscale whoami" subcommand that is equivalent to running
"tailscale whois $(tailscale ip -4)" but more ergonomic. It supports
the --json flag just like whois, and shares the WhoIsResponse
rendering code with whois.

Fixes #19907

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Change-Id: I8f33ba7a5608bab7dffa8213303beb5f345936d3
2026-05-28 10:49:17 -07:00
Alex Chan
9d126aec34 all: remove network lock references from private method names
Updates tailscale/corp#37904

Change-Id: I312d46d958209ca3d1152d1877fb91a57c91798d
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-05-28 18:00:36 +01:00
Alex Chan
446ae97491 ipn: improve --exit-node hostname error during startup
When parsing the `tailscale up --exit-node=ARG` argument, we try to
resolve hostnames by searching the list of peers. However, at startup,
the peer list is empty, causing hostname lookups to trivially fail with
an unhelpful "invalid value" error.

Improve the error message when the peer list is empty to inform the user
that hostnames cannot be resolved during startup, and advise them to use
the exit node's Tailscale IP address instead.

Also, clarify that hostnames must be peer hostnames, not arbitrary
hostnames.

Fixes #19882

Change-Id: I9390a427c2863d657cf46c5e33b43cb3c5363764
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-05-28 16:43:45 +01:00
dragondscv
4b8115bb2c cmd/containerboot: clamp MSS to PMTU for proxy group pods (#19686)
Single-pod ingress/egress proxies already called ClampMSSToPMTU when
setting up forwarding rules, but the proxy group (HA) code paths in
egressservices.go and ingressservices.go did not. This caused TCP
connections through proxy group pods to suffer from MSS/MTU mismatch
issues in environments where path MTU discovery is not working.

Add ClampMSSToPMTU calls in the egress sync loop (alongside the existing
EnsureSNATForDst call) and in addDNATRuleForSvc (alongside the existing
EnsureDNATRuleForSvc call), mirroring what the single-pod forwarding
rules already do.

Also add MSS clamping assertions to TestSyncIngressConfigs and track
ClampMSSToPMTU calls in FakeNetfilterRunner.

Fixes issue #19812 https://github.com/tailscale/tailscale/issues/19812.
Tracking internal ticket TSS-86326.

Signed-off-by: Jay Tung <ltung@crusoeenergy.com>
Co-authored-by: Jay Tung <ltung@crusoeenergy.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-28 12:57:38 +01:00
Brad Fitzpatrick
782c73bf41 cmd/containerboot: fix data race in TestContainerBoot
Parallel subtests share *ipn.Notify pointers (e.g. runningNotify).
When multiple subtests reached the same phase concurrently, they
all wrote to the shared notify's InitialStatus field without
synchronization, triggering the race detector.

Fix by shallow-copying *ipn.Notify before setting InitialStatus,
so each test iteration works on its own copy.
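
A minimal sketch of the shallow-copy pattern, using a stand-in notify struct rather than the real ipn.Notify (the field types here are assumptions). Each goroutine writes to its own copy, so the shared value is only ever read.

```go
package main

import (
	"fmt"
	"sync"
)

// notify is a simplified stand-in for *ipn.Notify.
type notify struct {
	State         string
	InitialStatus string
}

// withInitialStatus returns a shallow copy of shared with InitialStatus
// set, leaving the shared value untouched.
func withInitialStatus(shared *notify, status string) *notify {
	cp := *shared // copy the struct, not the pointer
	cp.InitialStatus = status
	return &cp
}

func main() {
	shared := &notify{State: "Running"}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			n := withInitialStatus(shared, fmt.Sprintf("subtest-%d", i))
			_ = n // each goroutine uses only its private copy
		}(i)
	}
	wg.Wait()
	fmt.Println("shared untouched:", shared.InitialStatus == "")
}
```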

Updates #19380

Change-Id: I9dd40037e02146166f006f4f7c1ddcc47adba191
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-27 18:40:03 -07:00
Brad Fitzpatrick
94af1b00fb cmd/testwrapper, tstest: move test sharding out of test code
Previously, sharding required tests to opt in by calling tstest.Shard,
which used a process-global counter to assign each test to a shard.
This had two problems: most tests didn't call it, so they ran on every
shard (defeating the purpose), and shard assignments were unstable
(depended on call order, so adding a test could reshuffle others).

Remove tstest.Shard and tstest.SkipOnUnshardedCI entirely. Instead,
have testwrapper implement sharding automatically for all tests: when
TS_TEST_SHARD=N/M is set, it uses "go list -json" (no compilation) to
find test source files, scans them for top-level Test/Benchmark/
Example/Fuzz function names, and filters by fnv32a(name) % M == N-1.
The filtered names are passed as an anchored -run regex to go test.
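
The selection rule can be sketched as below. The function names inShard and runRegex are hypothetical, and the exact regex shape testwrapper emits is an assumption; only the fnv32a(name) % M == N-1 rule comes from the description above.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"regexp"
	"strings"
)

// inShard reports whether a test name belongs to 1-based shard n of m.
func inShard(name string, n, m uint32) bool {
	h := fnv.New32a()
	h.Write([]byte(name))
	return h.Sum32()%m == n-1
}

// runRegex builds an anchored pattern that matches exactly the given
// names, suitable for passing to "go test -run".
func runRegex(names []string) string {
	quoted := make([]string, len(names))
	for i, s := range names {
		quoted[i] = regexp.QuoteMeta(s)
	}
	return "^(" + strings.Join(quoted, "|") + ")$"
}

func main() {
	all := []string{"TestAlpha", "TestBeta", "BenchmarkGamma", "FuzzDelta"}
	const shards = 2
	for n := uint32(1); n <= shards; n++ {
		var mine []string
		for _, name := range all {
			if inShard(name, n, shards) {
				mine = append(mine, name)
			}
		}
		fmt.Printf("shard %d/%d: %s\n", n, shards, runRegex(mine))
	}
}
```

Because the assignment depends only on the name, adding a new test does not move existing tests to a different shard.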

Using go list instead of "go test -list" avoids linking the test binary
twice (Go's build cache does not cache test binary linking).

Fixes #19886

Change-Id: I62ab7b3d757324d4c5fd0b5de50c1e3742681791
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-27 16:53:17 -07:00
Brad Fitzpatrick
364b952d62 cmd/containerboot: track peers from IPN bus updates, stop using netmap.NetworkMap
Some tests in another repo were broken by tailscale/tailscale#19607.
This fixes them, by finishing off the rest of the migration away from
netmap.NetworkMap on the IPN bus in containerboot.

Containerboot used to rebuild a full NetworkMap-shaped view while
reacting to IPN bus notifications. Now it instead has its own
netmapState type (immutable) of exactly what it needs to track, and
sends those immutable values around, making cheap edits of new
immutable values when an IPN bus edit arrives.

This should make cmd/containerboot scale to much larger tailnets now too.

Fixes #19852
Fixes tailscale/corp#42347
Updates #12542

Change-Id: I88adaf061f85f677f954a764935e6654329d75a6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-27 14:12:48 -07:00
Patrick O'Doherty
8501be1990 go.mod: bump dependencies to resolve govulncheck warnings (#19884)
Bump the following:
  go get -u github.com/moby/spdystream@v0.5.1
  go get -u golang.org/x/crypto@v0.52.0
  go get -u golang.org/x/net@v0.55.0

to resolve open govulncheck warnings.

Updates #cleanup

Signed-off-by: Patrick O'Doherty <patrick@tailscale.com>
2026-05-27 12:24:59 -07:00
Jordan Whited
4aef023765 cmd/tailscaled,types/logger: remove TS_DEBUG_MEMORY and associated logger
Commit e5a8cf3b1 added feature/runtimemetrics, which emits heap bytes
and total process memory as clientmetrics when the
NodeAttrEmitRuntimeMetrics capability is set. That subsumes the job of
the TS_DEBUG_MEMORY envknob, whose only effect is to prefix every log
line with Go heap+stack and Maxrss via logger.RusagePrefixLog.

Updates tailscale/corp#39434

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2026-05-27 09:09:05 -07:00
Artem Leshchev
5652b6c9c0 cmd/k8s-operator: fix token exchange for identity federation (#19845)
tailscale-client-go-v2 natively supports identity federation authentication,
and in #19010 the required authentication provider is used, but the manual
token exchange was never removed, so we were exchanging a JWT token for an auth
token and then trying to use that auth token for the exchange once again.
This commit removes the legacy mechanism, fully relying on
tailscale-client-go-v2 to handle authentication.

Fixes #19844

Signed-off-by: Artem Leshchev <matshch@avride.ai>
2026-05-27 16:45:07 +01:00
Jason Dillingham
0e2b3f31af cmd/k8s-operator: stabilize StaticEndpoints order in ProxyGroup reconciles (#19755)
findStaticEndpoints built its return slice by iterating nodes.Items in
the order returned by r.List, which is not guaranteed to be stable
across calls. When the resulting set of addresses already matched the
existing config Secret, the slice could still permute between
reconciles, making the marshalled config Secret differ byte-for-byte.
That tripped the DeepEqual check on the config Secret, which rewrote
the Secret, which fired a watch event, which re-enqueued the
ProxyGroup, looping forever.

Detect this case and return the existing currAddrs slice unchanged
when the resulting set is the same, preserving the "use the currently
used IPs first" intent without spurious writes.
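
A rough sketch of the idea, with plain strings standing in for the endpoint addresses. The function names stableEndpoints and sameSet are hypothetical and the real findStaticEndpoints has more inputs; this only shows how returning the existing slice on set equality keeps the serialized output byte-for-byte stable.

```go
package main

import "fmt"

// toSet collects the distinct values of s.
func toSet(s []string) map[string]struct{} {
	m := make(map[string]struct{}, len(s))
	for _, v := range s {
		m[v] = struct{}{}
	}
	return m
}

// sameSet reports whether a and b contain the same distinct values,
// ignoring order.
func sameSet(a, b []string) bool {
	as, bs := toSet(a), toSet(b)
	if len(as) != len(bs) {
		return false
	}
	for k := range as {
		if _, ok := bs[k]; !ok {
			return false
		}
	}
	return true
}

// stableEndpoints returns curr unchanged when found is merely a
// permutation of it, so callers do not see a spurious difference.
func stableEndpoints(curr, found []string) []string {
	if sameSet(curr, found) {
		return curr
	}
	return found
}

func main() {
	curr := []string{"10.0.0.1:3478", "10.0.0.2:3478"}
	found := []string{"10.0.0.2:3478", "10.0.0.1:3478"} // same set, new order
	fmt.Println(stableEndpoints(curr, found))
}
```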

Fixes #19700

Signed-off-by: Jason Dillingham <jasonmdillingham@gmail.com>
2026-05-27 14:28:04 +01:00
Erisa A
e2a0d45418 cmd/tailscale/cli: fix time parsing in debug daemon-logs (#19875)
Fixes #19874

Signed-off-by: Erisa A <erisa@tailscale.com>
2026-05-27 12:30:28 +01:00
BeckyPauley
0ed6da2826 cmd/k8s-operator, net/netutil: support 4via6 in egress proxy and connector (#19863)
Add support for configuring egress to destinations reachable via 4via6
subnet routes. This change affects the standalone egress proxy only; the egress
ProxyGroup needs IPv6 support before it can support 4via6. Egress may
be configured using either the synthesized 4via6 address or the MagicDNS
name (in the form
<IPv4-address-with-hyphens-instead-of-dots>-via-<siteid>[.*]).

Also update the Connector to validate and advertise 4via6 subnet routes.
Export net/netutil.ValidateViaPrefix so it can be reused by the Connector
validation logic.

Updates #19334

Signed-off-by: Becky Pauley <becky@tailscale.com>
2026-05-27 10:54:35 +01:00
Jordan Whited
e5a8cf3b18 control/controlknobs,feature/*,ipn/ipnlocal,tailcfg: add runtimemetrics
Emit runtime metrics as clientmetrics when the
NodeAttrEmitRuntimeMetrics NodeCapability is present.

We start small with just 2 metrics: heap bytes and total process memory.

Updates tailscale/corp#39434

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2026-05-26 16:02:01 -07:00
Simon Law
7dabebc691 net/traffic: switch rendezvous hashing from SHA256 to FNV-1a (#19821)
In PR tailscale/corp#30448, we originally decided to break ties using
SHA256 for our rendezvous hashing algorithm. Now that we’ve had some
experience with it, we think that FNV-1a is a better choice. It
distributes bits evenly, it’s much faster, and it doesn’t need to be
cryptographically secure. The FNV designers recommend FNV-1a over the
deprecated FNV-1.
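
For readers unfamiliar with rendezvous hashing, here is a generic sketch using FNV-1a from the Go standard library. It is not the net/traffic implementation: how the real code combines the key with the node identity, and the names score and pick, are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// score hashes the (key, node) pair with 64-bit FNV-1a. A zero byte
// separates the two parts so that ("ab", "c") and ("a", "bc") differ.
func score(key, node string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	h.Write([]byte{0})
	h.Write([]byte(node))
	return h.Sum64()
}

// pick returns the node with the highest score for key. Equal scores
// are broken by node name, so the result never depends on input order.
func pick(key string, nodes []string) string {
	best, bestScore := "", uint64(0)
	for i, n := range nodes {
		s := score(key, n)
		if i == 0 || s > bestScore || (s == bestScore && n < best) {
			best, bestScore = n, s
		}
	}
	return best
}

func main() {
	nodes := []string{"exit-a", "exit-b", "exit-c"}
	fmt.Println(pick("client-1", nodes))
}
```

Since the winner is determined only by per-node scores, removing a node that was not chosen leaves the pick unchanged, which is the stability property that makes the choice of hash function visible in tests.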

This PR makes the switch and updates the related tests, since changing
the algorithm changes which stable pick gets selected. As of 2026-05,
this is the best time to make this change, since there are almost no
clients in the wild with traffic steering enabled.

Updates #17366
Updates tailscale/corp#29964
Updates tailscale/corp#29966
Updates tailscale/corp#33033

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-05-21 10:11:59 -07:00
Brad Fitzpatrick
aa5da2e5f2 ipn/ipnlocal, control/controlclient: process node adds/removes in constant time
For large tailnets (~50k+ nodes) with frequent peer churn (ephemeral
GitHub Actions workers etc.), tailscaled used to rebuild the full
netmap and fan it out on the IPN bus on every MapResponse that
added or removed a peer. There were two O(N) costs per delta: the
full netmap rebuild + every Notify.NetMap encode to every bus watcher.

This change tackles both:

  1. Plumb O(1) peer add/remove through the delta path. PeersChanged
     and PeersRemoved no longer prevent the delta happy path; instead,
     they mutate the per-node-backend peer map in place.

  2. Restrict ipn.Notify.NetMap emission to the platforms whose host
     GUIs still depend on it (Windows, macOS, iOS) and migrate
     in-tree consumers off it everywhere else:

     - Migrate reactive consumers (containerboot, kube agents,
       sniproxy, tsconsensus, etc.) off Notify.NetMap to the
       previously-added Notify.SelfChange signal so they no longer
       have to subscribe to the full netmap.
     - Add ipn.NotifyNoNetMap so GUI clients on "legacy-emit" platforms
       that have already migrated can opt out of the per-watcher
       NetMap encode.
     - Gate Notify.NetMap emission on the producer side by a compile-
       time GOOS check, so the supporting code is dead-code-eliminated
       on Linux and other geese where no GUI consumer needs it.

Re-running BenchmarkGiantTailnet from tstest/largetailnet, which was
added along with baseline numbers on unmodified main in ad5436af0d,
the per-delta cost (one peer add+remove pair) is now ~O(1) regardless
of tailnet size N:

    N         no-watcher (ms/op)            bus-watcher (ms/op)
              before    now     factor      before    now     factor
     10000        32   0.11       300x         166   0.13      1300x
     50000       222   0.11      2000x         865   0.13      6700x
    100000       504   0.12      4100x        1765   0.13     13400x
    250000      1551   0.12     12500x        4696   0.15     32400x

Updates #12542

Change-Id: I94e34b37331d1a8ec74c299deffadf4d061fda9e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-21 09:26:19 -07:00
Simon Law
7ebca58042 net/traffic,ipn/ipnlocal: extract traffic steering utilities (#19682)
The traffic package contains helpers for evaluating traffic steering
scores and picking appropriate nodes. These were extracted from
ipnlocal.suggestExitNodeUsingTrafficSteering so they can be reused by
the new routecheck package to probe exit nodes in priority order.

Updates #17366
Updates tailscale/corp#33033

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-05-21 08:28:27 -07:00
Aria Stewart
61277e3ad4 Construct IPv6 ingress URLs correctly
Fixes #19338

Signed-off-by: Aria Stewart <aredridel@dinhe.net>
2026-05-20 17:21:35 -07:00
Brad Fitzpatrick
04ae61fe4b tstest/integration/jswasmtest: add headless-Chromium tests for @tailscale/connect
Add Go tests that drive a real headless Chromium (via chromedp) against
the built cmd/tsconnect/pkg/ artifact and verify the @tailscale/connect
public API surface end-to-end. The package has not been republished in
three years, in part because no test exercises the produced artifact at
runtime — only tsc --noEmit and a Go build run in CI.

TestCreateIPN loads pkg.js into the browser, calls createIPN with a junk
auth key, and asserts that pkg.createIPN / pkg.runSSHSession are
functions and that createIPN() returns an IPN with the documented
run/login/logout/ssh/fetch methods. No control-plane traffic.

TestFetchTailnetPeer stands up a full local tailnet (testcontrol +
DERP + a tsnet.Server peer) and verifies that the browser-side WASM
client can join over WebSocket-noise to the same control, connect to
DERP over WSS, and then ipn.fetch() an HTTP service hosted on the tsnet
peer through the tailnet. The test asserts the response body matches a
known string. Browser state transitions are logged: NoState -> NeedsLogin
-> Starting -> Running.

Tests are opt-in via --run-headless-browser-tests (matching the existing
--run-vm-tests pattern in tstest/natlab/vmtest) so they never fire in
casual `go test ./...` runs. When the flag is set, a test is skipped if
cmd/tsconnect/pkg/ has not been built, and fails with t.Error if no
chromium binary is found on $PATH (honoring $CHROME_BIN as an override).
findChromium also falls back to /Applications/Google Chrome.app and
/Applications/Chromium.app on darwin, since macOS Chrome's executable
lives inside an .app bundle and is not on $PATH by default. The
.github/workflows/test.yml wasm job is extended to install
google-chrome-stable and run the tests with the flag after build-pkg.

To prevent silently testing a stale pkg/main.wasm (built from an older
checkout than the rest of the test invocation), build-pkg now writes
pkg/build-info.json recording the sha256 of the raw (pre-wasm-opt)
go-build output. The test does its own `go build` of
cmd/tsconnect/wasm with the same -tags/-trimpath/-ldflags (factored
into a new cmd/tsconnect/wasmbuild package shared by both call sites)
and t.Fatalfs with a "rebuild" instruction on mismatch. Cost is
near-zero because the Go build cache from the prior build-pkg makes
the rebuild a cache hit.

The new wasmbuild package also replaces cmd/tsconnect's hardcoded -tags
string with a minimal-feature-set computation. wasmbuild.Keep names the
small set of feature/featuretags entries the browser client actually
needs (netstack, logtail, dns, health, c2n, ipnbus); wasmbuild.Tags()
emits a ts_omit_<f> for every other
omittable feature in feature/featuretags.Features, with transitive deps
expanded via featuretags.Requires. An init() panics if Keep references
a feature unknown to feature/featuretags so a rename there fails
loudly. Net effect on size: 32M raw / 9.4M brotli before this change,
25M raw / 4.4M brotli after — vs the last-published 1.39.98 at 21M /
3.8M. The transitive package-import graph is unchanged (176
tailscale.com/* packages either way): featuretags omits eliminate
dead code via `const HasX = false`, not imports. Trimming the import
graph would require a separate, larger refactor splitting interface
packages by build tag.

Writing TestFetchTailnetPeer surfaced several real issues, all fixed
here:

  * cmd/tsconnect built the wasm with the nethttpomithttp2 tag, but
    control/ts2021 (since commit 1d93bdce2, "control/controlclient:
    remove x/net/http2, use net/http", Oct 2025) requires HTTP/2 from
    net/http's bundled implementation. With nethttpomithttp2 set, the
    bundle is excluded and the wasm client cannot speak HTTP/2 to any
    control plane, including production. Drop the tag. Wasm size grows
    ~1 MB raw / ~300 KB brotli (more than offset by the feature
    pruning above). The last published @tailscale/connect (1.39.98,
    early 2023) pre-dates the regression, which is why no consumer has
    reported the breakage.

  * tstest/integration/testcontrol.Server's /ts2021 noise upgrade
    endpoint rejected anything but POST. WebSocket clients (the only
    transport available to browser-WASM) come in as GET. Allow both;
    the controlhttp AcceptHTTP path dispatches on the Upgrade header,
    so the websocket library still enforces GET for WS upgrades.
    This matches production, where the same controlhttpserver.AcceptHTTP
    routes purely on the Upgrade header without checking method.

  * derp/derphttp's urlString built the DERP URL from node.HostName
    only, dropping node.DERPPort. Non-WS clients use a separate code
    path (connectToHost) that honors DERPPort, but WebSocket-only
    clients (browser-WASM) went through urlString and so could not
    reach a DERP running on any port other than 443. Include the port
    when it differs from the scheme default.
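
The last fix can be sketched as follows. The function derpURL and its parameters are hypothetical, the /derp path is an assumption, and the host is assumed to be a DNS name rather than an IP literal; the sketch only shows the rule of appending the port when it differs from the scheme default.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// derpURL builds a URL for a DERP node, including the port only when it
// is set and differs from the default for the scheme.
func derpURL(useHTTPS bool, hostName string, port int) string {
	scheme, defaultPort := "https", 443
	if !useHTTPS {
		scheme, defaultPort = "http", 80
	}
	host := hostName
	if port != 0 && port != defaultPort {
		host = net.JoinHostPort(hostName, strconv.Itoa(port))
	}
	return fmt.Sprintf("%s://%s/derp", scheme, host)
}

func main() {
	fmt.Println(derpURL(true, "derp.example.com", 443))
	fmt.Println(derpURL(true, "derp.example.com", 8443))
}
```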

Also move addWebSocketSupport from cmd/derper (where it was main-only)
to derp/derpserver.AddWebSocketSupport so tstest/integration.RunDERPAndSTUN
can wrap its DERP handler with WebSocket support — without that, the
test DERP would not accept the browser's wss connection.

Fixes #9394

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Change-Id: Iff9cdee303e3b239924249b5bffb2fd04e02f391
2026-05-20 10:48:29 -07:00
Brad Fitzpatrick
95d874e9b4 cmd/testwrapper: surface race reports and skip retries when detected
A data race in a package matters more than any individual test
result. Two related problems:

1. Where go test's race detector text ("WARNING: DATA RACE" plus
   the goroutine stack traces) lands in JSON output is timing-
   dependent: it can be attributed to a test that ends up reporting
   PASS (e.g. when the racing goroutines outlive the test that
   spawned them and TSan prints during a different test's window).
   testwrapper's main loop only flushes the logs of failed tests,
   so the race report ends up stuck in a passing test's buffer and
   is silently dropped. The race builders just see a bare
   "FAIL\nFAIL\tpkg\ttime".

2. If the failing test in such a package happens to be marked flaky,
   testwrapper retries it. That is the worst possible response to a
   race: the flaky test might not even be the racy code, and a
   second run without the racy goroutines could "succeed" while
   hiding the real bug.

Address both: scan every output line for the race detector's first-
line marker. Track whether the package observed a race at all, on
the pkgFinished testAttempt. When a race was seen, fold every per-
test log buffer into the package-level logs (so the full report
surfaces from the existing pkg-fail flush path), and drop any
flaky-test retry plans for that package so we fail immediately
instead of running another attempt.
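
A simplified sketch of the decision described above. The names sawRace and shouldRetry are hypothetical, and the real testwrapper works on a stream of test2json events rather than a single string; the marker text is the one quoted in this commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// raceMarker is the first line of a Go race detector report.
const raceMarker = "WARNING: DATA RACE"

// sawRace scans every output line of a package for the race marker,
// regardless of which test the line was attributed to.
func sawRace(pkgOutput string) bool {
	for _, line := range strings.Split(pkgOutput, "\n") {
		if strings.Contains(line, raceMarker) {
			return true
		}
	}
	return false
}

// shouldRetry allows a retry of a flaky-marked failure only when the
// package did not observe a race.
func shouldRetry(markedFlaky bool, pkgOutput string) bool {
	return markedFlaky && !sawRace(pkgOutput)
}

func main() {
	out := "=== RUN   TestA\n==================\nWARNING: DATA RACE\nRead at ...\n--- PASS: TestA\n"
	fmt.Println("retry flaky test:", shouldRetry(true, out))
}
```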

Two new tests:
- TestRaceSuppressesFlakyRetry verifies that a flaky test alongside
  a racy test does NOT get retried.
- TestRaceAttributedToPassingTest verifies that a race attributed by
  test2json to a passing test still surfaces in the output.

Also add a corpus of captured raw test binary outputs under
cmd/testwrapper/testdata/, with one subdirectory per scenario,
documenting the six representative shapes that go test -race can
emit (race in test body, race in goroutines that outlive a test,
race forced into a later test, race in TestMain post-m.Run, and a
parallel-tests split-attribution case via a "=== NAME" redirect
line). See its README.md for details.

Fixes #19603

Change-Id: Ifbfcd67fb3b1882c4907bd9cb2d68a8b5a91dd54
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-19 21:21:05 -07:00
Brad Fitzpatrick
2b338dd6a8 wgengine, cmd/tailscaled, control/controlclient: remove Engine watchdog
The Engine watchdog wrapped every wgengine.Engine method call in a
goroutine with a 45s timeout and crashed the process on timeout. It
was added years ago to surface deadlocks during development, but the
underlying deadlocks have long since been fixed, and even when it did
fire it produced obscure stack traces (from inside the watchdog
goroutine, not the original caller) without buying much.

Audit of userspaceEngine's methods shows none have cyclic locking or
unbounded blocking now that ResetAndStop no longer loops waiting for
DERPs to drain (fa49009ee). The watchdog is dead weight; remove it
along with the TS_DEBUG_DISABLE_WATCHDOG escape hatch.

Updates #19759

Change-Id: Iba9d718fe1f8718a6631296e336b138c31b99ff1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-15 16:49:28 -07:00
Simon Law
5d1bf80597 feature/routecheck: add ts_omit_routecheck feature flag (#19638)
RouteCheck, which checks that overlapping routers are reachable, is
enabled by default for both tailscaled and tsnet.

Updates #17366
Updates tailscale/corp#33033

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-05-15 15:50:50 -07:00
Noel O'Brien
894ff5d8ee cmd/hello: split css and js into separate files (#19771)
Move the inline CSS and JS into separate files to be more friendly
to Content Security Policies. ServeHTTP is updated to serve these
assets from the '/static/' path.
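
A minimal sketch of serving such assets from '/static/' with the standard library follows. The file names, their contents, and the policy string are hypothetical, and an in-memory fstest.MapFS stands in for the real files so the example is self-contained.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"testing/fstest"
)

// staticFS stands in for the real asset files (names and contents are
// hypothetical).
var staticFS = fstest.MapFS{
	"style.css": {Data: []byte("body { margin: 0; }\n")},
	"hello.js":  {Data: []byte("console.log('hello');\n")},
}

// newMux serves assets under /static/ and a page that references them, so
// the page needs no inline CSS or JS and can carry a strict policy.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/static/", http.StripPrefix("/static/", http.FileServer(http.FS(staticFS))))
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Security-Policy", "default-src 'self'")
		io.WriteString(w, `<link rel="stylesheet" href="/static/style.css"><script src="/static/hello.js"></script>`)
	})
	return mux
}

// fetch performs an in-process GET and returns the status code and body.
func fetch(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest("GET", path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := fetch(newMux(), "/static/style.css")
	fmt.Print(code, " ", body)
}
```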

Updates tailscale/corp#32398

Signed-off-by: Noel O'Brien <noel@tailscale.com>
2026-05-15 09:37:22 -07:00
Alex Chan
0cb432ed84 all: update more references to Tailnet/Network Lock
Updates tailscale/corp#37904

Change-Id: I09e73b3248b9ddf86dafe33dfb621bd560f6596d
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-05-15 16:23:50 +01:00
Fernando Serboncini
2a06fb66d0 cmd/cloner: preserve nil-valued entries when cloning map (#19749)
The codegen path for map-of-slice-of-pointer fields skipped
nil-valued entries, which dropped the key from the map.

This broke how dns.Config.Routes uses nil values as sentinels.
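
The intended behavior can be sketched by hand as follows. This is not the generated code: the resolver type and cloneRoutes are hypothetical stand-ins for a map-of-slice-of-pointer field.

```go
package main

import "fmt"

// resolver is a hypothetical element type for this sketch.
type resolver struct{ Addr string }

// cloneRoutes deep-copies m. A key whose value is nil is kept with a nil
// value rather than skipped, because the nil may itself be meaningful.
func cloneRoutes(m map[string][]*resolver) map[string][]*resolver {
	if m == nil {
		return nil
	}
	out := make(map[string][]*resolver, len(m))
	for k, v := range m {
		if v == nil {
			out[k] = nil // preserve the key
			continue
		}
		cp := make([]*resolver, len(v))
		for i, p := range v {
			if p != nil {
				c := *p
				cp[i] = &c
			}
		}
		out[k] = cp
	}
	return out
}

func main() {
	src := map[string][]*resolver{
		"corp.example":    {{Addr: "10.0.0.53"}},
		"blocked.example": nil,
	}
	dst := cloneRoutes(src)
	_, ok := dst["blocked.example"]
	fmt.Println(len(dst), ok) // prints "2 true"
}
```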

Fixes #19730
Fixes #19732
Fixes #19746
Fixes #19744

Change-Id: Ic6400227f4ab21b3ca0e8c0eeecf9b83d145a9ab

Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
2026-05-14 10:30:59 -04:00
Brad Fitzpatrick
6b729795c3 derp/derpserver: use hashtriemap for peer lookup
Replace the process-global Server.mu lookup in the packet send hot path
with a global hashtriemap mirror of local clientSet entries. The
authoritative clients map remains guarded by Server.mu; clientsAtomic is
only a lock-free fast path for active local clients.

Misses, stale inactive client sets, duplicate accounting, and mesh
forwarding still fall back to lookupDestUncached. This avoids taking
Server.mu for the common local active-client send path, at the cost of
adding one global concurrent map that mirrors Server.clients for local
peers.
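
The shape of this design, a mutex-guarded authoritative map plus a concurrent mirror consulted first, can be sketched as below. This is only an illustration: sync.Map stands in for the hashtriemap, and the server, client, register, and lookup names are hypothetical rather than the derpserver API.

```go
package main

import (
	"fmt"
	"sync"
)

type client struct{ name string }

type server struct {
	mu      sync.Mutex
	clients map[string]*client // authoritative; guarded by mu
	fast    sync.Map           // string -> *client; mirror read without mu
}

// register updates both maps while holding mu, so the mirror never
// contains an entry the authoritative map lacks.
func (s *server) register(key string, c *client) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.clients == nil {
		s.clients = make(map[string]*client)
	}
	s.clients[key] = c
	s.fast.Store(key, c)
}

func (s *server) unregister(key string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.clients, key)
	s.fast.Delete(key)
}

// lookup tries the mirror first and takes mu only on a miss.
func (s *server) lookup(key string) (c *client, fastPath bool) {
	if v, ok := s.fast.Load(key); ok {
		return v.(*client), true
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.clients[key], false
}

func main() {
	var s server
	s.register("peer1", &client{name: "peer1"})
	c, fast := s.lookup("peer1")
	fmt.Println(c.name, fast)
	s.unregister("peer1")
	c, fast = s.lookup("peer1")
	fmt.Println(c == nil, fast)
}
```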

The benchmark uses four destination peers. The before run sets
TS_DEBUG_DERP_DISABLE_PEER_HASHTRIE=true to force the old mutex lookup
path; the after run uses the hashtrie fast path.

    goos: linux
    goarch: amd64
    pkg: tailscale.com/derp/derpserver
    cpu: Intel(R) Xeon(R) 6975P-C
                          │    before     │                after                │
                          │    sec/op     │   sec/op     vs base                │
    LookupDestHashTrie-16   176.050n ± 1%   1.904n ± 6%  -98.92% (p=0.000 n=10)

                          │   before   │             after              │
                          │    B/op    │    B/op     vs base            │
    LookupDestHashTrie-16   0.000 ± 0%   0.000 ± 0%  ~ (p=1.000 n=10) ¹
    ¹ all samples are equal

                          │   before   │             after              │
                          │ allocs/op  │ allocs/op   vs base            │
    LookupDestHashTrie-16   0.000 ± 0%   0.000 ± 0%  ~ (p=1.000 n=10) ¹
    ¹ all samples are equal

Updates #3560 (very indirectly, historically)
Updates #19713 (as an alternative to that PR)

Change-Id: Ifb72e5c9854ad00e938cd24c6ab9c27312f297e8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-12 16:08:16 -07:00
DeedleFake
ad8ead9c94 cmd/tailscale/cli: add RunWithContext
Fixes #12778

Change-Id: If9f8b299cef0cb68f93b344845b5c6a5b7554d2c
Signed-off-by: DeedleFake <deedlefake@users.noreply.github.com>
2026-05-12 12:27:55 -07:00
Francois Marier
ead5ce65a3 cmd/pgproxy: fix client TLS handshake timeout
There is a 30-second timeout set on client TLS connections, but the handshake was
called on the wrong connection, so the timeout was never used in practice.
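
One way to keep the deadline and the handshake on the same connection is sketched below. This is not the pgproxy code: handshakeWithTimeout and demo are hypothetical helpers, and a net.Pipe with a silent peer stands in for a client that never completes the handshake.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net"
	"time"
)

// handshakeWithTimeout wraps raw in a server-side TLS connection, then sets
// the deadline and runs the handshake on that same *tls.Conn.
func handshakeWithTimeout(raw net.Conn, cfg *tls.Config, timeout time.Duration) (*tls.Conn, error) {
	tc := tls.Server(raw, cfg)
	if err := tc.SetDeadline(time.Now().Add(timeout)); err != nil {
		return nil, err
	}
	if err := tc.Handshake(); err != nil {
		return nil, err
	}
	// Clear the deadline so it does not apply to later reads and writes.
	if err := tc.SetDeadline(time.Time{}); err != nil {
		return nil, err
	}
	return tc, nil
}

// demo handshakes against a peer that never sends a ClientHello and
// returns the resulting error.
func demo(timeout time.Duration) error {
	srv, cli := net.Pipe()
	defer srv.Close()
	defer cli.Close()
	_, err := handshakeWithTimeout(srv, &tls.Config{}, timeout)
	return err
}

func main() {
	start := time.Now()
	err := demo(50 * time.Millisecond)
	fmt.Println("handshake error:", err, "after", time.Since(start).Round(10*time.Millisecond))
}
```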

Signed-off-by: Francois Marier <francois@fmarier.org>
2026-05-11 11:12:11 -07:00
Brad Fitzpatrick
87a74c3aa2 tsnet: make workload identity federation opt-in
The tailscale.com/wif package brings in the AWS SDK
(github.com/aws/aws-sdk-go-v2/{config,sts,...} and github.com/aws/smithy-go)
to support fetching ID tokens from AWS IMDS for workload identity
federation. Until now, tsnet pulled this in unconditionally via
feature/condregister/identityfederation, costing ~70 unwanted deps for
every tsnet program whether or not it uses workload identity federation.

These AWS SDK deps were originally removed from tsnet on 2025-09-29 by
commit 69c79cb9f ("ipn/store, feature/condregister: move AWS + Kube
store registration to condregister"). They were then accidentally added
back on 2026-01-14 by commit 6a6aa805d ("cmd,feature: add identity
token auto generation for workload identity", PR #18373) when the new
wif package was wired into tsnet via feature/identityfederation.

Drop the blanket import. tsnet programs that want workload identity
federation now opt in with:

    import _ "tailscale.com/feature/identityfederation"

The hook lookup in resolveAuthKey already uses GetOk and degrades
gracefully when the feature isn't linked, so existing programs that
don't use workload identity federation see no behavior change. The
tailscale CLI still imports the condregister wrapper directly, so its
behavior is also unchanged.
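
The opt-in mechanism can be pictured with a small hook type like the one below. It is a sketch under assumptions: hook, exchangeFunc, and this resolveAuthKey are hypothetical and only mimic the GetOk pattern described above, including an assumed fallback of returning no key when the feature is not linked.

```go
package main

import "fmt"

// hook is a hypothetical optional-feature slot. A feature package would
// call Set from its init function when it is linked into the binary.
type hook[T any] struct {
	fn T
	ok bool
}

func (h *hook[T]) Set(fn T)         { h.fn, h.ok = fn, true }
func (h *hook[T]) GetOk() (T, bool) { return h.fn, h.ok }

// exchangeFunc is a hypothetical signature for exchanging an identity for
// an auth key.
type exchangeFunc func(clientID string) (string, error)

// resolveAuthKey prefers an explicit key. Otherwise it uses the hook if
// one is registered and returns no key if it is not (assumed fallback).
func resolveAuthKey(h *hook[exchangeFunc], authKey, clientID string) (string, error) {
	if authKey != "" {
		return authKey, nil
	}
	exchange, ok := h.GetOk()
	if !ok {
		return "", nil // feature not linked; nothing to exchange
	}
	return exchange(clientID)
}

func main() {
	var h hook[exchangeFunc]
	key, err := resolveAuthKey(&h, "", "client-1")
	fmt.Printf("%q %v\n", key, err)

	h.Set(func(id string) (string, error) { return "token-for-" + id, nil })
	key, err = resolveAuthKey(&h, "", "client-1")
	fmt.Printf("%q %v\n", key, err)
}
```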

Lock this in with TestDeps additions: tailscale.com/wif as a BadDep,
plus substring checks in OnDep that fail on any github.com/aws/ or
k8s.io/ dependency creeping back in.
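
A dependency guard of this kind boils down to an exact-match deny list plus substring checks. The sketch below assumes the dependency list is already available as a slice of import paths; checkDeps and its parameters are hypothetical, not the in-tree test helper.

```go
package main

import (
	"fmt"
	"strings"
)

// checkDeps returns a description of every dependency that is either in
// badDeps (exact import path -> reason) or contains a forbidden substring.
func checkDeps(deps []string, badDeps map[string]string, badSubstrings []string) []string {
	var bad []string
	for _, dep := range deps {
		if why, ok := badDeps[dep]; ok {
			bad = append(bad, dep+": "+why)
			continue
		}
		for _, sub := range badSubstrings {
			if strings.Contains(dep, sub) {
				bad = append(bad, dep+": matches "+sub)
				break
			}
		}
	}
	return bad
}

// demo runs checkDeps over a small, made-up dependency list.
func demo() []string {
	deps := []string{
		"net/http",
		"tailscale.com/wif",
		"github.com/aws/smithy-go",
	}
	badDeps := map[string]string{"tailscale.com/wif": "should be opt-in"}
	return checkDeps(deps, badDeps, []string{"github.com/aws/", "k8s.io/"})
}

func main() {
	for _, line := range demo() {
		fmt.Println(line)
	}
}
```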

Also, switch cmd/gitops-pusher from the condregister wrapper to a
direct import of feature/identityfederation: gitops-pusher's auth flow
calls HookExchangeJWTForTokenViaWIF directly, so it shouldn't be
subject to the ts_omit_identityfederation build tag.

Updates #12614

Change-Id: I70599f2bdd4d3666b26a859d5b76caa5d6b94507
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-06 18:43:45 -07:00
Tom Proctor
b74eeda055 cmd/testwrapper: print unit for package duration (#19663)
Include the unit (s) when printing the time taken to test each package.

Updates #cleanup

Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2026-05-06 22:31:48 +01:00
Alex Chan
eac531da8e cmd/tailscale/cli: unhide --report posture flag in up
The flag was originally hidden during the beta period in both `up` and
`set`. When device posture went GA, we unhid the flag in `set` but not in
`up`.

This is confusing for users, because an error message can direct them to
run `tailscale up` with this flag if they've set it previously, but the
help text won't tell them what it does.

Updates #5902
Updates #17972

Change-Id: I9a31946f4b3bb411feed0f5a6449d7ff9a5ba9d3
Signed-off-by: Alex Chan <alexc@tailscale.com>
2026-05-05 10:12:36 +01:00
Brad Fitzpatrick
883d4fd2cd wgengine/netstack, net/ping: stop using pro-bing and use our net/ping instead
Fixes #19633
Fixes #13760

Change-Id: I0fa9423523a3a0fb1dfcde57de0f26e51723ff97
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-04 14:05:24 -07:00
Brad Fitzpatrick
9bb7ca6116 cmd/vet/lowerell, drive/driveimpl: forbid variables named "l" or "I"
Add a new vet checker that rejects variables, parameters, named
return values, receivers, range/type-switch bindings, type
parameters, struct fields, and constants named "l" (lowercase ell)
or "I" (uppercase i). Both are hard to distinguish from the digit
"1" and from each other in too many fonts.
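
A simplified version of such a check can be written with go/parser and go/ast alone, as sketched below. It is not the actual checker: findConfusingNames is a hypothetical helper, it works on a single source string rather than as a vet analyzer, and because it inspects every ast.Field it would also flag interface method names.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

func confusing(name string) bool { return name == "l" || name == "I" }

// findConfusingNames parses src and returns "name:line" for each declared
// identifier named "l" or "I".
func findConfusingNames(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "x.go", src, 0)
	if err != nil {
		return nil, err
	}
	var out []string
	report := func(e ast.Expr) {
		if id, ok := e.(*ast.Ident); ok && confusing(id.Name) {
			out = append(out, fmt.Sprintf("%s:%d", id.Name, fset.Position(id.Pos()).Line))
		}
	}
	ast.Inspect(f, func(n ast.Node) bool {
		switch n := n.(type) {
		case *ast.ValueSpec: // var and const declarations
			for _, id := range n.Names {
				report(id)
			}
		case *ast.Field: // params, results, receivers, struct fields, type params
			for _, id := range n.Names {
				report(id)
			}
		case *ast.AssignStmt: // short variable declarations, type-switch bindings
			if n.Tok == token.DEFINE {
				for _, e := range n.Lhs {
					report(e)
				}
			}
		case *ast.RangeStmt: // range bindings
			if n.Tok == token.DEFINE {
				report(n.Key)
				report(n.Value)
			}
		}
		return true
	})
	return out, nil
}

const sample = `package p

type server struct{ l int }

func f(I int) {
	ln := 0
	l := ln
	_ = l
}
`

func main() {
	found, err := findConfusingNames(sample)
	fmt.Println(found, err) // prints "[l:3 I:5 l:7] <nil>"
}
```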

Rename the two pre-existing struct fields named "l" (both of type
net.Listener) in drive/driveimpl/drive_test.go to "ln", matching the
convention used elsewhere for net.Listener locals.

Rename the test-fixture struct fields "I" (single int label) to
"Int" in metrics/multilabelmap_test.go and util/deephash/deephash_test.go,
preserving the "first letters of types" convention used alongside
neighboring fields like I8/I16/U/U8.

Also teach pkgdoc_test.go to skip testdata/ directories, which
the go tool ignores; they are not real packages.

Fixes #19631

Change-Id: I71ad2fa990705f7a070406ebcdb8cefa7487d849
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-04 14:03:28 -07:00
Tom Meadows
ee10f9881c cmd/k8s-operator: add authkey reissuing to recorder reconciler (#19556)
Also fixes a memory leak in the authKeyReissuing map on ProxyGroup
reconciler authkey reissue.

Updates #19311

Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
2026-05-01 18:26:55 +01:00
Andrew Lytvynov
f15a4f4416 client/web: move API permission checks into handlers (#19576)
There are only a couple endpoints that check peer capabilities. Keeping
permission checks with the code that assumes they were performed, rather
than with the routing layer, feels easier to reason about.

Check that the caller is actually a peer and pass their capabilities via
a context value for handlers that want to check them.
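
The general pattern, a wrapper that verifies the caller and stores their capabilities in the request context, with each handler doing its own check, might look like this. All names here (peerCaps, withPeerCaps, handleSSHToggle, the "ssh" capability) are hypothetical; the lookup function stands in for real peer identification.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

type capsKey struct{}

// peerCaps is a hypothetical set of capability names.
type peerCaps map[string]bool

// withPeerCaps rejects callers that are not peers and otherwise stores the
// caller's capabilities in the request context.
func withPeerCaps(lookup func(*http.Request) (peerCaps, bool), next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		caps, isPeer := lookup(r)
		if !isPeer {
			http.Error(w, "not a peer", http.StatusUnauthorized)
			return
		}
		ctx := context.WithValue(r.Context(), capsKey{}, caps)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// capsFrom returns the capabilities stored by withPeerCaps, or nil.
func capsFrom(ctx context.Context) peerCaps {
	caps, _ := ctx.Value(capsKey{}).(peerCaps)
	return caps
}

// handleSSHToggle performs its own permission check next to the code that
// relies on it.
func handleSSHToggle(w http.ResponseWriter, r *http.Request) {
	if !capsFrom(r.Context())["ssh"] {
		http.Error(w, "forbidden", http.StatusForbidden)
		return
	}
	fmt.Fprintln(w, "ok")
}

// statusFor runs one in-process request and returns the response status.
func statusFor(caps peerCaps, isPeer bool) int {
	lookup := func(*http.Request) (peerCaps, bool) { return caps, isPeer }
	h := withPeerCaps(lookup, http.HandlerFunc(handleSSHToggle))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest("POST", "/api/toggle", nil))
	return rec.Code
}

func main() {
	// prints "200 403 401"
	fmt.Println(statusFor(peerCaps{"ssh": true}, true), statusFor(peerCaps{}, true), statusFor(nil, false))
}
```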

Along with this, simplify the helper handler wrappers that are not
needed for most of the endpoints.

Updates #40851

Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
2026-05-01 09:01:53 -07:00
Brad Fitzpatrick
bbcb8650d4 cmd/tailscale/cli: fetch netmap via current-netmap debug action
Stop opening an IPN bus subscription with NotifyInitialNetMap purely to
read the current netmap once. Use the LocalAPI debug current-netmap
action (added in 159cf8707) instead, which returns the current netmap
synchronously without subscribing to the bus.

Updates #12542

Change-Id: I8aa2096d65aaea4dfe62634f03ce06b5470e0e51
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-01 07:53:51 -07:00
Brad Fitzpatrick
4c3ed5ab32 all: migrate code off Notify.NetMap to Notify.SelfChange
Move tailscaled's in-tree reactive users off of IPN bus Notify.NetMap
updates to the narrower Notify.SelfChange signal introduced earlier in
this series. Consumers that need additional state (peers, DNS config,
etc.) fetch it on demand via the LocalAPI.

It is a step toward the larger goal of not fanning Notify.NetMap out
to every bus watcher on Linux/non-GUI hosts.

A future change stops sending Notify.NetMap entirely on Linux and
non-GUI platforms. (Eventually, once macOS/iOS/Windows migrate to the
upcoming new Notify APIs, we'll remove ipn.Notify.NetMap entirely.)

Updates #12542

Change-Id: I51ea9d86bdca1909d6ac0e7d5bd3934a3a4e8516
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-01 06:51:40 -07:00
Brad Fitzpatrick
9f343fdc0c client/local, ipn/localapi, all: add CertDomains and DNSConfig accessors
Add two narrow LocalAPI accessors so callers don't have to subscribe to
the IPN bus and pull a full *netmap.NetworkMap just to read DNS-shaped
fields:

  - GET /localapi/v0/cert-domains returns DNS.CertDomains.
  - GET /localapi/v0/dns-config returns the full tailcfg.DNSConfig.

Migrate in-tree callers off the netmap-on-the-bus pattern:

  - kube/certs.waitForCertDomain still wakes on the IPN bus but now
    queries CertDomains via LocalClient.CertDomains rather than
    reading n.NetMap.DNS.CertDomains. The kube LocalClient interface
    and FakeLocalClient gain a CertDomains method.
  - cmd/tailscale dns status calls LocalClient.DNSConfig directly
    instead of opening a NotifyInitialNetMap watcher.
  - cmd/tailscale configure kubeconfig switches from a netmap watcher
    + serviceDNSRecordFromNetMap to LocalClient.DNSConfig +
    serviceDNSRecordFromDNSConfig.

This is part of a series moving callers away from depending on the
netmap traveling on the IPN bus, so the bus payload can shrink in a
later change.
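
The "wake on the bus, then query on demand" shape described for waitForCertDomain can be sketched as follows. The certDomainsClient interface, its method signature, and the wake channel are assumptions made for illustration; the real LocalClient method may differ.

```go
package main

import (
	"context"
	"fmt"
	"slices"
)

// certDomainsClient is a hypothetical narrow interface for this sketch.
type certDomainsClient interface {
	CertDomains(ctx context.Context) ([]string, error)
}

// waitForCertDomain re-queries the cert domains each time wake fires,
// instead of reading them out of a netmap carried in the notification.
func waitForCertDomain(ctx context.Context, c certDomainsClient, wake <-chan struct{}, domain string) error {
	for {
		domains, err := c.CertDomains(ctx)
		if err != nil {
			return err
		}
		if slices.Contains(domains, domain) {
			return nil
		}
		select {
		case <-wake: // something changed; ask again
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// fakeClient returns a scripted sequence of answers, repeating the last.
type fakeClient struct {
	answers [][]string
	calls   int
}

func (f *fakeClient) CertDomains(context.Context) ([]string, error) {
	i := min(f.calls, len(f.answers)-1)
	f.calls++
	return f.answers[i], nil
}

// demo simulates one bus wake-up between an empty and a populated answer.
func demo() (calls int, err error) {
	fc := &fakeClient{answers: [][]string{nil, {"node.example.ts.net"}}}
	wake := make(chan struct{}, 1)
	wake <- struct{}{}
	err = waitForCertDomain(context.Background(), fc, wake, "node.example.ts.net")
	return fc.calls, err
}

func main() {
	fmt.Println(demo()) // prints "2 <nil>"
}
```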

Updates #12542

Change-Id: Ie10204e141d085fbac183b4cfe497226b670ad6c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-30 13:50:46 -07:00