Commit Graph

474 Commits

Author SHA1 Message Date
Brad Fitzpatrick
aefb1531d1 net/tsdial, ipn/ipnlocal: stop using netmap.NetworkMap in Dialer
tsdial.Dialer.SetNetMap rebuilt an O(n peers) map of MagicDNS names on
every netmap change. As we move toward per-peer incremental deltas,
this becomes quadratic. This removes it and replaces it with
SetResolveMagicDNS, a callback into LocalBackend that looks up
hostnames from nodeBackend's new nodeByName index (populated alongside
nodeByAddr/nodeByKey on both full and delta paths). The index stores
both FQDNs and short names as keys.

This is the same treatment applied to netlog (8f210454d), wglog
(988b0905b), and drive (1d6989408): stop pushing *netmap.NetworkMap
into subsystems and instead have them pull from LocalBackend's live
data via callbacks.

Updates #12542

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Change-Id: I24557ab0c8a27636e08e4779bcfd3ec633db0a78
2026-06-24 13:14:45 -07:00
tsushanth
0b551986fe cmd/k8s-operator: scope HA Service hostname check per-tailnet (#20114)
The ProxyGroup HA Service reconciler's validateService scanned every
Service in the cluster with shouldExpose=true for duplicate hostnames.
With multi-tailnet (Tailnet CRD) support, that scan reaches across
tailnet boundaries:

  * A Service exposed via the single-proxy path (tailscale.com/expose)
    on the primary tailnet would block a ProxyGroup ingress Service
    for the same hostname on a secondary tailnet, even though the two
    live in different reconcilers and different tailnet DNS namespaces.

  * Two ProxyGroups joined to different tailnets via spec.tailnet
    would also block one another for shared hostnames, again despite
    living in separate DNS namespaces.

In both cases the ProxyGroup ingress Service was silently dropped
(IngressSvcInvalid event raised, queue cleared, ConfigMap never
written, ProxyGroup never serves the backend).

This change tightens the check in two ways:

  * Skip Services that aren't themselves managed by the ProxyGroup
    reconciler (use isTailscaleService instead of shouldExpose).
  * For ProxyGroup-managed Services attached to a different
    ProxyGroup, look up that ProxyGroup and skip the duplicate
    report when spec.Tailnet differs from the current one. Fall
    through and flag the collision on lookup failure so genuine
    duplicates are not silently allowed.

Adds regression tests covering both the single-proxy and the
different-tailnet cases. Updates the existing TestValidateService
expected error to reflect the rephrased message.

Updates #20069

Signed-off-by: tsushanth <78000697+tsushanth@users.noreply.github.com>
2026-06-23 14:25:11 +01:00
Anton Tolchanov
f442cda999 ipn/ipnlocal: consider all DERP regions for exit node recommendations
When recommending an exit node, suggestExitNodeLocked ranks candidates by
the latency to their home DERP region, taken from the most recent netcheck
report. But netcheck alternates between full reports, which probe every
region, and incremental reports, which only re-probe the home region and a
handful of the fastest regions. When the most recent report is incremental,
the suggestion fell back to a random for exit nodes that are far away.

Now we rank candidates against the best recent latency, tracked by the
`netcheck.Client` - the same data that is used to pick the preferred
DERP. It uses a history of measurements which includes a full netcheck
report, so should cover all DERP regions.

Updates tailscale/corp#17516

Signed-off-by: Anton Tolchanov <anton@tailscale.com>
2026-06-22 12:28:09 +02:00
BeckyPauley
35a1a413f9 cmd/{containerboot,k8s-operator}: add 4via6 support in singleton egress (#19983)
Add support for configuring egress to destinations reachable via 4via6
subnet routes, using either the synthesized 4via6 address or the MagicDNS
name (in the form <IPv4-with-hyphens>-via-<siteID>[.*]).

Also update the Connector to validate and advertise 4via6 subnet routes.

Export net/netutil.ValidateViaPrefix so it can be reused by the Connector
validation logic.

This change only affects standalone egress proxies — ProxyGroup egress
requires IPv6 support before it can use 4via6.

Updates #19334

Change-Id: I6faecd6eb61ab55fc0cd97fe417af6b6a12fe7fc

Signed-off-by: Becky Pauley <becky@tailscale.com>
2026-06-18 16:13:10 +01:00
Brad Fitzpatrick
8f210454dd wgengine/netlog: stop using netmap.NetworkMap type, use LocalBackend
The Logger previously took a *netmap.NetworkMap at Startup and on every
ReconfigNetworkMap call, denormalizing it into per-IP and self lookup
maps. That denormalization is O(n) over all peers and ran on every
netmap update, contributing to the broader quadratic behavior we want
to eliminate when a single peer is added or removed.

Instead, this makes netlog ask LocalBackend (well, nodeBackend) for
the info it needs, letting us remove the netmap.NetworkMap type
entirely from the netlog package.

This is a dependency to removing the netmap.NetworkMap type from
upstream callers, like wgengine.Engine in general.

Updates #12542

Change-Id: Ib5f2de96e788a667332c0a6f7ac833b3d0053b5c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-06-17 15:11:57 -07:00
Bobi Gunardi
ca20611d11 util: add parse fallback helpers (#20022)
util/def: add def.Bool and def.Duration default parse helpers

Replace multiple instances of def.Bool and def.Duration with a new util/def
package.

Updates #20018

Co-authored-by: Bobby <boby@codelabs.co.id>
Co-authored-by: Simon Law <sfllaw@tailscale.com>
Signed-off-by: Bobby <boby@codelabs.co.id>
Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-06-15 15:58:51 -07:00
David Bond
7fb6751ddd cmd/k8s-operator: rework [unexpected] log lines (#20065)
* cmd/k8s-operator: rework [unexpected] log lines

This commit modifies several places in the operator logs where we
prepend `[unexpected]` to instead use an appropriate logging level.

The `[unexpected]` prefix is intended to be used when the program
violates some internal invariant (or for example, a database has
become corrupted). Many of these cases were simply log lines that
then fell back to a default value/behaviour. These have been releveled
to warnings.

Some of these log lines also seemed extraeneous as for the example of
service reconcilers logging when there is no proxy group annotation. As
far as I can tell we've never had any predicates for limiting the
services reconciled to ones with that annotation, so they can just
be removed to reduce log spam.

Fixes: #cleanup

Signed-off-by: David Bond <davidsbond93@gmail.com>

* Update cmd/k8s-operator/egress-services-readiness.go

Co-authored-by: BeckyPauley <64131207+BeckyPauley@users.noreply.github.com>
Signed-off-by: David Bond <davidsbond@users.noreply.github.com>

* Update cmd/k8s-operator/operator.go

Co-authored-by: BeckyPauley <64131207+BeckyPauley@users.noreply.github.com>
Signed-off-by: David Bond <davidsbond@users.noreply.github.com>

---------

Signed-off-by: David Bond <davidsbond93@gmail.com>
Signed-off-by: David Bond <davidsbond@users.noreply.github.com>
Co-authored-by: BeckyPauley <64131207+BeckyPauley@users.noreply.github.com>
2026-06-11 14:48:48 +01:00
Brad Fitzpatrick
6ab5d91071 go.mod: bump some deps to match corp
Updates tailscale/corp#43243
Updaets #20067

Change-Id: I27e19f34e2216f3ac1a4e2a6b38c0ac473b8c7ad
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-06-10 21:29:22 -05:00
David Bond
e4ea65d32d cmd/k8s-operator: workload identity support for multi-tailnet (#20016)
This commit modifies the reconciler for the `Tailnet` custom resource
to allow referenced secrets to specify an `audience` field. If a
referenced secret contains both an `audience` and `client_id` we assume
the user's intention is to use workload identity.

In that case, we configure the tailscale API client to authenticate
using the Kubernetes token request API against the operator's service
account. This requires the operator to be aware of its own service
account name.

A small change has also been made to the messages added to the `Tailnet`
CRD's status field in the even that it is missing scopes to make it
clearer that certain scopes may not be applied.

Closes: #19090
Updates: #19471

Signed-off-by: David Bond <davidsbond93@gmail.com>
2026-06-10 10:22:19 +01:00
Anthony
819f3ba7c1 cmd/k8s-operator: allow custom annotations on deployment (#17143)
Fixes #17188

Signed-off-by: Anthony SCHWARTZ <antho.schwartz@gmail.com>
Signed-off-by: Anthony SCHWARTZ <anthony.schwartz@ext.ec.europa.eu>
2026-06-09 15:25:40 +01:00
Nick Khyl
c0d0621417 logpolicy,tsnet: remove syspolicy dependency
tsnet depends on logpolicy, which in turn depended on util/syspolicy
because of a single LogTarget policy setting it uses.

In this commit, we replace that dependency with a feature.Hook,
which only tailscaled or its platform-specific alternatives should set.

Updates #20031

Signed-off-by: Nick Khyl <nickk@tailscale.com>
2026-06-05 16:21:27 -05:00
BeckyPauley
98f1ac0880 cmd/k8s-operator, net/netutil: revert 4via6 changes (#19990)
Reverts support 4via6 in egress proxy and connector (#19863)

Updates #19334

Signed-off-by: Becky Pauley <becky@tailscale.com>
2026-06-03 20:20:36 +01:00
Simon Law
2d6844c565 cmd/tailscale/cli: add routecheck command (#19641)
Introduce a new `tailscale routecheck` command which prints a report
of high-availability routers that are reachable.

This command rhymes with the `tailscale netcheck` command and but
instead of reporting on local network conditions, `routecheck` reports
on remote connectivity.

Updates #17366
Updates tailscale/corp#33033

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-06-01 11:50:24 -07:00
Simon Law
2ee9eacb94 client/local,ipn/localapi: add /localapi/v0/routecheck endpoint (#19640)
In order to support a `tailscale routecheck` command, we introduce the
`/localapi/v0/routecheck` endpoint to the local API. This endpoint
returns the most recent report collected by the routecheck client.
If `force=true` is an argument in the query string, then this endpoint
will actively probe before returning the report.

Updates #17366
Updates tailscale/corp#33033

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-06-01 11:06:14 -07:00
Artem Leshchev
5652b6c9c0 cmd/k8s-operator: fix token exchange for identity federation (#19845)
tailscale-client-go-v2 natively supports identity federation authentication,
and in #19010 the required authentication provider is used, but the manual
token exchange was never removed, so we were exchanging JWT token to an auth
token, and then were trying to use that auth token for exchange once again.
This commit removes the legacy mechanism, fully relying on
tailscale-client-go-v2 to handle authentication.

Fixes #19844

Signed-off-by: Artem Leshchev <matshch@avride.ai>
2026-05-27 16:45:07 +01:00
Jason Dillingham
0e2b3f31af cmd/k8s-operator: stabilize StaticEndpoints order in ProxyGroup reconciles (#19755)
findStaticEndpoints built its return slice by iterating nodes.Items in
the order returned by r.List, which is not guaranteed to be stable
across calls. When the resulting set of addresses already matched the
existing config Secret, the slice could still permute between
reconciles, making the marshalled config Secret differ byte-for-byte.
That tripped the DeepEqual check on the config Secret, which rewrote
the Secret, which fired a watch event, which re-enqueued the
ProxyGroup, looping forever.

Detect this case and return the existing currAddrs slice unchanged
when the resulting set is the same, preserving the "use the currently
used IPs first" intent without spurious writes.

Fixes #19700

Signed-off-by: Jason Dillingham <jasonmdillingham@gmail.com>
2026-05-27 14:28:04 +01:00
BeckyPauley
0ed6da2826 cmd/k8s-operator, net/netutil: support 4via6 in egress proxy and connector (#19863)
Add support for configuring egress to destinations reachable via 4via6
subnet routes. This change affects standalone egress proxy only- egress
ProxyGroup needs IPv6 support before being able to support 4via6. Egress may
be configured using either the synthesized 4via6 address or the MagicDNS
name (in the form
<IPv4-address-with-hyphens-instead-of-dots>-via-<siteid>[.*]).

Also update the Connector to validate and advertise 4via6 subnet routes.
Export net/netutil.ValidateViaPrefix so it can be reused by the Connector
validation logic.

Updates #19334

Signed-off-by: Becky Pauley <becky@tailscale.com>
2026-05-27 10:54:35 +01:00
Simon Law
7dabebc691 net/traffic: switch rendezvous hashing from SHA256 to FNV-1a (#19821)
In PR tailscale/corp#30448, we originally decided to break ties using
SHA256 for our rendezvous hashing algorithm. Now that we’ve had some
experience with it, we think that FNV-1a is a better choice. It
distributes bits evenly, it’s much faster, and it doesn’t need to be
cryptographically secure. The FNV designers recommend FNV-1a over the
deprecated FNV-1.

This PR makes the switch and updates the related tests, since changing
the algorithm changes which stable pick gets selected. As of 2026-05,
this is the best time to make this change, since there are almost no
clients in the wild with traffic steering enabled.

Updates #17366
Updates tailscale/corp#29964
Updates tailscale/corp#29966
Updates tailscale/corp#33033

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-05-21 10:11:59 -07:00
Simon Law
7ebca58042 net/traffic,ipn/ipnlocal: extract traffic steering utilities (#19682)
The traffic package contains helpers for evaluating traffic steering
scores and picking appropriate nodes. These were extracted from
ipnlocal.suggestExitNodeUsingTrafficSteering so they can be reused by
the new routecheck package to probe exit nodes in priority order.

Updates #17366
Updates tailscale/corp#33033

Signed-off-by: Simon Law <sfllaw@tailscale.com>
2026-05-21 08:28:27 -07:00
Aria Stewart
61277e3ad4 Construct IPv6 ingress URLs correctly
Fixes #19338

Signed-off-by: Aria Stewart <aredridel@dinhe.net>
2026-05-20 17:21:35 -07:00
Brad Fitzpatrick
87a74c3aa2 tsnet: make workload identity federation opt-in
The tailscale.com/wif package brings in the AWS SDK
(github.com/aws/aws-sdk-go-v2/{config,sts,...} and github.com/aws/smithy-go)
to support fetching ID tokens from AWS IMDS for workload identity
federation. Until now, tsnet pulled this in unconditionally via
feature/condregister/identityfederation, costing ~70 unwanted deps for
every tsnet program whether or not it uses workload identity federation.

These AWS SDK deps were originally removed from tsnet on 2025-09-29 by
commit 69c79cb9f ("ipn/store, feature/condregister: move AWS + Kube
store registration to condregister"). They were then accidentally added
back on 2026-01-14 by commit 6a6aa805d ("cmd,feature: add identity
token auto generation for workload identity", PR #18373) when the new
wif package was wired into tsnet via feature/identityfederation.

Drop the blanket import. tsnet programs that want workload identity
federation now opt in with:

    import _ "tailscale.com/feature/identityfederation"

The hook lookup in resolveAuthKey already uses GetOk and degrades
gracefully when the feature isn't linked, so existing programs that
don't use workload identity federation see no behavior change. The
tailscale CLI still imports the condregister wrapper directly, so its
behavior is also unchanged.

Lock this in with TestDeps additions: tailscale.com/wif as a BadDep,
plus substring checks in OnDep that fail on any github.com/aws/ or
k8s.io/ dependency creeping back in.

Also, switch cmd/gitops-pusher from the condregister wrapper to a
direct import of feature/identityfederation: gitops-pusher's auth flow
calls HookExchangeJWTForTokenViaWIF directly, so it shouldn't be
subject to the ts_omit_identityfederation build tag.

Updates #12614

Change-Id: I70599f2bdd4d3666b26a859d5b76caa5d6b94507
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-06 18:43:45 -07:00
Brad Fitzpatrick
883d4fd2cd wgengine/netstack, net/ping: stop using pro-bing and use our net/ping instead
Fixes #19633
Fixes #13760

Change-Id: I0fa9423523a3a0fb1dfcde57de0f26e51723ff97
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-05-04 14:05:24 -07:00
Tom Meadows
ee10f9881c cmd/k8s-operator: add authkey reissuing to recorder reconciler (#19556)
also fixes memory leak with authKeyReissuing map on ProxyGroup
reconciler authkey reissue.

Updates #19311

Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
2026-05-01 18:26:55 +01:00
Andrew Lytvynov
f15a4f4416 client/web: move API permission checks into handlers (#19576)
There are only a couple endpoints that check peer capabilities. Keeping
permission checks with the code that assumes they were performed, rather
than with the routing layer, feels easier to reason about.

Check that the caller is actually a peer and pass their capabilities via
a context value for handlers that want to check them.

Along with this, simplify the helper handler wrappers that are not
needed for most of the endpoints.

Updates #40851

Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
2026-05-01 09:01:53 -07:00
David Bond
644c3224e9 cmd/{containerboot,k8s-operator}: don't return pointers to maps (#19593)
This commit modifies the usage of the `egressservices.Configs` type
within containerboot and the k8s operator.

Originally it was being thrown around as a pointer which is not required
as maps are already pointers under the hood.

Signed-off-by: David Bond <davidsbond93@gmail.com>
2026-04-30 16:11:00 +01:00
David Bond
a29e42135b cmd/k8s-operator: add nodeSelector to DNSConfig resource (#19429)
This commit modifies the `DNSConfig` resource to allow customisation of
the `spec.nodeSelector` field in the nameserver pods.

Closes: https://github.com/tailscale/tailscale/issues/19419

Signed-off-by: David Bond <davidsbond93@gmail.com>
2026-04-29 15:56:33 +01:00
Daniel Pañeda
7735b15de3 cmd/k8s-operator: truncate long label values in metrics resources (#18895)
* cmd/k8s-operator: truncate long label values in metrics resources

Kubernetes label values have a 63-character limit, but resource names
can be up to 253 characters. When a Service or Ingress with a long
name is exposed via Tailscale, the operator fails to reconcile because
it uses the parent resource name directly as label values on metrics
Services.

Truncate label values that may exceed the limit by keeping the first
54 characters and appending a SHA256-based hash suffix to preserve
uniqueness.

Fixes #18894

Signed-off-by: Daniel Pañeda <daniel.paneda@clickhouse.com>
Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>

* cmd/k8s-operator: move TruncateLabelValue to shared k8s-operator package

Move the label truncation helper to k8s-operator/utils.go so it can be
reused by other components that need to produce valid Kubernetes labels.

Signed-off-by: Daniel Pañeda <daniel.paneda@clickhouse.com>
Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>

* cmd/k8s-operator: truncate long domain label values in cert resources

Applies TruncateLabelValue to certResourceLabels in order to prevent API
server validation failures. This covers both the HA Ingress and kube-apiserver
proxy reconcilers, as both flow through certResourceLabels.

Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>

* cmd/k8s-operator: remove empty metrics_resources_test.go, use hyphens in test names to satisfy go vet

Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>

---------

Signed-off-by: Daniel Pañeda <daniel.paneda@clickhouse.com>
Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
Co-authored-by: chaosinthecrd <tom@tmlabs.co.uk>
2026-04-28 14:11:59 +01:00
BeckyPauley
7477a6ee47 cmd/k8s-operator: use dynamic resource names in e2e ingress tests (#19536)
Replace hardcoded resource names with dynamically generated names in
k8s-operator-e2e ingress tests to avoid collisions with stale resources.

Updates #tailscale/corp#40612

Signed-off-by: Becky Pauley <becky@tailscale.com>
2026-04-27 13:40:46 +01:00
Andrew Lytvynov
ad9e6c1925 go.mod: bump github.com/google/go-containerregistry (#19500)
This drops an indirect dependency on the old github.com/docker/docker
(which was replaced with github.com/moby/moby) and fixes a couple recent
CVEs.

Updates #cleanup

Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
2026-04-23 10:39:27 -07:00
BeckyPauley
b239e92eb6 cmd/k8s-operator: add e2e test setup and l7 ingress test for multi-tailnet (#19426)
This change adds setup for a second tailnet to enable multi-tailnet e2e
tests. When running against devcontrol, a second tailnet is created via the
API. Otherwise, credentials are read from SECOND_TS_API_CLIENT_SECRET.

Also adds an l7 HA Ingress test for multi-tailnet.

Fixes tailscale/corp#37498

Signed-off-by: Becky Pauley <becky@tailscale.com>
2026-04-17 17:03:25 +01:00
Bjorn Stange
47ecbe5845 cmd/k8s-operator: add priorityClassName support to helm chart (#19236)
Expose priorityClassName in the operator Helm chart values so that
users can configure the operator deployment with a Kubernetes
PriorityClass. This prevents the operator pods from being preempted
by lower-priority workloads.

Fixes #19235

Signed-off-by: Bjorn Stange <bjorn.stange@expel.io>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 12:57:12 +01:00
David Bond
eea39eaf52 cmd/k8s-operator: add affinity rules to DNSConfig (#19360)
This commit modifies the `DNSConfig` custom resource to allow the
user to specify affinity rules on the nameserver pods.

Updates: https://github.com/tailscale/tailscale/issues/18556

Signed-off-by: David Bond <davidsbond93@gmail.com>
2026-04-15 22:39:04 +01:00
Jonathan Nobels
acc43356c6 control/controlclient: enable request signatures on macOS (#19317)
fixes tailscale/corp#39422

Updates tailscale/certstore for properly macOS support and
builds the request signing support into macOS builds.  iOS and builds
that do not use cGo are omitted.

Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
2026-04-15 14:11:14 -04:00
Fernando Serboncini
6b7caaf7ee cmd/k8s-operator: set PreferDualStack on ProxyGroup egress services (#19194)
On dual-stack clusters defaulting to IPv6, the ProxyGroup egress
service only got an IPv6 address, which causes request failures.
Individual egress proxies already set PreferDualStack correctly.

Fixes: #18768

Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
2026-04-09 13:33:39 -04:00
David Bond
85d6ba9473 cmd/k8s-operator: migrate to tailscale-client-go-v2 (#19010)
This commit modifies the kubernetes operator to use the `tailscale-client-go-v2`
package instead of the internal tailscale client it was previously using. This
now gives us the ability to expand out custom resources and features as they
become available via the API module.

The tailnet reconciler has also been modified to manage clients as tailnets
are created and removed, providing each subsequent reconciler with a single
`ClientProvider` that obtains a tailscale client for the respective tailnet
by name, or the operator's default when presented with a blank string.

Fixes: https://github.com/tailscale/corp/issues/38418

Signed-off-by: David Bond <davidsbond93@gmail.com>
2026-04-09 14:39:46 +01:00
Brad Fitzpatrick
5ef3713c9f cmd/vet: add subtestnames analyzer; fix all existing violations
Add a new vet analyzer that checks t.Run subtest names don't contain
characters requiring quoting when re-running via "go test -run". This
enforces the style guide rule: don't use spaces or punctuation in
subtest names.

The analyzer flags:
- Direct t.Run calls with string literal names containing spaces,
  regex metacharacters, quotes, or other problematic characters
- Table-driven t.Run(tt.name, ...) calls where tt ranges over a
  slice/map literal with bad name field values

Also fix all 978 existing violations across 81 test files, replacing
spaces with hyphens and shortening long sentence-like names to concise
hyphenated forms.

Updates #19242

Change-Id: Ib0ad96a111bd8e764582d1d4902fe2599454ab65
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-04-05 15:52:51 -07:00
BeckyPauley
e82ffe03ad cmd/k8s-operator: add further E2E tests for Ingress (#19219)
* cmd/k8s-operator/e2e: add L7 HA ingress test

Change-Id: Ic017e4a7e3affbc3e2a87b9b6b9c38afd65f32ed
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>

* cmd/k8s-operator: add further E2E tests for Ingress (#34833)

This change adds E2E tests for L3 HA Ingress and L7 Ingress (Standalone and
HA). Updates the existing L3 Ingress test to use the Service's Magic DNS
name to test connectivity.

Also refactors test setup to set TS_DEBUG_ACME_DIRECTORY_URL only for tests
running against devcontrol, and updates the Kind node image from v1.30.0 to
v1.35.0.

Fixes tailscale/corp#34833

Signed-off-by: Becky Pauley <becky@tailscale.com>

---------

Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Signed-off-by: Becky Pauley <becky@tailscale.com>
Co-authored-by: Tom Proctor <tomhjp@users.noreply.github.com>
2026-04-02 15:49:40 +01:00
Mike O'Driscoll
1403920367 derp,types,util: use bufio Peek+Discard for allocation-free fast reads (#19067)
Replace byte-at-a-time ReadByte loops with Peek+Discard in the DERP
read path. Peek returns a slice into bufio's internal buffer without
allocating, and Discard advances the read pointer without copying.

Introduce util/bufiox with a BufferedReader interface and ReadFull
helper that uses Peek+copy+Discard as an allocation-free alternative
to io.ReadFull.

  - derp.ReadFrameHeader: replace 5× ReadByte with Peek(5)+Discard(5),
    reading the frame type and length directly from the peeked slice.
    Remove now-unused readUint32 helper.

    name                  old ns/op  new ns/op  speedup
    ReadFrameHeader-8     24.2       12.4       ~2x
    (0 allocs/op in both)

  - key.NodePublic.ReadRawWithoutAllocating: replace 32× ReadByte with
    bufiox.ReadFull. Addresses the "Dear future" comment about switching
    away from byte-at-a-time reads once a non-escaping alternative exists.

    name                              old ns/op  new ns/op  speedup
    NodeReadRawWithoutAllocating-8    140        43.6       ~3.2x
    (0 allocs/op in both)

  - derpserver.handleFramePing: replace io.ReadFull with bufiox.ReadFull.

Updates tailscale/corp#38509

Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
2026-03-24 10:52:20 -04:00
Claus Lensbøl
85bb5f84a5 wgengine/magicsock,control/controlclient: do not overwrite discokey with old key (#18606)
When a client starts up without being able to connect to control, it
sends its discoKey to other nodes it wants to communicate with over
TSMP. This disco key will be a newer key than the one control knows
about.

If the client that can connect to control gets a full netmap, ensure
that the disco key for the node not connected to control is not
overwritten with the stale key control knows about.

This is implemented through keeping track of mapSession and use that for
the discokey injection if it is available. This ensures that we are not
constantly resetting the wireguard connection when getting the wrong
keys from control.

This is implemented as:
 - If the key is received via TSMP:
   - Set lastSeen for the peer to now()
   - Set online for the peer to false
 - When processing new keys, only accept keys where either:
   - Peer is online
   - lastSeen is newer than existing last seen

If mapSession is not available, as in we are not yet connected to
control, punt down the disco key injection to magicsock.

Ideally, we will want to have mapSession be long lived at some point in
the near future so we only need to inject keys in one location and then
also use that for testing and loading the cache, but that is a yak for
another PR.

Updates #12639

Signed-off-by: Claus Lensbøl <claus@tailscale.com>
2026-03-20 08:56:27 -04:00
Tom Proctor
621f71981c cmd/k8s-operator: fix Service reconcile triggers for default ProxyClass (#18983)
The e2e ingress test was very occasionally flaky. On looking at operator
logs from one failure, you can see the default ProxyClass was not ready
before the first reconcile loop for the exposed Service. The ProxyClass
became ready soon after, but no additional reconciles were triggered for
the exposed Service because we only triggered reconciles for Services
that explicitly named their ProxyClass.

This change adds additional list API calls for when it's the default
ProxyClass that's been updated in order to catch Services that use it by
default. It also adds indexes for the fields we need to search on to
ensure the list is efficient.

Fixes tailscale/corp#37533

Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2026-03-13 14:31:16 +00:00
Tom Proctor
95a135ead1 cmd/{containerboot,k8s-operator}: reissue auth keys for broken proxies (#16450)
Adds logic for containerboot to signal that it can't auth, so the
operator can reissue a new auth key. This only applies when running with
a config file and with a kube state store.

If the operator sees reissue_authkey in a state Secret, it will create a
new auth key iff the config has no auth key or its auth key matches the
value of reissue_authkey from the state Secret. This is to ensure we
don't reissue auth keys in a tight loop if the proxy is slow to start or
failing for some other reason. The reissue logic also uses a burstable
rate limiter to ensure there's no way a terminally misconfigured
or buggy operator can automatically generate new auth keys in a tight loop.

Additional implementation details (ChaosInTheCRD):

- Added `ipn.NotifyInitialHealthState` to ipn watcher, to ensure that
  `n.Health` is populated when notify's are returned.
- on auth failure, containerboot:
  - Disconnects from control server
  - Sets reissue_authkey marker in state Secret with the failing key
  - Polls config file for new auth key (10 minute timeout)
  - Restarts after receiving new key to apply it

- modified operator's reissue logic slightly:
  - Deletes old device from tailnet before creating new key
  - Rate limiting: 1 key per 30s with initial burst equal to replica count
  - In-flight tracking (authKeyReissuing map) prevents duplicate API calls
    across reconcile loops

Updates #14080

Change-Id: I6982f8e741932a6891f2f48a2936f7f6a455317f


(cherry picked from commit 969927c47c3d4de05e90f5b26a6d8d931c5ceed4)

Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Co-authored-by: chaosinthecrd <tom@tmlabs.co.uk>
2026-03-11 10:25:57 +00:00
Brad Fitzpatrick
f905871fb1 ipn/ipnlocal, feature/ssh: move SSH code out of LocalBackend to feature
This makes tsnet apps not depend on x/crypto/ssh and locks that in with a test.

It also paves the wave for tsnet apps to opt-in to SSH support via a
blank feature import in the future.

Updates #12614

Change-Id: Ica85628f89c8f015413b074f5001b82b27c953a9
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-03-10 17:27:17 -07:00
David Bond
9522619031 cmd/k8s-operator: use correct tailnet client for L7 & L3 ingresses (#18749)
* cmd/k8s-operator: use correct tailnet client for L7 & L3 ingresses

This commit fixes a bug when using multi-tailnet within the operator
to spin up L7 & L3 ingresses where the client used to create the
tailscale services was not switching depending on the tailnet used
by the proxygroup backing the service/ingress.

Updates: https://github.com/tailscale/corp/issues/34561

Signed-off-by: David Bond <davidsbond93@gmail.com>

* cmd/k8s-operator: adding server url to proxygroups when a custom tailnet has been specified

Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
(cherry picked from commit 3b21ac5504e713e32dfcd43d9ee21e7e712ac200)

---------

Signed-off-by: David Bond <davidsbond93@gmail.com>
Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
Co-authored-by: chaosinthecrd <tom@tmlabs.co.uk>
2026-03-10 10:33:55 +00:00
Fran Bull
a4614d7d17 appc,feature/conn25: conn25: send address assignments to connector
After we intercept a DNS response and assign magic and transit addresses
we must communicate the assignment to our connector so that it can
direct traffic when it arrives.

Use the recently added peerapi endpoint to send the addresses.

Updates tailscale/corp#34258
Signed-off-by: Fran Bull <fran@tailscale.com>
2026-03-09 14:10:38 -07:00
Brad Fitzpatrick
bd2a2d53d3 all: use Go 1.26 things, run most gofix modernizers
I omitted a lot of the min/max modernizers because they didn't
result in more clear code.

Some of it's older "for x := range 123".

Also: errors.AsType, any, fmt.Appendf, etc.

Updates #18682

Change-Id: I83a451577f33877f962766a5b65ce86f7696471c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-03-06 13:32:03 -08:00
Brad Fitzpatrick
2a64c03c95 types/ptr: deprecate ptr.To, use Go 1.26 new
Updates #18682

Change-Id: I62f6aa0de2a15ef8c1435032c6aa74a181c25f8f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-03-05 20:13:18 -08:00
Raj Singh
19e2c8c49f cmd/k8s-proxy: use L4 TCPForward instead of L7 HTTP proxy (#18179)
considerable latency was seen when using k8s-proxy with ProxyGroup
in the kubernetes operator. Switching to L4 TCPForward solves this.

Fixes tailscale#18171

Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
Co-authored-by: chaosinthecrd <tom@tmlabs.co.uk>
2026-03-05 18:47:54 +00:00
BeckyPauley
faf7f2bc45 cmd/k8s-operator: remove deprecated TS_EXPERIMENTAL_KUBE_API_EVENTS (#18893)
Remove the TS_EXPERIMENTAL_KUBE_API_EVENTS env var from the operator and its
helm chart. This has already been marked as deprecated, and has been
scheduled to be removed in release 1.96.

Add a check in helm chart to fail if the removed variable is set to true,
prompting users to move to ACLs instead.

Fixes: #18875

Signed-off-by: Becky Pauley <becky@tailscale.com>
2026-03-05 12:09:11 +00:00
Brad Fitzpatrick
d784dcc61b go.toolchain.branch: switch to Go 1.26
Updates #18682

Change-Id: I1eadfab950e55d004484af880a5d8df6893e85e8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2026-03-04 21:57:05 -08:00
Mike O'Driscoll
2c9ffdd188 cmd/tailscale,ipn,net/netutil: remove rp_filter strict mode warnings (#18863)
PR #18860 adds firewall rules in the mangle table to save outbound packet
marks to conntrack and restore them on reply packets before the routing
decision. When reply packets have their marks restored, the kernel uses
the correct routing table (based on the mark) and the packets pass the
rp_filter check.

This makes the risk check and reverse path filtering warnings unnecessary.

Updates #3310
Fixes tailscale/corp#37846

Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
2026-03-04 14:09:19 -05:00