Added in #20111, but it is too noisy under real load to be useful.
Updates #12542
Change-Id: Ib99a8966ade0bfa4281fccc057249819cdcdfe83
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
`go run` builds a manifest-less .exe, so Windows applies installer-
detection heuristics and requests admin privileges for programs whose
names contain "install", "setup", or "update". Rename to dodge that.
Updates #20133
Change-Id: I144d3fcb076d7a02e4a3eb9fd079ee022a035c76
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
Add a workflow that requests review from @tailscale/k8s-devs on PRs
touching Kubernetes operator, kube libraries, container build, etc.
Also cleans up the checkout code in the k8s and dataplane workflows.
Updates #cleanup
Change-Id: I6fd7cacf71e1299f7e8f546ef52c4063fbf6bab8
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
tailscale serve set-config now also accepts the legacy raw ipn.ServeConfig
format (as emitted by `tailscale serve status --json` and consumed via
TS_SERVE_CONFIG, which has no "version" field), so the common
serve-status-edit-set workflow stops failing. Only the services-oriented
content is applied; any node-level fields are skipped with a warning to
stderr pointing users at get-config to migrate.
Fixes tailscale/corp#39793
Signed-off-by: Brendan Creane <bcreane@gmail.com>
Bumps wireguard-go pin to include the roaming endpoints fix, and
two internal enhancements.
Pulls stock wireguard-go for non-tailscale simulation in tests,
to use its endpoint discovery mechanism.
Updates #20082
Change-Id: I2ff282cb7fe4ab099ce5e780a1d40ae86a6a6964
Signed-off-by: Alex Valiushko <alexvaliushko@tailscale.com>
Package features/conn25 wires up the hooks directly on the tun wrapper
without needing to go through the userspace engine, so this codepath is
unused and not needed.
Updates #cleanup
Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>
Add an UpdatePeers method to the cache. This allows us to support netmap peer deltas,
by allowing just the peers to be updated in an existing cache. As a safety check, reject
an update if there was no base netmap data to apply a change to.
Then, when processing peer mutations in the backend, capture any changes that should
be applied to the cache and update it, if one is enabled.
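The safety check described above can be sketched roughly as follows. This is a minimal illustration, not the actual cache code: the `cache` type, its fields, the `updatePeers` signature, and `errNoBase` are all hypothetical names chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// cache is a hypothetical stand-in for the netmap cache. hasBase records
// whether a full netmap has been stored that deltas can be applied to.
type cache struct {
	hasBase bool
	peers   map[int]string // peer ID -> peer data (simplified to a string)
}

// errNoBase is a hypothetical error for a delta that arrives with no base.
var errNoBase = errors.New("no base netmap to apply peer update to")

// updatePeers applies a peer delta to an existing cache. It rejects the
// update when there is no base netmap data, since the delta alone cannot
// describe a complete netmap.
func (c *cache) updatePeers(upsert map[int]string, remove []int) error {
	if !c.hasBase {
		return errNoBase
	}
	for id, p := range upsert {
		c.peers[id] = p
	}
	for _, id := range remove {
		delete(c.peers, id)
	}
	return nil
}

func main() {
	empty := &cache{peers: map[int]string{}}
	fmt.Println(empty.updatePeers(map[int]string{1: "a"}, nil))

	c := &cache{hasBase: true, peers: map[int]string{1: "a", 2: "b"}}
	fmt.Println(c.updatePeers(map[int]string{3: "c"}, []int{1}), len(c.peers))
}
```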
Updates #12542
Change-Id: I2f8790a8fdc5e85fce6700ba4821a8cb10dddffa
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Since deltas are only (at present) received from the control plane, processing
a delta signifies we are no longer operating on a netmap fully loaded from
cache, even if most of the netmap is still in the same configuration.
Updates #12542
Change-Id: I84132c4bf2dde6e5c1c57144645edb986b051dca
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Flakeytest seems to not work on vmtest. We have a few PRs that will fix
the problem on these tests, so skip to unblock.
Updates #19843
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
The hook fires when a flow is removed for any reason (LRU capacity eviction,
tuple-collision displacement, or idle-time expiry). The hook is invoked
exactly once per flow, after the flow table mutex is released, so callbacks
may safely acquire other locks.
We rename the IPMapper interface to Conn25Datapath, and add
ClientFlowCreated/ClientFlowRemoved methods so *Conn25 can keep client-side
address assignments alive while traffic is in flight. Those methods are
currently stubbed for future work.
Connector flows do not currently call these methods.
Updates tailscale/corp#38630
Updates tailscale/corp#43180
Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>
The returned error in the signature is left over from previous
implementations and was only returning nil.
If we know NewFlow will succeed we can fire a create hook (implemented
in a future commit) before NewFlow, which will prevent a remove hook for
a flow from firing before the create hook for the same flow.
Updates tailscale/corp#38630
Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>
Adds tailscaled_serve_{inbound,outbound}_bytes_total, labeled by Tailscale
Service name, by wrapping the peer-facing conn in tcpHandlerForVIPService.
Per-service counters persist for the process lifetime rather than being
evicted on serve-config changes.
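The general technique of counting bytes by wrapping a conn can be sketched as below. This is not the tailscaled implementation: `countingConn`, its fields, and `pipeDemo` are hypothetical, and which direction maps to "inbound" versus "outbound" is left open here.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"sync/atomic"
)

// countingConn is a hypothetical wrapper that adds the number of bytes
// read from and written to the underlying conn to shared counters.
type countingConn struct {
	net.Conn
	read, written *atomic.Int64
}

func (c *countingConn) Read(p []byte) (int, error) {
	n, err := c.Conn.Read(p)
	c.read.Add(int64(n))
	return n, err
}

func (c *countingConn) Write(p []byte) (int, error) {
	n, err := c.Conn.Write(p)
	c.written.Add(int64(n))
	return n, err
}

// pipeDemo moves 5 bytes in and 2 bytes out through a wrapped in-memory
// pipe and returns the resulting counter values.
func pipeDemo() (read, written int64) {
	var r, w atomic.Int64
	peer, local := net.Pipe()
	cc := &countingConn{Conn: local, read: &r, written: &w}

	done := make(chan struct{})
	go func() {
		defer close(done)
		peer.Write([]byte("hello"))
		io.Copy(io.Discard, peer) // drain until the wrapped side closes
	}()

	io.ReadFull(cc, make([]byte, 5))
	cc.Write([]byte("hi"))
	cc.Close()
	<-done
	return r.Load(), w.Load()
}

func main() {
	fmt.Println(pipeDemo())
}
```

Because the counters are passed in by pointer, they can outlive any single conn, which matches the idea of counters that persist for the process lifetime.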
Fixes #19572
Signed-off-by: Raj Singh <raj@tailscale.com>
Co-authored-by: Ethan Smith <ethan.smith@grafana.com>
When running under the macOS sandbox, "tailscale configure kubeconfig"
refused outright whenever $KUBECONFIG was set, assuming the path would
not be writable. Yet when $KUBECONFIG was unset it happily relied on the
home-relative-path entitlement to write to ~/.kube/config, so the two
paths made inconsistent assumptions about what the sandbox can reach.
Resolve the kubeconfig path first, then check whether the target file
(or the nearest existing parent directory) is actually writable. Only
report an error if it is not, and include macOS sandbox guidance in that
error since a path outside the home directory is the likely cause. This
lets a $KUBECONFIG that does point under the home directory work, rather
than being rejected unconditionally.
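One way to check "the target file or the nearest existing parent directory" is sketched below. It is an assumption-laden illustration, not the code in this change: `nearestExisting`, `checkWritable`, and `demo` are hypothetical helpers, and probing a directory by creating a temporary file is just one possible writability test.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// nearestExisting walks up from path until it finds something that exists.
func nearestExisting(path string) (string, error) {
	for {
		_, err := os.Stat(path)
		if err == nil {
			return path, nil
		}
		if !errors.Is(err, fs.ErrNotExist) {
			return "", err
		}
		parent := filepath.Dir(path)
		if parent == path {
			return "", err // reached the root without finding anything
		}
		path = parent
	}
}

// checkWritable reports an error if target, or its nearest existing
// parent, cannot be written to.
func checkWritable(target string) error {
	p, err := nearestExisting(target)
	if err != nil {
		return err
	}
	fi, err := os.Stat(p)
	if err != nil {
		return err
	}
	if fi.IsDir() {
		// Probe the directory by creating and removing a temporary file.
		f, err := os.CreateTemp(p, ".probe-*")
		if err != nil {
			return err
		}
		f.Close()
		return os.Remove(f.Name())
	}
	f, err := os.OpenFile(p, os.O_WRONLY, 0)
	if err != nil {
		return err
	}
	return f.Close()
}

// demo checks a not-yet-existing config path under a fresh temp directory.
func demo() error {
	dir, err := os.MkdirTemp("", "kubeconfig-sketch")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)
	return checkWritable(filepath.Join(dir, "missing", ".kube", "config"))
}

func main() {
	fmt.Println(demo())
}
```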
Fixes #20007
Change-Id: I9880363c38b981efaed7e97367851ddacf647be1
Signed-off-by: James Tucker <james@tailscale.com>
This adds testcontrol support for expiring individual node keys,
in order to enable test scenarios involving key expiry and
extension.
Updates #19326
Signed-off-by: Gesa Stupperich <gesa@tailscale.com>
authRoutine snapshots c.loginGoal, runs TryLogin without the lock,
then writes back loggedIn/loginGoal under the lock. If a concurrent
Login() or Logout() changes the goal during the in-flight request,
the write-back overwrites the new intent: the more recent login goal
is silently dropped, or a logout is reverted to logged-in.
Gate both the URL-followup and success commits on c.loginGoal still
matching the goal we were processing. Stale results are ignored and
the next iteration runs with the current goal.
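The gating can be pictured with the following minimal sketch. The `goal` and `client` types and the `commitLogin` helper are hypothetical simplifications; only the field names `loginGoal` and `loggedIn` come from the description above.

```go
package main

import (
	"fmt"
	"sync"
)

// goal and client are hypothetical stand-ins for the control client's
// login state; they are not the real types.
type goal struct{ url string }

type client struct {
	mu        sync.Mutex
	loginGoal *goal
	loggedIn  bool
}

// commitLogin records a successful login only if the goal that was in
// flight is still the current one. It reports whether it committed.
func (c *client) commitLogin(inFlight *goal) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.loginGoal != inFlight {
		return false // stale result: a newer Login or Logout won
	}
	c.loggedIn = true
	c.loginGoal = nil
	return true
}

func main() {
	c := &client{loginGoal: &goal{}}
	inFlight := c.loginGoal // snapshot taken before the slow request

	// A concurrent Logout clears the goal while the request is in flight.
	c.mu.Lock()
	c.loginGoal = nil
	c.mu.Unlock()

	fmt.Println(c.commitLogin(inFlight), c.loggedIn)
}
```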
Updates #19326
Signed-off-by: Gesa Stupperich <gesa@tailscale.com>
When a client's node key expires and the user clicks "Login" (or runs
`tailscale up`), the Login() method was cancelling the map poll context.
This caused key extension notifications from the server to be lost,
leaving clients stuck in NeedsLogin state even after an admin extended
their key.
The fix has three parts:
1. Login(): Don't cancel mapCtx if we have valid credentials (loggedIn=true)
or a valid node key. This allows the map poll to continue receiving
server notifications while the auth flow proceeds in parallel.
2. mapRoutine(): Poll when we have a node key, even if !loggedIn. This
handles the tsnet restart scenario where control returns an AuthURL
(so loggedIn=false) but we still have a valid node key that can
receive map updates.
3. sendStatus()/UpdateFullNetmap(): Forward netmaps when we have a node
key, not just when loggedIn. This ensures the backend sees key expiry
changes even when the auth flow hasn't completed.
"First successful flow wins": if a key extension arrives via map poll,
the client recovers automatically. If the auth flow completes first,
that works too. Either way, the client is no longer stuck.
This aligns with the SeamlessKeyRenewal philosophy: maintain connectivity
paths while authentication proceeds, allowing server-initiated recovery.
Fixes #19326
Change-Id: I26dbbc1fa7c1159ba075362e44d02814355d6b44
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* cmd/k8s-operator: rework [unexpected] log lines
This commit modifies several places in the operator logs where we
prepend `[unexpected]` to instead use an appropriate logging level.
The `[unexpected]` prefix is intended to be used when the program
violates some internal invariant (or for example, a database has
become corrupted). Many of these cases were simply log lines that
then fell back to a default value/behaviour. These have been releveled
to warnings.
Some of these log lines also seemed extraneous, for example service
reconcilers logging when there is no proxy group annotation. As
far as I can tell we've never had any predicates for limiting the
services reconciled to ones with that annotation, so they can just
be removed to reduce log spam.
Fixes: #cleanup
Signed-off-by: David Bond <davidsbond93@gmail.com>
* Update cmd/k8s-operator/egress-services-readiness.go
Co-authored-by: BeckyPauley <64131207+BeckyPauley@users.noreply.github.com>
Signed-off-by: David Bond <davidsbond@users.noreply.github.com>
* Update cmd/k8s-operator/operator.go
Co-authored-by: BeckyPauley <64131207+BeckyPauley@users.noreply.github.com>
Signed-off-by: David Bond <davidsbond@users.noreply.github.com>
---------
Signed-off-by: David Bond <davidsbond93@gmail.com>
Signed-off-by: David Bond <davidsbond@users.noreply.github.com>
Co-authored-by: BeckyPauley <64131207+BeckyPauley@users.noreply.github.com>
Prevent tailscale ssh from automatically adding a username when
connecting to a server; only forward one if provided. The previous
behaviour prevented username overrides in the ssh configuration, since
the provided username takes precedence over the configured one.
This also keeps tailscale ssh a thin wrapper around ssh by not
adding any extra arguments unless required.
Fixes #19357
Signed-off-by: Örjan Fors <o@42mm.org>
I previously (in #20096) had only considered the tailscaled deps
and forgot about the CLI deps. This does the CLI ones too.
containerboot and k8s-operator aren't applicable because they build
from oss already.
Updates tailscale/corp#43243
Updates #20067
Change-Id: I66790f822b5d040e7fcf90feabca24669f69cf61
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We aren't supposed to be using CODEOWNERS as blocking
reviews, blocking global cleanups.
(This is why we want to move to go/policybot)
Updates tailscale/corp#13972
Change-Id: I380258e2d4ffd0720d57d891adab06c8ca388617
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The 1 minute timeout was hitting timers inside wireguard-go, leading
to stale connections hanging forever. Increasing the timeout to 2 minutes
makes a small subset of cached connections establish direct connections
slightly slower.
Updates to wireguard-go will allow a better hook for when to send these
messages in the future. This change only fixes the error mode, but
if we have better triggers coming in wireguard-go, we should be using
those.
Updates #20081
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
9be21088f4 changed sending disco pings so
a callMeMaybe would not be gated by endpoints existing if the node
was running off of a cached netmap.
This commit partly reverts that change, but keeps a few bug fixes from
that commit and the tests that were introduced, which are now skipped.
The behaviour prior to 9be21088f4 is
retained.
Updates #20085
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
This commit modifies the reconciler for the `Tailnet` custom resource
to allow referenced secrets to specify an `audience` field. If a
referenced secret contains both an `audience` and `client_id` we assume
the user's intention is to use workload identity.
In that case, we configure the tailscale API client to authenticate
using the Kubernetes token request API against the operator's service
account. This requires the operator to be aware of its own service
account name.
A small change has also been made to the messages added to the `Tailnet`
CRD's status field in the event that it is missing scopes, to make it
clearer that certain scopes may not be applied.
Closes: #19090
Updates: #19471
Signed-off-by: David Bond <davidsbond93@gmail.com>
Add NotifyInProcessNoDisconnect for in-process IPN bus subscribers that
must apply every bus update. When such a subscriber falls behind, block
Notify production instead of sending the terminal fell-behind message and
closing the watch.
This is intentionally not available over LocalAPI, where a slow or stuck
out-of-process client should still be disconnected rather than allowed to
stall tailscaled. In-process callers that use the bit must keep their
callbacks fast and must not call back into LocalBackend from the callback.
Updates #20062
Change-Id: I730ad61a07475243bb226fba2262c1a3ded211ae
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
New-style IPN bus subscribers consume stateful delta streams. Reject
NotifyRateLimit when it is combined with those subscription bits so
tailscaled cannot merge or delay messages that clients need to apply in
order.
Also stop silently dropping notifications when a watcher falls behind.
Remove the watcher, replace its stale queue with one terminal ErrMessage
notification, and close the watch.
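The fell-behind handling can be sketched as follows. This is a simplified illustration assuming a single producer and a buffered channel with capacity of at least one; the `notify` and `watcher` types are hypothetical, and only the `ErrMessage` field name comes from the text above.

```go
package main

import "fmt"

// notify is a hypothetical, trimmed-down notification.
type notify struct {
	Seq        int
	ErrMessage string
}

// watcher holds a bounded queue of pending notifications. This sketch
// assumes send is only ever called from one goroutine.
type watcher struct {
	ch     chan notify
	closed bool
}

// send enqueues n. If the queue is full, it discards the stale backlog,
// leaves a single terminal error notification, and closes the watch.
func (w *watcher) send(n notify) bool {
	if w.closed {
		return false
	}
	select {
	case w.ch <- n:
		return true
	default:
	}
drain:
	for {
		select {
		case <-w.ch:
		default:
			break drain
		}
	}
	w.ch <- notify{ErrMessage: "watcher fell behind"}
	close(w.ch)
	w.closed = true
	return false
}

func main() {
	w := &watcher{ch: make(chan notify, 2)}
	for i := 1; i <= 3; i++ {
		fmt.Println(w.send(notify{Seq: i}))
	}
	for n := range w.ch {
		fmt.Printf("%+v\n", n)
	}
}
```

The consumer sees exactly one message explaining why the stream ended, rather than a silent gap in a delta stream it must apply in order.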
Updates #20062
Change-Id: Id9d402ea76f4011cd23f122adf62f30dd4b6f90b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This removes deprecated magic-dns formats for 4via6 subnet routers.
These are superseded by the current format: Q-R-S-T-via-X.
Fixes #20053
Change-Id: I0eed1f057f856f248c4dc8ce3b751f6c7edcfbfd
Signed-off-by: Becky Pauley <becky@tailscale.com>
macOS 26.4 emits RTM_MISS on the routing socket for every failed route
lookup. skipRouteMessage never inspected the message type, so each miss
woke the monitor as a link change and triggered a netcheck. On networks
without an IPv6 default route the netcheck's IPv6 DERP probes fail and
emit more RTM_MISS messages, sustaining the loop indefinitely: netchecks
run at roughly 40x the intended rate, with sustained probe traffic and
corresponding CPU and battery cost.
RTM_MISS scales with traffic volume, not network state, and is never
the leading signal for a topology change: route withdrawals emit
RTM_DELETE synchronously before any subsequent lookup can miss, so
ignoring it loses no signal. Other routing daemons (bird, dhcpcd, frr)
ignore it as well.
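In outline, the filter amounts to a check on the message type. The sketch below is hypothetical: the real skipRouteMessage operates on parsed routing-socket messages, while this version takes a bare type value, and the constants are copied from Darwin's routing message numbering as local values so the example builds on any platform.

```go
package main

import "fmt"

// Routing message types as numbered on Darwin, declared locally so this
// sketch does not depend on platform-specific packages.
const (
	rtmAdd    = 0x1
	rtmDelete = 0x2
	rtmMiss   = 0x7
)

// skipRouteMessage is a hypothetical simplification: it reports whether a
// routing message of the given type should be ignored by the monitor.
func skipRouteMessage(msgType int) bool {
	return msgType == rtmMiss
}

// countWakeups returns how many messages would wake the monitor.
func countWakeups(msgTypes []int) int {
	n := 0
	for _, t := range msgTypes {
		if !skipRouteMessage(t) {
			n++
		}
	}
	return n
}

func main() {
	// A route withdrawal followed by a burst of failed lookups.
	msgs := []int{rtmDelete, rtmMiss, rtmMiss, rtmMiss, rtmAdd}
	fmt.Println(countWakeups(msgs))
}
```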
Same fix as coder/tailscale@e956a95074.
Fixes #19324
Signed-off-by: Doug Bryant <dougbryant@anthropic.com>
Did you know that Gilbert Baker used the Pantone color scale when
designing the rainbow flag? I suppose that's not too surprising. There
are also other color scales like munsell and werner. I guess the rainbow
itself is a color scale, with its seven "roygbiv" colors. (It's also
a fish, with both a tail and scales.) We have so many ways to measure
color on so many different scales. And it turns out "pride" itself is
a scale.
Updates #words
Signed-off-by: Will Norris <will@tailscale.com>
Add a vmtest that guards the fix in #20025: after an in-process control
client swap (profile switch / interactive re-login), magicsock's NetInfo
dedup cache (netInfoLast) must be cleared so the structurally-identical
post-switch NetInfo (same PreferredDERP, same NAT shape) is re-reported to
the new control session rather than suppressed as unchanged.
The test brings a node up, pins its home DERP so the reported NetInfo is
identical across the switch, records the home DERP the test control learned,
switches to a fresh login profile on the same control/network/NAT/DERP, and
asserts the control re-learns the same non-zero home DERP for the node's new
identity. Without ResetNetInfoLast the assertion times out at HomeDERP=0.
To support this, vnet now serves the test control on port 443 (TLS) in
addition to port 80: an immediate re-login makes a fresh noise dial, and
because the prior dial was recent the control client forces an HTTPS (443)
dial (controlhttp.Dialer.forceNoise443), which the harness previously did
not answer. The control endpoint gets its own self-signed cert (the existing
selfSignedDERPCert helper, renamed to the generic selfSignedCert); the cert
is not validated since control noise dials authenticate via the Noise
handshake, so it only needs a TLS peer to complete the forced 443 dial.
Add Env.ForcePreferredDERP and Env.Relogin helpers for the above.
Updates #20024
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
Track lastSeen on each cached flow and add a sweeper goroutine
that periodically removes flows idle past the idle timeout.
Introduce tunables for idle timeout, maximum flows removed per sweep (to
limit mutex hold time), and the sweeper interval.
Also cap the previously-unlimited tables: 10k client flows, 100k
connector flows.
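A bounded idle sweep along these lines might look like the sketch below. The `flowCache` type, the `sweep` signature, and `demoSweeps` are hypothetical; the periodic goroutine that would call `sweep` on a ticker is omitted to keep the example deterministic.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// flowCache is a hypothetical flow table that records when each flow was
// last seen.
type flowCache struct {
	mu       sync.Mutex
	lastSeen map[string]time.Time
}

// sweep removes at most max flows that have been idle for longer than
// idle as of now, and returns how many it removed. Capping the work per
// call bounds how long the mutex is held.
func (c *flowCache) sweep(now time.Time, idle time.Duration, max int) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	removed := 0
	for k, seen := range c.lastSeen {
		if removed >= max {
			break
		}
		if now.Sub(seen) > idle {
			delete(c.lastSeen, k)
			removed++
		}
	}
	return removed
}

// demoSweeps runs three sweeps over three stale flows and one fresh flow.
func demoSweeps() (first, second, third int) {
	now := time.Now()
	stale := now.Add(-10 * time.Minute)
	c := &flowCache{lastSeen: map[string]time.Time{
		"a": stale, "b": stale, "c": stale, "d": now,
	}}
	first = c.sweep(now, time.Minute, 2)
	second = c.sweep(now, time.Minute, 2)
	third = c.sweep(now, time.Minute, 2)
	return first, second, third
}

func main() {
	fmt.Println(demoSweeps())
}
```

With a per-sweep cap, a large backlog of idle flows is cleared over several intervals instead of in one long critical section.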
Updates tailscale/corp#38630
Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>
To avoid breaking downstream code, add deprecated aliases for all the
old names.
Updates tailscale/corp#37904
Change-Id: I86d0b0d7da371946440b181c665448f91c3ef8d2
Signed-off-by: Alex Chan <alexc@tailscale.com>
Assign the Kubernetes operator, kube libraries, container build
commands, and related paths to @tailscale/k8s-devs.
Updates #cleanup
Change-Id: I9d8c7ebfd9a2b6401dd8cb0ff335151afe58357c
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
tsnet depends on logpolicy, which in turn depended on util/syspolicy
because of a single LogTarget policy setting it uses.
In this commit, we replace that dependency with a feature.Hook,
which only tailscaled or its platform-specific alternatives should set.
Updates #20031
Signed-off-by: Nick Khyl <nickk@tailscale.com>
This is a refinement of #19916. Previously, we would only emit a latency log
when going from a cached netmap to an uncached one (i.e., from the control
plane). We would like to know the latency in both conditions, though, so
instead use the validity of the previous self state.
Updates #12639
Updates tailscale/projects#27
Change-Id: I6bbeb5d3162f1f98cdb3dcd244f67ef31c170957
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
We don't need to log if the policy doesn't actually say that hardware
attestation must be enabled.
Updates #cleanup
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
magicsock de-duplicates NetInfo callbacks against c.netInfoLast, a cache
that lives on the long-lived magicsock.Conn. That cache survives a control
client swap (interactive login or profile switch), where only the control
client (and its own per-client NetInfo dedup) is replaced. As a result, the
first netcheck after the swap produces a structurally-identical NetInfo
(same PreferredDERP, same NAT shape), magicsock suppresses it as unchanged,
and the new control session never learns our home DERP. Peers can't reach
the node over DERP until some unrelated NetInfo field happens to change.
Add Conn.ResetNetInfoLast to clear the dedup cache, and call it from
LocalBackend.setControlClientLocked whenever a control client is installed,
so the next netcheck re-reports the current NetInfo to the new client.
netInfoLast is only a dedup/optimization cache (all readers nil-guard, and
it is recomputed by every netcheck), so clearing it can only add a delivery,
never lose or misroute one; it is scoped to control-client lifecycle events,
not steady-state operation.
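The interaction between the dedup cache and a client swap can be illustrated with the sketch below. The `netInfo` and `conn` types are hypothetical stand-ins with no locking; only the idea of a "last reported" cache that is cleared on swap mirrors netInfoLast and ResetNetInfoLast.

```go
package main

import "fmt"

// netInfo is a hypothetical, comparable subset of the real NetInfo.
type netInfo struct {
	PreferredDERP int
}

// conn is a hypothetical stand-in for the long-lived connection object.
type conn struct {
	last   *netInfo      // dedup cache of the last reported value
	report func(netInfo) // delivers to the current control client
}

// callNetInfo reports ni unless it is identical to the last report.
func (c *conn) callNetInfo(ni netInfo) {
	if c.last != nil && *c.last == ni {
		return // suppressed as unchanged
	}
	c.last = &ni
	c.report(ni)
}

// resetNetInfoLast clears the dedup cache so the next value is reported.
func (c *conn) resetNetInfoLast() { c.last = nil }

func main() {
	oldReports, newReports := 0, 0
	c := &conn{report: func(netInfo) { oldReports++ }}
	c.callNetInfo(netInfo{PreferredDERP: 1})

	// Swap in a new control client, then clear the cache.
	c.report = func(netInfo) { newReports++ }
	c.resetNetInfoLast()
	c.callNetInfo(netInfo{PreferredDERP: 1})

	fmt.Println(oldReports, newReports)
}
```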
Updates #17887
Fixes #20024
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
This adds a fake vnet ACME service, TXT-backed SetDNS support, and a
VM test that fetches a certificate with tailscale cert, serves it with
tailscale serve, and verifies HTTPS from a second node.
This adds coverage motivated by #19915.
Updates #13038
Change-Id: Ie1e53409509337d81c8fbceb63f59f3dfbd48207
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This patch adds examples for unmarshalling the JSON outputs of the
following commands:
tailscale dns query --json
tailscale dns status --json
It also adds an example usage of `tailscale dns` to both
jsonoutput.DNSQueryResult and jsonoutput.DNSStatusResult.
Updates #13326
Updates #18750
Signed-off-by: Simon Law <sfllaw@tailscale.com>
Adds tailed critters and scaly ones. No anole was harmed (it was already
in the list), and the loris was turned away at the door for being
suspiciously tailless.
Removes a word that was misread/misinterpreted and starts a rejection
list in the test suite.
Updates #words
Signed-off-by: James Tucker <james@tailscale.com>