Before sending a fix for #18991, this adds an integration test that
locks in that the proxymap WhoIs code works with two nodes running as
different users, with the second node running a localhost service and
able to use its local tailscaled to identify a Tailscale connection
from the other tailscaled.
Updates #18991
Change-Id: I6fbb0810204d77d2ac558f0cc786b73e3248d031
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
For the purpose of improved observability of UDP socket receive buffer
overflows on Linux.
Updates tailscale/corp#37679
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Changed the mapping to store the transit IPs to be indexed by
peer IP rather than NodeID because the data path only has access
to the peer's IP. This change means that IPv4 transit IPs need to
be indexed by the peer's IPv4 address, and IPv6 transit IPs need to
be indexed by the peer's IPv6 address. It is an error if the peer
does not have an address of the same family as the transit IP.
It is also an error if the transit and destination IP families do
not match.
Added a check to ensure that the TransitIPRequest.App matches a
configured app on the connector.
Added additional TransitIPResponse codes to identify the new errors
and change the exsting use of the Other code to use it's own
specific code.
Added logging for the error cases, since they generally indicate that
a peer has constructed a bad request or that there is a config
mismatch between the peer and the local netmap.
Added a test framework for handleConnectorTransitIPRequest and moved
the existing tests into the framework and added new tests.
Fixestailscale/corp#37143
Signed-off-by: George Jones <george@tailscale.com>
The e2e ingress test was very occasionally flaky. On looking at operator
logs from one failure, you can see the default ProxyClass was not ready
before the first reconcile loop for the exposed Service. The ProxyClass
became ready soon after, but no additional reconciles were triggered for
the exposed Service because we only triggered reconciles for Services
that explicitly named their ProxyClass.
This change adds additional list API calls for when it's the default
ProxyClass that's been updated in order to catch Services that use it by
default. It also adds indexes for the fields we need to search on to
ensure the list is efficient.
Fixestailscale/corp#37533
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
This commit adds a "fallback" mechanism to tsnet to allow
the consumer to set "TS_CONTROL_URL" to override the control server.
This allows tsnet applications to gain support for an alternative
control server by just updating without explicitly exposing the
ControlURL option.
Updates #16934
Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
Add two small APIs to support out-of-tree projects to exchange custom
signaling messages over DERP without requiring disco protocol
extensions:
- OnDERPRecv callback on magicsock.Options / wgengine.Config: called for
every non-disco DERP packet before the peer map lookup, allowing callers
to intercept packets from unknown peers that would otherwise be dropped.
- SendDERPPacketTo method on magicsock.Conn: sends arbitrary bytes to a
node key via a DERP region, creating the connection if needed. Thin
wrapper around the existing internal sendAddr.
Also allow netstack.Start to accept a nil LocalBackend for use cases
that wire up TCP/UDP handlers directly without a full LocalBackend.
Updates tailscale/corp#24454
Change-Id: I99a523ef281625b8c0024a963f5f5bf5d8792c17
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
After switching from cellular to wifi without ipv6, ForeachInterface still sees rmnet prefixes, so HaveV6 stays true, and magicsock keeps attempting ipv6 connections that either route through cellular or time out for users on wifi without ipv6
This:
-Adds SetAndroidBindToNetworkFunc, a callback to bind the socket to the selected Android Network object
Updates tailscale/tailscale#6152
Signed-off-by: kari-ts <kari@tailscale.com>
In TestUserspaceEnginePortReconfig, when selecting a port, use a random offset rather than searching in a continguous range in case there is a range that is blocked
Updates tailscale/tailscale#2855
Signed-off-by: kari-ts <kari@tailscale.com>
ReadFromUDPAddrPort worked if UDP GRO was unsupported, but we don't
actually want attempted usage, nor does any exist today. Future work
on tailscale/corp#37679 would have required more complexity in this
method, vs clarifying the API intents.
Updates tailscale/corp#37679
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Fix three independent flake sources, at least as debugged by Claude,
though empirically no longer flaking as it was before:
1. Poll for connection counter data instead of reading immediately.
The conncount callback fires asynchronously on received WireGuard
traffic, so after counts.Reset() there is no guarantee the counter
has been repopulated before checkStats reads it. Use tstest.WaitFor
with a 5s timeout to retry until a matching connection appears.
2. Replace the *2 symmetry assumption in global metric assertions.
metricSendUDP and friends are AggregateCounters that sum per-conn
expvars from both magicsock instances. The old assertion assumed
both instances had identical packet counts, which breaks under
asymmetric background WireGuard activity (handshake retries, etc).
The new assertGlobalMetricsMatchPerConn computes the actual sum of
both conns' expvars and compares against the AggregateCounter value.
3. Tolerate physical stats being 0 when user metrics are non-zero.
A rebind event replaces the socket mid-measurement, resetting the
physical connection counter while user metrics still reflect packets
processed before the rebind. Log instead of failing in this case.
Also move counts.Reset() after metric reads and reorder the reset
sequence (counts before metrics) to minimize the race window.
Fixestailscale/tailscale#13420
Change-Id: I7b090a4dc229a862c1a52161b3f2547ec1d1f23f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Adds logic for containerboot to signal that it can't auth, so the
operator can reissue a new auth key. This only applies when running with
a config file and with a kube state store.
If the operator sees reissue_authkey in a state Secret, it will create a
new auth key iff the config has no auth key or its auth key matches the
value of reissue_authkey from the state Secret. This is to ensure we
don't reissue auth keys in a tight loop if the proxy is slow to start or
failing for some other reason. The reissue logic also uses a burstable
rate limiter to ensure there's no way a terminally misconfigured
or buggy operator can automatically generate new auth keys in a tight loop.
Additional implementation details (ChaosInTheCRD):
- Added `ipn.NotifyInitialHealthState` to ipn watcher, to ensure that
`n.Health` is populated when notify's are returned.
- on auth failure, containerboot:
- Disconnects from control server
- Sets reissue_authkey marker in state Secret with the failing key
- Polls config file for new auth key (10 minute timeout)
- Restarts after receiving new key to apply it
- modified operator's reissue logic slightly:
- Deletes old device from tailnet before creating new key
- Rate limiting: 1 key per 30s with initial burst equal to replica count
- In-flight tracking (authKeyReissuing map) prevents duplicate API calls
across reconcile loops
Updates #14080
Change-Id: I6982f8e741932a6891f2f48a2936f7f6a455317f
(cherry picked from commit 969927c47c)
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Co-authored-by: chaosinthecrd <tom@tmlabs.co.uk>
This amends the session creation and auth status querying logic of the device UI
backend. On creation of new browser sessions we now store a PendingAuth flag
as part of the session that indicates a pending auth process that needs to be
awaited. On auth status queries, the server initiates a polling for the auth result
if it finds this flag to be true. Once the polling is completes, the flag is set to false.
Why this change was necessary: with regular browser settings, the device UI
frontend opens the control auth URL in a new tab and starts polling for the
results of the auth flow in the current tab. With certain browser settings (that
we still want to support), however, the auth URL opens in the same tab, thus
aborting the subsequent call to auth/session/wait that initiates the polling,
and preventing successful registration of the auth results in the session
status. The new logic ensures the polling happens on the next call to /api/auth
in these kinds of scenarios.
In addition to ensuring the auth wait happens, we now also revalidate the auth
state whenever an open tab regains focus, so that auth changes effected in one
tab propagate to other tabs without the need to refresh. This improves the
experience for all users of the web client when they've got multiple tabs open,
regardless of their browser settings.
Fixes#11905
Signed-off-by: Gesa Stupperich <gesa@tailscale.com>
This makes tsnet apps not depend on x/crypto/ssh and locks that in with a test.
It also paves the wave for tsnet apps to opt-in to SSH support via a
blank feature import in the future.
Updates #12614
Change-Id: Ica85628f89c8f015413b074f5001b82b27c953a9
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Two issues caused TestCollectPanic to flake:
1. ETXTBSY: The test exec'd the tailscaled binary directly without
going through StartDaemon/awaitTailscaledRunnable, so it lacked
the retry loop that other tests use to work around a mysterious
ETXTBSY on GitHub Actions.
2. Shared filch files: The test didn't pass --statedir or TS_LOGS_DIR,
so all parallel test instances wrote panic logs to the shared system
state directory (~/.local/share/tailscale). Concurrent runs would
clobber each other's filch log files, causing the second run to not
find the panic data from the first.
Fix both by adding awaitTailscaledRunnable before the first exec, and
passing --statedir and TS_LOGS_DIR to isolate each test's log files,
matching what StartDaemon does.
It now passes x/tools/cmd/stress.
Fixes#15865
Change-Id: If18b9acf8dbe9a986446a42c5d98de7ad8aae098
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
When IPv6 is unavailable on a system, AddConnmarkSaveRule() and
DelConnmarkSaveRule() would panic with a nil pointer dereference.
Both methods directly iterated over []iptablesInterface{i.ipt4, i.ipt6}
without checking if ipt6 was nil.
Use `getTables()` instead to properly retrieve the available tables
on a given system
Fixes#3310
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
Raw byte accessors for key types, mirroring existing patterns
(NodePublic.Raw32 and DiscoPublicFromRaw32 already exist).
NodePrivate.Raw32 returns the raw 32 bytes of a node private key.
DiscoPrivateFromRaw32 parses a 32-byte raw value as a DiscoPrivate.
Updates tailscale/corp#24454
Change-Id: Ibc08bed14ab359eddefbebd811c375b6365c7919
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
* cmd/k8s-operator: use correct tailnet client for L7 & L3 ingresses
This commit fixes a bug when using multi-tailnet within the operator
to spin up L7 & L3 ingresses where the client used to create the
tailscale services was not switching depending on the tailnet used
by the proxygroup backing the service/ingress.
Updates: https://github.com/tailscale/corp/issues/34561
Signed-off-by: David Bond <davidsbond93@gmail.com>
* cmd/k8s-operator: adding server url to proxygroups when a custom tailnet has been specified
Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
(cherry picked from commit 3b21ac5504e713e32dfcd43d9ee21e7e712ac200)
---------
Signed-off-by: David Bond <davidsbond93@gmail.com>
Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
Co-authored-by: chaosinthecrd <tom@tmlabs.co.uk>
We did so for Linux and macOS already, so also do so for Windows. We
only didn't already because originally we never produced binaries for
it (due to our corp repo not needing them), and later because we had
no ./tool/go wrapper. But we have both of those things now.
Updates #18884
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
When a recording upload fails mid-session, killProcessOnContextDone
writes the termination message to ss.Stderr() and kills the process.
Meanwhile, run() takes the ss.ctx.Done() path and proceeds to
ss.Exit(), which tears down the SSH channel. The termination message
write races with the channel teardown, so the client sometimes never
receives it.
Fix by adding an exitHandled channel that killProcessOnContextDone
closes when done. run() now waits on this channel after ctx.Done()
fires, ensuring the termination message is fully written before
the SSH channel is torn down.
Fixes#7707
Change-Id: Ib60116c928d3af46d553a4186a72963c2c731e3e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
After we intercept a DNS response and assign magic and transit addresses
we must communicate the assignment to our connector so that it can
direct traffic when it arrives.
Use the recently added peerapi endpoint to send the addresses.
Updates tailscale/corp#34258
Signed-off-by: Fran Bull <fran@tailscale.com>
This change reintroduces UserProfile.Groups, a slice that contains
the ACL-defined and synced groups that a user is a member of.
The slice will only be non-nil for clients with the node attribute
see-groups, and will only contain groups that the client is allowed
to see as per the app payload of the see-groups node attribute.
For example:
```
"nodeAttrs": [
{
"target": ["tag:dev"],
"app": {
"tailscale.com/see-groups": [{"groups": ["group:dev"]}]
}
},
[...]
]
```
UserProfile.Groups will also be gated by a feature flag for the time
being.
Updates tailscale/corp#31529
Signed-off-by: Gesa Stupperich <gesa@tailscale.com>
Users on FreeBSD run into a similar problem as has been reported for
Linux #11682 and fixed in #11682: because the tailscaled binaries
that we distribute are static and don't link cgo tailscaled fails to
fetch group IDs that are returned via NSS when spawning an ssh child
process.
This change extends the fallback on the 'id' command that was put in
place as part of #11682 to FreeBSD. More precisely, we try to fetch
the group IDs with the 'id' command first, and only if that fails do
we fall back on the logic in the os/user package.
Updates #14025
Signed-off-by: Gesa Stupperich <gesa@tailscale.com>
I omitted a lot of the min/max modernizers because they didn't
result in more clear code.
Some of it's older "for x := range 123".
Also: errors.AsType, any, fmt.Appendf, etc.
Updates #18682
Change-Id: I83a451577f33877f962766a5b65ce86f7696471c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The txRecords buffer had two compounding bugs that caused the
overflow guard to fire on every send tick under high DERP server load,
spamming logs at the full send rate (e.g. 100x/second).
First, int(packetTimeout.Seconds()) truncates fractional-second timeouts,
under-allocating the buffer. Second, the capacity was sized to exactly the
theoretical maximum number of in-flight records with no headroom,
and the expiry check used strict > rather than >=, so records at exactly
the timeout boundary were never evicted by applyTimeouts,
leaving len==cap on the very next tick.
Fixestailscale/corp#37696
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
This hook addition is motivated by the Connectors 2025 work, in which
NATed "Transit IPs" are used to route interesting traffic to the
appropriate peer, without advertising the actual real IPs.
It overlaps with #17858, and specifically with the WIP PR #17861.
If that work completes, this hook may be replaced by other ones
that fit the new WireGuard configuration paradigm.
Fixestailscale/corp#37146
Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>
This had gotten flaky with Go 1.26.
Use synctest + AllocsPerRun to make it fast and deterministic.
Updates #18682
Change-Id: If673d6ecd8c1177f59c1b9c0f3fca42309375dff
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Otherwise it gets confused on new(123) etc.
Updates #18682
Change-Id: I9e2e93ea24f2b952b2396dceaf094b4db64424b0
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Fix its/it's, who's/whose, wether/whether, missing apostrophes
in contractions, and other misspellings across the codebase.
Updates #cleanup
Change-Id: I20453b81a7aceaa14ea2a551abba08a2e7f0a1d8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
considerable latency was seen when using k8s-proxy with ProxyGroup
in the kubernetes operator. Switching to L4 TCPForward solves this.
Fixes tailscale#18171
Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
Co-authored-by: chaosinthecrd <tom@tmlabs.co.uk>
OpenWrt is changing to using alpine like `apk` for package installation
over its previous opkg. Additionally, they are not using the same repo
files as alpine making installation fail.
Add support for the new repository files and ensure that the required
package detection system uses apk.
Updates #18535
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
This commit adds `--json` output mode to dns debug commands.
It defines structs for the data that is returned from:
`tailscale dns status` and `tailscale dns query <DOMAIN>` and
populates that as it runs the diagnostics.
When all the information is collected, it is serialised to JSON
or string built into an output and returned to the user.
The structs are defined and exported to golang consumers of this command
can use them for unmarshalling.
Updates #13326
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
Remove the TS_EXPERIMENTAL_KUBE_API_EVENTS env var from the operator and its
helm chart. This has already been marked as deprecated, and has been
scheduled to be removed in release 1.96.
Add a check in helm chart to fail if the removed variable is set to true,
prompting users to move to ACLs instead.
Fixes: #18875
Signed-off-by: Becky Pauley <becky@tailscale.com>