Rather than printing `unknown subcommand: drive` for any Taildrive
commands run in the macOS GUI, print an error message directing the user
to the GUI client and the docs page.
Updates #17210Fixes#18823
Change-Id: I6435007b5911baee79274b56e3ee101e6bb6d809
Signed-off-by: Alex Chan <alexc@tailscale.com>
Updates #19050
When tsnet.Server.start() is called with both Hostname and Dir explicitly
set, os.Executable() failure should not prevent the server from starting.
Extend the existing ios fallback to also cover darwin, where the same
failure occurs when the Go runtime is embedded in a framework launched
via Xcode's debug launcher.
Signed-off-by: Prakash Rudraraju <prakashrj@yahoo.com>
Also implement a limit of one on the number of goroutines that can be
waiting to do a reconfig via AuthReconfig, to prevent extensions from
calling too fast and taxing resources.
Even with the protection, the new method should only be used in
experimental or proof-of-concept contexts. The current intended use is
for an extension to be able force a reconfiguration of WireGuard, and
have the reconfiguration call back into the extension for extra Allowed
IPs.
If in the future if WireGuard is able to reconfigure individual peers more
dynamically, an extension might be able to hook into that process, and
this method on ipnext.Host may be deprecated.
Fixestailscale/corp#38120
Updates tailscale/corp#38124
Updates tailscale/corp#38125
Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>
The actual secret is passed through argon2 first, so a timing attack is
not feasible remotely, and pretty unlikely locally. Still, clean this
up.
Fixes#19063
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
* ipn: reject advertised routes with non-address bits set
The config file path, EditPrefs local API, and App Connector API were
accepting invalid subnet route prefixes with non-address bits set (e.g.,
2a01:4f9:c010:c015::1/64 instead of 2a01:4f9:c010:c015::/64). All three
paths now reject prefixes where prefix != prefix.Masked() with an error
message indicating the expected masked form.
Updates tailscale/corp#36738
Signed-off-by: Brendan Creane <bcreane@gmail.com>
* address review comments
Signed-off-by: Brendan Creane <bcreane@gmail.com>
---------
Signed-off-by: Brendan Creane <bcreane@gmail.com>
When a client starts up without being able to connect to control, it
sends its discoKey to other nodes it wants to communicate with over
TSMP. This disco key will be a newer key than the one control knows
about.
If the client that can connect to control gets a full netmap, ensure
that the disco key for the node not connected to control is not
overwritten with the stale key control knows about.
This is implemented through keeping track of mapSession and use that for
the discokey injection if it is available. This ensures that we are not
constantly resetting the wireguard connection when getting the wrong
keys from control.
This is implemented as:
- If the key is received via TSMP:
- Set lastSeen for the peer to now()
- Set online for the peer to false
- When processing new keys, only accept keys where either:
- Peer is online
- lastSeen is newer than existing last seen
If mapSession is not available, as in we are not yet connected to
control, punt down the disco key injection to magicsock.
Ideally, we will want to have mapSession be long lived at some point in
the near future so we only need to inject keys in one location and then
also use that for testing and loading the cache, but that is a yak for
another PR.
Updates #12639
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
This populates UserProfile.Groups in the WhoIs response from the
local backend with the groups of the corresponding user in the
netmap.
This allows tsnet apps to see (and e.g. forward) which groups a
user making a request belongs to - as long as the tsnet app runs
on a node that been granted the tailscale.com/visible-groups
capability via node attributes. If that's not the case or the
user doesn't belong to any groups allow-listed via the node
attribute, Groups won't be populated.
Updates tailscale/corp#31529
Signed-off-by: Gesa Stupperich <gesa@tailscale.com>
Two methods could deadlock during shutdown when closing the wrapper.
Ensure that the writers are aware of the wrapper being closed.
Fixes#19037
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
Currently IP forwarding health check is done on sending MapRequests.
Move ip forwarding to the health service to gain the benefits
of the health tracker and perodic monitoring out of band from
the MapRequest path. ipnlocal now provides a closure to
the health service to provide the check if forwarding is broken.
Removed `skipIPForwardingCheck` from controlclient/direct.go,
it wasn't being used as the comments describe it, that check
has moved to ipnlocal for the closure to the health tracker.
Updates #18976
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
These were previously swappable for historical reasons that are no
longer relevant.
Removing the indirection enables future inlining optimizations if we
simplify further.
Updates tailscale/corp#38703
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Introduce a datapathHandler that implements hooks that will
receive packets from the tstun.Wrapper. This commit does not wire
those up just yet.
Perform DNAT from Magic IP to Transit IP on outbound flows on clients,
and reverse SNAT in the reverse direction.
Perform DNAT from Transit IP to final destination IP on outbound flows
on connectors, and reverse SNAT in the reverse direction.
Introduce FlowTable to cache validated flows by 5-tuple for fast lookups
after the first packet.
Flow expiration is not covered, and is intended as future work before
the feature is officially released.
Fixestailscale/corp#34249Fixestailscale/corp#35995
Co-authored-by: Fran Bull <fran@tailscale.com>
Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>
Adds envknobs to override the netstack default TCP keepalive idle time
(~2h) and probe interval (75s) for forwarded connections.
When a tailnet peer goes away without closing its connections (pod
deleted, peer removed from the netmap, silent network partition), the
forwardTCP io.Copy goroutines block until keepalive fires: the
gvisor-side Read waits on a peer that will never send again, and the
backend-side Read waits on a backend that is alive and idle. With the
netstack default of 7200s idle + 9×75s probes, dead-peer detection
takes a little over two hours. Under high-churn forwarding — many
short-lived peers, or peers holding thousands of proxied connections
that drop at once — stuck goroutines accumulate faster than they clear.
The existing SetKeepAlive(true) at this site enables keepalive without
setting the timers; the TODO above it noted "a shorter default might
be better" and "might be a useful user-tunable". This makes both
timers tunable without changing the defaults: unset preserves the ~2h
behavior, which is the right trade-off for battery-powered peers.
The two knobs are independent — setting one leaves the other at the
netstack default. The options are set before SetKeepAlive(true) so the
timer arms with the configured values rather than the defaults —
matches the order in ipnlocal/local.go for SSH keepalive.
Updates #4522
Signed-off-by: Josef Bacik <josefbacik@anthropic.com>
After #18179 switched to L4 TCPForward, EnsureCertLoops found no
domains since it only checked service.Web entries. Certs were never
provisioned, leaving kube-apiserver ProxyGroups stuck at 0/N ready.
Fixes#19019
Signed-off-by: Raj Singh <raj@tailscale.com>
When we are mapping a dns response, if it is a connector domain, change
the source IP addresses for our magic IP addresses. This will allow the
tailscaled to DNAT the traffic for the domain to the connector.
Updates tailscale/corp#34258
Signed-off-by: Fran Bull <fran@tailscale.com>
Before sending a fix for #18991, this adds an integration test that
locks in that the proxymap WhoIs code works with two nodes running as
different users, with the second node running a localhost service and
able to use its local tailscaled to identify a Tailscale connection
from the other tailscaled.
Updates #18991
Change-Id: I6fbb0810204d77d2ac558f0cc786b73e3248d031
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
For the purpose of improved observability of UDP socket receive buffer
overflows on Linux.
Updates tailscale/corp#37679
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Changed the mapping to store the transit IPs to be indexed by
peer IP rather than NodeID because the data path only has access
to the peer's IP. This change means that IPv4 transit IPs need to
be indexed by the peer's IPv4 address, and IPv6 transit IPs need to
be indexed by the peer's IPv6 address. It is an error if the peer
does not have an address of the same family as the transit IP.
It is also an error if the transit and destination IP families do
not match.
Added a check to ensure that the TransitIPRequest.App matches a
configured app on the connector.
Added additional TransitIPResponse codes to identify the new errors
and change the exsting use of the Other code to use it's own
specific code.
Added logging for the error cases, since they generally indicate that
a peer has constructed a bad request or that there is a config
mismatch between the peer and the local netmap.
Added a test framework for handleConnectorTransitIPRequest and moved
the existing tests into the framework and added new tests.
Fixestailscale/corp#37143
Signed-off-by: George Jones <george@tailscale.com>
The e2e ingress test was very occasionally flaky. On looking at operator
logs from one failure, you can see the default ProxyClass was not ready
before the first reconcile loop for the exposed Service. The ProxyClass
became ready soon after, but no additional reconciles were triggered for
the exposed Service because we only triggered reconciles for Services
that explicitly named their ProxyClass.
This change adds additional list API calls for when it's the default
ProxyClass that's been updated in order to catch Services that use it by
default. It also adds indexes for the fields we need to search on to
ensure the list is efficient.
Fixestailscale/corp#37533
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
This commit adds a "fallback" mechanism to tsnet to allow
the consumer to set "TS_CONTROL_URL" to override the control server.
This allows tsnet applications to gain support for an alternative
control server by just updating without explicitly exposing the
ControlURL option.
Updates #16934
Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
Add two small APIs to support out-of-tree projects to exchange custom
signaling messages over DERP without requiring disco protocol
extensions:
- OnDERPRecv callback on magicsock.Options / wgengine.Config: called for
every non-disco DERP packet before the peer map lookup, allowing callers
to intercept packets from unknown peers that would otherwise be dropped.
- SendDERPPacketTo method on magicsock.Conn: sends arbitrary bytes to a
node key via a DERP region, creating the connection if needed. Thin
wrapper around the existing internal sendAddr.
Also allow netstack.Start to accept a nil LocalBackend for use cases
that wire up TCP/UDP handlers directly without a full LocalBackend.
Updates tailscale/corp#24454
Change-Id: I99a523ef281625b8c0024a963f5f5bf5d8792c17
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
After switching from cellular to wifi without ipv6, ForeachInterface still sees rmnet prefixes, so HaveV6 stays true, and magicsock keeps attempting ipv6 connections that either route through cellular or time out for users on wifi without ipv6
This:
-Adds SetAndroidBindToNetworkFunc, a callback to bind the socket to the selected Android Network object
Updates tailscale/tailscale#6152
Signed-off-by: kari-ts <kari@tailscale.com>
In TestUserspaceEnginePortReconfig, when selecting a port, use a random offset rather than searching in a continguous range in case there is a range that is blocked
Updates tailscale/tailscale#2855
Signed-off-by: kari-ts <kari@tailscale.com>
ReadFromUDPAddrPort worked if UDP GRO was unsupported, but we don't
actually want attempted usage, nor does any exist today. Future work
on tailscale/corp#37679 would have required more complexity in this
method, vs clarifying the API intents.
Updates tailscale/corp#37679
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Fix three independent flake sources, at least as debugged by Claude,
though empirically no longer flaking as it was before:
1. Poll for connection counter data instead of reading immediately.
The conncount callback fires asynchronously on received WireGuard
traffic, so after counts.Reset() there is no guarantee the counter
has been repopulated before checkStats reads it. Use tstest.WaitFor
with a 5s timeout to retry until a matching connection appears.
2. Replace the *2 symmetry assumption in global metric assertions.
metricSendUDP and friends are AggregateCounters that sum per-conn
expvars from both magicsock instances. The old assertion assumed
both instances had identical packet counts, which breaks under
asymmetric background WireGuard activity (handshake retries, etc).
The new assertGlobalMetricsMatchPerConn computes the actual sum of
both conns' expvars and compares against the AggregateCounter value.
3. Tolerate physical stats being 0 when user metrics are non-zero.
A rebind event replaces the socket mid-measurement, resetting the
physical connection counter while user metrics still reflect packets
processed before the rebind. Log instead of failing in this case.
Also move counts.Reset() after metric reads and reorder the reset
sequence (counts before metrics) to minimize the race window.
Fixestailscale/tailscale#13420
Change-Id: I7b090a4dc229a862c1a52161b3f2547ec1d1f23f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Adds logic for containerboot to signal that it can't auth, so the
operator can reissue a new auth key. This only applies when running with
a config file and with a kube state store.
If the operator sees reissue_authkey in a state Secret, it will create a
new auth key iff the config has no auth key or its auth key matches the
value of reissue_authkey from the state Secret. This is to ensure we
don't reissue auth keys in a tight loop if the proxy is slow to start or
failing for some other reason. The reissue logic also uses a burstable
rate limiter to ensure there's no way a terminally misconfigured
or buggy operator can automatically generate new auth keys in a tight loop.
Additional implementation details (ChaosInTheCRD):
- Added `ipn.NotifyInitialHealthState` to ipn watcher, to ensure that
`n.Health` is populated when notify's are returned.
- on auth failure, containerboot:
- Disconnects from control server
- Sets reissue_authkey marker in state Secret with the failing key
- Polls config file for new auth key (10 minute timeout)
- Restarts after receiving new key to apply it
- modified operator's reissue logic slightly:
- Deletes old device from tailnet before creating new key
- Rate limiting: 1 key per 30s with initial burst equal to replica count
- In-flight tracking (authKeyReissuing map) prevents duplicate API calls
across reconcile loops
Updates #14080
Change-Id: I6982f8e741932a6891f2f48a2936f7f6a455317f
(cherry picked from commit 969927c47c)
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Co-authored-by: chaosinthecrd <tom@tmlabs.co.uk>
This amends the session creation and auth status querying logic of the device UI
backend. On creation of new browser sessions we now store a PendingAuth flag
as part of the session that indicates a pending auth process that needs to be
awaited. On auth status queries, the server initiates a polling for the auth result
if it finds this flag to be true. Once the polling is completes, the flag is set to false.
Why this change was necessary: with regular browser settings, the device UI
frontend opens the control auth URL in a new tab and starts polling for the
results of the auth flow in the current tab. With certain browser settings (that
we still want to support), however, the auth URL opens in the same tab, thus
aborting the subsequent call to auth/session/wait that initiates the polling,
and preventing successful registration of the auth results in the session
status. The new logic ensures the polling happens on the next call to /api/auth
in these kinds of scenarios.
In addition to ensuring the auth wait happens, we now also revalidate the auth
state whenever an open tab regains focus, so that auth changes effected in one
tab propagate to other tabs without the need to refresh. This improves the
experience for all users of the web client when they've got multiple tabs open,
regardless of their browser settings.
Fixes#11905
Signed-off-by: Gesa Stupperich <gesa@tailscale.com>
This makes tsnet apps not depend on x/crypto/ssh and locks that in with a test.
It also paves the wave for tsnet apps to opt-in to SSH support via a
blank feature import in the future.
Updates #12614
Change-Id: Ica85628f89c8f015413b074f5001b82b27c953a9
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Two issues caused TestCollectPanic to flake:
1. ETXTBSY: The test exec'd the tailscaled binary directly without
going through StartDaemon/awaitTailscaledRunnable, so it lacked
the retry loop that other tests use to work around a mysterious
ETXTBSY on GitHub Actions.
2. Shared filch files: The test didn't pass --statedir or TS_LOGS_DIR,
so all parallel test instances wrote panic logs to the shared system
state directory (~/.local/share/tailscale). Concurrent runs would
clobber each other's filch log files, causing the second run to not
find the panic data from the first.
Fix both by adding awaitTailscaledRunnable before the first exec, and
passing --statedir and TS_LOGS_DIR to isolate each test's log files,
matching what StartDaemon does.
It now passes x/tools/cmd/stress.
Fixes#15865
Change-Id: If18b9acf8dbe9a986446a42c5d98de7ad8aae098
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
When IPv6 is unavailable on a system, AddConnmarkSaveRule() and
DelConnmarkSaveRule() would panic with a nil pointer dereference.
Both methods directly iterated over []iptablesInterface{i.ipt4, i.ipt6}
without checking if ipt6 was nil.
Use `getTables()` instead to properly retrieve the available tables
on a given system
Fixes#3310
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
Raw byte accessors for key types, mirroring existing patterns
(NodePublic.Raw32 and DiscoPublicFromRaw32 already exist).
NodePrivate.Raw32 returns the raw 32 bytes of a node private key.
DiscoPrivateFromRaw32 parses a 32-byte raw value as a DiscoPrivate.
Updates tailscale/corp#24454
Change-Id: Ibc08bed14ab359eddefbebd811c375b6365c7919
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>