Add more explicit `--bind-address` and `--bind-port` flags to the `tailscale netcheck` CLI to give users control over UDP probes' source IP and UDP port.
This was already supported in a less documented manner via the` TS_DEBUG_NETCHECK_UDP_BIND` environment variable. The environment variable reference is preserved and used as a fallback value in the absence of these new CLI flags.
Updates tailscale/corp#36833
Signed-off-by: Amal Bansode <amal@tailscale.com>
--auth is already best-effort, but we saw some CI failures due to
failing to fetch stats when cigocached was overwhelmed recently. Make
sure it fails more gracefully in the absence of cigocached like the rest
of cigocacher already does.
Updates tailscale/corp#37059
Change-Id: I0703b30b1c5a7f8c649879a87e6bcd2278610208
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
fixestailscale/corp#37048
We're duplicating logic in AnyInterfaceUp in the ChangeDelta
and we're duplicating it wrong. The new State has the logic
for this based on the HaveV6 and HaveV4 flags.
Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
Stop stripping the "*." prefix from wildcard domains when used
as storage keys. Instead, replace "*" with "wildcard_" only at
the filesystem boundary in certFile and keyFile. This prevents
wildcard and non-wildcard certs from colliding in storage.
Updates #1196
Updates #7081
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
Updating a node on a testcontrol server should trigger netmap updates to
all connected streaming clients. This was not the case previous to this
change and consequently caused race conditions in tests. It was possible
for a test to call UpdateNode and for connected nodes to never see the
update propagate.
Updates #16340Fixes#18703
Signed-off-by: Harry Harpham <harry@tailscale.com>
This commit implements a reconciler for the new `ProxyGroupPolicy`
custom resource. When created, all `ProxyGroupPolicy` resources
within the same namespace are merged into two `ValidatingAdmissionPolicy`
resources, one for egress and one for ingress.
These policies use CEL expressions to limit the usage of the
"tailscale.com/proxy-group" annotation on `Service` and `Ingress`
resources on create & update.
Included here is also a new e2e test that ensures that resources that
violate the policy return an error on creation, and that once the
policy is changed to allow them they can be created.
Closes: https://github.com/tailscale/corp/issues/36830
Signed-off-by: David Bond <davidsbond93@gmail.com>
This commit is based on ff0978ab, and extends #18497 to connect network map
caching to the LocalBackend. As implemented, only "whole" netmap values are
stored, and we do not yet handle incremental updates. As-written, the feature must
be explicitly enabled via the TS_USE_CACHED_NETMAP envknob, and must be
considered experimental.
Updates #12639
Co-Authored-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Change-Id: I48a1e92facfbf7fb3a8e67cff7f2c9ab4ed62c83
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
TestOnlyTaggedPeersCanBeDialed has a race condition:
- The test untags ps[2] and waits until ps[0] sees this tag dropped from
ps[2] in the netmap.
- Later the test tries to dial ps[2] from ps[0] and expects the dial to
fail as authorization to dial relies on the presence of the tag, now
removed from ps[2].
- However, the authorization layer caches the status used to consult peer
tags. When the dial happens before the cache times out, the test fails.
- Due to a bug in testcontrol.Server.UpdateNode, which the test uses to
remove the tag, netmap updates are not immediately triggered. The test
has to wait for the next natural set of netmap updates, which on my
machine takes about 22 seconds. As a result, the cache in the
authorization layer times out and the test passes.
- If one fixes the bug in UpdateNode, then netmap updates happen
immediately, the cache is no longer timed out when the dial occurs, and
the test fails.
Fixes#18720
Updates #18703
Signed-off-by: Harry Harpham <harry@tailscale.com>
This adds a new ControlKnob to make MagicDNS IPv6 registration
(telling systemd/etc) opt-out rather than opt-in.
Updates #15404
Change-Id: If008e1cb046b792c6aff7bb1d7c58638f7d650b1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
OnPolicyChange can observe a conn in activeConns before authentication
completes. The previous `c.info == nil` guard was itself a data race
against clientAuth writing c.info, and even when c.info appeared
non-nil, c.localUser could still be nil, causing a nil pointer
dereference at c.localUser.Username.
Add an authCompleted atomic.Bool to conn, stored true after all auth
fields are written in clientAuth. OnPolicyChange checks this atomic
instead of c.info, which provides the memory barrier guaranteeing all
prior writes are visible to the concurrent reader.
Updates tailscale/corp#36268 (fixes, but we might want to cherry-pick)
Co-authored-by: Gesa Stupperich <gesa@tailscale.com>
Change-Id: I4c69843541f5f9f04add9bf431e320c65a203a39
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The gocross-wrapper.sh bash script already checks TS_GO_NEXT (as of
a374cc344e) to select go.toolchain.next.rev over go.toolchain.rev,
but when TS_USE_GOCROSS=1 the Go binary itself was hardcoded to read
go.toolchain.rev. This makes gocross also respect the TS_GO_NEXT=1
environment variable.
Updates tailscale/corp#36382
Change-Id: I04bef25a34e7ed3ccb1bfdb33a3a1f896236c6ee
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In PR #18681, we started logging which exit nodes were being
suggested. However, we did not log if there were errors encountered.
This patch corrects this oversight.
Updates: tailscale/corp#29964
Updates: tailscale/corp#36446
Signed-off-by: Simon Law <sfllaw@tailscale.com>
This switches our gokrazy builds to use a new variant of cmd/gok called
opinionated about using monorepos: https://github.com/bradfitz/monogok
And with that, we can get rid of all the go.mod files and builddir forests
under gokrazy/**.
Updates #13038
Updates gokrazy/gokrazy#361
Change-Id: I9f18fbe59b8792286abc1e563d686ea9472c622d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
(*health.Tracker).CurrentState() returns an empty state when there are no client-side
warnables, even when there are control-health messages, which is incorrect.
This fixes it.
Updates tailscale/corp#37275
Signed-off-by: Nick Khyl <nickk@tailscale.com>
Store packet filter rules in the cache. The match expressions are derived
from the filter rules, so these do not need to be stored explicitly, but ensure
they are properly reconstructed when the cache is read back.
Update the tests to include these fields, and provide representative values.
Updates #12639
Change-Id: I9bdb972a86d2c6387177d393ada1f54805a2448b
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
fixestailscale/corp#36708
Sets up a set of metrics to report watchdog timeouts for wgengine and
reports an event for any watchdog timeout.
Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
In the absence of a better mechanism, writing unqualified hostnames to the hosts file may be required
for MagicDNS to work on some Windows environments, such as domain-joined machines. It can also
improve MagicDNS performance on non-domain joined devices when we are not the device's primary
DNS resolver.
At the same time, updating the hosts file can be slow and expensive, especially when it already contains
many entries, as was previously reported in #14327. It may also have negative side effects, such as interfering
with the system's DNS resolution policies.
Additionally, to fix#18712, we had to extend hosts file usage to domain-joined machines when we are not
the primary DNS resolver. For the reasons above, this change may introduce risk.
To allow customers to disable hosts file updates remotely without disabling MagicDNS entirely, whether on
domain-joined machines or not, this PR introduces the `disable-hosts-file-updates` node attribute.
Updates #18712
Updates #14327
Signed-off-by: Nick Khyl <nickk@tailscale.com>
On domain-joined Windows devices the primary search domain (the one the device is joined to)
always takes precedence over other search domains. This breaks MagicDNS when we are the primary
resolver on the device (see #18712). To work around this Windows behavior, we should write MagicDNS
host names the hosts file just as we do when we're not the primary resolver.
This commit does exactly that.
Fixes#18712
Signed-off-by: Nick Khyl <nickk@tailscale.com>
This commit adds a new custom resource definition to the kubernetes
operator named `ProxyGroupPolicy`. This resource is namespace scoped
and is used as an allow list for which `ProxyGroup` resources can be
used within its namespace.
The `spec` contains two fields, `ingress` and `egress`. These should
contain the names of `ProxyGroup` resources to denote which can be
used as values in the `tailscale.com/proxy-group` annotation within
`Service` and `Ingress` resources.
The intention is for these policies to be merged within a namespace and
produce a `ValidatingAdmissionPolicy` and `ValidatingAdmissionPolicyBinding`
for both ingress and egress that prevents users from using names of
`ProxyGroup` resources in those annotations.
Closes: https://github.com/tailscale/corp/issues/36829
Signed-off-by: David Bond <davidsbond93@gmail.com>
Adds a new track for release candidates. Supports querying by track in
version and updating to RCs in update for supported platforms.
updates #18193
Signed-off-by: Will Hannah <willh@tailscale.com>
Two methods were recently added to the testcontrol.Server type:
AddDNSRecords and SetGlobalAppCaps. These two methods should trigger
netmap updates for all nodes connected to the Server instance, the way
that other state-change methods do (see SetNodeCapMap, for example).
This will also allow us to get rid of Server.ForceNetmapUpdate, which
was a band-aid fix to force the netmap updates which should have been
triggered by the aforementioned methods.
Fixestailscale/corp#37102
Signed-off-by: Harry Harpham <harry@tailscale.com>
Instead of relying on the local timezone, which may cause
non-deterministic behavior in some CIs, we force timezone
to be UTC on default created clocks.
Fixes: tailscale/corp#37005
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
This updates the TS_GO_NEXT=1 (testing) toolchain to Go 1.26.0
The default one is still Go 1.25.x.
Updates #18682
Change-Id: I99747798c166ce162ee9eee74baa9ff6744a62f6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
When traffic steering is enabled, some users are suggested an exit
node that is inappropriately far from their location. This seems to
happen right when the client connects to the control plane and the
client eventually fixes itself. But whenever an affected client
reconnects, its suggested exit node flaps, and this happens often
enough to be noticeable because connections drop whenever the exit
node is switched. This should not happen, since the map response that
contains the list of suggested exit nodes that the client picks from,
also contains the scores for those nodes.
Since our current logging and diagnostic tools don’t give us enough
insight into what is happening, this PR adds additional logging when:
- traffic steering scores are used to suggest an exit node
- an exit node is suggested, no matter how it was determined
Updates: tailscale/corp#29964
Updates: tailscale/corp#36446
Signed-off-by: Simon Law <sfllaw@tailscale.com>
Updates rotateLocked so that we hold the activeStderrWriteForTest write
lock around the dup2Stderr call, rather than acquiring it only after
dup2 was already compelete. This ensures no stderrWriteForTest calls
can race with the dup2 syscall. The now unused waitIdleStderrForTest has
been removed.
On macOS, dup2 and write on the same file descriptor are not atomic with
respect to each other, when rotateLocked called dup2Stderr to redirect
the stderr fd to a new file, concurrent goroutines calling
stderrWriteForTest could observe the fd in a transiently invalid state,
resulting in the bad file descripter.
Fixestailscale/corp#36953
Signed-off-by: James Scott <jim@tailscale.com>
Restore synchronous method calls from LocalBackend to magicsock.Conn
for node views, filter, and delta mutations. The eventbus delivery
introduced in 8e6f63cf1 was invalid for these updates because
subsequent operations in the same call chain depend on magicsock
already having the current state. The Synchronize/settleEventBus
workaround was fragile and kept requiring more workarounds and
introducing new mystery bugs.
Since eventbus was added, we've since learned more about when to use
eventbus, and this wasn't one of the cases.
We can take another swing at using eventbus for netmap changes in a
future change.
Fixes#16369
Updates #18575 (likely fixes)
Change-Id: I79057cc9259993368bb1e350ff0e073adf6b9a8f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
fixestailscale/tailscale#18436
Queries can still make their way to the forwarder when accept-dns is disabled.
Since we have not configured the forwarder if --accept-dns is false, this errors out
(correctly) but it also generates a persistent health warning. This forwards the
Pref setting all the way through the stack to the forwarder so that we can be more
judicious about when we decide that the forward path is unintentionally missing, vs
simply not configured.
Testing:
tailscale set --accept-dns=false. (or from the GUI)
dig @100.100.100.100 example.com
tailscale status
No dns related health warnings should be surfaced.
Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
-Wait does not just wait for the created process; it waits for the
entire process tree rooted at that process! This can cause the shell
to wait indefinitely if something in that tree fired up any background
processes.
Instead we call WaitForExit on the returned process.
Updates https://github.com/tailscale/corp/issues/29940
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
app connector packets
We introduce the Conn25PacketHooks interface to be used as a nil-able
field in userspaceEngine. The engine then plumbs through the functions
to the corresponding tstun.Wrapper intercepts.
The new intercepts run pre-filter when egressing toward WireGuard,
and post-filter when ingressing from WireGuard. This is preserve the
design invariant that the filter recognizes the traffic as interesting
app connector traffic.
This commit does not plumb through implementation of the interface, so
should be a functional no-op.
Fixestailscale/corp#35985
Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>
bart has gained a bunch of purported performance and usability
improvements since the current version we are using (0.18.0,
from 1y ago)
Updates tailscale/corp#36982
Signed-off-by: Amal Bansode <amal@tailscale.com>