fixes tailscale/corp#36708
Sets up a set of metrics to report watchdog timeouts for wgengine and
reports an event for any watchdog timeout.
Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
In the absence of a better mechanism, writing unqualified hostnames to the hosts file may be required
for MagicDNS to work in some Windows environments, such as domain-joined machines. It can also
improve MagicDNS performance on non-domain-joined devices when we are not the device's primary
DNS resolver.
At the same time, updating the hosts file can be slow and expensive, especially when it already contains
many entries, as was previously reported in #14327. It may also have negative side effects, such as interfering
with the system's DNS resolution policies.
Additionally, to fix #18712, we had to extend hosts file usage to domain-joined machines when we are not
the primary DNS resolver. For the reasons above, this change may introduce risk.
To allow customers to disable hosts file updates remotely without disabling MagicDNS entirely, whether on
domain-joined machines or not, this PR introduces the `disable-hosts-file-updates` node attribute.
Updates #18712
Updates #14327
Signed-off-by: Nick Khyl <nickk@tailscale.com>
On domain-joined Windows devices the primary search domain (the one the device is joined to)
always takes precedence over other search domains. This breaks MagicDNS when we are the primary
resolver on the device (see #18712). To work around this Windows behavior, we should write MagicDNS
host names to the hosts file just as we do when we're not the primary resolver.
This commit does exactly that.
Fixes #18712
Signed-off-by: Nick Khyl <nickk@tailscale.com>
This commit adds a new custom resource definition to the kubernetes
operator named `ProxyGroupPolicy`. This resource is namespace scoped
and is used as an allow list for which `ProxyGroup` resources can be
used within its namespace.
The `spec` contains two fields, `ingress` and `egress`. These should
contain the names of `ProxyGroup` resources to denote which can be
used as values in the `tailscale.com/proxy-group` annotation within
`Service` and `Ingress` resources.
The intention is for these policies to be merged within a namespace and
produce a `ValidatingAdmissionPolicy` and `ValidatingAdmissionPolicyBinding`
for both ingress and egress that prevents users from using names of
`ProxyGroup` resources in those annotations.
Closes: https://github.com/tailscale/corp/issues/36829
Signed-off-by: David Bond <davidsbond93@gmail.com>
Adds a new track for release candidates. Supports querying by track in
version and updating to RCs in update for supported platforms.
updates #18193
Signed-off-by: Will Hannah <willh@tailscale.com>
Two methods were recently added to the testcontrol.Server type:
AddDNSRecords and SetGlobalAppCaps. These two methods should trigger
netmap updates for all nodes connected to the Server instance, the way
that other state-change methods do (see SetNodeCapMap, for example).
This will also allow us to get rid of Server.ForceNetmapUpdate, which
was a band-aid fix to force the netmap updates which should have been
triggered by the aforementioned methods.
Fixes tailscale/corp#37102
Signed-off-by: Harry Harpham <harry@tailscale.com>
Instead of relying on the local timezone, which may cause
non-deterministic behavior in some CIs, we force timezone
to be UTC on default created clocks.
Fixes: tailscale/corp#37005
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
This updates the TS_GO_NEXT=1 (testing) toolchain to Go 1.26.0.
The default one is still Go 1.25.x.
Updates #18682
Change-Id: I99747798c166ce162ee9eee74baa9ff6744a62f6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
When traffic steering is enabled, some users are suggested an exit
node that is inappropriately far from their location. This seems to
happen right when the client connects to the control plane and the
client eventually fixes itself. But whenever an affected client
reconnects, its suggested exit node flaps, and this happens often
enough to be noticeable because connections drop whenever the exit
node is switched. This should not happen, since the map response that
contains the list of suggested exit nodes the client picks from also
contains the scores for those nodes.
Since our current logging and diagnostic tools don’t give us enough
insight into what is happening, this PR adds additional logging when:
- traffic steering scores are used to suggest an exit node
- an exit node is suggested, no matter how it was determined
Updates: tailscale/corp#29964
Updates: tailscale/corp#36446
Signed-off-by: Simon Law <sfllaw@tailscale.com>
Updates rotateLocked so that we hold the activeStderrWriteForTest write
lock around the dup2Stderr call, rather than acquiring it only after
dup2 was already complete. This ensures no stderrWriteForTest calls
can race with the dup2 syscall. The now unused waitIdleStderrForTest has
been removed.
On macOS, dup2 and write on the same file descriptor are not atomic with
respect to each other. When rotateLocked called dup2Stderr to redirect
the stderr fd to a new file, concurrent goroutines calling
stderrWriteForTest could observe the fd in a transiently invalid state,
resulting in a bad file descriptor error.
Fixes tailscale/corp#36953
Signed-off-by: James Scott <jim@tailscale.com>
Restore synchronous method calls from LocalBackend to magicsock.Conn
for node views, filter, and delta mutations. The eventbus delivery
introduced in 8e6f63cf1 was invalid for these updates because
subsequent operations in the same call chain depend on magicsock
already having the current state. The Synchronize/settleEventBus
workaround was fragile and kept requiring more workarounds and
introducing new mystery bugs.
Since eventbus was added, we've learned more about when to use it,
and this wasn't one of the cases.
We can take another swing at using eventbus for netmap changes in a
future change.
Fixes #16369
Updates #18575 (likely fixes)
Change-Id: I79057cc9259993368bb1e350ff0e073adf6b9a8f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
fixes tailscale/tailscale#18436
Queries can still make their way to the forwarder when accept-dns is disabled.
Since we have not configured the forwarder if --accept-dns is false, this errors out
(correctly) but it also generates a persistent health warning. This forwards the
Pref setting all the way through the stack to the forwarder so that we can be more
judicious about when we decide that the forward path is unintentionally missing, vs
simply not configured.
Testing:
tailscale set --accept-dns=false (or from the GUI)
dig @100.100.100.100 example.com
tailscale status
No dns related health warnings should be surfaced.
Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
-Wait does not just wait for the created process; it waits for the
entire process tree rooted at that process! This can cause the shell
to wait indefinitely if something in that tree fired up any background
processes.
Instead we call WaitForExit on the returned process.
Updates https://github.com/tailscale/corp/issues/29940
Signed-off-by: Aaron Klotz <aaron@tailscale.com>
app connector packets
We introduce the Conn25PacketHooks interface to be used as a nil-able
field in userspaceEngine. The engine then plumbs through the functions
to the corresponding tstun.Wrapper intercepts.
The new intercepts run pre-filter when egressing toward WireGuard,
and post-filter when ingressing from WireGuard. This is to preserve the
design invariant that the filter recognizes the traffic as interesting
app connector traffic.
This commit does not plumb through implementation of the interface, so
should be a functional no-op.
Fixes tailscale/corp#35985
Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>
bart has gained a bunch of purported performance and usability
improvements since the version we are currently using (0.18.0,
from a year ago).
Updates tailscale/corp#36982
Signed-off-by: Amal Bansode <amal@tailscale.com>
This updates the URL shown by systemd to the new URL used by the docs
after the recent migration.
Fixes #18646
Signed-off-by: Tim Walters <tim@tailscale.com>
Add new "webbrowser" and "colorable" feature tags so that the
github.com/toqueteos/webbrowser and mattn/go-colorable packages
can be excluded from minbox builds.
Updates #12614
Change-Id: Iabd38b242f5a56aa10ef2050113785283f4e1fe8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This commit adds a bool named PeerRelay to Hostinfo to indicate whether the host is acting as a peer relay.
Because the RelayServerPort number can be 0, I made this a bool instead of a port number. If the port
info is needed in the future, the bool would also help indicate whether the port was set to 0 (meaning
any port in the peer relay context).
Updates tailscale/corp#35862
Signed-off-by: KevinLiang10 <37811973+KevinLiang10@users.noreply.github.com>
Under extremely high load it appears we may have some retention issues
as a result of queue depth build up, but there is currently no direct
way to observe this. The scenario does not trigger the slow subscriber
log message, and the event stream debugging endpoint produces a
saturating volume of information.
Updates tailscale/corp#36904
Signed-off-by: James Tucker <james@tailscale.com>
Currently the expvar exporter attempts to write expvar.String, which
breaks the Prometheus metric page.
Updates tailscale/corp#36552
Signed-off-by: Anton Tolchanov <anton@tailscale.com>
concurrent netmaps that if the first is logged in, it is never skipped.
This should have been covered by the skip test case, but that case
wasn't updated to include level set state.
Updates #12639
Updates #17869
Signed-off-by: James Tucker <james@tailscale.com>
If any profiles exist and an Authkey is provided via syspolicy, the
AuthKey is ignored on backend start, preventing re-auth attempts. This
is useful for one-time device provisioning scenarios, skipping authKey
use after initial setup when the authKey may no longer be valid.
updates #18618
Signed-off-by: Will Hannah <willh@tailscale.com>
Use the parsed and validated advertise tags value from prefs instead of
doing a strings.Split on the raw tags value as an input to the OAuth and
identity federation auth key generation methods.
The previous strings.Split method would return an array with a single
empty string element which would pass downstream length checks on the
tags argument before eventually failing with a confusing message when
hitting the API.
Fixes https://github.com/tailscale/tailscale/issues/18617
Signed-off-by: Mario Minardi <mario@tailscale.com>
Package feature/conn25 is excludable from a build via the featuretag.
Test that it is excluded for minimal builds.
Updates #12614
Signed-off-by: Fran Bull <fran@tailscale.com>
We already had a featuretag for clientupdate, but the CLI wasn't using
it, making the "minbox" build (minimal combined tailscaled + CLI
build) larger than necessary.
Updates #12614
Change-Id: Idd7546c67dece7078f25b8f2ae9886f58d599002
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This resolves a gap in test coverage, ensuring Server.ListenService
functions as expected in combination with user-supplied TUN devices.
Fixes tailscale/corp#36603
Co-authored-by: Harry Harpham <harry@tailscale.com>
Signed-off-by: Harry Harpham <harry@tailscale.com>
When the NodeAttrDNSSubdomainResolve capability is present, enable
wildcard certificate issuance to cover all single-level subdomains
of a node's CertDomain.
Without the capability, only exact CertDomain matches are allowed,
so node.ts.net yields a cert for node.ts.net. With the capability,
we now generate wildcard certificates. Wildcard certs include both
the wildcard and base domain in their SANs, and ACME authorization
requests both identifiers. The cert filenames are still based on
the base domain with the wildcard prefix stripped, so we aren't
creating separate files. DNS challenges still use the base domain.
The checkCertDomain function is replaced by resolveCertDomain that
both validates and returns the appropriate cert domain to request.
Name validation is now moved earlier into GetCertPEMWithValidity().
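
As a rough illustration of the naming rules above (SANs contain both the
wildcard and the base domain, and files are keyed on the base domain),
here is a small sketch. certNames is a hypothetical function written for
this example; it is not resolveCertDomain and does no validation.

```go
package main

import (
	"fmt"
	"strings"
)

// certNames is hypothetical. Given a cert domain that may start with "*.",
// it returns the SANs to request and the base name used for cert files.
func certNames(domain string) (sans []string, fileBase string) {
	base, isWildcard := strings.CutPrefix(domain, "*.")
	if !isWildcard {
		// Exact match: one SAN, and the file name is the domain itself.
		return []string{domain}, domain
	}
	// Wildcard: request both identifiers, keep files keyed on the base.
	return []string{domain, base}, base
}

func main() {
	sans, file := certNames("*.node.ts.net")
	fmt.Println(sans, file)
}
```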
Fixes #1196
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
Not all Linux distros use systemd yet. For example, GL.iNet KVM devices
use busybox's init, which is similar to SysV init.
This is a best-effort restart attempt after the update; it probably
won't cover 100% of init.d setups out there.
Fixes#18567
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
Found by @cmol. When rewriting the same value into the cache, we were dropping
the unchanged keys, resulting in the cache being pruned incorrectly.
Also update the tests to catch this.
Updates #12639
Change-Id: Iab67e444eb7ddc22ccc680baa2f6a741a00eb325
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
This commit fixes an issue within containerboot that arose from the
kubernetes operator. When users enable metrics on custom resources that
are running on dual-stack or IPv6-only clusters, they end up with an error
as we pass the host:port combination using $(POD_IP):PORT.
In Go, `netip.ParseAddrPort` expects square brackets `[]` to wrap the host
portion of an IPv6 address and, without them, fails, which led to a crash.
When loading the containerboot configuration from the environment we now
check if the `TS_LOCAL_ADDR_PORT` value contains the pod's IPv6 address.
If it does and is not already bracketed, we add the brackets.
Closes: #15762
Closes: #15467
Signed-off-by: David Bond <davidsbond93@gmail.com>
This provides a mechanism to block, waiting for Tailscale's IP to be
ready for a bind/listen, to gate the starting of other services.
It also adds a new --assert=[IP] option to "tailscale ip", for services
that want extra paranoia about what IP is in use, if they're worried about
having switched to the wrong tailnet prior to reboot or something.
Updates #3340
Updates #11504
... and many more, IIRC
Change-Id: I88ab19ac5fae58fd8c516065bab685e292395565
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The forwarder was not setting the Truncated (TC) flag when UDP DNS
responses exceeded either the EDNS buffer size (if present) or the
RFC 1035 default 512-byte limit. This affected DoH, TCP fallback,
and UDP response paths.
The fix ensures checkResponseSizeAndSetTC is called in all code paths
that return UDP responses, enforcing both EDNS and default UDP size
limits.
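
The size rule can be illustrated on a raw DNS message. This is a sketch
under assumptions: setTCIfOversized is a hypothetical stand-in, not
checkResponseSizeAndSetTC, and it only flips the TC bit without
shortening the message. The bit position (0x02 in the third header byte)
comes from the RFC 1035 header layout.

```go
package main

import "fmt"

const (
	defaultUDPSize = 512  // RFC 1035 limit for UDP messages without EDNS
	headerLen      = 12   // fixed DNS header length
	tcFlag         = 0x02 // TC bit within the third byte of the header
)

// setTCIfOversized is hypothetical. ednsSize is the buffer size the client
// advertised via EDNS, or 0 if the query had no OPT record. It sets the TC
// bit when resp exceeds the applicable limit and reports whether it did.
func setTCIfOversized(resp []byte, ednsSize int) bool {
	limit := defaultUDPSize
	if ednsSize > limit {
		limit = ednsSize // advertised sizes below 512 are treated as 512
	}
	if len(resp) < headerLen || len(resp) <= limit {
		return false
	}
	resp[2] |= tcFlag
	return true
}

func main() {
	resp := make([]byte, 600) // placeholder bytes, not a real DNS answer

	fmt.Println(setTCIfOversized(resp, 1232)) // fits in the EDNS buffer
	fmt.Println(setTCIfOversized(resp, 0))    // exceeds the 512-byte default
	fmt.Println(resp[2]&tcFlag != 0)
}
```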
Added comprehensive unit tests and consolidated duplicate test helpers.
Updates #18107
Signed-off-by: Brendan Creane <bcreane@gmail.com>