The funnel command is sort of an alias for the serve command. This means
that the subcommands added to serve to support Services appear as
subcommands for funnel as well, despite having no meaning for funnel.
This change removes all such Services-specific subcommands from funnel.
Fixestailscale/corp#34167
Signed-off-by: Harry Harpham <harry@tailscale.com>
Ensure that hardware attestation keys are not added to tailscaled
state stores that are Kubernetes Secrets or AWS SSM as those Tailscale
devices should be able to be recreated on different nodes, for example,
when moving Pods between nodes.
Updates tailscale/tailscale#18302
Signed-off-by: Irbe Krumina <irbekrm@gmail.com>
TPM-based features have been incredibly painful due to the heterogeneous
devices in the wild, and many situations in which the TPM "changes" (is
reset or replaced). All of this leads to a lot of customer issues.
We hoped to iron out all the kinks and get all users to benefit from
state encryption and hardware attestation without manually opting in,
but the long tail of kinks is just too long.
This change disables TPM-based features on Windows and Linux by default.
Node state should get auto-decrypted on update, and old attestation keys
will be removed.
There's also tailscaled-on-macOS, but it won't have a TPM or Keychain
bindings anyway.
Updates #18302
Updates #15830
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
Soft-fail on initial unmarshal and try again, ignoring the
AttestationKey. This helps in cases where something about the
attestation key storage (usually a TPM) is messed up. The old key will
be lost, but at least the node can start again.
Updates #18302
Updates #15830
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
Send LOGIN audit messages to the kernel audit subsystem on Linux
when users successfully authenticate to Tailscale SSH. This provides
administrators with audit trail integration via auditd or journald,
recording details about both the Tailscale user (whois) and the
mapped local user account.
The implementation uses raw netlink sockets to send AUDIT_USER_LOGIN
messages to the kernel audit subsystem. It requires CAP_AUDIT_WRITE
capability, which is checked at runtime. If the capability is not
present, audit logging is silently skipped.
Audit messages are sent to the kernel (pid 0) and consumed by either
auditd (written to /var/log/audit/audit.log) or journald (available
via journalctl _TRANSPORT=audit), depending on system configuration.
Note: This may result in duplicate messages on a system where
auditd/journald audit logs are enabled and the system has and supports
`login -h`. Sadly Linux login code paths are still an inconsistent wild
west so we accept the potential duplication rather than trying to avoid
it.
Fixes#18332
Signed-off-by: James Tucker <james@tailscale.com>
GCP Certificate Manager requires an email contact on ACME accounts.
Add --acme-email flag that is required for --certmode=gcp and
optional for --certmode=letsencrypt.
Fixes#18277
Signed-off-by: Raj Singh <raj@tailscale.com>
An error returned by net.Listener.Accept() causes the owning http.Server to shut down.
With the deprecation of net.Error.Temporary(), there's no way for the http.Server to test
whether the returned error is temporary / retryable or not (see golang/go#66252).
Because of that, errors returned by (*safesocket.winIOPipeListener).Accept() cause the LocalAPI
server (aka ipnserver.Server) to shut down, and tailscaled process to exit.
While this might be acceptable in the case of non-recoverable errors, such as programmer errors,
we shouldn't shut down the entire tailscaled process for client- or connection-specific errors,
such as when we couldn't obtain the client's access token because the client attempts to connect
at the Anonymous impersonation level. Instead, the LocalAPI server should gracefully handle
these errors by denying access and returning a 401 Unauthorized to the client.
In tailscale/tscert#15, we fixed a known bug where Caddy and other apps using tscert would attempt
to connect at the Anonymous impersonation level and fail. However, we should also fix this on the tailscaled
side to prevent a potential DoS, where a local app could deliberately open the Tailscale LocalAPI named pipe
at the Anonymous impersonation level and cause tailscaled to exit.
In this PR, we defer token retrieval until (*WindowsClientConn).Token() is called and propagate the returned token
or error via ipnauth.GetConnIdentity() to ipnserver, which handles it the same way as other ipnauth-related errors.
Fixes#18212Fixestailscale/tscert#13
Signed-off-by: Nick Khyl <nickk@tailscale.com>
This gauge will be reworked to include endpoint state in future.
Updates tailscale/corp#30820
Change-Id: I66f349d89422b46eec4ecbaf1a99ad656c7301f9
Signed-off-by: Alex Valiushko <alexvaliushko@tailscale.com>
In dynamically changing environments where ACME account keys and certs
are stored separately, it can happen that the account key would get
deleted (and recreated) between issuances. If that is the case,
we currently fail renewals and the only way to recover is for users
to delete certs.
This adds a config knob to allow opting out of the replaces extension
and utilizes it in the Kubernetes operator where there are known
user workflows that could end up with this edge case.
Updates #18251
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Adding both user and client metrics for peer relay forwarded bytes and
packets, and the total endpoints gauge.
User metrics:
tailscaled_peer_relay_forwarded_packets_total{transport_in, transport_out}
tailscaled_peer_relay_forwarded_bytes_total{transport_in, transport_out}
tailscaled_peer_relay_endpoints_total{}
Where the transport labels can be of "udp4" or "udp6".
Client metrics:
udprelay_forwarded_(packets|bytes)_udp(4|6)_udp(4|6)
udprelay_endpoints
RELNOTE: Expose tailscaled metrics for peer relay.
Updates tailscale/corp#30820
Change-Id: I1a905d15bdc5ee84e28017e0b93210e2d9660259
Signed-off-by: Alex Valiushko <alexvaliushko@tailscale.com>
Adds support for targeting FQDNs that are a Tailscale Service. Uses the
same method of searching for Services as the tailscale configure
kubeconfig command. This fixes using the tailscale.com/tailnet-fqdn
annotation for Kubernetes Service when the specified FQDN is a Tailscale
Service.
Fixes#16534
Change-Id: I422795de76dc83ae30e7e757bc4fbd8eec21cc64
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Signed-off-by: Becky Pauley <becky@tailscale.com>
IsZero is required by the interface, so we should use that before trying
to serialize the key.
Updates #35412
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
When the TS_DEBUG_DNS_FORWARD_SEND envknob is turned on, also log the
source IP:port of the query that tailscaled is forwarding.
Updates tailscale/corp#35374
Signed-off-by: Andrew Dunham <andrew@tailscale.com>
updates tailscale/corp#33891
Addresses several older the TODO's in netmon. This removes the
Major flag precomputes the ChangeDelta state, rather than making
consumers of ChangeDeltas sort that out themselves. We're also seeing
a lot of ChangeDelta's being flagged as "Major" when they are
not interesting, triggering rebinds in wgengine that are not needed. This
cleans that up and adds a host of additional tests.
The dependencies are cleaned, notably removing dependency on netmon
itself for calculating what is interesting, and what is not. This includes letting
individual platforms set a bespoke global "IsInterestingInterface"
function. This is only used on Darwin.
RebindRequired now roughly follows how "Major" was historically
calculated but includes some additional checks for various
uninteresting events such as changes in interface addresses that
shouldn't trigger a rebind. This significantly reduces thrashing (by
roughly half on Darwin clients which switching between nics). The individual
values that we roll into RebindRequired are also exposed so that
components consuming netmap.ChangeDelta can ask more
targeted questions.
Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
The existing client metric methods only support incrementing (or
decrementing) a delta value. This new method allows setting the metric
to a specific value.
Updates tailscale/corp#35327
Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d
Signed-off-by: Will Norris <will@tailscale.com>
This commit also introduces a sync.Mutex for guarding mutatable fields
on serverEndpoint, now that it is no longer guarded by the sync.Mutex
in Server.
These changes reduce lock contention and by effect increase aggregate
throughput under high flow count load. A benchmark on Linux with AWS
c8gn instances showed a ~30% increase in aggregate throughput (37Gb/s
vs 28Gb/s) for 12 tailscaled flows.
Updates tailscale/corp#35264
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Add flags:
* --cigocached-host to support alternative host resolution in other
environments, like the corp repo.
* --stats to reduce the amount of bash script we need.
* --version to support a caching tool/cigocacher script that will
download from GitHub releases.
Updates tailscale/corp#10808
Change-Id: Ib2447bc5f79058669a70f2c49cef6aedd7afc049
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
tcpHandlerForVIPService was missing ProxyProtocol support that
tcpHandlerForServe already had. Extract the shared logic into
forwardTCPWithProxyProtocol helper and use it in both handlers.
Fixes#18172
Signed-off-by: Raj Singh <raj@tailscale.com>
Add metrics about logtail uploading and underlying buffer.
Add metrics to the in-memory buffer implementation.
Updates tailscale/corp#21363
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
PR #18033 skipped tests for the versions of Linux 6.6 and 6.12 that
had a regression in /proc/net/tcp that causes seek operations to fail
with “illegal seek”.
This PR skips tests for Linux 6.14.0, which is the default Ubuntu
kernel, that also contains this regression.
Updates #16966
Signed-off-by: Simon Law <sfllaw@tailscale.com>
The filch implementation is fairly broken:
* When Filch.cur exceeds MaxFileSize, it calls moveContents
to copy the entirety of cur into alt (while holding the write lock).
By nature, this is the movement of a lot of data in a hot path,
meaning that all log calls will be globally blocked!
It also means that log uploads will be blocked during the move.
* The implementation of moveContents is buggy in that
it copies data from cur into the start of alt,
but fails to truncate alt to the number of bytes copied.
Consequently, there are unrelated lines near the end,
leading to out-of-order lines when being read back.
* Data filched via stderr do not directly respect MaxFileSize,
which is only checked every 100 Filch.Write calls.
This means that it is possible that the file grows far beyond
the specified max file size before moveContents is called.
* If both log files have data when New is called,
it also copies the entirety of cur into alt.
This can block the startup of a process copying lots of data
before the process can do any useful work.
* TryReadLine is implemented using bufio.Scanner.
Unfortunately, it will choke on any lines longer than
bufio.MaxScanTokenSize, rather than gracefully skip over them.
The re-implementation avoids a lot of these problems
by fundamentally eliminating the need for moveContent.
We enforce MaxFileSize by simply rotating the log files
whenever the current file exceeds MaxFileSize/2.
This is a constant-time operation regardless of file size.
To more gracefully handle lines longer than bufio.MaxScanTokenSize,
we skip over these lines (without growing the read buffer)
and report an error. This allows subsequent lines to be read.
In order to improve debugging, we add a lot of metrics.
Note that the the mechanism of dup2 with stderr
is inherently racy with a the two file approach.
The order of operations during a rotation is carefully chosen
to reduce the race window to be as short as possible.
Thus, this is slightly less racy than before.
Updates tailscale/corp#21363
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
Re-instate the linking of iptables installed in Tailscale container
to the legacy iptables version. In environments where the legacy
iptables is not needed, we should be able to run nftables instead,
but this will ensure that Tailscale keeps working in environments
that don't support nftables, such as some Synology NAS hosts.
Updates #17854
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
Add --certmode=gcp for using Google Cloud Certificate Manager's
public CA instead of Let's Encrypt. GCP requires External Account
Binding (EAB) credentials for ACME registration, so this adds
--acme-eab-kid and --acme-eab-key flags.
The EAB key accepts both base64url and standard base64 encoding
to support both ACME spec format and gcloud output.
Fixestailscale/corp#34881
Signed-off-by: Raj Singh <raj@tailscale.com>
Co-authored-by: Brad Fitzpatrick <bradfitz@tailscale.com>
When using the resolve.conf file for setting DNS, it is possible that
some other services will trample the file and overwrite our set DNS
server. Experiments has shown this to be a racy error depending on how
quickly processes start.
Make an attempt to trample back the file a limited number of times if
the file is changed.
Updates #16635
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
When peers request an IP address mapping to be stored, the connector
stores it in memory.
Fixestailscale/corp#34251
Signed-off-by: Fran Bull <fran@tailscale.com>
To save rebuilding cigocacher on each CI job, build it on-demand, and
publish a release similar to how we publish releases for tool/go to
consume. Once the first release is done, we can add a new
tool/cigocacher script that pins to a specific release for each branch
to download.
Updates tailscale/corp#10808
Change-Id: I7694b2c2240020ba2335eb467522cdd029469b6c
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
It appears (*controlclient.Auto).Shutdown() can still deadlock when called with b.mu held, and therefore the changes in #18127 are unsafe.
This reverts #18127 until we figure out what causes it.
This reverts commit d199ecac80.
Signed-off-by: Nick Khyl <nickk@tailscale.com>
This improves our test coverage of the Bootstrap() method, especially
around catching AUMs that shouldn't pass validation.
Updates #cleanup
Change-Id: Idc61fcbc6daaa98c36d20ec61e45ce48771b85de
Signed-off-by: Alex Chan <alexc@tailscale.com>
Previously, if users attempted to expose a headless Service to tailnet,
this just silently did not work.
This PR makes the operator throw a warning event + update Service's
status with an error message.
Updates #18139
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
The event queue gets deleted events, which means that sometimes
the object that should be reconciled no longer exists.
Don't log user facing errors if that is the case.
Updates #18141
Signed-off-by: Irbe Krumina <irbe@tailscale.com>
The service was starting after systemd itself, and while this
surprisingly worked for some situations, it broke for others.
Change it to start after a GUI has been initialized.
Updates #17656
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
Previously we only set this when it updated, which was fine for the first
call to Start(), but after that point future updates would be skipped if
nothing had changed. If Start() was called again, it would wipe the peer API
endpoints and they wouldn't get added back again, breaking exit nodes (and
anything else requiring peer API to be advertised).
Updates tailscale/corp#27173
Signed-off-by: James Sanderson <jsanderson@tailscale.com>