When a recording upload fails mid-session, killProcessOnContextDone
writes the termination message to ss.Stderr() and kills the process.
Meanwhile, run() takes the ss.ctx.Done() path and proceeds to
ss.Exit(), which tears down the SSH channel. The termination message
write races with the channel teardown, so the client sometimes never
receives it.
Fix by adding an exitHandled channel that killProcessOnContextDone
closes when done. run() now waits on this channel after ctx.Done()
fires, ensuring the termination message is fully written before
the SSH channel is torn down.
Fixes#7707
Change-Id: Ib60116c928d3af46d553a4186a72963c2c731e3e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
After we intercept a DNS response and assign magic and transit addresses
we must communicate the assignment to our connector so that it can
direct traffic when it arrives.
Use the recently added peerapi endpoint to send the addresses.
Updates tailscale/corp#34258
Signed-off-by: Fran Bull <fran@tailscale.com>
This change reintroduces UserProfile.Groups, a slice that contains
the ACL-defined and synced groups that a user is a member of.
The slice will only be non-nil for clients with the node attribute
see-groups, and will only contain groups that the client is allowed
to see as per the app payload of the see-groups node attribute.
For example:
```
"nodeAttrs": [
{
"target": ["tag:dev"],
"app": {
"tailscale.com/see-groups": [{"groups": ["group:dev"]}]
}
},
[...]
]
```
UserProfile.Groups will also be gated by a feature flag for the time
being.
Updates tailscale/corp#31529
Signed-off-by: Gesa Stupperich <gesa@tailscale.com>
Users on FreeBSD run into a similar problem as has been reported for
Linux #11682 and fixed in #11682: because the tailscaled binaries
that we distribute are static and don't link cgo tailscaled fails to
fetch group IDs that are returned via NSS when spawning an ssh child
process.
This change extends the fallback on the 'id' command that was put in
place as part of #11682 to FreeBSD. More precisely, we try to fetch
the group IDs with the 'id' command first, and only if that fails do
we fall back on the logic in the os/user package.
Updates #14025
Signed-off-by: Gesa Stupperich <gesa@tailscale.com>
I omitted a lot of the min/max modernizers because they didn't
result in more clear code.
Some of it's older "for x := range 123".
Also: errors.AsType, any, fmt.Appendf, etc.
Updates #18682
Change-Id: I83a451577f33877f962766a5b65ce86f7696471c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The txRecords buffer had two compounding bugs that caused the
overflow guard to fire on every send tick under high DERP server load,
spamming logs at the full send rate (e.g. 100x/second).
First, int(packetTimeout.Seconds()) truncates fractional-second timeouts,
under-allocating the buffer. Second, the capacity was sized to exactly the
theoretical maximum number of in-flight records with no headroom,
and the expiry check used strict > rather than >=, so records at exactly
the timeout boundary were never evicted by applyTimeouts,
leaving len==cap on the very next tick.
Fixestailscale/corp#37696
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
This hook addition is motivated by the Connectors 2025 work, in which
NATed "Transit IPs" are used to route interesting traffic to the
appropriate peer, without advertising the actual real IPs.
It overlaps with #17858, and specifically with the WIP PR #17861.
If that work completes, this hook may be replaced by other ones
that fit the new WireGuard configuration paradigm.
Fixestailscale/corp#37146
Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>
This had gotten flaky with Go 1.26.
Use synctest + AllocsPerRun to make it fast and deterministic.
Updates #18682
Change-Id: If673d6ecd8c1177f59c1b9c0f3fca42309375dff
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Otherwise it gets confused on new(123) etc.
Updates #18682
Change-Id: I9e2e93ea24f2b952b2396dceaf094b4db64424b0
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Fix its/it's, who's/whose, wether/whether, missing apostrophes
in contractions, and other misspellings across the codebase.
Updates #cleanup
Change-Id: I20453b81a7aceaa14ea2a551abba08a2e7f0a1d8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
considerable latency was seen when using k8s-proxy with ProxyGroup
in the kubernetes operator. Switching to L4 TCPForward solves this.
Fixes tailscale#18171
Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
Co-authored-by: chaosinthecrd <tom@tmlabs.co.uk>
OpenWrt is changing to using alpine like `apk` for package installation
over its previous opkg. Additionally, they are not using the same repo
files as alpine making installation fail.
Add support for the new repository files and ensure that the required
package detection system uses apk.
Updates #18535
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
This commit adds `--json` output mode to dns debug commands.
It defines structs for the data that is returned from:
`tailscale dns status` and `tailscale dns query <DOMAIN>` and
populates that as it runs the diagnostics.
When all the information is collected, it is serialised to JSON
or string built into an output and returned to the user.
The structs are defined and exported to golang consumers of this command
can use them for unmarshalling.
Updates #13326
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
Remove the TS_EXPERIMENTAL_KUBE_API_EVENTS env var from the operator and its
helm chart. This has already been marked as deprecated, and has been
scheduled to be removed in release 1.96.
Add a check in helm chart to fail if the removed variable is set to true,
prompting users to move to ACLs instead.
Fixes: #18875
Signed-off-by: Becky Pauley <becky@tailscale.com>
Go 1.26's url.Parser is stricter and made our tests elsewhere fail
with this scheme because when these listen addresses get shoved
into a URL, it can't parse back out.
I verified this makes tests elsewhere pass with Go 1.26.
Updates #18682
Change-Id: I04dd3cee591aa85a9417a0bbae2b6f699d8302fa
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
runtime.NumCPU() returns the number of CPUs on the host, which in
containerized environments is the node's CPU count rather than the
container's CPU limit. This causes excessive memory allocation in
pods with low CPU requests running on large nodes, as each socket's
packetReadLoop allocates significant buffer memory.
Use runtime.GOMAXPROCS(0) instead, which is container-aware since
Go 1.25 and respects CPU limits set via cgroups.
Fixes#18774
Signed-off-by: Daniel Pañeda <daniel.paneda@clickhouse.com>
We use the TS_USE_CACHED_NETMAP knob to condition loading a cached netmap, but
were hitherto writing the map out to disk even when it was disabled. Let's not
do that; the two should travel together.
Updates #12639
Change-Id: Iee5aa828e2c59937d5b95093ea1ac26c9536721e
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
After fixing the flakey tests in #18811 and #18814 we can enable running
the natlab testsuite running on CI generally.
Fixes#18810
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
This is a minimal hacky fix for a case where the portlist poller extension
could miss updates to NetMap's CollectServices bool.
Updates tailscale/corp#36813
Change-Id: I9b50de8ba8b09e4a44f9fbfe90c9df4d8ab4d586
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
PR #18860 adds firewall rules in the mangle table to save outbound packet
marks to conntrack and restore them on reply packets before the routing
decision. When reply packets have their marks restored, the kernel uses
the correct routing table (based on the mark) and the packets pass the
rp_filter check.
This makes the risk check and reverse path filtering warnings unnecessary.
Updates #3310Fixestailscale/corp#37846
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
When a Linux system acts as an exit node or subnet router with strict
reverse path filtering (rp_filter=1), reply packets may
be dropped because they fail the RPF check. Reply packets arrive on the
WAN interface but the routing table indicates they should have arrived
on the Tailscale interface, causing the kernel to drop them.
This adds firewall rules in the mangle table to save outbound packet
marks to conntrack and restore them on reply packets before the routing
decision. When reply packets have their marks restored, the kernel uses
the correct routing table (based on the mark) and the packets pass the
rp_filter check.
Implementation adds two rules per address family (IPv4/IPv6):
- mangle/OUTPUT: Save packet marks to conntrack for NEW connections
with non-zero marks in the Tailscale fwmark range (0xff0000)
- mangle/PREROUTING: Restore marks from conntrack to packets for
ESTABLISHED,RELATED connections before routing decision and rp_filter
check
The workaround is automatically enabled when UseConnmarkForRPFilter is
set in the router configuration, which happens when subnet routes are
advertised on Linux systems.
Both iptables and nftables implementations are provided, with automatic
backend detection.
Fixes#3310Fixes#14409Fixes#12022Fixes#15815Fixes#9612
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
We should only add one entry to our magic ips for each domain+dst and
look up any existing entry instead of always creating a new one.
Fixestailscale/corp#34252
Signed-off-by: Fran Bull <fran@tailscale.com>
To be less spammy in stable, add a nob that disables the creation and
processing of TSMPDiscoKeyAdvertisements until we have a proper rollout
mechanism.
Updates #12639
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
The "public key moved" panic has caused confusion on multiple occasions,
and is a known issue for Mullvad. Add a loose heuristic to detect
Mullvad nodes, and trigger distinct panics for Mullvad and non-Mullvad
instances, with a link to the associated bug.
When this occurs again with Mullvad, it'll be easier for somebody to
find the existing bug.
If it occurs again with something other than Mullvad, it'll be more
obvious that it's a distinct issue.
Updates tailscale/corp#27300
Change-Id: Ie47271f45f2ff28f767578fcca5e6b21731d08a1
Signed-off-by: Alex Chan <alexc@tailscale.com>
Subtle floating point imprecision can propagate and lead to
trigonometric functions receiving inputs outside their
domain, thus returning NaN. Clamp the input to the valid domain
to prevent this.
Also adds a fuzz test for SphericalAngleTo.
Updates tailscale/corp#37518
Signed-off-by: Amal Bansode <amal@tailscale.com>
Some CI runner images now have cigocacher baked in. Skip building if
it's already present.
Updates tailscale/corp#35667
Change-Id: I5ea0d606d44b1373bc1c8f7bca4ab780e763e2a9
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Normalize 0.0.0.0 and :: to wildcard in resolveListenAddr so listeners
match incoming connections.
Fix ephemeral port allocation across all three modes: extract assigned
ports from gVisor listeners (TUN TCP and UDP), and add an ephemeral port
allocator for netstack TCP.
Updates #6815
Updates #12182Fixes#14042
Signed-off-by: James Tucker <jftucker@gmail.com>
I was confused when everything I was reading in the CI failure was
saying `go mod tidy`, but the thing that was actually failing was
related to nix flakes. Rename the pipeline and step name to the `make
tidy` that it actually runs.
Updates #16637
Signed-off-by: James Tucker <james@tailscale.com>
Server.Close held s.mu for the entire shutdown duration, including
netstack.Close (which waits for gVisor goroutines to exit) and
lb.Shutdown. gVisor callbacks like getTCPHandlerForFlow acquire s.mu via
listenerForDstAddr, so any in-flight gVisor goroutine attempting that
callback during stack shutdown would deadlock with Close.
Replace the mu-guarded closed bool with a sync.Once, and release s.mu
after closing listeners but before the heavy shutdown operations. Also
cancel shutdownCtx before netstack.Close so pending handlers observe
cancellation rather than contending on the lock.
Updates #18423
Signed-off-by: James Tucker <james@tailscale.com>
TestDial in particular sometimes gets stuck in CI for minutes, letting
chantun drop packets during shutdown avoids blocking shutdown.
Updates #18423
Signed-off-by: James Tucker <jftucker@gmail.com>
Windows and macOS are not covered by this change, as neither have safely
distinct names to make it easy to do so. This covers the requested case
on Linux.
Updates #18824
Signed-off-by: James Tucker <james@tailscale.com>
When a tsnet.Server dials its own Tailscale IP, TCP SYN packets are
silently dropped. In inject(), outbound packets with dst=self fail the
shouldSendToHost check and fall through to WireGuard, which has no peer
for the node's own address.
Fix this by detecting self-addressed packets in inject() using isLocalIP
and delivering them back into gVisor's network stack as inbound packets
via a new DeliverLoopback method on linkEndpoint. The outbound packet
must be re-serialized into a new PacketBuffer because outbound packets
have their headers parsed into separate views, but DeliverNetworkPacket
expects raw unparsed data.
Updates #18829
Signed-off-by: James Tucker <james@tailscale.com>
Using the new wait command from #18574 provide a tailscale-online.target
that has a similar usage model to the conventional
`network-online.target`.
Updates #3340
Updates #11504
Signed-off-by: James Tucker <james@tailscale.com>
Adds freedesktop as an option for installing autostart desktop files for
starting the systray application.
Fixes#18766
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
derpActiveFunc was being called immediately as a bare goroutine,
before startGate was resolved. For the firstDerp case, startGate
is c.derpStarted which only closes after dc.Connect() completes,
so derpActiveFunc was firing before the DERP connection existed.
We now block it with the same logic used by runDerpReader and by
runDerpWriter.
Updates: #18810
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
This makes Set.MarshalJSON produce deterministic output in many cases now.
We still need to do make it deterministic for non-ordered types.
Updates #18808
Change-Id: I7f341ec039c661a8e88d07d7f4dc0f15d5d4ab86
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>