tailscale

mirror of https://github.com/tailscale/tailscale.git synced 2026-05-31 20:19:46 -04:00

Author	SHA1	Message	Date
M. J. Fromberger	e8be3b7989	wgengine/magicsock,types/logger: add latency logs for initial peer contacts In order to allow us to measure the performance effects of client-side netmap caching, both with and without the feature enabled, add logs to record how long it takes after a client restart or profile switch for the node to establish contact with peers, relative to the first uncached netmap. We do this by keeping track of a timestamp when the connection is constructed, and logging a record for "new" peer contacts that records how long (in microseconds) it took from the time the peer was recorded as a candidate. The message includes whether the contact was via DERP or direct, and whether a cached netmap was in use at the time. This builds on and extends the counters from #19699, but here we include new contacts whether or not a cached netmap is in use, so that we can establish a baseline for comparison. Updates #12639 Updates tailscale/projects#27 Change-Id: I4f6d050e221f3881848d05a0425c4a5d1a59294c Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>	2026-05-29 14:38:06 -07:00
Brad Fitzpatrick	412c812d76	ipn/ipnlocal: use ACME ALPN for authorized Funnel non-CertDomain domains If a user explicitly adds a non-ts.net (not a CertDomain domain) domain like "foo.com" to their serve config as a web target that's also an allowed funnel domain (using raw "tailscale serve set-config"), then use the new ALPN cert fetching (from `b553969b`) to get certs for that domain. This is just plumbing; there's no new product functionality to actually enable this easily client-side, and it also has no visible product surface to enable it server-side. Updates tailscale/corp#41736 Change-Id: Ie2e421ac9611bce64bba3de6a454b2d505ea0e8a Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-28 13:33:45 -07:00
Tom Proctor	788a49eca5	.github/workflows: run vet on GitHub-hosted runners (#19913 ) The github-ci-vm machine that runs our self-hosted CI for this repo is only designed for the `vm` job in test.yml. That uses a different cache dir which is causing github-ci-vm's small disk to fill up. Switch to ubuntu 24.04 like the rest of our CI for this repo that doesn't require anything special. Updates tailscale/corp#40465 Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>	2026-05-28 21:30:46 +01:00
James Tucker	524a374f01	tsnet: wait for peer in netmap before pinging in setupTwoClientTest If we dispatch a ping too early (after a later patch removes a 250ms blockage) then the ping may be lost due to the peers not yet knowing about each other. The ping is retained in order to setup and ensure a wireguard session prior to test flow. Updates #19822 Change-Id: I6cfea28931646a9387b6ffc2654e72cd846f4e55 Signed-off-by: James Tucker <james@tailscale.com> Co-authored-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-28 11:27:54 -07:00
Brad Fitzpatrick	c086992f4f	cmd/tailscale/cli: add whoami subcommand Add a "tailscale whoami" subcommand that is equivalent to running "tailscale whois $(tailscale ip -4)" but more ergonomic. It supports the --json flag just like whois, and shares the WhoIsResponse rendering code with whois. Fixes #19907 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Change-Id: I8f33ba7a5608bab7dffa8213303beb5f345936d3	2026-05-28 10:49:17 -07:00
Alex Chan	9d126aec34	all: remove network lock references from private method names Updates tailscale/corp#37904 Change-Id: I312d46d958209ca3d1152d1877fb91a57c91798d Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-05-28 18:00:36 +01:00
Brendan Creane	8d90a6ab1e	ipn/ipnlocal: add HTTP/2 Content-Type tests for serve reverse proxy (#19905 ) Adds two tests exercising the HTTP/2-inbound -> plaintext HTTP/1.1 backend path through serve's reverseProxy and through the full serveWebHandler entry point (with a funnel serveHTTPContext). Updates #19866 Signed-off-by: Brendan Creane <bcreane@gmail.com>	2026-05-28 09:46:36 -07:00
Alex Chan	f4a280cdbd	all: update a few more references to network/tailnet lock Updates tailscale/corp#37904 Change-Id: I746b06328e080fa2b9ff28a2d099f95645aa3d0b Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-05-28 16:44:16 +01:00
Alex Chan	446ae97491	ipn: improve --exit-node hostname error during startup When parsing the `tailscale up --exit-node=ARG` argument, we try to resolve hostnames by searching the list of peers. However, at startup, the peer list is empty, causing hostname lookups to trivially fail with an unhelpful "invalid value" error. Improve the error message when the peer list is empty to inform the user that hostnames cannot be resolved during startup, and advise them to use the exit node's Tailscale IP address instead. Also, clarify that hostnames must be peer hostnames, not arbitrary hostnames. Fixes #19882 Change-Id: I9390a427c2863d657cf46c5e33b43cb3c5363764 Signed-off-by: Alex Chan <alexc@tailscale.com>	2026-05-28 16:43:45 +01:00
dragondscv	4b8115bb2c	cmd/containerboot: clamp MSS to PMTU for proxy group pods (#19686 ) Single-pod ingress/egress proxies already called ClampMSSToPMTU when setting up forwarding rules, but the proxy group (HA) code paths in egressservices.go and ingressservices.go did not. This caused TCP connections through proxy group pods to suffer from MSS/MTU mismatch issues in environments where path MTU discovery is not working. Add ClampMSSToPMTU calls in the egress sync loop (alongside the existing EnsureSNATForDst call) and in addDNATRuleForSvc (alongside the existing EnsureDNATRuleForSvc call), mirroring what the single-pod forwarding rules already do. Also add MSS clamping assertions to TestSyncIngressConfigs and track ClampMSSToPMTU calls in FakeNetfilterRunner. Fixes issue #19812 https://github.com/tailscale/tailscale/issues/19812. Tracking internal ticket TSS-86326. Signed-off-by: Jay Tung <ltung@crusoeenergy.com> Co-authored-by: Jay Tung <ltung@crusoeenergy.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-28 12:57:38 +01:00
Brad Fitzpatrick	782c73bf41	cmd/containerboot: fix data race in TestContainerBoot Parallel subtests share ipn.Notify pointers (e.g. runningNotify). When multiple subtests reached the same phase concurrently, they all wrote to the shared notify's InitialStatus field without synchronization, triggering the race detector. Fix by shallow-copying ipn.Notify before setting InitialStatus, so each test iteration works on its own copy. Updates #19380 Change-Id: I9dd40037e02146166f006f4f7c1ddcc47adba191 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-27 18:40:03 -07:00
James Tucker	25b8ed8d9e	control/controlknobs,net/{batching,tstun},wgengine: add nodecaps to disable UDP & TUN GRO/GSO Add four control-plane node attributes that let us disable UDP GSO/GRO on the magicsock UDP socket and UDP/TCP GRO on the Tailscale TUN device. These complement the pre-existing TS_DEBUG_DISABLE_UDP_{GRO,GSO} and TS_TUN_DISABLE_{UDP,TCP}_GRO envknobs. They exist so we can mitigate upstream Linux kernel regressions on a deployed fleet without requiring a client release, after two incidents (#13041, #19777) where buggy kernel patches landed upstream and the fix took an excessively long time to reach downstream distros. Knob changes are reacted to in setNetworkMapInternal / SetNetworkMap via a comparison against a cached "last applied" value and only an actual transition triggers work: magicsock Rebind()+ReSTUN for UDP, ApplyGROKnobs for TUN. The TUN side is gated by buildfeatures.HasGRO and is one-way (wireguard-go GRO disablement is sticky); re-enabling requires a client restart. Updates #13041 Updates #19777 Change-Id: I802993070afa659cc06809bb0bfbb7f8a0cdb273 Signed-off-by: James Tucker <james@tailscale.com>	2026-05-27 17:10:14 -07:00
Brad Fitzpatrick	94af1b00fb	cmd/testwrapper, tstest: move test sharding out of test code Previously, sharding required tests to opt in by calling tstest.Shard, which used a process-global counter to assign each test to a shard. This had two problems: most tests didn't call it, so they ran on every shard (defeating the purpose), and shard assignments were unstable (depended on call order, so adding a test could reshuffle others). Remove tstest.Shard and tstest.SkipOnUnshardedCI entirely. Instead, have testwrapper implement sharding automatically for all tests: when TS_TEST_SHARD=N/M is set, it uses "go list -json" (no compilation) to find test source files, scans them for top-level Test/Benchmark/ Example/Fuzz function names, and filters by fnv32a(name) % M == N-1. The filtered names are passed as an anchored -run regex to go test. Using go list instead of "go test -list" avoids linking the test binary twice (Go's build cache does not cache test binary linking). Fixes #19886 Change-Id: I62ab7b3d757324d4c5fd0b5de50c1e3742681791 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-27 16:53:17 -07:00
James Scott	db60aa8eca	logtail: gate "logtail started" behind TS_DEBUG_LOGTAIL envknob (#19891 ) Gates the unnecessary "logtail started" message behind the debug envknob TS_DEBUG_LOGTAIL. This is extra log spam that isn't needed unless we are debugging. Updates tailscale/corp#40908 Signed-off-by: James Scott <jim@tailscale.com>	2026-05-27 15:48:44 -07:00
kari-ts	1a17ec1988	net/netmon: in Android, replace system/bin/ip call with cached LinkProperties gateway (#19804 ) bind() on NETLINK_ROUTE sockets does not work on Android 11+ (https://developer.android.com/identity/user-data-ids#mac-11-plus). Since system/bin/ip uses bind(), likelyHomeRouterIPHelper() always fails on Android 11+, so that GatewayAndSelfIP never caches the result, causing repeated ip process spawns on every periodic ReSTUN. This replaces the system/bin/ip fallback with a cached gateway IP pushed from Android’s ConnectivityManager via LinkProperties.getRoutes(). This is the same pattern used by UpdateLastKnownDefaultRouteInterface for the interface name (see https://github.com/tailscale/tailscale/pull/11784/). We keep the proc/net/route path as a fallback for early startup before NetworkChangeCallback has fired. Updates tailscale/tailscale#18622 Updates tailscale/tailscale#13352 Signed-off-by: kari-ts <kari@tailscale.com>	2026-05-27 15:42:48 -07:00
Brad Fitzpatrick	c9fb05b6f5	ipn/ipnlocal: don't dup-suppress UserProfiles on IPNBus on profile switches Fixes #19889 Change-Id: I324a735c13772c0c79ed7392c0baa5064b34823b Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-27 14:47:02 -07:00
Brad Fitzpatrick	364b952d62	cmd/containerboot: track peers from IPN bus updates, stop using netmap.NetworkMap Some tests in another repo were broken by tailscale/tailscale#19607. This fixes them, by finishing off the rest of the migration away from netmap.NetworkMap on the IPN bus in containerboot. Containerboot used to rebuild a full NetworkMap-shaped view while reacting to IPN bus notifications. Now it instead has its own netmapState type (immutable) of exactly what it needs to track, and sends those immutable values around, making cheap edits of new immutable values when an IPN bus edit arrives. This should make cmd/containerboot scale to much larger tailnets now too. Fixes #19852 Fixes tailscale/corp#42347 Updates #12542 Change-Id: I88adaf061f85f677f954a764935e6654329d75a6 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-27 14:12:48 -07:00
Fran Bull	80dc7a8d07	feature/conn25: disallow addrs assignment overwriting. We don't want addr assignments to be lost from the collection before they can be returned to the IP pools, otherwise we will get orphan addresses marked inUse in the pools that will never be returned. Fixes tailscale/corp#39975 Signed-off-by: Fran Bull <fran@tailscale.com>	2026-05-27 13:54:40 -07:00
Patrick O'Doherty	8501be1990	go.mod: bump dependencies to resolve govulncheck warnings (#19884 ) Bump the following: go get -u github.com/moby/spdystream@v0.5.1 go get -u golang.org/x/crypto@v0.52.0 go get -u golang.org/x/net@v0.55.0 to resolve open govulncheck warnings. Updates #cleanup Signed-off-by: Patrick O'Doherty <patrick@tailscale.com>	2026-05-27 12:24:59 -07:00
James Tucker	dea49bb4da	net/batching: add envknobs to disable UDP GRO & GSO It is sometimes useful when diagnosing subtle and specific performance problems to rule out GRO/GSO independently and/or toggle them to influence packet pacing. Updates #17835 Updates tailscale/corp#31164 Signed-off-by: James Tucker <james@tailscale.com>	2026-05-27 12:05:00 -07:00
James Tucker	d1912167dc	feature/taildrop: replace outgoing-file progress channel with synchronous reporter serveFilePut tracked outgoing-file progress through an unbuffered progressUpdates channel whose close was owned by the request goroutine while writers were spread across manifest parsing, the progresstracking.Reader callback, singleFilePut failure paths, and the success path. That writer-closes mismatch made the send-on-closed-channel panic effectively unfixable in place. Replace it with a request-scoped outgoingProgress reporter. Transfer code reports state by method call; the reporter coalesces hot-path updates and is flushed once via defer in serveFilePut. With no producer channel to close, the panic is structurally impossible. Fixes #19115 Fixes #19817 Change-Id: I8f00d982d2c79880dfc1f8104c5eed06e94b5a6c Signed-off-by: James Tucker <james@tailscale.com>	2026-05-27 12:00:34 -07:00
Brad Fitzpatrick	f277bfb09d	release/dist/synology: add GOARM=7,softfloat mode for hi3535 Fixes #6860 Change-Id: I36f3101e75dab35d03e76693555ac93da893f8d5 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-27 10:54:15 -07:00
Claus Lensbøl	9be21088f4	wgengine/{,magicsock},tstest/natlab/vmtest: send disco on cached netmap (#19878 ) Originally found when adding tests for working with cached netmaps, and finding the added tests to be flaky. When working off of a cached netmap, if a node exists in the cached netmap but does not yet have any endpoints, DERP connections are available but not direct ones. By sending callMeMaybe to nodes without endpoints in the cached netmap, we can establish direct connections for this edge case. Additionally, ensure that TSMP disco advert messages are not sent if the endpoint does not have a valid address yet. Fixes #19843 Updates #19597 Signed-off-by: Claus Lensbøl <claus@tailscale.com>	2026-05-27 13:05:12 -04:00
Brad Fitzpatrick	b553969b03	ipnlocal: try ACME TLS-ALPN for Funnel renewals Use TLS-ALPN-01 for Funnel certificate renewals only when the node already has a cached certificate, and fall back to DNS-01 with a fresh order if the ALPN path is unavailable or fails. Dynamically advertise acme-tls/1 only while an ACME challenge certificate is pending, and add client metrics for DNS-01 and TLS-ALPN-01 start/success/failure paths. Updates tailscale/corp#41736 Fixes tailscale/corp#42320 Change-Id: I5adc6ea129237f9ef592f84fc1a8953c80bc9d5c Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-27 09:30:23 -07:00
Jordan Whited	4aef023765	cmd/tailscaled,types/logger: remove TS_DEBUG_MEMORY and associated logger Commit `e5a8cf3b1` added feature/runtimemetrics, which emits heap bytes and total process memory as clientmetrics when the NodeAttrEmitRuntimeMetrics capability is set. That subsumes the job of the TS_DEBUG_MEMORY envknob, whose only effect is to prefix every log line with Go heap+stack and Maxrss via logger.RusagePrefixLog. Updates tailscale/corp#39434 Signed-off-by: Jordan Whited <jordan@tailscale.com>	2026-05-27 09:09:05 -07:00
Artem Leshchev	5652b6c9c0	cmd/k8s-operator: fix token exchange for identity federation (#19845 ) tailscale-client-go-v2 natively supports identity federation authentication, and in #19010 the required authentication provider is used, but the manual token exchange was never removed, so we were exchanging JWT token to an auth token, and then were trying to use that auth token for exchange once again. This commit removes the legacy mechanism, fully relying on tailscale-client-go-v2 to handle authentication. Fixes #19844 Signed-off-by: Artem Leshchev <matshch@avride.ai>	2026-05-27 16:45:07 +01:00
License Updater	77010351f0	licenses: update license notices Signed-off-by: License Updater <noreply+license-updater@tailscale.com>	2026-05-27 08:38:44 -07:00
Brad Fitzpatrick	2c965ab540	types/netmap, ipn/ipnlocal, control/controlclient: rename NodeMutationAdd to NodeMutationUpsert NodeMutationAdd was a misleading name: a PeersChanged entry in a MapResponse can represent either a truly new peer or a full replacement for an existing peer that couldn't be expressed as a PeerChangedPatch. Calling it "Add" implied it was always a completely new node, which is wrong. (I'd changed my mind on the design of mapping add/delete events to NodeMutations halfway through #19607 and forgot to update the name, even though I'd updated half the docs) Rename it to NodeMutationUpsert to reflect the actual semantics: the node should be inserted or replaced in the peer map regardless of whether it already existed. Updates #19607 Updates #12542 Change-Id: Iebd3daddb3318cba02e115a1b184fcb3ee8f83d6 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-27 08:37:14 -07:00
Brad Fitzpatrick	a8f40a2ca5	ipn/ipnlocal: add missing bus notify of peers on full netmap The prior `aa5da2e5f2` ("process node adds/removes in constant time") commit missed a bus notification case, where new-style subscribers set NotifyNoNetmap and then the controlclient map routing sends a full update (rather than a delta). Those profiles + peers need to be put on the bus too. I noticed this only when porting the Android app over to use the new bus stuff. Updates #19607 Updates #12542 Change-Id: I82c35011d2c532222ca27f7d4e790522c31bd156 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-27 08:03:47 -07:00
Jason Dillingham	0e2b3f31af	cmd/k8s-operator: stabilize StaticEndpoints order in ProxyGroup reconciles (#19755 ) findStaticEndpoints built its return slice by iterating nodes.Items in the order returned by r.List, which is not guaranteed to be stable across calls. When the resulting set of addresses already matched the existing config Secret, the slice could still permute between reconciles, making the marshalled config Secret differ byte-for-byte. That tripped the DeepEqual check on the config Secret, which rewrote the Secret, which fired a watch event, which re-enqueued the ProxyGroup, looping forever. Detect this case and return the existing currAddrs slice unchanged when the resulting set is the same, preserving the "use the currently used IPs first" intent without spurious writes. Fixes #19700 Signed-off-by: Jason Dillingham <jasonmdillingham@gmail.com>	2026-05-27 14:28:04 +01:00
Erisa A	e2a0d45418	cmd/tailscale/cli: fix time parsing in debug daemon-logs (#19875 ) Fixes #19874 Signed-off-by: Erisa A <erisa@tailscale.com>	2026-05-27 12:30:28 +01:00
BeckyPauley	0ed6da2826	cmd/k8s-operator, net/netutil: support 4via6 in egress proxy and connector (#19863 ) Add support for configuring egress to destinations reachable via 4via6 subnet routes. This change affects standalone egress proxy only- egress ProxyGroup needs IPv6 support before being able to support 4via6. Egress may be configured using either the synthesized 4via6 address or the MagicDNS name (in the form <IPv4-address-with-hyphens-instead-of-dots>-via-<siteid>[.*]). Also update the Connector to validate and advertise 4via6 subnet routes. Export net/netutil.ValidateViaPrefix so it can be reused by the Connector validation logic. Updates #19334 Signed-off-by: Becky Pauley <becky@tailscale.com>	2026-05-27 10:54:35 +01:00
Jordan Whited	e5a8cf3b18	control/controlknobs,feature/*,ipn/ipnlocal,tailcfg: add runtimemetrics Emit runtime metrics as clientmetrics when the NodeAttrEmitRuntimeMetrics NodeCapability is present. We start small with just 2 metrics: heap bytes and total process memory. Updates tailscale/corp#39434 Signed-off-by: Jordan Whited <jordan@tailscale.com>	2026-05-26 16:02:01 -07:00
Fran Bull	2eb45c2457	feature/conn25: extend assignment expiry on use When we use assigned addresses in response to a DNS request, extend the expiry on the assignment. Updates tailscale/corp#39975 Signed-off-by: Fran Bull <fran@tailscale.com>	2026-05-26 07:28:47 -07:00
Michael Ben-Ami	5877809097	feature/conn25: unify FlowTable storage to prepare for expiry Previously we had two maps keyed on a direction-specific tuple, with distinct values containing the data (action) for that direction. Values pointed at each other across maps to ensure they were removed at the same time in the case of tuple overwrite, but LRU eviction was per-map. So if LRU was turned on, it was possible for one direction's data (action) to be evicted and leave the other direction dangling. NewFlow replaces the two direction-specific flow constructors, and lookups return the direction-specific PacketAction directly. Now the values in each map point to the same element, with data for both directions in the element. A linked list also points to the elements to implement LRU. The previous flowtrack.Cache is removed. The single LRU structure will allow us to implement idle time expiration by walking the list backward starting with the least recently used flow, and stopping after a fixed number of flows, or at the first non-expired flow. We add commented-out unused placeholder fields for tracking the "last seen" timestamp, and an on-removal hook, to document the intent for the follow-up expiry work. Updates tailscale/corp#38630 Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>	2026-05-26 10:09:48 -04:00
Yago Raña Gayoso	26952d53fa	scripts/installer.sh: update KDE Linux link (#19857 ) Signed-off-by: Yago Raña Gayoso <yago.rana.gayoso@gmail.com>	2026-05-24 21:40:42 +01:00
Simon Law	da8cd5cc7f	ipn/ipnlocal: fix documentation typo, NodeAttrCacheNetworkMaps (#19851 ) Updates #cleanup Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-22 22:19:10 -07:00
Simon Law	988615dbad	ipn/ipnlocal,tstest/integration: pause the control client consistently (#19846 ) There are two places where tailscaled transitions into a paused state: 1. tailscaled’s controlclient is initially created, 2. tailscale down, or the GUI equivalent, commands it to. This patch unifies the implementation of both scenarios into LocalBackend.shouldPauseControlClientLocked to prevent the implementation from drifting. The flaky tstest/integration.TestNoControlConnWhenDown test exposed this mismatch, but only by accident. This patch also changes TestNode.MustDown so that it runs `tailscale down` and then waits for the testcontrol server to finish handling any associated /machine/map requests. Fixes #19831 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-22 17:58:44 -07:00
Adrian Dewhurst	5d8f401956	net/dns: fix handling non-IP single split DNS Fixes #19834 Change-Id: I4d48efed00cd080b14c6fd713ff21e53a5a6ee3c Signed-off-by: Adrian Dewhurst <adrian@tailscale.com>	2026-05-22 20:45:58 -04:00
Brad Fitzpatrick	5295e3e119	ipn/{ipnstate,ipnlocal}: add integer NodeID to PeerStatus In `aa5da2e5f2` we made the IPN bus include deltas, including the PeersRemoved, sending a slice of integer NodeIDs that were removed. But when updating xcode, I realized there was no way to map those integers to the stable node IDs used in other places. I was considering changing the just-added ipn.Notify.PeersRemoved from an IntID to a string StableID, but then it doesn't match the MapResponse wire protocol, which we've tried to match so far. Instead, just add the integer ID as well. Callers can use whichever world they want, having both. It's a little regrettable that we still have two worlds of IDs, but oh well. Neither is really suitable to a hypothetical future fully federated world of control servers anyway, so we'll need a third type later anyway, so just live with the two we have for now. Updates #12542 Change-Id: Ib8fd48a265e1da1f8779152f141f624a7f7260e9 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-22 08:16:55 -07:00
Amal Bansode	e32b9bde1d	control/controlclient: fix deadlock in map session change queue processing (#19828 ) Holding an exclusive lock while writing to the unbuffered changequeue chan is likely going to deadlock when the run() path may try to grab the same lock before reading from the chan to drain it (on map session close). This causes the client to stop processing new map responses and TSMP disco key advertisements. There is a good probability of inducing this deadlock using the old code and new test added in this commit: TestUpdateDiscoForNodeCallback/test_deadlock. Also fix an unintentional regression in how the client responds to a mapResponse sleep command. `85bb5f84a5` moved the processing of mapResponses into a new goroutine, serialized via mapSession's changequeue. Thus, controlclient stopped sleeping in the same goroutine servicing mapResponses/control connections. This commit brings us back to sleeping synchronously in the same goroutine as controlclient. Updates #12639 Signed-off-by: Amal Bansode <amal@tailscale.com> Signed-off-by: Claus Lensbøl <claus@tailscale.com> Co-authored-by: Claus Lensbøl <claus@tailscale.com>	2026-05-22 07:13:18 -07:00
Simon Law	fd2405ca8f	tstest/integration: mark TestNoControlConnWhenDown as a flaky test (#19832 ) Updates #19831 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-21 17:36:09 -07:00
Simon Law	7dabebc691	net/traffic: switch rendezvous hashing from SHA256 to FNV-1a (#19821 ) In PR tailscale/corp#30448, we originally decided to break ties using SHA256 for our rendezvous hashing algorithm. Now that we’ve had some experience with it, we think that FNV-1a is a better choice. It distributes bits evenly, it’s much faster, and it doesn’t need to be cryptographically secure. The FNV designers recommend FNV-1a over the deprecated FNV-1. This PR makes the switch and updates the related tests, since changing the algorithm changes which stable pick gets selected. As of 2026-05, this is the best time to make this change, since there are almost no clients in the wild with traffic steering enabled. Updates #17366 Updates tailscale/corp#29964 Updates tailscale/corp#29966 Updates tailscale/corp#33033 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-21 10:11:59 -07:00
Brad Fitzpatrick	aa5da2e5f2	ipn/ipnlocal, control/controlclient: process node adds/removes in constant time For large tailnets (~50k+ nodes) with frequent peer churn (ephemeral GitHub Actions workers etc.), tailscaled used to rebuild the full netmap and fan it out on the IPN bus on every MapResponse that added or removed a peer. There were two O(N) costs per delta: the full netmap rebuild + every Notify.NetMap encode to every bus watcher. This change tackles both: 1. Plumb O(1) peer add/remove through the delta path. PeersChanged and PeersRemoved no longer prevent the delta happy path; instead, they mutate the per-node-backend peer map in place. 2. Restrict ipn.Notify.NetMap emission to the platforms whose host GUIs still depend on it (Windows, macOS, iOS) and migrate in-tree consumers off it everywhere else: - Migrate reactive consumers (containerboot, kube agents, sniproxy, tsconsensus, etc.) off Notify.NetMap to the previously-added Notify.SelfChange signal so they no longer have to subscribe to the full netmap. - Add ipn.NotifyNoNetMap so GUI clients on "legacy-emit" platforms that have already migrated can opt out of the per-watcher NetMap encode. - Gate Notify.NetMap emission on the producer side by a compile- time GOOS check, so the supporting code is dead-code-eliminated on Linux and other geese where no GUI consumer needs it. Re-running BenchmarkGiantTailnet from tstest/largetailnet, which was added along with baseline numbers on unmodified main in `ad5436af0d`, the per-delta cost (one peer add+remove pair) is now ~O(1) regardless of tailnet size N: N no-watcher (ms/op) bus-watcher (ms/op) before now factor before now factor 10000 32 0.11 300x 166 0.13 1300x 50000 222 0.11 2000x 865 0.13 6700x 100000 504 0.12 4100x 1765 0.13 13400x 250000 1551 0.12 12500x 4696 0.15 32400x Updates #12542 Change-Id: I94e34b37331d1a8ec74c299deffadf4d061fda9e Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-21 09:26:19 -07:00
Brad Fitzpatrick	2703f91174	wgengine/magicsock: fix data race in TestSetDERPMapDoReStun SetDERPMap spawns a goroutine that calls ReSTUN, which logs via the test logger. If the test returns before that goroutine logs, the goroutine races with testing cleanup. Use tstest.WhileTestRunningLogger so the goroutine's logf call becomes a no-op once the test finishes. Fixes #19829 Change-Id: I1097f98e40ffd1c5dd7fb7a715c918255853e3c6 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-21 08:51:50 -07:00
Simon Law	7ebca58042	net/traffic,ipn/ipnlocal: extract traffic steering utilities (#19682 ) The traffic package contains helpers for evaluating traffic steering scores and picking appropriate nodes. These were extracted from ipnlocal.suggestExitNodeUsingTrafficSteering so they can be reused by the new routecheck package to probe exit nodes in priority order. Updates #17366 Updates tailscale/corp#33033 Signed-off-by: Simon Law <sfllaw@tailscale.com>	2026-05-21 08:28:27 -07:00
Fran Bull	dbe92f98b5	feature/conn25: set assignment expiry based on dns response TTL Updates tailscale/corp#39975 Signed-off-by: Fran Bull <fran@tailscale.com>	2026-05-21 07:25:29 -07:00
Brad Fitzpatrick	f3a117e813	net/tsdial: run happy eyeballs across A and AAAA in UserDial When tailscaled is running in userspace-networking mode behind an exit node (e.g. as a SOCKS5 proxy), it resolves a hostname and then dials a single resolved IP through the tunnel. If the name has both A and AAAA, Go's net.Resolver merges them and we pick ips[0], which on an IPv6-native host is usually AAAA. If the exit node has no IPv6 egress (or vice versa), the dial fails silently through the tunnel and the user sees a hang. Resolve all candidates and race connect attempts across address families with a 300ms happy-eyeballs delay, matching Go's net.Dialer default and the existing pattern in net/dnscache (commit `ee0a03b14`). First success wins; losers are cancelled and any conns they produce are closed. A failBoost channel wakes the launcher when a connect fails fast (e.g. ICMP "no route" via the tunnel) so we don't sit on the 300ms timer when the answer is already known. userDialResolve is refactored into userDialResolveAll (returns the full candidate list) plus a thin single-IP wrapper for callers like UserDialPlan that don't race. UserDial's per-IP dispatch (netstack vs peer dialer vs SystemDial vs std) is extracted to dialOneUser so each candidate can route correctly on its own merits. Also fix serveDial in localapi to pass the original hostname to UserDial rather than a pre-resolved IP, so the race can fire. This fix is single-ended: it works against any exit node, including old ones, with no protocol changes. The trade-off versus filtering on the exit-node side via PeerAPI DoH is that every dial through an unreachable-family exit node costs one failed connect attempt per cache window, rather than zero, which is acceptable given the simplicity. Fixes #19792 Fixes #13257 Change-Id: I9d7645d0034caf3ee22ecdd8070798353f77e94b Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>	2026-05-20 18:35:55 -07:00
James Tucker	36c52ef383	tstest/integration/testcontrol: fix serveMap read-modify-write race serveMap cloned s.nodes[nk], mutated the clone outside the mutex, then wrote it back via updateNodeLocked. A concurrent UpdateNode, SetNodeCapMap, or other writer landing between the clone and the writeback would be silently clobbered. Mutate the live node under the mutex instead. Surfaces in tsnet's TestListenService as a flaky ErrUntaggedServiceHost panic: the test calls control.UpdateNode to attach a tag, a concurrent updateRoutine map request from the host races, and the host's next netmap arrives with Tags=[]. Updates #19822 Change-Id: I6c5ebd5e5bf79a40316f53f627157230773cb469 Signed-off-by: James Tucker <james@tailscale.com>	2026-05-20 18:29:58 -07:00
Aria Stewart	61277e3ad4	Construct IPv6 ingress URLs correctly Fixes #19338 Signed-off-by: Aria Stewart <aredridel@dinhe.net>	2026-05-20 17:21:35 -07:00


10695 Commits