Adds VIP service name resolution to the MagicDNS map so that service
names like "mydb" and "mydb.<tailnet>" resolve to the service VIP address.
Uses the same address family iteration as the peer loop to avoid inserting
unreachable IPv4 addresses on IPv6-only nodes.
Fixes#19097
Signed-off-by: Raj Singh <raj@tailscale.com>
tsnet has a 5s sleep as part of its logic waiting to log successful auth.
Add an additional channel that will interrupt this sleep early if the
local backend's state changes before then. This is early enough in the
bootstrap logic that the local client has not been set up yet, so we
subscribe directly on the local backend in keeping with the rest of the
function, but it would be nice to port the whole function to the new
eventbus in a separate change.
Note this does not affect how quickly auth actually happens, it just
ensures we more responsively log the fact that auth state has changed.
Updates #16340
Change-Id: I7a28fd3927bbcdead9a5aad39f4a3596b5f659b0
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
When racing multiple upstream DNS resolvers, a REFUSED (RCode 5) response
from a broken or misconfigured resolver could win the race and be returned
to the client before healthier resolvers had a chance to respond with a
valid answer. This caused complete DNS failure in cases where, e.g., a
broken upstream resolver returned REFUSED quickly while a working resolver
(such as 1.1.1.1) was still responding.
Previously, only SERVFAIL (RCode 2) was treated as a soft error. REFUSED
responses were returned as successful bytes and could win the race
immediately. This change also treats REFUSED as a soft error in the UDP
and TCP forwarding paths, so the race continues until a better answer
arrives. If all resolvers refuse, the first REFUSED response is returned
to the client.
Additionally, SERVFAIL responses from upstream resolvers are now returned
verbatim to the client rather than replaced with a locally synthesized
packet. Synthesized SERVFAIL responses were authoritative and guaranteed
to include a question section echoing the original query; upstream
responses carry no such guarantees but may include extended error
information (e.g. RFC 8914 extended DNS errors) that would otherwise
be lost.
Fixes#19024
Signed-off-by: Brendan Creane <bcreane@gmail.com>
This path is currently only used by DERP servers that have also
enabled `verify-clients` to ensure that only authorized clients
within a Tailnet are allowed to use said DERP server.
The previous naive linear scan in NodeByKey would almost
certainly lead to bad outcomes with a large enough netmap, so
address an existing todo by building a map of node key -> node ID.
Updates #19042
Signed-off-by: Amal Bansode <amal@tailscale.com>
This repo's module is tailscale.com, and the tailscale-client-go-v2 repo
uses tailscale.com/client/tailscale/v2. It seems from #19010 that if we
have the client module as a dependency in this module, go vet will start
to consider the client module as part of tailscale.com/...
I'm not sure if this is a bug in go vet, but for now let's take the easy
fix and specify ./... instead. In my testing, it seems like this is
sufficient to make sure it just walks the file hierarchy and doesn't
find the client module as a sub-path.
Updates tailscale/corp#38418
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Rather than printing `unknown subcommand: drive` for any Taildrive
commands run in the macOS GUI, print an error message directing the user
to the GUI client and the docs page.
Updates #17210Fixes#18823
Change-Id: I6435007b5911baee79274b56e3ee101e6bb6d809
Signed-off-by: Alex Chan <alexc@tailscale.com>
Updates #19050
When tsnet.Server.start() is called with both Hostname and Dir explicitly
set, os.Executable() failure should not prevent the server from starting.
Extend the existing ios fallback to also cover darwin, where the same
failure occurs when the Go runtime is embedded in a framework launched
via Xcode's debug launcher.
Signed-off-by: Prakash Rudraraju <prakashrj@yahoo.com>
Also implement a limit of one on the number of goroutines that can be
waiting to do a reconfig via AuthReconfig, to prevent extensions from
calling too fast and taxing resources.
Even with the protection, the new method should only be used in
experimental or proof-of-concept contexts. The current intended use is
for an extension to be able force a reconfiguration of WireGuard, and
have the reconfiguration call back into the extension for extra Allowed
IPs.
If in the future if WireGuard is able to reconfigure individual peers more
dynamically, an extension might be able to hook into that process, and
this method on ipnext.Host may be deprecated.
Fixestailscale/corp#38120
Updates tailscale/corp#38124
Updates tailscale/corp#38125
Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>
The actual secret is passed through argon2 first, so a timing attack is
not feasible remotely, and pretty unlikely locally. Still, clean this
up.
Fixes#19063
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
* ipn: reject advertised routes with non-address bits set
The config file path, EditPrefs local API, and App Connector API were
accepting invalid subnet route prefixes with non-address bits set (e.g.,
2a01:4f9:c010:c015::1/64 instead of 2a01:4f9:c010:c015::/64). All three
paths now reject prefixes where prefix != prefix.Masked() with an error
message indicating the expected masked form.
Updates tailscale/corp#36738
Signed-off-by: Brendan Creane <bcreane@gmail.com>
* address review comments
Signed-off-by: Brendan Creane <bcreane@gmail.com>
---------
Signed-off-by: Brendan Creane <bcreane@gmail.com>
When a client starts up without being able to connect to control, it
sends its discoKey to other nodes it wants to communicate with over
TSMP. This disco key will be a newer key than the one control knows
about.
If the client that can connect to control gets a full netmap, ensure
that the disco key for the node not connected to control is not
overwritten with the stale key control knows about.
This is implemented through keeping track of mapSession and use that for
the discokey injection if it is available. This ensures that we are not
constantly resetting the wireguard connection when getting the wrong
keys from control.
This is implemented as:
- If the key is received via TSMP:
- Set lastSeen for the peer to now()
- Set online for the peer to false
- When processing new keys, only accept keys where either:
- Peer is online
- lastSeen is newer than existing last seen
If mapSession is not available, as in we are not yet connected to
control, punt down the disco key injection to magicsock.
Ideally, we will want to have mapSession be long lived at some point in
the near future so we only need to inject keys in one location and then
also use that for testing and loading the cache, but that is a yak for
another PR.
Updates #12639
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
This populates UserProfile.Groups in the WhoIs response from the
local backend with the groups of the corresponding user in the
netmap.
This allows tsnet apps to see (and e.g. forward) which groups a
user making a request belongs to - as long as the tsnet app runs
on a node that been granted the tailscale.com/visible-groups
capability via node attributes. If that's not the case or the
user doesn't belong to any groups allow-listed via the node
attribute, Groups won't be populated.
Updates tailscale/corp#31529
Signed-off-by: Gesa Stupperich <gesa@tailscale.com>
Two methods could deadlock during shutdown when closing the wrapper.
Ensure that the writers are aware of the wrapper being closed.
Fixes#19037
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
Currently IP forwarding health check is done on sending MapRequests.
Move ip forwarding to the health service to gain the benefits
of the health tracker and perodic monitoring out of band from
the MapRequest path. ipnlocal now provides a closure to
the health service to provide the check if forwarding is broken.
Removed `skipIPForwardingCheck` from controlclient/direct.go,
it wasn't being used as the comments describe it, that check
has moved to ipnlocal for the closure to the health tracker.
Updates #18976
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
These were previously swappable for historical reasons that are no
longer relevant.
Removing the indirection enables future inlining optimizations if we
simplify further.
Updates tailscale/corp#38703
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Introduce a datapathHandler that implements hooks that will
receive packets from the tstun.Wrapper. This commit does not wire
those up just yet.
Perform DNAT from Magic IP to Transit IP on outbound flows on clients,
and reverse SNAT in the reverse direction.
Perform DNAT from Transit IP to final destination IP on outbound flows
on connectors, and reverse SNAT in the reverse direction.
Introduce FlowTable to cache validated flows by 5-tuple for fast lookups
after the first packet.
Flow expiration is not covered, and is intended as future work before
the feature is officially released.
Fixestailscale/corp#34249Fixestailscale/corp#35995
Co-authored-by: Fran Bull <fran@tailscale.com>
Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>
Adds envknobs to override the netstack default TCP keepalive idle time
(~2h) and probe interval (75s) for forwarded connections.
When a tailnet peer goes away without closing its connections (pod
deleted, peer removed from the netmap, silent network partition), the
forwardTCP io.Copy goroutines block until keepalive fires: the
gvisor-side Read waits on a peer that will never send again, and the
backend-side Read waits on a backend that is alive and idle. With the
netstack default of 7200s idle + 9×75s probes, dead-peer detection
takes a little over two hours. Under high-churn forwarding — many
short-lived peers, or peers holding thousands of proxied connections
that drop at once — stuck goroutines accumulate faster than they clear.
The existing SetKeepAlive(true) at this site enables keepalive without
setting the timers; the TODO above it noted "a shorter default might
be better" and "might be a useful user-tunable". This makes both
timers tunable without changing the defaults: unset preserves the ~2h
behavior, which is the right trade-off for battery-powered peers.
The two knobs are independent — setting one leaves the other at the
netstack default. The options are set before SetKeepAlive(true) so the
timer arms with the configured values rather than the defaults —
matches the order in ipnlocal/local.go for SSH keepalive.
Updates #4522
Signed-off-by: Josef Bacik <josefbacik@anthropic.com>
After #18179 switched to L4 TCPForward, EnsureCertLoops found no
domains since it only checked service.Web entries. Certs were never
provisioned, leaving kube-apiserver ProxyGroups stuck at 0/N ready.
Fixes#19019
Signed-off-by: Raj Singh <raj@tailscale.com>
When we are mapping a dns response, if it is a connector domain, change
the source IP addresses for our magic IP addresses. This will allow the
tailscaled to DNAT the traffic for the domain to the connector.
Updates tailscale/corp#34258
Signed-off-by: Fran Bull <fran@tailscale.com>
Before sending a fix for #18991, this adds an integration test that
locks in that the proxymap WhoIs code works with two nodes running as
different users, with the second node running a localhost service and
able to use its local tailscaled to identify a Tailscale connection
from the other tailscaled.
Updates #18991
Change-Id: I6fbb0810204d77d2ac558f0cc786b73e3248d031
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
For the purpose of improved observability of UDP socket receive buffer
overflows on Linux.
Updates tailscale/corp#37679
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Changed the mapping to store the transit IPs to be indexed by
peer IP rather than NodeID because the data path only has access
to the peer's IP. This change means that IPv4 transit IPs need to
be indexed by the peer's IPv4 address, and IPv6 transit IPs need to
be indexed by the peer's IPv6 address. It is an error if the peer
does not have an address of the same family as the transit IP.
It is also an error if the transit and destination IP families do
not match.
Added a check to ensure that the TransitIPRequest.App matches a
configured app on the connector.
Added additional TransitIPResponse codes to identify the new errors
and change the exsting use of the Other code to use it's own
specific code.
Added logging for the error cases, since they generally indicate that
a peer has constructed a bad request or that there is a config
mismatch between the peer and the local netmap.
Added a test framework for handleConnectorTransitIPRequest and moved
the existing tests into the framework and added new tests.
Fixestailscale/corp#37143
Signed-off-by: George Jones <george@tailscale.com>
The e2e ingress test was very occasionally flaky. On looking at operator
logs from one failure, you can see the default ProxyClass was not ready
before the first reconcile loop for the exposed Service. The ProxyClass
became ready soon after, but no additional reconciles were triggered for
the exposed Service because we only triggered reconciles for Services
that explicitly named their ProxyClass.
This change adds additional list API calls for when it's the default
ProxyClass that's been updated in order to catch Services that use it by
default. It also adds indexes for the fields we need to search on to
ensure the list is efficient.
Fixestailscale/corp#37533
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
This commit adds a "fallback" mechanism to tsnet to allow
the consumer to set "TS_CONTROL_URL" to override the control server.
This allows tsnet applications to gain support for an alternative
control server by just updating without explicitly exposing the
ControlURL option.
Updates #16934
Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
Add two small APIs to support out-of-tree projects to exchange custom
signaling messages over DERP without requiring disco protocol
extensions:
- OnDERPRecv callback on magicsock.Options / wgengine.Config: called for
every non-disco DERP packet before the peer map lookup, allowing callers
to intercept packets from unknown peers that would otherwise be dropped.
- SendDERPPacketTo method on magicsock.Conn: sends arbitrary bytes to a
node key via a DERP region, creating the connection if needed. Thin
wrapper around the existing internal sendAddr.
Also allow netstack.Start to accept a nil LocalBackend for use cases
that wire up TCP/UDP handlers directly without a full LocalBackend.
Updates tailscale/corp#24454
Change-Id: I99a523ef281625b8c0024a963f5f5bf5d8792c17
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
After switching from cellular to wifi without ipv6, ForeachInterface still sees rmnet prefixes, so HaveV6 stays true, and magicsock keeps attempting ipv6 connections that either route through cellular or time out for users on wifi without ipv6
This:
-Adds SetAndroidBindToNetworkFunc, a callback to bind the socket to the selected Android Network object
Updates tailscale/tailscale#6152
Signed-off-by: kari-ts <kari@tailscale.com>
In TestUserspaceEnginePortReconfig, when selecting a port, use a random offset rather than searching in a continguous range in case there is a range that is blocked
Updates tailscale/tailscale#2855
Signed-off-by: kari-ts <kari@tailscale.com>