Add a new UsingCachedNetworkMap flag to the ipnstate.Status message.
Populate it as true when self has a network map that was loaded from cache,
vs. being directly provided by the control plane.
Updates #12639
Change-Id: I0fd5b1d6ec8df5cec5d4a3fa6b263293124780d1
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
This is a follow-up to #19117, adding a debug CLI command allowing the operator
to explicitly discard cached netmap data, as a safety and recovery measure.
Updates #12639
Change-Id: I5c3c47c0204754b9c8e526a4ff8f69d6974db6d0
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
When getting a full map from control, disco keys for the nodes will also
be delivered. When communicating with a peer that is running without
being connected to control, and having that connection running based on
a TSMP learned disco key, we need to avoid overwriting the disco key for
that peer with the stale one control knows about.
Add a filter that filters out keys from control, and replace them with
the TSMP learned disco keys.
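A minimal sketch of such a filter follows. The names nodeID, discoKey and
preferTSMPKeys are hypothetical and are not taken from the actual change;
the sketch only illustrates preferring a TSMP-learned key over the copy
delivered by control.

```go
package main

import "fmt"

// nodeID and discoKey are hypothetical stand-ins for the real node and
// disco key types.
type nodeID int
type discoKey string

// preferTSMPKeys returns a copy of fromControl in which every peer that has
// a TSMP-learned key uses that key instead of the one control delivered.
func preferTSMPKeys(fromControl, viaTSMP map[nodeID]discoKey) map[nodeID]discoKey {
	out := make(map[nodeID]discoKey, len(fromControl))
	for id, k := range fromControl {
		if learned, ok := viaTSMP[id]; ok {
			k = learned // control's copy may be stale; keep the live key
		}
		out[id] = k
	}
	return out
}

func main() {
	control := map[nodeID]discoKey{1: "stale", 2: "fresh"}
	tsmp := map[nodeID]discoKey{1: "learned"}
	fmt.Println(preferTSMPKeys(control, tsmp))
}
```

Peers without a TSMP-learned key keep whatever control provided.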
Updates #12639
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
* cmd/k8s-operator/e2e: add L7 HA ingress test
Change-Id: Ic017e4a7e3affbc3e2a87b9b6b9c38afd65f32ed
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
* cmd/k8s-operator: add further E2E tests for Ingress (#34833)
This change adds E2E tests for L3 HA Ingress and L7 Ingress (Standalone and
HA). Updates the existing L3 Ingress test to use the Service's Magic DNS
name to test connectivity.
Also refactors test setup to set TS_DEBUG_ACME_DIRECTORY_URL only for tests
running against devcontrol, and updates the Kind node image from v1.30.0 to
v1.35.0.
Fixes tailscale/corp#34833
Signed-off-by: Becky Pauley <becky@tailscale.com>
---------
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Signed-off-by: Becky Pauley <becky@tailscale.com>
Co-authored-by: Tom Proctor <tomhjp@users.noreply.github.com>
We have ~2.5k nodes running Void Linux, which report a version string
like `1.96.2_1 (Void Linux)`. Previously these versions would fail to
parse, because we only expect a hyphen and `extraCommits` after the
major/minor/patch numbers.
Fix the version parsing logic to handle this case.
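As an illustration only, a parser that tolerates this shape might look like
the sketch below. The parsed type and parseVersion function are hypothetical
and do not reflect the real parser's API. The sketch assumes the distro name
always appears as a trailing " (...)" and keeps whatever follows the first
'-' or '_' verbatim.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsed is a hypothetical result type; the real parser's fields may differ.
type parsed struct {
	major, minor, patch int
	suffix              string // e.g. "-5" (extra commits) or "_1" (distro revision)
}

// parseVersion accepts "1.96.2", "1.96.2-5", and "1.96.2_1 (Void Linux)".
func parseVersion(s string) (parsed, error) {
	// Drop a trailing parenthesised distro name such as " (Void Linux)".
	if i := strings.Index(s, " ("); i >= 0 {
		s = s[:i]
	}
	// Split off whatever follows the numbers at the first '-' or '_'.
	nums, suffix := s, ""
	if i := strings.IndexAny(s, "-_"); i >= 0 {
		nums, suffix = s[:i], s[i:]
	}
	parts := strings.Split(nums, ".")
	if len(parts) != 3 {
		return parsed{}, fmt.Errorf("want major.minor.patch, got %q", nums)
	}
	var n [3]int
	for i, p := range parts {
		v, err := strconv.Atoi(p)
		if err != nil {
			return parsed{}, fmt.Errorf("bad component %q: %w", p, err)
		}
		n[i] = v
	}
	return parsed{major: n[0], minor: n[1], patch: n[2], suffix: suffix}, nil
}

func main() {
	for _, s := range []string{"1.96.2", "1.96.2-5", "1.96.2_1 (Void Linux)"} {
		p, err := parseVersion(s)
		fmt.Println(p, err)
	}
}
```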
Updates #19148
Change-Id: Ica4f172d080af266af7f0d69bb31483a095cd199
Signed-off-by: Alex Chan <alexc@tailscale.com>
Add a new tailcfg.NodeCapability (NodeAttrCacheNetworkMaps) to control whether
a node with support for caching network maps will attempt to do so. Update the
capability version to reflect this change (mainly as a safety measure, as the
control plane does not currently need to know about it).
Use the presence (or absence) of the node attribute to decide whether to create
and update a netmap cache for each profile. If caching is disabled, discard the
cached data; this allows us to use the presence of a cached netmap as an
indicator it should be used (unless explicitly overridden). Add a test that
verifies the attribute is respected. Reverse the sense of the environment knob
to be true by default, with an override to disable caching at the client
regardless of what the node attribute says.
Move the creation/update of the netmap cache (when enabled) until after
successfully applying the network map, to reduce the possibility that we will
cache (and thus reuse after a restart) a network map that fails to correctly
configure the client.
Updates #12639
Change-Id: I1df4dd791fdb485c6472a9f741037db6ed20c47e
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Instead of sending out disco keys via TSMP once, send them out at
intervals of 60 seconds or more. The trigger is still callmemaybe and the keys
will not be sent if no direct connection needs to be established.
This fixes a case where a node can have stale keys but have communicated
with the other peer before, leading to an infinite DERP state.
Updates #12639
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
In #10057, @seigel pointed out an inconsistency in the help text for
`exit-node list` and `set --exit-node`:
1. Use `tailscale exit-node list`, which has a column titled "hostname"
and tells you that you can use a hostname with `set --exit-node`:
```console
$ tailscale exit-node list
IP                  HOSTNAME                         COUNTRY     CITY     STATUS
100.98.193.6        linode-vps.tailfa84dd.ts.net     -           -        -
[…]
100.93.242.75       ua-iev-wg-001.mullvad.ts.net     Ukraine     Kyiv     -
# To view the complete list of exit nodes for a country, use `tailscale exit-node list --filter=` followed by the country name.
# To use an exit node, use `tailscale set --exit-node=` followed by the hostname or IP.
# To have Tailscale suggest an exit node, use `tailscale exit-node suggest`.
```
(This is the same format hostnames are presented in the admin
console.)
2. Try copy/pasting a hostname into `set --exit-node`:
```console
$ tailscale set --exit-node=linode-vps.tailfa84dd.ts.net
invalid value "linode-vps.tailfa84dd.ts.net" for --exit-node; must be IP or unique node name
```
3. Note that the command allows some hostnames, if they're from nodes
in a different tailnet:
```console
$ tailscale set --exit-node=ua-iev-wg-001.mullvad.ts.net
$ echo $?
0
```
This patch addresses the inconsistency in two ways:
1. Allow using `tailscale set --exit-node=` with an FQDN that's missing
the trailing dot, matching the formatting used in `exit-node list`
and the admin console.
2. Make the description of valid exit nodes consistent across commands
("hostname or IP").
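A minimal sketch of the first point, assuming a hypothetical matchesFQDN
helper (not the function used in the patch): both names are compared with
any trailing dot removed. The case-insensitive comparison is an extra
assumption, based on DNS names being case-insensitive.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesFQDN reports whether the user-supplied name refers to the node whose
// fully qualified name is fqdn. A trailing dot on either side is ignored.
func matchesFQDN(input, fqdn string) bool {
	trim := func(s string) string { return strings.TrimSuffix(s, ".") }
	return strings.EqualFold(trim(input), trim(fqdn))
}

func main() {
	fmt.Println(matchesFQDN("linode-vps.tailfa84dd.ts.net", "linode-vps.tailfa84dd.ts.net."))
}
```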
Updates #10057
Change-Id: If5d74f950cc1a9cc4b0ebc0c2f2d70689ffe4d73
Signed-off-by: Alex Chan <alexc@tailscale.com>
This avoids putting "DisablementSecrets" in the JSON output from
`tailscale lock log`, which is potentially scary to somebody who doesn't
understand the distinction.
AUMs are stored and transmitted in CBOR-encoded format, which uses an
integer rather than a string key, so this doesn't break already-created
TKAs.
Fixes #19189
Change-Id: I15b4e81a7cef724a450bafcfa0b938da223c78c9
Signed-off-by: Alex Chan <alexc@tailscale.com>
* Refer to "tailnet-lock" instead of "network-lock" in log messages
* Log keys as `tlpub:<hex>` rather than as Go structs
Updates tailscale/corp#39455
Updates tailscale/corp#37904
Change-Id: I644407d1eda029ee11027bcc949897aa4ba52787
Signed-off-by: Alex Chan <alexc@tailscale.com>
Prior to this change, closing multiple ServiceListeners concurrently
could result in failures as the independent close operations vie for the
attention of the Server's LocalBackend. The close operations would each
obtain the current ETag of the serve config and try to write new serve
config using this ETag. When one write invalidated the ETag of another,
the latter would fail. Exacerbating the issue, ServiceListener.Close
cannot be retried.
This change resolves the bug by using Server.mu to synchronize across
all ServiceListener.Close operations, ensuring they happen serially.
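The race and the fix can be sketched with a toy config store. All names below
(store, server, closeService, closeAll) are hypothetical; the sketch only
shows why holding one mutex across the read-ETag-then-write sequence removes
the conflict.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

var errStale = errors.New("etag mismatch")

// store is a hypothetical config store with optimistic concurrency: a write
// succeeds only if the caller presents the current ETag.
type store struct {
	mu   sync.Mutex
	etag int
	svcs map[string]bool
}

func (s *store) read() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.etag
}

func (s *store) remove(etag int, name string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if etag != s.etag {
		return errStale
	}
	delete(s.svcs, name)
	s.etag++
	return nil
}

type server struct {
	mu sync.Mutex // serializes closeService calls
	st *store
}

// closeService holds s.mu across the read and the write, so no other close
// can invalidate the ETag in between.
func (s *server) closeService(name string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.st.remove(s.st.read(), name)
}

// closeAll closes every service concurrently and returns the failure count.
func closeAll(s *server, names []string) int {
	var wg sync.WaitGroup
	var fails atomic.Int32
	for _, n := range names {
		wg.Add(1)
		go func(n string) {
			defer wg.Done()
			if err := s.closeService(n); err != nil {
				fails.Add(1)
			}
		}(n)
	}
	wg.Wait()
	return int(fails.Load())
}

func main() {
	st := &store{svcs: map[string]bool{"svc:a": true, "svc:b": true}}
	fmt.Println(closeAll(&server{st: st}, []string{"svc:a", "svc:b"}))
}
```

Without the lock in closeService, two goroutines could both read the same
ETag and the second write would return errStale.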
Fixes #19169
Signed-off-by: Harry Harpham <harry@tailscale.com>
This is a regression test for #19166, in which it was discovered that
after calling Server.ListenService for multiple Services, only the
Service from the most recent call would be advertised.
The bug was fixed in 99f8039101
Updates #19166
Signed-off-by: Harry Harpham <harry@tailscale.com>
This makes the limits easier to find and change, rather than scattering
them across the TKA code.
Updates #cleanup
Change-Id: I2f9b3b83d293eebb2572fa7bb6de2ca1f3d9a192
Signed-off-by: Alex Chan <alexc@tailscale.com>
The disco key subscriber could deadlock in a scenario where a self node
update came through the control path into the mapSession after the disco
key subscriber had taken the lock, but before it had pushed the netmap
change, as both the subscriber and onSelfNodeChanged need the
controlclient lock.
The subscriber can safely take the mapsession as the changequeue has its
own lock for inserting records, and also checks if the queue has been
closed before inserting.
Updates #12639
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
Without this, any test relying on underlying use of magicsock will fail
without network connectivity, even when the test logic has no need for a
network connection. Tests currently in this bucket include many in
tstest/integration and in tsnet.
Further explanation:
ipn only becomes Running when it sees at least one live peer or DERP
connection:
0cc1b2ff76/ipn/ipnlocal/local.go (L5861-L5866)
When tests only use a single node, they will never see a peer, so the
node has to wait to see a DERP server.
magicsock sets the preferred DERP server in updateNetInfo(), but this
function returns early if the network is down.
0cc1b2ff76/wgengine/magicsock/magicsock.go (L1053-L1106)
Because we're checking the real network, this prevents ipn from entering
"Running" and causes the test to fail or hang.
In tests, we can assume the network is up unless we're explicitly testing
the behaviour of tailscaled when the network is down. We do something similar
in magicsock/derp.go, where we assume we're connected to control unless
explicitly testing otherwise:
7d2101f352/wgengine/magicsock/derp.go (L166-L177)
This is the template for the changes to `networkDown()`.
Fixes #17122
Co-authored-by: Alex Chan <alexc@tailscale.com>
Signed-off-by: Harry Harpham <harry@tailscale.com>
When disco keys are learned on a node that is connected to control and
has a mapSession, wgengine will see the key as having changed, and
assume that any existing connections will need to be reset.
For keys learned via TSMP, the connection should not be reset as that
key is learned via an active wireguard connection. If wgengine resets
that connection, a 15s timeout will occur.
This change adds a map to track new keys coming in via TSMP, and removes
them from the list of keys that need to trigger wireguard resets. This
is done with an interface chain from controlclient down via localBackend
to userspaceEngine via the watchdog.
Once a key has been actively used for preventing a wireguard reset, the
key is removed from the map.
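The tracking map can be sketched as follows. The tsmpKeys type and its
methods are hypothetical and string keys stand in for the real key types;
the point shown is that an entry suppresses exactly one reset and is then
consumed.

```go
package main

import (
	"fmt"
	"sync"
)

// tsmpKeys is a hypothetical tracker of disco keys learned via TSMP.
type tsmpKeys struct {
	mu   sync.Mutex
	seen map[string]string // peer -> disco key learned via TSMP
}

// learned records that key was learned for peer over a live connection.
func (t *tsmpKeys) learned(peer, key string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.seen == nil {
		t.seen = make(map[string]string)
	}
	t.seen[peer] = key
}

// needsReset reports whether a change to newKey for peer should reset the
// connection. A matching TSMP entry suppresses the reset once and is removed.
func (t *tsmpKeys) needsReset(peer, newKey string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if k, ok := t.seen[peer]; ok && k == newKey {
		delete(t.seen, peer)
		return false
	}
	return true
}

func main() {
	var t tsmpKeys
	t.learned("peer1", "key-a")
	fmt.Println(t.needsReset("peer1", "key-a"), t.needsReset("peer1", "key-a"))
}
```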
If mapSession becomes a long-lived process instead of being dependent on
having a connection to control, this interface chain can be removed, and
the event sequence wrap->controlClient->userspaceEngine can be
changed to wrap->userspaceEngine->controlClient, as we know the map will
not be gunked up with stale TSMP entries.
Updates #12639
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
AppendTo returns the new slice but the result was discarded,
so only the newly added service was advertised.
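The bug pattern, reduced to a sketch. The appendTo helper and the advertise
functions are hypothetical; they only mirror the contract described above,
where the extended slice is returned rather than written in place.

```go
package main

import "fmt"

// appendTo is a hypothetical helper with the same contract as the built-in
// append: it returns the extended slice and does not update dst's length.
func appendTo(dst []string, src ...string) []string {
	return append(dst, src...)
}

// advertiseBuggy discards appendTo's result, so the existing services are lost.
func advertiseBuggy(existing []string, added string) []string {
	out := []string{added}
	appendTo(out, existing...) // result discarded: out is unchanged
	return out
}

// advertiseFixed keeps the returned slice.
func advertiseFixed(existing []string, added string) []string {
	out := []string{added}
	out = appendTo(out, existing...)
	return out
}

func main() {
	existing := []string{"svc:a"}
	fmt.Println(advertiseBuggy(existing, "svc:b"))
	fmt.Println(advertiseFixed(existing, "svc:b"))
}
```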
Signed-off-by: Evan Champion <110177090+evan314159@users.noreply.github.com>
Add riscv64 to the GOARCH list passed to mkctr for all Docker image
builds. Go already cross-compiles for riscv64, so this just adds the
architecture to the container manifest.
Updates #17812
Signed-off-by: Bruno Verachten <gounthar@gmail.com>
Previously, running `add/remove/revoke-keys` without passing any keys
would fail with an unhelpful error:
```console
$ tailscale lock revoke-keys
generation of recovery AUM failed: sending generate-recovery-aum: 500 Internal Server Error: no provided key is currently trusted
```
or
```console
$ tailscale lock revoke-keys
generation of recovery AUM failed: sending generate-recovery-aum: 500 Internal Server Error: network-lock is not active
```
Now they fail with a more useful error:
```console
$ tailscale lock revoke-keys
missing argument, expected one or more tailnet lock keys
```
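A sketch of the kind of up-front check involved, assuming a hypothetical
checkKeyArgs helper that runs before any request is sent to the backend:

```go
package main

import (
	"errors"
	"fmt"
)

// checkKeyArgs is a hypothetical pre-flight check: it rejects an empty
// argument list locally instead of letting the backend return a 500.
func checkKeyArgs(args []string) error {
	if len(args) == 0 {
		return errors.New("missing argument, expected one or more tailnet lock keys")
	}
	return nil
}

func main() {
	fmt.Println(checkKeyArgs(nil))
	fmt.Println(checkKeyArgs([]string{"tlpub:abc"}))
}
```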
Fixes #19130
Change-Id: I9d81fe2f5b92a335854e71cbc6928e7e77e537e3
Signed-off-by: Alex Chan <alexc@tailscale.com>
Install the previously uninstalled hooks for the filter and tstun
intercepts. Move the DNS manager hook installation into Init() with all
the others. Protect all implementations with a short-circuit if the node
is not configured to use Connectors 2025. The short-circuit pattern
replaces the previous pattern used in managing the DNS manager hook, of
setting it to nil in response to CapMap changes.
Fixes tailscale/corp#38716
Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>
The tailscale-online.target and tailscale-wait-online.service systemd
units were added in 30e12310f1 but never included in the release
packaging (tarballs, debs, rpms).
Updates #11504
Change-Id: I93e03e1330a7ff8facf845c7ca062ed2f0d35eaa
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The hook calls into the client assigned addresses to return a view of
the transit IPs associated with that connector.
Fixes tailscale/corp#38125
Signed-off-by: George Jones <george@tailscale.com>
The client needs to know the set of transit IPs that are assigned
to each connector, so when we register transit IPs with the connector
we also need to assign them to that connector in the addrAssignments.
We identify the connector by node public key to match the peer information
that is available when the ExtraWireguardAllowedIPs hook will be invoked.
Fixes tailscale/corp#38127
Signed-off-by: George Jones <george@tailscale.com>
* ipn/ipnlocal: warn incompatibility between no-snat-routes and exitnode
This commit adds a health check warning when the --snat-subnet-routes=false flag for a subnet router is
set alongside --advertise-exit-node=true. The two conflict: internet-bound
traffic from peers using this exit node is not masqueraded to the node's source IP, so return
packets fail to route back. The combination is not valid until we figure out a way to separate the exit node masquerade rule and skip it for subnet routes.
Updates #18725
Signed-off-by: KevinLiang10 <37811973+KevinLiang10@users.noreply.github.com>
* use date instead of "for now" to clarify effectiveness
Signed-off-by: KevinLiang10 <37811973+KevinLiang10@users.noreply.github.com>
---------
Signed-off-by: KevinLiang10 <37811973+KevinLiang10@users.noreply.github.com>
conn25 needs to add routes to the operating system to direct handling
of the addresses in the magic IP range to the tailscale0 TUN and
tailscaled.
The way we do this for exit nodes and VIP services is that we add routes
to the Routes field of router.Config, and then the config is passed to
the WireGuard engine Reconfig.
conn25 is implemented as an ipnext.Extension and so this commit adds a
hook to ipnext.Hooks to allow any extension to provide routes to the
config. The hook, if provided, is called in routerConfigLocked, similarly
to exit nodes and VIP services.
Fixes tailscale/corp#38123
Signed-off-by: Fran Bull <fran@tailscale.com>
On Linux batching.Conn will now write a vector of
coalesced buffers via sendmmsg(2) instead of copying
fragments into a single buffer.
Scatter-gather I/O has been available on Linux since the
earliest days (reworked in 2.6.24). The kernel passes fragments
to the driver if the driver supports it; otherwise it linearizes
the data upon receipt.
Removing the copy overhead from userspace yields up to 4-5%
packet and bitrate improvement on Linux with GSO enabled:
46Gb/s 4.4m pps vs 44Gb/s 4.2m pps w/32 Peer Relay client flows.
Updates tailscale/corp#36989
Change-Id: Idb2248d0964fb011f1c8f957ca555eab6a6a6964
Signed-off-by: Alex Valiushko <alexvaliushko@tailscale.com>
Adds the ability to detect when running on AWS ECS and fetch tokens from
the ECS metadata endpoints in addition to IMDSv2
Fixes #18909
Signed-off-by: Patrick Guinard <patrick@public.com>
Prior to this change, closing the listener returned by
Server.ListenService would free system resources, but not clean up state
in the Server's local backend. With this change, the local backend state
is now cleaned on close.
Fixes tailscale/corp#35860
Signed-off-by: Harry Harpham <harry@tailscale.com>
TestListenService needs to set up state (capabilities, advertised routes,
ACL tags, etc.). It is imperative that this state propagates to all nodes
in the test tailnet before proceeding with the test. To achieve this,
TestListenService currently polls each node's local backend in a loop.
Using local.Client.WatchIPNBus improves the situation by blocking until
a new netmap comes in.
Fixes tailscale/corp#36244
Signed-off-by: Harry Harpham <harry@tailscale.com>
This helps us distribute tests across CI runners. Most tsnet tests call
tstest.Shard, but two recently added tests do not: tsnet.TestFunnelClose
and tsnet.TestListenService. This commit resolves the oversight.
Fixes tailscale/corp#36242
Signed-off-by: Harry Harpham <harry@tailscale.com>
If errors occurred, the updater could end up deadlocked.
Closing the done channel rather than adding to it fixes a deadlock in
the corp tests.
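The difference can be sketched in isolation. The stopAll function below is
hypothetical and unrelated to the updater's real structure. It relies on a
general property of Go channels: a close is observed by every receiver and
never blocks, whereas a send on an unbuffered channel wakes at most one
receiver and blocks if nobody is receiving.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// stopAll starts the given number of waiters, signals them by closing done,
// and returns how many of them observed the signal.
func stopAll(waiters int) int {
	done := make(chan struct{})
	var wg sync.WaitGroup
	var woke atomic.Int32
	for i := 0; i < waiters; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-done // unblocks for every waiter once done is closed
			woke.Add(1)
		}()
	}
	close(done) // does not block, even when waiters == 0
	wg.Wait()
	return int(woke.Load())
}

func main() {
	fmt.Println(stopAll(3), stopAll(0))
}
```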
Updates #19111
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
Use bufio.Writer.AvailableBuffer to write the 32-byte public key
directly into bufio's internal buffer as a single append+Write,
avoiding 32 separate WriteByte calls. Fall back to the existing
byte-at-a-time path when the buffer has insufficient space.
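A minimal sketch of the technique, using hypothetical writeKey and encode
helpers rather than the real method. bufio.Writer.AvailableBuffer returns an
empty slice backed by the writer's unused space, so appending to it and
passing the result straight to Write does not allocate.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// writeKey writes a 32-byte key to bw, using the fast path when the
// buffer has room and falling back to WriteByte otherwise.
func writeKey(bw *bufio.Writer, key [32]byte) error {
	if bw.Available() >= len(key) {
		b := bw.AvailableBuffer() // empty slice backed by bufio's free space
		b = append(b, key[:]...)
		_, err := bw.Write(b)
		return err
	}
	for _, c := range key {
		if err := bw.WriteByte(c); err != nil {
			return err
		}
	}
	return nil
}

// encode writes key through a bufio.Writer of the given size and returns
// the bytes that reached the underlying writer.
func encode(key [32]byte, bufSize int) ([]byte, error) {
	var out bytes.Buffer
	bw := bufio.NewWriterSize(&out, bufSize)
	if err := writeKey(bw, key); err != nil {
		return nil, err
	}
	if err := bw.Flush(); err != nil {
		return nil, err
	}
	return out.Bytes(), nil
}

func main() {
	var key [32]byte
	fast, _ := encode(key, 4096) // fast path: buffer has room
	slow, _ := encode(key, 16)   // slow path: buffer smaller than the key
	fmt.Println(len(fast), len(slow))
}
```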
```
name old ns/op new ns/op speedup
NodeWriteRawWithoutAllocating-8 121 12.5 ~9.7x
(0 allocs/op in both)
```
Add BenchmarkNodeWriteRawWithoutAllocating and expand
TestNodeWriteRawWithoutAllocating to cover both fast (AvailableBuffer)
and slow (WriteByte fallback) paths with correctness and allocation
checks.
Updates tailscale/corp#38509
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
Use bufio.Writer.AvailableBuffer to write the frame header directly
into bufio's internal buffer as a single append+Write, avoiding 5
separate WriteByte calls. Fall back to the existing writeUint32
byte-at-a-time path when the buffer has insufficient space.
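For the header case, the fast path might be sketched as below. The layout
used here (a 1-byte frame type followed by a 4-byte big-endian length) is an
assumption made for illustration, and writeFrameHeader and header are
hypothetical stand-ins rather than the real functions.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"fmt"
)

const headerLen = 5 // assumed layout: 1-byte type + 4-byte big-endian length

// writeFrameHeader writes the header in one Write when the buffer has room,
// and byte by byte otherwise.
func writeFrameHeader(bw *bufio.Writer, typ byte, n uint32) error {
	if bw.Available() >= headerLen {
		b := bw.AvailableBuffer()
		b = append(b, typ)
		b = binary.BigEndian.AppendUint32(b, n)
		_, err := bw.Write(b)
		return err
	}
	if err := bw.WriteByte(typ); err != nil {
		return err
	}
	for shift := 24; shift >= 0; shift -= 8 {
		if err := bw.WriteByte(byte(n >> shift)); err != nil {
			return err
		}
	}
	return nil
}

// header returns the encoded header as seen by the underlying writer.
func header(typ byte, n uint32, bufSize int) []byte {
	var out bytes.Buffer
	bw := bufio.NewWriterSize(&out, bufSize)
	if err := writeFrameHeader(bw, typ, n); err != nil {
		return nil
	}
	if err := bw.Flush(); err != nil {
		return nil
	}
	return out.Bytes()
}

func main() {
	fmt.Println(header(0x04, 258, 4096), header(0x04, 258, 2))
}
```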
```
name old ns/op new ns/op speedup
WriteFrameHeader-8 18.8 7.8 ~2.4x
(0 allocs/op in both)
```
Add TestWriteFrameHeader with correctness
checks, allocation assertions, and coverage of both fast and slow
write paths. Move BenchmarkReadFrameHeader from client_test.go to
derp_test.go alongside BenchmarkWriteFrameHeader, co-located with
the functions under test.
Updates tailscale/corp#38509
Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
If AutoUpdate.Check is false, the client has opted out of checking for updates, so we shouldn't broadcast ClientVersion. If the client has opted in, it should be included in the initial Notify.
Updates tailscale/corp#32629
Signed-off-by: kari-ts <kari@tailscale.com>