When a client starts up without being able to connect to control, it
sends its discoKey to other nodes it wants to communicate with over
TSMP. This disco key will be a newer key than the one control knows
about.
If the client that can connect to control gets a full netmap, ensure
that the disco key for the node not connected to control is not
overwritten with the stale key control knows about.
This is implemented by keeping track of mapSession and using that for
the disco key injection if it is available. This ensures that we are not
constantly resetting the WireGuard connection when getting the wrong
keys from control.
This is implemented as:
- If the key is received via TSMP:
  - Set lastSeen for the peer to now()
  - Set online for the peer to false
- When processing new keys, only accept keys where either:
  - Peer is online
  - lastSeen is newer than existing last seen
If mapSession is not available, as in we are not yet connected to
control, punt down the disco key injection to magicsock.
Ideally, we will want to have mapSession be long lived at some point in
the near future so we only need to inject keys in one location and then
also use that for testing and loading the cache, but that is a yak for
another PR.
Updates #12639
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
For the purpose of improved observability of UDP socket receive buffer
overflows on Linux.
Updates tailscale/corp#37679
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Add two small APIs to support out-of-tree projects to exchange custom
signaling messages over DERP without requiring disco protocol
extensions:
- OnDERPRecv callback on magicsock.Options / wgengine.Config: called for
every non-disco DERP packet before the peer map lookup, allowing callers
to intercept packets from unknown peers that would otherwise be dropped.
- SendDERPPacketTo method on magicsock.Conn: sends arbitrary bytes to a
node key via a DERP region, creating the connection if needed. Thin
wrapper around the existing internal sendAddr.
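As a rough sketch of where such a callback sits in the receive path: the hook type, its boolean return value, and the conn type below are all assumptions made for illustration, and the real OnDERPRecv signature may differ.

```go
package main

import "fmt"

// recvHook is a hypothetical callback shape. It returns true when it has
// consumed the packet and normal processing should stop.
type recvHook func(regionID int, src string, pkt []byte) (handled bool)

// conn is a stand-in for a connection type holding a peer map and a hook.
type conn struct {
	onDERPRecv recvHook
	peers      map[string]bool // known peers, keyed by node key
}

// handleDERPPacket runs the hook before the peer map lookup, so packets
// from unknown peers can be intercepted instead of dropped.
func (c *conn) handleDERPPacket(regionID int, src string, pkt []byte) string {
	if c.onDERPRecv != nil && c.onDERPRecv(regionID, src, pkt) {
		return "intercepted"
	}
	if !c.peers[src] {
		return "dropped"
	}
	return "delivered"
}

func main() {
	c := &conn{
		peers: map[string]bool{"nodekey:known": true},
		onDERPRecv: func(regionID int, src string, pkt []byte) bool {
			// Claim only packets carrying a made-up signaling prefix.
			return len(pkt) > 0 && pkt[0] == 0xFF
		},
	}
	fmt.Println(c.handleDERPPacket(1, "nodekey:unknown", []byte{0xFF, 1}))
	fmt.Println(c.handleDERPPacket(1, "nodekey:unknown", []byte{0x01}))
	fmt.Println(c.handleDERPPacket(1, "nodekey:known", []byte{0x01}))
}
```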
Also allow netstack.Start to accept a nil LocalBackend for use cases
that wire up TCP/UDP handlers directly without a full LocalBackend.
Updates tailscale/corp#24454
Change-Id: I99a523ef281625b8c0024a963f5f5bf5d8792c17
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Fix three independent flake sources. They were debugged by Claude, and
empirically the test no longer flakes as it did before:
1. Poll for connection counter data instead of reading immediately.
The conncount callback fires asynchronously on received WireGuard
traffic, so after counts.Reset() there is no guarantee the counter
has been repopulated before checkStats reads it. Use tstest.WaitFor
with a 5s timeout to retry until a matching connection appears.
2. Replace the *2 symmetry assumption in global metric assertions.
metricSendUDP and friends are AggregateCounters that sum per-conn
expvars from both magicsock instances. The old assertion assumed
both instances had identical packet counts, which breaks under
asymmetric background WireGuard activity (handshake retries, etc).
The new assertGlobalMetricsMatchPerConn computes the actual sum of
both conns' expvars and compares against the AggregateCounter value.
3. Tolerate physical stats being 0 when user metrics are non-zero.
A rebind event replaces the socket mid-measurement, resetting the
physical connection counter while user metrics still reflect packets
processed before the rebind. Log instead of failing in this case.
Also move counts.Reset() after metric reads and reorder the reset
sequence (counts before metrics) to minimize the race window.
Fixes tailscale/tailscale#13420
Change-Id: I7b090a4dc229a862c1a52161b3f2547ec1d1f23f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
I omitted a lot of the min/max modernizers because they didn't
result in more clear code.
Some of it's older "for x := range 123".
Also: errors.AsType, any, fmt.Appendf, etc.
Updates #18682
Change-Id: I83a451577f33877f962766a5b65ce86f7696471c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Fix its/it's, who's/whose, wether/whether, missing apostrophes
in contractions, and other misspellings across the codebase.
Updates #cleanup
Change-Id: I20453b81a7aceaa14ea2a551abba08a2e7f0a1d8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
To be less spammy in stable, add a knob that disables the creation and
processing of TSMPDiscoKeyAdvertisements until we have a proper rollout
mechanism.
Updates #12639
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
The "public key moved" panic has caused confusion on multiple occasions,
and is a known issue for Mullvad. Add a loose heuristic to detect
Mullvad nodes, and trigger distinct panics for Mullvad and non-Mullvad
instances, with a link to the associated bug.
When this occurs again with Mullvad, it'll be easier for somebody to
find the existing bug.
If it occurs again with something other than Mullvad, it'll be more
obvious that it's a distinct issue.
Updates tailscale/corp#27300
Change-Id: Ie47271f45f2ff28f767578fcca5e6b21731d08a1
Signed-off-by: Alex Chan <alexc@tailscale.com>
derpActiveFunc was being called immediately as a bare goroutine,
before startGate was resolved. For the firstDerp case, startGate
is c.derpStarted which only closes after dc.Connect() completes,
so derpActiveFunc was firing before the DERP connection existed.
We now block it with the same logic used by runDerpReader and by
runDerpWriter.
Updates: #18810
Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
Restore synchronous method calls from LocalBackend to magicsock.Conn
for node views, filter, and delta mutations. The eventbus delivery
introduced in 8e6f63cf1 was invalid for these updates because
subsequent operations in the same call chain depend on magicsock
already having the current state. The Synchronize/settleEventBus
workaround was fragile and kept requiring more workarounds and
introducing new mystery bugs.
Since eventbus was added, we've learned more about when to use
eventbus, and this wasn't one of the cases.
We can take another swing at using eventbus for netmap changes in a
future change.
Fixes #16369
Updates #18575 (likely fixes)
Change-Id: I79057cc9259993368bb1e350ff0e073adf6b9a8f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The Tailscale CLI has some methods to watch the IPN bus for
messages, say, the current netmap (`tailscale debug netmap`).
The Tailscale daemon supports this using a streaming HTTP
response. Sometimes, the client can close its connection
abruptly -- due to an interruption, or in the case of `debug netmap`,
intentionally after consuming one message.
If the server daemon is writing a response as the client closes
its end of the socket, the daemon typically encounters a "broken pipe"
error. The "Watch IPN Bus" handler currently logs such errors after
they're propagated by a JSON encoding/writer helper.
Since the Tailscale CLI nominally closes its socket with the daemon
in this slightly ungraceful way (viz. `debug netmap`), stop logging
these broken pipe errors as far as possible. This will help avoid
confounding users when they scan backend logs.
Updates #18477
Signed-off-by: Amal Bansode <amal@tailscale.com>
This file was never truly necessary and has never actually been used in
the history of Tailscale's open source releases.
A Brief History of AUTHORS files
---
The AUTHORS file was a pattern developed at Google, originally for
Chromium, then adopted by Go and a bunch of other projects. The problem
was that Chromium originally had a copyright line only recognizing
Google as the copyright holder. Because Google (and most open source
projects) do not require copyright assignment for contributions, each
contributor maintains their copyright. Some large corporate contributors
then tried to add their own name to the copyright line in the LICENSE
file or in file headers. This quickly becomes unwieldy, and puts a
tremendous burden on anyone building on top of Chromium, since the
license requires that they keep all copyright lines intact.
The compromise was to create an AUTHORS file that would list all of the
copyright holders. The LICENSE file and source file headers would then
include that list by reference, listing the copyright holder as "The
Chromium Authors".
It also became cumbersome simply to keep the file up to date with a
high rate of new contributors. Plus it's not always obvious who the
copyright holder is. Sometimes it is the individual making the
contribution, but many times it may be their employer. There is no way
for the project maintainer to know.
Eventually, Google changed their policy to no longer recommend trying to
keep the AUTHORS file up to date proactively, and instead to only add to
it when requested: https://opensource.google/docs/releasing/authors.
They are also clear that:
> Adding contributors to the AUTHORS file is entirely within the
> project's discretion and has no implications for copyright ownership.
It was primarily added to appease a small number of large contributors
that insisted that they be recognized as copyright holders (which was
entirely their right to do). But it's not truly necessary, and not even
the most accurate way of identifying contributors and/or copyright
holders.
In practice, we've never added anyone to our AUTHORS file. It only lists
Tailscale, so it's not really serving any purpose. It also causes
confusion because Tailscalars put the "Tailscale Inc & AUTHORS" header
in other open source repos which don't actually have an AUTHORS file, so
it's ambiguous what that means.
Instead, we just acknowledge that the contributors to Tailscale (whoever
they are) are copyright holders for their individual contributions. We
also have the benefit of using the DCO (developercertificate.org) which
provides some additional certification of their right to make the
contribution.
The source file changes were purely mechanical with:
git ls-files | xargs sed -i -e 's/\(Tailscale Inc &\) AUTHORS/\1 contributors/g'
Updates #cleanup
Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d
Signed-off-by: Will Norris <will@tailscale.com>
When we have not yet communicated with a peer, send a
TSMPDiscoAdvertisement to let the peer know of our disco key. This is in
most cases redundant, but will allow us to set up direct connections
when the client cannot access control.
Some parts taken from: #18073
Updates #12639
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
Moves magicsock.cloudInfo into util/cloudinfo with minimal changes.
Updates #17796
Change-Id: I83f32473b9180074d5cdbf00fa31e5b3f579f189
Signed-off-by: Alex Valiushko <alexvaliushko@tailscale.com>
Adds a new type of TSMP message for advertising disco keys
to/from a peer, and implements the advertising triggered by a TSMP ping.
Needed as part of the effort to cache the netmap and still let clients
connect without control being reachable.
Updates #12639
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
Co-authored-by: James Tucker <james@tailscale.com>
Adds the ability to rotate discovery keys on running clients, needed for
testing upcoming disco key distribution changes.
Introduces key.DiscoKey, an atomic container for a disco private key,
public key, and the public key's ShortString, replacing the prior
separate atomic fields.
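The idea of a single atomic container can be sketched as follows. This is an illustration only: the type and method names are hypothetical, strings stand in for real key types, and the eight-character short form is an arbitrary choice for the example.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// discoState bundles values that must always change together.
// A discoState is never modified after it has been stored.
type discoState struct {
	private string
	public  string
	short   string
}

// discoKey publishes a discoState atomically, so a reader can never see
// a private key paired with a different key's public half.
type discoKey struct {
	cur atomic.Pointer[discoState]
}

// shortString returns an abbreviated form of the public key for logging.
func shortString(public string) string {
	if len(public) > 8 {
		return public[:8]
	}
	return public
}

// Set replaces all three values in one atomic store.
func (d *discoKey) Set(private, public string) {
	d.cur.Store(&discoState{private: private, public: public, short: shortString(public)})
}

// Get returns a consistent snapshot, or the zero value if nothing is set.
func (d *discoKey) Get() discoState {
	if s := d.cur.Load(); s != nil {
		return *s
	}
	return discoState{}
}

func main() {
	var k discoKey
	k.Set("priv-1", "pubkey-aaaaaaaa")
	fmt.Println(k.Get().short)

	// Rotation is a single store; readers see the old or the new state.
	k.Set("priv-2", "pubkey-bbbbbbbb")
	fmt.Println(k.Get().public)
}
```

With separate atomic fields, a reader racing with a rotation could combine the old private key with the new public key; a single pointer swap avoids that.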
magicsock.Conn has a new RotateDiscoKey method, and access to this is
provided via localapi and a CLI debug command.
Note that this implementation is primarily for testing as it stands, and
regular use should likely introduce an additional mechanism that allows
the old key to be used for some time, to provide a seamless key rotation
rather than one that invalidates all sessions.
Updates tailscale/corp#34037
Signed-off-by: James Tucker <james@tailscale.com>
It's an unnecessary nuisance having it. We go out of our way to redact
it in so many places when we don't even need it there anyway.
Updates #12639
Change-Id: I5fc72e19e9cf36caeb42cf80ba430873f67167c3
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
They distracted me in some refactoring. They're set but never used.
Updates #17858
Change-Id: I6ec7d6841ab684a55bccca7b7cbf7da9c782694f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
I noticed a deadlock in a test in an in-development PR where during a
shutdown storm of things (from a tsnet.Server.Close), LocalBackend was
trying to call magicsock.Conn.Synchronize but the magicsock and/or
eventbus was already shut down and no longer processing events.
Updates #16369
Change-Id: I58b1f86c8959303c3fb46e2e3b7f38f6385036f1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
* Remove a couple of single-letter `l` variables
* Use named struct parameters in the test cases for readability
* Delete `wantAfterInactivityForFn` parameter when it returns the
default zero
Updates #cleanup
Signed-off-by: Alex Chan <alexc@tailscale.com>
Merge the connstats package into the netlog package
and unexport all of its declarations.
Remove the buildfeatures.HasConnStats and use HasNetLog instead.
Updates tailscale/corp#33352
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
The connstats package was an unnecessary layer of indirection.
It was separated out of wgengine/netlog so that net/tstun and
wgengine/magicsock wouldn't need a dependency on the concrete
implementation of network flow logging.
Instead, we simply register a callback for counting connections.
This PR does the bare minimum work to prepare tstun and magicsock
to only care about that callback.
A future PR will delete connstats and merge it into netlog.
Updates tailscale/corp#33352
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
This commit also shuffles the hasPeerRelayServers atomic load
to happen sooner, reducing the cost for clients with no peer relay
servers.
Updates tailscale/corp#33099
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Also pull out interface method only needed in Linux.
Instead of having userspace do the call into the router, just let the
router pick up the change itself.
Updates #15160
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
Saves ~53 KB from the min build.
Updates #12614
Change-Id: I73f9544a9feea06027c6ebdd222d712ada851299
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Due to iOS memory limitations in 2020 (see
https://tailscale.com/blog/go-linker, etc) and wireguard-go using
multiple goroutines per peer, commit 16a9cfe2f4 introduced some
convoluted pathways through Tailscale to look at packets before
they're delivered to wireguard-go and lazily reconfigure wireguard on
the fly before delivering a packet, only telling wireguard about peers
that are active.
We eventually want to remove that code and integrate wireguard-go's
configuration with Tailscale's existing netmap tracking.
To make it easier to find that code later, this makes it modular. It
saves 12 KB (of disk) to turn it off (at the expense of lots of RAM),
but that's not really the point. The point is rather making it obvious
(via the new constants) where this code even is.
Updates #12614
Change-Id: I113b040f3e35f7d861c457eaa710d35f47cee1cb
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Switching to a Geneve-encapsulated (peer relay) path in
endpoint.handlePongConnLocked is expected around port rebinds, which end
up clearing endpoint.bestAddr.
Fixes tailscale/corp#33036
Signed-off-by: Jordan Whited <jordan@tailscale.com>