This NodeCapability works around the UDP GSO bugs introduced by
torvalds/linux@b10b446 (v7.0-rc1). These bugs were later fixed by
torvalds/linux@78effd8 and torvalds/linux@5f17ae0 (v7.1-rc5).

These Linux kernel bugs cause mangled UDP headers and UDP checksums,
resulting in high levels of packet loss. They have already made their
way downstream into various distros, e.g. Ubuntu 26.04 LTS. Impacted
users are now dealing with poor UDP performance in tailscaled, and in
any other software that makes use of UDP GSO.

Not all users of the affected kernels are impacted, as the relevant
kernel code path sits between the kernel and the netdev driver, and
behavior varies by driver/device capability. We cannot detect impact at
runtime, as this would require enumerating all netdevs and performing
loopback tests on each, which is invasive and in many cases impossible.

So, we are left to choose between disabling UDP GSO for all users on
affected kernels, whether they experience real impact or not, or trying
to work around the bugs. Disabling UDP GSO for a user who is not
impacted can cut max throughput in half and consume more CPU cycles.

This commit attempts to work around the bugs by avoiding UDP GSO when
batches are small, and injecting a 1-byte sentinel tail payload when
they are large. This tail payload is smaller than "GSO size", which
sidesteps the primary trigger: all fragments in a batch being equal in
length. The end result is slightly increased payload and packet
overhead, but functional UDP GSO for all Linux 7.0-7.1.4 users,
regardless of netdev/driver.

Updates #19777

Signed-off-by: Jordan Whited <jordan@tailscale.com>
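
The batching decision described above can be sketched roughly as
follows. This is an illustrative sketch only, not the code in this
commit: planBatch, plan, minGSOBatch, and the sentinel byte value are
all hypothetical, the small-batch threshold is not stated in this
message, and every packet in a batch is assumed to have the same length.
The actual send path (sendmsg with a UDP_SEGMENT control message) and
the receiver's handling of the sentinel are omitted.

```go
package main

import "fmt"

// minGSOBatch is a hypothetical threshold below which GSO is skipped.
// The real value is not given in the commit message.
const minGSOBatch = 4

// sentinel is the 1-byte tail payload. Its value is an assumption.
var sentinel = []byte{0}

// plan describes how one batch would be handed to the kernel.
type plan struct {
	useGSO   bool     // whether a UDP_SEGMENT size would accompany the send
	gsoSize  int      // segment size, meaningful only when useGSO is true
	payloads [][]byte // one entry per send call
}

// planBatch assumes all packets in pkts have equal length, which is the
// case the commit message identifies as the primary trigger.
func planBatch(pkts [][]byte) plan {
	if len(pkts) == 0 {
		return plan{}
	}
	gsoSize := len(pkts[0])
	// Small batches, or segments too short for the sentinel to be
	// strictly smaller than them, are sent one datagram at a time.
	if len(pkts) < minGSOBatch || gsoSize <= len(sentinel) {
		return plan{payloads: pkts}
	}
	// Large batches are coalesced, then the short sentinel is appended
	// so that the final fragment differs in length from the others.
	buf := make([]byte, 0, gsoSize*len(pkts)+len(sentinel))
	for _, p := range pkts {
		buf = append(buf, p...)
	}
	buf = append(buf, sentinel...)
	return plan{useGSO: true, gsoSize: gsoSize, payloads: [][]byte{buf}}
}

func main() {
	batch := [][]byte{{1, 2, 3}, {4, 5, 6}, {7, 8, 9}, {10, 11, 12}}
	p := planBatch(batch)
	fmt.Println(p.useGSO, p.gsoSize, len(p.payloads[0]))
}
```

The cost in this sketch matches the trade-off stated above: one extra
byte of payload and one extra packet per large batch.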