mirror of https://github.com/tailscale/tailscale.git synced 2026-05-29 11:11:31 -04:00

Files

Brad Fitzpatrick 95d874e9b4 cmd/testwrapper: surface race reports and skip retries when detected

A data race in a package matters more than any individual test
result. Two related problems:

1. Where go test's race detector text ("WARNING: DATA RACE" plus
   the goroutine stack traces) lands in JSON output is timing-
   dependent: it can be attributed to a test that ends up reporting
   PASS (e.g. when the racing goroutines outlive the test that
   spawned them and TSan prints during a different test's window).
   testwrapper's main loop only flushes the logs of failed tests,
   so the race report ends up stuck in a passing test's buffer and
   is silently dropped. The race builders just see a bare
   "FAIL\nFAIL\tpkg\ttime".

2. If the failing test in such a package happens to be marked flaky,
   testwrapper retries it. That is the worst possible response to a
   race: the flaky test might not even be the racy code, and a
   second run without the racy goroutines could "succeed" while
   hiding the real bug.

Address both: scan every output line for the race detector's first-
line marker. Track whether the package observed a race at all, on
the pkgFinished testAttempt. When a race was seen, fold every per-
test log buffer into the package-level logs (so the full report
surfaces from the existing pkg-fail flush path), and drop any
flaky-test retry plans for that package so we fail immediately
instead of running another attempt.

Two new tests:
- TestRaceSuppressesFlakyRetry verifies that a flaky test alongside
  a racy test does NOT get retried.
- TestRaceAttributedToPassingTest verifies that a race attributed by
  test2json to a passing test still surfaces in the output.

Also add a corpus of captured raw test binary outputs under
cmd/testwrapper/testdata/, with one subdirectory per scenario,
documenting the six representative shapes that go test -race can
emit (race in test body, race in goroutines that outlive a test,
race forced into a later test, race in TestMain post-m.Run, and a
parallel-tests split-attribution case via a "=== NAME" redirect
line). See its README.md for details.

Fixes #19603

Change-Id: Ifbfcd67fb3b1882c4907bd9cb2d68a8b5a91dd54
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>

2026-05-19 21:21:05 -07:00

A_baseline

cmd/testwrapper: surface race reports and skip retries when detected

2026-05-19 21:21:05 -07:00

B_inbody

cmd/testwrapper: surface race reports and skip retries when detected

2026-05-19 21:21:05 -07:00

C_spawnwait

cmd/testwrapper: surface race reports and skip retries when detected

2026-05-19 21:21:05 -07:00

D_delayed

cmd/testwrapper: surface race reports and skip retries when detected

2026-05-19 21:21:05 -07:00

E_testmain

cmd/testwrapper: surface race reports and skip retries when detected

2026-05-19 21:21:05 -07:00

F_parallel

cmd/testwrapper: surface race reports and skip retries when detected

2026-05-19 21:21:05 -07:00

README.md

cmd/testwrapper: surface race reports and skip retries when detected

2026-05-19 21:21:05 -07:00

README.md

Race-output test corpus

This directory is a corpus of captured Go test binary outputs that exercise the various ways the -race detector's WARNING: DATA RACE text can land relative to === RUN / --- PASS: / --- FAIL: / === NAME lines, and how cmd/internal/test2json attributes that output to tests.

Each scenario subdirectory contains:

src.go — the Go source code that was compiled and run to produce the captured output. Reproduce via go test -race -c -o /tmp/scenario.test ./<dir>/.
raw.txt (or scenario-specific name) — the raw stdout+stderr of the resulting test binary when run as ./scenario.test -test.v. This is the byte stream that go test -json feeds to go tool test2json in production.

go test -json adds two things on top of what test2json sees, which are NOT in these captures: a FAIL\t<pkg>\t<time> output line and a corresponding fail action event when the binary exits with a non-zero status (e.g. the race detector's exit code 66). Consumers that simulate the go test -json view from these files need to append those.

To turn a raw.txt into the JSON testwrapper would consume:

go tool test2json -p <pkgname> < raw.txt

Scenarios

A_baseline — no race

Two trivial passing tests, no -race involvement at all. Useful as a sanity-check for parsers; demonstrates the standard mapping from === RUN/--- PASS:/bare PASS lines to JSON events.

B_inbody — race during a test's own body

A single test spawns racing goroutines and wg.Wait()s for them. The race fires while the test is still running, so testing.checkRaces sees race.Errors() increment between the test's start and end and the test is marked --- FAIL: with a testing.go:1712: race detected during execution of test line.

This is the well-behaved case: the race output and the failure marker are both attributed to a single, clearly-failed test.

C_spawnwait — race in goroutines that outlive a test

TestSpawn spawns racing goroutines and returns immediately; TestWait blocks until they finish. Where the race report lands is timing-dependent. Two captured variants:

pass.raw.txt — both TestSpawn and TestWait end up marked --- PASS:, with the entire WARNING: DATA RACE report attributed to TestWait (the test whose === RUN line was the most recent thing emitted when TSan printed). Neither test's checkRaces saw race.Errors() increment within its window. The binary still exits non-zero, producing only a bare trailing FAIL. This is the case where testwrapper currently drops the race report on the floor.
caught.raw.txt — TestSpawn's checkRaces happened to fire after the goroutines raced, so it emits the standard testing.go:1712: + --- FAIL: TestSpawn markers. Race report is attributed to TestSpawn, and the failure is attributed to TestSpawn too. Well-behaved.

D_delayed — race delayed into the next test's body

Like C_spawnwait but the spawned goroutines block on a channel that the second test closes, then sleeps. This forces the race counter delta to land between the second test's resetRaces and checkRaces, so the framework reliably attributes a failure to the sleeping test. Compare with B_inbody: same end shape, but the failure is attributed to a test that doesn't contain the racing code.

E_testmain — race in `TestMain` after `m.Run()`

The race-producing goroutines are spawned by TestMain after m.Run() returns. By that point the framework has already printed its top-level PASS summary, so test2json's c.testName is reset to "" and the entire race report shows up at package level — all of its output events have an empty Test field. The bare PASS line earlier in the output also causes test2json to set c.result = "pass", which means go tool test2json in isolation ends up emitting {"Action":"pass","Package":"..."} for the package; only go test's own exit-code post-processing turns it into the visible package-level failure.

F_parallel — racing parallel tests

Two t.Parallel() tests both increment the same global. The captured split.raw.txt shows the report split across two tests via a === NAME TestParB redirect line emitted by the framework:

--- PASS: TestParA (0.00s)
==================
WARNING: DATA RACE
   ... full race report ...
==================
=== NAME  TestParB
    testing.go:1712: race detected during execution of test
--- FAIL: TestParB (0.00s)

test2json attributes the WARNING: DATA RACE block plus the full stack trace to TestParA (which passes), then sees the === NAME TestParB directive and switches attribution, so the trailing race detected line and --- FAIL end up under TestParB (which fails). A consumer that only flushes failed tests' logs sees the race detected line but loses the stack trace.

How these were captured

# For each scenario directory <name>/ (e.g. C_spawnwait):
go test -race -c -o /tmp/<name>.test ./testdata/<name>/
/tmp/<name>.test -test.v >./testdata/<name>/raw.txt 2>&1

For scenarios where the race detector's attribution is timing-dependent (C, F), the binary was run repeatedly and the interesting variants saved with distinct names.

The captured outputs contain absolute file paths from the machine they were recorded on (e.g. /tmp/racesurvey/...). Tests that match against these files should not depend on those paths.

README.md

Race-output test corpus

Scenarios

A_baseline — no race

B_inbody — race during a test's own body

C_spawnwait — race in goroutines that outlive a test

D_delayed — race delayed into the next test's body

E_testmain — race in TestMain after m.Run()

F_parallel — racing parallel tests

How these were captured

E_testmain — race in `TestMain` after `m.Run()`