Files
pnpm/pacquet/tasks/micro-benchmark
Zoltan Kochan 657d322b15 perf(network): schedule tarball downloads by estimated pipeline work (#12309)
## Summary

When the download connection pool saturates, freed slots are granted by a two-class scheduling policy instead of FIFO:

- **Latency class** (packument/metadata fetches, which gate resolution progress): served FIFO.
- **Throughput class** (tarball downloads): ranked by **estimated total pipeline work** — `unpackedSize + 3000 × fileCount` — so the most expensive download+extract jobs start first (longest-processing-time-first; a large archive that starts last runs alone at single-connection throughput while every other slot idles, see [pnpm/pnpm#12230](https://github.com/pnpm/pnpm/issues/12230)). The per-file term prices the fixed CAS-write/hash overhead, so a many-small-files package ranks as the long job it actually is.
- **Neither class can starve the other**: downloads are guaranteed a reserved half of the pool (strict metadata-first was measured to serialize cold installs — no tarball got a slot until resolution drained, costing the whole resolve/fetch overlap), and metadata wins beyond that reserve (a download backlog can't stall resolution). Both directions are work-conserving.

### How the size hints travel

- Local fresh installs read `dist.unpackedSize` / `dist.fileCount` off the resolver-fetched manifest (also fixes exact decompression-buffer preallocation on the prefetch path, previously hardcoded `None`).
- The pnpr `/v1/resolve` `package` frame carries both as optional `unpackedSize` / `fileCount` fields (omitted when the registry never published them; old clients and servers interoperate unchanged).
- pnpr frozen restores: the lockfile records no sizes, but the verification fan-out fetches each entry's metadata anyway — the npm verifier records both stats into an optional `ObservedDistStats` sink as a side product of the tarball-URL binding check, and the frozen fast path announces every verified tarball as a sized `package` frame before `done` (URLs derived by the same `tarball_url_and_integrity` the client materialization uses). Verdict-cache hits fetch no metadata and keep the bare `done` frame.
- pnpr's abbreviated metadata now **preserves** `unpackedSize`/`fileCount` instead of stripping them, since pacquet reads both.
- Resolve-time tarball fetches (tarball deps' manifests come from their archives) acquire in the latency class — they gate the resolver's walk.

### Benchmark tooling

- The integrated benchmark's latency proxy gained `--registry-slow-start`: per-connection TCP slow start (RFC 6928 initial window, doubling per delivered window toward the bandwidth cap), so scheduling effects that depend on per-connection ramp-up are measurable.
- Fixed a macOS bug where the proxy's accepted sockets inherited the listener's `O_NONBLOCK` and every proxied connection died on its first read — all shaped benchmark traffic silently failed before this.

## Measurements

Fixture: ~110 direct deps / 1308 packages (~90 MB wire), `isolated-linker.fresh-install.cold-cache.cold-store`, local mirror of real npm behind the shaped proxy (30 ms RTT, 80 Mbit/s per-connection cap, TCP slow start).

**Drift-controlled interleaved comparison** (4 alternating blocks x 4 runs each; sequential multi-target sessions on this machine showed up to +75% session-order drift, so block-paired ABAB is the only design we trust):

| target | mean +/- sd (n=16) |
| --- | --- |
| baseline FIFO | 14.36 s +/- 0.54 |
| this PR | **14.06 s +/- 0.70** |

The PR wins **all 4 paired blocks** (-0.18 s to -0.50 s, mean -0.30 s, ~2%). A scheduler ablation (reserve+FIFO, smallest-first, unpackedSize-only, work with K=3k and K=10k per file) ordered as the pipeline model predicts, but the per-variant deltas sit inside the session-drift noise, so only the FIFO-vs-full-design pairing is claimed. K in [3000, 10000] is indistinguishable.

**The starvation fix is the load-bearing piece, established mechanistically rather than by wall clock:** with strict metadata-first priority (an intermediate design), cold-install event timelines showed 4-7 s windows at install start with zero tarball activity - downloads never won a slot during the resolution burst, serializing the resolve and fetch phases. The reserved share removes those gaps entirely and the worst observed cold-install runs with it are within ~1 s of the median, where unreserved variants showed multi-second stragglers.

Real-registry A/B (15 randomized cold-install pairs against npmjs) is noise-bound on a saturated ~100 Mbit link (+/-3 s registry variance), median -0.17 s in this PR's favor - consistent with "never slower."
2026-06-11 02:58:36 +02:00
..