Zoltan Kochan 657d322b15 perf(network): schedule tarball downloads by estimated pipeline work (#12309)
## Summary

When the download connection pool saturates, freed slots are granted by a two-class scheduling policy instead of FIFO:

- **Latency class** (packument/metadata fetches, which gate resolution progress): served FIFO.
- **Throughput class** (tarball downloads): ranked by **estimated total pipeline work** — `unpackedSize + 3000 × fileCount` — so the most expensive download+extract jobs start first (longest-processing-time-first; a large archive that starts last runs alone at single-connection throughput while every other slot idles, see [pnpm/pnpm#12230](https://github.com/pnpm/pnpm/issues/12230)). The per-file term prices the fixed CAS-write/hash overhead, so a many-small-files package ranks as the long job it actually is.
- **Neither class can starve the other**: downloads are guaranteed a reserved half of the pool (strict metadata-first was measured to serialize cold installs — no tarball got a slot until resolution drained, costing the whole resolve/fetch overlap), and metadata wins beyond that reserve (a download backlog can't stall resolution). Both directions are work-conserving.

### How the size hints travel

- Local fresh installs read `dist.unpackedSize` / `dist.fileCount` off the resolver-fetched manifest (also fixes exact decompression-buffer preallocation on the prefetch path, previously hardcoded `None`).
- The pnpr `/v1/resolve` `package` frame carries both as optional `unpackedSize` / `fileCount` fields (omitted when the registry never published them; old clients and servers interoperate unchanged).
- pnpr frozen restores: the lockfile records no sizes, but the verification fan-out fetches each entry's metadata anyway — the npm verifier records both stats into an optional `ObservedDistStats` sink as a side product of the tarball-URL binding check, and the frozen fast path announces every verified tarball as a sized `package` frame before `done` (URLs derived by the same `tarball_url_and_integrity` the client materialization uses). Verdict-cache hits fetch no metadata and keep the bare `done` frame.
- pnpr's abbreviated metadata now **preserves** `unpackedSize`/`fileCount` instead of stripping them, since pacquet reads both.
- Resolve-time tarball fetches (tarball deps' manifests come from their archives) acquire in the latency class — they gate the resolver's walk.

### Benchmark tooling

- The integrated benchmark's latency proxy gained `--registry-slow-start`: per-connection TCP slow start (RFC 6928 initial window, doubling per delivered window toward the bandwidth cap), so scheduling effects that depend on per-connection ramp-up are measurable.
- Fixed a macOS bug where the proxy's accepted sockets inherited the listener's `O_NONBLOCK` and every proxied connection died on its first read — all shaped benchmark traffic silently failed before this.

## Measurements

Fixture: ~110 direct deps / 1308 packages (~90 MB wire), `isolated-linker.fresh-install.cold-cache.cold-store`, local mirror of real npm behind the shaped proxy (30 ms RTT, 80 Mbit/s per-connection cap, TCP slow start).

**Drift-controlled interleaved comparison** (4 alternating blocks x 4 runs each; sequential multi-target sessions on this machine showed up to +75% session-order drift, so block-paired ABAB is the only design we trust):

| target | mean +/- sd (n=16) |
| --- | --- |
| baseline FIFO | 14.36 s +/- 0.54 |
| this PR | **14.06 s +/- 0.70** |

The PR wins **all 4 paired blocks** (-0.18 s to -0.50 s, mean -0.30 s, ~2%). A scheduler ablation (reserve+FIFO, smallest-first, unpackedSize-only, work with K=3k and K=10k per file) ordered as the pipeline model predicts, but the per-variant deltas sit inside the session-drift noise, so only the FIFO-vs-full-design pairing is claimed. K in [3000, 10000] is indistinguishable.

**The starvation fix is the load-bearing piece, established mechanistically rather than by wall clock:** with strict metadata-first priority (an intermediate design), cold-install event timelines showed 4-7 s windows at install start with zero tarball activity - downloads never won a slot during the resolution burst, serializing the resolve and fetch phases. The reserved share removes those gaps entirely and the worst observed cold-install runs with it are within ~1 s of the median, where unreserved variants showed multi-second stragglers.

Real-registry A/B (15 randomized cold-install pairs against npmjs) is noise-bound on a saturated ~100 Mbit link (+/-3 s registry variance), median -0.17 s in this PR's favor - consistent with "never slower."
2026-06-11 02:58:36 +02:00
2026-04-10 18:30:33 +02:00
2026-06-10 12:40:29 +02:00
2026-06-10 12:40:29 +02:00
2026-06-10 12:40:29 +02:00
2026-06-10 12:40:29 +02:00
2026-06-10 12:40:29 +02:00
2026-06-10 12:40:29 +02:00
2026-06-10 12:40:29 +02:00
2026-06-10 12:40:29 +02:00
2026-06-10 12:40:29 +02:00
2026-06-10 12:40:29 +02:00
2026-06-10 12:40:29 +02:00
2026-06-10 12:40:29 +02:00
2026-06-10 12:40:29 +02:00
2026-06-10 12:40:29 +02:00
2026-06-10 12:40:29 +02:00
2026-05-24 02:23:07 +02:00
2026-06-10 12:40:29 +02:00
2026-06-10 12:40:29 +02:00
2026-06-10 12:40:29 +02:00
2026-06-10 12:40:29 +02:00
2026-04-30 23:19:31 +02:00
2026-06-10 12:40:29 +02:00
2026-06-10 12:40:29 +02:00
2026-04-30 23:03:46 +02:00
2026-06-10 12:40:29 +02:00
2026-06-10 12:40:29 +02:00
2026-01-16 16:31:31 +01:00
2024-03-21 01:09:22 +01:00

简体中文 | 日本語 | 한국어 | Italiano | Português Brasileiro

pnpm

Fast, disk space efficient package manager:

  • Fast. Up to 2x faster than the alternatives (see benchmark).
  • Efficient. Files inside node_modules are linked from a single content-addressable storage.
  • Great for monorepos.
  • Strict. A package can access only dependencies that are specified in its package.json.
  • Deterministic. Has a lockfile called pnpm-lock.yaml.
  • Works as a Node.js version manager. See pnpm runtime.
  • Works everywhere. Supports Windows, Linux, and macOS.
  • Battle-tested. Used in production by teams of all sizes since 2016.
  • See the full feature comparison with npm and Yarn.

To quote the Rush team:

Microsoft uses pnpm in Rush repos with hundreds of projects and hundreds of PRs per day, and weve found it to be very fast and reliable.

npm version OpenCollective OpenCollective X Follow Stand With Ukraine

Platinum Sponsors

Bit OpenAI

Gold Sponsors

Sanity Discord Vite
SerpApi CodeRabbit Stackblitz
Workleap Nx

Silver Sponsors

Replit Cybozu BairesDev
devowl.io u|screen Leniolabs_
Depot Cerbos ⏱️ Time.now

Support this project by becoming a sponsor.

Background

pnpm uses a content-addressable filesystem to store all files from all module directories on a disk. When using npm, if you have 100 projects using lodash, you will have 100 copies of lodash on disk. With pnpm, lodash will be stored in a content-addressable storage, so:

  1. If you depend on different versions of lodash, only the files that differ are added to the store. If lodash has 100 files, and a new version has a change only in one of those files, pnpm update will only add 1 new file to the storage.
  2. All the files are saved in a single place on the disk. When packages are installed, their files are linked from that single place consuming no additional disk space. Linking is performed using either hard-links or reflinks (copy-on-write).

As a result, you save gigabytes of space on your disk and you have a lot faster installations! If you'd like more details about the unique node_modules structure that pnpm creates and why it works fine with the Node.js ecosystem, read this small article: Flat node_modules is not the only way.

💖 Like this project? Let people know with a tweet

Getting Started

Benchmark

pnpm is up to 2x faster than npm and Yarn classic. See all benchmarks here.

Benchmarks on an app with lots of dependencies:

License

MIT, except the pnpr/ directory, which is source-available under the PolyForm Shield License 1.0.0.

Description
No description provided
Readme MIT 345 MiB
Languages
Rust 55.9%
TypeScript 43.5%
JavaScript 0.5%