The Map cache was only populated after a successful 200 fetch.
On 304 Not Modified and all cache-hit paths (offline, preferOffline,
exact version, publishedBy), the Map was never populated. This meant
every subsequent resolution of the same package within the same install
hit SQLite + JSON.parse again instead of the O(1) Map lookup.
The old LRU cache was populated on both 200 and 304 paths.
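The intended behavior can be sketched as a single helper that every resolution path funnels through. Names here are illustrative, not the actual pnpm internals:

```typescript
type PackageMeta = { name: string; versions: Record<string, unknown> }

const metaCache = new Map<string, PackageMeta>()

function cacheMeta (key: string, meta: PackageMeta): PackageMeta {
  metaCache.set(key, meta) // runs on 200, 304, and every cache-hit path alike
  return meta
}

function resolveMeta (
  key: string,
  loadFromDb: (key: string) => PackageMeta | undefined,
  fetchFresh: (key: string) => PackageMeta
): PackageMeta {
  const hit = metaCache.get(key)
  if (hit != null) return hit            // O(1) same-run dedup
  const fromDb = loadFromDb(key)         // SQLite + JSON.parse, at most once per run
  if (fromDb != null) return cacheMeta(key, fromDb)
  return cacheMeta(key, fetchFresh(key)) // the 200 network path
}
```

With this shape, a second resolution of the same package in one install never touches SQLite again, regardless of which path satisfied the first one.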
The findPending method linearly scanned the pendingWrites array
on every get/getHeaders call. On a cold install with 75 packages,
this was O(n²): each new package's cache check scanned all
previously queued writes. Replace it with a Map for O(1) lookups.
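A minimal sketch of the change, with illustrative names only:

```typescript
interface PendingWrite { key: string; blob: string }

class WriteQueue {
  private readonly pending = new Map<string, PendingWrite>()

  queue (write: PendingWrite): void {
    this.pending.set(write.key, write) // a later write for the same key wins
  }

  findPending (key: string): PendingWrite | undefined {
    return this.pending.get(key) // O(1); previously an O(n) array scan per lookup
  }

  drain (): PendingWrite[] {
    const writes = [...this.pending.values()]
    this.pending.clear()
    return writes
  }
}
```

Keying by cache key also coalesces repeated writes for the same package, so the flush transaction shrinks as a side effect.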
Remove the setImmediate yield before SQLite reads — it added ~225ms
of scheduling overhead on cold installs (75 packages × 3 checks × 1ms).
Revert all worker thread attempts. The message passing overhead
outweighed the benefit of async SQLite reads.
The final approach: simple synchronous SQLite reads on the main thread
with an in-memory Map cache for same-run dedup. This matches the
original file-based approach's performance characteristics.
The worker thread approach added more overhead (structured clone per
message) than it saved. Revert to main-thread SQLite but add:
1. In-memory Map cache to avoid redundant SQLite reads + JSON.parse
for packages resolved multiple times in the same install
2. setImmediate yield before each SQLite read to unblock the event loop,
allowing pending network callbacks to run between reads
Move SQLite reads/writes, JSON.parse, semver matching, and version
picking from the main thread to the existing worker pool.
The main thread is now a thin orchestrator that only does network I/O.
Resolution uses at most 2 round-trips to the worker:
1. Worker checks SQLite cache → cache hit returns immediately
2. On cache miss, main thread fetches from registry, sends raw JSON
to worker → worker parses, writes cache, picks version
This unblocks the main event loop during resolution — network I/O,
tarball downloads, and linking can proceed while the worker does
CPU-heavy parsing and semver matching.
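The two round-trips can be sketched with a discriminated union for the messages. This is an illustrative shape, not the actual wire format, and version picking is reduced to a dist-tag lookup for brevity:

```typescript
type ToWorker =
  | { type: 'check-cache'; name: string; range: string }
  | { type: 'parse-and-pick'; name: string; range: string; rawJson: string }

type FromWorker =
  | { type: 'cache-hit'; version: string }
  | { type: 'cache-miss' }
  | { type: 'picked'; version: string }

// A Map stands in for the worker-side SQLite cache in this sketch.
const fakeDb = new Map<string, string>() // name → raw metadata JSON

function handleMessage (msg: ToWorker): FromWorker {
  switch (msg.type) {
    case 'check-cache': {
      const raw = fakeDb.get(msg.name)
      if (raw == null) return { type: 'cache-miss' } // main thread must fetch
      const meta = JSON.parse(raw)
      return { type: 'cache-hit', version: meta['dist-tags'][msg.range] }
    }
    case 'parse-and-pick': {
      fakeDb.set(msg.name, msg.rawJson) // the worker writes the cache itself
      const meta = JSON.parse(msg.rawJson)
      return { type: 'picked', version: meta['dist-tags'][msg.range] }
    }
  }
}
```

The point of the shape: the raw JSON string crosses the thread boundary once, and parsing, cache writes, and picking all stay off the main thread.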
Replace the two-table design (metadata_index + metadata_blobs) with a
single metadata table storing etag, modified, cached_at, is_full flag,
and the raw JSON blob.
The separate index table added complexity without meaningful benefit —
we parse the full blob anyway to extract the resolved version's manifest
after picking. The single table keeps writes simple (1 INSERT with the
raw registry response) and reads simple (1 SELECT + JSON.parse).
The is_full flag ensures that abbreviated-only cache entries are not
served when full metadata is requested (e.g., for optional dependencies).
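A plausible sketch of the table and its read path; only the column list is taken from the description above, the names and helper are illustrative:

```typescript
const CREATE_METADATA_TABLE = `
  CREATE TABLE IF NOT EXISTS metadata (
    key       TEXT PRIMARY KEY,   -- e.g. "<registry-host>:<pkg-name>"
    etag      TEXT,
    modified  TEXT,
    cached_at INTEGER NOT NULL,   -- epoch ms for freshness checks
    is_full   INTEGER NOT NULL,   -- 1 = full metadata, 0 = abbreviated
    blob      TEXT NOT NULL       -- raw registry response, stored as-is
  )
`

interface MetadataRow {
  etag: string | null
  modified: string | null
  cached_at: number
  is_full: 0 | 1
  blob: string
}

// The whole read path: one SELECT produces a row, one JSON.parse yields
// the document, unless abbreviated data is asked to stand in for full.
function readMeta (row: MetadataRow, needFull: boolean): unknown {
  if (needFull && row.is_full === 0) return undefined
  return JSON.parse(row.blob)
}
```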
Replace metadata_manifests (per-version rows requiring JSON.stringify
per manifest) with metadata_blobs (single raw JSON blob per package).
Write path: store the raw registry response text as-is — zero
serialization on the hot path. Only the compact index fields
(dist-tags, version keys, deprecated flags) are extracted.
Read path: parse the lightweight index for version picking, then
parse the blob and extract just the resolved version's manifest.
This eliminates the cold install regression caused by hundreds of
JSON.stringify calls per install. The index table still provides
cheap header lookups for conditional requests.
Also track an is_full flag on the index to avoid serving abbreviated
metadata when full is requested (e.g., for optional dependencies).
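The write-path extraction might look roughly like this; the field names follow the npm registry document, while the helper itself is illustrative:

```typescript
interface IndexEntry {
  distTags: Record<string, string>
  versions: Record<string, { deprecated?: string }>
}

// The raw response text is stored untouched; only these compact fields
// are pulled out for the index. No JSON.stringify on the hot path.
function extractIndex (rawResponse: string): IndexEntry {
  const doc = JSON.parse(rawResponse)
  const versions: IndexEntry['versions'] = {}
  for (const [version, manifest] of Object.entries<any>(doc.versions ?? {})) {
    versions[version] = manifest.deprecated != null
      ? { deprecated: manifest.deprecated }
      : {}
  }
  return { distTags: doc['dist-tags'] ?? {}, versions }
}
```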
Add early-return guards to getHeaders, getIndex, getManifest, and
updateCachedAt when the DB has been closed. This prevents "statement
has been finalized" errors when the process exit handler closes the
DB while async operations are still in flight.
Also change the store controller's close to flush (not close) the
metadata DB, since the exit handler performs the final cleanup.
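The guard pattern, reduced to a minimal illustrative class:

```typescript
class MetadataCache {
  private closed = false
  // ... prepared statements elided in this sketch ...

  getHeaders (key: string): { etag?: string } | undefined {
    if (this.closed) return undefined // statements may already be finalized
    return this.lookup(key)
  }

  close (): void {
    this.closed = true
    // finalize statements / close the DB here
  }

  private lookup (key: string): { etag?: string } | undefined {
    return { etag: `W/"${key}"` } // stands in for the real SELECT
  }
}
```

Returning undefined rather than throwing means a late in-flight lookup degrades to a cache miss instead of crashing the exit path.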
SQLite with mmap and page cache serves the same purpose without the
complexity. The LRU cached full PackageMeta objects (all versions
parsed) which wasted memory, had stale-cache risks with lightweight
stubs, and expired after 120s anyway.
Remove PackageMetaCache interface, LRU creation, and all metaCache
threading through the resolver chain.
Split the single-blob metadata storage into two tables:
- metadata_index: dist-tags, version keys (with deprecated), time, and
cache headers — one row per package, ~10KB
- metadata_manifests: per-version manifest objects, keyed by (name,
version, type) — ~2KB each
During resolution, only the lightweight index is parsed to pick a
version. The full manifest for the resolved version is loaded separately.
For a package like typescript with 200+ versions, this avoids parsing
~400KB of unused manifest JSON.
The index is shared across abbreviated/full metadata types — only the
per-version manifests differ. This eliminates the type column from the
index and simplifies the abbreviated→full fallback to the manifest level.
encode-registry replaced : with + for filesystem compatibility. With
SQLite, colons are fine in DB keys. Use URL.host directly with a simple
Map cache for hot-path performance.
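A sketch of the memoized host lookup (illustrative helper, not the actual pnpm code):

```typescript
const hostCache = new Map<string, string>()

// URL.host keeps the port, so "localhost:4873" stays distinct from
// "localhost:4874"; the ':' is fine as part of a SQLite key.
function registryHost (registryUrl: string): string {
  let host = hostCache.get(registryUrl)
  if (host == null) {
    host = new URL(registryUrl).host
    hostCache.set(registryUrl, host)
  }
  return host
}
```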
- Set meta.modified from DB row in loadMetaFromDb so If-Modified-Since
headers are sent even when modified comes from the DB column
- Use || instead of && for header lookup so both etag and modified are
populated even when one is already available from in-memory metadata
- Check pending writes in MetadataCache.get/getHeaders instead of
flushing in loadMetaFromDb, avoiding synchronous DB writes during
resolution
- Close MetadataCache when store controller is closed to prevent
resource leaks
- Add defensive slash-index guards in cache API functions
Use queueSet() instead of synchronous set() when saving metadata after
registry fetches. Writes are batched and flushed on the next tick in a
single transaction, avoiding blocking the event loop during resolution.
This fixes the cold install regression where hundreds of synchronous
SQLite writes were serializing the resolution phase.
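The batching can be sketched like this (illustrative class; in the sketch a callback stands in for the single-transaction SQLite write):

```typescript
type Row = { key: string; blob: string }

class BatchedWriter {
  private readonly queued = new Map<string, Row>()
  private scheduled = false

  constructor (private readonly writeAll: (rows: Row[]) => void) {}

  queueSet (row: Row): void {
    this.queued.set(row.key, row) // same-key writes coalesce
    if (!this.scheduled) {
      this.scheduled = true
      setImmediate(() => { this.flush() }) // batch on the next tick
    }
  }

  flush (): void {
    this.scheduled = false
    if (this.queued.size === 0) return
    const rows = [...this.queued.values()]
    this.queued.clear()
    this.writeAll(rows) // one pass instead of N synchronous writes
  }
}
```

Exposing flush() separately also lets a shutdown hook drain the queue without waiting for the scheduled tick.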
- Remove ABBREVIATED_META_DIR, FULL_META_DIR, FULL_FILTERED_META_DIR
from @pnpm/constants — no longer used
- Update lockfileOnly test to check MetadataCache DB instead of files
- Update prune to also remove metadata.db and its WAL/SHM files
Rewrite cache list, delete, view, and list-registries commands to query
the MetadataCache SQLite DB instead of globbing JSON files on disk.
The cache.cmd handler no longer needs resolutionMode/registrySupportsTimeField
to determine which metadata subdirectory to use — all metadata types are
in the same DB.
Fixes from Gemini review:
- Include registry name in DB cache keys to avoid collisions when the
same package name exists in different registries (e.g., npm vs private)
- Make updateCachedAt implement the same abbreviated→full fallback as
get/getHeaders, so 304 responses correctly update the cached_at
timestamp even when the data was stored under a different type
- Reuse etag/modified from already-loaded metadata instead of making a
redundant getHeaders DB call
Replace the per-package JSON file cache (metadata-v1.4/, metadata-ff-v1.4/)
with a single SQLite database (metadata.db) for registry metadata caching.
Benefits:
- Cheap conditional request header lookups (etag/modified) without parsing
the full metadata JSON — enables If-None-Match/If-Modified-Since with
minimal I/O overhead
- Full metadata can serve abbreviated requests — if a package was previously
fetched as full (e.g., for trustPolicy or resolutionMode), the resolver
reuses it instead of making another registry request
- Eliminates hundreds of individual file read/write/rename operations per
install, replaced by SQLite WAL-mode transactions
- Removes the runLimited/metafileOperationLimits concurrency machinery —
SQLite handles concurrent access natively
New package: @pnpm/cache.metadata — SQLite-backed MetadataCache class
modeled after @pnpm/store.index, with getHeaders() for cheap lookups,
get() with abbreviated→full fallback, and set()/updateCachedAt().
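The abbreviated→full fallback in get() can be sketched with a Map standing in for the SQLite table (illustrative names):

```typescript
type MetaType = 'abbreviated' | 'full'

const store = new Map<string, string>() // `${type}:${key}` → raw JSON

function set (key: string, type: MetaType, rawJson: string): void {
  store.set(`${type}:${key}`, rawJson)
}

function get (key: string, type: MetaType): unknown {
  const exact = store.get(`${type}:${key}`)
  if (exact != null) return JSON.parse(exact)
  if (type === 'abbreviated') {
    // full metadata is a superset, so it can serve abbreviated requests
    const full = store.get(`full:${key}`)
    if (full != null) return JSON.parse(full)
  }
  return undefined // abbreviated can never serve a request for full
}
```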
* perf: use abbreviated metadata for minimumReleaseAge when possible
Instead of always fetching full package metadata when minimumReleaseAge
is set, fetch the smaller abbreviated document first and check the
top-level `modified` field. If the package was last modified before the
release age cutoff, all versions are mature and no per-version time
filtering is needed. Only re-fetch full metadata for the rare case of
recently-modified packages.
Also use fs.stat() to check the cache file's mtime instead of reading
and parsing the JSON to check cachedAt, avoiding unnecessary I/O.
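The maturity fast-path reduces to a small predicate; a sketch with an illustrative name:

```typescript
// If the whole package was last modified before the cutoff, every
// version is at least minimumReleaseAge old, so no per-version time
// filtering (and no full-metadata fetch) is needed.
function allVersionsMature (modified: string, minimumReleaseAgeMs: number, now = Date.now()): boolean {
  const modifiedAt = Date.parse(modified)
  if (Number.isNaN(modifiedAt)) return false // an invalid date must not bypass filtering
  return now - modifiedAt >= minimumReleaseAgeMs
}
```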
* fix: validate modified date and handle abbreviated metadata edge cases
- Validate meta.modified date to prevent invalid dates from bypassing
minimumReleaseAge filtering
- Skip full metadata refetch for packages excluded by publishedByExclude
- Allow ERR_PNPM_MISSING_TIME from cached abbreviated metadata to fall
through to the network fetch path instead of throwing
* fix: cache abbreviated metadata before re-fetching full metadata
Save the abbreviated metadata to disk before re-fetching full metadata
so subsequent runs benefit from the mtime cache fast-path.
* fix: resolve type narrowing for conditional metadata fetch result
Before fetching package metadata from the registry, stat the local cache
file and send its mtime as an If-Modified-Since header. If the registry
returns 304 Not Modified, read the local cache instead of downloading
the full response body. This saves bandwidth and latency for packages
whose metadata hasn't changed since the last fetch.
Registries that don't support If-Modified-Since simply return 200 as
before, so there is no behavior change for unsupported registries.
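Two small helpers sketch the decision points (illustrative names):

```typescript
// Build the conditional header from the cache file's mtime.
function conditionalHeaders (cacheMtime: Date | undefined): Record<string, string> {
  return cacheMtime != null
    ? { 'if-modified-since': cacheMtime.toUTCString() } // RFC 7231 HTTP-date
    : {}
}

// 304 → reuse the local cache; 200 → the metadata changed, or the
// registry ignored the header, so consume the fresh body either way.
function shouldReuseCache (status: number): boolean {
  return status === 304
}
```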
- Enable Happy Eyeballs (`autoSelectFamily`) for faster dual-stack (IPv4/IPv6) connection establishment
- Increase keep-alive timeouts (30s idle, 10min max) to reduce connection churn during install
- Set optimized global dispatcher so requests without custom options still benefit
- Pre-allocate `SharedArrayBuffer` for tarball downloads when `Content-Length` is known, avoiding intermediate chunk array and double-copy
The ENOTSUP fallback in createClonePkg() silently converted clone
failures to file copies, preventing the auto-importer from detecting
that cloning is not supported and falling through to hardlinks.
On filesystems without reflink support (e.g. ext4 on Linux CI),
this caused every file to be copied instead of hardlinked — a 2-9x
regression for install operations on large projects.
The fix uses a raw clone (without ENOTSUP fallback) for the auto-mode
probe. If the filesystem doesn't support cloning, the error propagates
and the auto-importer falls through to hardlinks. Once cloning is
confirmed to work, subsequent packages use the full clone importer
with ENOTSUP fallback for transient failures under heavy parallel I/O.
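The probe itself relies on a real Node.js API: fs.copyFile with COPYFILE_FICLONE_FORCE either clones or throws, and never silently falls back to a copy. A minimal sketch:

```typescript
import { constants, copyFileSync, mkdtempSync, writeFileSync } from 'node:fs'
import { tmpdir } from 'node:os'
import { join } from 'node:path'

function probeClone (src: string, dest: string): 'clone' | 'hardlink' {
  try {
    // FICLONE_FORCE: reflink or throw, with no copy fallback
    copyFileSync(src, dest, constants.COPYFILE_FICLONE_FORCE)
    return 'clone'
  } catch {
    return 'hardlink' // reflink unsupported on this filesystem (e.g. ext4)
  }
}

// Usage: probe once, then commit to the chosen importer for the run
const dir = mkdtempSync(join(tmpdir(), 'clone-probe-'))
writeFileSync(join(dir, 'src'), 'probe')
const importer = probeClone(join(dir, 'src'), join(dir, 'dest'))
```

The result depends on the filesystem: 'clone' on APFS/Btrfs/XFS-with-reflink, 'hardlink' elsewhere.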