The findPending method linearly scanned the pendingWrites array
on every get/getHeaders call. On a cold install with 75 packages,
this was O(n²) overall: each new package's cache check scanned all
previously queued writes. Replace the array with a Map for O(1) lookups.
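A minimal sketch of the Map-based lookup, not the actual implementation. The PendingWrite shape and the PendingWrites class are hypothetical names, and letting a later write replace an earlier one for the same key is an assumption.

```typescript
// Hypothetical shape of a queued write; the real fields may differ.
interface PendingWrite {
  key: string
  etag?: string
  modified?: string
  data: string
}

class PendingWrites {
  // Keyed by cache key, so a lookup no longer scans every queued write.
  private readonly byKey = new Map<string, PendingWrite>()

  queue (write: PendingWrite): void {
    // Assumption: a later write for the same key replaces the earlier one.
    this.byKey.set(write.key, write)
  }

  find (key: string): PendingWrite | undefined {
    return this.byKey.get(key)
  }

  // Hands the queued writes to the caller and empties the queue.
  drain (): PendingWrite[] {
    const writes = Array.from(this.byKey.values())
    this.byKey.clear()
    return writes
  }
}
```

With an array, n queued packages cost up to n comparisons per lookup; with the Map each lookup is a single hash access regardless of queue length.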

Replace the two-table design (metadata_index + metadata_blobs) with a
single metadata table storing etag, modified, cached_at, is_full flag,
and the raw JSON blob.
The separate index table added complexity without meaningful benefit —
we parse the full blob anyway to extract the resolved version's manifest
after picking. The single table keeps writes simple (1 INSERT with the
raw registry response) and reads simple (1 SELECT + JSON.parse).
The is_full flag ensures that abbreviated-only cache entries are not
served when full metadata is requested (e.g., for optional dependencies).
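A rough sketch of how the is_full check might gate a read. The MetadataRow shape mirrors the columns named above, but the field types and the readMetadata helper are assumptions for illustration.

```typescript
// Hypothetical row shape mirroring the columns listed in this commit.
interface MetadataRow {
  etag: string | null
  modified: string | null
  cached_at: number
  is_full: number // assumed to be stored as 0 or 1
  data: string // raw registry response text
}

function readMetadata (row: MetadataRow | undefined, wantFull: boolean): unknown {
  if (row == null) return undefined
  // An abbreviated-only entry must not satisfy a request for full metadata.
  if (wantFull && row.is_full === 0) return undefined
  return JSON.parse(row.data)
}
```

Treating the mismatch as a cache miss lets the caller fall through to a registry fetch for the full document.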

Replace metadata_manifests (per-version rows requiring JSON.stringify
per manifest) with metadata_blobs (single raw JSON blob per package).
Write path: store the raw registry response text as-is — zero
serialization on the hot path. Only the compact index fields
(dist-tags, version keys, deprecated flags) are extracted.
Read path: parse the lightweight index for version picking, then
parse the blob and extract just the resolved version's manifest.
This eliminates the cold install regression caused by hundreds of
JSON.stringify calls per install. The index table still provides
cheap header lookups for conditional requests.
Also track an is_full flag on the index to avoid serving abbreviated
metadata when full is requested (e.g., for optional dependencies).

Add early-return guards to getHeaders, getIndex, getManifest, and
updateCachedAt when the DB has been closed. This prevents "statement
has been finalized" errors when the process exit handler closes the
DB while async operations are still in flight.
Also change the store controller's close to flush the metadata DB
instead of closing it, since the exit handler handles cleanup.
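The guard pattern can be sketched as below. ClosableCache is a hypothetical stand-in: an in-memory Map plays the role of the SQLite statements, and returning undefined (a cache miss) after close is an assumption about the chosen behavior.

```typescript
interface CacheHeaders {
  etag?: string
  modified?: string
}

class ClosableCache {
  private closed = false
  private readonly rows = new Map<string, CacheHeaders>()

  set (key: string, headers: CacheHeaders): void {
    if (this.closed) return
    this.rows.set(key, headers)
  }

  getHeaders (key: string): CacheHeaders | undefined {
    // Once closed, report a cache miss instead of touching a finalized statement.
    if (this.closed) return undefined
    return this.rows.get(key)
  }

  close (): void {
    this.closed = true
    this.rows.clear() // stands in for finalizing statements and closing the DB
  }
}
```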

Split the single-blob metadata storage into two tables:
- metadata_index: dist-tags, version keys (with deprecated), time, and
cache headers — one row per package, ~10KB
- metadata_manifests: per-version manifest objects, keyed by (name,
version, type) — ~2KB each
During resolution, only the lightweight index is parsed to pick a
version. The full manifest for the resolved version is loaded separately.
For a package like typescript with 200+ versions, this avoids parsing
~400KB of unused manifest JSON.
The index is shared across abbreviated/full metadata types — only the
per-version manifests differ. This eliminates the type column from the
index and simplifies the abbreviated→full fallback to the manifest level.
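The index/manifest split can be illustrated roughly as follows. splitMetadata and loadResolved are hypothetical helpers, the index here omits the time and cache-header fields, and a Map stands in for the metadata_manifests table.

```typescript
interface Manifest {
  deprecated?: string
  [key: string]: unknown
}

interface RegistryDoc {
  'dist-tags': Record<string, string>
  versions: Record<string, Manifest>
}

interface PackageIndex {
  distTags: Record<string, string>
  versions: Record<string, { deprecated?: string }>
}

// Splits one registry document into a small index and per-version manifest blobs.
function splitMetadata (doc: RegistryDoc): { index: PackageIndex, manifests: Map<string, string> } {
  const index: PackageIndex = { distTags: doc['dist-tags'], versions: {} }
  const manifests = new Map<string, string>()
  for (const [version, manifest] of Object.entries(doc.versions)) {
    index.versions[version] = manifest.deprecated != null ? { deprecated: manifest.deprecated } : {}
    manifests.set(version, JSON.stringify(manifest))
  }
  return { index, manifests }
}

// Picks a version from the index, then parses only that version's manifest.
function loadResolved (index: PackageIndex, manifests: Map<string, string>, tag: string): unknown {
  const version = index.distTags[tag]
  if (version == null) return undefined
  const raw = manifests.get(version)
  return raw == null ? undefined : JSON.parse(raw)
}
```

Only one manifest string is ever passed to JSON.parse during resolution, which is where the saving for packages with many versions comes from.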

- Set meta.modified from DB row in loadMetaFromDb so If-Modified-Since
headers are sent even when modified comes from the DB column
- Use || instead of && for header lookup so both etag and modified are
populated even when one is already available from in-memory metadata
- Check pending writes in MetadataCache.get/getHeaders instead of
flushing in loadMetaFromDb, avoiding synchronous DB writes during
resolution
- Close MetadataCache when store controller is closed to prevent
resource leaks
- Add defensive slash-index guards in cache API functions
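The || versus && change in the header lookup can be sketched like this. resolveHeaders and its lookup callback are hypothetical; the callback stands in for the DB header lookup.

```typescript
interface CacheHeaders {
  etag?: string
  modified?: string
}

function resolveHeaders (inMemory: CacheHeaders, lookup: () => CacheHeaders | undefined): CacheHeaders {
  let { etag, modified } = inMemory
  // With `&&` the lookup ran only when both values were missing, so a
  // half-populated in-memory entry never got its other header filled in.
  if (etag == null || modified == null) {
    const stored = lookup()
    etag = etag ?? stored?.etag
    modified = modified ?? stored?.modified
  }
  return { etag, modified }
}
```

Values already held in memory take precedence, and the lookup is skipped entirely when both are present.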

Use queueSet() instead of synchronous set() when saving metadata after
registry fetches. Writes are batched and flushed on the next tick in a
single transaction, avoiding blocking the event loop during resolution.
This fixes the cold install regression where hundreds of synchronous
SQLite writes were serializing the resolution phase.
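A minimal sketch of the batching idea, under stated assumptions: BatchedWriter is a hypothetical class, a promise microtask stands in for whatever next-tick scheduling the real code uses, and an array of batches stands in for committed SQLite transactions.

```typescript
interface QueuedWrite {
  key: string
  data: string
}

class BatchedWriter {
  private pending: QueuedWrite[] = []
  private scheduled = false
  // Each inner array stands in for one committed transaction.
  readonly transactions: QueuedWrite[][] = []

  queueSet (key: string, data: string): void {
    this.pending.push({ key, data })
    if (this.scheduled) return
    this.scheduled = true
    // Defer the flush so writes queued in the same tick share one transaction.
    void Promise.resolve().then(() => { this.flush() })
  }

  flush (): void {
    this.scheduled = false
    if (this.pending.length === 0) return
    const batch = this.pending
    this.pending = []
    // Real code would run BEGIN, one INSERT per write, then COMMIT here.
    this.transactions.push(batch)
  }
}
```

Callers return immediately from queueSet, and the cost of the transaction is paid once per batch rather than once per package.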

Fixes from Gemini review:
- Include registry name in DB cache keys to avoid collisions when the
same package name exists in different registries (e.g., npm vs private)
- Make updateCachedAt implement the same abbreviated→full fallback as
get/getHeaders, so 304 responses correctly update the cached_at
timestamp even when the data was stored under a different type
- Reuse etag/modified from already-loaded metadata instead of making a
redundant getHeaders DB call
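The registry-qualified key can be sketched as follows. The key layout (registry name, a slash, then the package name) and the makeKey/parseKey helpers are assumptions for illustration; the actual key format is not shown in this commit.

```typescript
// Assumption: registryName itself contains no slash (for example a host name).
function makeKey (registryName: string, pkgName: string): string {
  return `${registryName}/${pkgName}`
}

function parseKey (key: string): { registryName: string, pkgName: string } | undefined {
  // Split at the first slash only, because scoped names such as @scope/name contain one.
  const slash = key.indexOf('/')
  if (slash === -1) return undefined
  return { registryName: key.slice(0, slash), pkgName: key.slice(slash + 1) }
}
```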

Replace the per-package JSON file cache (metadata-v1.4/, metadata-ff-v1.4/)
with a single SQLite database (metadata.db) for registry metadata caching.
Benefits:
- Cheap conditional request header lookups (etag/modified) without parsing
the full metadata JSON — enables If-None-Match/If-Modified-Since with
minimal I/O overhead
- Full metadata can serve abbreviated requests — if a package was previously
fetched as full (e.g., for trustPolicy or resolutionMode), the resolver
reuses it instead of making another registry request
- Eliminates hundreds of individual file read/write/rename operations per
install, replaced by SQLite WAL-mode transactions
- Removes the runLimited/metafileOperationLimits concurrency machinery —
SQLite handles concurrent access natively
New package: @pnpm/cache.metadata — SQLite-backed MetadataCache class
modeled after @pnpm/store.index, with getHeaders() for cheap lookups,
get() with abbreviated→full fallback, and set()/updateCachedAt().
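The abbreviated→full fallback in get() can be sketched as below. FallbackStore is a hypothetical in-memory stand-in for MetadataCache, not its real API; only the direction of the fallback is taken from the description above.

```typescript
type MetaType = 'abbreviated' | 'full'

class FallbackStore {
  private readonly rows = new Map<string, string>()

  private static rowKey (name: string, type: MetaType): string {
    return `${name}\0${type}`
  }

  set (name: string, type: MetaType, data: string): void {
    this.rows.set(FallbackStore.rowKey(name, type), data)
  }

  get (name: string, type: MetaType): string | undefined {
    const exact = this.rows.get(FallbackStore.rowKey(name, type))
    if (exact != null) return exact
    // Full metadata can serve an abbreviated request, never the reverse.
    if (type === 'abbreviated') return this.rows.get(FallbackStore.rowKey(name, 'full'))
    return undefined
  }
}
```

A package fetched once as full therefore needs no second registry request when a later resolution asks only for abbreviated metadata.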