21 Commits

Author SHA1 Message Date
Zoltan Kochan
99a70f9798 perf: fragmented flushing and optimized SQLite schema 2026-04-03 01:24:01 +02:00
Zoltan Kochan
8ce3745097 perf: lazy header cache and exclusive SQLite locking 2026-04-03 01:01:10 +02:00
Zoltan Kochan
b94d828ef2 perf: synchronous header cache and PRAGMA optimizations 2026-04-03 00:55:52 +02:00
Zoltan Kochan
9334fd7117 perf: in-memory header cache and optimized sync calls 2026-04-03 00:50:23 +02:00
Zoltan Kochan
3eb53e34c2 perf: lazy load metadata with covering index and fix dbName collisions 2026-04-02 22:51:08 +02:00
Zoltan Kochan
d2d471d331 perf: lazy load metadata and use covering index for headers 2026-04-02 22:42:57 +02:00
Zoltan Kochan
05f7d33dba perf: simplify metadata cache and remove extra 304 writes 2026-04-02 20:57:50 +02:00
Zoltan Kochan
980362c186 perf: batch metadata lookups to reduce synchronous overhead 2026-04-02 11:12:21 +02:00
Zoltan Kochan
1a9d0c75cf perf: global write batching and in-memory caching for metadata 2026-04-02 10:35:10 +02:00
Zoltan Kochan
155b9106ab perf: further optimize metadata cache and restore table schema 2026-04-02 02:06:00 +02:00
Zoltan Kochan
637cbb1e84 perf: optimize metadata cache sync transactions and DB lookups 2026-04-02 01:37:22 +02:00
Zoltan Kochan
6bb3f3d947 perf: use Map for pending write lookups instead of linear scan
The findPending method linearly scanned the pendingWrites array
on every get/getHeaders call. On cold install with 75 packages,
this was O(n²): each new package's cache check scanned all
previously queued writes. Replace with a Map for O(1) lookups.
2026-04-02 00:07:08 +02:00
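
The lookup change described in 6bb3f3d947 can be sketched roughly as follows. Only `findPending` and `pendingWrites` are named in the commit message; the `PendingWrite` shape, the `PendingWrites` class, and the replace-on-duplicate behavior are assumptions made for illustration, not the actual pnpm code.

```typescript
// Hypothetical shape of a queued write; the real fields may differ.
interface PendingWrite {
  key: string
  data: string
}

class PendingWrites {
  // Keyed by cache key, so a lookup no longer scans every queued write.
  private readonly byKey = new Map<string, PendingWrite>()

  queue (write: PendingWrite): void {
    // Assumption: a later write for the same key replaces the earlier one.
    this.byKey.set(write.key, write)
  }

  // O(1) on average, instead of a linear scan over an array.
  findPending (key: string): PendingWrite | undefined {
    return this.byKey.get(key)
  }

  // Hands all queued writes to the caller and empties the queue.
  drain (): PendingWrite[] {
    const writes = Array.from(this.byKey.values())
    this.byKey.clear()
    return writes
  }
}
```

With an array, n queued packages cost up to n comparisons per lookup, hence roughly n² comparisons over a cold install; with the Map each lookup is a single hash probe.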
Zoltan Kochan
a9fc72c3ed refactor: simplify to single-table schema
Replace the two-table design (metadata_index + metadata_blobs) with a
single metadata table storing etag, modified, cached_at, is_full flag,
and the raw JSON blob.

The separate index table added complexity without meaningful benefit —
we parse the full blob anyway to extract the resolved version's manifest
after picking. The single table keeps writes simple (1 INSERT with the
raw registry response) and reads simple (1 SELECT + JSON.parse).

The is_full flag ensures that abbreviated-only cache entries are not
served when full metadata is requested (e.g., for optional dependencies).
2026-04-01 17:31:57 +02:00
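
The role of the is_full flag in a9fc72c3ed might look like the sketch below. The `MetadataRow` shape follows the columns listed in the commit message, but the property names, `canServe`, and `readMetadata` are hypothetical, and the database access itself is left out.

```typescript
// Hypothetical row shape mirroring the columns named in the commit message.
interface MetadataRow {
  etag: string | null
  modified: string | null
  cachedAt: number
  isFull: boolean
  blob: string // raw JSON text as received from the registry
}

// A full entry can serve both kinds of request; an abbreviated entry can
// only serve abbreviated requests.
function canServe (row: MetadataRow, wantFull: boolean): boolean {
  return row.isFull || !wantFull
}

// One row lookup (not shown) plus one JSON.parse on the read path.
function readMetadata (
  row: MetadataRow | undefined,
  wantFull: boolean
): Record<string, unknown> | undefined {
  if (row == null || !canServe(row, wantFull)) return undefined
  return JSON.parse(row.blob) as Record<string, unknown>
}
```

A miss on an abbreviated-only row when full metadata is requested would then fall through to a registry fetch, as it would for any other cache miss.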
Zoltan Kochan
d09d9d8efb perf: store raw JSON blob instead of per-version manifests
Replace metadata_manifests (per-version rows requiring JSON.stringify
per manifest) with metadata_blobs (single raw JSON blob per package).

Write path: store the raw registry response text as-is — zero
serialization on the hot path. Only the compact index fields
(dist-tags, version keys, deprecated flags) are extracted.

Read path: parse the lightweight index for version picking, then
parse the blob and extract just the resolved version's manifest.

This eliminates the cold install regression caused by hundreds of
JSON.stringify calls per install. The index table still provides
cheap header lookups for conditional requests.

Also tracks is_full flag on the index to avoid serving abbreviated
metadata when full is requested (e.g., for optional dependencies).
2026-04-01 17:19:04 +02:00
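
A minimal sketch of the write and read paths described in d09d9d8efb, assuming the registry response text is already at hand. `StoredBlob`, `toStoredBlob`, and `readManifest` are hypothetical names, and the index table is omitted.

```typescript
// Hypothetical stored form: the registry response text, untouched.
interface StoredBlob {
  blob: string
}

// Write path: the text goes in as-is, so no JSON.stringify runs here.
function toStoredBlob (responseText: string): StoredBlob {
  return { blob: responseText }
}

// Read path: parse the blob once and return only the manifest of the
// version that resolution already picked.
function readManifest (
  entry: StoredBlob,
  version: string
): Record<string, unknown> | undefined {
  const doc = JSON.parse(entry.blob) as {
    versions?: Record<string, Record<string, unknown>>
  }
  return doc.versions?.[version]
}
```

The trade-off in this sketch is one larger parse per read in exchange for zero serialization per write.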
Zoltan Kochan
cbf7dfcf65 fix: guard MetadataCache methods against use after close
Add early-return guards to getHeaders, getIndex, getManifest, and
updateCachedAt when the DB has been closed. This prevents "statement
has been finalized" errors when the process exit handler closes the
DB while async operations are still in flight.

Also change store controller close to flush (not close) the metadata
DB, since the exit handler handles cleanup.
2026-04-01 16:55:07 +02:00
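
The guard pattern from cbf7dfcf65 can be illustrated with a small in-memory stand-in. `GuardedCache` is a hypothetical class; a Map takes the place of the prepared SQLite statements, which would throw if used after being finalized.

```typescript
interface CacheHeaders {
  etag?: string
  modified?: string
}

class GuardedCache {
  private closed = false
  private readonly headers = new Map<string, CacheHeaders>()

  setHeaders (key: string, value: CacheHeaders): void {
    if (this.closed) return // writes after close are silently dropped
    this.headers.set(key, value)
  }

  getHeaders (key: string): CacheHeaders | undefined {
    // Early return: report a miss instead of touching a closed resource.
    if (this.closed) return undefined
    return this.headers.get(key)
  }

  close (): void {
    this.closed = true
    this.headers.clear()
  }
}
```

An in-flight async operation that calls `getHeaders` after the exit handler has run then sees an ordinary cache miss rather than an exception.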
Zoltan Kochan
e046348a27 fix: allow arbitrary fields in queueWrite versions record 2026-04-01 15:39:12 +02:00
Zoltan Kochan
d36a005dc5 perf: decompose metadata cache into index + per-version manifests
Split the single-blob metadata storage into two tables:
- metadata_index: dist-tags, version keys (with deprecated), time, and
  cache headers — one row per package, ~10KB
- metadata_manifests: per-version manifest objects, keyed by (name,
  version, type) — ~2KB each

During resolution, only the lightweight index is parsed to pick a
version. The full manifest for the resolved version is loaded separately.
For a package like typescript with 200+ versions, this avoids parsing
~400KB of unused manifest JSON.

The index is shared across abbreviated/full metadata types — only the
per-version manifests differ. This eliminates the type column from the
index and simplifies the abbreviated→full fallback to the manifest level.
2026-04-01 15:32:15 +02:00
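
The split described in d36a005dc5 might be sketched as below. The table names come from the commit message; the `PackageDocument`, `IndexRow`, and `ManifestRow` shapes and the `decompose` function are assumptions for illustration.

```typescript
type MetadataType = 'abbreviated' | 'full'

// Hypothetical subset of a registry package document.
interface PackageDocument {
  name: string
  'dist-tags': Record<string, string>
  versions: Record<string, { version: string, deprecated?: string, [field: string]: unknown }>
  time?: Record<string, string>
}

// One row per package: enough to pick a version, nothing more.
interface IndexRow {
  name: string
  distTags: Record<string, string>
  versions: Record<string, { deprecated?: string }>
  time?: Record<string, string>
}

// One row per (name, version, type).
interface ManifestRow {
  name: string
  version: string
  type: MetadataType
  manifest: string
}

function decompose (doc: PackageDocument, type: MetadataType): { index: IndexRow, manifests: ManifestRow[] } {
  const versions: Record<string, { deprecated?: string }> = {}
  const manifests: ManifestRow[] = []
  for (const [version, manifest] of Object.entries(doc.versions)) {
    versions[version] = manifest.deprecated != null ? { deprecated: manifest.deprecated } : {}
    manifests.push({ name: doc.name, version, type, manifest: JSON.stringify(manifest) })
  }
  return {
    index: { name: doc.name, distTags: doc['dist-tags'], versions, time: doc.time },
    manifests,
  }
}
```

Note that this sketch serializes each manifest separately on write, which is the per-version cost that the later commit d09d9d8efb removes.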
Zoltan Kochan
3ea70465a1 fix: address Copilot review feedback
- Set meta.modified from DB row in loadMetaFromDb so If-Modified-Since
  headers are sent even when modified comes from the DB column
- Use || instead of && for header lookup so both etag and modified are
  populated even when one is already available from in-memory metadata
- Check pending writes in MetadataCache.get/getHeaders instead of
  flushing in loadMetaFromDb, avoiding synchronous DB writes during
  resolution
- Close MetadataCache when store controller is closed to prevent
  resource leaks
- Add defensive slash-index guards in cache API functions
2026-04-01 14:55:49 +02:00
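
One possible reading of the `||` versus `&&` item in 3ea70465a1 is sketched here. `resolveHeaders` and its `lookup` callback are hypothetical; the actual condition in the pnpm source may be written differently.

```typescript
interface CacheHeaders {
  etag?: string
  modified?: string
}

function resolveHeaders (
  inMemory: CacheHeaders,
  lookup: () => CacheHeaders | undefined
): CacheHeaders {
  // `||`: consult the stored headers when either value is missing.
  // With `&&` the lookup would run only when both were missing, leaving
  // one header unset whenever the other was already known.
  if (inMemory.etag == null || inMemory.modified == null) {
    const stored = lookup()
    return {
      etag: inMemory.etag ?? stored?.etag,
      modified: inMemory.modified ?? stored?.modified,
    }
  }
  return inMemory
}
```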
Zoltan Kochan
83439ab4a2 perf: batch metadata DB writes via queueSet/flush
Use queueSet() instead of synchronous set() when saving metadata after
registry fetches. Writes are batched and flushed on the next tick in a
single transaction, avoiding blocking the event loop during resolution.

This fixes the cold install regression where hundreds of synchronous
SQLite writes were serializing the resolution phase.
2026-04-01 14:55:49 +02:00
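
The batching described in 83439ab4a2 could look roughly like this. `queueSet` and `flush` are named in the commit message; the `WriteBatcher` class, the `Row` shape, and the `writeBatch` callback (standing in for a single SQLite transaction) are assumptions, and a resolved promise stands in for whatever next-tick scheduling the real code uses.

```typescript
interface Row {
  key: string
  value: string
}

class WriteBatcher {
  private queue: Row[] = []
  private scheduled = false
  private readonly writeBatch: (rows: Row[]) => void

  // writeBatch is assumed to wrap all rows in one transaction.
  constructor (writeBatch: (rows: Row[]) => void) {
    this.writeBatch = writeBatch
  }

  queueSet (key: string, value: string): void {
    this.queue.push({ key, value })
    if (!this.scheduled) {
      this.scheduled = true
      // Defer the write so the caller can continue resolving.
      void Promise.resolve().then(() => { this.flush() })
    }
  }

  // Safe to call at any time; an empty queue is a no-op.
  flush (): void {
    this.scheduled = false
    if (this.queue.length === 0) return
    const rows = this.queue
    this.queue = []
    this.writeBatch(rows)
  }
}
```

Calling `flush()` explicitly, for example before shutdown, drains whatever is queued; the deferred flush that follows finds nothing left to write.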
Zoltan Kochan
9038774fea fix: registry-scoped cache keys, updateCachedAt fallback, avoid redundant DB lookups
Fixes from Gemini review:

- Include registry name in DB cache keys to avoid collisions when the
  same package name exists in different registries (e.g., npm vs private)
- Make updateCachedAt implement the same abbreviated→full fallback as
  get/getHeaders, so 304 responses correctly update the cached_at
  timestamp even when the data was stored under a different type
- Reuse etag/modified from already-loaded metadata instead of making a
  redundant getHeaders DB call
2026-04-01 14:55:18 +02:00
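
Registry-scoped keys, as described in 9038774fea, might be built and parsed like this. The `/` separator, `toCacheKey`, and `parseCacheKey` are assumptions for illustration; the sketch also assumes the registry name itself contains no slash.

```typescript
// Prefix the package name with the registry name so that the same
// package name in two registries maps to two distinct rows.
function toCacheKey (registryName: string, pkgName: string): string {
  return `${registryName}/${pkgName}`
}

// Split on the first slash only, since scoped names such as
// @scope/name contain a slash of their own.
function parseCacheKey (key: string): { registryName: string, pkgName: string } | undefined {
  const slash = key.indexOf('/')
  // Guard against a missing, leading, or trailing separator.
  if (slash <= 0 || slash === key.length - 1) return undefined
  return {
    registryName: key.slice(0, slash),
    pkgName: key.slice(slash + 1),
  }
}
```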
Zoltan Kochan
89ea578e68 perf: replace file-based metadata cache with SQLite
Replace the per-package JSON file cache (metadata-v1.4/, metadata-ff-v1.4/)
with a single SQLite database (metadata.db) for registry metadata caching.

Benefits:
- Cheap conditional request header lookups (etag/modified) without parsing
  the full metadata JSON — enables If-None-Match/If-Modified-Since with
  minimal I/O overhead
- Full metadata can serve abbreviated requests — if a package was previously
  fetched as full (e.g., for trustPolicy or resolutionMode), the resolver
  reuses it instead of making another registry request
- Eliminates hundreds of individual file read/write/rename operations per
  install, replaced by SQLite WAL-mode transactions
- Removes the runLimited/metafileOperationLimits concurrency machinery —
  SQLite handles concurrent access natively

New package: @pnpm/cache.metadata — SQLite-backed MetadataCache class
modeled after @pnpm/store.index, with getHeaders() for cheap lookups,
get() with abbreviated→full fallback, and set()/updateCachedAt().
2026-04-01 14:54:13 +02:00
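
The conditional-request benefit listed in 89ea578e68 can be sketched as follows. `If-None-Match` and `If-Modified-Since` are standard HTTP request headers; `CachedHeaders` and `conditionalRequestHeaders` are hypothetical names, and the input is assumed to be what a cheap lookup such as `getHeaders()` would return.

```typescript
interface CachedHeaders {
  etag?: string
  modified?: string
}

// Build validators for a registry request from the cached header values
// alone, without reading or parsing the metadata body.
function conditionalRequestHeaders (cached: CachedHeaders | undefined): Record<string, string> {
  const headers: Record<string, string> = {}
  if (cached?.etag) headers['if-none-match'] = cached.etag
  if (cached?.modified) headers['if-modified-since'] = cached.modified
  return headers
}
```

When the registry answers 304 Not Modified, the cached body stays valid and only its timestamp needs refreshing, which is the job the commit message assigns to `updateCachedAt()`.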