Commit Graph

10671 Commits

Author SHA1 Message Date
Zoltan Kochan
99a70f9798 perf: fragmented flushing and optimized SQLite schema 2026-04-03 01:24:01 +02:00
Zoltan Kochan
8ce3745097 perf: lazy header cache and exclusive SQLite locking 2026-04-03 01:01:10 +02:00
Zoltan Kochan
b94d828ef2 perf: synchronous header cache and PRAGMA optimizations 2026-04-03 00:55:52 +02:00
Zoltan Kochan
9334fd7117 perf: in-memory header cache and optimized sync calls 2026-04-03 00:50:23 +02:00
Zoltan Kochan
3eb53e34c2 perf: lazy load metadata with covering index and fix dbName collisions 2026-04-02 22:51:08 +02:00
Zoltan Kochan
d2d471d331 perf: lazy load metadata and use covering index for headers 2026-04-02 22:42:57 +02:00
Zoltan Kochan
47074b03d1 fix: handle Uint8Array metadata in cacheView and tests 2026-04-02 22:25:32 +02:00
Zoltan Kochan
05f7d33dba perf: simplify metadata cache and remove extra 304 writes 2026-04-02 20:57:50 +02:00
Zoltan Kochan
980362c186 perf: batch metadata lookups to reduce synchronous overhead 2026-04-02 11:12:21 +02:00
Zoltan Kochan
1a9d0c75cf perf: global write batching and in-memory caching for metadata 2026-04-02 10:35:10 +02:00
Zoltan Kochan
155b9106ab perf: further optimize metadata cache and restore table schema 2026-04-02 02:06:00 +02:00
Zoltan Kochan
637cbb1e84 perf: optimize metadata cache sync transactions and DB lookups 2026-04-02 01:37:22 +02:00
Zoltan Kochan
03c8a05509 perf: populate in-memory Map cache on all return paths
The Map cache was only populated after a successful 200 fetch.
On 304 Not Modified and all cache-hit paths (offline, preferOffline,
exact version, publishedBy), the Map was never populated. This meant
every subsequent resolution of the same package within the same install
hit SQLite + JSON.parse again instead of the O(1) Map lookup.

The old LRU cache was populated on both 200 and 304 paths.
2026-04-02 00:15:12 +02:00
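
A minimal sketch of the idea in 03c8a05509, not the actual pnpm code: route every path that returns metadata through one helper so the Map is filled on 200, on 304, and on cache hits alike. `Meta`, `Deps`, `createResolver`, and `remember` are hypothetical names, and the real resolver is asynchronous; the sketch is synchronous to stay short.

```typescript
type Meta = { name: string, versions: Record<string, unknown> }

// Hypothetical dependencies, injected so the sketch stays self-contained.
interface Deps {
  readFromDb: (name: string) => Meta | undefined
  fetchFromRegistry: (name: string) => { status: 200, meta: Meta } | { status: 304 }
}

function createResolver (deps: Deps) {
  const memo = new Map<string, Meta>()

  // Single exit helper: anything returned to the caller is also memoized.
  function remember (name: string, meta: Meta): Meta {
    memo.set(name, meta)
    return meta
  }

  return function getMeta (name: string, opts: { offline?: boolean } = {}): Meta | undefined {
    const hit = memo.get(name)
    if (hit != null) return hit

    const cached = deps.readFromDb(name)
    if (opts.offline === true) {
      // Cache-hit path: memoize here too, not only after a 200.
      return cached != null ? remember(name, cached) : undefined
    }

    const res = deps.fetchFromRegistry(name)
    if (res.status === 304) {
      // 304 path: the cached copy is still valid, so memoize it.
      return cached != null ? remember(name, cached) : undefined
    }
    return remember(name, res.meta)
  }
}
```

With this shape, a second lookup of the same package in the same run never reaches `readFromDb` again, whichever path served the first one.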
Zoltan Kochan
6bb3f3d947 perf: use Map for pending write lookups instead of linear scan
The findPending method linearly scanned the pendingWrites array
on every get/getHeaders call. On cold install with 75 packages,
this was O(n²): each new package's cache check scanned all
previously queued writes. Replace with a Map for O(1) lookups.
2026-04-02 00:07:08 +02:00
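
The change in 6bb3f3d947 can be illustrated with a small sketch. It assumes a queue of pending writes plus a Map index kept in step with it; `PendingWrite`, `PendingWrites`, and the field names are hypothetical, not the real MetadataCache internals.

```typescript
// Hypothetical shape of a queued write; the field names are illustrative.
interface PendingWrite {
  key: string
  data: string
}

class PendingWrites {
  private readonly queue: PendingWrite[] = []
  // Index keyed by cache key so lookups do not scan the queue.
  private readonly byKey = new Map<string, PendingWrite>()

  add (write: PendingWrite): void {
    this.queue.push(write)
    // A later write for the same key replaces the earlier index entry.
    this.byKey.set(write.key, write)
  }

  // O(1) replacement for a linear scan over the queue.
  find (key: string): PendingWrite | undefined {
    return this.byKey.get(key)
  }

  // Hands the queued writes to the caller and resets both structures.
  drain (): PendingWrite[] {
    const writes = this.queue.splice(0, this.queue.length)
    this.byKey.clear()
    return writes
  }
}
```

With n queued packages, n lookups cost O(n) in total instead of O(n²), at the price of keeping the array and the Map consistent in `add` and `drain`.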
Zoltan Kochan
bc68609def perf: remove setImmediate yield, revert worker attempts
Remove the setImmediate yield before SQLite reads — it added ~225ms
of scheduling overhead on cold installs (75 packages × 3 checks × 1ms).

Revert all worker thread attempts. The message passing overhead
outweighed the benefit of async SQLite reads.

The final approach: simple synchronous SQLite reads on the main thread
with an in-memory Map cache for same-run dedup. This matches the
original file-based approach's performance characteristics.
2026-04-01 23:11:07 +02:00
Zoltan Kochan
261ea0fad1 perf: revert worker thread, add in-memory cache + setImmediate yield
The worker thread approach added more overhead (structured clone per
message) than it saved. Revert to main-thread SQLite but add:

1. In-memory Map cache to avoid redundant SQLite reads + JSON.parse
   for packages resolved multiple times in the same install
2. setImmediate yield before each SQLite read to unblock the event loop,
   allowing pending network callbacks to run between reads
2026-04-01 19:42:45 +02:00
Zoltan Kochan
73d9a83f52 Revert "perf: move resolver CPU work to worker thread"
This reverts commit 2c4e9cc263.
2026-04-01 19:05:28 +02:00
Zoltan Kochan
2c4e9cc263 perf: move resolver CPU work to worker thread
Move SQLite reads/writes, JSON.parse, semver matching, and version
picking from the main thread to the existing worker pool.

The main thread is now a thin orchestrator that only does network I/O.
Resolution uses at most 2 round-trips to the worker:
1. Worker checks SQLite cache → cache hit returns immediately
2. On cache miss, main thread fetches from registry, sends raw JSON
   to worker → worker parses, writes cache, picks version

This unblocks the main event loop during resolution — network I/O,
tarball downloads, and linking can proceed while the worker does
CPU-heavy parsing and semver matching.
2026-04-01 18:59:18 +02:00
Zoltan Kochan
f05ddb9caf fix: use colon in registry host for lockfileOnly test 2026-04-01 18:20:50 +02:00
Zoltan Kochan
cf19f3515c fix: accept any cached metadata on 304 regardless of is_full flag 2026-04-01 17:43:40 +02:00
Zoltan Kochan
a9fc72c3ed refactor: simplify to single-table schema
Replace the two-table design (metadata_index + metadata_blobs) with a
single metadata table storing etag, modified, cached_at, is_full flag,
and the raw JSON blob.

The separate index table added complexity without meaningful benefit —
we parse the full blob anyway to extract the resolved version's manifest
after picking. The single table keeps writes simple (1 INSERT with the
raw registry response) and reads simple (1 SELECT + JSON.parse).

The is_full flag ensures that abbreviated-only cache entries are not
served when full metadata is requested (e.g., for optional dependencies).
2026-04-01 17:31:57 +02:00
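
A rough sketch of how the is_full flag from a9fc72c3ed could gate reads. The row fields mirror the columns named in the message; `MetadataRow` and `readMeta` are hypothetical names, and the actual SQL and row handling are not shown in the log.

```typescript
// One row of the single metadata table, as described in the commit message.
interface MetadataRow {
  etag: string | null
  modified: string | null
  cachedAt: number
  isFull: boolean
  blob: string // raw registry response text
}

// Returns the parsed metadata, or undefined when the row cannot serve the request.
function readMeta (row: MetadataRow | undefined, wantFull: boolean): unknown {
  if (row == null) return undefined
  // An abbreviated entry cannot satisfy a request for full metadata;
  // a full entry can satisfy both kinds of request.
  if (wantFull && !row.isFull) return undefined
  return JSON.parse(row.blob)
}
```

Note that cf19f3515c later relaxes this check for the 304 case, where any cached entry is accepted.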
Zoltan Kochan
d09d9d8efb perf: store raw JSON blob instead of per-version manifests
Replace metadata_manifests (per-version rows requiring JSON.stringify
per manifest) with metadata_blobs (single raw JSON blob per package).

Write path: store the raw registry response text as-is — zero
serialization on the hot path. Only the compact index fields
(dist-tags, version keys, deprecated flags) are extracted.

Read path: parse the lightweight index for version picking, then
parse the blob and extract just the resolved version's manifest.

This eliminates the cold install regression caused by hundreds of
JSON.stringify calls per install. The index table still provides
cheap header lookups for conditional requests.

Also tracks is_full flag on the index to avoid serving abbreviated
metadata when full is requested (e.g., for optional dependencies).
2026-04-01 17:19:04 +02:00
Zoltan Kochan
cbf7dfcf65 fix: guard MetadataCache methods against use after close
Add early-return guards to getHeaders, getIndex, getManifest, and
updateCachedAt when the DB has been closed. This prevents "statement
has been finalized" errors when the process exit handler closes the
DB while async operations are still in flight.

Also change store controller close to flush (not close) the metadata
DB, since the exit handler handles cleanup.
2026-04-01 16:55:07 +02:00
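
The guard pattern described in cbf7dfcf65 might look roughly like this. `GuardedCache` is a hypothetical stand-in that uses a Map in place of the SQLite handle, so only the early-return logic is representative.

```typescript
class GuardedCache {
  private closed = false
  // Stand-in for the database handle and its prepared statements.
  private readonly rows = new Map<string, string>()

  set (key: string, value: string): void {
    // Writes arriving after close are dropped instead of throwing.
    if (this.closed) return
    this.rows.set(key, value)
  }

  get (key: string): string | undefined {
    // Early return instead of touching a finalized statement.
    if (this.closed) return undefined
    return this.rows.get(key)
  }

  close (): void {
    this.closed = true
    this.rows.clear()
  }
}
```

Operations still in flight when the exit handler runs then see a cache miss rather than an exception.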
Zoltan Kochan
a042ebb8a2 fix: remove accidentally committed metadata.db files, add to gitignore 2026-04-01 16:29:14 +02:00
Zoltan Kochan
a97e5cb15a refactor: remove in-memory LRU metadata cache
SQLite with mmap and page cache serves the same purpose without the
complexity. The LRU cached full PackageMeta objects (all versions
parsed) which wasted memory, had stale-cache risks with lightweight
stubs, and expired after 120s anyway.

Remove PackageMetaCache interface, LRU creation, and all metaCache
threading through the resolver chain.
2026-04-01 16:28:45 +02:00
Zoltan Kochan
9717d73b60 refactor: remove dead mutation in resolveFromIndex 2026-04-01 16:16:37 +02:00
Zoltan Kochan
1348834640 fix: don't cache lightweight meta stubs in the in-memory LRU 2026-04-01 16:02:47 +02:00
Zoltan Kochan
e046348a27 fix: allow arbitrary fields in queueWrite versions record 2026-04-01 15:39:12 +02:00
Zoltan Kochan
d36a005dc5 perf: decompose metadata cache into index + per-version manifests
Split the single-blob metadata storage into two tables:
- metadata_index: dist-tags, version keys (with deprecated), time, and
  cache headers — one row per package, ~10KB
- metadata_manifests: per-version manifest objects, keyed by (name,
  version, type) — ~2KB each

During resolution, only the lightweight index is parsed to pick a
version. The full manifest for the resolved version is loaded separately.
For a package like typescript with 200+ versions, this avoids parsing
~400KB of unused manifest JSON.

The index is shared across abbreviated/full metadata types — only the
per-version manifests differ. This eliminates the type column from the
index and simplifies the abbreviated→full fallback to the manifest level.
2026-04-01 15:32:15 +02:00
Zoltan Kochan
f2ac90ce01 refactor: drop encode-registry, use URL.host directly
encode-registry replaced : with + for filesystem compatibility. With
SQLite, colons are fine in DB keys. Use URL.host directly with a simple
Map cache for hot-path performance.
2026-04-01 15:10:36 +02:00
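
As an illustration of f2ac90ce01, a memoized host lookup could be sketched as follows. `registryHost` and `hostCache` are hypothetical names; only `URL.host` and the use of a Map come from the commit message.

```typescript
// Memoizes URL parsing, which would otherwise repeat for every package lookup.
const hostCache = new Map<string, string>()

function registryHost (registryUrl: string): string {
  let host = hostCache.get(registryUrl)
  if (host === undefined) {
    // URL.host keeps a non-default port, joined with a colon.
    host = new URL(registryUrl).host
    hostCache.set(registryUrl, host)
  }
  return host
}
```

For example, `registryHost('http://localhost:4873/')` yields `localhost:4873`, colon included, which is the form that encode-registry used to rewrite for filesystem use.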
Zoltan Kochan
133c1213fa fix: remove unreachable meta.time?.modified inside meta.time == null branch 2026-04-01 14:57:12 +02:00
Zoltan Kochan
3ea70465a1 fix: address Copilot review feedback
- Set meta.modified from DB row in loadMetaFromDb so If-Modified-Since
  headers are sent even when modified comes from the DB column
- Use || instead of && for header lookup so both etag and modified are
  populated even when one is already available from in-memory metadata
- Check pending writes in MetadataCache.get/getHeaders instead of
  flushing in loadMetaFromDb, avoiding synchronous DB writes during
  resolution
- Close MetadataCache when store controller is closed to prevent
  resource leaks
- Add defensive slash-index guards in cache API functions
2026-04-01 14:55:49 +02:00
Zoltan Kochan
83439ab4a2 perf: batch metadata DB writes via queueSet/flush
Use queueSet() instead of synchronous set() when saving metadata after
registry fetches. Writes are batched and flushed on the next tick in a
single transaction, avoiding blocking the event loop during resolution.

This fixes the cold install regression where hundreds of synchronous
SQLite writes were serializing the resolution phase.
2026-04-01 14:55:49 +02:00
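
The queueSet/flush batching in 83439ab4a2 can be sketched as below. This is an assumption-laden outline, not the real MetadataCache: `WriteBatcher`, `commitBatch`, and `defer` are hypothetical, and the scheduler is injected so the example does not depend on Node timers.

```typescript
type Write = { key: string, value: string }

class WriteBatcher {
  private pending: Write[] = []
  private scheduled = false
  private readonly commitBatch: (writes: Write[]) => void
  private readonly defer: (fn: () => void) => void

  // commitBatch would wrap all writes in one DB transaction;
  // defer would be something like setImmediate or process.nextTick.
  constructor (commitBatch: (writes: Write[]) => void, defer: (fn: () => void) => void) {
    this.commitBatch = commitBatch
    this.defer = defer
  }

  queueSet (key: string, value: string): void {
    this.pending.push({ key, value })
    // Schedule at most one flush per batch.
    if (!this.scheduled) {
      this.scheduled = true
      this.defer(() => { this.flush() })
    }
  }

  flush (): void {
    this.scheduled = false
    if (this.pending.length === 0) return
    const batch = this.pending
    this.pending = []
    this.commitBatch(batch)
  }
}
```

Many `queueSet` calls made in the same tick then cost one transaction rather than one synchronous write each.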
Zoltan Kochan
1abbcf8d02 fix: use real temp directories in installing client tests 2026-04-01 14:55:20 +02:00
Zoltan Kochan
c9eeff27b7 fix: add cache/metadata tsconfig reference to deps-installer 2026-04-01 14:55:20 +02:00
Zoltan Kochan
ed6bae5b9c refactor: remove legacy metadata directory constants, update remaining tests
- Remove ABBREVIATED_META_DIR, FULL_META_DIR, FULL_FILTERED_META_DIR
  from @pnpm/constants — no longer used
- Update lockfileOnly test to check MetadataCache DB instead of files
- Update prune to also remove metadata.db and its WAL/SHM files
2026-04-01 14:55:19 +02:00
Zoltan Kochan
075e313b81 fix: sort tsconfig references 2026-04-01 14:55:19 +02:00
Zoltan Kochan
e1b9a9ad68 feat: update pnpm cache commands to use SQLite metadata cache
Rewrite cache list, delete, view, and list-registries commands to query
the MetadataCache SQLite DB instead of globbing JSON files on disk.

The cache.cmd handler no longer needs resolutionMode/registrySupportsTimeField
to determine which metadata subdirectory to use — all metadata types are
in the same DB.
2026-04-01 14:55:19 +02:00
Zoltan Kochan
9038774fea fix: registry-scoped cache keys, updateCachedAt fallback, avoid redundant DB lookups
Fixes from Gemini review:

- Include registry name in DB cache keys to avoid collisions when the
  same package name exists in different registries (e.g., npm vs private)
- Make updateCachedAt implement the same abbreviated→full fallback as
  get/getHeaders, so 304 responses correctly update the cached_at
  timestamp even when the data was stored under a different type
- Reuse etag/modified from already-loaded metadata instead of making a
  redundant getHeaders DB call
2026-04-01 14:55:18 +02:00
Zoltan Kochan
5c383b2866 fix: sort dependencies and tsconfig references alphabetically 2026-04-01 14:54:13 +02:00
Zoltan Kochan
89ea578e68 perf: replace file-based metadata cache with SQLite
Replace the per-package JSON file cache (metadata-v1.4/, metadata-ff-v1.4/)
with a single SQLite database (metadata.db) for registry metadata caching.

Benefits:
- Cheap conditional request header lookups (etag/modified) without parsing
  the full metadata JSON — enables If-None-Match/If-Modified-Since with
  minimal I/O overhead
- Full metadata can serve abbreviated requests — if a package was previously
  fetched as full (e.g., for trustPolicy or resolutionMode), the resolver
  reuses it instead of making another registry request
- Eliminates hundreds of individual file read/write/rename operations per
  install, replaced by SQLite WAL-mode transactions
- Removes the runLimited/metafileOperationLimits concurrency machinery —
  SQLite handles concurrent access natively

New package: @pnpm/cache.metadata — SQLite-backed MetadataCache class
modeled after @pnpm/store.index, with getHeaders() for cheap lookups,
get() with abbreviated→full fallback, and set()/updateCachedAt().
2026-04-01 14:54:13 +02:00
Zoltan Kochan
968724fc0b perf: use abbreviated metadata for minimumReleaseAge (#11160)
* perf: use abbreviated metadata for minimumReleaseAge when possible

Instead of always fetching full package metadata when minimumReleaseAge
is set, fetch the smaller abbreviated document first and check the
top-level `modified` field. If the package was last modified before the
release age cutoff, all versions are mature and no per-version time
filtering is needed. Only re-fetch full metadata for the rare case of
recently-modified packages.

Also uses fs.stat() to check cache file mtime instead of reading and
parsing the JSON to check cachedAt, avoiding unnecessary I/O.

* fix: validate modified date and handle abbreviated metadata edge cases

- Validate meta.modified date to prevent invalid dates from bypassing
  minimumReleaseAge filtering
- Skip full metadata refetch for packages excluded by publishedByExclude
- Allow ERR_PNPM_MISSING_TIME from cached abbreviated metadata to fall
  through to the network fetch path instead of throwing

* fix: cache abbreviated metadata before re-fetching full metadata

Save the abbreviated metadata to disk before re-fetching full metadata
so subsequent runs benefit from the mtime cache fast-path.

* fix: resolve type narrowing for conditional metadata fetch result
2026-04-01 14:47:31 +02:00
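
The fast path described in 968724fc0b amounts to one date comparison. The sketch below assumes minimumReleaseAge is expressed in minutes and that `modified` is an ISO 8601 string; `allVersionsMature` is a hypothetical helper name.

```typescript
// Returns true when the top-level `modified` timestamp is older than the cutoff,
// which implies every published version is at least that old.
function allVersionsMature (
  modified: string | undefined,
  minimumReleaseAgeMinutes: number,
  now: Date
): boolean {
  if (modified == null) return false
  const modifiedMs = Date.parse(modified)
  // An unparseable date must not bypass the filter, so treat it as "not mature".
  if (Number.isNaN(modifiedMs)) return false
  const cutoffMs = now.getTime() - minimumReleaseAgeMinutes * 60_000
  return modifiedMs < cutoffMs
}
```

When this returns false, the caller would fall back to fetching full metadata and filtering on per-version times.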
Zoltan Kochan
6c57d746bb chore: update pnpm-lock.yaml (#11164) 2026-04-01 12:42:03 +02:00
Zoltan Kochan
421d120972 perf: use If-Modified-Since for conditional metadata fetches (#11161)
Before fetching package metadata from the registry, stat the local cache
file and send its mtime as an If-Modified-Since header. If the registry
returns 304 Not Modified, read the local cache instead of downloading
the full response body. This saves bandwidth and latency for packages
whose metadata hasn't changed since the last fetch.

Registries that don't support If-Modified-Since simply return 200 as
before, so there is no behavior change for unsupported registries.
2026-04-01 12:39:13 +02:00
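
A minimal sketch of the conditional fetch in 421d120972, split into two pure helpers so it runs without a network. `conditionalHeaders` and `pickBody` are hypothetical names; the header name and the 304 handling follow the commit message.

```typescript
// Builds the conditional request header from the cache file's mtime.
function conditionalHeaders (cacheMtime: Date | undefined): Record<string, string> {
  if (cacheMtime == null || Number.isNaN(cacheMtime.getTime())) return {}
  // toUTCString() yields the HTTP-date form, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
  return { 'If-Modified-Since': cacheMtime.toUTCString() }
}

// Chooses between the cached copy and the fresh body based on the status code.
function pickBody (status: number, freshBody: string | undefined, readCache: () => string): string {
  // 304 Not Modified: the registry sent no body, so use the local copy.
  if (status === 304) return readCache()
  if (status === 200 && freshBody != null) return freshBody
  throw new Error(`Unexpected registry response: ${status}`)
}
```

A registry that ignores the header answers 200 with a body, so `pickBody` takes the same path as an unconditional fetch.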
Zoltan Kochan
52ee08aad4 chore: update pnpm-lock.yaml (#11111) 2026-03-31 12:39:26 +02:00
Zoltan Kochan
6b3d87a4ca perf: optimize undici connection settings and tarball buffering (#11151)
- Enable Happy Eyeballs (`autoSelectFamily`) for faster dual-stack (IPv4/IPv6) connection establishment
- Increase keep-alive timeouts (30s idle, 10min max) to reduce connection churn during install
- Set optimized global dispatcher so requests without custom options still benefit
- Pre-allocate `SharedArrayBuffer` for tarball downloads when `Content-Length` is known, avoiding intermediate chunk array and double-copy
2026-03-31 00:33:42 +02:00
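
The last bullet of 6b3d87a4ca, pre-allocating the tarball buffer, can be sketched like this. `collectBody` is a hypothetical helper, and it takes a synchronous iterable of chunks for brevity where a real download would consume an async stream.

```typescript
// Copies chunks straight into one pre-allocated buffer when the size is known,
// instead of collecting a chunk array and concatenating it afterwards.
function collectBody (chunks: Iterable<Uint8Array>, contentLength: number): Uint8Array {
  const out = new Uint8Array(new SharedArrayBuffer(contentLength))
  let offset = 0
  for (const chunk of chunks) {
    if (offset + chunk.byteLength > contentLength) {
      throw new Error('Response body is longer than Content-Length')
    }
    out.set(chunk, offset)
    offset += chunk.byteLength
  }
  if (offset !== contentLength) {
    throw new Error('Response body is shorter than Content-Length')
  }
  return out
}
```

Each byte is copied once, and a `SharedArrayBuffer` can be handed to a worker thread without a further copy, which is presumably why the commit uses it.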
Zoltan Kochan
14bb19ba6b chore: update pnpm to beta 6 2026-03-30 18:38:26 +02:00
Zoltan Kochan
a4305c91b4 ci: remove not needed comment from release.yml 2026-03-30 18:35:59 +02:00
Zoltan Kochan
ac1570c238 chore(release): 11.0.0-beta.6 v11.0.0-beta.6 2026-03-30 18:28:21 +02:00
Zoltan Kochan
9b1e5da6f7 fix: auto import mode falls through to hardlinks on ENOTSUP (#11150)
The ENOTSUP fallback in createClonePkg() silently converted clone
failures to file copies, preventing the auto-importer from detecting
that cloning is not supported and falling through to hardlinks.

On filesystems without reflink support (e.g. ext4 on Linux CI),
this caused every file to be copied instead of hardlinked — a 2-9x
regression for install operations on large projects.

The fix uses a raw clone (without ENOTSUP fallback) for the auto-mode
probe. If the filesystem doesn't support cloning, the error propagates
and the auto-importer falls through to hardlinks. Once cloning is
confirmed to work, subsequent packages use the full clone importer
with ENOTSUP fallback for transient failures under heavy parallel I/O.
2026-03-30 18:25:58 +02:00
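
The probe-then-fall-through logic of 9b1e5da6f7 might be sketched as follows. `createAutoImporter` and its three injected importers are hypothetical, and checking only for the ENOTSUP code is an assumption; the real auto-importer may treat other errors the same way.

```typescript
type Importer = (src: string, dest: string) => void

// Probes cloning with the first package, then sticks with whatever worked.
function createAutoImporter (
  rawClone: Importer,              // clone without any copy fallback
  cloneWithCopyFallback: Importer, // clone that copies on transient ENOTSUP
  hardlink: Importer
): Importer {
  let chosen: Importer | undefined

  return (src, dest) => {
    if (chosen != null) {
      chosen(src, dest)
      return
    }
    try {
      // The probe uses the raw clone so an ENOTSUP error is visible here.
      rawClone(src, dest)
      chosen = cloneWithCopyFallback
    } catch (err) {
      const code = (err as { code?: string } | null)?.code
      if (code !== 'ENOTSUP') throw err
      // Cloning is unsupported on this filesystem: use hardlinks from now on.
      hardlink(src, dest)
      chosen = hardlink
    }
  }
}
```

If the probe went through the importer with the copy fallback instead, the ENOTSUP error would be swallowed and every file copied, which is the regression the commit describes.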