# feat: pnpm agent — server-side resolution for faster installs (#11251)

Zoltan Kochan · ccc606ed15 · 2026-04-20
## Summary

Adds an opt-in **pnpm agent** server that resolves dependencies server-side and streams only the files missing from the client's content-addressable store.

- **`@pnpm/agent.server`** — multi-process HTTP server (Node.js `cluster`) with SQLite-backed metadata and file caches
- **`@pnpm/agent.client`** — consumes the server's streamed NDJSON response, dispatching worker threads to fetch files while the server is still resolving
- **New config**: `agent` in `pnpm-workspace.yaml` (opt-in)

## How it works

1. Client reads integrity hashes from its local store index
2. Sends `POST /v1/install` with dependencies + store integrities
3. Server resolves the dependency tree using pnpm's `install({ lockfileOnly: true })`, with a SQLite-backed `PackageMetaCache` for fast repeat resolution
4. As each package resolves, a wrapped `storeController.requestPackage` looks up its files and immediately streams digests the client is missing (NDJSON `D` lines)
5. Client reads the stream line by line; digest batches fill up and dispatch worker threads to `POST /v1/files` — file downloads overlap with server-side resolution (see the client-side sketch after this list)
6. After resolution, server sends index entries (`I` lines) and lockfile (`L` line)
7. Client writes index entries to store, then runs headless install with a wrapped `fetchPackage` that calls `readPkgFromCafs` with `verifyStoreIntegrity: false` (files are trusted from the agent)
8. `/v1/files` response is gzip-streamed (274MB → ~80MB) — server pipes through `createGzip`, worker pipes through `createGunzip`, parsing and writing files to CAFS as data arrives
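
For illustration, here's a minimal sketch of how a client might consume this stream. Only the `D`/`I`/`L` tags come from the steps above; the tag-then-JSON line layout, payload shapes, and handler names are assumptions, not the real `@pnpm/agent.client` wire format (index entries, for example, are described under Key optimizations as pre-packed msgpack buffers rather than JSON):

```ts
import { createInterface } from 'node:readline'
import type { Readable } from 'node:stream'

// Hypothetical handler shape; the real client lives in @pnpm/agent.client.
interface InstallStreamHandlers {
  onDigests: (digests: string[]) => void   // 'D' lines: files the client is missing
  onIndexEntry: (entry: unknown) => void   // 'I' lines: store index entries
  onLockfile: (lockfile: unknown) => void  // 'L' line: the resolved lockfile
}

// Read the response body line by line. Batches of 'D' digests can be handed
// to worker threads (POST /v1/files) while 'I' and 'L' lines are still pending.
async function consumeInstallStream (body: Readable, handlers: InstallStreamHandlers): Promise<void> {
  for await (const line of createInterface({ input: body })) {
    const payload = line.slice(1)
    switch (line[0]) {
      case 'D': handlers.onDigests(JSON.parse(payload) as string[]); break
      case 'I': handlers.onIndexEntry(JSON.parse(payload)); break
      case 'L': handlers.onLockfile(JSON.parse(payload)); break
    }
  }
}
```

Reading the body as an async iterator is what lets digest batches dispatch to `/v1/files` workers before the final `L` line arrives.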

## Performance

1351-package project, cold local store, warm server (localhost):

| Scenario | Time |
|----------|------|
| Vanilla pnpm install (cold OS cache) | ~48s |
| Vanilla pnpm install (warm OS cache) | ~34s |
| With pnpm agent (consistent regardless of OS cache) | **~33s** |

### Key optimizations

1. **SQLite metadata cache** — server-side resolution drops from ~3.4s to ~0.9s
2. **SQLite file store** — consistent read performance regardless of OS file cache state
3. **Streaming `/v1/install`** — file digests stream during resolution, downloads start before resolution finishes
4. **Gzip-streamed `/v1/files`** — whole-stream gzip (274MB → ~80MB), significant savings on remote servers
5. **Worker-thread streaming HTTP** — workers pipe gzip → parse → write to CAFS as data arrives, no buffering
6. **No rehashing** — server-provided digests used directly, skipping 33K SHA-512 computations
7. **No re-verification** — wrapped `fetchPackage` calls `readPkgFromCafs` with `verifyStoreIntegrity: false`
8. **Direct `writeFileSync` with `wx`** — no stat + temp + rename (sketched after this list, together with item 10)
9. **Pre-packed msgpack** — server sends raw store index buffers, client writes directly to SQLite
10. **WAL checkpoint** — ensures store index entries written by the agent are visible to the headless install's worker threads
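
As a rough sketch of items 8 and 10 (the function names, file modes, and checkpoint mode are assumptions; only the `wx` flag and the checkpoint itself come from the list above):

```ts
import { writeFileSync } from 'node:fs'
import type { DatabaseSync } from 'node:sqlite'

// Item 8: a single open-exclusive write. EEXIST means the file is already in
// the content-addressable store, so there is no stat + temp + rename dance.
function writeCafsFile (filePath: string, content: Buffer, executable: boolean): void {
  try {
    writeFileSync(filePath, content, { flag: 'wx', mode: executable ? 0o755 : 0o644 })
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code !== 'EEXIST') throw err
  }
}

// Item 10: flush the WAL back into the main database file so that index
// entries written on this connection are visible to the headless install's
// worker threads, which open their own connections to the same database.
function checkpointStoreIndex (db: DatabaseSync): void {
  db.exec('PRAGMA wal_checkpoint(TRUNCATE)')
}
```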

## Usage

Start the server:
```bash
node agent/server/lib/bin.js
```

Configure in `pnpm-workspace.yaml`:
```yaml
agent: http://localhost:4873
```

## `pnpm/agent/server/src/fileStore.ts`

```ts
import { createRequire } from 'node:module'
import type { DatabaseSync as _DatabaseSync } from 'node:sqlite'

// Jest's ESM module resolver doesn't handle node:sqlite.
// Use createRequire to load it at runtime.
const { DatabaseSync } = createRequire(import.meta.url)('node:sqlite') as typeof import('node:sqlite')

/**
 * SQLite-backed file store for fast batch reads.
 * Files are stored as blobs keyed by hex digest.
 * Much faster than 33K individual readFileSync calls.
 */
export class FileStore {
  private db: _DatabaseSync
  private getStmt: ReturnType<_DatabaseSync['prepare']>
  private hasStmt: ReturnType<_DatabaseSync['prepare']>
  private insertStmt: ReturnType<_DatabaseSync['prepare']>

  constructor (dbPath: string) {
    this.db = new DatabaseSync(dbPath)
    this.db.exec('PRAGMA busy_timeout=5000')
    this.db.exec('PRAGMA journal_mode=WAL')
    this.db.exec('PRAGMA synchronous=NORMAL')
    this.db.exec(`
      CREATE TABLE IF NOT EXISTS files (
        digest TEXT PRIMARY KEY,
        content BLOB NOT NULL,
        size INTEGER NOT NULL,
        executable INTEGER NOT NULL DEFAULT 0
      )
    `)
    // /v1/files is a hot path (thousands of .get() calls per request), so
    // prepare statements once in the constructor and reuse them.
    this.getStmt = this.db.prepare('SELECT content, size FROM files WHERE digest = ?')
    this.hasStmt = this.db.prepare('SELECT 1 FROM files WHERE digest = ?')
    this.insertStmt = this.db.prepare(
      'INSERT OR IGNORE INTO files (digest, content, size, executable) VALUES (?, ?, ?, ?)'
    )
  }

  has (digest: string): boolean {
    return this.hasStmt.get(digest) !== undefined
  }

  /**
   * Bulk insert pre-read file contents. Runs in a transaction for speed.
   * Callers pass the already-read buffer so we don't re-read from disk.
   */
  importMany (files: Array<{ digest: string, content: Buffer, executable: boolean }>): number {
    let imported = 0
    this.db.exec('BEGIN')
    try {
      for (const file of files) {
        const result = this.insertStmt.run(file.digest, file.content, file.content.length, file.executable ? 1 : 0)
        if (result.changes > 0) imported++
      }
      this.db.exec('COMMIT')
    } catch (err) {
      this.db.exec('ROLLBACK')
      throw err
    }
    return imported
  }

  /**
   * Get file content and size for building the archive response.
   */
  get (digest: string): { content: Buffer, size: number } | undefined {
    return this.getStmt.get(digest) as { content: Buffer, size: number } | undefined
  }

  close (): void {
    this.db.close()
  }
}
```
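
For context, a minimal usage sketch of the class above; the database path and file content are made up, and the SHA-512 hex digest follows the hashing scheme described in the summary:

```ts
import { createHash } from 'node:crypto'

const store = new FileStore('/tmp/agent-files.db') // hypothetical location
const content = Buffer.from('console.log("hello")\n')
const digest = createHash('sha512').update(content).digest('hex')

store.importMany([{ digest, content, executable: false }]) // returns 1: newly inserted
store.has(digest)        // true
store.get(digest)?.size  // content.length; used when building the /v1/files response
store.close()
```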