mirror of
https://github.com/pdfme/pdfme.git
synced 2026-06-02 11:17:32 -04:00
* fix: two unbounded-cache memory leaks in common and schemas
Two module-level Map caches never evict and store multi-MB strings as
keys, so they leak silently for the entire lifetime of any consumer.
1. packages/common/src/expression.ts — parseDataCache
parseData() was memoized via a module-level parseDataCache keyed by
JSON.stringify(data). replacePlaceholders() calls it with a merged
{ ...schemaNameDefaults, ...variables } object where values may be
arbitrary strings from the caller. Whenever inputs contain base64
(image schemas with embedded data URLs, embedded fonts, large text),
the cache key is a multi-MB JSON string that is pinned permanently;
every distinct inputs state adds its own key, which is never collected. Parsing
is O(fields) and cheap, so removing the cache is strictly a win.
Regression test: packages/common/__tests__/expression.test.ts
'replacePlaceholders memory safety > does not retain call inputs in
a module-level cache' — runs 30 replacePlaceholders() calls with
unique ~500 KB payloads, captures a V8 heap snapshot via
v8.writeHeapSnapshot, aggregates string nodes >= 200 KB and asserts
the total retained size is below 2 MB. Pre-fix: ~30 MB retained
(FAILS). Post-fix: 0 bytes retained (passes).
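The leak pattern in item 1 can be illustrated with a reduced sketch. Only `parseDataCache` and the `JSON.stringify(data)` key come from the description above; `parseFields`, `parseDataMemoized`, `parseDataUncached`, and `retainedKeyChars` are hypothetical names, and the parsing body is a stand-in rather than the real implementation.

```typescript
// Hypothetical reduction of the leak; not the actual pdfme source.
type Data = Record<string, unknown>;

const parseDataCache = new Map<string, Data>();

// Stand-in for the real per-field parsing, described above as O(fields).
function parseFields(data: Data): Data {
  const out: Data = {};
  for (const [key, value] of Object.entries(data)) {
    out[key] = value;
  }
  return out;
}

// Pre-fix shape: the serialized input becomes a Map key and is never evicted.
function parseDataMemoized(data: Data): Data {
  const key = JSON.stringify(data);
  const hit = parseDataCache.get(key);
  if (hit) return hit;
  const parsed = parseFields(data);
  parseDataCache.set(key, parsed);
  return parsed;
}

// Post-fix shape: parse on every call and retain nothing.
function parseDataUncached(data: Data): Data {
  return parseFields(data);
}

// Total characters pinned as cache keys (illustrative helper).
function retainedKeyChars(): number {
  let total = 0;
  for (const key of parseDataCache.keys()) total += key.length;
  return total;
}
```

With the memoized variant, each distinct payload adds one key roughly the size of the payload itself; the uncached variant leaves the Map empty.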
2. packages/schemas/src/graphics/image.ts — getCacheKey
getCacheKey(schema, input) returned `${schema.type}${input}`, using
the full base64 bytes of the image as part of the cache key. Every
unique image processed by the PDF render path added a permanent Map
entry whose key was as long as the image's base64 string.
Replaced with a short fingerprint that samples the total length plus
three 16-char regions (first, middle, last). The middle-region
sample is essential: base64 PNGs share a common header and IEND
trailer, so distinct images of the same size would collide if only
first/last regions were sampled. Middle bytes are pixel data and
differ between distinct images with overwhelming probability. Keys
stay under 80 chars regardless of input size.
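A minimal sketch of the sampling fingerprint described above, assuming a `type:len:first:middle:last` layout. The function name `sampledCacheKey`, the separator, and the exact position of the middle window are assumptions; only the length plus three 16-char regions is taken from the description.

```typescript
// Hypothetical sketch; the real key layout may differ.
const SAMPLE = 16;

function sampledCacheKey(type: string, input: string): string {
  const len = input.length;
  // Center a 16-char window on the midpoint, clamped at the start for short inputs.
  const midStart = Math.max(0, Math.floor(len / 2) - SAMPLE / 2);
  const first = input.slice(0, SAMPLE);
  const middle = input.slice(midStart, midStart + SAMPLE);
  const last = input.slice(-SAMPLE);
  return `${type}:${len}:${first}:${middle}:${last}`;
}
```

The key length depends only on the type name and the number of digits in the length, not on the size of the input.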
Regression tests: packages/schemas/__tests__/image.test.ts
- 'does not pin the full base64 input as a cache key' — asserts
key length < 100 chars. Pre-fix: 139 chars for a minimal PNG and
proportionally more for realistic images (FAILS).
- 'distinguishes different images via the fingerprint' — guards
against future over-shortening of the fingerprint that could
reintroduce collisions between distinct images.
Both leaks were originally identified via a V8 heap-snapshot diff taken
across a UI workload (typing + field tabbing) against a consumer app
with image schemas carrying base64 content. Before the fix, the top two
growing allocations by retained size were multi-MB string entries — one
per module-level cache in this PR — together accounting for hundreds of
MB of retained JS heap in a single 3-iteration run. After the fix, both
string entries disappear from the top 25 growing allocations and
aggregate JS heap is flat or slightly shrinking across iterations.
No public API change. No behavioral change for consumers. Both caches
were module-local implementation details.
* fix(schemas): harden image cache key with FNV-1a hash; fix stale test comments
Addresses Greptile review on #1426:
- Replace 3-region sampling fingerprint in getCacheKey with an FNV-1a
32-bit hash over the full input. The old first-16 slice was a
constant data-URI prefix for any image of the same MIME type,
contributing no entropy; hashing every byte removes that weakness
at O(n) cost in the input length, without retaining any slice as a Map key.
Key format is now `${type}:${len}:${fnv1a-hex}` (~40 chars).
- Rewrite stale comments in image.test.ts that referred to a
padding/mutation scheme the test never performs, and update the
fingerprint-format comment to match the new hash-based key.
- Add trailing newline to expression.test.ts.
All pre-existing and new tests still pass.
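
The hash-based key from the review follow-up can be sketched as follows. This is a minimal illustration, not the committed code: the names `fnv1a32` and `hashedCacheKey` are hypothetical, and hashing UTF-16 code units (which equal bytes for ASCII base64 input) is an assumption. The offset basis and prime are the standard FNV-1a 32-bit constants.

```typescript
// FNV-1a 32-bit over the string's code units (assumption: input is ASCII base64).
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit wrapping multiply
  }
  return hash >>> 0; // normalize to an unsigned 32-bit value
}

// Key format from the commit message: `${type}:${len}:${fnv1a-hex}`.
function hashedCacheKey(type: string, input: string): string {
  const hex = fnv1a32(input).toString(16).padStart(8, "0");
  return `${type}:${input.length}:${hex}`;
}
```

`Math.imul` keeps the multiplication within 32 bits, so the loop avoids the precision loss that a plain `*` would introduce for large intermediate values.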