pdfme/packages/pdf-lib/__tests__/utils/unicode.spec.ts
devin-ai-integration[bot] e4a4c300cd Migrate pdf-lib into pdfme monorepo (#1059)
* Migrate pdf-lib into pdfme monorepo

- Add @pdfme/pdf-lib package to packages/ directory
- Update root package.json to include pdf-lib in workspaces
- Update all package dependencies to use workspace:* for @pdfme/pdf-lib
- Configure TypeScript build targets (cjs, esm, node) for pdf-lib
- Add ESLint configuration with relaxed rules for pdf-lib migration
- Integrate pdf-lib into monorepo build and clean scripts
- Add basic test suite for pdf-lib package
- All lint, build, and test suites pass successfully

This migration improves maintainability by consolidating all PDF operations
into a single repository and unified build/test/release process.

Co-Authored-By: Kyohei Fukuda <kyoheif@wix.com>

* Fix TypeScript module resolution for workspace dependencies

- Changed moduleResolution from 'bundler' to 'node' in common package
- This should resolve '@pdfme/pdf-lib' module resolution issues
- Reverted workspace dependency format back to '*' for npm compatibility

Co-Authored-By: Kyohei Fukuda <kyoheif@wix.com>

* Fix pdf-lib package.json exports paths

- Updated main, module, and exports paths to point to correct locations
- Changed from dist/*/index.js to dist/*/src/index.js to match build output
- Fixed TypeScript types path from dist/types/index.d.ts to dist/types/src/index.d.ts
- Resolves Vite package entry resolution errors and TypeScript module resolution issues

Co-Authored-By: Kyohei Fukuda <kyoheif@wix.com>

* Fix CodeQL security alerts in svg.ts

- Add input validation and sanitization for HTML/SVG parsing
- Prevent ReDoS attacks with regex limits and input size checks
- Sanitize font family names to prevent prototype pollution
- Add URL validation for image sources to prevent path traversal
- Limit transformation parsing to prevent infinite loops
- Maintain backward compatibility while improving security

Co-Authored-By: Kyohei Fukuda <kyoheif@wix.com>

* Implement comprehensive security fixes for CodeQL alerts in svg.ts

- Add input validation and sanitization for SVG content
- Implement safe HTML parsing with null checks and size limits
- Add controlled dynamic property access with allowlisted tag names
- Prevent style injection with filtered and limited style entries
- Add regex match limits to prevent ReDoS attacks
- Enhance font selection with input validation and type safety
- Sanitize image sources to prevent path traversal and injection
- Limit CSS style parsing to prevent potential vulnerabilities

These changes address the 2 high-severity CodeQL security alerts while
maintaining backward compatibility and functionality.

Co-Authored-By: Kyohei Fukuda <kyoheif@wix.com>

* Add additional security fixes for CodeQL alerts in svg.ts

- Implement safer property access for polygon node transformation
- Add input validation for points attribute with regex pattern matching
- Replace Object.assign with safer property assignment to prevent prototype pollution
- Add null checks and type validation for node attributes and childNodes
- Implement safer SVG node parsing with comprehensive validation
- Add array type checks for childNodes processing

These changes target the remaining 2 high-severity CodeQL security alerts
by addressing potential prototype pollution and unsafe property access.

Co-Authored-By: Kyohei Fukuda <kyoheif@wix.com>
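
The prototype-pollution concern behind replacing `Object.assign` can be illustrated with a minimal sketch. The svg.ts code itself is not shown here, so `assignSafely`, `BLOCKED_KEYS`, and the sample payload are hypothetical names chosen for illustration, not the actual fix.

```typescript
// Hypothetical sketch; not the svg.ts implementation.
// Keys that can change an object's prototype chain when assigned.
const BLOCKED_KEYS = ['__proto__', 'constructor', 'prototype'];

// Copy own enumerable keys one at a time, skipping the blocked ones.
function assignSafely(
  target: Record<string, unknown>,
  source: Record<string, unknown>,
): Record<string, unknown> {
  for (const key of Object.keys(source)) {
    if (BLOCKED_KEYS.indexOf(key) !== -1) continue;
    target[key] = source[key];
  }
  return target;
}

// JSON.parse creates "__proto__" as an ordinary own property.
const untrusted = JSON.parse('{"fill":"red","__proto__":{"polluted":true}}');

// Object.assign writes through the __proto__ setter, replacing the
// prototype of the target object.
const viaObjectAssign: any = Object.assign({}, untrusted);

// The guarded copy keeps only the ordinary attribute.
const viaSafeAssign: any = assignSafely({}, untrusted);
```

In this sketch `viaObjectAssign.polluted` is `true` because the target's prototype was swapped, while `viaSafeAssign` only carries `fill`.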

* Implement comprehensive security hardening for CodeQL alerts in svg.ts

- Add comprehensive SVG content sanitization with allowlist-based tag filtering
- Implement strict input validation with bounds checking for all numeric inputs
- Replace unsafe dynamic property assignment with Object.defineProperty
- Add try-catch error handling for HTML parsing operations
- Restrict allowed style properties and validate string lengths
- Use setAttribute/removeAttribute instead of direct attribute manipulation
- Add type safety checks for all node operations
- Implement safer polygon-to-path conversion with validation

These changes address the 10 high-severity CodeQL security alerts by:
1. Preventing XSS through comprehensive input sanitization
2. Avoiding prototype pollution with safer property assignment
3. Adding bounds checking to prevent DoS attacks
4. Using allowlist-based validation for all user inputs
5. Implementing proper error handling to prevent crashes

Co-Authored-By: Kyohei Fukuda <kyoheif@wix.com>

* Potential fix for code scanning alert no. 32: Incomplete multi-character sanitization

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* Potential fix for code scanning alert no. 39: Incomplete multi-character sanitization

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* Fix inefficient regular expression in svg.ts to pass CodeQL

- Changed /([^:\s]+)*\s*:\s*([^;]+)/g to /([^:\s]+)\s*:\s*([^;]+)/g
- Removed the problematic * quantifier that could cause exponential backtracking
- This fixes the "Inefficient regular expression" security alert from GitHub Advanced Security

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
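
As a rough check of the regular expression change described above, the sketch below runs both patterns over a well-formed style string. Only the two patterns come from the commit message. `parseStyle` and the sample input are hypothetical and not taken from svg.ts.

```typescript
// Patterns quoted in the commit message.
const before = /([^:\s]+)*\s*:\s*([^;]+)/g; // nested quantifier: ([^:\s]+)*
const after = /([^:\s]+)\s*:\s*([^;]+)/g; // single quantifier

// Hypothetical helper: collect "name: value" declarations into an object.
function parseStyle(style: string, pattern: RegExp): Record<string, string> {
  // Fresh copy so that lastIndex always starts at 0.
  const regex = new RegExp(pattern.source, 'g');
  const result: Record<string, string> = {};
  let match: RegExpExecArray | null;
  while ((match = regex.exec(style)) !== null) {
    const name = match[1];
    // With the old pattern the name group may be absent, so guard for it.
    if (name !== undefined) result[name] = match[2].trim();
  }
  return result;
}

const sample = 'fill: red; stroke-width: 2';
const parsedBefore = parseStyle(sample, before);
const parsedAfter = parseStyle(sample, after);
```

For this input both patterns yield the same declarations. The difference is in how a backtracking engine handles a failed match: with `([^:\s]+)*`, a long run of characters with no colon can be split into groups in many ways, and each split may be retried.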

* remove sanitize-html

* move tests

* fix for security

* update dependabot.yml

* organize

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Kyohei Fukuda <kyouhei.fukuda0729@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-06-26 18:30:05 +09:00


import {
  mergeIntoTypedArray,
  utf16Encode,
  utf8Encode,
  utf16Decode,
} from '../../src/utils';

const utf8BOM = new Uint8Array([0xef, 0xbb, 0xbf]);
const utf16BOM = new Uint16Array([0xfeff]);

const withUtf8Bom = (encoding: Uint8Array) =>
  mergeIntoTypedArray(utf8BOM, encoding);

const withUtf16Bom = (encoding: Uint16Array) =>
  new Uint16Array([...Array.from(utf16BOM), ...Array.from(encoding)]);

describe(`utf8Encode`, () => {
it(`encodes <U+004D U+0430 U+4E8C U+10302> to UTF-8`, () => {
const input = '\u{004D}\u{0430}\u{4E8C}\u{10302}';
// prettier-ignore
const expected = new Uint8Array([
/* U+004D */ 0x4d,
/* U+0430 */ 0xd0, 0xb0,
/* U+4E8C */ 0xe4, 0xba, 0x8c,
/* U+10302 */ 0xf0, 0x90, 0x8c, 0x82,
]);
const actual = utf8Encode(input);
expect(actual).toEqual(withUtf8Bom(expected));
});
it(`encodes <U+004D U+0061 U+10000> to UTF-8`, () => {
const input = '\u{004D}\u{0061}\u{10000}';
// prettier-ignore
const expected = new Uint8Array([
/* U+004D */ 0x4d,
/* U+0061 */ 0x61,
/* U+10000 */ 0xf0, 0x90, 0x80, 0x80,
]);
const actual = utf8Encode(input);
expect(actual).toEqual(withUtf8Bom(expected));
});
it(`encodes <U+1F4A9 U+1F382> to UTF-8 (without a BOM)`, () => {
const input = '💩🎂';
// prettier-ignore
const expected = new Uint8Array([
/* U+1F4A9 */ 0xf0, 0x9f, 0x92, 0xa9,
/* U+1F382 */ 0xf0, 0x9f, 0x8e, 0x82,
]);
const actual = utf8Encode(input, false);
expect(actual).toEqual(expected);
});
it(`encodes "Дмитрий Козлюк (Dmitry Kozlyuk)" to UTF-8`, () => {
const input = 'Дмитрий Козлюк (Dmitry Kozlyuk)';
// prettier-ignore
const expected = new Uint8Array([
0xd0, 0x94, 0xd0, 0xbc, 0xd0, 0xb8, 0xd1, 0x82, 0xd1, 0x80, 0xd0, 0xb8,
0xd0, 0xb9, 0x20, 0xd0, 0x9a, 0xd0, 0xbe, 0xd0, 0xb7, 0xd0, 0xbb, 0xd1,
0x8e, 0xd0, 0xba, 0x20, 0x28, 0x44, 0x6d, 0x69, 0x74, 0x72, 0x79, 0x20,
0x4b, 0x6f, 0x7a, 0x6c, 0x79, 0x75, 0x6b, 0x29,
]);
const actual = utf8Encode(input);
expect(actual).toEqual(withUtf8Bom(expected));
});
it(`encodes "ä☺𠜎️☁️" to UTF-8 (without a BOM)`, () => {
const input = 'ä☺𠜎️☁️';
// prettier-ignore
const expected = new Uint8Array([
0xc3, 0xa4, 0xe2, 0x98, 0xba, 0xf0, 0xa0, 0x9c, 0x8e, 0xef, 0xb8, 0x8f,
0xe2, 0x98, 0x81, 0xef, 0xb8, 0x8f,
]);
const actual = utf8Encode(input, false);
expect(actual).toEqual(expected);
});
});
describe(`utf16Encode`, () => {
it(`encodes <U+004D U+0430 U+4E8C U+10302> to UTF-16`, () => {
const input = '\u{004D}\u{0430}\u{4E8C}\u{10302}';
// prettier-ignore
const expected = new Uint16Array(new Uint8Array([
/* U+004D */ 0x4d, 0x00,
/* U+0430 */ 0x30, 0x04,
/* U+4E8C */ 0x8c, 0x4e,
/* U+10302 */ 0x00, 0xd8, 0x02, 0xdf,
]).buffer);
const actual = utf16Encode(input);
expect(actual).toEqual(withUtf16Bom(expected));
});
it(`encodes <U+004D U+0061 U+10000> to UTF-16`, () => {
const input = '\u{004D}\u{0061}\u{10000}';
// prettier-ignore
const expected = new Uint16Array(new Uint8Array([
/* U+004D */ 0x4d, 0x00,
/* U+0061 */ 0x61, 0x00,
/* U+10000 */ 0x00, 0xd8, 0x00, 0xdc,
]).buffer);
const actual = utf16Encode(input);
expect(actual).toEqual(withUtf16Bom(expected));
});
it(`encodes <U+1F4A9 U+1F382> to UTF-16 (without a BOM)`, () => {
const input = '💩🎂';
// prettier-ignore
const expected = new Uint16Array(new Uint8Array([
/* U+1F4A9 */ 0x3d, 0xd8, 0xa9, 0xdc,
/* U+1F382 */ 0x3c, 0xd8, 0x82, 0xdf,
]).buffer);
const actual = utf16Encode(input, false);
expect(actual).toEqual(expected);
});
it(`encodes "Дмитрий Козлюк (Dmitry Kozlyuk)" to UTF-16`, () => {
const input = 'Дмитрий Козлюк (Dmitry Kozlyuk)';
// prettier-ignore
const expected = new Uint16Array([
0x414, 0x43c, 0x438, 0x442, 0x440, 0x438, 0x439, 0x20, 0x41a, 0x43e,
0x437, 0x43b, 0x44e, 0x43a, 0x20, 0x28, 0x44, 0x6d, 0x69, 0x74, 0x72,
0x79, 0x20, 0x4b, 0x6f, 0x7a, 0x6c, 0x79, 0x75, 0x6b, 0x29,
]);
const actual = utf16Encode(input);
expect(actual).toEqual(withUtf16Bom(expected));
});
it(`encodes "ä☺𠜎️☁️" to UTF-16 (without a BOM)`, () => {
const input = 'ä☺𠜎️☁️';
// prettier-ignore
const expected = new Uint16Array([
0xe4, 0x263a, 55361, 57102, 0xfe0f, 0x2601, 0xfe0f,
]);
const actual = utf16Encode(input, false);
expect(actual).toEqual(expected);
});
});
describe(`utf16Decode`, () => {
it(`decodes <U+004D U+0430 U+4E8C U+10302> from UTF-16`, () => {
// prettier-ignore
const input = new Uint8Array([
/* U+004D */ 0x00, 0x4d,
/* U+0430 */ 0x04, 0x30,
/* U+4E8C */ 0x4e, 0x8c,
/* U+10302 */ 0xd8, 0x00, 0xdf, 0x02,
]);
const expected = '\u{004D}\u{0430}\u{4E8C}\u{10302}';
const actual = utf16Decode(input, false);
expect(actual).toEqual(expected);
});
it(`decodes <U+004D U+0061 U+10000> from UTF-16`, () => {
// prettier-ignore
const input = new Uint8Array([
/* U+004D */ 0x00, 0x4d,
/* U+0061 */ 0x00, 0x61,
/* U+10000 */ 0xd8, 0x00, 0xdc, 0x00,
]);
const expected = '\u{004D}\u{0061}\u{10000}';
const actual = utf16Decode(input, false);
expect(actual).toEqual(expected);
});
it(`decodes <U+1F4A9 U+1F382> from UTF-16`, () => {
// prettier-ignore
const input = new Uint8Array([
/* U+1F4A9 */ 0xd8, 0x3d, 0xdc, 0xa9,
/* U+1F382 */ 0xd8, 0x3c, 0xdf, 0x82,
]);
const expected = '💩🎂';
const actual = utf16Decode(input, false);
expect(actual).toEqual(expected);
});
it(`decodes 'abcd' from UTF-16`, () => {
// prettier-ignore
const input = new Uint8Array([
/* a */ 0, 97,
/* b */ 0, 98,
/* c */ 0, 99,
/* d */ 0, 100,
]);
const expected = 'abcd';
const actual = utf16Decode(input, false);
expect(actual).toEqual(expected);
});
it(`decodes "Дмитрий Козлюк (Dmitry Kozlyuk)" from UTF-16`, () => {
const littleEndianBOM = (0xfe << 8) | 0xff;
// prettier-ignore
const input = new Uint8Array(new Uint16Array([
littleEndianBOM,
0x414, 0x43c, 0x438, 0x442, 0x440, 0x438, 0x439, 0x020, 0x41a, 0x43e,
0x437, 0x43b, 0x44e, 0x43a, 0x020, 0x028, 0x044, 0x06d, 0x069, 0x074,
0x072, 0x079, 0x020, 0x04b, 0x06f, 0x07a, 0x06c, 0x079, 0x075, 0x06b,
0x29,
]).buffer);
const expected = 'Дмитрий Козлюк (Dmitry Kozlyuk)';
const actual = utf16Decode(input, true);
expect(actual).toEqual(expected);
});
it(`decodes "ä☺𠜎️☁️" from UTF-16 (with a BOM)`, () => {
const littleEndianBOM = (0xfe << 8) | 0xff;
// prettier-ignore
const input = new Uint8Array(new Uint16Array([
littleEndianBOM,
0xe4, 0x263a, 55361, 57102, 0xfe0f, 0x2601, 0xfe0f,
]).buffer);
const expected = 'ä☺𠜎️☁️';
const actual = utf16Decode(input, true);
expect(actual).toEqual(expected);
});
it(`injects a replacement character when the input ends prematurely`, () => {
// prettier-ignore
const input = new Uint8Array([
/* U+1F4A9 */ 0xd8, 0x3d, 0xdc, 0xa9,
/* U+1F382 */ 0xd8,
]);
const expected = '💩\u{FFFD}';
const actual = utf16Decode(input, false);
expect(actual).toEqual(expected);
});
it(`injects a replacement character when the input ends with a high surrogate`, () => {
// prettier-ignore
const input = new Uint8Array([
/* U+1F4A9 */ 0xd8, 0x3d, 0xdc, 0xa9,
/* U+1F382 */ 0xd8, 0x3c,
]);
const expected = '💩\u{FFFD}';
const actual = utf16Decode(input, false);
expect(actual).toEqual(expected);
});
it(`injects a replacement character when the input ends with a low surrogate`, () => {
// prettier-ignore
const input = new Uint8Array([
/* U+1F4A9 */ 0xd8, 0x3d, 0xdc, 0xa9,
/* U+1F382 */ 0xdf, 0x82,
]);
const expected = '💩\u{FFFD}';
const actual = utf16Decode(input, false);
expect(actual).toEqual(expected);
});
it(`injects a replacement character when low surrogates precede high surrogates`, () => {
// prettier-ignore
const input = new Uint8Array([
/* U+1F4A9 */ 0xd8, 0x3d, 0xdc, 0xa9,
/* U+1F382 */ 0xdf, 0x82, 0xd8, 0x3c,
/* valid a */ 0, 97,
]);
const expected = '💩\u{FFFD}a';
const actual = utf16Decode(input, false);
expect(actual).toEqual(expected);
});
it(`injects a replacement character when high surrogates are not followed by low surrogates`, () => {
// prettier-ignore
const input = new Uint8Array([
/* valid U+1F4A9 */ 0xd8, 0x3d, 0xdc, 0xa9,
/* invalid U+1F382 */ 0xd8, 0x3c, 0x82, 0xdf,
/* valid a */ 0, 97,
]);
const expected = '💩\u{FFFD}a';
const actual = utf16Decode(input, false);
expect(actual).toEqual(expected);
});
});
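
The surrogate bytes used in the UTF-16 cases above follow from the standard surrogate-pair arithmetic. The sketch below is illustrative only: `toSurrogatePair` and `toBigEndianBytes` are hypothetical helpers, not exports of `../../src/utils`. It reproduces the code units for U+10302 and the big-endian bytes for U+1F4A9 that appear in the `utf16Decode` inputs.

```typescript
// Hypothetical helpers for illustration; not part of '../../src/utils'.

// Split a supplementary code point (U+10000..U+10FFFF) into its UTF-16
// high and low surrogate code units.
function toSurrogatePair(codePoint: number): [number, number] {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError('expected a code point in U+10000..U+10FFFF');
  }
  const offset = codePoint - 0x10000; // 20-bit value
  const high = 0xd800 + (offset >> 10); // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}

// Serialize 16-bit code units as big-endian bytes (high byte first).
function toBigEndianBytes(units: number[]): number[] {
  const bytes: number[] = [];
  for (const unit of units) bytes.push(unit >> 8, unit & 0xff);
  return bytes;
}

const pairFor10302 = toSurrogatePair(0x10302); // [0xd800, 0xdf02]
const bytesFor1F4A9 = toBigEndianBytes(toSurrogatePair(0x1f4a9)); // [0xd8, 0x3d, 0xdc, 0xa9]
```

A high surrogate with no low surrogate after it, or a low surrogate with no high surrogate before it, cannot be mapped back through this arithmetic. The last five `utf16Decode` cases cover those inputs and expect U+FFFD in their place.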