91 Commits

Author SHA1 Message Date
James R. Barlow
bbd263ff48 Add tests for fpdf2 renderer and font infrastructure
- Add hOCR test fixtures for Latin, Arabic, CJK, Devanagari scripts
- Add tests for fpdf2 renderer, multi-font manager, system font provider
- Add multilingual rendering tests
- Update existing tests to use fpdf2 renderer
2026-01-06 13:46:11 -08:00
James R. Barlow
a596ccf844 Raise exception if resulting PDF might appear blank in some PDF viewers
Fixes #1187
2023-11-09 22:33:22 -08:00
James R. Barlow
9ffb45f283 Remove public domain congress.jpg and replace with baiona_color.jpg
For reuse compliance we are phasing out public domain licenses
2023-10-24 00:54:31 -07:00
James R. Barlow
93fda0dd00 Detect and warn about Tagged PDFs 2023-10-12 01:03:09 -07:00
f-hansen
050dd1f5a8 Allow title, subject, author, and keywords to be unset with an empty string argument (#1117)
Co-authored-by: Frederick D. Hansen <frederick.hansen@gmail.com>
2023-06-20 01:07:23 -07:00
James R. Barlow
545cd031b0 Replace public domain graph.pdf and derivatives with licensed version 2022-08-11 01:09:00 -07:00
James R. Barlow
c5359bd990 jbig2 is from linn now 2022-08-06 15:36:20 -07:00
James R. Barlow
7f77308846 Remake palette.pdf using baiona-colormapped file 2022-08-06 15:35:47 -07:00
James R. Barlow
79db985181 Improve encryption tests; drop some public domain resources
Generate the encrypted files we need and remove special test files we retained for this.

Replace jbig2.pdf based on congress.jpg with version based on ccitt.pdf.
2022-08-06 14:37:45 -07:00
James R. Barlow
d591a3e059 resources readme: remove license and copyright info
Better to not repeat ourselves and present this info in one place.
2022-08-04 03:42:22 -07:00
James R. Barlow
4b9ea40a0c spdx: move identifiers to files that support them
If the apparent license changed, take this commit as correct.
2022-08-04 03:26:54 -07:00
James R. Barlow
4a27124eab Simplify metadata for invalid xml in output
Removes possibly non-free resource enron1.pdf.
2020-02-12 00:07:18 -08:00
James R. Barlow
0c0d53b10f tests: AcroForm test case did not work correctly; fixed 2019-12-30 17:50:32 -08:00
James R. Barlow
c5571388e2 Improve test coverage of _sync.py 2019-12-10 01:06:27 -08:00
James R. Barlow
5e2a7f8a56 tests: speed up several slow tests 2019-12-09 16:17:57 -08:00
James R. Barlow
0a72c12ff0 weave: add new test for link consistency 2019-05-12 03:36:33 -07:00
James R. Barlow
f34b3015b2 Prevent Ghostscript from generating invalid XMP metadata
If DocumentInfo contains NULs Ghostscript will generate XMP with
NULs which is not allowed. Repair DocumentInfo before Ghostscript sees it.
2019-01-04 13:20:41 -08:00
James R. Barlow
9e6b54c7ed Add test case for Type3 fonts with no Unicode mapping 2018-11-15 21:54:26 -08:00
James R. Barlow
d3b334c10f Test case: TrueType font without Unicode mapping 2018-11-15 16:22:53 -08:00
James R. Barlow
686207ab7f Check for and reject Adobe LiveCycle Designer PDFs
These are the ones that display a "Please wait..." message.

Closes #296
2018-09-13 21:50:51 -07:00
James R. Barlow
795019b0c1 Work around invalid TOC entries
Kodak Capture Desktop and probably other software creates a
/Outlines entry with /First being set to an invalid indirect reference to
an object that hasn't been created. This is legal in the PDF spec but
problematic for qpdf. The objgen will be (max valid object ID + 1, 0).
Because we create new objects in _weave, some TOC entries will end
up assigned to new objects we create. Typically /ProcSet.

We solve the issue by refactoring page traversal and then doing it
twice, once to resolve all references (eliminating the null
reference problem) and a second pass to make our changes.
2018-09-11 14:44:16 -07:00
James R. Barlow
c171cb7286 Merge img2pdf 0.3.0 fix from v6.2.3 2018-08-01 15:17:33 -07:00
James R. Barlow
1d09061130 Revert previous commit and reject input images with alpha channel
Decided on this for simplicity of old release branch.

Modifies baiona.png by stripping alpha, adds baiona_alpha which
includes the alpha.
2018-07-31 23:45:28 -07:00
James R. Barlow
ed8ff79e10 Optimize some of our bigger test files
Only partially optimize multipage.pdf so that it hopefully
improves speed of test suite without being useless as an
optimization test.
2018-06-29 00:35:49 -07:00
James R. Barlow
9637696a54 Fix test resources naming inconsistency 2018-06-28 23:37:14 -07:00
James R. Barlow
02b3ca6862 Compress test images more heavily 2018-06-28 21:40:12 -07:00
James R. Barlow
2131ad4670 Fix --remove-background error on PDFs with colormapped images
It's unclear how exactly a colormapped image gets to this spot given
the tendency of other image processing tools to flatten such images,
but someone made it happen, so now we make sure the image is okay.

Closes #262
2018-04-27 17:21:01 -07:00
James R. Barlow
7368399f8b Clarify license of two test files - https://github.com/jbarlow83/OCRmyPDF/issues/254 2018-04-17 11:56:36 -07:00
James R. Barlow
34c78a892a Fix list table for tests/resources
[ci skip]
2018-04-15 23:52:19 -07:00
James R. Barlow
4f6bffb477 Update copyrights 2018-03-31 11:54:38 -07:00
James R. Barlow
45dbff6401 Fix table of contents not preserved in PDF/A 2018-03-26 02:23:19 -07:00
James R. Barlow
6756016572 Add license notice to all files
Source files to GPL3

Exceptions:
-tests/spoof/* to MIT
-hocrtransform.py
-_unicodefun.py

Test resources to CC BY-SA 4.0 except when otherwise noted.

Add GPL license.
2018-03-24 02:33:24 -07:00
James R. Barlow
74ca736333 Issue #223: improve text of encrypted PDF error message 2018-02-27 15:08:22 -08:00
James R. Barlow
a9da839c39 Add vector-only PDF test case 2018-02-08 00:17:35 -08:00
James R. Barlow
3a167af2c4 Nearly smallest possible PDF-1.3 with all required fields 2017-11-26 23:32:21 -08:00
James R. Barlow
965de3a235 Test case for issue #200 2017-11-26 22:52:53 -08:00
James R. Barlow
34fc1f5fd7 Add reminder that blank.pdf is not trivial 2017-09-13 01:19:18 -07:00
James R. Barlow
d04e43d46d Update copyright info for test files
[ci skip]
2017-09-01 01:00:32 -07:00
James R. Barlow
52483072dc Add a differential test that checks tesseract uses supplied word list 2017-07-21 16:40:20 -07:00
James R. Barlow
4b5cd420e1 Add new test file 2017-05-29 12:16:08 -07:00
James R. Barlow
21982cf1cb baiona_gray remove alpha channel 2017-05-11 23:23:37 -07:00
James R. Barlow
edc01408da Update the .png files, again, hopefully without corruption 2017-05-11 23:20:50 -07:00
James R. Barlow
bf04f03c4c Fix corrupt test file “typewriter.png”
This file is not currently used in any tests, but could be, so replace
corrupt version with a useful one.
2017-05-06 22:28:34 -07:00
James R. Barlow
93e802f473 Fix issue #163, color and grayscale images JPEG compressed when not needed 2017-05-06 22:27:25 -07:00
James R. Barlow
aa859a4139 Fix #156 - NoneType has no ‘getObject’ for pages with no /Contents 2017-05-01 15:46:15 -07:00
James R. Barlow
d1a0065ef8 Create test case for Form XObjects 2017-02-14 12:51:15 -08:00
James R. Barlow
1976dc6f30 Fix issue #121 “pop from empty list” (content stream parsing error) 2017-01-26 17:24:40 -08:00
James R. Barlow
097a69d07f pageinfo: fix “decimal.InvalidOperation: quantize result has too many digits”
And add new test case for this.
2016-12-08 16:04:14 -08:00
James R. Barlow
949d2ff1c2 v4.3.1 release notes 2016-11-07 14:36:08 -08:00
James R. Barlow
cc9c0d819e Add test case for documents that get rotated incorrectly after deskew 2016-11-07 14:15:03 -08:00