James R. Barlow
bbd263ff48
Add tests for fpdf2 renderer and font infrastructure
- Add hOCR test fixtures for Latin, Arabic, CJK, Devanagari scripts
- Add tests for fpdf2 renderer, multi-font manager, system font provider
- Add multilingual rendering tests
- Update existing tests to use fpdf2 renderer
2026-01-06 13:46:11 -08:00
James R. Barlow
a596ccf844
Raise exception if resulting PDF might appear blank due to a known issue in some PDF viewers
Fixes #1187
2023-11-09 22:33:22 -08:00
James R. Barlow
9ffb45f283
Remove public domain congress.jpg and replace with baiona_color.jpg
For reuse compliance we are phasing out public domain licenses
2023-10-24 00:54:31 -07:00
James R. Barlow
93fda0dd00
Detect and warn about Tagged PDFs
2023-10-12 01:03:09 -07:00
f-hansen
050dd1f5a8
Allow title, subject, author, and keywords to be unset with an empty string argument (#1117)
Co-authored-by: Frederick D. Hansen <frederick.hansen@gmail.com>
2023-06-20 01:07:23 -07:00
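The empty-string rule means the argument handling has to distinguish "option not given" from "option given as an empty string". A minimal sketch, assuming DocumentInfo is modeled as a plain dict keyed by PDF names; `apply_metadata_args` is a hypothetical helper, not OCRmyPDF's actual code:

```python
from typing import Optional


def apply_metadata_args(docinfo: dict, **args: Optional[str]) -> dict:
    """Return a copy of docinfo updated from command-line style arguments.

    None  -> option not given: keep the existing entry
    ""    -> option given as empty string: remove the entry
    other -> set the entry to the new value
    """
    result = dict(docinfo)
    for name, value in args.items():
        key = "/" + name.capitalize()  # e.g. "title" -> "/Title"
        if value is None:
            continue
        if value == "":
            result.pop(key, None)
        else:
            result[key] = value
    return result
```

Treating `None` as "leave alone" keeps existing metadata intact when an option is omitted, so only an explicit `""` deletes an entry.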
James R. Barlow
545cd031b0
Replace public domain graph.pdf and derivatives with licensed version
2022-08-11 01:09:00 -07:00
James R. Barlow
c5359bd990
jbig2 is from linn now
2022-08-06 15:36:20 -07:00
James R. Barlow
7f77308846
Remake palette.pdf using baiona-colormapped file
2022-08-06 15:35:47 -07:00
James R. Barlow
79db985181
Improve encryption tests; drop some public domain resources
Generate the encrypted files we need and remove special test files we retained for this.
Replace jbig2.pdf based on congress.jpg with version based on ccitt.pdf.
2022-08-06 14:37:45 -07:00
James R. Barlow
d591a3e059
resources readme: remove license and copyright info
Better to not repeat ourselves and present this info in exactly one place.
2022-08-04 03:42:22 -07:00
James R. Barlow
4b9ea40a0c
spdx: move identifiers to files that support them
If the apparent license changed, take this commit as correct.
2022-08-04 03:26:54 -07:00
James R. Barlow
4a27124eab
Simplify metadata for invalid xml in output
Removes possibly non-free resource enron1.pdf.
2020-02-12 00:07:18 -08:00
James R. Barlow
0c0d53b10f
tests: AcroForm test case did not work correctly; fixed
2019-12-30 17:50:32 -08:00
James R. Barlow
c5571388e2
Improve test coverage of _sync.py
2019-12-10 01:06:27 -08:00
James R. Barlow
5e2a7f8a56
tests: speed up several slow tests
2019-12-09 16:17:57 -08:00
James R. Barlow
0a72c12ff0
weave: add new test for link consistency
2019-05-12 03:36:33 -07:00
James R. Barlow
f34b3015b2
Prevent Ghostscript from generating invalid XMP metadata
If DocumentInfo contains NULs, Ghostscript will generate XMP with
NULs, which is not allowed. Repair DocumentInfo before Ghostscript sees it.
2019-01-04 13:20:41 -08:00
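XML 1.0 does not permit U+0000 anywhere in a document, so the repair has to happen before the values reach Ghostscript. A minimal sketch, assuming DocumentInfo is available as a plain dict; `repair_docinfo` is a hypothetical name, and this version simply drops NULs rather than substituting another character:

```python
def repair_docinfo(docinfo: dict) -> dict:
    """Return a copy of docinfo with NUL characters stripped from text values."""
    repaired = {}
    for key, value in docinfo.items():
        if isinstance(value, str):
            value = value.replace("\x00", "")  # NUL is illegal in XML 1.0
        repaired[key] = value  # non-text values pass through unchanged
    return repaired
```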
James R. Barlow
9e6b54c7ed
Add test case for Type3 fonts with no Unicode mapping
2018-11-15 21:54:26 -08:00
James R. Barlow
d3b334c10f
Test case: TrueType font without Unicode mapping
2018-11-15 16:22:53 -08:00
James R. Barlow
686207ab7f
Check for and reject Adobe LiveCycle Designer PDFs
These are the ones that display a "Please wait..." message.
Closes #296
2018-09-13 21:50:51 -07:00
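One way such files can be recognized, offered as an assumption rather than the exact check in this commit: dynamic XFA forms carry an /XFA entry in the catalog's /AcroForm dictionary and set /NeedsRendering to true in the catalog, and viewers that cannot render XFA show only the placeholder page. The sketch models the catalog as a dict; `UnsupportedInputError` and `check_not_livecycle` are hypothetical names:

```python
class UnsupportedInputError(Exception):
    """Hypothetical error raised for input PDFs we decline to process."""


def check_not_livecycle(catalog: dict) -> None:
    """Reject dynamic XFA forms, whose real content only an XFA renderer can show."""
    acroform = catalog.get("/AcroForm", {})
    is_xfa = "/XFA" in acroform
    needs_rendering = bool(catalog.get("/NeedsRendering", False))
    if is_xfa and needs_rendering:
        raise UnsupportedInputError(
            "This PDF is a dynamic XFA form (for example from Adobe LiveCycle "
            "Designer); its pages only contain a placeholder message."
        )
```

Requiring both keys is a design choice: static XFA forms also carry ordinary page content, so they may still be usable as input.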
James R. Barlow
795019b0c1
Work around invalid TOC entries
Kodak Capture Desktop and probably other software creates a
/Outlines entry with /First being set to an invalid indirect reference to
an object that hasn't been created. This is legal in the PDF spec but
problematic for qpdf. The objgen will be (max valid object ID + 1, 0).
Because we create new objects in _weave, some TOC entries will end
up assigned to new objects we create. Typically /ProcSet.
We solve the issue by refactoring page traversal and then doing it
twice, once to resolve all references (eliminating the null
reference problem) and a second pass to make our changes.
2018-09-11 14:44:16 -07:00
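The ordering problem can be modeled with a toy object table. In this sketch (hypothetical `Ref`, `ObjectTable`, and `resolve_references`; generation numbers are ignored), allocating a new object before resolving the outline lets the dangling /First reference silently point at the new object, while resolving everything first turns it into None:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Indirect reference by object number (generation numbers ignored)."""
    objid: int


class ObjectTable:
    def __init__(self, objects: dict):
        self.objects = dict(objects)

    def resolve(self, value):
        """Follow a Ref; a dangling reference resolves to None (PDF null)."""
        if isinstance(value, Ref):
            return self.objects.get(value.objid)
        return value

    def new_object(self, value) -> Ref:
        """Allocate the next unused object number, as a PDF writer typically would."""
        objid = max(self.objects) + 1
        self.objects[objid] = value
        return Ref(objid)


def resolve_references(table: ObjectTable, node: dict) -> dict:
    """Pass 1: copy node with each direct Ref value replaced by its target."""
    return {key: table.resolve(value) for key, value in node.items()}


# /First points one past the highest object number, at an object never written.
objects = {1: {"/Type": "/Catalog"}, 2: {"/Type": "/Outlines", "/First": Ref(3)}}

naive = ObjectTable(objects)
naive.new_object(["/PDF", "/Text"])                # takes object number 3
wrong = naive.resolve(naive.objects[2]["/First"])  # now the ProcSet array

two_pass = ObjectTable(objects)
outlines = resolve_references(two_pass, two_pass.objects[2])  # pass 1: resolve
two_pass.new_object(["/PDF", "/Text"])                        # pass 2: modify
```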
James R. Barlow
c171cb7286
Merge img2pdf 0.3.0 fix from v6.2.3
2018-08-01 15:17:33 -07:00
James R. Barlow
1d09061130
Revert previous commit and reject input images with alpha channel
Decided on this for simplicity of old release branch.
Modifies baiona.png by stripping alpha, and adds baiona_alpha, which includes the alpha.
2018-07-31 23:45:28 -07:00
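Detecting an alpha channel in a PNG does not require decoding pixels. A minimal standard-library sketch, assuming well-formed input: it reads the color type from the IHDR chunk (4 = grayscale with alpha, 6 = RGBA) and also looks for a tRNS chunk, which adds transparency to other color types and must precede the first IDAT. `png_has_alpha` is a hypothetical helper, not the check used in this commit, and it does not verify chunk CRCs:

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_has_alpha(data: bytes) -> bool:
    """Return True if a PNG declares an alpha channel or a tRNS transparency chunk."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    # IHDR must be the first chunk: 4-byte length, b"IHDR", then 13 bytes of data.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("missing IHDR chunk")
    color_type = data[16 + 9]  # width(4) height(4) bit depth(1) color type(1)
    if color_type in (4, 6):
        return True
    # Walk chunks (length + type + data + CRC) until the first IDAT.
    pos = 8
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        if chunk_type == b"tRNS":
            return True
        if chunk_type == b"IDAT":
            break
        pos += 12 + length
    return False
```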
James R. Barlow
ed8ff79e10
Optimize some of our bigger test files
Only partially optimize multipage.pdf so that it hopefully
improves speed of test suite without being useless as an
optimization test.
2018-06-29 00:35:49 -07:00
James R. Barlow
9637696a54
Fix test resources naming inconsistency
2018-06-28 23:37:14 -07:00
James R. Barlow
02b3ca6862
Compress test images more heavily
2018-06-28 21:40:12 -07:00
James R. Barlow
2131ad4670
Fix --remove-background error on PDFs with colormapped images
It's unclear how exactly a colormapped image gets to this spot, given the tendency of other image processing tools to flatten such images, but someone made it happen, so now we make sure the image is okay.
Closes #262
2018-04-27 17:21:01 -07:00
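Flattening a colormapped image means replacing each palette index with the RGB triple it names. A standard-library sketch, assuming the indices are already decoded and the palette is packed as consecutive R, G, B bytes (as in an /Indexed color space over DeviceRGB); `expand_palette` is hypothetical, and with Pillow the usual equivalent is `Image.convert("RGB")`:

```python
def expand_palette(indices: list[int], palette: bytes) -> list[tuple[int, int, int]]:
    """Map each palette index to its (R, G, B) entry in a packed RGB palette."""
    if len(palette) % 3:
        raise ValueError("palette length must be a multiple of 3")
    entries = [tuple(palette[i:i + 3]) for i in range(0, len(palette), 3)]
    # An index past the end of the palette raises IndexError here.
    return [entries[i] for i in indices]
```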
James R. Barlow
7368399f8b
Clarify license of two test files - https://github.com/jbarlow83/OCRmyPDF/issues/254
2018-04-17 11:56:36 -07:00
James R. Barlow
34c78a892a
Fix list table for tests/resources
[ci skip]
2018-04-15 23:52:19 -07:00
James R. Barlow
4f6bffb477
Update copyrights
2018-03-31 11:54:38 -07:00
James R. Barlow
45dbff6401
Fix table of contents not preserved in PDF/A
2018-03-26 02:23:19 -07:00
James R. Barlow
6756016572
Add license notice to all files
Source files to GPL3
Exceptions:
- tests/spoof/* to MIT
- hocrtransform.py
- _unicodefun.py
Test resources to CC BY-SA 4.0 except when otherwise noted.
Add GPL license.
2018-03-24 02:33:24 -07:00
James R. Barlow
74ca736333
Issue #223: improve text of encrypted PDF error message
2018-02-27 15:08:22 -08:00
James R. Barlow
a9da839c39
Add vector-only PDF test case
2018-02-08 00:17:35 -08:00
James R. Barlow
3a167af2c4
Nearly smallest possible PDF-1.3 with all required fields
2017-11-26 23:32:21 -08:00
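For reference, a sketch of what such a file contains, generated rather than hand-written: a catalog, a page tree with one page (including the inheritable /MediaBox and /Resources), a cross-reference table whose 20-byte entries must hold exact byte offsets, and a trailer. This is an illustration under those assumptions, not the test resource itself; `minimal_pdf` is a hypothetical name:

```python
def minimal_pdf() -> bytes:
    """Build a one-page, empty PDF-1.3 with a correct xref table."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>",
    ]
    out = bytearray(b"%PDF-1.3\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj"
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # object 0 heads the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```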
James R. Barlow
965de3a235
Test case for issue #200
2017-11-26 22:52:53 -08:00
James R. Barlow
34fc1f5fd7
Add reminder that blank.pdf is not trivial
2017-09-13 01:19:18 -07:00
James R. Barlow
d04e43d46d
Update copyright info for test files
[ci skip]
2017-09-01 01:00:32 -07:00
James R. Barlow
52483072dc
Add a differential test that checks tesseract uses supplied word list
2017-07-21 16:40:20 -07:00
James R. Barlow
4b5cd420e1
Add new test file
2017-05-29 12:16:08 -07:00
James R. Barlow
21982cf1cb
baiona_gray remove alpha channel
2017-05-11 23:23:37 -07:00
James R. Barlow
edc01408da
Update the .png files, again, hopefully without corruption
2017-05-11 23:20:50 -07:00
James R. Barlow
bf04f03c4c
Fix corrupt test file “typewriter.png”
This file is not currently used in any tests, but could be, so replace
corrupt version with a useful one.
2017-05-06 22:28:34 -07:00
James R. Barlow
93e802f473
Fix issue #163: color and grayscale images JPEG compressed when not needed
2017-05-06 22:27:25 -07:00
James R. Barlow
aa859a4139
Fix #156 - NoneType has no ‘getObject’ for pages with no /Contents
2017-05-01 15:46:15 -07:00
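/Contents is optional on a page object (a page without it draws nothing), and when present it may be a single stream or an array of streams. A sketch of normalizing all three cases, with pages modeled as dicts; `page_content_streams` is a hypothetical helper:

```python
def page_content_streams(page: dict) -> list:
    """Return the page's content streams as a list, tolerating a missing /Contents."""
    contents = page.get("/Contents")
    if contents is None:
        return []  # valid PDF: an empty page
    if isinstance(contents, list):
        return contents  # array of streams
    return [contents]  # single stream
```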
James R. Barlow
d1a0065ef8
Create test case for Form XObjects
2017-02-14 12:51:15 -08:00
James R. Barlow
1976dc6f30
Fix issue #121 “pop from empty list” (content stream parsing error)
2017-01-26 17:24:40 -08:00
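That error message is typical of a stack-based parser in which an operator finds fewer operands than it expects. A toy sketch of a more defensive approach: collect the operands seen since the previous operator and skip any operator whose operand count is wrong. It assumes whitespace-separated tokens, treats any unknown token as an operand, and knows only a handful of operators; a real content stream needs a proper lexer for strings, arrays, and inline images. `parse_operations` and `OPERATOR_ARITY` are hypothetical:

```python
# Operand counts for a few content-stream operators (deliberately incomplete).
OPERATOR_ARITY = {"q": 0, "Q": 0, "cm": 6, "re": 4, "f": 0, "Do": 1}


def parse_operations(content: str) -> list[tuple[str, list[str]]]:
    """Group whitespace-separated tokens into (operator, operands) pairs."""
    operations = []
    operands: list[str] = []
    for token in content.split():
        if token not in OPERATOR_ARITY:
            operands.append(token)  # anything unknown is treated as an operand
            continue
        if len(operands) == OPERATOR_ARITY[token]:
            operations.append((token, operands))
        # A wrong operand count means a malformed stream: drop the operator
        # rather than popping from an empty stack.
        operands = []
    return operations
```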
James R. Barlow
097a69d07f
pageinfo: fix “decimal.InvalidOperation: quantize result has too many digits”
And add new test case for this.
2016-12-08 16:04:14 -08:00
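Python's decimal module raises this when the result of `quantize` would need more digits than the context precision (28 by default), for example when quantizing a very large value to two decimal places. A sketch of one workaround, assuming that raising the precision locally is acceptable; `quantize_safely` is a hypothetical helper and may differ from the fix in this commit:

```python
from decimal import Decimal, InvalidOperation, localcontext


def quantize_safely(value: Decimal, exp: Decimal) -> Decimal:
    """Quantize value to the exponent of exp, raising precision only if needed."""
    try:
        return value.quantize(exp)
    except InvalidOperation:
        # Digits required: integer digits of value plus fractional digits of exp,
        # with one spare digit in case rounding carries (e.g. 9.999 -> 10.00).
        needed = value.adjusted() + 1 - exp.as_tuple().exponent + 1
        with localcontext() as ctx:
            ctx.prec = max(ctx.prec, needed)
            return value.quantize(exp)  # still raises for NaN or similar input
```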
James R. Barlow
949d2ff1c2
v4.3.1 release notes
2016-11-07 14:36:08 -08:00
James R. Barlow
cc9c0d819e
Add test case for documents that get rotated incorrectly after deskew
2016-11-07 14:15:03 -08:00