James R. Barlow
ed8ff79e10
Optimize some of our bigger test files
...
Only partially optimize multipage.pdf so that it hopefully
improves speed of test suite without being useless as an
optimization test.
2018-06-29 00:35:49 -07:00
James R. Barlow
9637696a54
Fix test resources naming inconsistency
2018-06-28 23:37:14 -07:00
James R. Barlow
02b3ca6862
Compress test images more heavily
2018-06-28 21:40:12 -07:00
James R. Barlow
2131ad4670
Fix --remove-background error on PDFs with colormapped images
...
It's unclear how exactly a colormapped image gets to this spot given
the tendency of other image processing tools to flatten such images,
but someone made it happen, so now we make sure the image is okay.
Closes #262
2018-04-27 17:21:01 -07:00
James R. Barlow
7368399f8b
Clarify license of two test files - https://github.com/jbarlow83/OCRmyPDF/issues/254
2018-04-17 11:56:36 -07:00
James R. Barlow
34c78a892a
Fix list table for tests/resources
...
[ci skip]
2018-04-15 23:52:19 -07:00
James R. Barlow
4f6bffb477
Update copyrights
2018-03-31 11:54:38 -07:00
James R. Barlow
45dbff6401
Fix table of contents not preserved in PDF/A
2018-03-26 02:23:19 -07:00
James R. Barlow
6756016572
Add license notice to all files
...
Source files to GPL3
Exceptions:
-tests/spoof/* to MIT
-hocrtransform.py
-_unicodefun.py
Test resources to CC BY-SA 4.0 except when otherwise noted.
Add GPL license.
2018-03-24 02:33:24 -07:00
James R. Barlow
74ca736333
Issue #223: improve text of encrypted PDF error message
2018-02-27 15:08:22 -08:00
James R. Barlow
a9da839c39
Add vector-only PDF test case
2018-02-08 00:17:35 -08:00
James R. Barlow
3a167af2c4
Nearly smallest possible PDF-1.3 with all required fields
2017-11-26 23:32:21 -08:00
James R. Barlow
965de3a235
Test case for issue #200
2017-11-26 22:52:53 -08:00
James R. Barlow
34fc1f5fd7
Add reminder that blank.pdf is not trivial
2017-09-13 01:19:18 -07:00
James R. Barlow
d04e43d46d
Update copyright info for test files
...
[ci skip]
2017-09-01 01:00:32 -07:00
James R. Barlow
52483072dc
Add a differential test that checks tesseract uses supplied word list
2017-07-21 16:40:20 -07:00
James R. Barlow
4b5cd420e1
Add new test file
2017-05-29 12:16:08 -07:00
James R. Barlow
21982cf1cb
baiona_gray: remove alpha channel
2017-05-11 23:23:37 -07:00
James R. Barlow
edc01408da
Update the .png files, again, hopefully without corruption
2017-05-11 23:20:50 -07:00
James R. Barlow
bf04f03c4c
Fix corrupt test file “typewriter.png”
...
This file is not currently used in any tests, but could be, so replace
corrupt version with a useful one.
2017-05-06 22:28:34 -07:00
James R. Barlow
93e802f473
Fix issue #163, color and grayscale images JPEG compressed when not needed
2017-05-06 22:27:25 -07:00
James R. Barlow
aa859a4139
Fix #156 - NoneType has no ‘getObject’ for pages with no /Contents
2017-05-01 15:46:15 -07:00
James R. Barlow
d1a0065ef8
Create test case for Form XObjects
2017-02-14 12:51:15 -08:00
James R. Barlow
1976dc6f30
Fix issue #121 “pop from empty list” (content stream parsing error)
2017-01-26 17:24:40 -08:00
James R. Barlow
097a69d07f
pageinfo: fix “decimal.InvalidOperation: quantize result has too many digits”
...
And add new test case for this.
2016-12-08 16:04:14 -08:00
James R. Barlow
949d2ff1c2
v4.3.1 release notes
2016-11-07 14:36:08 -08:00
James R. Barlow
cc9c0d819e
Add test case for documents that get rotated incorrectly after deskew
2016-11-07 14:15:03 -08:00
James R. Barlow
fdd9b8b8ce
Optimize some of the test resources to reduce file sizes
...
Mostly by reducing RGB -> monochrome and applying JBIG2 compression
2016-11-07 14:01:23 -08:00
James R. Barlow
a86805f0d9
Remove possibly non-free page from "multipage.pdf"
2016-10-27 15:56:43 -07:00
James R. Barlow
013c5a369f
Replace redacted file with an OCR-able file
2016-10-07 12:45:22 -07:00
James R. Barlow
6baf8668a6
Replace non-free file milk.pdf with free equivalent
2016-10-06 13:10:28 -07:00
James R. Barlow
4ba2962c56
Comment on non-free files
2016-10-05 16:48:16 -07:00
James R. Barlow
4dad09cc91
resources/README: replace the other large table with a list table
2016-10-05 16:38:51 -07:00
James R. Barlow
825c0f8b2a
Note that milk.pdf is non-free, start using list-tables
2016-09-10 14:44:00 -07:00
James R. Barlow
9ca29c787b
Update description of masks.pdf to reflect what it actually tests
2016-09-01 21:21:14 -07:00
James R. Barlow
bf89e38c69
Add milk.pdf test case
2016-08-31 11:42:21 -07:00
James R. Barlow
d25397e2b0
Add test case for PDFs with masks and stencil masks
2016-08-26 15:03:27 -07:00
James R. Barlow
fef35e4eb2
Fix handling of DPI for rare case of JPEG recompression after deskew/clean
...
This test is exercised by page 4 of multipage.pdf. If all images are
JPEGs, and one of deskew/clean removes DPI information, make sure that
we can get the right information back and that the DPI stays square.
2016-07-29 01:34:52 -07:00
James R. Barlow
8f77576dc4
Fix non-square image resolution for "hocr" case; use img2pdf 0.2.1
...
Tesseract renderer not immediately fixable.
2016-07-28 16:43:51 -07:00
jbarlow83
1bacf35a2c
Update license information for encrypted_algo4.pdf
2016-06-24 14:25:15 -07:00
James R. Barlow
b4a734fc0d
Test case for "algorithm 4" test
...
Algorithm 4 -> PDF version 1.6
2016-06-23 13:21:26 -07:00
James R. Barlow
f3e06b2dbd
Add bookmarks to file for more testing
2016-02-29 00:05:07 -08:00
James R. Barlow
323b9a5f8e
Add other missing files
2016-02-20 05:34:21 -08:00
James R. Barlow
cab381a339
Add JPEG 2000 test case
2016-02-20 05:13:19 -08:00
James R. Barlow
8246cc0538
Gracefully recover from tesseract's failure to process very large images
...
And test cases to check this
2016-02-20 04:53:23 -08:00
James R. Barlow
812fd745b6
Remove redundant line from resources
2016-02-16 14:29:56 -08:00
James R. Barlow
1224af1780
Update test resources to address files with unknown source
...
-Remove Test_Issue_28.pdf (inherited from fritz-hh, source unknown)
-Replace missing_docinfo.pdf (received from user, but it's a printout of
a website; unclear status, so created a new PDF with the same effect)
-Others are okay
2016-02-16 00:28:28 -08:00
James R. Barlow
265d2ce39b
Better skewed image
2016-02-08 23:44:46 -08:00
James R. Barlow
3569c76c0f
Also include cardinal.pdf
2016-02-08 15:23:04 -08:00
James R. Barlow
9058dedfbe
New tests for ccitt, jbig2 encodings
2016-01-19 13:01:56 -08:00