Commit Graph

311 Commits

Author SHA1 Message Date
James R. Barlow
bf96171b65 Ignore whether or not textonly_pdf was used in cache
The difference doesn't matter in 7.0.0 anymore.
2018-06-23 02:58:26 -07:00
James R. Barlow
b81daf71d1 Regenerate test cache 2018-06-23 02:02:58 -07:00
James R. Barlow
faad1fc58a Reactivate two tests that weren't using their fixtures properly 2018-06-23 01:54:09 -07:00
James R. Barlow
6f48181a56 Disable a pylint 2018-06-23 01:53:04 -07:00
James R. Barlow
807c8b0726 Trailing whitespace 2018-06-23 01:51:19 -07:00
James R. Barlow
b0dbaeafc5 Cleanup unused imports 2018-06-23 01:47:53 -07:00
James R. Barlow
2530d1791b Fix several pylint errors and warnings 2018-06-23 00:54:22 -07:00
James R. Barlow
94150f414a Remove qpdf.merge
We no longer need to merge pages this way. Much of the functionality
was there to implement page splitting without hitting ulimit which
will be fixed in qpdf > 8.0.2. The tests were expensive to run.

Also remove pytest-timeout since it breaks the Linux build.
2018-06-23 00:45:03 -07:00
James R. Barlow
76e7e8dbbb Replace several uses of str(path) with fspath(path)
Helps make it more explicit. Did not do this to tests because use of paths
is more involved there.
2018-06-22 21:00:47 -07:00
James R. Barlow
9e765ddf46 Rename _optimize to optimize.py 2018-06-22 17:51:57 -07:00
James R. Barlow
73431d9761 Remove obsolete _naive_find_text 2018-06-13 14:00:50 -07:00
James R. Barlow
45cb4525cf Remove other references to PyMuPDF 2018-06-13 01:02:53 -07:00
James R. Barlow
9608b22d34 Remove all uses of PyPDF2 except PDF/A check
Leave PDF/A check alone for now, since pikepdf has no equivalent.
2018-05-26 02:07:18 -07:00
James R. Barlow
78a686ecb4 Consider qpdf behavior on algo4 a pass
qpdf opens files with null user password, so do the same.
2018-05-25 00:33:31 -07:00
James R. Barlow
0a04a60f69 Document need for pdfinfo to be pickleable 2018-05-24 22:24:13 -07:00
James R. Barlow
68d8642988 Found out this test was extremely slow - no reason to actually use a large file 2018-05-24 22:22:51 -07:00
James R. Barlow
16f70ff054 Main changeset for pikepdf-based refactor pdfinfo 2018-05-24 22:22:01 -07:00
James R. Barlow
786a2ad65a Make optimize test do a little more 2018-05-18 17:50:39 -07:00
James R. Barlow
0c279b01a4 Fix test failure on missing JobContext 2018-05-17 01:16:58 -07:00
James R. Barlow
3b820ffa7b test_metadata: change from xfail to skipif without fitz 2018-05-17 00:14:57 -07:00
James R. Barlow
5e20d1d554 metadata: Fix failing test on __getitem__['/CreationDate'] 2018-05-16 13:46:07 -07:00
James R. Barlow
6171de41bf optimize: move a lot of image scanning code to pikepdf 2018-05-14 22:21:53 -07:00
James R. Barlow
3254315127 Update test cache 2018-05-11 12:19:50 -07:00
James R. Barlow
ca297fd26b Update tests 2018-05-11 02:33:44 -07:00
James R. Barlow
72253d09fa Add arguments to control optimization 2018-05-10 22:23:24 -07:00
James R. Barlow
24b0adfacc Merge branch 'master' into develop 2018-05-10 20:54:55 -07:00
James R. Barlow
acc6698ab3 Make XML metadata test actually work 2018-05-10 20:37:10 -07:00
James R. Barlow
606d3e6aa1 Remove tests that exercise obsolete features (tesseract, -g) 2018-05-10 20:33:32 -07:00
James R. Barlow
687a7954d6 test_main: uses leptonica 2018-05-10 19:05:31 -07:00
James R. Barlow
abed8e034e Add metadata preservation test from stash 2018-05-10 16:43:28 -07:00
James R. Barlow
b8f3ead541 Remove tesseract renderer entirely
Grafting lets us work with older Tesseract versions as if they could use
sandwich, so there is no point in keeping it. It's been deprecated for a
long time now anyway.
2018-05-10 14:06:13 -07:00
James R. Barlow
9226f8a5d1 Trap PDF/A-3 errors on old Ghostscript 2018-05-04 15:29:43 -07:00
James R. Barlow
7cf83c77ca Merge branch 'feature/pdfa3' 2018-05-03 16:45:57 -07:00
James R. Barlow
8a9f174f63 Fix XMP validation issue with /CreationDate
Related to previous validation issue. If the /CreationDate had no
timezone, Ghostscript also creates invalid metadata. Work around this.
Also fix up PDF date decoding, and transcode dates to standardize them.
2018-05-03 16:30:20 -07:00
James R. Barlow
76276f61e5 Split out rotation related tests 2018-05-01 23:51:35 -07:00
James R. Barlow
bfd26e6ec6 Tests: confirm OCR layer copied 2018-05-01 23:16:41 -07:00
James R. Barlow
b5d7e9cbb0 Fix all issues with rotations
All tests now pass
2018-05-01 22:50:20 -07:00
James R. Barlow
a9abe13185 Remove the old tesseract pdf_renderer 2018-05-01 17:31:34 -07:00
James R. Barlow
6b315e8315 Add ability to disable cache 2018-05-01 15:52:00 -07:00
James R. Barlow
2131ad4670 Fix --remove-background error on PDFs with colormapped images
It's unclear how exactly a colormapped image gets to this spot given the
tendency of other image processing tools to flatten such images, but
someone made it happen, so now we make sure the image is okay.

Closes #262
2018-04-27 17:21:01 -07:00
James R. Barlow
219fe2155b test_pageinfo: remove duplicate import 2018-04-27 17:16:42 -07:00
James R. Barlow
0934905493 Don't suppress error message from config_notfound
Since it showed up in s390x bionic
2018-04-25 21:58:18 -07:00
James R. Barlow
df87e21c85 Add support for PDF/A-3
No ability to attach files however
2018-04-20 00:06:55 -07:00
Hugo
d761d80750 Use more standard __version__ rather than PILLOW_VERSION (#257) 2018-04-19 23:35:32 -07:00
James R. Barlow
0b10db91be Fix regression: Disable Ghostscript JPEG passthrough entirely 2018-04-17 17:00:24 -07:00
James R. Barlow
1a516b2af9 Fix regression: time stamp test suite failures 2018-04-17 16:59:21 -07:00
James R. Barlow
7368399f8b Clarify license of two test files - https://github.com/jbarlow83/OCRmyPDF/issues/254 2018-04-17 11:56:36 -07:00
James R. Barlow
34c78a892a Fix list table for tests/resources
[ci skip]
2018-04-15 23:52:19 -07:00
James R. Barlow
10aa59f674 v6.1.4 fix test suite regression with Ghostscript 9.23 2018-04-12 15:16:54 -07:00
James R. Barlow
ba0535e3fb Update test cache to account for unpaper --layout none change 2018-04-12 00:48:21 -07:00