412 Commits

Author SHA1 Message Date
James R. Barlow
7566d4b768 Introduce plugins/filters 2019-05-27 16:55:04 -07:00
James R. Barlow
5c4c32ab3c Remove multiprocessing tests - no longer valid 2019-05-27 12:07:20 -07:00
James R. Barlow
c14f62752b Tests: add an API test 2019-05-25 16:24:09 -07:00
James R. Barlow
5cecb3ecb4 Convert one test to use API 2019-05-22 23:53:48 -07:00
James R. Barlow
32a076c039 Refactor validation and exceptions
CLI now tracks check_options exceptions. API now works more like
an API, without an exception handler,
because the caller should provide one.
2019-05-20 18:01:17 -07:00
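A minimal sketch of the split this commit message describes, where the API raises and only the CLI wrapper catches. Only the name `check_options` comes from the message. `BadOptionsError`, `api_run`, `cli_main`, the `jobs` option, and the exit code are hypothetical names chosen for illustration, not the project's actual code.

```python
class BadOptionsError(Exception):
    """Hypothetical error type raised when options fail validation."""


def check_options(options):
    # Hypothetical validation rule, for illustration only.
    if options.get("jobs", 1) < 1:
        raise BadOptionsError("jobs must be at least 1")


def api_run(options):
    # API entry point: no exception handler here, so errors propagate
    # to the caller, who decides how to handle them.
    check_options(options)
    return "ok"


def cli_main(options):
    # CLI entry point: catches validation errors and maps them to an
    # exit code (the value 2 is an arbitrary choice for this sketch).
    try:
        api_run(options)
    except BadOptionsError as exc:
        print(f"error: {exc}")
        return 2
    return 0
```

With this layout, a program embedding the API can wrap `api_run` in its own `try`/`except`, while command-line users still get a message and a nonzero exit status.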
James R. Barlow
ef1ef1cdf0 Fix test invalidated by Python 3.6 logging fixes 2019-05-17 15:20:07 -07:00
James R. Barlow
4340ad9f12 Update test cache 2019-05-17 01:45:06 -07:00
James R. Barlow
8df1ea2754 Mark some slow tests 2019-05-17 01:42:27 -07:00
James R. Barlow
e528adc603 pylint removal 2019-05-17 01:09:06 -07:00
James R. Barlow
13ab23ba54 Refactor weave_layers, introduce progress bar
Fixes a bug in this branch where --sidecar would fail by trying to iterate
the executor futures twice.
2019-05-16 14:57:31 -07:00
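The class of bug mentioned in this message can be sketched with the standard library alone. This is not the project's code: `square`, `consume_twice`, and `consume_once` are hypothetical, and `executor.map` stands in for whatever iterator of results the real code used. The point is only that such an iterator is exhausted after one pass.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    return n * n


def consume_twice():
    # Buggy pattern: the lazy result iterator is walked twice.
    with ThreadPoolExecutor(max_workers=2) as executor:
        results = executor.map(square, range(4))
        first_pass = list(results)
        second_pass = list(results)  # iterator is already exhausted
    return first_pass, second_pass


def consume_once():
    # Fixed pattern: materialize the results once, then reuse the list.
    with ThreadPoolExecutor(max_workers=2) as executor:
        results = list(executor.map(square, range(4)))
    return results, list(results)
```

In `consume_twice` the second pass comes back empty, while `consume_once` can reuse the stored list as often as needed.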
James R. Barlow
5e025c3382 Reinstate log level in messages to be closer to old behavior 2019-05-15 15:46:36 -07:00
James R. Barlow
486f73d5d6 Remove custom logger 2019-05-15 02:28:13 -07:00
James R. Barlow
c904b430b6 Merge master into api branch; all tests pass 2019-05-14 16:33:02 -07:00
James R. Barlow
0a72c12ff0 weave: add new test for link consistency 2019-05-12 03:36:33 -07:00
James R. Barlow
482cb788ed Don't use MagicMock() as a dummy logger in pytest 2019-05-11 12:44:17 -07:00
James R. Barlow
15a988b999 weave: use emplacement method, scrap TOC repair
The new emplacement method updates page objects in place without
generating new objgen numbers, meaning we no longer need to update the table
of contents to preserve links.
2019-05-11 12:40:25 -07:00
James R. Barlow
bcdd196699 ghostscript: remove unnecessary post-render resizing step 2019-05-11 12:10:50 -07:00
James R. Barlow
58c29ffb5c weave: use explicit pdf.close(), drastically reduce open file handles
With the new pikepdf 1.2.0 we no longer need to hold file handles
open because of the "copy to memory" functionality. We retain
the behavior of closing/reopening the output PDF every 100 pages as
a way to limit memory usage.
2019-04-18 15:12:48 -07:00
mawi
c92ccc6134 fix: tests 2019-04-08 14:57:42 +02:00
mawi
783a128bd1 feat: move to sync (non-ETL) implementation - remove ruffus 2019-04-04 21:02:38 +02:00
Martin Wind
a4667b5656 refactor: move ruffus related code to one file 2019-03-28 20:16:10 +01:00
Martin Wind
f65a3d3762 fix import in unpaper test 2019-03-26 10:04:26 +01:00
James R. Barlow
427afc0616 Fix LeptonicaErrorTrap when a sys.stderr.fileno() is not available
The LeptonicaErrorTrap was problematic for Celery and other
libraries that mess with stderr.

Closes #359
2019-03-17 14:22:36 -07:00
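As an illustration of the failure mode named in this message, the sketch below probes a stream for a usable file descriptor and falls back to `None` when there is none. The helper name `fileno_or_none` is hypothetical, and this is not how LeptonicaErrorTrap itself is implemented. It only shows that a replaced `sys.stderr` may lack a working `fileno()`.

```python
import io


def fileno_or_none(stream):
    """Return the stream's file descriptor, or None if it has none."""
    try:
        return stream.fileno()
    except (AttributeError, io.UnsupportedOperation, ValueError):
        # AttributeError: the replacement object has no fileno() at all.
        # io.UnsupportedOperation: in-memory streams such as io.StringIO.
        # ValueError: the stream has already been closed.
        return None


class FakeStderr:
    """Hypothetical stand-in for a stderr replaced by another library."""

    def write(self, text):
        return len(text)
```

Code that redirects output at the file-descriptor level can check for `None` first and skip the redirection instead of failing.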
James R. Barlow
486dc7e22c Fix some test failures missed in prev commit 2019-03-06 13:28:50 -08:00
James R. Barlow
dc616bb507 Fix test suite so --clean is not requested when unpaper is not installed 2019-03-05 22:33:13 -08:00
James R. Barlow
5da26e4c9c Convert most uses of subprocess.Popen to subprocess.run in test suite 2019-03-05 22:25:22 -08:00
James R. Barlow
a27ee3ee8c optimize: use Decode to invert 1bpp PNGs for now 2019-03-03 17:50:12 -08:00
James R. Barlow
58e6663806 Update test cache for french->german change 2019-03-03 03:23:59 -08:00
James R. Barlow
3f1d9ef99c Fix tests for move to Alpine dockerfile 2019-02-26 12:30:21 -08:00
James R. Barlow
19e35db2b7 Fix issue when weave handoff occurs with no OCR font present
If using --tesseract-timeout 0 and any image processing on a file with
more than 100 pages, the weave handoff will occur. Ensure this
works correctly even if no Glyphless font is present.

Closes #347
2019-02-10 02:05:59 -08:00
James R. Barlow
df688742d5 Fix exception on traversing corrupt ToC entries 2019-02-10 00:50:21 -08:00
James R. Barlow
f095e91cb4 unpaper-args: add test case and harden feature 2019-02-07 16:21:02 -08:00
James R. Barlow
f34b3015b2 Prevent Ghostscript from generating invalid XMP metadata
If DocumentInfo contains NULs Ghostscript will generate XMP with
NULs which is not allowed. Repair DocumentInfo before Ghostscript sees it.
2019-01-04 13:20:41 -08:00
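The repair described in this message amounts to removing NUL characters from metadata values before they are handed on. A minimal sketch, assuming the metadata is available as a plain dict of strings; the helper `strip_nuls` and the sample keys are hypothetical and do not reflect the project's actual data structures.

```python
def strip_nuls(docinfo):
    """Return a copy of docinfo with NUL characters removed from string values."""
    cleaned = {}
    for key, value in docinfo.items():
        if isinstance(value, str):
            cleaned[key] = value.replace("\x00", "")
        else:
            # Leave non-string values untouched.
            cleaned[key] = value
    return cleaned


# Hypothetical sample metadata containing embedded NULs.
sample = {"/Title": "Re\x00port", "/Author": "A\x00", "/Pages": 3}
repaired = strip_nuls(sample)
```

The function returns a new dict, so the original metadata stays available for comparison or logging.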
James R. Barlow
7d330afd81 Delinting 2019-01-02 13:34:45 -08:00
James R. Barlow
c771938907 Convert to f-strings where it makes sense 2018-12-31 15:01:19 -08:00
James R. Barlow
8c0009c5c8 Make pdfminer.six optional
Mainly since the current release of pdfminer.six lacks a sdist, blocking
homebrew packaging. Also in case other distros don't accept pdfminer.six.
2018-12-31 01:08:43 -08:00
James R. Barlow
cfc5cdf47d pdfa: remove a pile of deprecated code
It's now handled in pikepdf.
2018-12-31 00:05:13 -08:00
James R. Barlow
0880b16491 Sort imports with isort 2018-12-30 01:28:15 -08:00
James R. Barlow
06308a22ce Reformat with black 2018-12-30 01:27:49 -08:00
James R. Barlow
80bd7de580 Generate test cache 2018-12-30 01:02:37 -08:00
James R. Barlow
8b90c45437 Drop support for Tesseract 3 2018-12-30 00:47:12 -08:00
James R. Barlow
72b920eb16 Drop support for Python 3.5 2018-12-30 00:23:26 -08:00
James R. Barlow
b4a51907d6 Detect when metadata is dropped during PDF/A conversion 2018-12-30 00:13:25 -08:00
James R. Barlow
13d20bd993 pdfinfo: tolerate PDFs that overflow and underflow the graphics stack 2018-12-15 15:10:29 -08:00
James R. Barlow
ed9bb985e2 Fix pikepdf 0.9.0 2018-12-14 23:21:13 -08:00
James R. Barlow
632dab2cc0 Replace Ghostscript DOCINFO and fix 9.25 metadata date regression
We no longer use Ghostscript to manage PDF metadata, instead
omitting the DOCINFO segment from the pdfmark file we generate.

Instead all of the relevant metadata code has been migrated to pikepdf,
and we use that API. This should be more consistent and fixes the
Ghostscript version-dependent quirks.

Also removes our python-xmp-toolkit dependency, except for
testing.
2018-12-13 18:13:30 -08:00
James R. Barlow
414407fbd6 Deprecate encode/decode_pdf_date and remap to pikepdf version 2018-12-12 22:01:21 -08:00
James R. Barlow
9e6b54c7ed Add test case for Type3 fonts with no Unicode mapping 2018-11-15 21:54:26 -08:00
James R. Barlow
d3b334c10f Test case: true type font without Unicode mapping 2018-11-15 16:22:53 -08:00
James R. Barlow
cc7f2a3f02 Fix Python 3.5 pathlib regressions 2018-11-10 02:11:23 -08:00