James R. Barlow
7566d4b768
Introduce plugins/filters
2019-05-27 16:55:04 -07:00
James R. Barlow
5c4c32ab3c
Remove multiprocessing tests - no longer valid
2019-05-27 12:07:20 -07:00
James R. Barlow
c14f62752b
Tests: add an API test
2019-05-25 16:24:09 -07:00
James R. Barlow
5cecb3ecb4
Convert one test to use API
2019-05-22 23:53:48 -07:00
James R. Barlow
32a076c039
Refactor validation and exceptions
...
CLI now tracks check_options exceptions. API now works more like
an API, without an exception handler,
because the caller should provide one.
2019-05-20 18:01:17 -07:00
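Note on 32a076c039: the split described above can be sketched roughly as follows. This is an illustration only, not the project's code. Only the name check_options comes from the commit message; BadArgsError, api_run, cli_main, and the "jobs" option are hypothetical names chosen for the example.

```python
class BadArgsError(Exception):
    """Hypothetical error type raised when options fail validation."""


def check_options(options):
    # Validation raises instead of printing and exiting.
    if options.get("jobs", 1) < 1:
        raise BadArgsError("jobs must be at least 1")


def api_run(options):
    # API entry point (hypothetical): no exception handler here,
    # so validation errors propagate to the caller.
    check_options(options)
    return 0


def cli_main(options):
    # CLI entry point (hypothetical): catches validation errors
    # and maps them to a nonzero exit code.
    try:
        return api_run(options)
    except BadArgsError as exc:
        print(f"error: {exc}")
        return 2
```

A library caller would wrap api_run in its own try/except, while the command line front end reports the error and returns an exit code.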
James R. Barlow
ef1ef1cdf0
Fix test invalidated by Python 3.6 logging fixes
2019-05-17 15:20:07 -07:00
James R. Barlow
4340ad9f12
Update test cache
2019-05-17 01:45:06 -07:00
James R. Barlow
8df1ea2754
Mark some slow tests
2019-05-17 01:42:27 -07:00
James R. Barlow
e528adc603
pylint removal
2019-05-17 01:09:06 -07:00
James R. Barlow
13ab23ba54
Refactor weave_layers, introduce progress bar
...
Fixes a bug in this branch where --sidecar would fail by trying to iterate
over the executor futures twice.
2019-05-16 14:57:31 -07:00
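Note on 13ab23ba54: the class of bug mentioned above can be reproduced with the standard library alone. The sketch below is not the project's code; it assumes the results came from a one-shot iterator such as the one returned by Executor.map, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    return n * n


def consume_twice(values):
    with ThreadPoolExecutor(max_workers=2) as executor:
        # Executor.map returns a one-shot iterator over the results.
        results = executor.map(square, values)
        first_pass = list(results)
        # The iterator is already exhausted, so this pass sees nothing.
        second_pass = list(results)
    return first_pass, second_pass


def consume_once(values):
    with ThreadPoolExecutor(max_workers=2) as executor:
        # Materialize the results once, then reuse the list freely.
        results = list(executor.map(square, values))
    return results, list(results)
```

Storing the results in a list, or restructuring so that each consumer gets its own pass, avoids the silent empty second iteration.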
James R. Barlow
5e025c3382
Reinstate log level in messages to be closer to old behavior
2019-05-15 15:46:36 -07:00
James R. Barlow
486f73d5d6
Remove custom logger
2019-05-15 02:28:13 -07:00
James R. Barlow
c904b430b6
Merge master into api branch; all tests pass
2019-05-14 16:33:02 -07:00
James R. Barlow
0a72c12ff0
weave: add new test for link consistency
2019-05-12 03:36:33 -07:00
James R. Barlow
482cb788ed
Don't use MagicMock() as a dummy logger in pytest
2019-05-11 12:44:17 -07:00
James R. Barlow
15a988b999
weave: use emplacement method, scrap TOC repair
...
The new emplacement method updates page objects in place without
generating new objgen numbers, meaning we no longer need to update the table
of contents to preserve links.
2019-05-11 12:40:25 -07:00
James R. Barlow
bcdd196699
ghostscript: remove unnecessary post-render resizing step
2019-05-11 12:10:50 -07:00
James R. Barlow
58c29ffb5c
weave: use explicit pdf.close(), drastically reduce open file handles
...
With the new pikepdf 1.2.0 we no longer need to hold file handles
open because of the "copy to memory" functionality. We retain
the behavior of closing/reopening the output PDF every 100 pages as
a way to limit memory usage.
2019-04-18 15:12:48 -07:00
mawi
c92ccc6134
fix: tests
2019-04-08 14:57:42 +02:00
mawi
783a128bd1
feat: move to sync (non-ETL) implementation - remove ruffus
2019-04-04 21:02:38 +02:00
Martin Wind
a4667b5656
refactor: move ruffus related code to one file
2019-03-28 20:16:10 +01:00
Martin Wind
f65a3d3762
fix import in unpaper test
2019-03-26 10:04:26 +01:00
James R. Barlow
427afc0616
Fix LeptonicaErrorTrap when a sys.stderr.fileno() is not available
...
The LeptonicaErrorTrap was problematic for Celery and other
libraries that mess with stderr.
Closes #359
2019-03-17 14:22:36 -07:00
James R. Barlow
486dc7e22c
Fix some test failures missed in prev commit
2019-03-06 13:28:50 -08:00
James R. Barlow
dc616bb507
Fix test suite so --clean is not requested when unpaper is not installed
2019-03-05 22:33:13 -08:00
James R. Barlow
5da26e4c9c
Convert most uses of subprocess.Popen to subprocess.run in test suite
2019-03-05 22:25:22 -08:00
James R. Barlow
a27ee3ee8c
optimize: use Decode to invert 1bpp PNGs for now
2019-03-03 17:50:12 -08:00
James R. Barlow
58e6663806
Update test cache for french->german change
2019-03-03 03:23:59 -08:00
James R. Barlow
3f1d9ef99c
Fix tests for move to Alpine dockerfile
2019-02-26 12:30:21 -08:00
James R. Barlow
19e35db2b7
Fix issue when weave handoff occurs with no OCR font present
...
If using --tesseract-timeout 0 and any image processing on a file with
more than 100 pages, the weave handoff will occur. Ensure this
works correctly even if no Glyphless font is present.
Closes #347
2019-02-10 02:05:59 -08:00
James R. Barlow
df688742d5
Fix exception on traversing corrupt ToC entries
2019-02-10 00:50:21 -08:00
James R. Barlow
f095e91cb4
unpaper-args: add test case and harden feature
2019-02-07 16:21:02 -08:00
James R. Barlow
f34b3015b2
Prevent Ghostscript from generating invalid XMP metadata
...
If DocumentInfo contains NULs Ghostscript will generate XMP with
NULs which is not allowed. Repair DocumentInfo before Ghostscript sees it.
2019-01-04 13:20:41 -08:00
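Note on f34b3015b2: the repair step described above amounts to removing NUL characters from metadata strings before they reach Ghostscript. A minimal sketch, assuming the DocumentInfo entries have already been read into a plain dict of strings; the helper name strip_nuls is hypothetical and the project's actual repair code may differ.

```python
def strip_nuls(docinfo):
    """Return a copy of a metadata dict with NUL characters removed.

    Hypothetical helper: assumes every value is a str.
    """
    return {key: value.replace("\x00", "") for key, value in docinfo.items()}
```

For example, a title stored as "Re\x00port" would come back as "Report", and entries without NULs are returned unchanged.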
James R. Barlow
7d330afd81
Delinting
2019-01-02 13:34:45 -08:00
James R. Barlow
c771938907
Convert to f-strings where it makes sense
2018-12-31 15:01:19 -08:00
James R. Barlow
8c0009c5c8
Make pdfminer.six optional
...
Mainly since the current release of pdfminer.six lacks a sdist, blocking
homebrew packaging. Also in case other distros don't accept pdfminer.six.
2018-12-31 01:08:43 -08:00
James R. Barlow
cfc5cdf47d
pdfa: remove a pile of deprecated code
...
It's now handled in pikepdf.
2018-12-31 00:05:13 -08:00
James R. Barlow
0880b16491
Sort imports with isort
2018-12-30 01:28:15 -08:00
James R. Barlow
06308a22ce
Reformat with black
2018-12-30 01:27:49 -08:00
James R. Barlow
80bd7de580
Generate test cache
2018-12-30 01:02:37 -08:00
James R. Barlow
8b90c45437
Drop support for Tesseract 3
2018-12-30 00:47:12 -08:00
James R. Barlow
72b920eb16
Drop support for Python 3.5
2018-12-30 00:23:26 -08:00
James R. Barlow
b4a51907d6
Detect when metadata is dropped during PDF/A conversion
2018-12-30 00:13:25 -08:00
James R. Barlow
13d20bd993
pdfinfo: tolerate PDFs that overflow and underflow the graphics stack
2018-12-15 15:10:29 -08:00
James R. Barlow
ed9bb985e2
Fix pikepdf 0.9.0
2018-12-14 23:21:13 -08:00
James R. Barlow
632dab2cc0
Replace Ghostscript DOCINFO and fix 9.25 metadata date regression
...
We no longer use Ghostscript to manage PDF metadata; we now
omit the DOCINFO segment from the pdfmark file we generate.
All of the relevant metadata code has been migrated to pikepdf,
and we use that API. This should be more consistent and fixes the
Ghostscript version-dependent quirks.
Also removes our python-xmp-toolkit dependency, except for
testing.
2018-12-13 18:13:30 -08:00
James R. Barlow
414407fbd6
Deprecate encode/decode_pdf_date and remap to pikepdf version
2018-12-12 22:01:21 -08:00
James R. Barlow
9e6b54c7ed
Add test case for Type3 fonts with no Unicode mapping
2018-11-15 21:54:26 -08:00
James R. Barlow
d3b334c10f
Test case: TrueType font without Unicode mapping
2018-11-15 16:22:53 -08:00
James R. Barlow
cc7f2a3f02
Fix Python 3.5 pathlib regressions
2018-11-10 02:11:23 -08:00