James R. Barlow
a5efc4af9b
unpaper: replace input pnm with png
...
Unpaper or its underlying libraries don't seem to accept PNMs with an
odd integer width, although it's not clear whether this is the issue at all.
In any case, keeping the image a PNG works around the issue. unpaper
only accepted PNM input in the past, which is why we send it PNM.
Since it now accepts PNG, we might as well use PNG.
Unpaper can write PNG as output too, but this added a few seconds to
the test suite and was not committed.
Related issues:
https://github.com/ocrmypdf/OCRmyPDF/issues/887
https://github.com/ocrmypdf/OCRmyPDF/issues/665
https://github.com/unpaper/unpaper/issues/82
2022-07-03 15:32:16 -07:00
James R. Barlow
61600111d3
test_pdfinfo: refactor by extracting fixtures
2022-06-18 16:29:57 -07:00
James R. Barlow
17a5b8b43c
Refactor reporting of optimization failures
2022-06-13 01:30:15 -07:00
James R. Barlow
13d11e76e5
optimize plugin: solve linearization and "is optimization enabled?" issues
2022-06-13 00:59:41 -07:00
James R. Barlow
61069660a2
Move optimization options to plugin
2022-06-12 02:42:16 -07:00
James R. Barlow
3d4f80639d
Remove test that is now always skipped
2022-06-12 00:31:01 -07:00
James R. Barlow
b17fb61389
Configure pylint in pyproject and delint
2022-06-12 00:30:44 -07:00
James R. Barlow
0ac15dd0b2
Suppress libxmp DeprecationWarning during test
2022-06-01 00:46:16 -07:00
James R. Barlow
33cdabaf65
tests: account for test that expected pngquant for windows
2022-05-26 13:52:22 -07:00
James R. Barlow
5d0cc0a092
tests: Extract some test fixtures for better clarity
2022-05-26 00:57:31 -07:00
James R. Barlow
6c427f82ea
Add test case for corrupt ICC profiles
2022-05-26 00:41:19 -07:00
James R. Barlow
b00fe3dc5d
pytest.skip() - remove kwarg entirely, to avoid breaking older pytest and to avoid warnings from newer pytest
2022-04-14 20:15:00 -07:00
James R. Barlow
e6aa3a4299
tests: explain why CacheOcrEngine needs lock
2022-04-05 16:16:51 -07:00
James R. Barlow
43302d7e12
Fix pytest.warns() on older pytest
...
Thanks @QuLogic
2022-04-05 16:02:50 -07:00
James Barlow
776ada6713
Upgrade pre-commit and associated tools; various lints
2022-04-03 20:53:01 -07:00
James Barlow
dfe31a2f6d
Add lock to certain "with patch" cases
...
Switching to --use-threads seems to have broken tests that assumed they could
monkeypatch things. That's odd, though: while we can have multiple
worker threads, we should never have parallel tests in the same process.
2022-04-03 17:22:04 -07:00
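To illustrate the locking described in the commit above, here is a minimal sketch in Python. It is not the project's actual test code: the names _patch_lock, read_cwd_patched and run_in_threads are hypothetical, and os.getcwd merely stands in for whatever a test monkeypatches. Holding one lock around each "with patch" block means only one thread at a time replaces and restores the shared attribute.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor
from unittest.mock import patch

# Hypothetical module-level lock shared by every caller that patches os.getcwd.
_patch_lock = threading.Lock()


def read_cwd_patched(fake_cwd):
    """Patch os.getcwd under the lock and return what the patched call reports."""
    # patch() swaps a process-wide attribute, so concurrent patches from
    # different threads could otherwise overlap and restore in the wrong order.
    with _patch_lock, patch("os.getcwd", return_value=fake_cwd):
        return os.getcwd()


def run_in_threads(fake_cwds):
    """Run the patched call from several worker threads at once."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(read_cwd_patched, fake_cwds))
```

Each thread sees only its own patched value, and the real os.getcwd is restored once every "with" block has exited.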
James Barlow
0c43963d69
Fix pytest deprecation warnings
2022-04-03 13:30:58 -07:00
James Barlow
f29fe7f23e
Fix Pillow deprecation warnings
2022-04-03 13:30:50 -07:00
James R. Barlow
13917c051c
Disable oom killer test for --use-threads
2022-03-13 01:02:28 -08:00
James R. Barlow
514038d4ec
optimize: recognize and produce [/FlateDecode /DCTDecode] images
2022-02-08 00:38:08 -08:00
James R. Barlow
3b406112d0
ghostscript: improve test coverage of error cases
2022-01-25 23:45:47 -08:00
James R. Barlow
2d0ac4707c
Use better img2pdf settings where possible while supporting old versions
...
Fixes #894
2022-01-14 11:55:54 -08:00
James R. Barlow
ea69e868ed
unpaper: issue warning if image too large to clean
2022-01-11 10:44:38 -08:00
James R. Barlow
ee21bf9ef6
Update cache
2021-12-13 20:45:30 -08:00
James R. Barlow
d48254d477
Fix issue with attempting to deskew a blank page on Tesseract 5
...
Closes #868
2021-12-10 21:48:09 -08:00
James R. Barlow
13af3252ff
tests: simplify run_ocrmypdf API
2021-12-06 17:00:25 -08:00
James R. Barlow
6910c48b81
Fix test_outputtype_none on Windows and cleanup docs
2021-12-06 15:38:38 -08:00
James R. Barlow
e642dd4b35
Fix kill signal on Windows
2021-12-06 15:38:32 -08:00
James R. Barlow
9de06f62ee
Use Python executors instead of pools
...
ProcessPool/ThreadPool don't have the ability to notice when a child worker
was terminated. ProcessPoolExecutor and ThreadPoolExecutor do notice and
provide better error messages.
Add tests to check.
2021-12-06 15:38:27 -08:00
James R. Barlow
8fdcb15b4e
tests: improve typing and remove some legacy code
2021-12-06 15:38:27 -08:00
James R. Barlow
4c1ff1086c
tess cache: don't include full platform - could be sensitive
2021-12-06 15:38:26 -08:00
James R. Barlow
f91faf9795
Add new argument --tesseract-thresholding to control tesseract thresholding where available
...
Also add missing test for --tesseract-oem
2021-12-06 15:38:14 -08:00
James R. Barlow
c75ff4687a
Turning on Ghostscript interpolation changes this test
...
Seems acceptable. We don't normally use Ghostscript to downsample PDFs
as is happening in this test.
2021-11-15 16:36:24 -08:00
James R. Barlow
acc9d58c39
Skip no language test for Tess 5
2021-11-13 01:37:27 -08:00
James R. Barlow
e3126d2806
Adjust test to support Tesseract 5 working harder to find its files
2021-11-13 01:16:35 -08:00
James R. Barlow
f51164aff8
Upgrade test version of pymupdf
2021-11-13 00:53:41 -08:00
James R. Barlow
6f58a14351
pdfa: remove deprecated pkg_resources based access and tests
2021-11-13 00:52:03 -08:00
James R. Barlow
7ba04267b1
Remove shims to support old versions of pikepdf < 4
2021-11-13 00:43:20 -08:00
James R. Barlow
380b981763
Remove most Python 3.6 special casing
2021-11-13 00:27:48 -08:00
James R. Barlow
5abfb14c2a
Remove leptonica and cffi
2021-11-13 00:06:35 -08:00
James R. Barlow
036afc4d88
Update cache, related to previous apparently
2021-11-12 23:57:50 -08:00
James R. Barlow
59642a98b2
Disable --remove-background so we can remove leptonica
2021-11-12 23:56:52 -08:00
James R. Barlow
f8c6be2e26
test_rotation: replace leptonica test with Pillow channel ops
...
New function is likely not as robust but seems capable of inexact image comparison.
2021-11-12 23:49:38 -08:00
James R. Barlow
30440104ba
Remove --threshold argument
...
Tesseract now includes better thresholding (binarization) in v5. Users that have
thresholding issues should try that first. If we find further problems
this can be brought back as a plugin.
2021-11-12 20:09:55 -08:00
James R. Barlow
b159e02110
Convert deskew to use degrees, since all our other angles are in degrees
2021-11-12 16:40:51 -08:00
James R. Barlow
a55ab05d16
Replace leptonica deskew with tesseract find skew and pillow rotate
...
Also rebuild the cache.
2021-11-12 16:35:08 -08:00
James R. Barlow
6c34d59836
tesseract: yet another version variant
2021-11-04 00:17:18 -07:00
James R. Barlow
690f88119d
Fix test failures on pikepdf 3.2.0 + pybind11 2.8.0
...
When compiled with a pybind11 older than 2.8.0, pikepdf supplies a shim to implement
pikepdf._ObjectMapping.values(), which has subtly different semantics
from a true dict-like object; in particular it supports
next(objectmap.values())
where a standard dict requires
next(iter(objectmap.values())).
pybind11 2.8.0 now implements .values() properly, meaning some misuses of
the protocol in ocrmypdf fail.
If pybind11 < 2.8.0, pikepdf will continue to offer its shim.
If pybind11 >= 2.8.0, pikepdf does not add its shim.
Consequently no changes were needed in pikepdf.
Closes #843
2021-10-12 13:38:52 -07:00
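The iterator-protocol difference described in the commit above can be reproduced with a plain dict. This is a small sketch only: the dictionary contents and the function names are made up for illustration, and the dict is a stand-in, not a real pikepdf object mapping. A dict's values() returns a view, which is iterable but is not itself an iterator, so it has to be passed through iter() before next().

```python
def first_value(mapping):
    """Return the first value of a dict-like object using the correct protocol."""
    return next(iter(mapping.values()))


def values_view_is_iterator(mapping):
    """Report whether next() can be applied directly to mapping.values()."""
    try:
        next(mapping.values())
    except TypeError:
        # A standard dict view is iterable, but it is not an iterator.
        return False
    return True


# Hypothetical stand-in for a pikepdf object mapping.
objectmap = {"/Im0": "image-0", "/Im1": "image-1"}
```

Code written as first_value() behaves the same whether or not the mapping's values() happens to return an iterator, which is why it is the safer form.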
James R. Barlow
78f391536b
Offer hint to user to use --max-image-mpixels after decompression bomb error
...
Closes #801
2021-10-06 00:19:11 -07:00
James R. Barlow
790d3022f6
Implement --output-type=none to skip producing the PDF and use only the sidecar
...
Closes #787
2021-09-26 01:07:34 -07:00