675 Commits

Author SHA1 Message Date
James R. Barlow
b00fe3dc5d pytest.skip() - remove kwarg entirely, to avoid breaking older pytest and to avoid warnings from newer pytest 2022-04-14 20:15:00 -07:00
James R. Barlow
e6aa3a4299 tests: explain why CacheOcrEngine needs lock 2022-04-05 16:16:51 -07:00
James R. Barlow
43302d7e12 Fix pytest.warns() on older pytest
Thanks @QuLogic
2022-04-05 16:02:50 -07:00
James Barlow
776ada6713 Upgrade pre-commit and associated tools; various lints 2022-04-03 20:53:01 -07:00
James Barlow
dfe31a2f6d Add lock to certain "with patch" cases
Switching to --use-threads seems to have broken tests that assumed they could
monkeypatch things. That's odd, though: while we can have multiple
worker threads, we should never have parallel tests in the same process.
2022-04-03 17:22:04 -07:00
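A minimal sketch of the idea behind the commit above, serializing monkeypatched sections with a lock so that worker threads cannot see each other's patches. This is not the project's test code: `_patch_lock`, `cwd_while_patched`, and `run_in_threads` are hypothetical names, and `os.getcwd` is only a convenient stand-in for whatever a real test would patch.

```python
import os
import threading
from unittest.mock import patch

# Hypothetical module-level lock shared by every test helper that patches globals.
_patch_lock = threading.Lock()


def cwd_while_patched(fake):
    """Patch os.getcwd under the lock and return what the patched call yields."""
    with _patch_lock:
        # patch() swaps a process-wide attribute, so without the lock another
        # thread could observe (or overwrite) this replacement.
        with patch("os.getcwd", return_value=fake):
            return os.getcwd()


def run_in_threads(values):
    """Run cwd_while_patched once per value, each in its own thread."""
    results = {}

    def worker(value):
        results[value] = cwd_while_patched(value)

    threads = [threading.Thread(target=worker, args=(v,)) for v in values]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because the patch is applied and undone while the lock is held, each thread gets back the fake value it installed, and `os.getcwd` is restored once all threads finish.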
James Barlow
0c43963d69 Fix pytest deprecation warnings 2022-04-03 13:30:58 -07:00
James Barlow
f29fe7f23e Fix Pillow deprecation warnings 2022-04-03 13:30:50 -07:00
James R. Barlow
13917c051c Disable oom killer test for --use-threads 2022-03-13 01:02:28 -08:00
James R. Barlow
514038d4ec optimize: recognize and produce [/FlateDecode /DCTDecode] images 2022-02-08 00:38:08 -08:00
James R. Barlow
3b406112d0 ghostscript: improve test coverage of error cases 2022-01-25 23:45:47 -08:00
James R. Barlow
2d0ac4707c Use better img2pdf settings where possible while supporting old versions
Fixes #894
2022-01-14 11:55:54 -08:00
James R. Barlow
ea69e868ed unpaper: issue warning if image too large to clean 2022-01-11 10:44:38 -08:00
James R. Barlow
ee21bf9ef6 Update cache 2021-12-13 20:45:30 -08:00
James R. Barlow
d48254d477 Fix issue with attempting to deskew a blank page on Tesseract 5
Closes #868
2021-12-10 21:48:09 -08:00
James R. Barlow
13af3252ff tests: simplify run_ocrmypdf API 2021-12-06 17:00:25 -08:00
James R. Barlow
6910c48b81 Fix test_outputtype_none on Windows and cleanup docs 2021-12-06 15:38:38 -08:00
James R. Barlow
e642dd4b35 Fix kill signal on Windows 2021-12-06 15:38:32 -08:00
James R. Barlow
9de06f62ee Use Python executors instead of pools
ProcessPool/ThreadPool don't have the ability to notice when a child worker
was terminated. ProcessPoolExecutor and ThreadPoolExecutor do notice and
provide better error messages.

Add tests to check.
2021-12-06 15:38:27 -08:00
James R. Barlow
8fdcb15b4e tests: improve typing and remove some legacy code 2021-12-06 15:38:27 -08:00
James R. Barlow
4c1ff1086c tess cache: don't include full platform - could be sensitive 2021-12-06 15:38:26 -08:00
James R. Barlow
f91faf9795 Add new argument --tesseract-thresholding to control tesseract thresholding where available
Also add missing test for --tesseract-oem
2021-12-06 15:38:14 -08:00
James R. Barlow
c75ff4687a Turning on Ghostscript interpolation changes this test
Seems acceptable. We don't normally use Ghostscript to downsample PDFs
like is happening in this test.
2021-11-15 16:36:24 -08:00
James R. Barlow
acc9d58c39 Skip no language test for Tess 5 2021-11-13 01:37:27 -08:00
James R. Barlow
e3126d2806 Adjust test to support Tesseract 5 working harder to find its files 2021-11-13 01:16:35 -08:00
James R. Barlow
f51164aff8 Upgrade test version of pymupdf 2021-11-13 00:53:41 -08:00
James R. Barlow
6f58a14351 pdfa: remove deprecated pkg_resources based access and tests 2021-11-13 00:52:03 -08:00
James R. Barlow
7ba04267b1 Remove shims that supported old versions of pikepdf < 4 2021-11-13 00:43:20 -08:00
James R. Barlow
380b981763 Remove most Python 3.6 special casing 2021-11-13 00:27:48 -08:00
James R. Barlow
5abfb14c2a Remove leptonica and cffi 2021-11-13 00:06:35 -08:00
James R. Barlow
036afc4d88 Update cache, related to previous apparently 2021-11-12 23:57:50 -08:00
James R. Barlow
59642a98b2 Disable --remove-background so we can remove leptonica 2021-11-12 23:56:52 -08:00
James R. Barlow
f8c6be2e26 test_rotation: replace leptonica test with Pillow channel ops
New function is likely not as robust but seems capable of inexact image comparison.
2021-11-12 23:49:38 -08:00
James R. Barlow
30440104ba Remove --threshold argument
Tesseract now includes better thresholding (binarization) in v5. Users that have
thresholding issues should try that first. If we find further problems
this can be brought back as a plugin.
2021-11-12 20:09:55 -08:00
James R. Barlow
b159e02110 Convert deskew to use degrees, since all our other angles are in degrees 2021-11-12 16:40:51 -08:00
James R. Barlow
a55ab05d16 Replace leptonica deskew with tesseract find skew and pillow rotate
Also rebuild the cache.
2021-11-12 16:35:08 -08:00
James R. Barlow
6c34d59836 tesseract: yet another version variant 2021-11-04 00:17:18 -07:00
James R. Barlow
690f88119d Fix test failures on pikepdf 3.2.0 + pybind11 2.8.0
When compiled against pybind11 older than 2.8.0, pikepdf supplies a shim to implement
pikepdf._ObjectMapping.values(), which has subtly different semantics
from a true dict-like object; in particular it supports
next(objectmap.values())
where a standard dict requires
next(iter(objectmap.values())).

pybind11 2.8.0 now implements .values() properly, meaning some misuses of
the protocol in ocrmypdf fail.

If pybind11 < 2.8.0, pikepdf will
continue to offer its shim. If pybind11 >= 2.8.0, pikepdf does not add its shim.

Consequently no changes were needed in pikepdf.

Closes #843
2021-10-12 13:38:52 -07:00
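The difference described in the commit above can be reproduced with a plain Python dict, which here stands in for pikepdf's object mapping (an assumption made for illustration; no pikepdf code is involved). The helper names `next_on_view_fails` and `first_value` are hypothetical.

```python
def next_on_view_fails(mapping):
    """Return True if calling next() directly on mapping.values() raises TypeError."""
    try:
        next(mapping.values())
    except TypeError:
        # A dict values view is iterable but is not itself an iterator.
        return True
    return False


def first_value(mapping):
    """Fetch the first value the way a standard dict requires: via iter()."""
    return next(iter(mapping.values()))
```

For a standard dict such as `{"a": 1, "b": 2}`, `next_on_view_fails` returns `True` and `first_value` returns `1`. Code written in the second style works with either behavior of `.values()`.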
James R. Barlow
78f391536b Offer hint to user to use --max-image-mpixels after decompression bomb error
Closes #801
2021-10-06 00:19:11 -07:00
James R. Barlow
790d3022f6 Implement --output-type=none to skip producing the PDF and use only the sidecar
Closes #787
2021-09-26 01:07:34 -07:00
James R. Barlow
c725bf79da flake8 delinting 2021-09-21 16:37:03 -07:00
James R. Barlow
cb6c1939e9 typing: fix runtime issues 2021-08-27 02:18:54 -07:00
James R. Barlow
4eca0a165b pre-commit: pyupgrade modernizing 2021-08-26 18:04:38 -07:00
James R. Barlow
1b46481f7e pre-commit: add setup.cfg fmt 2021-08-26 17:59:40 -07:00
James R. Barlow
aa10a70d70 Rebuild test cache due to hocr output change 2021-08-01 01:00:05 -07:00
James R. Barlow
37923ffe52 Work around Pillow 8.3.1 DPI changes
Pillow decided against round-tripping DPI values.
https://github.com/python-pillow/Pillow/pull/5476

Fixes #802
2021-07-14 02:34:28 -07:00
James R. Barlow
5cba68b93d tests: Don't require symlink permissions on Windows
Some of the tests required symlink permissions, which CI workers have but typical Windows
user accounts do not. Mostly these are just correctness tests.
2021-07-14 00:11:47 -07:00
James R. Barlow
5f01c5e330 Fix another species of Tesseract version number breaking regex
Fixes #795
2021-06-16 00:09:03 -07:00
James R. Barlow
7b1e5b4f41 Fix "invalid version number" for untagged tesseract versions
Fixes #770
2021-04-26 01:18:07 -07:00
James R. Barlow
757b72b0af Revert "Remove apparently unused portion of a test"
This reverts commit d89a633ba7.
2021-04-16 00:21:11 -07:00
James R. Barlow
d673126994 Fix ZeroDivisionError on files containing images drawn at scale 0
Fixes #761
2021-04-15 23:26:14 -07:00