James R. Barlow
b00fe3dc5d
pytest.skip() - remove kwarg entirely, to avoid breaking older pytest and to avoid warnings from newer pytest
2022-04-14 20:15:00 -07:00
James R. Barlow
e6aa3a4299
tests: explain why CacheOcrEngine needs lock
2022-04-05 16:16:51 -07:00
James R. Barlow
43302d7e12
Fix pytest.warns() on older pytest
...
Thanks @QuLogic
2022-04-05 16:02:50 -07:00
James Barlow
776ada6713
Upgrade pre-commit and associated tools; various lints
2022-04-03 20:53:01 -07:00
James Barlow
dfe31a2f6d
Add lock to certain "with patch" cases
...
Switching to --use-threads seems to have broken tests that assumed they could
monkeypatch things. That's odd, though: while we can have multiple worker
threads, we should never have parallel tests in the same process.
2022-04-03 17:22:04 -07:00
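A minimal sketch of the pattern this commit describes, guarding a "with patch" block with a lock so that only one thread at a time sees the patched attribute. The lock name, the helper function, and the choice of os.getcwd as the patch target are all hypothetical and not taken from the test suite.

```python
import os
import threading
from unittest.mock import patch

# Hypothetical module-level lock shared by every test that patches os.getcwd.
_patch_lock = threading.Lock()


def getcwd_under_patch(fake_path):
    # Take the lock first, then apply the patch. On exit the patch is undone
    # before the lock is released, so no other thread sees a half-applied state.
    with _patch_lock, patch("os.getcwd", return_value=fake_path):
        return os.getcwd()
```

Without the lock, two threads entering and leaving patch() in interleaved order could observe each other's replacement or restore the wrong original.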
James Barlow
0c43963d69
Fix pytest deprecation warnings
2022-04-03 13:30:58 -07:00
James Barlow
f29fe7f23e
Fix Pillow deprecation warnings
2022-04-03 13:30:50 -07:00
James R. Barlow
13917c051c
Disable oom killer test for --use-threads
2022-03-13 01:02:28 -08:00
James R. Barlow
514038d4ec
optimize: recognize and produce [/FlateDecode /DCTDecode] images
2022-02-08 00:38:08 -08:00
James R. Barlow
3b406112d0
ghostscript: improve test coverage of error cases
2022-01-25 23:45:47 -08:00
James R. Barlow
2d0ac4707c
Use better img2pdf settings where possible while supporting old versions
...
Fixes #894
2022-01-14 11:55:54 -08:00
James R. Barlow
ea69e868ed
unpaper: issue warning if image too large to clean
2022-01-11 10:44:38 -08:00
James R. Barlow
ee21bf9ef6
Update cache
2021-12-13 20:45:30 -08:00
James R. Barlow
d48254d477
Fix issue with attempting to deskew a blank page on Tesseract 5
...
Closes #868
2021-12-10 21:48:09 -08:00
James R. Barlow
13af3252ff
tests: simplify run_ocrmypdf API
2021-12-06 17:00:25 -08:00
James R. Barlow
6910c48b81
Fix test_outputtype_none on Windows and cleanup docs
2021-12-06 15:38:38 -08:00
James R. Barlow
e642dd4b35
Fix kill signal on Windows
2021-12-06 15:38:32 -08:00
James R. Barlow
9de06f62ee
Use Python executors instead of pools
...
ProcessPool/ThreadPool don't have the ability to notice when a child worker
was terminated. ProcessPoolExecutor and ThreadPoolExecutor do notice and
provide better error messages.
Add tests to check.
2021-12-06 15:38:27 -08:00
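The behavior noted in this commit can be illustrated with the standard library alone. This is a sketch, not the project's test: the function names are hypothetical, and the "fork" start method (POSIX only) is an assumption made to keep the example self-contained.

```python
import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def die_abruptly():
    # Simulate a worker that is killed from outside: exit immediately,
    # without raising an exception or running any cleanup.
    os._exit(1)


def run_and_detect():
    # Assumption: use the "fork" start method so the sketch runs without
    # an `if __name__ == "__main__"` guard. This is POSIX only.
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as executor:
        future = executor.submit(die_abruptly)
        try:
            future.result(timeout=30)
        except BrokenProcessPool:
            # The executor noticed that its worker process disappeared.
            return "worker died"
    return "completed"
```

With ProcessPoolExecutor the lost worker surfaces as a BrokenProcessPool exception on the pending future, which the caller can turn into a clear error message.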
James R. Barlow
8fdcb15b4e
tests: improve typing and remove some legacy code
2021-12-06 15:38:27 -08:00
James R. Barlow
4c1ff1086c
tess cache: don't include full platform - could be sensitive
2021-12-06 15:38:26 -08:00
James R. Barlow
f91faf9795
Add new argument --tesseract-thresholding to control tesseract thresholding where available
...
Also add missing test for --tesseract-oem
2021-12-06 15:38:14 -08:00
James R. Barlow
c75ff4687a
Turning on Ghostscript interpolation changes this test
...
Seems acceptable. We don't normally use Ghostscript to downsample PDFs
like is happening in this test.
2021-11-15 16:36:24 -08:00
James R. Barlow
acc9d58c39
Skip no language test for Tess 5
2021-11-13 01:37:27 -08:00
James R. Barlow
e3126d2806
Adjust test to support Tesseract 5 working harder to find its files
2021-11-13 01:16:35 -08:00
James R. Barlow
f51164aff8
Upgrade test version of pymupdf
2021-11-13 00:53:41 -08:00
James R. Barlow
6f58a14351
pdfa: remove deprecated pkg_resources based access and tests
2021-11-13 00:52:03 -08:00
James R. Barlow
7ba04267b1
Remove shims that supported old versions of pikepdf (< 4)
2021-11-13 00:43:20 -08:00
James R. Barlow
380b981763
Remove most Python 3.6 special casing
2021-11-13 00:27:48 -08:00
James R. Barlow
5abfb14c2a
Remove leptonica and cffi
2021-11-13 00:06:35 -08:00
James R. Barlow
036afc4d88
Update cache, related to previous apparently
2021-11-12 23:57:50 -08:00
James R. Barlow
59642a98b2
Disable --remove-background so we can remove leptonica
2021-11-12 23:56:52 -08:00
James R. Barlow
f8c6be2e26
test_rotation: replace leptonica test with Pillow channel ops
...
New function is likely not as robust but seems capable of inexact image comparison.
2021-11-12 23:49:38 -08:00
James R. Barlow
30440104ba
Remove --threshold argument
...
Tesseract now includes better thresholding (binarization) in v5. Users who have
thresholding issues should try that first. If we find further problems
this can be brought back as a plugin.
2021-11-12 20:09:55 -08:00
James R. Barlow
b159e02110
Convert deskew to use degrees, since all our other angles are in degrees
2021-11-12 16:40:51 -08:00
James R. Barlow
a55ab05d16
Replace leptonica deskew with tesseract find skew and pillow rotate
...
Also rebuild the cache.
2021-11-12 16:35:08 -08:00
James R. Barlow
6c34d59836
tesseract: yet another version variant
2021-11-04 00:17:18 -07:00
James R. Barlow
690f88119d
Fix test failures on pikepdf 3.2.0 + pybind11 2.8.0
...
When compiled without pybind11 2.8.0, pikepdf supplies a shim to implement
pikepdf._ObjectMapping.values() which has subtly different semantics
from a true dict-like object; in particular it supports
next(objectmap.values())
where a standard dict requires
next(iter(objectmap.values())).
pybind11 2.8.0 now implements .values() properly, meaning some misuses of
the protocol in ocrmypdf fail.
If pybind11 < 2.8.0, pikepdf will continue to offer its shim.
If pybind11 >= 2.8.0, pikepdf does not add its shim.
Consequently no changes were needed in pikepdf.
Closes #843
2021-10-12 13:38:52 -07:00
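The protocol difference described in this commit can be reproduced with a plain dict, since dict.values() returns an iterable view rather than an iterator. The helper names below are hypothetical, and a built-in dict stands in for the pikepdf mapping.

```python
def first_value(mapping):
    # Correct usage: a values() view is iterable but is not itself an
    # iterator, so wrap it in iter() before calling next().
    return next(iter(mapping.values()))


def misuse_raises(mapping):
    # The misuse: calling next() directly on the view. For a standard dict
    # this raises TypeError because the view has no __next__ method.
    try:
        next(mapping.values())
    except TypeError:
        return True
    return False
```

Code written against the shim's looser behavior would pass until the mapping started following standard dict semantics, at which point the direct next() call fails.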
James R. Barlow
78f391536b
Offer hint to user to use --max-image-mpixels after decompression bomb error
...
Closes #801
2021-10-06 00:19:11 -07:00
James R. Barlow
790d3022f6
Implement --output-type=none to skip producing the PDF and use only the sidecar
...
Closes #787
2021-09-26 01:07:34 -07:00
James R. Barlow
c725bf79da
flake8 delinting
2021-09-21 16:37:03 -07:00
James R. Barlow
cb6c1939e9
typing: fix runtime issues
2021-08-27 02:18:54 -07:00
James R. Barlow
4eca0a165b
pre-commit: pyupgrade modernizing
2021-08-26 18:04:38 -07:00
James R. Barlow
1b46481f7e
pre-commit: add setup.cfg fmt
2021-08-26 17:59:40 -07:00
James R. Barlow
aa10a70d70
Rebuild test cache due to hocr output change
2021-08-01 01:00:05 -07:00
James R. Barlow
37923ffe52
Work around Pillow 8.3.1 DPI changes
...
Pillow decided against round-tripping DPI values.
https://github.com/python-pillow/Pillow/pull/5476
Fixes #802
2021-07-14 02:34:28 -07:00
James R. Barlow
5cba68b93d
tests: Don't require symlink permissions on Windows
...
Some of the tests required symlink permissions, which CI workers have but typical Windows
user accounts do not. Mostly these are just correctness tests.
2021-07-14 00:11:47 -07:00
James R. Barlow
5f01c5e330
Fix another species of Tesseract version number breaking regex
...
Fixes #795
2021-06-16 00:09:03 -07:00
James R. Barlow
7b1e5b4f41
Fix "invalid version number" for untagged tesseract versions
...
Fixes #770
2021-04-26 01:18:07 -07:00
James R. Barlow
757b72b0af
Revert "Remove apparently unused portion of a test"
...
This reverts commit d89a633ba7.
2021-04-16 00:21:11 -07:00
James R. Barlow
d673126994
Fix ZeroDivisionError on files containing images drawn at scale 0
...
Fixes #761
2021-04-15 23:26:14 -07:00