Commit Graph

504 Commits

Author SHA1 Message Date
James R. Barlow
4a640b8dcd Fix language argument not working as list
Fixes #523
2020-04-14 23:18:52 -07:00
James R. Barlow
9471bc8921 Fix versions with leading v, e.g. v5.0 2020-04-10 13:42:33 -07:00
James R. Barlow
d13d70fd56 Fix version checker failing for qpdf 10.0.0
Fixes #527
2020-04-10 13:00:19 -07:00
James R. Barlow
23bc3d3a29 tests: workaround for Ghostscript 9.52 txtwrite problem 2020-03-29 22:45:16 -07:00
James R. Barlow
8307832ce9 tests: add force OCR to a file with text that Ghostscript doesn't see
For gs 9.52 support.

Also refactor use of pikepdf.open() to use with blocks.
2020-03-29 22:44:27 -07:00
James R. Barlow
378e4dae3b Expand documentation for subprocess.run() from test 2020-03-04 13:37:44 -08:00
James R. Barlow
b3b61c152c Handle malformed DocumentInfo (#497)
User submitted a PDF in which /Trailer /Info pointed to the XMP metadata
block instead of a DocumentInfo dictionary. Fix and add test.
2020-03-03 03:27:01 -08:00
James R. Barlow
4a27124eab Simplify metadata for invalid xml in output
Removes possibly non-free resource enron1.pdf.
2020-02-12 00:07:18 -08:00
James R. Barlow
ce97af5a79 Add OCR quality measurement API 2020-01-17 03:10:27 -08:00
James R. Barlow
61a2674317 Skip test that needs chmod when on Windows 2020-01-06 02:36:04 -08:00
James R. Barlow
9ad8cbf1f6 Fix assert that depends on POSIX-y file handling 2020-01-06 02:02:05 -08:00
James R. Barlow
9c5f0d0ec6 Eliminate last use of PyPDF2 from test suite 2020-01-04 16:32:01 -08:00
James R. Barlow
32041c43e1 tests: improve tesseract coverage 2020-01-04 02:35:14 -08:00
James R. Barlow
1037d73efb tests: use smaller files for ghostscript 2019-12-31 17:20:28 -08:00
James R. Barlow
aeb7b142a9 tests: skip tests not compatible with coverage
For reasons not entirely clear, stdout will get some data injected when
pytest-cov is running. Our tests that check for clean stdout need to ignore
this.

We check for an environment variable that is defined only when coverage is
running.
2019-12-31 17:10:51 -08:00
James R. Barlow
422ea9777e Remove session scope from fixtures
pytest seems to prepare os.environ in complex ways, so we want to ensure
these fixtures are not reused.
2019-12-31 17:09:23 -08:00
James R. Barlow
2f1c743227 Rewrite main pool loop
The pytest-cov documentation recommends explicit management of
multiprocessing.Pool rather than the context manager. This is supposed to
work better for collecting coverage data, particularly on Windows.
2019-12-31 16:23:41 -08:00
James R. Barlow
96ee21aee9 Try to set up subprocess coverage better 2019-12-31 15:39:45 -08:00
James R. Barlow
4b759af6ff tests: fix problems with ghostscript spoofers 2019-12-31 15:33:03 -08:00
James R. Barlow
25d2b0cda4 test: environment warnings/cleanup 2019-12-30 22:38:50 -08:00
James R. Barlow
c36e9950ae tests: test TqdmConsole 2019-12-30 17:51:09 -08:00
James R. Barlow
0c0d53b10f tests: AcroForm test case did not work correctly; fixed 2019-12-30 17:50:32 -08:00
James R. Barlow
63de7e1677 Improve error message for unreadable input files 2019-12-30 16:14:52 -08:00
James R. Barlow
b0e92760a2 tests: add coverage for helpers 2019-12-30 15:52:10 -08:00
James R. Barlow
c5edff2c2f Sort imports 2019-12-19 15:31:18 -08:00
James R. Barlow
c5571388e2 Improve test coverage of _sync.py 2019-12-10 01:06:27 -08:00
James R. Barlow
607eee198d tests: split out preprocessing tests 2019-12-09 16:18:01 -08:00
James R. Barlow
5e2a7f8a56 tests: speed up several slow tests 2019-12-09 16:17:57 -08:00
James R. Barlow
7be293f628 Address tests that fail on Windows with Python 3.7 or 3.6 2019-12-09 16:17:10 -08:00
James R. Barlow
f6510e2b15 Document function of symlink shim 2019-12-06 15:00:12 -08:00
James R. Barlow
51abd79136 Tesseract no longer posts an error message if config file not found 2019-12-04 21:35:28 -08:00
James R. Barlow
5607429d9a tests: error message from tesseract change 2019-12-04 21:31:01 -08:00
James R. Barlow
9db01c7ff5 Remove test_bad_utf8
Removed due to the difficulty of getting this to work on Python 3.8 and
Windows, and the high probability that this behavior is gone in Tesseract 4.0+.

Originally added in 2017.
2019-12-04 21:01:09 -08:00
James R. Barlow
cff37bf681 Make test_german more Windows-friendly 2019-12-04 21:01:09 -08:00
James R. Barlow
66d04dd6e3 Don't expect filenames to be replicated on NT 2019-12-04 21:01:09 -08:00
James R. Barlow
06a1f987d4 Use _OCRMYPDF_TEST_PATH for testing and .py stubs to simulate symlinks 2019-12-04 21:01:06 -08:00
James R. Barlow
e51e21c6b6 ghostscript: Refactor checking for executable name on Windows 2019-12-04 21:01:06 -08:00
James R. Barlow
43ab7c88d7 Remove os_environ() context manager 2019-12-04 17:37:38 -08:00
James R. Barlow
ca9669742d Move gs tests to test_ghostscript 2019-12-04 17:14:27 -08:00
James R. Barlow
8a1dddc3ee Don't worry about closed streams on Windows 2019-12-04 17:14:27 -08:00
James R. Barlow
0cd424ffcb Enforce str-only environment for Windows since it's more strict 2019-12-04 17:14:27 -08:00
James R. Barlow
fde550f9a7 test: Replace many instances of run_ocrmypdf in subprocess with inline 2019-12-04 17:14:27 -08:00
James R. Barlow
a3726e4ce3 Fix test_metadata: use mmap in a Windows and POSIX compatible way 2019-12-04 17:13:52 -08:00
James R. Barlow
4ab0a8ff35 Fix test_single_page_inline_image - remove temp file 2019-12-04 17:13:51 -08:00
James R. Barlow
37f6f72df3 tests: a few Windows fixes 2019-12-04 17:13:51 -08:00
James R. Barlow
3f92867ae6 Fix TypeError "environment can only contain strings"
Apparently Windows Python doesn't coerce pathlib.Path to str.
2019-12-04 17:13:51 -08:00
James R. Barlow
fe7c69ce95 leptonica: don't open files by name; use memory buffers
Avoids encoding issues and makes error trap unnecessary in some cases.
2019-12-04 17:13:51 -08:00
James R. Barlow
9baccee8c5 leptonica: Handle API change for pixFindPageForeground 2019-12-04 17:13:51 -08:00
James R. Barlow
72d3ee3a87 Refactor symlink usage to support Windows 2019-12-04 17:13:51 -08:00
James R. Barlow
5f5421f23d test: further fixes to test_report_file_size 2019-11-12 01:14:21 -08:00
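
Commits 3f92867ae6 and 0cd424ffcb in the log above describe a Windows-only TypeError ("environment can only contain strings") raised when a pathlib.Path is placed in a subprocess environment. The following is a minimal sketch of that kind of coercion, not the project's actual code: the helper name str_only_env and the path value are hypothetical, and only the variable name _OCRMYPDF_TEST_PATH is taken from the log.

```python
import os
import subprocess
import sys
from pathlib import Path


def str_only_env(overrides):
    """Return a copy of os.environ with every override coerced to str.

    Hypothetical helper for illustration; not part of any library.
    """
    env = dict(os.environ)
    for key, value in overrides.items():
        # str() turns pathlib.Path (and anything else) into a plain string,
        # which is what the Windows process-creation code insists on.
        env[str(key)] = str(value)
    return env


# The path below is an assumed placeholder value.
env = str_only_env({"_OCRMYPDF_TEST_PATH": Path("tests") / "spoof"})

# The child process simply echoes the variable back.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['_OCRMYPDF_TEST_PATH'])"],
    env=env,
    stdout=subprocess.PIPE,
    check=True,
)
```

Coercing at the point where the environment is built keeps the rest of the code free to pass Path objects around, and behaves the same on POSIX and Windows.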