James R. Barlow
4a640b8dcd
Fix language argument not working as list
...
Fixes #523
2020-04-14 23:18:52 -07:00
James R. Barlow
9471bc8921
Fix versions with leading v, e.g. v5.0
2020-04-10 13:42:33 -07:00
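
A minimal sketch of what tolerating a leading "v" in a version string can look like. The helper name `normalize_version` and the regular expression are hypothetical illustrations; the commit does not show how the project's version checker is implemented.

```python
import re


def normalize_version(raw: str) -> tuple:
    """Hypothetical helper: turn 'v5.0' or '10.0.0' into a tuple of ints."""
    # Accept an optional leading 'v'/'V', then dotted numeric components.
    match = re.match(r"[vV]?(\d+(?:\.\d+)*)", raw.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {raw!r}")
    return tuple(int(part) for part in match.group(1).split("."))
```

Comparing tuples of integers rather than raw strings also orders a two-digit major version such as 10.0.0 above 9.x.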
James R. Barlow
d13d70fd56
Fix version checker failing for qpdf 10.0.0
...
Fixes #527
2020-04-10 13:00:19 -07:00
James R. Barlow
23bc3d3a29
tests: workaround for Ghostscript 9.52 txtwrite problem
2020-03-29 22:45:16 -07:00
James R. Barlow
8307832ce9
tests: add force OCR to a file with text that Ghostscript doesn't see
...
For gs 9.52 support.
Also refactor use of pikepdf.open() to use with blocks.
2020-03-29 22:44:27 -07:00
James R. Barlow
378e4dae3b
Expand documentation for subprocess.run() from test
2020-03-04 13:37:44 -08:00
James R. Barlow
b3b61c152c
Handle malformed DocumentInfo (#497)
...
A user submitted a PDF in which /Trailer /Info pointed to the XMP metadata
block instead of a DocumentInfo dictionary. Fix this and add a test.
2020-03-03 03:27:01 -08:00
James R. Barlow
4a27124eab
Simplify metadata for invalid xml in output
...
Removes possibly non-free resource enron1.pdf.
2020-02-12 00:07:18 -08:00
James R. Barlow
ce97af5a79
Add OCR quality measurement API
2020-01-17 03:10:27 -08:00
James R. Barlow
61a2674317
Skip test that needs chmod when on Windows
2020-01-06 02:36:04 -08:00
James R. Barlow
9ad8cbf1f6
Fix assert that depends on POSIX-y file handling
2020-01-06 02:02:05 -08:00
James R. Barlow
9c5f0d0ec6
Eliminate last use of PyPDF2 from test suite
2020-01-04 16:32:01 -08:00
James R. Barlow
32041c43e1
tests: improve tesseract coverage
2020-01-04 02:35:14 -08:00
James R. Barlow
1037d73efb
tests: use smaller files for ghostscript
2019-12-31 17:20:28 -08:00
James R. Barlow
aeb7b142a9
tests: skip tests not compatible with coverage
...
For reasons not entirely clear, stdout gets some data injected when
pytest-cov is running. Our tests that check for clean stdout need to
ignore this. We check for an environment variable that is defined only
when coverage is running.
2019-12-31 17:10:51 -08:00
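
The environment-variable check described in this commit could be sketched as follows. The variable name `COV_CORE_SOURCE` is an assumption made for illustration; the commit message does not say which variable the test suite actually inspects.

```python
import os

# Assumption: the commit does not name the variable; this one is illustrative.
COVERAGE_ENV_VAR = "COV_CORE_SOURCE"


def running_under_coverage(environ=None) -> bool:
    """Return True if the (assumed) coverage marker variable is defined."""
    env = os.environ if environ is None else environ
    return COVERAGE_ENV_VAR in env
```

With pytest, a test that requires clean stdout could then be decorated with `pytest.mark.skipif(running_under_coverage(), reason="coverage writes to stdout")`.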
James R. Barlow
422ea9777e
Remove session scope from fixtures
...
pytest seems to prepare os.environ in complex ways, so we want to ensure
these fixtures are not reused.
2019-12-31 17:09:23 -08:00
James R. Barlow
2f1c743227
Rewrite main pool loop
...
pytest-cov documentation recommends explicit management of
multiprocessing.Pool rather than the context manager. This is supposed to
work better for collecting coverage data, particularly on Windows.
2019-12-31 16:23:41 -08:00
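
A sketch of the explicit close-and-join pattern, as opposed to `with Pool() as pool:`, whose exit handler calls `terminate()`. To keep the example runnable without a `__main__` guard it uses `multiprocessing.pool.ThreadPool`, which shares the `close()`/`join()`/`terminate()` interface with `multiprocessing.Pool`; the function name `run_all` is hypothetical and this is not the project's actual pool loop.

```python
from multiprocessing.pool import ThreadPool


def run_all(func, items, workers=2):
    """Hypothetical helper: map func over items with explicit pool shutdown."""
    pool = ThreadPool(workers)
    try:
        results = pool.map(func, items)
        # close() stops new submissions and lets workers exit normally,
        # which gives them a chance to flush any data they collected.
        pool.close()
    except BaseException:
        # On failure, stop workers immediately before re-raising.
        pool.terminate()
        raise
    finally:
        # join() is valid after either close() or terminate().
        pool.join()
    return results
```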
James R. Barlow
96ee21aee9
Try to set up subprocess coverage better
2019-12-31 15:39:45 -08:00
James R. Barlow
4b759af6ff
tests: fix problems with ghostscript spoofers
2019-12-31 15:33:03 -08:00
James R. Barlow
25d2b0cda4
test: environment warnings/cleanup
2019-12-30 22:38:50 -08:00
James R. Barlow
c36e9950ae
tests: test TqdmConsole
2019-12-30 17:51:09 -08:00
James R. Barlow
0c0d53b10f
tests: AcroForm test case did not work correctly; fixed
2019-12-30 17:50:32 -08:00
James R. Barlow
63de7e1677
Improve error message for unreadable input files
2019-12-30 16:14:52 -08:00
James R. Barlow
b0e92760a2
tests: add coverage for helpers
2019-12-30 15:52:10 -08:00
James R. Barlow
c5edff2c2f
Sort imports
2019-12-19 15:31:18 -08:00
James R. Barlow
c5571388e2
Improve test coverage of _sync.py
2019-12-10 01:06:27 -08:00
James R. Barlow
607eee198d
tests: split out preprocessing tests
2019-12-09 16:18:01 -08:00
James R. Barlow
5e2a7f8a56
tests: speed up several slow tests
2019-12-09 16:17:57 -08:00
James R. Barlow
7be293f628
Address tests that fail on Windows with Python 3.7 or 3.6
2019-12-09 16:17:10 -08:00
James R. Barlow
f6510e2b15
Document function of symlink shim
2019-12-06 15:00:12 -08:00
James R. Barlow
51abd79136
Tesseract no longer posts an error message if config file not found
2019-12-04 21:35:28 -08:00
James R. Barlow
5607429d9a
tests: error message from tesseract change
2019-12-04 21:31:01 -08:00
James R. Barlow
9db01c7ff5
Remove test_bad_utf8
...
Removed due to the difficulty of getting this to work on Python 3.8 and
Windows, and the high probability that this behavior is gone in
Tesseract 4.0+. The test was originally added in 2017.
2019-12-04 21:01:09 -08:00
James R. Barlow
cff37bf681
Make test_german more Windows-friendly
2019-12-04 21:01:09 -08:00
James R. Barlow
66d04dd6e3
Don't expect filenames to be replicated on NT
2019-12-04 21:01:09 -08:00
James R. Barlow
06a1f987d4
Use _OCRMYPDF_TEST_PATH for testing and .py stubs to simulate symlinks
2019-12-04 21:01:06 -08:00
James R. Barlow
e51e21c6b6
ghostscript: Refactor checking for executable name on Windows
2019-12-04 21:01:06 -08:00
James R. Barlow
43ab7c88d7
Remove os_environ() context manager
2019-12-04 17:37:38 -08:00
James R. Barlow
ca9669742d
Move gs tests to test_ghostscript
2019-12-04 17:14:27 -08:00
James R. Barlow
8a1dddc3ee
Don't worry about closed streams on Windows
2019-12-04 17:14:27 -08:00
James R. Barlow
0cd424ffcb
Enforce str-only environment for Windows since it's more strict
2019-12-04 17:14:27 -08:00
James R. Barlow
fde550f9a7
test: Replace many instances of run_ocrmypdf in subprocess with inline
2019-12-04 17:14:27 -08:00
James R. Barlow
a3726e4ce3
Fix test_metadata: use mmap in a Windows and POSIX compatible way
2019-12-04 17:13:52 -08:00
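
One way to memory-map a file that works on both Windows and POSIX is to pass `access=mmap.ACCESS_READ` instead of the platform-specific `prot`/`flags` arguments. The sketch below assumes a read-only search use case; the helper name `file_contains` is hypothetical and not taken from the test suite.

```python
import mmap


def file_contains(path, needle: bytes) -> bool:
    """Hypothetical helper: search a file for a byte string via mmap."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ is accepted on both
        # Windows and POSIX. Mapping an empty file raises ValueError.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm.find(needle) != -1
```

Closing the mapping and the file before the temporary file is deleted matters on Windows, where an open file cannot be removed.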
James R. Barlow
4ab0a8ff35
Fix test_single_page_inline_image - remove temp file
2019-12-04 17:13:51 -08:00
James R. Barlow
37f6f72df3
tests: a few Windows fixes
2019-12-04 17:13:51 -08:00
James R. Barlow
3f92867ae6
Fix TypeError "environment can only contain strings"
...
Apparently Windows Python doesn't coerce pathlib.Path to str.
2019-12-04 17:13:51 -08:00
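
A minimal sketch of coercing an environment mapping to strings before handing it to `subprocess`, which avoids the TypeError described above. The helper name `stringify_env` is hypothetical; the commit does not show the actual fix.

```python
import os
from pathlib import Path


def stringify_env(env):
    """Hypothetical helper: return a copy of env with str keys and values."""
    return {
        str(key): os.fspath(value) if isinstance(value, os.PathLike) else str(value)
        for key, value in env.items()
    }
```

The result can be passed as `env=` to `subprocess.run()`, for example `subprocess.run(cmd, env=stringify_env({**os.environ, "TMPDIR": Path("work")}))`, where `cmd` and the `TMPDIR` value are placeholders.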
James R. Barlow
fe7c69ce95
leptonica: don't open files by name; use memory buffers
...
Avoids encoding issues and makes error trap unnecessary in some cases.
2019-12-04 17:13:51 -08:00
James R. Barlow
9baccee8c5
leptonica: Handle API change for pixFindPageForeground
2019-12-04 17:13:51 -08:00
James R. Barlow
72d3ee3a87
Refactor symlink usage to support Windows
2019-12-04 17:13:51 -08:00
James R. Barlow
5f5421f23d
test: further fixes to test_report_file_size
2019-11-12 01:14:21 -08:00