James R. Barlow
aa10a70d70
Rebuild test cache due to hocr output change
2021-08-01 01:00:05 -07:00
James R. Barlow
37923ffe52
Work around Pillow 8.3.1 DPI changes
Pillow decided against round-tripping DPI values.
https://github.com/python-pillow/Pillow/pull/5476
Fixes #802
2021-07-14 02:34:28 -07:00
James R. Barlow
5cba68b93d
tests: Don't require symlink permissions on Windows
Some of the tests required symlink permissions, which CI workers have but typical Windows
user accounts do not. Mostly these are just correctness tests.
2021-07-14 00:11:47 -07:00
James R. Barlow
5f01c5e330
Fix another species of Tesseract version number breaking regex
Fixes #795
2021-06-16 00:09:03 -07:00
James R. Barlow
7b1e5b4f41
Fix "invalid version number" for untagged tesseract versions
Fixes #770
2021-04-26 01:18:07 -07:00
James R. Barlow
757b72b0af
Revert "Remove apparently unused portion of a test"
This reverts commit d89a633ba7.
2021-04-16 00:21:11 -07:00
James R. Barlow
d673126994
Fix ZeroDivisionError on files containing images drawn at scale 0
Fixes #761
2021-04-15 23:26:14 -07:00
James R. Barlow
d89a633ba7
Remove apparently unused portion of a test
2021-04-15 23:25:18 -07:00
James R. Barlow
9db9a3d6ec
helpers: improve test coverage of Resolution
2021-04-07 23:26:37 -07:00
James R. Barlow
336d274a54
Drop remnants of support for Tesseract without has_textonly_pdf
Also improve Tesseract version checking so it can compare all of their
weird conventions.
2021-04-07 23:05:21 -07:00
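One way to compare version strings that carry prefixes or suffixes is to keep only the leading numeric components and compare them as tuples. The following is a minimal sketch of that idea, not the project's actual implementation; the function name `parse_tesseract_version` and the sample strings in the usage note are illustrative assumptions.

```python
import re


def parse_tesseract_version(raw: str) -> tuple:
    """Return the leading numeric components of a version string as a tuple.

    Hypothetical helper: an optional leading "v" is skipped, and anything
    after the dotted numeric prefix (such as "-alpha" or a git suffix) is ignored.
    """
    match = re.match(r"v?(\d+(?:\.\d+)*)", raw.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {raw!r}")
    return tuple(int(part) for part in match.group(1).split("."))
```

With this sketch, strings such as "4.1.1", "v5.0.0-alpha-20201231" and "4.1.1-rc2-25-g9707" (illustrative inputs) all reduce to tuples that Python compares element by element, so no separate comparison logic is needed.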
James R. Barlow
906d77b389
tests: remove obsolete running_in_travis()
2021-04-07 02:25:10 -07:00
James R. Barlow
9416e850ff
Remove another instance of helpers_namespace
2021-04-07 02:23:04 -07:00
James R. Barlow
2a09a668f6
Delinting: unused args
2021-04-07 02:18:08 -07:00
James R. Barlow
e788dde607
tests: eliminate unnecessary mmap
2021-04-07 02:11:31 -07:00
James R. Barlow
173a80864d
Delinting
2021-04-07 02:09:45 -07:00
James R. Barlow
aa115a8be3
Remove pytest_helpers_namespace
2021-04-07 01:56:51 -07:00
James R. Barlow
b1306bd7a8
tests: skip test_bash on Windows
2021-04-06 01:15:00 -07:00
James R. Barlow
ec1d585d40
Merge branch 'feature/misc-breaking'
2021-04-01 16:51:04 -07:00
James R. Barlow
a4e1f8e1f3
Merge branch 'feature/lambda'
2021-04-01 16:36:22 -07:00
James R. Barlow
0a42934c08
Exclude Group 3 images from optimization
2021-03-20 23:28:21 -07:00
James R. Barlow
079c162a96
Ensure sidecar is not input or output file
2021-03-05 00:29:42 -08:00
James R. Barlow
4124889f36
Don't generate PDF/A-1b with object streams
Acrobat insists that PDF/A-1b should not have object streams.
Other programs like veraPDF disagree with this restriction, but
we can accommodate Acrobat so we will.
Also add more tests around this.
2021-02-26 00:23:57 -08:00
Dima Kuznetsov
5e2206bae7
Allow --sidecar along with --pages (#735)
2021-02-19 16:55:35 -08:00
James R. Barlow
064f935699
Fix page rotation regression
Page size fixes in commit b26749 accounted for a "kept" rotation,
but not a corrected rotation.
Fixes #730.
2021-02-15 01:47:09 -08:00
James R. Barlow
8770fff968
tests: remove unreliable/incomplete test
2021-02-15 01:05:08 -08:00
James R. Barlow
bccf2f423f
Stricter parameter checking for many public functions
2021-01-31 19:27:25 -08:00
James R. Barlow
390fdf8c05
Package OCR in Form XObject
Should improve results in some situations where the initial content
stream is messy or not well-formed.
2021-01-31 19:27:25 -08:00
James R. Barlow
16bda74974
Refactor - decouple progressbar from executor
2021-01-30 20:42:00 -08:00
James R. Barlow
d274d88929
Refactor to eliminate global state in _concurrent
2021-01-30 17:36:30 -08:00
James R. Barlow
7bccb8c748
tests: fix concurrency
2021-01-24 23:46:33 -08:00
James R. Barlow
1a982da442
tests: confirm that we produce pdf when optimization is off
2021-01-24 01:54:25 -08:00
James R. Barlow
ebacff1b39
tests: Fix debug logging test
2021-01-09 16:41:57 -08:00
James R. Barlow
c7c447be66
Add test for configure_debug_logging
Since we can't directly test it
2021-01-09 16:02:12 -08:00
James R. Barlow
91aa175602
Consider text when determining page raster DPI
Previously if we found vectors of any sort on a page, we would bump
the DPI up to 400. We did nothing about pages with text. As a result,
pages with a low image resolution and printable text would have the
text downgraded to image resolution when --force-ocr was used.
We don't try to determine if the text is visible or invisible OCR text, since
that is a slower test. --redo-ocr would improve such cases anyway.
2021-01-09 16:01:49 -08:00
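The decision described in this commit can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: the function name and parameters are hypothetical, and it assumes that text pages receive the same 400 DPI floor that the message mentions for vector content.

```python
# The 400 DPI figure comes from the commit message; applying it to text
# pages in the same way is an assumption made for this sketch.
VECTOR_OR_TEXT_DPI = 400


def page_raster_dpi(image_dpi: float, has_vector: bool, has_text: bool) -> float:
    """Pick a rasterization DPI for a page (hypothetical helper)."""
    if has_vector or has_text:
        # Never go below the floor, but keep a higher image DPI if present.
        return max(image_dpi, VECTOR_OR_TEXT_DPI)
    return image_dpi
```

Under these assumptions, a page with a 150 DPI image and printable text would be rasterized at 400 DPI rather than 150, so the text is no longer reduced to the image's resolution.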
James R. Barlow
b267494e4a
Create raster PDF pages to match input page size
Previously we produced a raster image, then computed the page size from
the image dimensions and the DPI. However, if there is rounding, the
page size may not match exactly. In this modified approach we
constrain the page size to match.
2021-01-08 15:10:43 -08:00
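The rounding problem can be seen with a small calculation. The sketch below is illustrative only: the helper names, the 595 x 842 point page and the 300 DPI value are assumptions chosen for the example, with 72 points per inch as the PDF unit.

```python
POINTS_PER_INCH = 72


def raster_size_px(page_pt, dpi):
    """Pixel dimensions of a raster for a page size given in points."""
    width_pt, height_pt = page_pt
    return (
        round(width_pt * dpi / POINTS_PER_INCH),
        round(height_pt * dpi / POINTS_PER_INCH),
    )


def page_size_from_image(size_px, dpi):
    """Page size in points recovered from pixel dimensions and DPI."""
    width_px, height_px = size_px
    return (
        width_px * POINTS_PER_INCH / dpi,
        height_px * POINTS_PER_INCH / dpi,
    )


page_pt = (595.0, 842.0)  # example input page size, in points
dpi = 300
pixels = raster_size_px(page_pt, dpi)         # pixel counts are rounded to integers
derived = page_size_from_image(pixels, dpi)   # no longer exactly the input page size
constrained = page_pt                         # the fix: reuse the input page size
```

Because pixel counts must be whole numbers, `derived` differs slightly from `page_pt`, whereas reusing the input page size avoids the drift.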
James R. Barlow
f687180ecc
tests: tidy pdfinfo
2021-01-08 15:04:52 -08:00
James R. Barlow
2846d46bb8
Remove .coveragerc and fold into setup.cfg
2021-01-06 03:58:18 -08:00
James R. Barlow
0b3a526049
Partial fix for crash on 'userunit' None (#700)
Our method of getting data from pdfminer would silently consume a StopIteration
if pdfminer returned no processed pages, leading to an odd error message.
We now handle the error from pdfminer properly and return a more
descriptive error of our own.
It would be possible for ocrmypdf to repair the file before sending it to
pdfminer, but this seems to be rare enough that we won't do that yet.
2021-01-01 01:11:32 -08:00
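A general way to avoid losing a StopIteration is to call next() with a default and raise an explicit error when the iterator is empty. The sketch below shows that pattern in isolation; `NoPagesError` and `first_page` are hypothetical names, and the code does not call pdfminer.

```python
class NoPagesError(Exception):
    """Hypothetical error raised when a parser yields no pages."""


def first_page(pages):
    """Return the first item of an iterable of pages, or fail descriptively."""
    sentinel = object()
    # next() with a default never raises StopIteration, so nothing upstream
    # can swallow it by accident.
    page = next(iter(pages), sentinel)
    if page is sentinel:
        raise NoPagesError("the PDF parser returned no pages; the file may be damaged")
    return page
```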
James R. Barlow
bd0f005861
tests: tag tests that need pngquant, jbig2enc
2020-12-30 01:58:57 -08:00
James R. Barlow
72fa347c38
tests: skip metadata test for two pikepdf versions that warn incorrectly
2020-12-29 01:47:52 -08:00
James R. Barlow
babc76fa74
tests: assert that most patched functions are called
We were not actually checking whether the functions we patched were called when
expected.
2020-12-28 23:58:33 -08:00
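The kind of check this commit describes can be sketched with unittest.mock from the standard library. The example is generic rather than taken from the test suite: `find_tool` is a hypothetical function under test, and the patched target and return value are chosen only for illustration.

```python
import shutil
from unittest.mock import patch


def find_tool(name):
    """Hypothetical function under test: report whether a program is on PATH."""
    return shutil.which(name) is not None


with patch("shutil.which", return_value="/usr/bin/tesseract") as mock_which:
    found = find_tool("tesseract")

# Without this assertion the test would pass even if find_tool never
# reached the patched function.
mock_which.assert_called_once_with("tesseract")
```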
James R. Barlow
81602cf420
Fix test not patching properly after Ghostscript polling change
2020-12-27 16:01:50 -08:00
James R. Barlow
bb258fc99c
pdfinfo: Refactor pageinfo dictionary into a class
2020-12-24 01:47:53 -08:00
James R. Barlow
3675ae918c
Fix certain invalid page ranges causing exception
Closes #686
2020-12-22 01:22:14 -08:00
James R. Barlow
f11bb53e61
Change prefix of temporary folders
Shouldn't really use a name that suggests a connection to GitHub.
2020-12-07 21:51:46 -08:00
James R. Barlow
3cba50bfbd
windows: look in registry for Tesseract and Ghostscript
2020-12-04 13:21:54 -08:00
James R. Barlow
ce0e0ecd4d
Decouple tqdm from progressbar setup
2020-12-04 13:20:28 -08:00
James R. Barlow
7e1223c12c
ghostscript: add output tracing
2020-11-29 14:53:35 -08:00
James R. Barlow
895fddd85e
Replace most uses of universal_newlines with text
The parameters are equivalent but the latter is better named. Python 3.6
doesn't support text=, so we use our wrapper to add it in that
case.
This is for subprocess.run.
2020-11-07 00:48:08 -08:00
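A wrapper of the kind mentioned here could look roughly like the following. This is a sketch, not the project's wrapper: the function name `run` and its behavior are assumptions. It relies only on the fact that subprocess.run accepts universal_newlines, of which text= is the newer alias.

```python
import subprocess
import sys


def run(args, *, text=None, **kwargs):
    """Call subprocess.run, translating text= to universal_newlines=.

    Hypothetical wrapper: lets callers write text=True even on Python
    versions whose subprocess.run only knows universal_newlines.
    """
    if text is not None:
        kwargs["universal_newlines"] = text
    return subprocess.run(args, **kwargs)


result = run(
    [sys.executable, "-c", "print('hello')"],
    stdout=subprocess.PIPE,
    text=True,
)
```

Here `result.stdout` is a str rather than bytes, the same as passing universal_newlines=True directly.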
James R. Barlow
3707af3b74
Change pdf.root to pdf.Root
2020-11-03 01:30:31 -08:00