Commit Graph

2777 Commits

Author SHA1 Message Date
James R. Barlow
997bf7578d hocrtransform: fix exception if no div ocr_page object 2020-12-11 14:14:21 -08:00
James R. Barlow
043258242c hocrtransform: trivial typing 2020-12-11 14:14:21 -08:00
James R. Barlow
156d5d9a9c cli: typing 2020-12-11 14:14:21 -08:00
James R. Barlow
0b7e52fb5e api: parse cmdline in more type friendly way 2020-12-11 14:14:21 -08:00
James R. Barlow
a5feef07d0 Declare ocrmypdf as typed 2020-12-11 14:14:21 -08:00
James R. Barlow
f11bb53e61 Change prefix of temporary folders
Shouldn't really use a name that suggests a connection to GitHub.
2020-12-07 21:51:46 -08:00
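The commit does not show the new prefix. The following is a minimal sketch of giving temporary working folders a neutral, recognizable prefix with the standard library. The value "ocrmypdf.io." and the helper name make_work_folder are hypothetical and used only for illustration.

```python
import os
import shutil
import tempfile

# Hypothetical prefix for illustration; the commit does not state the new value.
TEMP_PREFIX = "ocrmypdf.io."


def make_work_folder() -> str:
    """Create a private temporary working folder whose name starts with TEMP_PREFIX."""
    # mkdtemp creates the directory (readable only by the current user)
    # and returns its absolute path; the caller is responsible for cleanup.
    return tempfile.mkdtemp(prefix=TEMP_PREFIX)


if __name__ == "__main__":
    folder = make_work_folder()
    print(folder)
    shutil.rmtree(folder)
```

A recognizable prefix makes leftover folders easy to identify in the system temp directory without implying an affiliation the project does not have.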
James R. Barlow
68a57a7839 Add feature to generate hocr-pdf with visible debug text 2020-12-04 17:38:48 -08:00
James R. Barlow
4194430dc1 Begin next release notes 2020-12-04 13:28:04 -08:00
James R. Barlow
a707c56fae docs: improve windows instructions 2020-12-04 13:21:54 -08:00
James R. Barlow
3cba50bfbd windows: look in registry for Tesseract and Ghostscript 2020-12-04 13:21:54 -08:00
James R. Barlow
ed5e17d0a4 completions: consider *.PDF and some images too 2020-12-04 13:20:35 -08:00
James R. Barlow
ce0e0ecd4d Decouple tqdm from progressbar setup 2020-12-04 13:20:28 -08:00
James R. Barlow
7e1223c12c ghostscript: add output tracing 2020-11-29 14:53:35 -08:00
James R. Barlow
b83d7f6d1a subprocess: refactor and add run_polling_stderr 2020-11-29 14:36:03 -08:00
James R. Barlow
80e957908a tesseract: fix run call with logs_errors_to_stdout 2020-11-29 14:25:46 -08:00
James R. Barlow
f0e7bea8ba docs: remove redundant statement 2020-11-27 13:54:36 -08:00
James R. Barlow
0cdb9bd04a docs: remove description of how OMP_THREAD_LIMIT is managed 2020-11-23 12:36:04 -08:00
James R. Barlow
8224d89bc6 v11.3.4 release notes (tag: v11.3.4) 2020-11-18 11:57:28 -08:00
James R. Barlow
a2bbbe2a26 v11.3.4 release notes 2020-11-18 11:56:29 -08:00
James R. Barlow
43f41863fa check_pdf: document how we handle linearization 2020-11-18 11:54:07 -08:00
James R. Barlow
d71e50e83d Fix "readLinearizationData for file that is not linearized"
pikepdf 2.1.0 throws wrong type of exception in this case, so special-case it.

Closes #680
Closes #681
2020-11-18 11:52:17 -08:00
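The commit message does not include the code or name the exception type pikepdf 2.1.0 raised. Under those unknowns, a minimal sketch of "special-casing" a misreported error is to recognize it by its message and treat it as "not linearized", while re-raising anything else. The callable read_linearization is a hypothetical stand-in for the pikepdf call, not pikepdf's API.

```python
from typing import Callable

# Message quoted in the commit title.
NOT_LINEARIZED_MSG = "readLinearizationData for file that is not linearized"


def check_linearization(read_linearization: Callable[[], bool]) -> bool:
    """Return whether a file is linearized, tolerating the misreported error.

    `read_linearization` is a hypothetical callable standing in for the
    library call that raised the wrong exception type.
    """
    try:
        return bool(read_linearization())
    except Exception as e:  # the exact type raised is not stated in the commit
        if NOT_LINEARIZED_MSG in str(e):
            # Not a real failure: the file simply isn't linearized.
            return False
        raise  # any other error is still propagated
```

Matching on the message is fragile, so a check like this is best kept narrow and tied to the specific library version that misbehaves.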
James R. Barlow
1f598da3c1 ghostscript: better docs and comments 2020-11-18 11:34:17 -08:00
James R. Barlow
d0cdbd5e1c watcher: include uppercase .PDF too 2020-11-12 02:29:47 -08:00
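The commit only says uppercase .PDF is now included. A minimal sketch of case-insensitive extension matching, assuming a hypothetical helper is_pdf_candidate, normalizes the suffix before comparing:

```python
from pathlib import Path

# Extensions to accept, stored in lowercase; compared after lowercasing the suffix.
PDF_SUFFIXES = {".pdf"}


def is_pdf_candidate(path: str) -> bool:
    """Accept .pdf, .PDF, .Pdf, ... by normalizing the suffix's case."""
    return Path(path).suffix.lower() in PDF_SUFFIXES
```

Lowercasing the suffix also catches mixed-case names such as scan.Pdf, which a list containing only "*.pdf" and "*.PDF" patterns would miss.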
James R. Barlow
5c56f61209 unpaper: type hints 2020-11-11 02:59:37 -08:00
James R. Barlow
9bec85470a Merge branch 'master' of github.com:jbarlow83/OCRmyPDF 2020-11-10 04:08:05 -08:00
James R. Barlow
a03863a17d docs: fix link to docker image 2020-11-10 04:08:01 -08:00
James R. Barlow
22cd9b2364 docs: fix csv-table errors 2020-11-10 04:07:49 -08:00
pretentious7
4fc7d6d93e fix typo "charcter" -> "character" (#673) 2020-11-09 16:53:02 -08:00
James R. Barlow
71f0e7f545 v11.3.3 release notes (tag: v11.3.3) 2020-11-07 00:53:33 -08:00
James R. Barlow
895fddd85e Replace most uses of universal_newlines with text
The parameters are equivalent but the latter is better named. Since
Python 3.6 doesn't support text= we use our wrapper to add it in that
place.

This is for subprocess.run.
2020-11-07 00:48:08 -08:00
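The commit does not show the wrapper itself. In the standard library, subprocess.run gained text= in Python 3.7 as an alias for universal_newlines=. A minimal sketch of a wrapper that accepts text= on 3.6 as well, assuming a hypothetical function named run, might look like this:

```python
import subprocess
import sys


def run(args, **kwargs):
    """subprocess.run wrapper that accepts text= even on Python 3.6.

    text= was added in Python 3.7 as a better-named alias for
    universal_newlines=, so on older versions it is translated.
    """
    if sys.version_info < (3, 7) and "text" in kwargs:
        kwargs["universal_newlines"] = kwargs.pop("text")
    return subprocess.run(args, **kwargs)


if __name__ == "__main__":
    proc = run([sys.executable, "-c", "print('hello')"],
               stdout=subprocess.PIPE, text=True)
    print(repr(proc.stdout))
```

Callers can then write text=True everywhere, and the compatibility shim lives in one place.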
James R. Barlow
5a59e4d543 unpaper: don't use universal_newlines=True
There's no specific reason to do this. We can log binary output equally well.
2020-11-07 00:18:27 -08:00
James R. Barlow
b51abf2249 azure: Fix indentation mistake 2020-11-04 12:19:35 -08:00
James R. Barlow
6d3f9ff15a api: rework ocr() slightly to simplify variable handling 2020-11-03 17:10:52 -08:00
James R. Barlow
5d1d1a712b docs: more details about macOS API changes
Due to fork->spawn
2020-11-03 17:09:58 -08:00
James R. Barlow
6d5f8133e0 docs: show ifmain guard in example 2020-11-03 15:28:33 -08:00
James R. Barlow
13018d3d5c ci: Extend test matrix to Python 3.9 2020-11-03 04:15:14 -08:00
James R. Barlow
14a85f9473 Fix pinned dependencies (tag: v11.3.2) 2020-11-03 04:12:47 -08:00
James R. Barlow
d22a1b3367 v11.3.2 release notes (2)
Since we never tagged it, fix other things.
2020-11-03 02:03:25 -08:00
James R. Barlow
b913e5dfef ghostscript: don't repeat log in debug
Subprocess already does this for us.
2020-11-03 01:45:06 -08:00
James R. Barlow
dd8a5a4c72 Fix log domain names
ocrmypdf.subprocess.subprocess.ghostscript -> ocrmypdf.subprocess.ghostscript
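The commit does not say how the doubled segment arose. One way it can happen with the standard logging module, shown here as an illustrative guess rather than the actual bug, is joining a parent logger's name with a suffix that already repeats the parent's last component:

```python
import logging

PARENT = "ocrmypdf.subprocess"

# Correct: the suffix is only the part below the parent.
good = logging.getLogger(PARENT).getChild("ghostscript")

# Illustrative guess at the mistake: the suffix repeats "subprocess",
# and getChild simply joins the names with a dot.
bad = logging.getLogger(PARENT).getChild("subprocess.ghostscript")

if __name__ == "__main__":
    print(good.name)
    print(bad.name)
```

Inside a module, logging.getLogger(__name__) avoids the problem because the module's own dotted path is already the full logger name.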
2020-11-03 01:44:35 -08:00
James R. Barlow
36e9a54f02 Remove extraneous page rotation
This was added in commit b5ccbfd but seems to have been ill-advised.
2020-11-03 01:34:28 -08:00
James R. Barlow
3707af3b74 Change pdf.root to pdf.Root 2020-11-03 01:30:31 -08:00
James R. Barlow
ced7ad9164 unpaper: round off DPI 2020-11-03 01:14:57 -08:00
James R. Barlow
54bbbfdeb3 Fix UnboundLocalError when considering ImageMasks for optimization
Uncovered by test file in issue 667, although unrelated to that issue.
2020-11-03 01:08:14 -08:00
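The commit does not show the affected code. As a generic illustration of this error class, assuming hypothetical functions and dictionary keys, an UnboundLocalError occurs when a local variable is assigned on only some branches and one branch (here, image masks) skips the assignment. Returning explicitly from every branch is one common fix.

```python
from typing import Optional


def pick_codec_buggy(image: dict) -> str:
    if image.get("ImageMask"):
        pass  # bug: nothing is assigned on this branch
    elif image.get("BitsPerComponent") == 1:
        codec = "bilevel"
    else:
        codec = "lossy"
    return codec  # raises UnboundLocalError when ImageMask is set


def pick_codec_fixed(image: dict) -> Optional[str]:
    if image.get("ImageMask"):
        return None  # image masks are skipped explicitly
    if image.get("BitsPerComponent") == 1:
        return "bilevel"
    return "lossy"
```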
James R. Barlow
7f73a6ed1e Some Python 3.9 fixes 2020-11-03 00:45:47 -08:00
James R. Barlow
dce206d3dc Fix pre-commit for Py3.9 2020-11-03 00:20:25 -08:00
James R. Barlow
9304c856cf Merge branch 'master' of github.com:jbarlow83/OCRmyPDF 2020-11-02 02:47:36 -08:00
James R. Barlow
e5df98cbdf v11.3.2 release notes 2020-11-02 02:43:32 -08:00
James R. Barlow
19bf3aeb00 api: improve typing 2020-11-02 02:33:34 -08:00
James R. Barlow
e86be0031c unpaper: fix process output handling
With the ocrmypdf.subprocess wrapper, logging the output here
is redundant and loses the page number context.
2020-11-02 01:07:41 -08:00
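Several commits above rely on the subprocess wrapper logging a command's output on its own, so callers should not log it again. A minimal sketch of that division of labor follows. The logger name and the run function are hypothetical and are not the ocrmypdf.subprocess API. Output is captured as bytes, without text mode, and decoded only for logging.

```python
import logging
import subprocess
import sys

log = logging.getLogger("example.subprocess")  # hypothetical logger name


def run(args, **kwargs):
    """Run a command, capture its output as bytes, and log it once at debug level."""
    proc = subprocess.run(
        args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, **kwargs
    )
    for stream in (proc.stdout, proc.stderr):
        if stream:
            # errors="replace" keeps logging working even for non-UTF-8 output
            log.debug(stream.decode("utf-8", errors="replace").rstrip())
    return proc


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    # The caller only inspects the result; it does not re-log the output.
    result = run([sys.executable, "-c", "print('done')"])
    print(result.returncode)
```

Keeping the logging in one place avoids duplicate messages. A caller that needs extra context, such as a page number, can add it in its own log message about the result.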