2812 Commits

Author SHA1 Message Date
James R. Barlow
2846d46bb8 Remove .coveragerc and fold into setup.cfg 2021-01-06 03:58:18 -08:00
James R. Barlow
47ef1914d4 v11.4.4 release notes (tag: v11.4.4) 2021-01-01 01:39:24 -08:00
James R. Barlow
df157552f3 Make ocrmypdf.ocr take a threading lock 2021-01-01 01:37:09 -08:00
James R. Barlow
0b3a526049 Partial fix crash on 'userunit' None (#700)
Our method of getting data from pdfminer would silently consume a StopIteration
if pdfminer returned no processed pages, leading to an odd error message.

We now handle the error from pdfminer properly and return a more
descriptive error of our own.

It would be possible for ocrmypdf to repair the file before sending it to
pdfminer, but this seems to be rare enough that we won't do that yet.
2021-01-01 01:11:32 -08:00
James R. Barlow
1e80d412fa tesseract: fix typing of some optional arguments 2021-01-01 00:46:00 -08:00
James R. Barlow
df6e106203 concurrent: simplify results loop 2021-01-01 00:44:46 -08:00
James R. Barlow
bd0f005861 tests: tag tests that need pngquant, jbig2enc (tag: v11.4.3) 2020-12-30 01:58:57 -08:00
James R. Barlow
6ba4b7b3f3 ci: temporarily disable pngquant on Windows
Looks like a packaging error, choco complains of bad hashes.
2020-12-30 01:40:56 -08:00
James R. Barlow
2c11349ee8 Merge branch 'master' of github.com:jbarlow83/OCRmyPDF 2020-12-29 21:40:46 -08:00
James R. Barlow
b0afef09ef v11.4.3 release notes 2020-12-29 21:40:35 -08:00
James R. Barlow
72fa347c38 tests: skip metadata test for two pikepdf versions that warn incorrectly 2020-12-29 01:47:52 -08:00
James R. Barlow
96d68c2413 pipeline: refactor metadata_fixup 2020-12-29 01:47:32 -08:00
James R. Barlow
babc76fa74 tests: assert that most patched functions are called
We were not actually checking if functions we patched were called when
expected.
2020-12-28 23:58:33 -08:00
Tim Gates
dc06990e5d docs: fix simple typo, instsalled -> installed (#704)
There is a small typo in docs/installation.rst.

Should read `installed` rather than `instsalled`.
2020-12-28 15:28:34 -08:00
James R. Barlow
0ff0d2f8d1 Remove PDF/A overprint debug message
Since we currently log all of a process's output at debug it's
redundant to log this separate message.
2020-12-27 16:19:05 -08:00
James R. Barlow
81602cf420 Fix test not patching properly after Ghostscript polling change 2020-12-27 16:01:50 -08:00
James R. Barlow
607e2d7e81 v11.4.2 release notes (tag: v11.4.2) 2020-12-27 03:29:35 -08:00
James R. Barlow
b01d9e07e8 Deal with missing pthread_sigmask on Cygwin
Closes #701
2020-12-27 02:24:00 -08:00
James R. Barlow
91db94cf2e watcher: fix OCR_LOGLEVEL env var not processed
Closes #702
2020-12-27 02:02:44 -08:00
James R. Barlow
416df803d4 pdfinfo: stricter typing 2020-12-24 22:39:00 -08:00
James R. Barlow
037b96ca16 pdfinfo: refactor to eliminate RawPageInfo 2020-12-24 02:57:44 -08:00
James R. Barlow
bb258fc99c pdfinfo: Refactor pageinfo dictionary into a class 2020-12-24 01:47:53 -08:00
James R. Barlow
4b8ccbe8cb v11.4.1 release notes (tag: v11.4.1) 2020-12-22 01:41:15 -08:00
James R. Barlow
ab1ff3331b misc: synology fix
Accept user-contributed fix. Not testable.

Close #690.
2020-12-22 01:38:41 -08:00
James R. Barlow
3675ae918c Fix certain invalid page ranges causing exception
Closes #686
2020-12-22 01:22:14 -08:00
James R. Barlow
0ba32b96b7 Revert "v11.4.0 release notes - remove change not actually implemented"
This reverts commit ad202693b3.
Temporary folder prefix was actually changed in commit f11bb53e.
2020-12-22 00:47:25 -08:00
James R. Barlow
add64e4fa2 docs: com.github.ocrmypdf -> ocrmypdf.io 2020-12-22 00:46:42 -08:00
James R. Barlow
7fe2954ede Change wheel tag to py36, update package_data to include py.typed 2020-12-12 16:49:04 -08:00
James R. Barlow
ad202693b3 v11.4.0 release notes - remove change not actually implemented
Remove a change that was pushed back to a future release.
2020-12-12 16:27:38 -08:00
James R. Barlow
594ef83551 v11.4.0 release notes (tag: v11.4.0) 2020-12-11 15:09:49 -08:00
James R. Barlow
78b71618c1 Fix BufferedReader TypeError 2020-12-11 14:19:20 -08:00
James R. Barlow
b8aa89e1ec Fix log message queue flooding on certain files
Fixes #692
2020-12-11 14:14:21 -08:00
James R. Barlow
b4c1f66bc1 typing: tidy up 2020-12-11 14:14:21 -08:00
James R. Barlow
5172dbde8d subprocess: use more mypy-friendly syntax 2020-12-11 14:14:21 -08:00
James R. Barlow
d2908640c6 pdfa: help mypy figure out a type 2020-12-11 14:14:21 -08:00
James R. Barlow
997bf7578d hocrtransform: fix exception if no div ocr_page object 2020-12-11 14:14:21 -08:00
James R. Barlow
043258242c hocrtransform: trivial typing 2020-12-11 14:14:21 -08:00
James R. Barlow
156d5d9a9c cli: typing 2020-12-11 14:14:21 -08:00
James R. Barlow
0b7e52fb5e api: parse cmdline in more type friendly way 2020-12-11 14:14:21 -08:00
James R. Barlow
a5feef07d0 Declare ocrmypdf as typed 2020-12-11 14:14:21 -08:00
James R. Barlow
f11bb53e61 Change prefix of temporary folders
Shouldn't really use a name that suggests a connection to GitHub.
2020-12-07 21:51:46 -08:00
James R. Barlow
68a57a7839 Add feature to generate hocr-pdf with visible debug text 2020-12-04 17:38:48 -08:00
James R. Barlow
4194430dc1 Begin next release notes 2020-12-04 13:28:04 -08:00
James R. Barlow
a707c56fae docs: improve windows instructions 2020-12-04 13:21:54 -08:00
James R. Barlow
3cba50bfbd windows: look in registry for Tesseract and Ghostscript 2020-12-04 13:21:54 -08:00
James R. Barlow
ed5e17d0a4 completions: consider *.PDF and some images too 2020-12-04 13:20:35 -08:00
James R. Barlow
ce0e0ecd4d Decouple tqdm from progressbar setup 2020-12-04 13:20:28 -08:00
James R. Barlow
7e1223c12c ghostscript: add output tracing 2020-11-29 14:53:35 -08:00
James R. Barlow
b83d7f6d1a subprocess: refactor and add run_polling_stderr 2020-11-29 14:36:03 -08:00
James R. Barlow
80e957908a tesseract: fix run call with logs_errors_to_stdout 2020-11-29 14:25:46 -08:00