Commit Graph

3153 Commits

Author SHA1 Message Date
rdiez
2be8eeec2c Fix spelling of 'ephemeral' (#908) 2022-02-03 10:33:51 -08:00
James R. Barlow
3dfde479e2 The world is not ready for := v13.3.0 2022-01-26 00:16:51 -08:00
James R. Barlow
aea1862644 v13.3.0 release notes 2022-01-25 23:50:48 -08:00
James R. Barlow
3b406112d0 ghostscript: improve test coverage of error cases 2022-01-25 23:45:47 -08:00
James R. Barlow
fcc4c2d371 ghostscript: improve error message if image cannot be opened 2022-01-25 23:12:51 -08:00
James R. Barlow
3de18ed612 tesseract: account for more tesseract 5 output differences 2022-01-25 00:19:49 -08:00
James R. Barlow
93cca42e20 optimize: don't try to optimize an image we can't save 2022-01-25 00:19:23 -08:00
James R. Barlow
2d0ac4707c Use better img2pdf settings where possible while supporting old versions
Fixes #894
2022-01-14 11:55:54 -08:00
James R. Barlow
7d208175cf unpaper: refactoring 2022-01-11 10:57:54 -08:00
James R. Barlow
ea69e868ed unpaper: issue warning if image too large to clean 2022-01-11 10:44:38 -08:00
James R. Barlow
beea603ab3 Revert "docs: add sphinx-panels"
This reverts commit 7966192d6e.
2022-01-04 11:44:20 -08:00
James R. Barlow
7966192d6e docs: add sphinx-panels 2022-01-04 11:43:28 -08:00
Anton Gladky
5acbd7a252 Wrap exception on non-CMYK images into the log warning (#881) 2021-12-29 01:15:49 -08:00
Krasimir Nedelchev
aed955ca8c Fix typo (#882) 2021-12-27 15:36:19 -08:00
James R. Barlow
298bdb8690 v13.2.0 release notes v13.2.0 2021-12-19 11:40:23 -08:00
James R. Barlow
1a58abcc6a concurrency: fix extra update of progressbar 2021-12-19 11:34:09 -08:00
James R. Barlow
dbfceba020 Standardize ghostscript version default 2021-12-18 02:01:42 -08:00
James R. Barlow
0faa618c3c Replace deprecated distutils with packaging.version 2021-12-18 01:57:59 -08:00
James R. Barlow
7035002c03 Order of operations suppressed detailed Ghostscript missing error message 2021-12-18 01:27:39 -08:00
James R. Barlow
f8fadaef41 _windows: remove use of deprecated distutils 2021-12-13 20:45:54 -08:00
James R. Barlow
ee21bf9ef6 Update cache 2021-12-13 20:45:30 -08:00
James R. Barlow
190ca81951 v13.1.1 release notes v13.1.1 2021-12-10 21:49:04 -08:00
James R. Barlow
d48254d477 Fix issue with attempting to deskew a blank page on Tesseract 5
Closes #868
2021-12-10 21:48:09 -08:00
James R. Barlow
1ec2ccca14 docs: add warning about multiproc on macOS 2021-12-10 17:42:50 -08:00
James R. Barlow
e78f0cc56f v13.1.0 release notes v13.1.0 2021-12-06 20:00:53 -08:00
James R. Barlow
13af3252ff tests: simplify run_ocrmypdf API 2021-12-06 17:00:25 -08:00
James R. Barlow
0528867e0b config: yaml strings for versions 2021-12-06 16:59:44 -08:00
James R. Barlow
6910c48b81 Fix test_outputtype_none on Windows and cleanup docs 2021-12-06 15:38:38 -08:00
James R. Barlow
69aa3981c4 docs: Remove reference to long removed 'tesseract' renderer 2021-12-06 15:38:38 -08:00
James R. Barlow
9c1e5adfe6 docs: remove Ubuntu 16.04 install instructions
It's EOL.
2021-12-06 15:38:38 -08:00
James R. Barlow
e642dd4b35 Fix kill signal on Windows 2021-12-06 15:38:32 -08:00
James R. Barlow
9de06f62ee Use Python executors instead of pools
ProcessPool/ThreadPool don't have the ability to notice when a child worker
was terminated. ProcessPoolExecutor and ThreadPoolExecutor do notice and
provide better error messages.

Add tests to check.
2021-12-06 15:38:27 -08:00
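
The behaviour this commit message describes can be reproduced with a minimal sketch. It is not code from the repository: the helper name `run_with_executor` is hypothetical, and the "fork" start method is preferred only to keep the demo self-contained on POSIX systems. A worker that exits abruptly makes `ProcessPoolExecutor` raise `BrokenProcessPool` from `future.result()`.

```python
import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def run_with_executor():
    """Submit a job that kills its own worker and report how the executor reacts.

    Hypothetical helper for illustration only; not part of OCRmyPDF.
    """
    # Assumption: prefer "fork" where it exists so the demo does not depend on
    # re-importing the main module; fall back to the platform default otherwise.
    methods = multiprocessing.get_all_start_methods()
    ctx = multiprocessing.get_context("fork" if "fork" in methods else None)

    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as executor:
        # os._exit ends the worker immediately without raising a Python
        # exception, much like a worker killed by a signal.
        future = executor.submit(os._exit, 1)
        try:
            future.result(timeout=30)
        except BrokenProcessPool:
            return "worker died: BrokenProcessPool raised"
    return "worker finished normally"


if __name__ == "__main__":
    print(run_with_executor())
```

The caller gets a specific exception to turn into an error message instead of waiting on a result that never arrives.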
James R. Barlow
1414a8f5dc Tidy pyproject.toml 2021-12-06 15:38:27 -08:00
James R. Barlow
26badf2882 typing: small improvements 2021-12-06 15:38:27 -08:00
James R. Barlow
8f873aaa45 sync: typing improvements 2021-12-06 15:38:27 -08:00
James R. Barlow
8fdcb15b4e tests: improve typing and remove some legacy code 2021-12-06 15:38:27 -08:00
James R. Barlow
0323738ada ocrmypdf.fish: fix indents
[ci skip]
2021-12-06 15:38:27 -08:00
FPille
aae5591f7e Update ocrmypdf.bash completion
Squashed commit of the following:

commit 974de2e8ccad7fd34694f2c3a7a17c64bb52cdab
Merge: a8d7f969 ee04aa72
Author: James R. Barlow <james@purplerock.ca>
Date:   Sat Dec 4 20:22:50 2021 -0800

    Merge branch 'update_bash-completion' of git://github.com/FPille/OCRmyPDF into FPille-update_bash-completion

commit ee04aa7225
Author: FPille <f.pille@gmail.com>
Date:   Thu Oct 14 11:09:23 2021 +0200

    update

commit 76f64537aa
Author: FPille <f.pille@gmail.com>
Date:   Thu Oct 14 11:04:10 2021 +0200

    updated and descriptions for arguments and choices added
    deprecated arguments removed
    bug fix: typo "_init_completion" instead of "_init_completions"

commit de9b93e852
Merge: c23374de 42713b77
Author: Frank <50119297+FPille@users.noreply.github.com>
Date:   Thu Oct 14 08:08:11 2021 +0200

    Merge branch 'jbarlow83:master' into master

commit c23374de81
Merge: 40b2ebcb c409fa58
Author: Frank <50119297+FPille@users.noreply.github.com>
Date:   Wed May 26 20:31:00 2021 +0200

    Merge branch 'jbarlow83:master' into master

commit 40b2ebcb37
Merge: 79c84eef 7e388f59
Author: Frank <50119297+FPille@users.noreply.github.com>
Date:   Sat Jun 1 11:09:07 2019 +0200

    Merge pull request #1 from jbarlow83/master

    update master
2021-12-06 15:38:26 -08:00
James R. Barlow
4c1ff1086c tess cache: don't include full platform - could be sensitive 2021-12-06 15:38:26 -08:00
James R. Barlow
f91faf9795 Add new argument --tesseract-thresholding to control tesseract thresholding where available
Also add missing test for --tesseract-oem
2021-12-06 15:38:14 -08:00
James R. Barlow
793cc33a90 Whitespace 2021-12-04 16:07:34 -08:00
James R. Barlow
fbd72efd45 build: typo v13.0.0 2021-12-04 01:41:31 -08:00
James R. Barlow
1115923995 build: address checksum error from choco 2021-12-04 01:26:38 -08:00
James R. Barlow
8478d67b28 Merge branch 'release/v13' of github.com:jbarlow83/OCRmyPDF into release/v13 2021-11-15 16:38:11 -08:00
James R. Barlow
c75ff4687a Turning on Ghostscript interpolation changes this test
Seems acceptable. We don't normally use Ghostscript to downsample PDFs,
as happens in this test.
2021-11-15 16:36:24 -08:00
mara004
312c1e51b5 [ci skip] minor corrections to maintainers.rst (#858) 2021-11-15 15:13:12 -08:00
James R. Barlow
cfe2bb25ba Merge commit 'cd49e70154f82f54bf74fc5bb2586fe7e0358971' into release/v13 2021-11-15 00:33:34 -08:00
Tristan Porteries
cd49e70154 ghostscript: force interpolation when rendering (#855)
Specifying the --oversample option tends to introduce upsampling in rendering
by rasterizing the page at a higher DPI.

This upsampling improves OCR results, but a good choice of interpolation
method can improve OCR quality even further.

Ghostscript seems to use nearest-neighbor interpolation by default for PDFs.
This method does not average newly introduced pixels with the original pixels,
resulting in a nearly identical image with more pixels.

Passing -dInterpolateControl=-1 forces interpolation on.

In this commit, the above option is passed to all Ghostscript rendering
calls.

In testing, rendering a page at the same DPI with interpolation
enabled does not introduce significant time overhead.

time (repeat 40 gs -dQUIET -dSAFER -dBATCH -dNOPAUSE -sDEVICE=png16m \
	-dFirstPage=1 -dLastPage=1 -r100.000000x100.000000 \
	-dInterpolateControl=-1 -o /dev/null -dAutoRotatePages=/None -f pzII.pdf)
7,66s user 0,33s system 99% cpu 8,012 total

time (repeat 40 gs -dQUIET -dSAFER -dBATCH -dNOPAUSE -sDEVICE=png16m \
	-dFirstPage=1 -dLastPage=1 -r100.000000x100.000000 \
	-o /dev/null -dAutoRotatePages=/None -f pzII.pdf)
7,42s user 0,39s system 99% cpu 7,808 total

Ghostscript interpolation control reference:
https://www.ghostscript.com/doc/current/Use.htm
2021-11-15 00:32:58 -08:00
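
As a rough sketch of how the option from this commit message might be passed from Python, the snippet below assembles the same Ghostscript argument list as the timing commands above. The function name `build_gs_args` and its parameters are hypothetical and are not OCRmyPDF's actual API. Only the flags quoted in the commit message are used, and the command is built but not executed.

```python
import os
import shlex


def build_gs_args(pdf_path, dpi=100.0, interpolate=True, page=1):
    """Return a Ghostscript command line that renders one page to PNG.

    Hypothetical helper for illustration; the flags mirror the timing
    commands quoted in the commit message.
    """
    args = [
        "gs", "-dQUIET", "-dSAFER", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=png16m",
        f"-dFirstPage={page}", f"-dLastPage={page}",
        f"-r{dpi:f}x{dpi:f}",  # e.g. -r100.000000x100.000000
    ]
    if interpolate:
        # The option this commit adds to every rendering call.
        args.append("-dInterpolateControl=-1")
    # Discard the rendered output, as the benchmark in the commit message does.
    args += ["-o", os.devnull, "-dAutoRotatePages=/None", "-f", pdf_path]
    return args


if __name__ == "__main__":
    # Print the command instead of running it, so Ghostscript is not required.
    print(shlex.join(build_gs_args("pzII.pdf")))
```

Returning the arguments as a list means they can be handed to `subprocess.run` without shell quoting, should one want to repeat the timing comparison.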
James R. Barlow
7ce1692eef windows: default version to '0' when looking for Ghostscript
To avoid ValueError: max() arg is an empty sequence

As suggested by @meet1919 in #833.
2021-11-14 23:00:08 -08:00
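
The failure mode named in this commit message is easy to show in isolation: `max()` over an empty sequence raises `ValueError` unless a `default` is supplied. The sketch below is only illustrative. The names `newest_version` and `version_key` are hypothetical, and the simple dotted-integer comparison is an assumption, not the project's actual version handling.

```python
def version_key(version):
    """Turn a dotted version string such as '9.55.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def newest_version(found_versions):
    """Return the highest version found, or '0' when nothing was found.

    Hypothetical helper; without default='0', an empty list would raise
    'ValueError: max() arg is an empty sequence'.
    """
    return max(found_versions, key=version_key, default="0")


if __name__ == "__main__":
    print(newest_version(["9.53.3", "9.55.0"]))
    print(newest_version([]))
```

With the default in place, a missing Ghostscript install can be reported by a later version check rather than as an unrelated `ValueError`.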
James R. Barlow
7959f7628d pyproject: tell black to target py37 2021-11-14 15:49:01 -08:00