rdiez
2be8eeec2c
Fix spelling of 'ephemeral' (#908)
2022-02-03 10:33:51 -08:00
James R. Barlow
3dfde479e2
The world is not ready for :=
v13.3.0
2022-01-26 00:16:51 -08:00
James R. Barlow
aea1862644
v13.3.0 release notes
2022-01-25 23:50:48 -08:00
James R. Barlow
3b406112d0
ghostscript: improve test coverage of error cases
2022-01-25 23:45:47 -08:00
James R. Barlow
fcc4c2d371
ghostscript: improve error message if image cannot be opened
2022-01-25 23:12:51 -08:00
James R. Barlow
3de18ed612
tesseract: account for more tesseract 5 output differences
2022-01-25 00:19:49 -08:00
James R. Barlow
93cca42e20
optimize: don't try to optimize an image we can't save
2022-01-25 00:19:23 -08:00
James R. Barlow
2d0ac4707c
Use better img2pdf settings where possible while supporting old versions
Fixes #894
2022-01-14 11:55:54 -08:00
James R. Barlow
7d208175cf
unpaper: refactoring
2022-01-11 10:57:54 -08:00
James R. Barlow
ea69e868ed
unpaper: issue warning if image too large to clean
2022-01-11 10:44:38 -08:00
James R. Barlow
beea603ab3
Revert "docs: add sphinx-panels"
This reverts commit 7966192d6e.
2022-01-04 11:44:20 -08:00
James R. Barlow
7966192d6e
docs: add sphinx-panels
2022-01-04 11:43:28 -08:00
Anton Gladky
5acbd7a252
Wrap exception on non-CMYK images into the log warning (#881)
2021-12-29 01:15:49 -08:00
Krasimir Nedelchev
aed955ca8c
Fix typo (#882)
2021-12-27 15:36:19 -08:00
James R. Barlow
298bdb8690
v13.2.0 release notes
v13.2.0
2021-12-19 11:40:23 -08:00
James R. Barlow
1a58abcc6a
concurrency: fix extra update of progressbar
2021-12-19 11:34:09 -08:00
James R. Barlow
dbfceba020
Standardize ghostscript version default
2021-12-18 02:01:42 -08:00
James R. Barlow
0faa618c3c
Replace deprecated distutils with packaging.version
2021-12-18 01:57:59 -08:00
James R. Barlow
7035002c03
Order of operations suppressed detailed Ghostscript missing error message
2021-12-18 01:27:39 -08:00
James R. Barlow
f8fadaef41
_windows: remove use of deprecated distutils
2021-12-13 20:45:54 -08:00
James R. Barlow
ee21bf9ef6
Update cache
2021-12-13 20:45:30 -08:00
James R. Barlow
190ca81951
v13.1.1 release notes
v13.1.1
2021-12-10 21:49:04 -08:00
James R. Barlow
d48254d477
Fix issue with attempting to deskew a blank page on Tesseract 5
Closes #868
2021-12-10 21:48:09 -08:00
James R. Barlow
1ec2ccca14
docs: add warning about multiproc on macOS
2021-12-10 17:42:50 -08:00
James R. Barlow
e78f0cc56f
v13.1.0 release notes
v13.1.0
2021-12-06 20:00:53 -08:00
James R. Barlow
13af3252ff
tests: simplify run_ocrmypdf API
2021-12-06 17:00:25 -08:00
James R. Barlow
0528867e0b
config: yaml strings for versions
2021-12-06 16:59:44 -08:00
James R. Barlow
6910c48b81
Fix test_outputtype_none on Windows and cleanup docs
2021-12-06 15:38:38 -08:00
James R. Barlow
69aa3981c4
docs: Remove reference to long removed 'tesseract' renderer
2021-12-06 15:38:38 -08:00
James R. Barlow
9c1e5adfe6
docs: remove Ubuntu 16.04 install instructions
It's EOL.
2021-12-06 15:38:38 -08:00
James R. Barlow
e642dd4b35
Fix kill signal on Windows
2021-12-06 15:38:32 -08:00
James R. Barlow
9de06f62ee
Use Python executors instead of pools
ProcessPool/ThreadPool don't have the ability to notice when a child worker
was terminated. ProcessPoolExecutor and ThreadPoolExecutor do notice and
provide better error messages.
Add tests to check.
2021-12-06 15:38:27 -08:00
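A minimal sketch of the behaviour described in the commit above, not OCRmyPDF's actual code. The helper name `worker_death_is_detected` and its `start_method` parameter are hypothetical. A worker is made to die abruptly with `os._exit`, and `ProcessPoolExecutor` reports the loss as `BrokenProcessPool` rather than waiting indefinitely for a result.

```python
import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def worker_death_is_detected(start_method=None):
    """Return True if the executor notices that its worker was terminated.

    start_method is a hypothetical convenience parameter; None selects the
    platform's default multiprocessing start method.
    """
    ctx = multiprocessing.get_context(start_method)
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as executor:
        # os._exit ends the worker process immediately, without cleanup,
        # which stands in for a worker that was killed or crashed.
        future = executor.submit(os._exit, 1)
        try:
            future.result(timeout=30)
        except BrokenProcessPool:
            # The executor saw the worker disappear and failed the future.
            return True
    return False


if __name__ == "__main__":
    print(worker_death_is_detected())
```

The built-in `os._exit` is used as the task so that nothing defined in the calling script has to be pickled and sent to the worker.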
James R. Barlow
1414a8f5dc
Tidy pyproject.toml
2021-12-06 15:38:27 -08:00
James R. Barlow
26badf2882
typing: small improvements
2021-12-06 15:38:27 -08:00
James R. Barlow
8f873aaa45
sync: typing improvements
2021-12-06 15:38:27 -08:00
James R. Barlow
8fdcb15b4e
tests: improve typing and remove some legacy code
2021-12-06 15:38:27 -08:00
James R. Barlow
0323738ada
ocrmypdf.fish: fix indents
[ci skip]
2021-12-06 15:38:27 -08:00
FPille
aae5591f7e
Update ocrmypdf.bash completion
Squashed commit of the following:
commit 974de2e8ccad7fd34694f2c3a7a17c64bb52cdab
Merge: a8d7f969 ee04aa72
Author: James R. Barlow <james@purplerock.ca>
Date: Sat Dec 4 20:22:50 2021 -0800
Merge branch 'update_bash-completion' of git://github.com/FPille/OCRmyPDF into FPille-update_bash-completion
commit ee04aa7225
Author: FPille <f.pille@gmail.com>
Date: Thu Oct 14 11:09:23 2021 +0200
update
commit 76f64537aa
Author: FPille <f.pille@gmail.com>
Date: Thu Oct 14 11:04:10 2021 +0200
updated and descriptions for arguments and choices added
deprecated arguments removed
bug fix: typo "_init_completion" instead of "_init_completions"
commit de9b93e852
Merge: c23374de 42713b77
Author: Frank <50119297+FPille@users.noreply.github.com>
Date: Thu Oct 14 08:08:11 2021 +0200
Merge branch 'jbarlow83:master' into master
commit c23374de81
Merge: 40b2ebcb c409fa58
Author: Frank <50119297+FPille@users.noreply.github.com>
Date: Wed May 26 20:31:00 2021 +0200
Merge branch 'jbarlow83:master' into master
commit 40b2ebcb37
Merge: 79c84eef 7e388f59
Author: Frank <50119297+FPille@users.noreply.github.com>
Date: Sat Jun 1 11:09:07 2019 +0200
Merge pull request #1 from jbarlow83/master
update master
2021-12-06 15:38:26 -08:00
James R. Barlow
4c1ff1086c
tess cache: don't include full platform - could be sensitive
2021-12-06 15:38:26 -08:00
James R. Barlow
f91faf9795
Add new argument --tesseract-thresholding to control tesseract thresholding where available
Also add missing test for --tesseract-oem
2021-12-06 15:38:14 -08:00
James R. Barlow
793cc33a90
Whitespace
2021-12-04 16:07:34 -08:00
James R. Barlow
fbd72efd45
build: typo
v13.0.0
2021-12-04 01:41:31 -08:00
James R. Barlow
1115923995
build: address checksum error from choco
2021-12-04 01:26:38 -08:00
James R. Barlow
8478d67b28
Merge branch 'release/v13' of github.com:jbarlow83/OCRmyPDF into release/v13
2021-11-15 16:38:11 -08:00
James R. Barlow
c75ff4687a
Turning on Ghostscript interpolation changes this test
Seems acceptable. We don't normally use Ghostscript to downsample PDFs
as happens in this test.
2021-11-15 16:36:24 -08:00
mara004
312c1e51b5
[ci skip] minor corrections to maintainers.rst (#858)
2021-11-15 15:13:12 -08:00
James R. Barlow
cfe2bb25ba
Merge commit 'cd49e70154f82f54bf74fc5bb2586fe7e0358971' into release/v13
2021-11-15 00:33:34 -08:00
Tristan Porteries
cd49e70154
ghostscript: force interpolation when rendering (#855)
Specifying the --oversample option tends to introduce upsampling during
rendering, because the page is rasterized at a higher DPI.
This upsampling improves OCR results, but a suitable choice of interpolation
method can improve OCR quality even further.
Ghostscript seems to use nearest-neighbor interpolation by default for PDF.
This method does not average newly introduced pixels with the original pixels,
resulting in a nearly identical image that simply has more pixels.
Passing -dInterpolateControl=-1 forces interpolation on.
In this commit, the above option is passed to all Ghostscript rendering
calls.
In testing, rendering a page at the same DPI with interpolation
enabled did not introduce significant time overhead.
time (repeat 40 gs -dQUIET -dSAFER -dBATCH -dNOPAUSE -sDEVICE=png16m \
-dFirstPage=1 -dLastPage=1 -r100.000000x100.000000 \
-dInterpolateControl=-1 -o /dev/null -dAutoRotatePages=/None -f pzII.pdf)
7,66s user 0,33s system 99% cpu 8,012 total
time (repeat 40 gs -dQUIET -dSAFER -dBATCH -dNOPAUSE -sDEVICE=png16m \
-dFirstPage=1 -dLastPage=1 -r100.000000x100.000000 \
-o /dev/null -dAutoRotatePages=/None -f pzII.pdf)
7,42s user 0,39s system 99% cpu 7,808 total
Ghostscript interpolation control reference:
https://www.ghostscript.com/doc/current/Use.htm
2021-11-15 00:32:58 -08:00
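The benchmark command in the commit above can be sketched as an argument list built in Python. This is an illustration only, not the project's implementation: the function name `build_render_args` and its parameters are hypothetical, and the flags are copied from the commands quoted in the commit message.

```python
from typing import List


def build_render_args(
    input_pdf: str,
    output: str,
    page: int,
    dpi: float,
    interpolate: bool = True,
) -> List[str]:
    """Build a Ghostscript command line that rasterizes one page to PNG.

    Hypothetical helper; mirrors the flags in the quoted benchmark.
    """
    args = [
        "gs",
        "-dQUIET",
        "-dSAFER",
        "-dBATCH",
        "-dNOPAUSE",
        "-sDEVICE=png16m",
        f"-dFirstPage={page}",
        f"-dLastPage={page}",
        f"-r{dpi:f}x{dpi:f}",
    ]
    if interpolate:
        # The option the commit adds to every rendering call.
        args.append("-dInterpolateControl=-1")
    args += ["-o", output, "-dAutoRotatePages=/None", "-f", input_pdf]
    return args


if __name__ == "__main__":
    # Print the command instead of running it, so Ghostscript is not required.
    print(" ".join(build_render_args("pzII.pdf", "/dev/null", page=1, dpi=100)))
```

Building the list with and without `interpolate` gives the two variants timed in the commit message. The list could be passed to `subprocess.run` on a system where `gs` is installed.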
James R. Barlow
7ce1692eef
windows: default version to '0' when looking for Ghostscript
To avoid ValueError: max() arg is an empty sequence
As suggested by @meet1919 in #833.
2021-11-14 23:00:08 -08:00
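A small sketch of the failure mode named in the commit above, under the assumption that candidate versions are collected as dotted strings. The function `newest_version` and its version-parsing key are hypothetical and are not taken from the project's Windows code.

```python
from typing import Iterable, Tuple


def _version_key(version: str) -> Tuple[int, ...]:
    # Compare "9.55.0" numerically rather than as text.
    return tuple(int(part) for part in version.split("."))


def newest_version(found: Iterable[str], fallback: str = "0") -> str:
    """Return the highest version in `found`, or `fallback` if none exist.

    Without `default=`, max() on an empty sequence raises the ValueError
    quoted in the commit message.
    """
    return max(found, key=_version_key, default=fallback)


if __name__ == "__main__":
    print(newest_version([]))
    print(newest_version(["9.50", "9.55.0", "9.6"]))
```

With a fallback of '0', a machine with no matching installation yields a version that fails any later minimum-version comparison, instead of raising an exception during the search.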
James R. Barlow
7959f7628d
pyproject: tell black to target py37
2021-11-14 15:49:01 -08:00