James R. Barlow
190ca81951
v13.1.1 release notes
v13.1.1
2021-12-10 21:49:04 -08:00
James R. Barlow
d48254d477
Fix issue with attempting to deskew a blank page on Tesseract 5
...
Closes #868
2021-12-10 21:48:09 -08:00
James R. Barlow
1ec2ccca14
docs: add warning about multiproc on macOS
2021-12-10 17:42:50 -08:00
James R. Barlow
e78f0cc56f
v13.1.0 release notes
v13.1.0
2021-12-06 20:00:53 -08:00
James R. Barlow
13af3252ff
tests: simplify run_ocrmypdf API
2021-12-06 17:00:25 -08:00
James R. Barlow
0528867e0b
config: yaml strings for versions
2021-12-06 16:59:44 -08:00
James R. Barlow
6910c48b81
Fix test_outputtype_none on Windows and cleanup docs
2021-12-06 15:38:38 -08:00
James R. Barlow
69aa3981c4
docs: Remove reference to long removed 'tesseract' renderer
2021-12-06 15:38:38 -08:00
James R. Barlow
9c1e5adfe6
docs: remove Ubuntu 16.04 install instructions
...
It's EOL.
2021-12-06 15:38:38 -08:00
James R. Barlow
e642dd4b35
Fix kill signal on Windows
2021-12-06 15:38:32 -08:00
James R. Barlow
9de06f62ee
Use Python executors instead of pools
...
ProcessPool/ThreadPool don't have the ability to notice when a child worker
was terminated. ProcessPoolExecutor and ThreadPoolExecutor do notice and
provide better error messages.
Add tests to check.
2021-12-06 15:38:27 -08:00
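The difference described in commit 9de06f62ee can be illustrated with a minimal sketch. This is not OCRmyPDF's own code: the helper name `worker_death_is_detected` is hypothetical, `os._exit` stands in for a worker that is killed abruptly, and the "fork" start method is assumed to be available (POSIX only) to keep the example self-contained.

```python
import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def worker_death_is_detected() -> bool:
    """Return True if the executor reports that its worker died.

    Hypothetical helper for illustration only.
    """
    # Assumption: "fork" is available (POSIX); it avoids re-importing
    # the main module in the child process.
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as executor:
        # os._exit terminates the worker immediately, without cleanup,
        # which approximates a worker being killed by a signal.
        future = executor.submit(os._exit, 1)
        try:
            future.result(timeout=30)
        except BrokenProcessPool:
            # The executor noticed the dead worker and failed the future
            # instead of waiting forever.
            return True
    return False
```

A caller can catch `BrokenProcessPool` and report a clear error to the user rather than hanging on a result that will never arrive.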
James R. Barlow
1414a8f5dc
Tidy pyproject.toml
2021-12-06 15:38:27 -08:00
James R. Barlow
26badf2882
typing: small improvements
2021-12-06 15:38:27 -08:00
James R. Barlow
8f873aaa45
sync: typing improvements
2021-12-06 15:38:27 -08:00
James R. Barlow
8fdcb15b4e
tests: improve typing and remove some legacy code
2021-12-06 15:38:27 -08:00
James R. Barlow
0323738ada
ocrmypdf.fish: fix indents
...
[ci skip]
2021-12-06 15:38:27 -08:00
FPille
aae5591f7e
Update ocrmypdf.bash completion
...
Squashed commit of the following:
commit 974de2e8ccad7fd34694f2c3a7a17c64bb52cdab
Merge: a8d7f969 ee04aa72
Author: James R. Barlow <james@purplerock.ca>
Date: Sat Dec 4 20:22:50 2021 -0800
Merge branch 'update_bash-completion' of git://github.com/FPille/OCRmyPDF into FPille-update_bash-completion
commit ee04aa7225
Author: FPille <f.pille@gmail.com>
Date: Thu Oct 14 11:09:23 2021 +0200
update
commit 76f64537aa
Author: FPille <f.pille@gmail.com>
Date: Thu Oct 14 11:04:10 2021 +0200
updated and descriptions for arguments and choices added
deprecated arguments removed
bug fix: typo "_init_completion" instead of "_init_completions"
commit de9b93e852
Merge: c23374de 42713b77
Author: Frank <50119297+FPille@users.noreply.github.com>
Date: Thu Oct 14 08:08:11 2021 +0200
Merge branch 'jbarlow83:master' into master
commit c23374de81
Merge: 40b2ebcb c409fa58
Author: Frank <50119297+FPille@users.noreply.github.com>
Date: Wed May 26 20:31:00 2021 +0200
Merge branch 'jbarlow83:master' into master
commit 40b2ebcb37
Merge: 79c84eef 7e388f59
Author: Frank <50119297+FPille@users.noreply.github.com>
Date: Sat Jun 1 11:09:07 2019 +0200
Merge pull request #1 from jbarlow83/master
update master
2021-12-06 15:38:26 -08:00
James R. Barlow
4c1ff1086c
tess cache: don't include full platform - could be sensitive
2021-12-06 15:38:26 -08:00
James R. Barlow
f91faf9795
Add new argument --tesseract-thresholding to control tesseract thresholding where available
...
Also add missing test for --tesseract-oem
2021-12-06 15:38:14 -08:00
James R. Barlow
793cc33a90
Whitespace
2021-12-04 16:07:34 -08:00
James R. Barlow
fbd72efd45
build: typo
v13.0.0
2021-12-04 01:41:31 -08:00
James R. Barlow
1115923995
build: address checksum error from choco
2021-12-04 01:26:38 -08:00
James R. Barlow
8478d67b28
Merge branch 'release/v13' of github.com:jbarlow83/OCRmyPDF into release/v13
2021-11-15 16:38:11 -08:00
James R. Barlow
c75ff4687a
Turning on Ghostscript interpolation changes this test
...
Seems acceptable. We don't normally use Ghostscript to downsample PDFs
as is happening in this test.
2021-11-15 16:36:24 -08:00
mara004
312c1e51b5
[ci skip] minor corrections to maintainers.rst (#858)
2021-11-15 15:13:12 -08:00
James R. Barlow
cfe2bb25ba
Merge commit 'cd49e70154f82f54bf74fc5bb2586fe7e0358971' into release/v13
2021-11-15 00:33:34 -08:00
Tristan Porteries
cd49e70154
ghostscript: force interpolation when rendering (#855)
...
Specifying the --oversample option tends to introduce upsampling during
rendering by rasterizing the page at a higher DPI.
This upsampling improves OCR results, but a good choice of interpolation
method can improve OCR quality even further.
Ghostscript seems to use nearest-neighbor interpolation by default for PDFs.
This method doesn't average newly introduced pixels with the original pixels,
resulting in a nearly identical image with more pixels.
Providing -dInterpolateControl=-1 forces interpolation on.
In this commit, the above option is passed to all Ghostscript rendering
calls.
After testing, rendering a page at the same DPI with interpolation
enabled does not introduce significant time overhead.
time (repeat 40 gs -dQUIET -dSAFER -dBATCH -dNOPAUSE -sDEVICE=png16m \
-dFirstPage=1 -dLastPage=1 -r100.000000x100.000000 \
-dInterpolateControl=-1 -o /dev/null -dAutoRotatePages=/None -f pzII.pdf)
7,66s user 0,33s system 99% cpu 8,012 total
time (repeat 40 gs -dQUIET -dSAFER -dBATCH -dNOPAUSE -sDEVICE=png16m \
-dFirstPage=1 -dLastPage=1 -r100.000000x100.000000 \
-o /dev/null -dAutoRotatePages=/None -f pzII.pdf)
7,42s user 0,39s system 99% cpu 7,808 total
Ghostscript interpolation control reference:
https://www.ghostscript.com/doc/current/Use.htm
2021-11-15 00:32:58 -08:00
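As a rough sketch of what commit cd49e70154 describes, the argument list from the timing commands above could be assembled as follows. The function `build_gs_render_args` and its parameters are hypothetical and are not OCRmyPDF's actual API; only the Ghostscript flags are taken from the commit message.

```python
from typing import List


def build_gs_render_args(
    pdf_path: str, dpi: float, page: int = 1, interpolate: bool = True
) -> List[str]:
    """Build a Ghostscript command line for rasterizing one page.

    Hypothetical helper; the flags mirror the commands in the commit message.
    """
    args = [
        "gs",
        "-dQUIET",
        "-dSAFER",
        "-dBATCH",
        "-dNOPAUSE",
        "-sDEVICE=png16m",
        f"-dFirstPage={page}",
        f"-dLastPage={page}",
        f"-r{dpi:f}x{dpi:f}",
    ]
    if interpolate:
        # Force interpolation on, as the commit does for all rendering calls.
        args.append("-dInterpolateControl=-1")
    args += ["-o", "/dev/null", "-dAutoRotatePages=/None", "-f", pdf_path]
    return args
```

The resulting list could be passed to `subprocess.run` on a system where `gs` is installed; the sketch itself only builds the arguments and does not invoke Ghostscript.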
James R. Barlow
7ce1692eef
windows: default version to '0' when looking for Ghostscript
...
To avoid ValueError: max() arg is an empty sequence
As suggested by @meet1919 in #833.
2021-11-14 23:00:08 -08:00
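The failure mode in commit 7ce1692eef is easy to reproduce in isolation. The sketch below assumes a hypothetical helper, `newest_version`, that picks the highest version from a list of discovered version strings; it is not the project's actual lookup code, and the version strings in the usage note are made up for illustration.

```python
from typing import List, Tuple


def _version_key(version: str) -> Tuple[int, ...]:
    """Turn '9.55.0' into (9, 55, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def newest_version(found: List[str]) -> str:
    """Return the highest version in `found`, or '0' if nothing was found.

    Hypothetical helper. Without `default`, max() on an empty list raises
    "ValueError: max() arg is an empty sequence".
    """
    return max(found, key=_version_key, default="0")
```

With this approach, `newest_version([])` yields `"0"` instead of raising, so the caller can treat "no installation found" as an ordinary version check failure.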
James R. Barlow
7959f7628d
pyproject: tell black to target py37
2021-11-14 15:49:01 -08:00
James R. Barlow
4634b20de5
Raise max-image-mpixels again
...
PDFs are quite likely to have a lot of pixels, e.g. large high-resolution scans.
250 MP is a page of A0-sized paper scanned at 400 DPI,
which should be enough in most cases.
2021-11-14 15:47:39 -08:00
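The 250 MP figure in commit 4634b20de5 can be checked with a short calculation. The sketch assumes the standard ISO 216 dimensions for A0 (841 mm x 1189 mm); the function name `scan_megapixels` is introduced here for illustration only.

```python
MM_PER_INCH = 25.4
A0_MM = (841, 1189)  # ISO 216 A0 width and height in millimetres


def scan_megapixels(width_mm: float, height_mm: float, dpi: float) -> float:
    """Megapixels produced by scanning a page of the given size at `dpi`."""
    width_px = width_mm / MM_PER_INCH * dpi
    height_px = height_mm / MM_PER_INCH * dpi
    return width_px * height_px / 1_000_000


a0_at_400_dpi = scan_megapixels(*A0_MM, dpi=400)
```

The result comes out just under 250 MP, which is consistent with the limit chosen in the commit.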
James R. Barlow
3810e576ff
optimize: fix mypy lint
2021-11-13 14:48:00 -08:00
James R. Barlow
01c7895044
pipeline: tidy
2021-11-13 14:47:49 -08:00
James R. Barlow
fdc6aa03fb
docs: new maintainer notes
2021-11-13 14:29:30 -08:00
James R. Barlow
25cc17ee03
v13 release notes (2)
v13.0.0rc1
2021-11-13 02:02:04 -08:00
James R. Barlow
e8098a1475
Dockerfile: remove requirements/
2021-11-13 01:57:17 -08:00
James R. Barlow
6b773883dc
build: use latest pip and wheel in all cases
2021-11-13 01:57:03 -08:00
James R. Barlow
4ed9622335
v13 release notes
2021-11-13 01:37:38 -08:00
James R. Barlow
acc9d58c39
Skip no language test for Tess 5
2021-11-13 01:37:27 -08:00
James R. Barlow
659e738f92
Remove some 'liblept' references we no longer need
2021-11-13 01:22:09 -08:00
James R. Barlow
7b3d7ca92a
ghostscript: choco doesn't put Ghostscript on PATH anymore
...
It seems that Chocolatey doesn't put gswin[32,64]c on PATH anymore,
so compensate.
2021-11-13 01:18:12 -08:00
James R. Barlow
e3126d2806
Adjust test to support Tesseract 5 working harder to find its files
2021-11-13 01:16:35 -08:00
James R. Barlow
45020a7fcd
build: tweak CI
2021-11-13 00:56:49 -08:00
James R. Barlow
f51164aff8
Upgrade test version of pymupdf
2021-11-13 00:53:41 -08:00
James R. Barlow
6f58a14351
pdfa: remove deprecated pkg_resources based access and tests
2021-11-13 00:52:03 -08:00
James R. Barlow
7ba04267b1
Remove shims that supported old versions of pikepdf < 4
2021-11-13 00:43:20 -08:00
James R. Barlow
9749564313
Remove requirements/*.txt - use pip install ocrmypdf[etc] instead
2021-11-13 00:31:42 -08:00
James R. Barlow
698e8791d7
Remove Python 3.6 specific unicode environment checks
2021-11-13 00:28:52 -08:00
James R. Barlow
380b981763
Remove most Python 3.6 special casing
2021-11-13 00:27:48 -08:00
James R. Barlow
5abfb14c2a
Remove leptonica and cffi
2021-11-13 00:06:35 -08:00
James R. Barlow
036afc4d88
Update cache, related to previous apparently
2021-11-12 23:57:50 -08:00