919 Commits

Author SHA1 Message Date
James R. Barlow
4eacb3454f hOCR: write text in correct order
Fixes #642
2020-09-29 02:45:11 -07:00
James R. Barlow
3ef8872a1e pngquant driver: refactor, use streams instead of temporary files 2020-09-25 00:18:02 -07:00
James R. Barlow
28eec73eed Tighten unpaper-args validation to exclude . and ..
Just in case
2020-09-25 00:18:02 -07:00
James R. Barlow
bfe4a5b329 Tidy a log message 2020-09-25 00:17:57 -07:00
Suyash Behera
9a6cd95e5f load zlib before liblept on windows (#633)
fixes #631
2020-09-17 03:14:42 -07:00
James R. Barlow
d464d3122e Use img2pdf to create optimized PNG images
Fixes #629, #620
2020-09-17 03:11:26 -07:00
James R. Barlow
1327ab37d4 Fix page rotation regression
Fixes #634, #581
2020-09-17 02:57:00 -07:00
James R. Barlow
67553fc5c6 Display page numbers in log messages when grafting 2020-09-17 01:20:50 -07:00
James R. Barlow
306a903854 Remove unused function log_page_orientations 2020-09-17 01:20:02 -07:00
James R. Barlow
b93cf51c0f Disable pikepdf mmap
Infrequently we can reproduce this error:

terminating with uncaught exception of type std::runtime_error: pybind11_object_dealloc(): Tried to deallocate unregistered instance!

The error is probably related to pybind11 issue #2252 and a bunch of
other related issues. Until that is resolved in pybind11 and pikepdf
we will disable the pikepdf mmap interface.
2020-09-16 23:48:55 -07:00
James R. Barlow
8b5b02e0d8 Expand documentation of filter_page_image 2020-09-14 14:36:17 -07:00
James R. Barlow
31994258fb metadata fixup: don't try to update original PDF's metadata with docinfo 2020-09-08 02:35:16 -07:00
James R. Barlow
1f15ecbca5 Add "Postprocessing" message as a hint for long Ghostscript runs 2020-09-08 02:34:10 -07:00
James R. Barlow
e6a7b58863 Merge branch 'de-gpl' 2020-08-12 12:20:38 -07:00
James R. Barlow
9b641055e1 Fix KeyError: 'dpi' when using --threshold on image to PDF
Fixes #607
2020-08-07 02:21:02 -07:00
James R. Barlow
8c90f7c972 Replace GPLv3-derived PDF/A template with PostScript generator 2020-08-05 01:30:45 -07:00
James R. Barlow
aa0ec40102 Change license of all GPLv3 files to MPL-2.0
https://github.com/jbarlow83/OCRmyPDF/issues/600
2020-08-05 00:44:42 -07:00
James R. Barlow
4cc0dc6b4a Additional size increase reasons 2020-08-03 16:03:29 -07:00
James R. Barlow
d6128e6937 Fix support for older versions of pdfminer.six (boxes_flow error) 2020-07-26 21:51:25 -07:00
James R. Barlow
642437e804 Merge branch 'master' of github.com:jbarlow83/OCRmyPDF 2020-07-22 00:34:33 -07:00
James R. Barlow
a672422b0b Enable pikepdf mmap in other contexts 2020-07-22 00:20:07 -07:00
James R. Barlow
addc2cbad0 Enable pikepdf mmap and set up signal handlers 2020-07-22 00:19:50 -07:00
James R. Barlow
93f9bffb37 Merge branch 'feature/leptonica-179' 2020-07-20 21:23:53 -07:00
James R. Barlow
44149ad319 Disable test_error_trap for Leptonica < 1.79
Old error trap seems unreliable in the first place so difficult to set up
a test.
2020-07-20 21:12:00 -07:00
fcatus
d80d963cea pdfinfo: Replace list comp with gen expr'n 2020-07-20 02:21:58 -07:00
James R. Barlow
5cbbff8472 For Leptonica 1.79+ use leptSetStderrHandler
Lock free and considerably less dangerous to stderr messages.
2020-07-19 03:40:33 -07:00
James R. Barlow
fa6e47c277 Merge branch 'feature/optimize-cleanup' 2020-07-19 01:53:11 -07:00
James R. Barlow
4ea9cffebd Add locking to Leptonica error trap
To protect another thread from interfering with our redirection of
stderr.
2020-07-19 01:51:58 -07:00
James R. Barlow
1558e068f1 docs: explain firstresult hook behavior 2020-07-16 00:01:59 -07:00
James R. Barlow
a510b21b20 optimize: add typing for Xref, remove fspath()'s 2020-07-09 14:06:41 -07:00
James R. Barlow
373f27832b optimize: improve typing of xref_exts 2020-07-07 22:41:29 -07:00
James R. Barlow
b20a6e4c5d optimize: add type hints 2020-07-07 22:18:50 -07:00
James R. Barlow
49734d5456 optimize: fix incorrect logic to prevent re-optimizing JBIG2s 2020-07-07 21:52:11 -07:00
James R. Barlow
60be64a5f1 Fix debug.log missing pageno handler 2020-07-04 03:59:38 -07:00
James R. Barlow
190294634c docs: edit plugins 2020-07-03 16:16:01 -07:00
James R. Barlow
dc42beb6a8 More typing improvements
Typing fixes bugs.
2020-06-30 15:02:30 -07:00
James R. Barlow
378f543619 TextPositionTracker: set boxes_flow=None
We don't care about the order of lines in our analysis, and this is an
expensive calculation in pdfminer.
2020-06-30 04:20:58 -07:00
James R. Barlow
62924ee280 Improve API documentation 2020-06-30 04:20:14 -07:00
James R. Barlow
86a73191b0 Plugin manager: accept Path(plugin) 2020-06-30 04:17:30 -07:00
James R. Barlow
86875997b8 Fix more mypy errors 2020-06-29 02:17:14 -07:00
James R. Barlow
b939584c7a quality: fixing typing issues 2020-06-29 01:45:45 -07:00
James R. Barlow
30404f53f0 Add test to sanity check our pdf renderers 2020-06-22 16:18:38 -07:00
James R. Barlow
1ce8edbdfe hocrtransform: some text not included in output after Tesseract changes 2020-06-22 15:48:23 -07:00
James R. Barlow
d4b704a0ae hocrtransform: refactor colors 2020-06-22 15:22:48 -07:00
James R. Barlow
2d64e1536d hocrtransform: refactor xpath manipulations 2020-06-22 14:44:34 -07:00
James R. Barlow
c8b581ac31 hocrtransform: remove deprecated element.getchildren()
Breaks Python 3.9.
2020-06-22 14:28:18 -07:00
James R. Barlow
ad8dead7df Document that API accepts streams now 2020-06-22 14:27:27 -07:00
James R. Barlow
c9bd87254e A few minor typing issues 2020-06-22 02:31:53 -07:00
James R. Barlow
f4cb424451 Support input/output streams at API level 2020-06-22 02:02:18 -07:00
James R. Barlow
86ec63f215 Decouple plugin manager forking from PdfContext/PageContext 2020-06-22 01:16:59 -07:00