James R. Barlow
4eacb3454f
hOCR: write text in correct order
Fixes #642
2020-09-29 02:45:11 -07:00
James R. Barlow
3ef8872a1e
pngquant driver: refactor, use streams instead of temporary files
2020-09-25 00:18:02 -07:00
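Piping data through an external program, instead of round-tripping it through temporary files, can be sketched as follows. This is a minimal illustration and not the refactored driver. `run_filter` is a hypothetical helper. pngquant's exact command line is not given here, so the usage example substitutes a stand-in Python filter so that the sketch runs anywhere.

```python
import subprocess
import sys
from typing import Sequence


def run_filter(cmd: Sequence[str], data: bytes) -> bytes:
    """Feed `data` to an external program on stdin and return its stdout.

    No temporary files are created; both directions go through pipes.
    Hypothetical helper for illustration only.
    """
    proc = subprocess.run(
        list(cmd),
        input=data,              # written to the child's stdin
        stdout=subprocess.PIPE,  # child's stdout is collected in memory
        check=True,              # raise CalledProcessError on nonzero exit
    )
    return proc.stdout


if __name__ == "__main__":
    # Stand-in for an image tool: upper-cases whatever it reads on stdin.
    upper = [sys.executable, "-c",
             "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read().upper())"]
    print(run_filter(upper, b"png bytes"))
```

`subprocess.run` hands `input` to `communicate()`, so writing stdin and reading stdout cannot deadlock on full pipe buffers. The tradeoff is that the whole image is held in memory.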
James R. Barlow
28eec73eed
Tighten unpaper-args validation to exclude . and ..
Just in case
2020-09-25 00:18:02 -07:00
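A check of this kind might look like the sketch below. It assumes the goal is to keep anything that could act as a file or directory path out of the unpaper argument list. `validate_unpaper_args` is a hypothetical name, and this is not OCRmyPDF's actual validator. `.` and `..` are worth special-casing because they contain no path separator yet still name directories, so a separator test on its own would let them through.

```python
import os
from typing import List, Sequence


def validate_unpaper_args(args: Sequence[str]) -> List[str]:
    """Reject unpaper arguments that could name a file or directory.

    Hypothetical validator: refuses path separators and the special
    directory names "." and "..", which contain no separator but still
    refer to directories.
    """
    separators = {os.sep} | ({os.altsep} if os.altsep else set())
    for arg in args:
        if arg in (".", "..") or any(sep in arg for sep in separators):
            raise ValueError(f"unpaper argument looks like a path: {arg!r}")
    return list(args)
```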
James R. Barlow
bfe4a5b329
Tidy a log message
2020-09-25 00:17:57 -07:00
Suyash Behera
9a6cd95e5f
load zlib before liblept on Windows (#633)
fixes #631
2020-09-17 03:14:42 -07:00
James R. Barlow
d464d3122e
Use img2pdf to create optimized PNG images
Fixes #629, #620
2020-09-17 03:11:26 -07:00
James R. Barlow
1327ab37d4
Fix page rotation regression
Fixes #634, #581
2020-09-17 02:57:00 -07:00
James R. Barlow
67553fc5c6
Display page numbers in log messages when grafting
2020-09-17 01:20:50 -07:00
James R. Barlow
306a903854
Remove unused function log_page_orientations
2020-09-17 01:20:02 -07:00
James R. Barlow
b93cf51c0f
Disable pikepdf mmap
Infrequently we can reproduce this error:
terminating with uncaught exception of type std::runtime_error: pybind11_object_dealloc(): Tried to deallocate unregistered instance!
The error is probably related to pybind11 issue #2252 and a bunch of
other related issues. Until that is resolved in pybind11 and pikepdf
we will disable the pikepdf mmap interface.
2020-09-16 23:48:55 -07:00
James R. Barlow
8b5b02e0d8
Expand documentation of filter_page_image
2020-09-14 14:36:17 -07:00
James R. Barlow
31994258fb
metadata fixup: don't try to update original PDF's metadata with docinfo
2020-09-08 02:35:16 -07:00
James R. Barlow
1f15ecbca5
Add "Postprocessing" message as a hint for long Ghostscript runs
2020-09-08 02:34:10 -07:00
James R. Barlow
e6a7b58863
Merge branch 'de-gpl'
2020-08-12 12:20:38 -07:00
James R. Barlow
9b641055e1
Fix KeyError: 'dpi' when using --threshold on image to PDF
Fixes #607
2020-08-07 02:21:02 -07:00
James R. Barlow
8c90f7c972
Replace GPLv3-derived PDF/A template with PostScript generator
2020-08-05 01:30:45 -07:00
James R. Barlow
aa0ec40102
Change license of all GPLv3 files to MPL-2.0
https://github.com/jbarlow83/OCRmyPDF/issues/600
2020-08-05 00:44:42 -07:00
James R. Barlow
4cc0dc6b4a
Additional size increase reasons
2020-08-03 16:03:29 -07:00
James R. Barlow
d6128e6937
Fix support for older versions of pdfminer.six (boxes_flow error)
2020-07-26 21:51:25 -07:00
James R. Barlow
642437e804
Merge branch 'master' of github.com:jbarlow83/OCRmyPDF
2020-07-22 00:34:33 -07:00
James R. Barlow
a672422b0b
Enable pikepdf mmap in other contexts
2020-07-22 00:20:07 -07:00
James R. Barlow
addc2cbad0
Enable pikepdf mmap and set up signal handlers
2020-07-22 00:19:50 -07:00
James R. Barlow
93f9bffb37
Merge branch 'feature/leptonica-179'
2020-07-20 21:23:53 -07:00
James R. Barlow
44149ad319
Disable test_error_trap for Leptonica < 1.79
The old error trap seems unreliable in the first place, so it is difficult
to set up a test.
2020-07-20 21:12:00 -07:00
fcatus
d80d963cea
pdfinfo: Replace list comp with gen expr'n
2020-07-20 02:21:58 -07:00
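A generator expression passes items to the consuming function lazily instead of building a complete list first, and with `any()` it can also stop early. The example below is purely illustrative and does not reproduce the pdfinfo code. The function names are hypothetical.

```python
from typing import Iterable, Iterator, List


def any_above_listcomp(values: Iterable[float], limit: float) -> bool:
    # The list comprehension evaluates every comparison before any() runs.
    return any([v > limit for v in values])


def any_above_genexpr(values: Iterable[float], limit: float) -> bool:
    # The generator expression yields one comparison at a time, so any()
    # can stop at the first True and no intermediate list is built.
    return any(v > limit for v in values)


def tracked(values: Iterable[float], seen: List[float]) -> Iterator[float]:
    # Records each value as it is consumed, to show how much work was done.
    for v in values:
        seen.append(v)
        yield v


if __name__ == "__main__":
    seen_list: List[float] = []
    any_above_listcomp(tracked([100, 700, 800], seen_list), 600)
    seen_gen: List[float] = []
    any_above_genexpr(tracked([100, 700, 800], seen_gen), 600)
    print(seen_list, seen_gen)  # [100, 700, 800] [100, 700]
```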
James R. Barlow
5cbbff8472
For Leptonica 1.79+ use leptSetStderrHandler
This is lock-free and considerably less dangerous to other stderr output.
2020-07-19 03:40:33 -07:00
James R. Barlow
fa6e47c277
Merge branch 'feature/optimize-cleanup'
2020-07-19 01:53:11 -07:00
James R. Barlow
4ea9cffebd
Add locking to Leptonica error trap
To prevent another thread from interfering with our redirection of
stderr.
2020-07-19 01:51:58 -07:00
James R. Barlow
1558e068f1
docs: explain firstresult hook behavior
2020-07-16 00:01:59 -07:00
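OCRmyPDF's plugins are built on pluggy. In pluggy, a hook specification marked `firstresult=True` stops calling implementations as soon as one returns a non-`None` value, and that value becomes the hook's result. The pure-Python sketch below mimics that rule without pluggy. `call_firstresult` and the sample implementations are hypothetical, and call order here is simply list order.

```python
from typing import Any, Callable, Optional, Sequence


def call_firstresult(impls: Sequence[Callable[..., Any]], **kwargs: Any) -> Optional[Any]:
    """Call hook implementations in order and return the first non-None result.

    Mimics pluggy's firstresult=True behavior: once a result is found, the
    remaining implementations are not called at all. Returns None if every
    implementation declines by returning None.
    """
    for impl in impls:
        result = impl(**kwargs)
        if result is not None:
            return result
    return None


def declines(options):      # hypothetical plugin with no opinion
    return None


def handles(options):       # hypothetical plugin that provides an answer
    return "plugin B"


def never_called(options):  # would fail loudly if it were reached
    raise AssertionError("an earlier plugin already answered")


if __name__ == "__main__":
    print(call_firstresult([declines, handles, never_called], options={}))
```

The test is `is not None`, not truthiness. A falsy result such as `0` or `""` still ends the search.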
James R. Barlow
a510b21b20
optimize: add typing for Xref, remove fspath()'s
2020-07-09 14:06:41 -07:00
James R. Barlow
373f27832b
optimize: improve typing of xref_exts
2020-07-07 22:41:29 -07:00
James R. Barlow
b20a6e4c5d
optimize: add type hints
2020-07-07 22:18:50 -07:00
James R. Barlow
49734d5456
optimize: fix incorrect logic meant to prevent re-optimizing JBIG2s
2020-07-07 21:52:11 -07:00
James R. Barlow
60be64a5f1
Fix debug.log missing pageno handler
2020-07-04 03:59:38 -07:00
James R. Barlow
190294634c
docs: edit plugins
2020-07-03 16:16:01 -07:00
James R. Barlow
dc42beb6a8
More typing improvements
Typing fixes bugs.
2020-06-30 15:02:30 -07:00
James R. Barlow
378f543619
TextPositionTracker: set boxes_flow=None
We don't care about the order of lines in our analysis, and this is an
expensive calculation in pdfminer.
2020-06-30 04:20:58 -07:00
James R. Barlow
62924ee280
Improve API documentation
2020-06-30 04:20:14 -07:00
James R. Barlow
86a73191b0
Plugin manager: accept Path(plugin)
2020-06-30 04:17:30 -07:00
James R. Barlow
86875997b8
Fix more mypy errors
2020-06-29 02:17:14 -07:00
James R. Barlow
b939584c7a
quality: fixing typing issues
2020-06-29 01:45:45 -07:00
James R. Barlow
30404f53f0
Add test to sanity check our pdf renderers
2020-06-22 16:18:38 -07:00
James R. Barlow
1ce8edbdfe
hocrtransform: some text not included in output after Tesseract changes
2020-06-22 15:48:23 -07:00
James R. Barlow
d4b704a0ae
hocrtransform: refactor colors
2020-06-22 15:22:48 -07:00
James R. Barlow
2d64e1536d
hocrtransform: refactor xpath manipulations
2020-06-22 14:44:34 -07:00
James R. Barlow
c8b581ac31
hocrtransform: remove deprecated element.getchildren()
It was removed in Python 3.9, so using it breaks there.
2020-06-22 14:28:18 -07:00
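`Element.getchildren()` was removed from `xml.etree.ElementTree` in Python 3.9. Iterating over an element, or calling `list(element)`, yields the same direct children. The fragment below is a made-up, namespace-free stand-in for hOCR markup, used only for illustration.

```python
import xml.etree.ElementTree as ET

HOCR_FRAGMENT = """
<span class='ocr_line'>
  <span class='ocrx_word'>Hello</span>
  <span class='ocrx_word'>world</span>
</span>
""".strip()

line = ET.fromstring(HOCR_FRAGMENT)

# Before Python 3.9 only:  children = line.getchildren()
children = list(line)  # the same direct children, as a list

# Or iterate over the element directly without building a list first.
words = [child.text for child in line]
print(words)  # ['Hello', 'world']
```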
James R. Barlow
ad8dead7df
Document that API accepts streams now
2020-06-22 14:27:27 -07:00
James R. Barlow
c9bd87254e
A few minor typing issues
2020-06-22 02:31:53 -07:00
James R. Barlow
f4cb424451
Support input/output streams at API level
2020-06-22 02:02:18 -07:00
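An API that accepts either a filesystem path or an open binary stream has to normalize its input somewhere. A minimal sketch of that normalization follows. `read_input_bytes` and `PdfInput` are hypothetical names, and this is not how OCRmyPDF handles streams internally.

```python
import io
import os
import tempfile
from typing import BinaryIO, Union

# Hypothetical input type: a filesystem path or an open binary stream.
PdfInput = Union[str, "os.PathLike[str]", BinaryIO]


def read_input_bytes(src: PdfInput) -> bytes:
    """Return the contents of src, whether it is a path or a readable stream.

    Streams are read from their current position and left open for the
    caller to close; paths are opened and closed here.
    """
    if hasattr(src, "read"):
        return src.read()
    with open(src, "rb") as f:
        return f.read()


if __name__ == "__main__":
    print(read_input_bytes(io.BytesIO(b"%PDF-1.7 from memory")))
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "input.pdf")
        with open(path, "wb") as f:
            f.write(b"%PDF-1.7 from disk")
        print(read_input_bytes(path))
```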
James R. Barlow
86ec63f215
Decouple plugin manager forking from PdfContext/Pagecontext
2020-06-22 01:16:59 -07:00