Commit Graph

2720 Commits

Author SHA1 Message Date
James R. Barlow
d55e673d9c Fix warning about --pdfa-image-compression argument at wrong times
Closes #663
2020-10-27 23:09:45 -07:00
James R. Barlow
21b90d2d14 Endorse pikepdf 2.x 2020-10-27 23:09:45 -07:00
Edward Betts
2def7e3392 Use % for percentage in string format (#643) 2020-10-27 23:09:14 -07:00
James R. Barlow
b0dcaa7512 v11.3.0 release notes (tag: v11.3.0) 2020-10-24 03:19:32 -07:00
James R. Barlow
e8285b1d10 Add test to confirm rasterize_pdf_page rotates correctly 2020-10-24 03:10:59 -07:00
James R. Barlow
5ba56adb53 Fix page rotation issue (again)
Commit 1327ab3 introduced a fix for a regression, which was reported
in #581, #634. It appears that the actual cause of this issue was
default parameters to rasterize_pdf_page in pluggy not working as
expected, causing a default rotation=0 even when a rotation was needed.
As such the OCR image was generated with the wrong orientation,
causing the initial regression and fix in commit 1327ab3.

Now that the real problem is identified, it's apparent that the logic
prior to 1327ab3 was fine and we can revert 1327ab3, since this fixes
all known cases including #658.

This reverts 1327ab3 but retains its improvements to rotation output.
2020-10-24 02:45:21 -07:00
James R. Barlow
ca735278e0 setup: Version pluggy better 2020-10-24 02:35:41 -07:00
James R. Barlow
b5ccbfdf25 Fix hookspec of rasterize_pdf_page to remove default parameters 2020-10-24 02:35:18 -07:00
James R. Barlow
8c35d6e6e4 Fix debug log messages being suppressed from child processes 2020-10-22 02:20:06 -07:00
James R. Barlow
d1e0c81eda Ensure worker_pdf is closed after gathering info in a thread
This is hacky, uses global state, but it does improve the situation for now.
2020-10-22 00:38:24 -07:00
James R. Barlow
10c8e4f8b4 Only create debug.log when running from command line
When used as a library, ocrmypdf shouldn't make policy decisions, such as where
to put a log file. Unsurprisingly, creating it caused problems for library users
because we deleted the temporary folder that held the log file and made no
effort to move it to a new location.

Also update the documentation to better describe how an application should
handle this.

Closes #657
2020-10-20 01:29:36 -07:00
James R. Barlow
6be2242c21 Describe "OCR" step as "Image processing" when --tesseract-timeout=0
Fixes #647
2020-10-08 01:03:42 -07:00
James R. Barlow
204c9d6ae1 Fix inverted colors during JBIG2 optimization on paletted images
Fixes #640
(tag: v11.2.1)
2020-10-07 04:08:50 -07:00
James R. Barlow
6eb393590b v11.2.0 release notes
Change v11.1.3 to v11.2.0 since it contains functional changes.
(tag: v11.2.0)
2020-10-06 03:24:31 -07:00
James R. Barlow
07c6654057 v11.1.3 release notes 2020-10-06 03:22:48 -07:00
James R. Barlow
4e15eb8d14 Fix image optimization discarding image masks and soft masks associated with PNGs
Fixes #648
2020-10-06 03:20:54 -07:00
James R. Barlow
8b01ab8ad2 Better type checking on ocrmypdf.ocr(plugins=...) 2020-10-05 15:02:34 -07:00
James R. Barlow
e0a522ad50 Document the example plugin 2020-10-05 15:01:44 -07:00
James R. Barlow
a1a8788c5a Merge branch 'master' of github.com:jbarlow83/OCRmyPDF (tag: v11.1.2) 2020-09-29 02:46:27 -07:00
James R. Barlow
cccdc178c3 v11.1.2 release notes 2020-09-29 02:46:18 -07:00
James R. Barlow
4eacb3454f hOCR: write text in correct order
Fixes #642
2020-09-29 02:45:11 -07:00
Jimit Dholakia
82b8b41e80 docs: Add 'unpaper' optional dependency for Ubuntu 18.04 (#639) 2020-09-25 11:54:31 -07:00
James R. Barlow
581c5020ab v11.1.1 release notes (tag: v11.1.1) 2020-09-25 00:28:38 -07:00
James R. Barlow
3ef8872a1e pngquant driver: refactor, use streams instead of temporary files 2020-09-25 00:18:02 -07:00
James R. Barlow
28eec73eed Tighten unpaper-args validation to exclude . and ..
Just in case
2020-09-25 00:18:02 -07:00
James R. Barlow
bfe4a5b329 Tidy a log message 2020-09-25 00:17:57 -07:00
James R. Barlow
29097837d6 Release notes typo 2020-09-19 00:49:36 -07:00
James R. Barlow
a40361db3c Remove unpaper from macOS build
Homebrew seems to be having issues with its deps?
(tag: v11.1.0)
2020-09-17 03:38:48 -07:00
James R. Barlow
8b29e3cbab Merge commit '9a6cd95e5fe2826d40861229aaa0431b76e302e7' 2020-09-17 03:34:35 -07:00
James R. Barlow
b170be120b v11.1.0 release notes 2020-09-17 03:21:06 -07:00
Suyash Behera
9a6cd95e5f Load zlib before liblept on Windows (#633)
Fixes #631
2020-09-17 03:14:42 -07:00
James R. Barlow
d464d3122e Use img2pdf to create optimized PNG images
Fixes #629, #620
2020-09-17 03:11:26 -07:00
James R. Barlow
1327ab37d4 Fix page rotation regression
Fixes #634, #581
2020-09-17 02:57:00 -07:00
James R. Barlow
67553fc5c6 Display page numbers in log messages when grafting 2020-09-17 01:20:50 -07:00
James R. Barlow
306a903854 Remove unused function log_page_orientations 2020-09-17 01:20:02 -07:00
James R. Barlow
b93cf51c0f Disable pikepdf mmap
Infrequently we can reproduce this error:

terminating with uncaught exception of type std::runtime_error: pybind11_object_dealloc(): Tried to deallocate unregistered instance!

The error is probably related to pybind11 issue #2252 and a bunch of
other related issues. Until that is resolved in pybind11 and pikepdf
we will disable the pikepdf mmap interface.
2020-09-16 23:48:55 -07:00
James R. Barlow
6b994221c6 Remove Python 3.7 from build since homebrew removed it 2020-09-16 23:44:18 -07:00
James R. Barlow
8b5b02e0d8 Expand documentation of filter_page_image 2020-09-14 14:36:17 -07:00
James R. Barlow
624df9bb23 Extend example plugin with example of mono conversion 2020-09-14 14:35:50 -07:00
James R. Barlow
fa06ea3600 v11.0.2 release notes (tag: v11.0.2) 2020-09-08 02:38:57 -07:00
James R. Barlow
31994258fb metadata fixup: don't try to update original PDF's metadata with docinfo 2020-09-08 02:35:16 -07:00
James R. Barlow
1f15ecbca5 Add "Postprocessing" message as a hint for long Ghostscript runs 2020-09-08 02:34:10 -07:00
James R. Barlow
bcf5657e5c Reorganize issue templates 2020-08-26 17:11:52 -07:00
jbarlow83
2ae028bf38 Update issue templates 2020-08-26 17:03:09 -07:00
James R. Barlow
b51a5887e5 v11.0.1 release notes (tag: v11.0.1) 2020-08-17 23:25:31 -07:00
James R. Barlow
fc523e837c Clarify that the GPL-3 portion of pdfa.py was removed
Removal was in 8c90f7c97.

pdfa.py now has no special licensing and falls under the
"Files:  *" clause of debian/copyright.
2020-08-17 23:23:31 -07:00
James R. Barlow
caeba76a61 Approve img2pdf 0.4 as it passes tests 2020-08-14 01:34:04 -07:00
James R. Barlow
cd35216f21 setup: blacklist pdfminer.six 20200720
NotAllowedError is going to be removed
99f0c09869
2020-08-12 13:10:08 -07:00
James R. Barlow
e6a7b58863 Merge branch 'de-gpl' (tag: v11.0.0) 2020-08-12 12:20:38 -07:00
James R. Barlow
56184a762f Issue template: Give stronger hints about sample input files 2020-08-12 12:12:37 -07:00