1446 Commits

Author SHA1 Message Date
James R. Barlow
6376f77b8c Refactor, remove trigonometry 2018-05-02 12:30:34 -07:00
James R. Barlow
e27e614ed9 Fixed rotation hard case 2018-05-02 01:32:11 -07:00
James R. Barlow
b0c04704a1 Fixed all but one rotation case 2018-05-02 01:24:21 -07:00
James R. Barlow
6bb6bf8323 Fix correction angle used from wrong page 2018-05-02 01:00:30 -07:00
James R. Barlow
e22fe8aefc Silence debug messages 2018-05-01 23:51:54 -07:00
James R. Barlow
76276f61e5 Split out rotation related tests 2018-05-01 23:51:35 -07:00
James R. Barlow
bfd26e6ec6 Tests: confirm OCR layer copied 2018-05-01 23:16:41 -07:00
James R. Barlow
d787e1ea0f ghostscript.py not saved in last commit
Given importance of last one, confirmed that when the file is saved all tests pass too.
Passing is invariant with this change.
2018-05-01 22:59:22 -07:00
James R. Barlow
b5d7e9cbb0 Fix all issues with rotations
All tests now pass
2018-05-01 22:50:20 -07:00
James R. Barlow
f3b6d9dcdf Fix a comment about Tesseract behavior in certain versions 2018-05-01 21:31:09 -07:00
James R. Barlow
a9abe13185 Remove the old tesseract pdf_renderer 2018-05-01 17:31:34 -07:00
James R. Barlow
6b315e8315 Add ability to disable cache 2018-05-01 15:52:00 -07:00
James R. Barlow
37677de884 Fix regressions: pdfa.ps not used, PDF/A failures, handling of text layers with no font 2018-05-01 15:51:46 -07:00
James R. Barlow
c7387de325 Fix auto rotate 2018-05-01 15:18:28 -07:00
James R. Barlow
2495b1e038 Refactor find font, get test cases working again 2018-05-01 14:48:41 -07:00
James R. Barlow
073ee52ce7 Use hocr and weave; eliminate old combine layers and merge pages 2018-05-01 14:21:53 -07:00
James R. Barlow
54150a14e9 Further elimination of tesseract renderer special casing
We don't need to keep a "skip page" around anymore since
skipping means just not grafting on the text layer.
2018-05-01 13:36:20 -07:00
James R. Barlow
88ff091cce Unify tesseract and sandwich renderer paths
Since the new weaving method copies the font and content
stream from the Tesseract PDF, it doesn't matter if Tesseract
happens to have an image or not.
If Tesseract is text-only capable we use that feature for efficiency,
but ignore the image either way.
2018-05-01 13:24:20 -07:00
James R. Barlow
e87a5776f1 Remove now-unnecessary code to rotate pages
Track only the decision to change rotation.
2018-05-01 13:01:25 -07:00
James R. Barlow
0806ce6406 Fix rotation for unsplit (modulo --rotate-pages) 2018-04-30 20:58:42 -07:00
James R. Barlow
6409894a71 feature/unsplit-try-imagerotate 2018-04-30 20:48:59 -07:00
James R. Barlow
e7286f6129 Unsplit now works with multipage, --force-ocr 2018-04-30 14:46:20 -07:00
James R. Barlow
2ab94b3151 unsplit: it's alive
First successful file output.
2018-04-28 01:57:41 -07:00
James R. Barlow
7ee90890ec Add copying of essential information from Tesseract textonly 2018-04-27 23:19:08 -07:00
James R. Barlow
8d2a917676 Page unsplit, development 2018-04-25 21:56:43 -07:00
James R. Barlow
44b4afa534 Begin conversion from page splitting to page markers 2018-04-23 22:57:50 -07:00
James R. Barlow
775be3933c Cherrypick merge_pages unification 2018-04-20 23:08:15 -07:00
Hugo
d761d80750 Use more standard __version__ rather than PILLOW_VERSION (#257) 2018-04-19 23:35:32 -07:00
James R. Barlow
0b10db91be Fix regression: Disable Ghostscript JPEG passthrough entirely (tag: v6.1.5) 2018-04-17 17:00:24 -07:00
James R. Barlow
1a516b2af9 Fix regression: time stamp test suite failures 2018-04-17 16:59:21 -07:00
James R. Barlow
076363d78e Disable JPEG passthrough for Ghostscript 9.23
Seems to corrupt JPEGs involved in image masks?
2018-04-17 16:31:03 -07:00
James R. Barlow
5fde214290 Update notes for v6.1.5 2018-04-17 15:23:35 -07:00
James R. Barlow
a620724d6a Fix PDF/A validation failure due to timezone being omitted from /ModDate 2018-04-17 15:16:48 -07:00
James R. Barlow
7368399f8b Clarify license of two test files - https://github.com/jbarlow83/OCRmyPDF/issues/254 2018-04-17 11:56:36 -07:00
James R. Barlow
34c78a892a Fix list table for tests/resources
[ci skip]
2018-04-15 23:52:19 -07:00
James R. Barlow
9d28879505 Update Ubuntu 14.04 instructions
Closes #252
2018-04-14 17:30:33 -07:00
James R. Barlow
2482296e2b hocr: avoid division by zero
Issue #253 - PDF that produces the error is not available, but if font_width
is zero, chances are the text is nonprinting characters, so suppress it.
2018-04-14 17:24:21 -07:00
James R. Barlow
7fc897e6dc Fix NameError 'ghostscript' (tag: v6.1.4) 2018-04-12 21:24:05 -07:00
James R. Barlow
9b731d63b8 Set Ghostscript -sColorConversionStrategy the way old/new versions expect 2018-04-12 16:28:48 -07:00
James R. Barlow
10aa59f674 v6.1.4 fix test suite regression with Ghostscript 9.23 2018-04-12 15:16:54 -07:00
James R. Barlow
1f7837e7b1 v6.1.4 release notes update 2018-04-12 00:55:45 -07:00
James R. Barlow
ba0535e3fb Update test cache to account for unpaper --layout none change 2018-04-12 00:48:21 -07:00
James R. Barlow
49fa7f6b5c tesseract_cache: don't reveal host system file paths in manifest file 2018-04-12 00:47:28 -07:00
James R. Barlow
c95db246d4 v6.1.4 merge 2018-04-11 15:58:00 -07:00
James R. Barlow
1ba93371ce docs: Update installation to reflect qpdf 7.0.0 requirement 2018-04-11 15:40:50 -07:00
James R. Barlow
fedbbdb575 Travis: compile qpdf from source
The older version in Travis's Ubuntu 14.04 can't pass the test suite anymore.
2018-04-11 15:40:45 -07:00
James R. Barlow
85ebba72bc Fix setup.py syntax 2018-04-10 18:30:48 -07:00
James R. Barlow
b6cd436d5d setup: Blacklist Pillow 5.1.0 on macOS
https://github.com/python-pillow/Pillow/issues/3068
2018-04-10 18:15:37 -07:00
James R. Barlow
ec170c7e1e Travis: use setup.py for requirements, don't override with .txt 2018-04-10 17:52:19 -07:00
James R. Barlow
3d69b46fca Release notes 2018-04-10 15:53:02 -07:00