James R. Barlow
b3097a2384
Fix broken test case related to language packs
v5.3.2
2017-08-24 13:01:02 -07:00
James R. Barlow
6d9ddbe98b
v5.3.1 notes
v5.3.1
2017-08-24 01:09:19 -07:00
James R. Barlow
9bb42c0229
Wrong error type used for missing language
2017-08-24 01:07:23 -07:00
James R. Barlow
bd7226b27a
Merge branch 'master' of github.com:jbarlow83/OCRmyPDF
2017-08-23 23:30:19 -07:00
James R. Barlow
5b413e3873
Cookbook: add "don't OCR" examples
2017-08-23 23:29:41 -07:00
James R. Barlow
be5831a629
Offer the readme as a long description for new PyPI
2017-08-23 23:29:21 -07:00
jbarlow83
084d2bf8e2
More badges
2017-08-23 23:19:29 -07:00
James R. Barlow
da79e6bac7
macos: Skip brew audit because it seems to crash ruby on travis
2017-07-27 16:00:41 -07:00
James R. Barlow
c4831ac00c
v5.3 release notes
v5.3
2017-07-27 00:11:12 -07:00
James R. Barlow
93a954ef9f
Fix missing import for Py3.5
2017-07-26 23:40:01 -07:00
James R. Barlow
f7ce8f44e9
Weaken the --user-words test so it will pass on Travis
2017-07-26 21:03:51 -07:00
James R. Barlow
0b012697e5
Whitelist the Latin-1 languages that work with HOCR
...
Omitted French because the rare 'oe' and 'ÿ' glyphs are not in Latin-1.
Basically steer people away from HOCR renderer but avoid a potential
disruptive behavior change.
2017-07-26 21:03:18 -07:00
James R. Barlow
58e357c992
Report location of attempted output_file that fails to write
2017-07-22 17:49:56 -07:00
James R. Barlow
71fbad83ad
Fix py3.5 test
2017-07-21 17:01:06 -07:00
James R. Barlow
52483072dc
Add a differential test that checks tesseract uses supplied word list
2017-07-21 16:40:20 -07:00
James R. Barlow
7f0b8621f3
Tests: accept rich path objects without having to str() everything
2017-07-21 16:39:22 -07:00
James R. Barlow
cd8db60b06
Crash test all renderers, not just two
2017-07-21 14:10:02 -07:00
James R. Barlow
1aa34f5d2e
Make some interfaces accepting of both str-paths and Path objects
2017-07-21 13:28:30 -07:00
James R. Barlow
dfa1d88ce9
Fix missing user_words/user_patterns from textonly_pdf case
2017-07-20 17:14:04 -07:00
James R. Barlow
dd38519f07
Merge branch 'feature/user-words' into develop
...
# Conflicts:
# ocrmypdf/exec/tesseract.py
2017-07-20 16:25:20 -07:00
James R. Barlow
098f5d4f0b
docs: remove deprecated example of pdftotext
2017-07-20 16:20:17 -07:00
James R. Barlow
ffc685d536
docs: envvar markup
2017-07-20 16:19:57 -07:00
James R. Barlow
cd1a99a0de
Refactor int(os.path.basename(s)[0:6]) -> page_number(s)
2017-06-26 13:29:40 -07:00
James R. Barlow
48e3b267fc
Accept PDFs with whitespace ahead of %PDF marker
...
Noticed in @aagahi 's fork
2017-06-26 13:17:47 -07:00
James R. Barlow
3a7c3417bb
Don’t check tags and branch at the same time as Travis doesn’t get this
...
Travis is weird
v5.2
2017-06-13 13:14:34 -07:00
James R. Barlow
d792ef7222
Give the ‘auto’ renderer setting more test covfefe
2017-06-13 13:13:58 -07:00
James R. Barlow
2c24f67deb
Rename “tess4” renderer to “sandwich” and make it default in Tess 3.05.01
...
Tesseract 3.05.01 backported the textonly_pdf=1 which allows the use
of this superior PDF renderer prior to 4.00 alpha. This means that
the tess4 name is no longer accurate, so call it a sandwich because of
its merge-preserve characteristic. Preserve the tess4 name. Fix the
documentation and tests to reflect this.
Make it the default, because it’s better. It does not have the issues
the “tesseract” renderer does prior to Tess 3.05.00 with rendering
PDFs that Ghostscript corrupts, and it produces better output without
re-rastering.
Deprecate some old stuff to avoid the test suite growing obscenely
large.
2017-06-13 13:09:12 -07:00
James R. Barlow
9e75e28d0c
Homebrew needs x11 to compile Pillow
2017-06-13 11:03:26 -07:00
James R. Barlow
3232643809
Support “textonly PDF” renderer in Tesseract 3.05.01
2017-06-13 10:18:08 -07:00
James R. Barlow
f7ee9e90ce
Document what is meant by the ocrmypdf “API”
2017-06-13 10:15:11 -07:00
James R. Barlow
47298be132
Remove Python <3.5 test
2017-06-13 10:14:28 -07:00
James R. Barlow
a88fa83515
Travis: fix deploy conditions for homebrew autobrew
2017-05-31 02:29:32 -07:00
James R. Barlow
12bfe20385
v5.1 release notes
v5.1
2017-05-29 14:36:50 -07:00
James R. Barlow
3d2f6f0772
Fix tess4 test using old-style pageinfo API
2017-05-29 13:51:21 -07:00
James R. Barlow
1cb607f64b
Merge UserUnit
2017-05-29 13:22:55 -07:00
James R. Barlow
d3c54fbbde
For —rotate-pages, rasterize preview at half DPI instead of 200 DPI
...
Ensures that time is not wasted on previews at higher resolution than
the input as was sometimes the case
2017-05-29 13:01:18 -07:00
James R. Barlow
28341b755f
Refactor common test fixtures
2017-05-29 12:47:55 -07:00
James R. Barlow
4b5cd420e1
Add new test file
2017-05-29 12:16:08 -07:00
James R. Barlow
1d57bcc99e
Fix Ghostscript rasterizing of UserUnit pages and related sizing issues
2017-05-29 12:14:10 -07:00
James R. Barlow
facdd13879
Ghostscript: refactor image output resizing
2017-05-29 11:42:27 -07:00
James R. Barlow
6e891f91d3
ghostscript, qpdf: Restore API backward compatibility
2017-05-29 11:13:06 -07:00
James R. Barlow
9b50ede977
Partially solve ghostscript rasterize_pdf producing wrong file size
...
Kludge. Assumes JPEG for now. Messy.
2017-05-25 01:17:43 -07:00
James R. Barlow
82cf010333
Error out if trying to produce PDF/A >200” due to Ghostscript limitation
2017-05-25 00:07:29 -07:00
James R. Barlow
6ff6c8614f
—output-type=pdf now outputs /UserUnit PDFs at the correct size
...
This currently distorts the output size because Tesseract assumes it
knows the DPI better than we do.
Does not work for Ghostscript, because it emerges that Ghostscript
honors /UserUnit for rasterizing but not in pdfwrite (resolve/wontfix).
https://bugs.ghostscript.com/show_bug.cgi?id=690781
Ghostscript’s output would need to be patched in a PDF/A safe way for
this to work. Temporary route may be to block Ghostscript if
/UserUnit.
2017-05-24 23:26:07 -07:00
James R. Barlow
eb1cd38f6c
Add an open helper that is compatible with pathlib
2017-05-24 16:19:15 -07:00
James R. Barlow
148b632b4f
Prove multiprocessing works, although it is still racy in some places
2017-05-23 16:32:13 -07:00
James R. Barlow
591e213713
Add more dependencies for autobrew
2017-05-23 13:52:28 -07:00
James R. Barlow
75f2262659
Ensure JobContext stuff is actually tested for IPC consistency
2017-05-19 17:57:07 -07:00
James R. Barlow
d9005a1074
pdfinfo: replace most remaining dict-style access
2017-05-19 16:17:36 -07:00
James R. Barlow
3e73fa81bf
pageinfo: deprecation warning
2017-05-19 16:17:07 -07:00