James R. Barlow
c4831ac00c
v5.3 release notes
v5.3
2017-07-27 00:11:12 -07:00
James R. Barlow
93a954ef9f
Fix missing import for Py3.5
2017-07-26 23:40:01 -07:00
James R. Barlow
f7ce8f44e9
Weaken the --user-words test so it will pass on Travis
2017-07-26 21:03:51 -07:00
James R. Barlow
0b012697e5
Whitelist the Latin-1 languages that work with HOCR
...
Omitted French because the rare 'œ' and 'ÿ' glyphs are not in Latin-1.
Basically steer people away from the HOCR renderer but avoid a
potentially disruptive behavior change.
2017-07-26 21:03:18 -07:00
James R. Barlow
58e357c992
Report location of attempted output_file that fails to write
2017-07-22 17:49:56 -07:00
James R. Barlow
71fbad83ad
Fix py3.5 test
2017-07-21 17:01:06 -07:00
James R. Barlow
52483072dc
Add a differential test that checks tesseract uses supplied word list
2017-07-21 16:40:20 -07:00
James R. Barlow
7f0b8621f3
Tests: accept rich path objects without having to str() everything
2017-07-21 16:39:22 -07:00
James R. Barlow
cd8db60b06
Crash test all renderers, not just two
2017-07-21 14:10:02 -07:00
James R. Barlow
1aa34f5d2e
Make some interfaces accepting of both str-paths and Path objects
2017-07-21 13:28:30 -07:00
James R. Barlow
dfa1d88ce9
Fix missing user_words/user_patterns from textonly_pdf case
2017-07-20 17:14:04 -07:00
James R. Barlow
dd38519f07
Merge branch 'feature/user-words' into develop
...
# Conflicts:
# ocrmypdf/exec/tesseract.py
2017-07-20 16:25:20 -07:00
James R. Barlow
098f5d4f0b
docs: remove deprecated example of pdftotext
2017-07-20 16:20:17 -07:00
James R. Barlow
ffc685d536
docs: envvar markup
2017-07-20 16:19:57 -07:00
James R. Barlow
cd1a99a0de
Refactor int(os.path.basename(s)[0:6]) -> page_number(s)
2017-06-26 13:29:40 -07:00
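Note: a minimal sketch of what this refactor might look like, assuming only what the subject line states (the page number is the first six characters of the file's base name). The example file name is hypothetical, and the real helper may differ.

```python
import os


def page_number(input_file):
    """Return the page number encoded in the first six characters
    of the file's base name (e.g. '000012.page.png' -> 12)."""
    # os.fspath lets callers pass either a str or a pathlib.Path.
    return int(os.path.basename(os.fspath(input_file))[0:6])
```

Naming the expression keeps the slice bounds in one place instead of repeating them at every call site.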
James R. Barlow
48e3b267fc
Accept PDFs with whitespace ahead of %PDF marker
...
Noticed in @aagahi's fork
2017-06-26 13:17:47 -07:00
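Note: one way such a check could be written, as a sketch only. The function name is hypothetical, and the project's actual check may read a different number of bytes or handle other leading junk.

```python
def starts_with_pdf_marker(head: bytes) -> bool:
    """Return True if the leading bytes of a file contain the %PDF
    marker, ignoring any ASCII whitespace that precedes it."""
    # bytes.lstrip() with no argument strips leading ASCII whitespace.
    return head.lstrip().startswith(b'%PDF')
```

A caller would pass in the first kilobyte or so of the input file, opened in binary mode.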
James R. Barlow
3a7c3417bb
Don’t check tags and branch at the same time as Travis doesn’t get this
...
Travis is weird
v5.2
2017-06-13 13:14:34 -07:00
James R. Barlow
d792ef7222
Give the ‘auto’ renderer setting more test covfefe
2017-06-13 13:13:58 -07:00
James R. Barlow
2c24f67deb
Rename “tess4” renderer to “sandwich” and make it default in Tess 3.05.01
...
Tesseract 3.05.01 backported the textonly_pdf=1 which allows the use
of this superior PDF renderer prior to 4.00 alpha. This means that
the tess4 name is no longer accurate, so call it a sandwich because of
its merge-preserve characteristic. Preserve the tess4 name. Fix the
documentation and tests to reflect this.
Make it the default, because it’s better. It does not have the issues
the “tesseract” renderer does prior to Tess 3.05.00 with rendering
PDFs that Ghostscript corrupts, and it produces better output without
re-rastering.
Deprecate some old stuff to avoid the test suite growing obscenely
large.
2017-06-13 13:09:12 -07:00
James R. Barlow
9e75e28d0c
Homebrew needs x11 to compile Pillow
2017-06-13 11:03:26 -07:00
James R. Barlow
3232643809
Support “textonly PDF” renderer in Tesseract 3.05.01
2017-06-13 10:18:08 -07:00
James R. Barlow
f7ee9e90ce
Document what is meant by the ocrmypdf “API”
2017-06-13 10:15:11 -07:00
James R. Barlow
47298be132
Remove Python <3.5 test
2017-06-13 10:14:28 -07:00
James R. Barlow
a88fa83515
Travis: fix deploy conditions for homebrew autobrew
2017-05-31 02:29:32 -07:00
James R. Barlow
12bfe20385
v5.1 release notes
v5.1
2017-05-29 14:36:50 -07:00
James R. Barlow
3d2f6f0772
Fix tess4 test using old-style pageinfo API
2017-05-29 13:51:21 -07:00
James R. Barlow
1cb607f64b
Merge UserUnit
2017-05-29 13:22:55 -07:00
James R. Barlow
d3c54fbbde
For --rotate-pages, rasterize preview at half DPI instead of 200 DPI
...
Ensures that time is not wasted on previews at higher resolution than
the input, as was sometimes the case
2017-05-29 13:01:18 -07:00
James R. Barlow
28341b755f
Refactor common test fixtures
2017-05-29 12:47:55 -07:00
James R. Barlow
4b5cd420e1
Add new test file
2017-05-29 12:16:08 -07:00
James R. Barlow
1d57bcc99e
Fix Ghostscript rasterizing of UserUnit pages and related sizing issues
2017-05-29 12:14:10 -07:00
James R. Barlow
facdd13879
Ghostscript: refactor image output resizing
2017-05-29 11:42:27 -07:00
James R. Barlow
6e891f91d3
ghostscript, qpdf: Restore API backward compatibility
2017-05-29 11:13:06 -07:00
James R. Barlow
9b50ede977
Partially solve ghostscript rasterize_pdf producing wrong file size
...
Kludge. Assumes JPEG for now. Messy.
2017-05-25 01:17:43 -07:00
James R. Barlow
82cf010333
Error out if trying to produce PDF/A >200” due to Ghostscript limitation
2017-05-25 00:07:29 -07:00
James R. Barlow
6ff6c8614f
--output-type=pdf now outputs /UserUnit PDFs at the correct size
...
This currently distorts the output size because Tesseract assumes it
knows the DPI better than we do.
Does not work for Ghostscript, because it turns out that Ghostscript
honors /UserUnit for rasterizing but not in pdfwrite (resolved/wontfix).
https://bugs.ghostscript.com/show_bug.cgi?id=690781
Ghostscript’s output would need to be patched in a PDF/A safe way for
this to work. A temporary route may be to block Ghostscript if
/UserUnit is present.
2017-05-24 23:26:07 -07:00
James R. Barlow
eb1cd38f6c
Add an open helper that is compatible with pathlib
2017-05-24 16:19:15 -07:00
James R. Barlow
148b632b4f
Prove multiprocessing works, although it is still racy in some places
2017-05-23 16:32:13 -07:00
James R. Barlow
591e213713
Add more dependencies for autobrew
2017-05-23 13:52:28 -07:00
James R. Barlow
75f2262659
Ensure JobContext stuff is actually tested for IPC consistency
2017-05-19 17:57:07 -07:00
James R. Barlow
d9005a1074
pdfinfo: replace most remaining dict-style access
2017-05-19 16:17:36 -07:00
James R. Barlow
3e73fa81bf
pageinfo: deprecation warning
2017-05-19 16:17:07 -07:00
James R. Barlow
ba6e290231
Restore old pageinfo.py to avoid breaking compatibility
2017-05-19 15:49:23 -07:00
James R. Barlow
08e47117a3
Rename pageinfo to pdfinfo
2017-05-19 15:48:23 -07:00
James R. Barlow
532ef38157
/UserUnit is a scalar, not an array
2017-05-19 14:19:50 -07:00
James R. Barlow
4c09875890
docs: upload unpaper Dropbox link, .rst typo blocking macOS install
...
[ci skip]
2017-05-19 12:18:09 -07:00
James R. Barlow
0e98139712
Upload to upload.pypi.org/legacy as recommended by PyPA
...
https://github.com/pypa/warehouse/issues/1996#issuecomment-302784126
2017-05-19 12:06:24 -07:00
James R. Barlow
4c04d802d7
Introduce /UserUnit checking
2017-05-19 12:01:19 -07:00
James R. Barlow
b3dc404571
Update unpaper.deb link (fixes #171)
...
*Shakes fist at Dropbox*
2017-05-19 11:28:45 -07:00
James R. Barlow
8694f8d2eb
Replace magic strings colorspace and encoding with Enums
2017-05-18 22:32:27 -07:00
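Note: a sketch of the general technique named in this commit, replacing free-form strings with Enum members. The class names, member names, and helper below are illustrative assumptions, not the project's actual definitions.

```python
from enum import Enum


class Colorspace(Enum):
    # Hypothetical members, for illustration only.
    gray = 'gray'
    rgb = 'rgb'
    cmyk = 'cmyk'


class Encoding(Enum):
    # Hypothetical members, for illustration only.
    jpeg = 'jpeg'
    ccitt = 'ccitt'


def is_lossy(encoding: Encoding) -> bool:
    """Compare against an Enum member rather than a magic string."""
    return encoding is Encoding.jpeg
```

With Enums, a misspelled value such as Colorspace('rbg') raises ValueError at once, whereas a misspelled magic string would silently fail to match.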