James R. Barlow
79b3472b26
All tests passed, bump version
v3.1
2015-12-04 04:31:01 -08:00
James R. Barlow
f1b2f1ae08
Merge branch 'feature/pdfa-2' into develop
2015-12-04 04:04:08 -08:00
James R. Barlow
ee7d97ae8c
Trivial
2015-12-04 04:03:38 -08:00
James R. Barlow
7d9f473bb1
Remove eval() call by introspecting ExitCode
2015-12-04 03:34:53 -08:00
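The idea behind this commit, looking up an exit code by name rather than passing a string to eval(), can be sketched as follows. This is only an illustration: it assumes an enum-style `ExitCode`, and the member names, values, and the helper `exit_code_from_name` are hypothetical rather than taken from the OCRmyPDF source.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    # Hypothetical members, for illustration only.
    ok = 0
    bad_args = 1
    input_file = 2


def exit_code_from_name(name, default=ExitCode.bad_args):
    """Look up an exit code by name without calling eval()."""
    # __members__ is a read-only mapping of name -> member, so an
    # arbitrary string is only ever used as a dictionary key and is
    # never executed as code.
    return ExitCode.__members__.get(name, default)
```

A lookup like this fails safely on unexpected input, which is the main reason to prefer it over eval().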
James R. Barlow
e77a5e5e75
We don't want threads. Really. Do. Not. Want.
2015-12-04 03:11:38 -08:00
James R. Barlow
6ab19af122
Comments
2015-12-04 03:09:39 -08:00
James R. Barlow
276fe49867
Better error messages for input file not found or invalid
...
Not as good as finding a general way to deal with ruffus exceptions, but
better than nothing.
2015-12-04 03:07:53 -08:00
James R. Barlow
acb31abe86
Fix issue #20 - fails on uppercase .PDF
2015-12-04 02:14:09 -08:00
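The commit does not show how issue #20 was fixed. One common way to make an extension check case-insensitive is sketched below; the function name `is_pdf_filename` is an assumption made for this example.

```python
import os


def is_pdf_filename(path):
    """Return True if the path ends in .pdf, ignoring letter case."""
    # splitext keeps the original case of the extension, so normalize
    # it before comparing.
    _, ext = os.path.splitext(path)
    return ext.lower() == ".pdf"
```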
James R. Barlow
4f964a3c8a
Introduce --pdf-renderer auto
...
Tesseract 3.03 has various quality problems, such as wrong DPI, that are
fixed in Tesseract 3.04. The idea here is to introduce an option to let
OCRmyPDF select the rendering backend based on the options and system.
However, we're not ready for tesseract as the main renderer.
Setting pdf-renderer to tesseract does not pass all test cases, mainly
the one where --tesseract-timeout is triggered, and some others.
2015-12-02 23:20:31 -08:00
James R. Barlow
df1fda7438
pageinfo: workaround PyPDF extractText limitations on hidden text
...
It appears that extractText() does not find all text. At a glance it
may be that Tesseract's PDF renderer generates a font and uses glyphs
that map to different Unicode code points than PyPDF expects, so it
discards the content and finds nothing. As a proxy in lieu of better
PDF parsing, assume that a "GlyphLessFont" means there is text there.
I had previously found it does not work to check for the presence of a
font on the page. Some PDF generators create a font resource entry even
if the font is never called for.
2015-12-02 23:16:36 -08:00
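The proxy described in this commit, treating a "GlyphLessFont" entry as evidence of OCR text, might look roughly like the sketch below. It works on plain dictionaries standing in for a page's /Resources dictionary; the function name and the dictionary layout are assumptions for illustration, and a real PDF library would return its own object types.

```python
def page_has_glyphless_font(resources):
    """Guess whether a page carries OCR text from its font resources.

    `resources` is a plain dict standing in for a PDF page's
    /Resources dictionary (an assumption made for this sketch).
    """
    fonts = resources.get("/Font", {})
    for font in fonts.values():
        # Font names may carry a subset prefix, so match by substring.
        base_font = str(font.get("/BaseFont", ""))
        if "GlyphLessFont" in base_font:
            return True
    return False
```

Matching on the font name rather than on the mere presence of a font entry follows the commit's observation that some generators add font resources that are never used.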
James R. Barlow
d6124c1787
pageinfo: improve robustness of text test for Tesseract produced PDFs
2015-12-02 03:12:52 -08:00
James R. Barlow
80d89b5420
Set /Creator metadata to OCRmyPDF
...
with reference to Tess version and settings
2015-12-02 02:19:39 -08:00
James R. Barlow
74059eecf1
Choose PDF/A-2b by default instead of A-1b
2015-12-02 01:48:10 -08:00
James R. Barlow
78697341a2
pytest: don't run tests that happened to be part of pyvenv
2015-12-02 01:19:43 -08:00
James R. Barlow
cfb56dd8ff
Merge commit 'b1769cbe18e6380ddfe96b3b22e6d02cb603338b' into develop
2015-12-01 00:40:43 -08:00
jbarlow83
b1769cbe18
README: El Capitan supported now, Py3.5 supported
2015-11-26 16:31:33 -08:00
James R. Barlow
955b801e7f
Merge branch 'master' into develop
2015-09-14 00:34:21 -07:00
James R. Barlow
3cea3f1afe
Try to work around git binary file bug again
2015-09-14 00:34:16 -07:00
James R. Barlow
fd4a227ccb
Force this file to stop thinking it was modified
2015-09-13 17:53:01 -07:00
James R. Barlow
19c3097483
Update notes
2015-09-13 17:51:18 -07:00
James R. Barlow
cdd1a6d03c
Suppress failing test
2015-09-10 07:01:14 -07:00
James R. Barlow
5fb8411571
Try new PPA for libav
2015-09-10 06:01:59 -07:00
James R. Barlow
334a15b8c7
typo fix
2015-09-10 05:01:44 -07:00
James R. Barlow
6390736577
ffmpeg-dev instead?
2015-09-10 04:27:57 -07:00
James R. Barlow
d55a214516
Autoreconf?
2015-09-10 04:10:12 -07:00
James R. Barlow
0994164b9a
travis: apt-get install in wrong place
2015-09-06 01:43:47 -07:00
James R. Barlow
54ee0dd147
travis: fix typo
2015-09-06 01:39:54 -07:00
James R. Barlow
47c7990fb3
travis: build unpaper with cache
2015-09-06 01:38:01 -07:00
James R. Barlow
997e95de4d
travis: build unpaper
2015-09-06 01:29:07 -07:00
James R. Barlow
44204be256
Fix order of PPAs
2015-09-06 00:54:50 -07:00
James R. Barlow
9b1d9aa88a
travis: improve, add new PPA, etc.
2015-09-06 00:41:23 -07:00
James R. Barlow
b775762f6a
travis: doesn't like gcc-4.8, try just gcc
2015-09-06 00:23:05 -07:00
James R. Barlow
df1a28e319
Travis needs sudo mode
2015-09-06 00:21:20 -07:00
James R. Barlow
c300b2802a
travis: tabs -> spaces
2015-09-06 00:08:25 -07:00
James R. Barlow
01040ace4c
More complete travis.yml
2015-09-06 00:02:58 -07:00
James R. Barlow
8367172e0b
Start setting up Travis CI
2015-09-05 23:44:43 -07:00
James R. Barlow
09afd8d25d
Move to my repo: github.com/fritz-hh => jbarlow83
...
I made several attempts to contact fritz, but he is no longer
responding, and to set up GitHub integrations with Docker and Travis
CI I need admin access, which I don't have. So I'm moving it to my own
repo and pointing the old one at mine.
v3.0
2015-09-05 01:14:54 -07:00
James R. Barlow
7ed60429b3
Test case: No longer using JHOVE
...
So JHOVE will not claim this is an invalid PDF and we should see it
reported as valid.
2015-09-05 01:12:33 -07:00
James R. Barlow
281eafada0
bump to v3.0 and move repos
2015-09-05 00:53:14 -07:00
James R. Barlow
c14e10128a
Bump version to -rc9
v3.0-rc9
2015-08-29 16:43:22 -07:00
James R. Barlow
3270635192
ghostscript: quiet startup on rasterize
2015-08-28 04:51:36 -07:00
James R. Barlow
3d26257710
Add test cases for additional image formats
2015-08-28 04:51:11 -07:00
James R. Barlow
c4f134d694
Prevent running validation on missing file after an exception is thrown
2015-08-28 04:48:29 -07:00
James R. Barlow
83f9dfbac4
Use png256 raster device when possible
...
Someone reported a bug where the .png input to unpaper ended up being
type 'P' (palette) for some reason, which was not supported in unpaper.
Not sure how it happened, but it seemed easier to fix by explicitly
supporting it. Here we use png256 if it would capture all colors in the
input file. It's up to tesseract/reportlab to make use of the palette
PNG when rendering.
2015-08-28 04:47:57 -07:00
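The commit does not say how it decides that png256 "would capture all colors". One simple interpretation is sketched below: count distinct colors and fall back to a 24-bit device once the count exceeds what an 8-bit palette can hold. The function name and the pixel-iterable input are assumptions for this example; `png256` and `png16m` are Ghostscript's 8-bit palette and 24-bit RGB PNG devices.

```python
def choose_png_device(pixels):
    """Pick a Ghostscript PNG device name for an iterable of RGB tuples."""
    colors = set()
    for pixel in pixels:
        colors.add(pixel)
        # An 8-bit palette holds at most 256 entries; stop early once
        # that limit is exceeded.
        if len(colors) > 256:
            return "png16m"
    return "png256"
```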
James R. Barlow
3a445ad5f7
unpaper: support paletted files by conversion instead of bailing
2015-08-28 04:44:26 -07:00
James R. Barlow
c6d106ec33
Throw exception if iccprofiles not found instead of returning None
...
So far iccprofiles were only missing for a user who had a custom and
possibly broken ghostscript installation.
2015-08-28 03:59:35 -07:00
James R. Barlow
2ce6834be4
Bump to -rc8
v3.0-rc8
2015-08-24 01:25:01 -07:00
James R. Barlow
b376672dbc
Bug fix: exception thrown if input PDF was missing DocumentInfo block
2015-08-24 01:23:30 -07:00
James R. Barlow
d07db8547f
Merge branch 'master' of https://github.com/fritz-hh/OCRmyPDF
v3.0-rc7
2015-08-23 12:30:46 -07:00
James R. Barlow
aab08bfcc7
Fix requirements.txt problem
2015-08-23 12:30:40 -07:00