Commit Graph

844 Commits

Author SHA1 Message Date
James R. Barlow
7d2009ccef ghostscript: log errors from stdout 2016-10-27 15:36:20 -07:00
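A minimal Python sketch of the idea. It assumes Ghostscript was run through subprocess in text mode. Ghostscript tends to print its error messages on stdout rather than stderr, so stdout has to be logged as well. The logger name and the choice of log levels are hypothetical and not taken from OCRmyPDF.

```python
import logging
import subprocess
from typing import List, Tuple

log = logging.getLogger("ghostscript")  # hypothetical logger name


def log_ghostscript_output(
    proc: subprocess.CompletedProcess,
) -> List[Tuple[int, str]]:
    """Log every non-blank line Ghostscript printed, including stdout.

    Assumes the process ran in text mode, so stdout/stderr are str or None.
    Lines are logged as errors if Ghostscript failed, else as debug output.
    """
    level = logging.ERROR if proc.returncode != 0 else logging.DEBUG
    messages = []
    # Check stdout first: that is where Ghostscript reports most errors.
    for stream in (proc.stdout, proc.stderr):
        for line in (stream or "").splitlines():
            if line.strip():
                log.log(level, line)
                messages.append((level, line))
    return messages


# Illustrative input only; not real Ghostscript output.
proc = subprocess.CompletedProcess(
    args=["gs"], returncode=1, stdout="Error: example message\n", stderr=""
)
log_ghostscript_output(proc)
```

Returning the messages as well as logging them makes the behavior easy to check in a test.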
James R. Barlow
18ae5db06d ghostscript: ensure raster resolution is specified in integer units 2016-10-27 15:35:33 -07:00
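A hedged sketch that assumes fractional DPI values should simply be rounded. Ghostscript's -r option takes the form -rXRESxYRES, so a computed value such as 299.7 is converted to an integer before the argument is built. The helper name gs_resolution_arg is hypothetical.

```python
def gs_resolution_arg(xres: float, yres: float) -> str:
    """Build a Ghostscript -r argument using integer DPI values.

    Assumption: round to the nearest integer and never go below 1 DPI;
    the actual OCRmyPDF code may round differently.
    """
    x = max(1, round(xres))
    y = max(1, round(yres))
    return f"-r{x}x{y}"


print(gs_resolution_arg(299.7, 300.2))  # -r300x300
```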
James R. Barlow
9a1838f102 pageinfo: accept "cm/Do" image drawing without the usual "q/Q" wrapper
Some PDFs omit the traditional q/Q wrapper and alter ctm with a stack
depth of zero, so make our test for stack depth specifically test for
the case where the PDF calls for rendering to an uninitialized ctm.

Probably related to #97.
2016-10-27 15:35:00 -07:00
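A minimal Python sketch of the refined test. It assumes the content stream has already been tokenized into (operands, operator) pairs. Instead of flagging every Do at stack depth 0, it records, per graphics state level, whether a cm operator has modified the CTM. A cm/Do pair without a q/Q wrapper therefore counts as initialized, and only images drawn before any cm are flagged. The function name is hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

# Assumed input: a tokenized content stream as (operands, operator) pairs.
Instruction = Tuple[Sequence[object], str]


def classify_image_draws(
    instructions: Iterable[Instruction],
) -> List[Tuple[object, bool]]:
    """Return (image name, drawn_with_uninitialized_ctm) for each Do."""
    # One entry per graphics state level. An entry becomes True once cm has
    # been applied at that level, or is inherited from an enclosing level.
    ctm_set = [False]
    draws = []
    for operands, operator in instructions:
        if operator == "q":
            ctm_set.append(ctm_set[-1])  # q saves the state, inheriting the CTM
        elif operator == "Q":
            if len(ctm_set) > 1:  # ignore unbalanced Q operators
                ctm_set.pop()
        elif operator == "cm":
            ctm_set[-1] = True
        elif operator == "Do":
            draws.append((operands[0], not ctm_set[-1]))
    return draws


# No q/Q wrapper: /Im1 is drawn before any cm, and /Im0 is drawn after one.
stream = [
    (["/Im1"], "Do"),
    ([612, 0, 0, 792, 0, 0], "cm"),
    (["/Im0"], "Do"),
]
print(classify_image_draws(stream))  # [('/Im1', True), ('/Im0', False)]
```

The stack depth is still available as len(ctm_set) - 1 if it is needed elsewhere.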
James R. Barlow
e20346032d leptonica: add color testing functions for future experiments 2016-10-27 14:49:49 -07:00
James R. Barlow
693a27d76c leptonica: add IPython display hook and equality test 2016-10-26 14:44:41 -07:00
James R. Barlow
203966d86b leptonica: fix Pillow conversion for 1-bit and 8-bit gray images 2016-10-26 13:10:13 -07:00
James R. Barlow
7eca8508fd Implement new preprocessing feature, background removal 2016-10-14 17:23:34 -07:00
James R. Barlow
b85270df1c Merge branch 'master' into develop 2016-10-14 15:56:58 -07:00
James R. Barlow
aff597cef4 v4.2.5: update release notes, fix silly typo in pageinfo.py (tag: v4.2.5) 2016-10-13 13:26:39 -07:00
James R. Barlow
61b05b3dee Fix issue: BitsPerComponent is an optional field, sometimes omitted 2016-10-13 13:15:27 -07:00
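A minimal Python sketch that uses a plain dict as a stand-in for a PDF image dictionary. This is an assumption; the real code works on parsed PDF objects. The PDF specification requires /BitsPerComponent only for images that are neither image masks nor JPXDecode-encoded, so code that reads the key unconditionally can raise KeyError on otherwise valid files. The function name is hypothetical.

```python
from typing import Any, Mapping, Optional


def bits_per_component(image: Mapping[str, Any]) -> Optional[int]:
    """Bits per component of a PDF image, or None if the image dictionary
    alone does not say.

    `image` is assumed to be a plain dict with keys such as
    "/BitsPerComponent", "/ImageMask" and "/Filter".
    """
    if image.get("/ImageMask", False):
        return 1  # image masks are always 1 bit per component
    filters = image.get("/Filter", [])
    if isinstance(filters, str):
        filters = [filters]  # /Filter may be a single name or an array
    if "/JPXDecode" in filters:
        return None  # bit depth is stored in the JPEG 2000 data instead
    bpc = image.get("/BitsPerComponent")
    return int(bpc) if bpc is not None else None
```

Returning None instead of guessing a default leaves the caller to decide how to treat an unknown bit depth. In practice, a different default may be more appropriate.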
Julian Kahnert
453c4ef602 Update README.rst (#98)
`brew install tesseract` only installed the English language pack, not French, German, or Spanish
2016-10-12 11:20:58 -07:00
James R. Barlow
cf4b04f92d The main 'quick' test should be a file that OCRs to recognizable text 2016-10-07 16:25:34 -07:00
James R. Barlow
06c6999987 Merge commit '07891d994aab92e7a14aebe1ac509aab2d4f170c' 2016-10-07 12:45:56 -07:00
James R. Barlow
013c5a369f Replace redacted file with an OCR-able file 2016-10-07 12:45:22 -07:00
James R. Barlow
07891d994a Replace redacted file with an OCR-able file 2016-10-07 12:44:49 -07:00
James R. Barlow
6baf8668a6 Replace non-free file milk.pdf with free equivalent 2016-10-06 13:10:28 -07:00
James R. Barlow
4ba2962c56 Comment on non-free files 2016-10-05 16:48:16 -07:00
James R. Barlow
7ad92f5db4 Merge branch 'master' of https://github.com/jbarlow83/OCRmyPDF 2016-10-05 16:39:00 -07:00
James R. Barlow
4dad09cc91 resources/README: replace the other large table with a list table 2016-10-05 16:38:51 -07:00
Sean Whitton
7b2e0c7a7a also exclude .git in pytest.ini (#94) 2016-09-15 08:56:14 -07:00
Sean Whitton
7f08f15fc9 pytest skipif for milk.pdf test (#95)
Skip the test if the fair use restricted milk.pdf is not present.
2016-09-15 08:55:31 -07:00
James R. Barlow
825c0f8b2a Note that milk.pdf is non-free, start using list-tables 2016-09-10 14:44:00 -07:00
James R. Barlow
dbe880bc41 Update tesseract supported languages 2016-09-09 12:55:07 -07:00
James R. Barlow
2ec516b6ff leptonica: learn a few new tricks
Found some interesting options for background norm.
2016-09-09 12:54:36 -07:00
James R. Barlow
7942a01e50 leptonica: This is not a Py2 module anymore 2016-09-08 20:59:38 -07:00
James R. Barlow
df684f9344 Update tesseract supported languages 2016-09-08 15:52:25 -07:00
James R. Barlow
ae16e95e42 leptonica: scale should be a tuple for consistency 2016-09-08 11:38:10 -07:00
James R. Barlow
220f1ce161 tasks.py: stop tracking this file for now
This helper script is still in development and needs to be changed each
release, which breaks the release.

It shouldn't be in MANIFEST.in at all because it's not part of a
distribution.
2016-09-04 10:55:57 -07:00
James R. Barlow
c62a8a97c9 v4.2.4 release notes (tag: v4.2.4) 2016-09-01 21:33:38 -07:00
James R. Barlow
f8a1136979 tasks: show logging info 2016-09-01 21:24:13 -07:00
James R. Barlow
9ca29c787b Update description of masks.pdf to reflect what it actually tests 2016-09-01 21:21:14 -07:00
James R. Barlow
6af748a251 pageinfo: regression - didn't add inline images to list 2016-09-01 15:27:51 -07:00
James R. Barlow
9041867f86 pageinfo: exclude images from DPI calculation if drawn at stack depth 0
More thorough testing showed that Acrobat does not presume that images
fill the page if the CTM is unspecified, as tests/resources/masks.pdf
seems to want.  Instead it treats the CTM literally and draws the image
as 1x1 PDF units, or 1/72" square, in the bottom left corner of the page.

Seems like the best thing to do is ignore any such images for the purpose
of DPI calculation.  masks.pdf still works out okay because it has
other images.

For more robustness we could consider invalidating any DPI above some
limit, or warning the user about these microdot thumbnails.
2016-09-01 14:23:31 -07:00
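A minimal Python sketch of the exclusion. It assumes the CTM in effect at the Do operator is known as the six numbers [a b c d e f], along with a flag saying whether it was ever specified. The drawn width and height in PDF units (1/72 inch) are the lengths of the transformed unit square's sides. Without the exclusion, a 2550-pixel-wide image on the default 1x1 unit square would come out at 2550 * 72 = 183,600 DPI. The function names, and taking the page's highest DPI as the result, are hypothetical choices.

```python
import math
from typing import Iterable, Optional, Sequence, Tuple


def image_dpi(
    pixel_width: int,
    pixel_height: int,
    ctm: Sequence[float],
    ctm_unspecified: bool,
) -> Optional[Tuple[float, float]]:
    """Effective (x, y) DPI of an image, or None if it should be ignored."""
    if ctm_unspecified:
        return None  # drawn as a 1x1 unit "microdot"; DPI would be absurd
    a, b, c, d, _e, _f = ctm
    # The unit square maps to a parallelogram; its side lengths in PDF
    # units give the drawn size of the image.
    width_pt = math.hypot(a, b)
    height_pt = math.hypot(c, d)
    if width_pt == 0 or height_pt == 0:
        return None  # degenerate matrix, image is not visible
    return (pixel_width * 72.0 / width_pt, pixel_height * 72.0 / height_pt)


def page_dpi(images: Iterable[Optional[Tuple[float, float]]]) -> Optional[float]:
    """Hypothetical policy: highest DPI among images that were not excluded."""
    values = [max(dpi) for dpi in images if dpi is not None]
    return max(values) if values else None


full_page = image_dpi(2550, 3300, (612, 0, 0, 792, 0, 0), ctm_unspecified=False)
microdot = image_dpi(2550, 3300, (1, 0, 0, 1, 0, 0), ctm_unspecified=True)
print(full_page)                        # (300.0, 300.0)
print(page_dpi([full_page, microdot]))  # 300.0
```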
James R. Barlow
04099b087c pageinfo: handle stencil masks when stack depth > 0 2016-09-01 14:03:30 -07:00
James R. Barlow
6d6234714c tasks: fix logic error and make magic numbers disappear 2016-09-01 14:03:08 -07:00
James R. Barlow
520be23481 Add release helper script (tag: v4.2.3) 2016-08-31 20:33:04 -07:00
James R. Barlow
346c3c8dd3 Start tracking development requirements 2016-08-31 20:31:31 -07:00
James R. Barlow
bd534c3313 main.py -> __main__.py
Executing a package with python -m packagename will check for
__main__.py inside the package.  In other words main.py should have
always been named __main__.py.

In the unlikely event that someone depends on "import ocrmypdf.main"
being meaningful, main.py continues to exist and replicates the
behavior of __main__.  (It's unlikely because import ocrmypdf.main does
unpythonic ruffus-related things at import time, essentially
configuring itself to work with sys.argv.  To fix another day.)

This should solve the problem of Debian needing to run test suites
before installation and afterwards for continuous integration without
having to patch either file, as python -m ocrmypdf will follow import
order.  That is, if the current directory contains "ocrmypdf/" (e.g.
staging a new version) then that will be tested, else sys.path will
be checked.
2016-08-31 17:01:42 -07:00
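A small self-contained demonstration of the mechanism described above. It builds a throwaway package containing __main__.py in a temporary directory, then runs it with python -m from that directory. Because -m puts the current directory at the front of sys.path, a package in the working directory wins over an installed one. The package name is hypothetical. The sketch drops PYTHONSAFEPATH from the environment, since on Python 3.11+ that variable stops the current directory from being added to sys.path.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_package_as_module(package_name: str, main_body: str) -> str:
    """Create <tmp>/<package_name>/__main__.py and run `python -m package_name`."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / package_name
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "__main__.py").write_text(main_body)
        env = {k: v for k, v in os.environ.items() if k != "PYTHONSAFEPATH"}
        result = subprocess.run(
            [sys.executable, "-m", package_name],
            cwd=tmp,  # -m searches the current directory first
            env=env,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout


print(run_package_as_module("hypothetical_demo_pkg", 'print("running", __name__)\n'))
# running __main__
```

The code in __main__.py runs with __name__ set to "__main__", which is why a package-level entry point belongs there rather than in main.py.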
James R. Barlow
2625368aed link: more MANIFEST.in tweaks 2016-08-31 16:28:39 -07:00
James R. Barlow
8ac94879f1 lint: no need to check for DEVNULL; all supported versions have it 2016-08-31 16:28:18 -07:00
James R. Barlow
dd8c0f3756 Merge branch 'master' of https://github.com/jbarlow83/OCRmyPDF 2016-08-31 13:19:46 -07:00
James R. Barlow
010f353a5e v4.2.3 release notes 2016-08-31 13:19:27 -07:00
James R. Barlow
e0a18edb92 Fix MANIFEST.in, as Python packages require 2016-08-31 13:19:17 -07:00
James R. Barlow
c6f2eea058 Reinstate OCRmyPDF.sh with a deprecation warning 2016-08-31 11:57:02 -07:00
James R. Barlow
bf89e38c69 Add milk.pdf test case 2016-08-31 11:42:21 -07:00
jbarlow83
e1f0640d42 Create issue template 2016-08-31 11:26:29 -07:00
James R. Barlow
71b54035ba Bug fix issue #89: trying to perform arithmetic on IndirectObject
TypeError: bad operand type for unary -: 'IndirectObject'
2016-08-31 10:25:58 -07:00
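A hedged Python sketch of this class of bug. A numeric PDF value may be stored as an indirect object, and arithmetic fails until it is dereferenced. In PyPDF2, which the error message suggests, an IndirectObject has to be dereferenced first, for example with its getObject() method. Here a hypothetical IndirectRef class stands in for the library type, so the example runs without any PDF library.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class IndirectRef:
    """Hypothetical stand-in for a PDF indirect reference such as "12 0 R"."""

    objnum: int
    generation: int
    xref: Dict[Tuple[int, int], Any] = field(repr=False)

    def resolve(self) -> Any:
        return self.xref[(self.objnum, self.generation)]


def resolve(value: Any) -> Any:
    """Follow indirect references until a direct object is reached.

    Assumes the file contains no reference cycles.
    """
    while isinstance(value, IndirectRef):
        value = value.resolve()
    return value


def negate(value: Any) -> float:
    # Writing -value directly raises TypeError while value is an IndirectRef.
    return -float(resolve(value))


ref = IndirectRef(12, 0, {(12, 0): 90})
try:
    -ref
except TypeError as exc:
    print(exc)       # bad operand type for unary -: 'IndirectRef'
print(negate(ref))   # -90.0
```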
James R. Barlow
325cc0beca Allow test cases to run without installing first
As @spwhitton found:

The test suite needs to call "python3 -m ocrmypdf.main" instead of
just "ocrmypdf" because this /usr/bin/ocrmypdf script has not yet been
generated when dh runs the test suite.

---

Seems reasonable to perform in-place testing independent of installation.

Source:
https://sources.debian.net/src/ocrmypdf/4.2.1%2Bgit.20160824.1.5d67cc7-1/debian/patches/0001-patch-test-suite-executable.patch/
2016-08-26 15:23:26 -07:00
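A minimal sketch of in-place testing, assuming tests shell out to OCRmyPDF. Building the command from sys.executable and -m runs the package with the same interpreter as the test suite and does not depend on a generated /usr/bin/ocrmypdf script. The helper names are hypothetical. The module name is configurable because the Debian patch used ocrmypdf.main, while plain ocrmypdf works once the package has a __main__.py.

```python
import subprocess
import sys
from typing import Any, List

OCRMYPDF_MODULE = "ocrmypdf"  # or "ocrmypdf.main", as in the Debian patch


def ocrmypdf_command(*args: str, module: str = OCRMYPDF_MODULE) -> List[str]:
    """Command line that runs OCRmyPDF with the current interpreter."""
    return [sys.executable, "-m", module, *args]


def run_ocrmypdf(*args: str, **kwargs: Any) -> subprocess.CompletedProcess:
    """Run OCRmyPDF without installing it; kwargs go to subprocess.run."""
    return subprocess.run(ocrmypdf_command(*args), **kwargs)
```

Using sys.executable instead of a literal "python3" also keeps virtualenvs and non-default interpreters working.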
James R. Barlow
1a9f09c4d5 Remove OCRmyPDF.sh and its usage in all test cases 2016-08-26 15:18:38 -07:00
James R. Barlow
4fed4e2af3 tests: don't try to pass Unicode arguments on command line on Linux
Depends on locale being configured properly, and it's not necessary
to be able to do this.
2016-08-26 15:08:56 -07:00