OCRmyPDF

mirror of https://github.com/ocrmypdf/OCRmyPDF.git synced 2026-05-04 20:54:18 -04:00

Author	SHA1	Message	Date
James R. Barlow	ff16a00a3d	Remove test for Pillow JPEG and PNG As of 3.1.1, our minimum version, these codecs are now required by default for a successful installation, effectively solving the problem of Pillow installed without libjpeg/libpng.	2016-12-03 14:25:46 -08:00
James R. Barlow	8982b3e1e2	Update requirements -update requirements.txt and dev_requirements.txt to more recent version -setup.py updated to Ubuntu 14.04 rather than 12.04 backports -request at least Pillow 3.1.1 now (since this makes jpeg/png mandatory)	2016-12-03 14:14:07 -08:00
James R. Barlow	be0fa35d14	Merge branch 'master' into feature/ooruffus	2016-12-03 14:02:43 -08:00
James R. Barlow	9f51ed9d01	Finalize v4.3.3 release notes (tag: v4.3.3)	2016-12-03 00:39:24 -08:00
James R. Barlow	731e6792c7	Add test cases for Ghostscript PDF/A warnings	2016-12-03 00:32:09 -08:00
James R. Barlow	c35ec0b4aa	ghostscript: more effort at error logging	2016-12-03 00:22:03 -08:00
James R. Barlow	03aaf575dc	v4.3.3 release notes, fix more gs 9.20 issues	2016-12-02 16:26:34 -08:00
James R. Barlow	9a060579ba	Move work_folder into multiprocessing manager	2016-12-02 01:39:17 -08:00
James R. Barlow	d40a5c4f7a	Remove all remaining traces of ‘options’ global state from task runners	2016-12-02 01:31:57 -08:00
James R. Barlow	21f7dc3377	Distribute ‘options’ to worker processes via the multiprocessing manager	2016-12-02 01:06:11 -08:00
James R. Barlow	43c13a1ed9	Replace pdfinfo, pdfinfo_lock with multiprocessing manager Using a context manager to guard the pdfinfo list makes the lock unnecessary. (Although it was probably unnecessary in the first place anyway.)	2016-12-01 23:36:30 -08:00
James R. Barlow	6bc3f189e1	Remove “WrappedLogger” - does not do anything useful Never really investigated the reason why ruffus returns a mutex to go along with its logger. It seems that the mutex is only needed if one wanted to make multiple successive calls to a log function and have them appear atomically. It is not needed to protect the logger proxy because accessing the proxy triggers IPC in the child process that handles the multiprocessing.Manager() object. The logging wrapper only logs one line at a time, so the mutex does not actually protect logging sequence. Cut it. Also manager.Lock() returns a threading.Lock object so the purpose of it is actually to help processes share a thread-level lock. It would be more appropriate to use a semaphore based multiprocessing.Lock.	2016-12-01 15:27:07 -08:00
James R. Barlow	2c5437135c	Remove temporary re_symlink logging shim	2016-12-01 00:31:42 -08:00
James R. Barlow	444da02523	Fix mistake made in converting pipeline; incredibly, all tests pass now	2016-12-01 00:30:19 -08:00
James R. Barlow	00e8af2381	Reactivate the pipeline; surprisingly works in quick test	2016-12-01 00:03:03 -08:00
James R. Barlow	401b21864f	Convert to object oriented ruffus syntax (does not run) I experimented with the idea of using asyncio-based processing but realized that that does not solve the import time binding problem that is the real issue. Therefore the simpler refactoring is to convert to ruffus-oo syntax and get things working again. build_pipeline() is really ugly at the moment. The old syntax had its advantages. This test reproduces the complete pipeline graph but does not work otherwise.	2016-11-30 23:58:26 -08:00
James R. Barlow	de939951d4	Record version in debug log	2016-11-29 15:30:50 -08:00
James R. Barlow	7725d16a26	Fix exception on inline stencil masks with no /CS attribute	2016-11-24 22:37:00 -08:00
James R. Barlow	8a74408d83	Add security suggestions	2016-11-21 20:58:31 -08:00
James R. Barlow	3d0dc95a06	Moved venvs	2016-11-21 20:40:22 -08:00
James R. Barlow	04a57a3cc2	OS X -> macOS	2016-11-21 20:40:06 -08:00
James R. Barlow	d0c22ce01d	v4.3.2 release notes (tag: v4.3.2)	2016-11-10 23:16:08 -08:00
James R. Barlow	23c95e9660	ghostscript: elide overprinting to fix PDF/A errors in GS 9.20 It looks like GS 9.19 can incorrectly set overprinting for the text layer even though this makes no sense in PDF/A, or at least someone produced PDFs that have this after a Tesseract PDF -> GS PDF/A conversion. GS 9.20 complains about this. Instead of aborting, elide the feature. See http://git.ghostscript.com/?p=ghostpdl.git;a=commitdiff;h=094d5a1880f1cb9ed320ca9353eb69436e09b594 and issue #107. It looks like it is better to elide features and warn about elision rather than abort with an error.	2016-11-10 14:48:02 -08:00
James R. Barlow	eecab9b95d	pdfa: fix KeyError on pdfa_dict if document has some xmp metadata but not exactly what we’re looking for	2016-11-09 05:41:12 -08:00
James R. Barlow	8abc2f113c	Merge branch 'develop' (tag: v4.3.1)	2016-11-07 14:36:50 -08:00
James R. Barlow	949d2ff1c2	v4.3.1 release notes	2016-11-07 14:36:08 -08:00
James R. Barlow	1c8b763d53	test_pageinfo: Remove bits per component test The behavior of this test will ultimately depend on what version of img2pdf is installed, since after my patch it will be able to produce 1bpp images.	2016-11-07 14:35:54 -08:00
James R. Barlow	bb91393b85	Fix “deskew-rotate” bug. Turns out this occurred in any case where pdf-renderer hocr was used and a tesseract timeout or error occurred. We created a replacement page based on the unrotated page dimensions instead of the input image’s dimensions.	2016-11-07 14:17:31 -08:00
James R. Barlow	cc9c0d819e	Add test case for documents that get rotated incorrectly after deskew	2016-11-07 14:15:03 -08:00
James R. Barlow	a72b8caf47	Update documentation on other languages, multilingual documents	2016-11-07 14:14:06 -08:00
James R. Barlow	fdd9b8b8ce	Optimize some of the test resources to reduce file sizes Mostly by reducing RGB -> monochrome and applying JBIG2 compression	2016-11-07 14:01:23 -08:00
James R. Barlow	c096b4ca8c	Make debug dump of pageinfo at the end of processing readable	2016-11-04 02:23:02 -07:00
James R. Barlow	427add3008	Add @posttask debug hooks	2016-11-03 18:15:21 -07:00
James R. Barlow	c45871700d	Fix bug: LeptonicaErrorTrap() leaks file handles	2016-11-03 15:51:27 -07:00
Sean Whitton	6821e8eeb2	disable mathjax sphinx extension (#103) Mathjax isn't actually needed for OCRmyPDF's docs, but enabling this extension causes the browser to download a copy of mathjax.js from cdn.mathjax.org anyway. I have to disable this for the offline docs bundled with Debian, but since you're not using mathjax, it would be nice to have the diff merged upstream.	2016-11-01 21:56:57 -07:00
James R. Barlow	a4f07756a5	tesseract caching: don't transcode tesseract's output, hash source file For sanity's sake, deal with tesseract streams in binary without transcoding (via universal_newlines, etc.). The only differences are printing messages regarding spoofing. Also hash the source file so that changes to the cache mechanism invalidate old cache automatically. That is probably too aggressive, but simple and safer than the previous approach.	2016-10-28 16:44:12 -07:00
James R. Barlow	f24fb0e0c5	Obligatory MANIFEST.in repair (tag: v4.3)	2016-10-28 01:28:46 -07:00
James R. Barlow	73b88a0a6f	More work on documentation	2016-10-28 01:22:40 -07:00
James R. Barlow	c42f39e2d4	Update README to point to ReadTheDocs	2016-10-28 00:33:17 -07:00
James R. Barlow	5e5fe3175f	docs: OS X -> macOS branding change	2016-10-28 00:32:57 -07:00
James R. Barlow	cab65d1f11	pageinfo: add a python3.4 implementation of isclose()	2016-10-28 00:31:04 -07:00
James R. Barlow	245f05d5f4	docs: allow python setup.py install --force to bypass checks ReadTheDocs needs this.	2016-10-28 00:07:26 -07:00
James R. Barlow	dda751f9e3	Merge branch 'feature/docs' into develop # Conflicts: # ocrmypdf/__main__.py	2016-10-27 23:50:08 -07:00
James R. Barlow	3d37ae988a	Update release notes for 4.3	2016-10-27 23:48:12 -07:00
James R. Barlow	717acd9855	Prevent dumping binary PDFs to stdout	2016-10-27 16:20:53 -07:00
James R. Barlow	2e4431cc63	Allow piping output to stdout	2016-10-27 16:14:42 -07:00
James R. Barlow	f7387b0859	test_stdin: simplify this test No need to involve 'cat', just hook the file up to stdin.	2016-10-27 16:01:07 -07:00
James R. Barlow	a09f6b8977	Test cases: check that stdout is clear of output To ensure piping to stdout is possible.	2016-10-27 15:58:24 -07:00
James R. Barlow	d63449c214	main: don't print output file location to stdout, use stderr	2016-10-27 15:57:33 -07:00
James R. Barlow	a86805f0d9	Remove possibly non-free page from "multipage.pdf"	2016-10-27 15:56:43 -07:00


896 Commits