Update docs for eventual v4.4 release

James R. Barlow
2017-01-26 12:29:11 -08:00
parent bad67c6dc5
commit 467b7f0163
4 changed files with 20 additions and 2 deletions

@@ -3,6 +3,23 @@ RELEASE NOTES
OCRmyPDF uses `semantic versioning <http://semver.org/>`_.
v4.4:
=====
- Tesseract 4.00 is now supported on an experimental basis.
+ A new rendering option ``--pdf-renderer tess4`` exploits Tesseract 4's new text-only output PDF mode. See the documentation on PDF Renderers for details.
+ The ``--tesseract-oem`` argument allows control over the Tesseract 4 OCR
engine mode.
+ Fixed poor performance with Tesseract 4.00 on Linux
- Fixed an issue that caused corruption of output to stdout in some cases
- Removed test for Pillow JPEG and PNG support, as the minimum supported version of Pillow now enforces this
- Significant code reorganization to make OCRmyPDF re-entrant and improve performance. All changes should be backward compatible for the v4.x series.
+ However, OCRmyPDF's dependency "ruffus" is not re-entrant, so no Python API is available. Scripts should continue to use the command line interface.
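
Since no Python API is available, a script can shell out to the command line interface instead. The following is a minimal sketch, not part of OCRmyPDF: the helper names ``build_ocrmypdf_command`` and ``run_ocrmypdf`` are hypothetical, and it assumes the ``ocrmypdf`` executable is on ``PATH`` and accepts the input and output files as its final two arguments. The only options used are the two named in these notes, ``--pdf-renderer`` and ``--tesseract-oem``.

```python
import subprocess


def build_ocrmypdf_command(input_pdf, output_pdf, renderer=None, oem=None):
    """Assemble an argument list for the ocrmypdf command line tool.

    Hypothetical helper: options left as None are omitted, so ocrmypdf
    falls back to its own defaults.
    """
    cmd = ["ocrmypdf"]
    if renderer is not None:
        # e.g. "tess4", the experimental renderer described above
        cmd += ["--pdf-renderer", renderer]
    if oem is not None:
        # Tesseract 4 OCR engine mode, passed through unchanged
        cmd += ["--tesseract-oem", str(oem)]
    cmd += [input_pdf, output_pdf]
    return cmd


def run_ocrmypdf(input_pdf, output_pdf, **options):
    """Run ocrmypdf as a child process.

    check=True raises subprocess.CalledProcessError on a non-zero exit.
    """
    cmd = build_ocrmypdf_command(input_pdf, output_pdf, **options)
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the command here; call run_ocrmypdf() to execute it.
    print(build_ocrmypdf_command("in.pdf", "out.pdf", renderer="tess4", oem=1))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.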
v4.3.5:
=======

@@ -20,6 +20,7 @@ Contents:
installation
languages
cookbook
renderers
security
errors

@@ -262,7 +262,7 @@ where /c/Users/sampleuser is a Unix representation of the Windows path C:\\Users
Installing HEAD revision from sources
-------------------------------------
If you have ``git`` and ``python3.4`` or ``python3.5`` installed, you can install from source. When the ``pip`` installer runs,
If you have ``git`` and Python 3.4 or newer installed, you can install from source. When the ``pip`` installer runs,
it will alert you if dependencies are missing.
To install the HEAD revision from sources in the current Python 3 environment:

@@ -14,7 +14,7 @@ PDF is a rich, complex file format. The official PDF 1.7 specification, ISO 3200
In short, PDFs `may contain viruses <https://security.stackexchange.com/questions/64052/can-a-pdf-file-contain-a-virus>`_.
This `article <https://theinvisiblethings.blogspot.ca/2013/02/converting-untrusted-pdfs-into-trusted.html>`_ describes a method which allows potentially hostile PDFs to be viewed and rasterized safely in a disposable virtual machine. A trusted PDF created in this manner is converted to images and loses all information making it searchable. OCRmyPDF could be used to restore searchability.
This `article <https://theinvisiblethings.blogspot.ca/2013/02/converting-untrusted-pdfs-into-trusted.html>`_ describes a high-paranoia method which allows potentially hostile PDFs to be viewed and rasterized safely in a disposable virtual machine. A trusted PDF created in this manner is converted to images, so it loses all compression and all of the information that made it searchable. OCRmyPDF could be used to restore searchability.
How OCRmyPDF processes PDFs
---------------------------