mirror of
https://github.com/ocrmypdf/OCRmyPDF.git
synced 2026-05-04 12:48:02 -04:00
Update docs for eventual v4.4 release
@@ -3,6 +3,23 @@ RELEASE NOTES

 OCRmyPDF uses `semantic versioning <http://semver.org/>`_.

+v4.4:
+=====
+
+- Tesseract 4.00 is now supported on an experimental basis.
+
+  + A new rendering option ``--pdf-renderer tess4`` exploits Tesseract 4's new text-only output PDF mode. See the documentation on PDF Renderers for details.
+  + The ``--tesseract-oem`` argument allows control over the Tesseract 4 OCR
+    engine mode.
+  + Fixed poor performance with Tesseract 4.00 on Linux
+
+- Fixed an issue that caused corruption of output to stdout in some cases
+- Removed test for Pillow JPEG and PNG support, as the minimum supported version of Pillow now enforces this
+- Significant code reorganization to make OCRmyPDF re-entrant and improve performance. All changes should be backward compatible for the v4.x series.
+
+  + However, OCRmyPDF's dependency "ruffus" is not re-entrant, so no Python API is available. Scripts should continue to use the command line interface.
+
+
 v4.3.5:
 =======

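The advice that scripts should continue to use the command line interface can be sketched in Python roughly as follows. This is a minimal, non-authoritative sketch: the helper names `build_ocrmypdf_command` and `run_ocrmypdf` are hypothetical, the flags `--pdf-renderer tess4` and `--tesseract-oem` are taken from the v4.4 notes above, and it assumes `ocrmypdf` is on `PATH` and takes the input and output files as positional arguments after the options. The engine mode value is passed through unchanged; these notes do not say which values are valid.

```python
import subprocess


def build_ocrmypdf_command(input_pdf, output_pdf, oem=None, use_tess4_renderer=False):
    """Build the argument list for an ocrmypdf command line call.

    Hypothetical helper: only the flag names come from the release notes.
    """
    cmd = ["ocrmypdf"]
    if use_tess4_renderer:
        # Experimental renderer that relies on Tesseract 4's text-only PDF output.
        cmd += ["--pdf-renderer", "tess4"]
    if oem is not None:
        # Tesseract 4 OCR engine mode, forwarded as given.
        cmd += ["--tesseract-oem", str(oem)]
    cmd += [input_pdf, output_pdf]
    return cmd


def run_ocrmypdf(input_pdf, output_pdf, **options):
    """Run ocrmypdf as a subprocess; raises CalledProcessError on a non-zero exit."""
    cmd = build_ocrmypdf_command(input_pdf, output_pdf, **options)
    # check_call is used because it is available on every Python 3 version.
    return subprocess.check_call(cmd)
```

Building the argument list separately from running it keeps the command easy to inspect or log before any file is touched, and passing a list rather than a shell string avoids quoting problems with file names that contain spaces.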
@@ -20,6 +20,7 @@ Contents:
    installation
    languages
    cookbook
+   renderers
    security
    errors

@@ -262,7 +262,7 @@ where /c/Users/sampleuser is a Unix representation of the Windows path C:\\Users
 Installing HEAD revision from sources
 -------------------------------------

-If you have ``git`` and ``python3.4`` or ``python3.5`` installed, you can install from source. When the ``pip`` installer runs,
+If you have ``git`` and Python 3.4 or newer installed, you can install from source. When the ``pip`` installer runs,
 it will alert you if dependencies are missing.

 To install the HEAD revision from sources in the current Python 3 environment:

@@ -14,7 +14,7 @@ PDF is a rich, complex file format. The official PDF 1.7 specification, ISO 3200

 In short, PDFs `may contain viruses <https://security.stackexchange.com/questions/64052/can-a-pdf-file-contain-a-virus>`_.

-This `article <https://theinvisiblethings.blogspot.ca/2013/02/converting-untrusted-pdfs-into-trusted.html>`_ describes a method which allows potentially hostile PDFs to be viewed and rasterized safely in a disposable virtual machine. A trusted PDF created in this manner is converted to images and loses all information making it searchable. OCRmyPDF could be used restore searchability.
+This `article <https://theinvisiblethings.blogspot.ca/2013/02/converting-untrusted-pdfs-into-trusted.html>`_ describes a high-paranoia method that allows potentially hostile PDFs to be viewed and rasterized safely in a disposable virtual machine. A trusted PDF created in this manner is converted to images, so it loses all compression and all the information that made it searchable. OCRmyPDF could be used to restore searchability.

 How OCRmyPDF processes PDFs
 ---------------------------
