Merge commit '68cf9cbd87c188823027f9d1bfe9029017e7281f' into develop

2026-05-24 06:25:26 -04:00 · 2016-07-17 00:29:48 -07:00
parent 410111d6fb 68cf9cbd87
commit c02dbc809a
2 changed files with 90 additions and 31 deletions
--- a/README.rst
+++ b/README.rst
@@ -54,6 +54,8 @@ Debian and Ubuntu
 Users of Debian 9 or later or Ubuntu 16.10 or later may simply
 ``apt-get install ocrmypdf``.

+.. _Docker:
+
 Installing the Docker image
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -63,11 +65,15 @@ If you have `Docker <https://docs.docker.com/>`__ installed on your system, you
 a Docker image of the latest release.

 Follow the Docker installation instructions for your platform.  If you can run this command
-successfully, your system is ready to download and execute the image::
+successfully, your system is ready to download and execute the image:
+
+.. code-block:: bash

   docker run hello-world
   
-OCRmyPDF will use all available CPU cores.  By default, the VirtualBox machine instance on Windows and OS X has only a single CPU core enabled. Use the VirtualBox Manager to determine the name of your Docker engine host, and then follow these optional steps to enable multiple CPUs::
+OCRmyPDF will use all available CPU cores.  By default, the VirtualBox machine instance on Windows and OS X has only a single CPU core enabled. Use the VirtualBox Manager to determine the name of your Docker engine host, and then follow these optional steps to enable multiple CPUs:
+
+.. code-block:: bash

   # Optional step for Mac OS X users
   docker-machine stop "yourVM"
@@ -76,29 +82,41 @@ OCRmyPDF will use all available CPU cores.  By default, the VirtualBox machine i
   eval $(docker-machine env "yourVM")

 Assuming you have a Docker engine running somewhere, you can run these commands to download
-the image::
+the image:
+
+.. code-block:: bash

   docker pull jbarlow83/ocrmypdf

-Then tag it to give a more convenient name, just ocrmypdf::
+Then tag it to give a more convenient name, just ocrmypdf:
+
+.. code-block:: bash

   docker tag jbarlow83/ocrmypdf ocrmypdf

-This image contains language packs for English, French, Spanish and German. The alternative "polyglot" image provides `all available language packs <https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#languages>`__::
+This image contains language packs for English, French, Spanish and German. The alternative "polyglot" image provides `all available language packs <https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#languages>`__:
+
+.. code-block:: bash

   # Alternative step: If you need all language packs
   docker pull jbarlow83/ocrmypdf-polyglot
   docker tag jbarlow83/ocrmypdf-polyglot ocrmypdf

-You can then run ocrmypdf using the command::
+You can then run ocrmypdf using the command:
+
+.. code-block:: bash

   docker run ocrmypdf --help
  
-To execute OCRmyPDF on a local file, you must `provide a writable volume to the Docker image <https://docs.docker.com/userguide/dockervolumes/>`__, as in this template::
+To execute OCRmyPDF on a local file, you must `provide a writable volume to the Docker image <https://docs.docker.com/userguide/dockervolumes/>`__, as in this template:
+
+.. code-block:: bash

   docker run -v "$(pwd):/home/docker" <other docker arguments>   ocrmypdf <your arguments to ocrmypdf>

-In this worked example, the current working directory contains an input file called ``test.pdf`` and the output will go to ``output.pdf``:: 
+In this worked example, the current working directory contains an input file called ``test.pdf`` and the output will go to ``output.pdf``: 
+
+.. code-block:: bash

   docker run -v "$(pwd):/home/docker"   ocrmypdf --skip-text test.pdf output.pdf

@@ -112,11 +130,15 @@ These instructions probably work on all Mac OS X versions later than 10.7 (Lion)

 If it's not already present, `install Homebrew <http://brew.sh/>`__.

-Update Homebrew::
+Update Homebrew:
+
+.. code-block:: bash

   brew update
   
-Install or upgrade the required Homebrew packages, if any are missing::
+Install or upgrade the required Homebrew packages, if any are missing:
+
+.. code-block:: bash

   brew install libpng openjpeg jbig2dec     # image libraries
   brew install qpdf
@@ -126,16 +148,22 @@ Install or upgrade the required Homebrew packages, if any are missing::
   brew install unpaper    # optional
   brew install tesseract
   
-Update the homebrew pip and install Pillow::
+Update the homebrew pip and install Pillow:
+
+.. code-block:: bash

   pip3 install --upgrade pip
   pip3 install --upgrade pillow

-You can then install OCRmyPDF from PyPI::
+You can then install OCRmyPDF from PyPI:
+
+.. code-block:: bash

   pip3 install ocrmypdf

-The command line program should now be available::
+The command line program should now be available:
+
+.. code-block:: bash

   ocrmypdf --help

@@ -144,12 +172,16 @@ Installing on Ubuntu 14.04 LTS

 Installing on Ubuntu 14.04 LTS (trusty) is more difficult than other options, because of certain bugs in Python package installation.

-Update apt-get::
+Update apt-get:
+
+.. code-block:: bash

   sudo apt-get update
   sudo apt-get upgrade
   
-Install system dependencies::
+Install system dependencies:
+
+.. code-block:: bash

   sudo apt-get install \
      zlib1g-dev \
@@ -165,13 +197,17 @@ Install system dependencies::
      python3-reportlab

 If you wish to install OCRmyPDF into the system Python, then install as follows (note this installs new packages
-into your system Python, which could interfere with other programs)::
+into your system Python, which could interfere with other programs):
+
+.. code-block:: bash

   sudo pip3 install ocrmypdf
   
 If you wish to install OCRmyPDF into a virtual environment to keep the system Python from being modified, you can
 follow these steps.  This includes a workaround `for a known, unresolved issue in Ubuntu 14.04's ensurepip
-package <http://www.thefourtheye.in/2014/12/Python-venv-problem-with-ensurepip-in-Ubuntu.html>`__::
+package <http://www.thefourtheye.in/2014/12/Python-venv-problem-with-ensurepip-in-Ubuntu.html>`__:
+
+.. code-block:: bash

   sudo apt-get install python3-venv
   python3 -m venv venv-ocrmypdf --without-pip
@@ -187,14 +223,18 @@ Ubuntu 14.04 only installs ``unpaper`` version 0.4.2, which is not supported by
 Installing on Windows
 ~~~~~~~~~~~~~~~~~~~~~

-Direct installation on Windows is not possible.  Install the Docker container as described above.  Ensure that your command prompt can run the docker "hello world" container.
+Direct installation on Windows is not possible.  Install the `Docker`_ container as described above.  Ensure that your command prompt can run the docker "hello world" container.

-The command line syntax to run ocrmypdf from a command prompt will resemble::
+Running on Windows
+~~~~~~~~~~~~~~~~~~
+
+The command line syntax to run ocrmypdf from a command prompt will resemble:
+
+.. code-block:: bat

   docker run -v /c/Users/sampleuser:/home/docker ocrmypdf --skip-text test.pdf output.pdf

-where /c/Users/sampleuser is a Unix representation of the Windows path C:\Users\sampleuser, assuming a user named "sampleuser" is running ocrmypdf on a file in their home directory, and the files "test.pdf" and "output.pdf" are in the sampleuser folder. The Windows user must have read and write permissions.
-
+where /c/Users/sampleuser is a Unix representation of the Windows path C:\\Users\\sampleuser, assuming a user named "sampleuser" is running ocrmypdf on a file in their home directory, and the files "test.pdf" and "output.pdf" are in the sampleuser folder. The Windows user must have read and write permissions.
      
 Installing HEAD revision from sources
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -202,21 +242,29 @@ Installing HEAD revision from sources
 If you have ``git`` and ``python3.4`` or ``python3.5`` installed, you can install from source. When the ``pip`` installer runs,
 it will alert you if dependencies are missing.

-To install the HEAD revision from sources in the current Python 3 environment::
+To install the HEAD revision from sources in the current Python 3 environment:
+
+.. code-block:: bash

   pip3 install git+https://github.com/jbarlow83/OCRmyPDF.git

-Or, to install in `development mode <https://pythonhosted.org/setuptools/setuptools.html#development-mode>`__,  allowing customization of OCRmyPDF, use the ``-e`` flag::
+Or, to install in `development mode <https://pythonhosted.org/setuptools/setuptools.html#development-mode>`__,  allowing customization of OCRmyPDF, use the ``-e`` flag:
+
+.. code-block:: bash

   pip3 install -e git+https://github.com/jbarlow83/OCRmyPDF.git
   
 On certain Linux distributions such as Ubuntu, you may need to
-run the install command as superuser::
+run the install command as superuser:
+
+.. code-block:: bash

   sudo pip3 install [-e] git+https://github.com/jbarlow83/OCRmyPDF.git
   
 Note that this will alter your system's Python distribution. If you prefer 
-to not install as superuser, you can install the package in a Python virtual environment::
+to not install as superuser, you can install the package in a Python virtual environment:
+
+.. code-block:: bash

   git clone -b master https://github.com/jbarlow83/OCRmyPDF.git
   pyvenv venv
@@ -227,7 +275,9 @@ to not install as superuser, you can install the package in a Python virtual env
 However, ``ocrmypdf`` will only be accessible on the system PATH after
 you activate the virtual environment.

-To run the program::
+To run the program:
+
+.. code-block:: bash
   
   ocrmypdf --help

@@ -240,7 +290,9 @@ Languages
 ---------

 OCRmyPDF uses Tesseract for OCR, and relies on its language packs. For Linux users,
-you can often find packages that provide language packs::
+you can often find packages that provide language packs:
+
+.. code-block:: bash

   # Debian/Ubuntu users
   sudo apt-get install tesseract-ocr-chi-sim
@@ -251,9 +303,15 @@ languages can be requested.
 Support
 -------

-In case you detect an issue, please:
+Once ocrmypdf is installed, the built-in help which explains the command syntax and options can be accessed via:

--  Check if your issue is already known
+.. code-block:: bash
+
+   ocrmypdf --help
+
+If you detect an issue, please:
+
+-  Check whether your issue is already known
 -  If no problem report exists on GitHub, please create one here:
   https://github.com/jbarlow83/OCRmyPDF/issues
 -  Describe your problem thoroughly
--- a/tests/resources/README.rst
+++ b/tests/resources/README.rst
@@ -47,12 +47,13 @@ These test resources are assemblies from other previously mentioned files, relea

 - cardinal.pdf (four cardinal directions, rotated copies of LinnSequencer.jpg)
 - ccitt.pdf (LinnSequencer.jpg, converted to CCITT encoding)
+- encrypted_algo4.pdf (congress.jpg, encrypted with algorithm 4 - not supported by PyPDF2)
 - graph_ocred.pdf (from graph.pdf)
 - jbig2.pdf (congress.jpg, converted to JBIG2 encoding)
 - multipage.pdf (from several other files)
 - palette.pdf (congress.jpg, converted to a 256-color palette)
 - skew.pdf (from c02-22.pdf)
-- skew-encrypted.pdf (skew.pdf with encrypted applied)
+- skew-encrypted.pdf (skew.pdf with encryption - access supported by PyPDF2)


 .. _`Wikimedia: LinnSequencer`: https://upload.wikimedia.org/wikipedia/en/b/b7/LinnSequencer_hardware_MIDI_sequencer_brochure_page_2_300dpi.jpg
@@ -63,4 +64,4 @@ These test resources are assemblies from other previously mentioned files, relea

 .. _`Wikimedia: Pandas text analysis.png`: https://en.wikipedia.org/wiki/File:Pandas_text_analysis.png

-.. _`Wikimedia: JPEG2000 Lichtenstein`: https://en.wikipedia.org/wiki/JPEG_2000#/media/File:Jpeg2000_2-level_wavelet_transform-lichtenstein.png
+.. _`Wikimedia: JPEG2000 Lichtenstein`: https://en.wikipedia.org/wiki/JPEG_2000#/media/File:Jpeg2000_2-level_wavelet_transform-lichtenstein.png