From d2d39de92f129a8cacabfd453584eddaac40477d Mon Sep 17 00:00:00 2001
From: mara004 <65915611+mara004@users.noreply.github.com>
Date: Mon, 5 Jul 2021 23:12:51 +0200
Subject: [PATCH 1/2] Update api.rst (#797)

fix excess 'no'
---
 docs/api.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/api.rst b/docs/api.rst
index 3459e8b8..5da9cf4d 100644
--- a/docs/api.rst
+++ b/docs/api.rst
@@ -44,7 +44,7 @@ execution. To do this, it will:
 The Python process that calls ``ocrmypdf.ocr()`` must be sufficiently
 privileged to perform these actions.
 
-There is no currently no option to manage how jobs are scheduled other
+There currently is no option to manage how jobs are scheduled other
 than the argument ``jobs=`` which will limit the number of worker
 processes.
 

From de74b80335a6e54eed1bddeb223a9c73a9dd05e3 Mon Sep 17 00:00:00 2001
From: mara004 <65915611+mara004@users.noreply.github.com>
Date: Wed, 14 Jul 2021 09:48:00 +0200
Subject: [PATCH 2/2] docs: Fix two missing words (#805)

* Fix two missing words
---
 docs/pdfsecurity.rst | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/pdfsecurity.rst b/docs/pdfsecurity.rst
index 4885ab03..9288e311 100644
--- a/docs/pdfsecurity.rst
+++ b/docs/pdfsecurity.rst
@@ -19,7 +19,7 @@ PDF is a rich, complex file format. The official PDF 1.7 specification,
 ISO 32000:2008, is hundreds of pages long and references several annexes
 each of which are similar in length. PDFs can contain video, audio, XML,
 JavaScript and other programming, and forms. In some cases, they can
-open internet connections to pre-selected URLs. All of these possible
+open internet connections to pre-selected URLs. All of these are possible
 attack vectors.
 
 In short, PDFs `may contain
@@ -31,7 +31,7 @@ describes a high-paranoia method which allows potentially hostile PDFs
 to be viewed and rasterized safely in a disposable virtual machine. A
 trusted PDF created in this manner is converted to images and loses all
 information making it searchable and losing all compression. OCRmyPDF
-could be used restore searchability.
+could be used to restore searchability.
 
 How OCRmyPDF processes PDFs
 ===========================
@@ -66,8 +66,8 @@ service. OCRmyPDF relies on Ghostscript, and therefore, if deployed
 online one should be prepared to comply with Ghostscript's Affero GPL
 license, and any other licenses.
 
-Setting aside these concerns, a side effect of OCRmyPDF is it may
-incidentally sanitize PDFs that contain certain types of malware. It
+Setting aside these concerns, a side effect of OCRmyPDF is that it may
+incidentally sanitize PDFs containing certain types of malware. It
 repairs the PDF with pikepdf/libqpdf, which could correct malformed PDF
 structures that are part of an attack. When PDF/A output is selected
 (the default), the input PDF is partially reconstructed by Ghostscript.
@@ -83,7 +83,7 @@ Limiting CPU usage
 OCRmyPDF will attempt to use all available CPUs and storage, so
 executing ``nice ocrmypdf`` or limiting the number of jobs with the
 ``-j`` argument may ensure the server remains available. Another option
-would be run OCRmyPDF jobs inside a Docker container, a virtual machine,
+would be to run OCRmyPDF jobs inside a Docker container, a virtual machine,
 or a cloud instance, which can impose its own limits on CPU usage and be
 terminated "from orbit" if it fails to complete.
 