Commit Graph

3808 Commits

Author SHA1 Message Date
James R. Barlow
397fad249d v16.3.0 release notes v16.3.0 2024-05-19 00:50:59 -07:00
James R. Barlow
9a3c5a3f7c Add progressbar for metadata_fixup
Might take time for big files. Pdf.open() potentially is expensive as well, but QPDF doesn't give us progress feedback for that.

Closes Show progress during postprocessing #1313
2024-05-19 00:46:50 -07:00
James R. Barlow
950c700274 Fix Ghostscript PDF/A progressbar not displaying 2024-05-19 00:44:21 -07:00
James R. Barlow
26432c38a9 Raise exception if rotate pages threshold adjusted without --rotate-pages
Fixes Make usage of --rotate-pages-threshold clearer #1309
2024-05-18 23:49:27 -07:00
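The check described in the commit above can be sketched with `argparse`. This is a minimal illustration, not OCRmyPDF's actual code: only the two flag names come from the commit message, while `build_parser`, `check_options`, and the `None` default used to detect an explicitly supplied threshold are assumptions made for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; only the two flag names come from the commit message.
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument("--rotate-pages", action="store_true")
    # Assumption: None means "the user did not supply a threshold".
    parser.add_argument("--rotate-pages-threshold", type=float, default=None)
    return parser


def check_options(options: argparse.Namespace) -> None:
    """Reject a threshold that was adjusted without enabling page rotation."""
    if options.rotate_pages_threshold is not None and not options.rotate_pages:
        raise ValueError(
            "--rotate-pages-threshold has no effect without --rotate-pages"
        )
```

Raising early turns a silently ignored option into a visible usage error.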
James R. Barlow
28be50136c hocr: If a line box's coords are invalid, log an error and don't render
Addresses [Bug]: Crash on multiple .pdf files #1312

Not actually a fix, but at least it will get us better diagnostics. Appears old Tesseract 4.x generates bad line boxes at times.
2024-05-18 23:32:18 -07:00
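A rough sketch of the "log and skip" approach from the commit above follows. It is not the project's renderer: the `Box` type, the function names, and the definition of "invalid" (zero or negative width or height) are all assumptions made for illustration.

```python
import logging
from typing import Iterable, NamedTuple

log = logging.getLogger(__name__)


class Box(NamedTuple):
    # Hypothetical line box in hOCR-style coordinates (origin at top left).
    left: float
    top: float
    right: float
    bottom: float


def is_valid_box(box: Box) -> bool:
    # Assumption: a box is invalid when it has zero or negative width or height.
    return box.right > box.left and box.bottom > box.top


def renderable_lines(boxes: Iterable[Box]) -> list[Box]:
    """Return only the boxes that can be rendered, logging the rest."""
    kept = []
    for index, box in enumerate(boxes):
        if not is_valid_box(box):
            log.error("Skipping line %d: invalid box %r", index, box)
            continue
        kept.append(box)
    return kept
```

Skipping the bad line keeps the rest of the page renderable, and the logged coordinates provide the diagnostics the commit message asks for.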
James R. Barlow
0c62f2de5d Issue template: check for EOL OS 2024-05-17 19:51:15 -07:00
James R. Barlow
5caf654f22 Add new codecov token 2024-05-11 01:03:41 -07:00
James R. Barlow
205593445e Change test to run on macos x64 and arm64 2024-05-11 00:13:08 -07:00
James R. Barlow
f25fb8c63a Merge branch 'main' of github.com:ocrmypdf/OCRmyPDF 2024-05-08 00:39:27 -07:00
James R. Barlow
99c78650b6 Add better error message for PDFs with invalid CTMs
Closes #1303
2024-05-07 14:00:30 -07:00
Ahmed Abdou
08e89e2dbe Adding language install docs for archlinux (#1296)
Adding language install docs for archlinux
2024-04-24 14:46:05 -07:00
James R. Barlow
0e013df161 v16.2.0 release notes v16.2.0 2024-04-16 00:37:03 -07:00
James R. Barlow
9ba4e3ab46 Log unusual exceptions when trying to obtain a version
Fixes #1262
2024-04-07 14:39:08 -07:00
James R. Barlow
5fdcb7602b Make downsampling large images that Tesseract would otherwise error on into default behavior
Fixes #1281
2024-04-07 13:43:20 -07:00
James R. Barlow
b4db1b741f optimize: fix handling of [/FlateDecode none] - type images
Closes #1271
2024-04-07 01:44:08 -07:00
James R. Barlow
7a8cc21e31 Add support for sidecar output to io.BytesIO
Closes #1252
2024-04-07 01:38:55 -07:00
James R. Barlow
0674829d8f Remove tool.black config 2024-04-07 00:36:52 -07:00
James R. Barlow
315aa0474b Merge branch 'main' of github.com:ocrmypdf/OCRmyPDF 2024-04-07 00:34:51 -07:00
Ben Beasley
df3451e779 Update the typer[all] dependency to typer-slim[standard] (#1287)
In 0.12.1, Typer was significantly reorganized.

- `typer-slim` is the library (for `import typer`)
- `typer-slim[standard]` adds optional dependencies (currently `rich`
  and `shellingham`, basically equivalent to the old `typer[all]`)
- `typer-cli` is the `typer` command-line tool
- `typer` is now basically a metapackage that brings in *all of the
  above*, and it no longer has an `all` extra

Pip will warn about this and proceed,

```
WARNING: typer 0.12.1 does not provide the extra 'all'
```

but there are other tools that will fail hard when asked to resolve a
(now) nonexistent extra.

Since this project doesn’t need the `typer` command-line tool, it looks
like changing the dependency to `typer-slim[standard]` is the best way
forward.

See https://typer.tiangolo.com/release-notes/#0121 and
tiangolo/typer#785 for further discussion
and details.
2024-04-07 00:34:34 -07:00
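The failure mode described in the commit above, a requirement asking for an extra that the package no longer provides, can be sketched as follows. This is an illustration only: `split_requirement` and `missing_extras` are hypothetical helpers, they handle only the plain `name[extra,...]` form without version specifiers, and the sets of provided extras are written by hand from the commit message rather than read from package metadata.

```python
import re


def split_requirement(requirement: str) -> tuple[str, set[str]]:
    """Split 'name[extra1,extra2]' into (name, {extras}).

    Simplified on purpose: version specifiers and markers are not handled.
    """
    match = re.fullmatch(
        r"\s*([A-Za-z0-9._-]+)\s*(?:\[([^\]]*)\])?\s*", requirement
    )
    if match is None:
        raise ValueError(f"unsupported requirement string: {requirement!r}")
    name = match.group(1)
    extras = match.group(2) or ""
    return name, {item.strip() for item in extras.split(",") if item.strip()}


def missing_extras(requirement: str, provided: set[str]) -> set[str]:
    """Return the requested extras that the package does not provide."""
    _, requested = split_requirement(requirement)
    return requested - provided


# Hand-written from the commit message: typer 0.12.1 no longer has an
# 'all' extra, and typer-slim offers 'standard'.
old_problem = missing_extras("typer[all]", provided=set())
new_problem = missing_extras("typer-slim[standard]", provided={"standard"})
```

Here `old_problem` contains `all`, which pip only warns about but stricter resolvers reject, while `new_problem` is empty.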
akierig
3ba42802d1 added Macports install information (#1286) 2024-04-07 00:33:57 -07:00
James R. Barlow
d6342cb8c2 Add heif/heic input image support 2024-04-07 00:33:13 -07:00
James R. Barlow
065bddbc6c Reformat with ruff format 2024-04-07 00:25:32 -07:00
James R. Barlow
067f429dde Merge branch 'main' of github.com:ocrmypdf/OCRmyPDF 2024-03-26 15:34:00 -07:00
Daniel Lovegrove
6895c2d70f Fix Broken Documentation Links (#1275)
* Update URL for PDFMARK documentation

For reference, here is a link to the old PDF:
https://web.archive.org/web/20190806035303/https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/pdfmark_reference.pdf

It appears Adobe converted the PDF into a webpage-based document; the
wording seems to be almost identical between the PDF and the website.

* Fix cross-references to JBIG2 page

* Fix links for Fedora + Arch + HEAD revision install

Fedora 39 has been released, and the package tracker no longer includes
a release overview for Fedora 37, which is why it was removed here.
2024-03-22 14:38:52 -07:00
James R. Barlow
686481982a Fix naming of hOCR rendered files 2024-03-22 13:27:20 -07:00
James R. Barlow
a9e1d19b78 v16.1.2 release notes v16.1.2 2024-03-20 12:56:13 -07:00
James R. Barlow
f95aa63718 Merge branch 'main' of github.com:ocrmypdf/OCRmyPDF 2024-03-20 12:26:02 -07:00
James Barlow
855de287b2 Fix test suite failure with Ghostscript >= 10.3
Ghostscript is more picky about a specific case with SMask that cannot be converted to PDF/A

Details here
4dcfae36bb
2024-03-19 17:20:33 -07:00
NilsRo
feeb9f213f batch example: added archive, small corrections and optimizations (#1277)
* Added archive, small corrections

Added a function to archive originals and avoid calling ocrmypdf if they are already PDF/A.

* Added Copyright
2024-03-18 13:22:24 -07:00
Emiel Molenaar
e7eb8fa805 Update Dockerfile.alpine (#1268)
Use Alpine 3.19 as the base image to ensure we get Ghostscript 10.2.1, which eliminates serious regressions that corrupt PDFs with existing text.
2024-03-13 14:49:42 -07:00
James R. Barlow
8a747f005a pixels -> megapixels
Fixes #1265
2024-02-29 15:31:07 -08:00
James R. Barlow
16ab4a8b4e Fix error message about missing Python exec
Message is
unable to start container process: exec: "python": executable file not found in $PATH: unknown.

Closes #1260
2024-02-21 23:54:41 -08:00
James R. Barlow
8d30cff4ef Undo future annotations from watcher.py till Typer fixes its issue
Fixes #1258
2024-02-20 19:14:39 -08:00
James R. Barlow
59d5b0d1bd v16.1.1 release notes v16.1.1 2024-02-15 16:56:25 -08:00
James R. Barlow
9ec0745ab8 Try pypy3.10 2024-02-14 14:25:13 -08:00
James R. Barlow
3a3635f7f9 Python 3.10 cleanup, manual fixes 2024-02-14 12:48:17 -08:00
James R. Barlow
6a746a1cbb ruff linting/Python 3.10 cleanup 2024-02-14 12:41:51 -08:00
James R. Barlow
906c130f96 Update rust toml settings 2024-02-14 12:32:26 -08:00
James R. Barlow
4a78458821 v16.1.0 release notes v16.1.0 2024-02-12 01:46:21 -08:00
James R. Barlow
fddf3ce2f4 Clarify warnings filter 2024-02-12 01:43:47 -08:00
James R. Barlow
353b34e695 Merge branch 'feature/pageboxes' 2024-02-12 01:41:56 -08:00
James R. Barlow
7d63355c3c Use hocr renderer for LTR languages 2024-02-12 01:41:41 -08:00
James R. Barlow
42ff7fc842 Fix handling of pages that are restored to correct orientation with /Rotate
Appears inversion of CTM was incorrect, introduced in commit 9898904
2024-02-12 01:32:26 -08:00
James R. Barlow
26470fe16a Suppress reportlab deprecation warning 2024-02-12 01:17:08 -08:00
James R. Barlow
3b9d4b7f0a Attempt to deal with oddball mediaboxes 2024-02-11 15:34:54 -08:00
James R. Barlow
11f53fe9a9 First cut at propagating page boxes
This would fix the immediate issue, but does not address an offset mediabox.
2024-02-11 15:34:54 -08:00
James R. Barlow
123c0c766f Mention pipx, install --user --upgrade
Closes #1249
2024-02-08 09:42:00 -08:00
James R. Barlow
6a9be2142e Advise Homebrew on Linux for Ubuntu 20.04 2024-02-07 19:52:50 -08:00
James R. Barlow
0bc350f55e Merge branch 'main' of github.com:ocrmypdf/OCRmyPDF 2024-02-06 01:28:10 -08:00
dependabot[bot]
7a6edf62ba Bump codecov/codecov-action from 3 to 4 (#1247)
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 3 to 4.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/codecov/codecov-action/compare/v3...v4)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-02-05 03:55:13 -08:00