James R. Barlow
397fad249d
v16.3.0 release notes
v16.3.0
2024-05-19 00:50:59 -07:00
James R. Barlow
9a3c5a3f7c
Add progressbar for metadata_fixup
...
This might take time for big files. Pdf.open() is potentially expensive as well, but QPDF doesn't give us progress feedback for that.
Closes Show progress during postprocessing #1313
2024-05-19 00:46:50 -07:00
James R. Barlow
950c700274
Fix Ghostscript PDF/A progressbar not displaying
2024-05-19 00:44:21 -07:00
James R. Barlow
26432c38a9
Raise exception if rotate pages threshold adjusted without --rotate-pages
...
Fixes Make usage of --rotate-pages-threshold clearer #1309
2024-05-18 23:49:27 -07:00
James R. Barlow
28be50136c
hocr: If a line box's coords are invalid, log an error and don't render
...
Addresses [Bug]: Crash on multiple .pdf files #1312
Not actually a fix, but at least it will get us better diagnostics. Appears old Tesseract 4.x generates bad line boxes at times.
2024-05-18 23:32:18 -07:00
James R. Barlow
0c62f2de5d
Issue template: check for EOL OS
2024-05-17 19:51:15 -07:00
James R. Barlow
5caf654f22
Add new codecov token
2024-05-11 01:03:41 -07:00
James R. Barlow
205593445e
Change test to run on macos x64 and arm64
2024-05-11 00:13:08 -07:00
James R. Barlow
f25fb8c63a
Merge branch 'main' of github.com:ocrmypdf/OCRmyPDF
2024-05-08 00:39:27 -07:00
James R. Barlow
99c78650b6
Add better error message for PDFs with invalid CTMs
...
Closes #1303
2024-05-07 14:00:30 -07:00
Ahmed Abdou
08e89e2dbe
Adding language install docs for archlinux (#1296)
...
Adding language install docs for archlinux
2024-04-24 14:46:05 -07:00
James R. Barlow
0e013df161
v16.2.0 release notes
v16.2.0
2024-04-16 00:37:03 -07:00
James R. Barlow
9ba4e3ab46
Log unusual exceptions when trying to obtain a version
...
Fixes #1262
2024-04-07 14:39:08 -07:00
James R. Barlow
5fdcb7602b
Make downsampling of large images that Tesseract would otherwise error on the default behavior
...
Fixes #1281
2024-04-07 13:43:20 -07:00
James R. Barlow
b4db1b741f
optimize: fix handling of [/FlateDecode none] - type images
...
Closes #1271
2024-04-07 01:44:08 -07:00
James R. Barlow
7a8cc21e31
Add support for sidecar output to io.BytesIO
...
Closes #1252
2024-04-07 01:38:55 -07:00
James R. Barlow
0674829d8f
Remove tool.black config
2024-04-07 00:36:52 -07:00
James R. Barlow
315aa0474b
Merge branch 'main' of github.com:ocrmypdf/OCRmyPDF
2024-04-07 00:34:51 -07:00
Ben Beasley
df3451e779
Update the typer[all] dependency to typer-slim[standard] (#1287)
...
In 0.12.1, Typer was significantly reorganized.
- `typer-slim` is the library (for `import typer`)
- `typer-slim[standard]` adds optional dependencies (currently `rich`
and `shellingham`, basically equivalent to the old `typer[all]`)
- `typer-cli` is the `typer` command-line tool
- `typer` is now basically a metapackage that brings in *all of the
above*, and it no longer has an `all` extra
Pip will warn about this and proceed,
```
WARNING: typer 0.12.1 does not provide the extra 'all'
```
but there are other tools that will fail hard when asked to resolve a
(now) nonexistent extra.
Since this project doesn’t need the `typer` command-line tool, it looks
like changing the dependency to `typer-slim[standard]` is the best way
forward.
See https://typer.tiangolo.com/release-notes/#0121 and
tiangolo/typer#785 for further discussion
and details.
2024-04-07 00:34:34 -07:00
akierig
3ba42802d1
Added MacPorts install information (#1286)
2024-04-07 00:33:57 -07:00
James R. Barlow
d6342cb8c2
Add heif/heic input image support
2024-04-07 00:33:13 -07:00
James R. Barlow
065bddbc6c
Reformat with ruff format
2024-04-07 00:25:32 -07:00
James R. Barlow
067f429dde
Merge branch 'main' of github.com:ocrmypdf/OCRmyPDF
2024-03-26 15:34:00 -07:00
Daniel Lovegrove
6895c2d70f
Fix Broken Documentation Links (#1275)
...
* Update URL for PDFMARK documentation
For reference, here is a link to the old PDF:
https://web.archive.org/web/20190806035303/https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/pdfmark_reference.pdf
It appears Adobe converted the PDF into a webpage-based document; the
wording seems to be almost identical between the PDF and the website.
* Fix cross-references to JBIG2 page
* Fix links for Fedora + Arch + HEAD revision install
Fedora 39 has been released, and the package tracker no longer includes
a release overview for Fedora 37, which is why it was removed here.
2024-03-22 14:38:52 -07:00
James R. Barlow
686481982a
Fix naming of hOCR rendered files
2024-03-22 13:27:20 -07:00
James R. Barlow
a9e1d19b78
v16.1.2 release notes
v16.1.2
2024-03-20 12:56:13 -07:00
James R. Barlow
f95aa63718
Merge branch 'main' of github.com:ocrmypdf/OCRmyPDF
2024-03-20 12:26:02 -07:00
James Barlow
855de287b2
Fix test suite failure with Ghostscript >= 10.3
...
Ghostscript is more picky about a specific case with SMask that cannot be converted to PDF/A
Details here: 4dcfae36bb
2024-03-19 17:20:33 -07:00
NilsRo
feeb9f213f
batch example: added archive, small corrections and optimizations (#1277)
...
* Added archive, small corrections
Added a function to archive originals and avoid calling ocrmypdf if they are already PDF/A.
* Added Copyright
2024-03-18 13:22:24 -07:00
Emiel Molenaar
e7eb8fa805
Update Dockerfile.alpine (#1268)
...
Use Alpine 3.19 as the base image so that we get Ghostscript 10.2.1, which eliminates serious regressions that corrupt PDFs with existing text.
2024-03-13 14:49:42 -07:00
James R. Barlow
8a747f005a
pixels -> megapixels
...
Fixes #1265
2024-02-29 15:31:07 -08:00
James R. Barlow
16ab4a8b4e
Fix error message about missing Python exec
...
Message is
unable to start container process: exec: "python": executable file not found in $PATH: unknown.
Closes #1260
2024-02-21 23:54:41 -08:00
James R. Barlow
8d30cff4ef
Undo future annotations from watcher.py till Typer fixes its issue
...
Fixes #1258
2024-02-20 19:14:39 -08:00
James R. Barlow
59d5b0d1bd
v16.1.1 release notes
v16.1.1
2024-02-15 16:56:25 -08:00
James R. Barlow
9ec0745ab8
Try pypy3.10
2024-02-14 14:25:13 -08:00
James R. Barlow
3a3635f7f9
Python 3.10 cleanup, manual fixes
2024-02-14 12:48:17 -08:00
James R. Barlow
6a746a1cbb
ruff linting/Python 3.10 cleanup
2024-02-14 12:41:51 -08:00
James R. Barlow
906c130f96
Update rust toml settings
2024-02-14 12:32:26 -08:00
James R. Barlow
4a78458821
v16.1.0 release notes
v16.1.0
2024-02-12 01:46:21 -08:00
James R. Barlow
fddf3ce2f4
Clarify warnings filter
2024-02-12 01:43:47 -08:00
James R. Barlow
353b34e695
Merge branch 'feature/pageboxes'
2024-02-12 01:41:56 -08:00
James R. Barlow
7d63355c3c
Use hocr renderer for LTR languages
2024-02-12 01:41:41 -08:00
James R. Barlow
42ff7fc842
Fix handling of pages that are restored to correct orientation with /Rotate
...
Appears inversion of CTM was incorrect, introduced in commit 9898904
2024-02-12 01:32:26 -08:00
James R. Barlow
26470fe16a
Suppress reportlab deprecation warning
2024-02-12 01:17:08 -08:00
James R. Barlow
3b9d4b7f0a
Attempt to deal with oddball mediaboxes
2024-02-11 15:34:54 -08:00
James R. Barlow
11f53fe9a9
First cut at propagating page boxes
...
This would fix the immediate issue, but does not address an offset mediabox.
2024-02-11 15:34:54 -08:00
James R. Barlow
123c0c766f
Mention pipx, install --user --upgrade
...
Closes #1249
2024-02-08 09:42:00 -08:00
James R. Barlow
6a9be2142e
Advise Homebrew on Linux for Ubuntu 20.04
2024-02-07 19:52:50 -08:00
James R. Barlow
0bc350f55e
Merge branch 'main' of github.com:ocrmypdf/OCRmyPDF
2024-02-06 01:28:10 -08:00
dependabot[bot]
7a6edf62ba
Bump codecov/codecov-action from 3 to 4 (#1247)
...
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 3 to 4.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/codecov/codecov-action/compare/v3...v4)
---
updated-dependencies:
- dependency-name: codecov/codecov-action
dependency-type: direct:production
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-02-05 03:55:13 -08:00