755 Commits

Author SHA1 Message Date
James R. Barlow
db9f94de14 Ensure Noto font is installed where needed 2026-01-20 19:50:47 -08:00
James R. Barlow
37e7131a01 Drop support for Python 3.10, require Python 3.11+
Python 3.11 is now the minimum supported version. This aligns with
the codebase's use of StrEnum (introduced in 3.11) and removes
compatibility shims that were only needed for older versions.
2026-01-20 11:54:55 -08:00
James R. Barlow
4b16228a4a docs: minor adjustments 2026-01-20 10:41:55 -08:00
James R. Barlow
99f8106936 Update API documentation for OcrOptions-first calling convention
Document the new v17 API style where OcrOptions can be passed directly
to ocr(). Mark the positional argument style as legacy API for <v17
compatibility. Update examples to use modern syntax.
2026-01-20 10:30:33 -08:00
James R. Barlow
2f4280b66c Comprehensive documentation update in preparation for v17 2026-01-16 01:38:47 -08:00
James R. Barlow
6cf9d1c6ee Update release notes 2026-01-15 23:29:29 -08:00
James R. Barlow
6a7164a76c Update release notes with branch changes 2026-01-15 23:25:51 -08:00
James R. Barlow
bf76c8270c Rationalize optional dependencies vs dependency groups
Establish clear separation between user-facing optional dependencies
and developer-only dependency groups:

**Optional Dependencies (user features):**
- watcher: File watching service for batch processing
- webservice: Streamlit-based web UI
- Installable via: uv sync --extra <name> or pip install ocrmypdf[name]

**Dependency Groups (developer tools):**
- test: Testing infrastructure (merged from test + extended_test)
- docs: Documentation building tools
- streamlit-dev: Enhanced Streamlit development tools
- dev: General development tools (mypy, ipykernel)
- Installable via: uv sync --group <name> (uv only, NOT pip)

Breaking changes for developers:
- pip install -e .[test] no longer works → use uv sync --group test
- pip install -e .[docs] no longer works → use uv sync --group docs
- pip install -e .[extended_test] removed → merged into test group

No breaking changes for end users:
- pip install ocrmypdf[watcher] still works
- pip install ocrmypdf[webservice] still works

Updated:
- CI/CD workflows to use uv sync --group test
- Docker images to exclude test dependencies
- Documentation to recommend uv with pip as fallback
- pyproject.toml with clear comments explaining both systems
2026-01-13 00:34:55 -08:00
James R. Barlow
740f67091c Rename OCROptions to OcrOptions for consistency
Technically OCROptions is more Pythonic but we have several pre-existing classes named OcrWhatever. Go with the local flow.
2026-01-12 23:37:54 -08:00
James R. Barlow
36dea181e6 Update cookbook: Replace --tesseract-timeout 0 with --ocr-engine none
Update documentation examples to use the new --ocr-engine none option
instead of the deprecated --tesseract-timeout 0 idiom for disabling OCR.
2026-01-12 23:28:14 -08:00
James R. Barlow
0d6e0c4560 Merge branch 'main' into dev 2025-12-24 00:44:18 -08:00
James R. Barlow
94d7735862 docs: missing issue ref 2025-12-24 00:14:24 -08:00
James R. Barlow
c540967429 docs: Update release notes 2025-12-23 15:44:44 -08:00
James R. Barlow
6ada11ddae docs: Update release notes 2025-12-23 15:05:49 -08:00
James R. Barlow
01a3706281 docs: Add release notes for v16.13.0 2025-12-23 15:01:22 -08:00
James R. Barlow
16c2604a07 Remove lossy JBIG2 support, retain lossless JBIG2 only
Lossy JBIG2 has been removed due to well-documented risks of character
substitution errors (e.g., 6/8 confusion). The --jbig2-lossy and
--jbig2-page-group-size arguments are now deprecated and ignored with
a warning.

Changes:
- Remove jbig2_lossy and jbig2_page_group_size from OCROptions
- Simplify optimize.py to use single-image JBIG2 encoding only
  (no symbol dictionaries/JBIG2Globals)
- Remove convert_group() from jbig2enc.py
- Deprecate CLI args with warnings for backward compatibility
- Update documentation to explain lossless-only JBIG2
2025-12-23 02:45:07 -08:00
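One way to keep retired command-line arguments accepted but ignored, as commit 16c2604a07 describes, is a custom `argparse` action that only emits a warning. This is a minimal sketch under that assumption and is not the project's actual implementation. The `DeprecatedIgnoredAction` class, the `build_parser` helper and the `demo` program name are hypothetical; only the two option names come from the commit message.

```python
import argparse
import warnings


class DeprecatedIgnoredAction(argparse.Action):
    """Hypothetical action: accept a retired option, warn, store nothing."""

    def __call__(self, parser, namespace, values, option_string=None):
        warnings.warn(
            f"{option_string} is deprecated and ignored",
            DeprecationWarning,
            stacklevel=2,
        )


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="demo")
    # default=SUPPRESS keeps the retired names out of the parsed namespace;
    # help=SUPPRESS hides them from --help output.
    parser.add_argument(
        "--jbig2-lossy",
        action=DeprecatedIgnoredAction,
        nargs=0,
        default=argparse.SUPPRESS,
        help=argparse.SUPPRESS,
    )
    parser.add_argument(
        "--jbig2-page-group-size",
        action=DeprecatedIgnoredAction,
        type=int,
        default=argparse.SUPPRESS,
        help=argparse.SUPPRESS,
    )
    parser.add_argument("input_file")
    return parser


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    args = build_parser().parse_args(
        ["--jbig2-lossy", "--jbig2-page-group-size", "10", "in.pdf"]
    )

print(len(caught), vars(args))
```

Old scripts that still pass the retired flags keep running, and the warning tells users that the flags no longer have any effect.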
James R. Barlow
a4ee513cd4 refactor: clean up deprecated code and update plugin docs
- Remove outdated Phase comments from _options.py and cli.py
- Remove unused methods from PluginOptionRegistry:
  - get_extended_options_model() - replaced by __getattr__ in OCROptions
  - map_legacy_options() - unused
  - validate_plugin_options() - unused
- Update plugin documentation to document register_options hook
- Add documentation for nested plugin option access pattern

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-21 12:21:48 -08:00
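The `__getattr__`-based access mentioned in commit a4ee513cd4 can be pictured roughly as follows. This is a sketch of the general Python technique only. The `Options` and `DemoPluginOptions` classes, the `demo` namespace and every field name are hypothetical, and the real `OCROptions` class may work differently.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class DemoPluginOptions:
    """Hypothetical option model contributed by a plugin."""

    timeout: float = 180.0


class Options:
    """Hypothetical stand-in for a core options object."""

    def __init__(self, **plugin_options: Any) -> None:
        self.jobs = 1  # an ordinary core option
        self._plugin_options = dict(plugin_options)

    def __getattr__(self, name: str) -> Any:
        # Python calls __getattr__ only after normal attribute lookup fails,
        # so core options such as `jobs` never reach this method.
        plugin_options = self.__dict__.get("_plugin_options", {})
        try:
            return plugin_options[name]
        except KeyError:
            raise AttributeError(
                f"{type(self).__name__} has no option {name!r}"
            ) from None


options = Options(demo=DemoPluginOptions(timeout=30.0))
print(options.jobs)          # core option, found by normal lookup
print(options.demo.timeout)  # nested plugin option, resolved via __getattr__
```

Reading `_plugin_options` through `self.__dict__` avoids infinite recursion if `__getattr__` is invoked before `__init__` has run.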
James R. Barlow
530186b468 docs: update documentation for OCROptions plugin interface migration
Co-authored-by: aider (openrouter/anthropic/claude-sonnet-4) <aider@aider.chat>
2025-12-21 12:21:46 -08:00
rugk
8d715c4157 docs: fix and clarify podman usage instructions (#1601)
* docs: fix and clarify podman usage instructions

* the full reference `jbarlow83/ocrmypdf-alpine` as in the other commands may fix an issue if you do not have `ocrmypdf` already downloaded locally
* also clarified the command at the end for usage when SELinux is enabled

* docs: clarify difference between SELinux and rootless user mapping
2025-12-01 13:07:09 -08:00
James R. Barlow
54ce09496c v16.12.0 release notes 2025-11-11 13:48:06 -08:00
James R. Barlow
f181307e50 v16.11.1 release notes 2025-10-16 10:59:13 +02:00
James R. Barlow
9a2c0cf6ff v16.11.0 release notes 2025-09-12 00:08:11 -07:00
clach04
d07231a7aa Doc typo plugins.md (#1568) 2025-09-08 12:07:51 -07:00
Christoph Dyllick-Brenzinger
74305e8741 Update batch.md (#1552)
Add two missing available parameters for watcher.py (used with docker):
- OCR_LOGLEVEL
- OCR_JSON_SETTINGS
2025-08-05 14:11:55 -07:00
Máté Gyöngyösi
d6b069d3fa Unify --tesseract-timeout flag syntax (#1546)
As pointed out at https://github.com/tldr-pages/tldr/pull/17175#discussion_r2192340014.
2025-07-08 11:40:58 -07:00
James R. Barlow
194ca699a8 v16.10.4 release notes 2025-07-07 12:36:15 -07:00
James R. Barlow
7ea940a3a6 v16.10.3 release notes 2025-06-13 00:28:33 -07:00
James R. Barlow
9f6e5a48ad Deny use of pikepdf 9.8.0 due to GlyphlessFont error 2025-05-27 12:16:19 -07:00
James R. Barlow
b166e86216 jbig2 doc: mention pkg-config
Closes #1484
2025-05-26 13:04:05 -07:00
James R. Barlow
7c5bed41f1 v16.10.1 2025-04-21 01:15:29 -07:00
James R. Barlow
3304498bdc Fix some anchors and markdown quirks 2025-04-21 00:50:26 -07:00
James R. Barlow
e4a8f7a354 Remove redundant optimizer content 2025-04-17 15:10:59 -07:00
James R. Barlow
d1a45e4abc Convert remaining rst -> md 2025-04-17 15:03:21 -07:00
James R. Barlow
3b9367fc69 Continuing rst -> md 2025-04-17 02:27:59 -07:00
James R. Barlow
92a78f611e rst -> md migration in progress 2025-04-17 02:10:40 -07:00
Ikko Eltociear Ashimine
0f5ccb71ca docs: update installation.rst
instal -> install
2025-03-09 01:27:25 +09:00
James R. Barlow
7b2dd892e5 v16.10.0 release notes 2025-02-26 15:16:18 -08:00
James R. Barlow
2a55ceadd0 Merge branch 'pr/rugk/1489' 2025-02-26 14:59:06 -08:00
James R. Barlow
71991ad09b Remove podman 2025-02-26 14:58:43 -08:00
James R. Barlow
83b4469ef1 Word wrap 2025-02-26 14:57:18 -08:00
rugk
53270b8eb1 Doc: Update docker.rst to use
I prefer to write the name in full aka `jbarlow83/ocrmypdf-alpine` and I'd also suggest to document this because:
* if you use `docker tag` this AFAIK only tags the currently downloaded (=pulled) version of that image
* in case a new update comes out, the new one will not be pulled automatically and one would have to pull and tag the image locally, again
* This `docker tag` command is easily overlooked; if users just run `docker run ocrmypdf`, this may or may not work, depending on how it is resolved.
  Also, AFAIK if one could get Docker to register https://hub.docker.com/ocrmypdf then this would suddenly be used instead of your image (currently `podman pull docker.io/ocrmypdf` returns a 404 for me, though)
* It is more common to write at least the user namespace there and the project, to prevent such errors.

Also, by default [Docker has many shortcuts for this and e.g. assumes Docker-Hub is always being used](https://stackoverflow.com/questions/37861791/how-are-docker-image-names-parsed). Podman usually does not, which is why I personally prefer the full and explicit `docker.io/jbarlow83/ocrmypdf-alpine:latest`, e.g. for alpine. This makes clear not only which version is used, but also where it is pulled from (should one have configured different Docker registries).
2025-02-26 02:43:46 +01:00
rugk
3049a10757 doc: Update docker.rst to explain how to use with podman
I've fiddled/struggled with this by myself, by getting a permission error like this one:
```shell
OutputFileAccessError: Output file location (./output.pdf) is not a writable file.
``` 

I've loosely followed and found https://github.com/containers/podlet?tab=readme-ov-file#in-a-container and explained the required flags in a similar way, but adapted for this tool (it likely won't be used so much on system files).

I've tested it and it works fine for me. The same issue may be on Docker rootless, but I guess people will get that and I cannot test it here.
2025-02-26 02:30:25 +01:00
alex
acea9529ea Correct the installation instructions for Windows 2025-02-22 10:46:22 +01:00
James R. Barlow
e4274a956d v16.9.0 release notes 2025-02-07 00:53:08 -08:00
James R. Barlow
d1fc77e1b6 docs: add imgconverter 2025-01-04 12:39:55 -08:00
James R. Barlow
17eed0529a Update notes 2025-01-04 12:21:46 -08:00
James R. Barlow
bfbe571f12 docs: fix more rst formatting issues 2025-01-04 10:59:52 -08:00
James R. Barlow
b486df7e2d docs: auto update year 2025-01-04 01:04:29 -08:00
James R. Barlow
74a84b6ae9 Fix numerous documentation build problems 2025-01-03 12:23:42 -08:00
James R. Barlow
15df9c370c Update notes 2024-12-08 12:20:40 -08:00