Python 3.11 is now the minimum supported version. This aligns with
the codebase's use of StrEnum (introduced in 3.11) and removes
compatibility shims that were only needed for older versions.
Document the new v17 API style, in which OcrOptions can be passed directly
to ocr(). Mark the positional-argument style as the legacy API, kept for
pre-v17 compatibility. Update examples to use the modern syntax.
Establish clear separation between user-facing optional dependencies
and developer-only dependency groups:
**Optional Dependencies (user features):**
- watcher: File watching service for batch processing
- webservice: Streamlit-based web UI
- Installable via: uv sync --extra <name> or pip install ocrmypdf[<name>]
**Dependency Groups (developer tools):**
- test: Testing infrastructure (merged from test + extended_test)
- docs: Documentation building tools
- streamlit-dev: Enhanced Streamlit development tools
- dev: General development tools (mypy, ipykernel)
- Installable via: uv sync --group <name> (uv only, NOT pip)
Breaking changes for developers:
- pip install -e .[test] no longer works → use uv sync --group test
- pip install -e .[docs] no longer works → use uv sync --group docs
- pip install -e .[extended_test] removed → merged into test group
No breaking changes for end users:
- pip install ocrmypdf[watcher] still works
- pip install ocrmypdf[webservice] still works
Updated:
- CI/CD workflows to use uv sync --group test
- Docker images to exclude test dependencies
- Documentation to recommend uv with pip as fallback
- pyproject.toml with clear comments explaining both systems
Lossy JBIG2 has been removed due to well-documented risks of character
substitution errors (e.g., 6/8 confusion). The --jbig2-lossy and
--jbig2-page-group-size arguments are now deprecated and ignored with
a warning.
Changes:
- Remove jbig2_lossy and jbig2_page_group_size from OCROptions
- Simplify optimize.py to use single-image JBIG2 encoding only
  (no symbol dictionaries/JBIG2Globals)
- Remove convert_group() from jbig2enc.py
- Deprecate CLI args with warnings for backward compatibility
- Update documentation to explain lossless-only JBIG2
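The "deprecated and ignored with a warning" behaviour can be sketched with a custom argparse action. This is an illustrative sketch only, not the project's actual implementation: the `DeprecatedIgnored` class, `build_parser()`, the `--optimize` stand-in option, and the choice of `FutureWarning` are all assumptions.

```python
import argparse
import warnings


class DeprecatedIgnored(argparse.Action):
    """Hypothetical action: accept a removed option, warn, store nothing."""

    def __call__(self, parser, namespace, values, option_string=None):
        warnings.warn(
            f"{option_string} is deprecated and ignored; JBIG2 is lossless only",
            FutureWarning,
        )


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="example")
    # nargs=0 makes the option a flag that consumes no value.
    parser.add_argument("--jbig2-lossy", nargs=0, action=DeprecatedIgnored,
                        default=argparse.SUPPRESS, help=argparse.SUPPRESS)
    # Still consumes its integer value so old command lines keep parsing.
    parser.add_argument("--jbig2-page-group-size", type=int,
                        action=DeprecatedIgnored,
                        default=argparse.SUPPRESS, help=argparse.SUPPRESS)
    parser.add_argument("--optimize", type=int, default=1)  # stand-in option
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--jbig2-lossy", "--jbig2-page-group-size", "10", "--optimize", "2"]
    )
    print(vars(args))
```

Using `default=argparse.SUPPRESS` keeps the removed options out of the parsed namespace entirely, so downstream code cannot accidentally depend on them, while existing scripts that still pass the flags continue to run.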
* docs: fix and clarify podman usage instructions
* use the full reference `jbarlow83/ocrmypdf-alpine`, as in the other commands; this may fix an issue if `ocrmypdf` has not already been downloaded locally
* also clarify the command at the end for usage when SELinux is enabled
* docs: clarify difference between SELinux and rootless user mapping
I prefer to write the name in full, i.e. `jbarlow83/ocrmypdf-alpine`, and I'd also suggest documenting it that way because:
* AFAIK, `docker tag` only tags the currently downloaded (i.e. pulled) version of that image
* when a new update comes out, the new version is not pulled automatically, so one has to pull and tag the image locally again
* This `docker tag` command is easily overlooked. If users just run `docker run ocrmypdf`, it may or may not work, depending on how the name is resolved.
Also, AFAIK, if someone got Docker to register https://hub.docker.com/ocrmypdf, that image would suddenly be used instead of yours (currently `podman pull docker.io/ocrmypdf` returns a 404 for me, though).
* It is more common to write at least the user namespace and the project there, to prevent such errors.
Also, by default, [Docker has many shortcuts for this and e.g. assumes Docker-Hub is always being used](https://stackoverflow.com/questions/37861791/how-are-docker-image-names-parsed). Podman usually does not, which is why I personally prefer the full, unambiguous `docker.io/jbarlow83/ocrmypdf-alpine:latest`, e.g. for alpine. This makes clear not only which version is used, but also where it is pulled from (should one have configured different Docker registries).
I struggled with this myself and ran into a permission error like this one:
```shell
OutputFileAccessError: Output file location (./output.pdf) is not a writable file.
```
I found and loosely followed https://github.com/containers/podlet?tab=readme-ov-file#in-a-container and explained the required flags in a similar way, but adapted for this tool (it likely won't be used much on system files).
I've tested it and it works fine for me. The same issue may occur with rootless Docker, but I guess people will recognize that, and I cannot test it here.