OCRmyPDF

mirror of https://github.com/ocrmypdf/OCRmyPDF.git synced 2026-04-19 21:49:57 -04:00

Author	SHA1	Message	Date
James R. Barlow	1c89cacfef	Respect host-set PIL.Image.MAX_IMAGE_PIXELS in Python API The API previously clobbered PIL.Image.MAX_IMAGE_PIXELS unconditionally on every call, so host applications (e.g. Paperless-NGX) that configured the PIL limit before invoking ocrmypdf.ocr() saw their setting silently overwritten with the 250 MP default. Make max_image_mpixels default to None and only apply the override when the caller explicitly sets it. The CLI default of 250 MP is unchanged. Fixes #1665	2026-04-19 13:44:57 -07:00
James R. Barlow	89c76b5145	v17.4.1 release notes	2026-04-05 00:23:07 -07:00
James R. Barlow	6f2b8408c1	v17.4.0 release notes	2026-03-21 01:43:03 -07:00
James R. Barlow	0c15ff594c	v17.3.0 release notes	2026-02-20 23:52:48 -08:00
James R. Barlow	a899f0d59a	Split release_notes into parts for each major release	2026-02-20 18:19:31 -08:00
James R. Barlow	c85c8941d3	Fix pdftotext word spacing by emitting single BT block per line poppler/pdftotext does not carry Tz (horizontal scaling) across BT/ET boundaries, causing words to appear on separate lines. Replace per-word BT blocks (via fpdf2's cell/set_stretching API) with a single BT block per line using raw PDF operators. Each non-last word gets a trailing space with Tz calculated to span exactly to the next word's start position.	2026-02-11 00:42:10 -08:00
James R. Barlow	5c83dab8a7	Fix fpdf text mode in multi-page renderer; add v17.2.0 release notes The previous fix (`e62e73e4`) only corrected text_rendering_mode → text_mode in the single-page Fpdf2PdfRenderer, but the main OCR pipeline uses Fpdf2MultiPageRenderer which still had the old attribute name. Since fpdf2 has no text_rendering_mode property, setting it silently created a no-op attribute while text_mode stayed at FILL — so 3 Tr (invisible text) was never emitted. Fixes #1631, #1632	2026-02-10 14:12:49 -08:00
James R. Barlow	1684982cde	Further adjustments to install docs	2026-02-06 17:17:44 -08:00
James R. Barlow	4d97dfd218	Update installation docs for modern tooling - Prioritize uv over pip throughout, with uv as the recommended installer - Update repology badges: Debian 13, Ubuntu 24.04, Fedora 40/41 - Make Python 3.12 the default (3.11 still supported) - Promote Homebrew as full-featured option for macOS and Linux - Add dependency summary table aligned with maintainers.md - Document uharfbuzz and fonts-noto requirements - Remove outdated warnings and simplify 32-bit section	2026-02-05 15:04:12 -08:00
James R. Barlow	9d8aa5a0c3	v17.1.0 release notes	2026-01-30 16:15:50 -08:00
James R. Barlow	3abe8f71c7	v17.0.1 release notes	2026-01-30 00:15:13 -08:00
James R. Barlow	c5d3ef4b17	Tighten ruff rules and modernize style	2026-01-27 14:04:52 -08:00
James R. Barlow	db9f94de14	Ensure Noto font is installed where needed	2026-01-20 19:50:47 -08:00
James R. Barlow	37e7131a01	Drop support for Python 3.10, require Python 3.11+ Python 3.11 is now the minimum supported version. This aligns with the codebase's use of StrEnum (introduced in 3.11) and removes compatibility shims that were only needed for older versions.	2026-01-20 11:54:55 -08:00
James R. Barlow	4b16228a4a	docs: minor adjustments	2026-01-20 10:41:55 -08:00
James R. Barlow	99f8106936	Update API documentation for OcrOptions-first calling convention Document the new v17 API style where OcrOptions can be passed directly to ocr(). Mark the positional argument style as legacy API for <v17 compatibility. Update examples to use modern syntax.	2026-01-20 10:30:33 -08:00
James R. Barlow	2f4280b66c	Comprehensive documentation update in preparation for v17	2026-01-16 01:38:47 -08:00
James R. Barlow	6cf9d1c6ee	Update release notes	2026-01-15 23:29:29 -08:00
James R. Barlow	6a7164a76c	Update release notes with branch changes	2026-01-15 23:25:51 -08:00
James R. Barlow	bf76c8270c	Rationalize optional dependencies vs dependency groups Establish clear separation between user-facing optional dependencies and developer-only dependency groups: Optional Dependencies (user features): - watcher: File watching service for batch processing - webservice: Streamlit-based web UI - Installable via: uv sync --extra <name> or pip install ocrmypdf[name] Dependency Groups (developer tools): - test: Testing infrastructure (merged from test + extended_test) - docs: Documentation building tools - streamlit-dev: Enhanced Streamlit development tools - dev: General development tools (mypy, ipykernel) - Installable via: uv sync --group <name> (uv only, NOT pip) Breaking changes for developers: - pip install -e .[test] no longer works → use uv sync --group test - pip install -e .[docs] no longer works → use uv sync --group docs - pip install -e .[extended_test] removed → merged into test group No breaking changes for end users: - pip install ocrmypdf[watcher] still works - pip install ocrmypdf[webservice] still works Updated: - CI/CD workflows to use uv sync --group test - Docker images to exclude test dependencies - Documentation to recommend uv with pip as fallback - pyproject.toml with clear comments explaining both systems	2026-01-13 00:34:55 -08:00
James R. Barlow	740f67091c	Rename OCROptions to OcrOptions for consistency Technically OCROptions is more Pythonic but we have several pre-existing classes named OcrWhatever. Go with the local flow.	2026-01-12 23:37:54 -08:00
James R. Barlow	36dea181e6	Update cookbook: Replace --tesseract-timeout 0 with --ocr-engine none Update documentation examples to use the new --ocr-engine none option instead of the deprecated --tesseract-timeout 0 idiom for disabling OCR.	2026-01-12 23:28:14 -08:00
James R. Barlow	0d6e0c4560	Merge branch 'main' into dev	2025-12-24 00:44:18 -08:00
James R. Barlow	94d7735862	docs: missing issue ref	2025-12-24 00:14:24 -08:00
James R. Barlow	c540967429	docs: Update release notes	2025-12-23 15:44:44 -08:00
James R. Barlow	6ada11ddae	docs: Update release notes	2025-12-23 15:05:49 -08:00
James R. Barlow	01a3706281	docs: Add release notes for v16.13.0	2025-12-23 15:01:22 -08:00
James R. Barlow	16c2604a07	Remove lossy JBIG2 support, retain lossless JBIG2 only Lossy JBIG2 has been removed due to well-documented risks of character substitution errors (e.g., 6/8 confusion). The --jbig2-lossy and --jbig2-page-group-size arguments are now deprecated and ignored with a warning. Changes: - Remove jbig2_lossy and jbig2_page_group_size from OCROptions - Simplify optimize.py to use single-image JBIG2 encoding only (no symbol dictionaries/JBIG2Globals) - Remove convert_group() from jbig2enc.py - Deprecate CLI args with warnings for backward compatibility - Update documentation to explain lossless-only JBIG2	2025-12-23 02:45:07 -08:00
James R. Barlow	a4ee513cd4	refactor: clean up deprecated code and update plugin docs - Remove outdated Phase comments from _options.py and cli.py - Remove unused methods from PluginOptionRegistry: - get_extended_options_model() - replaced by __getattr__ in OCROptions - map_legacy_options() - unused - validate_plugin_options() - unused - Update plugin documentation to document register_options hook - Add documentation for nested plugin option access pattern 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-21 12:21:48 -08:00
James R. Barlow	530186b468	docs: update documentation for OCROptions plugin interface migration Co-authored-by: aider (openrouter/anthropic/claude-sonnet-4) <aider@aider.chat>	2025-12-21 12:21:46 -08:00
rugk	8d715c4157	docs: fix and clarify podman usage instructions (#1601) * docs: fix and clarify podman usage instructions * the full reference `jbarlow83/ocrmypdf-alpine` as in the other commands may fix an issue if you do not have `ocrmypdf` already downloaded locally * also clarified the command at the end for usage when SELinux is enabled * docs: clarify difference between SELinux and rootless user mapping	2025-12-01 13:07:09 -08:00
James R. Barlow	54ce09496c	v16.12.0 release notes	2025-11-11 13:48:06 -08:00
James R. Barlow	f181307e50	v16.11.1 release notes	2025-10-16 10:59:13 +02:00
James R. Barlow	9a2c0cf6ff	v16.11.0 release notes	2025-09-12 00:08:11 -07:00
clach04	d07231a7aa	Doc typo plugins.md (#1568)	2025-09-08 12:07:51 -07:00
Christoph Dyllick-Brenzinger	74305e8741	Update batch.md (#1552) Add two missing available parameters for watcher.py (used with docker): - OCR_LOGLEVEL - OCR_JSON_SETTINGS	2025-08-05 14:11:55 -07:00
Máté Gyöngyösi	d6b069d3fa	Unify `--tesseract-timeout` flag syntax (#1546) As pointed out at https://github.com/tldr-pages/tldr/pull/17175#discussion_r2192340014.	2025-07-08 11:40:58 -07:00
James R. Barlow	194ca699a8	v16.10.4 release notes	2025-07-07 12:36:15 -07:00
James R. Barlow	7ea940a3a6	v16.10.3 release notes	2025-06-13 00:28:33 -07:00
James R. Barlow	9f6e5a48ad	Deny use of pikepdf 9.8.0 due to GlyphlessFont error	2025-05-27 12:16:19 -07:00
James R. Barlow	b166e86216	jbig2 doc: mention pkg-config Closes #1484	2025-05-26 13:04:05 -07:00
James R. Barlow	7c5bed41f1	v16.10.1	2025-04-21 01:15:29 -07:00
James R. Barlow	3304498bdc	Fix some anchors and markdown quirks	2025-04-21 00:50:26 -07:00
James R. Barlow	e4a8f7a354	Remove redundant optimizer content	2025-04-17 15:10:59 -07:00
James R. Barlow	d1a45e4abc	Convert remaining rst -> md	2025-04-17 15:03:21 -07:00
James R. Barlow	3b9367fc69	Continuing rst -> md	2025-04-17 02:27:59 -07:00
James R. Barlow	92a78f611e	rst -> md migration in progress	2025-04-17 02:10:40 -07:00
Ikko Eltociear Ashimine	0f5ccb71ca	docs: update installation.rst instal -> install	2025-03-09 01:27:25 +09:00
James R. Barlow	7b2dd892e5	v16.10.0 release notes	2025-02-26 15:16:18 -08:00
James R. Barlow	2a55ceadd0	Merge branch 'pr/rugk/1489'	2025-02-26 14:59:06 -08:00

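Commit c85c8941d3 emits a single BT block per line. Each non-last word gets a trailing space, and its horizontal scaling (Tz) is chosen so the word plus space ends exactly where the next word starts. With character and word spacing (Tc, Tw) at 0, a string's horizontal advance is its natural width times Tz/100. That gives Tz = 100 × target width / natural width. The sketch below illustrates only that arithmetic and is not OCRmyPDF's renderer. It makes several assumptions: a hypothetical font resource `/F1` whose glyphs all share one advance width, ASCII-only text, and a hypothetical `Word` type holding each word's box.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge of the word's box, PDF user-space units
    width: float  # width of the word's box


def pdf_escape(s: str) -> str:
    # Escape characters that are special inside a PDF literal string.
    return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def line_operators(words: list[Word], y: float, font_size: float,
                   advance: float = 0.5) -> str:
    """Build one BT...ET block for a line of invisible (3 Tr) text.

    Assumes a hypothetical monospaced font /F1 whose glyph advance is
    `advance` times the font size, ASCII text, and Tc = Tw = 0.
    """
    if not words:
        return ""
    ops = ["BT", "3 Tr", f"/F1 {font_size:g} Tf",
           f"1 0 0 1 {words[0].x:g} {y:g} Tm"]  # position only the first word
    for i, word in enumerate(words):
        is_last = i == len(words) - 1
        text = word.text if is_last else word.text + " "
        # Non-last words stretch to the next word's start; the last fills its box.
        target = word.width if is_last else words[i + 1].x - word.x
        natural = len(text) * advance * font_size  # width at Tz = 100
        ops.append(f"{100.0 * target / natural:.2f} Tz")
        ops.append(f"({pdf_escape(text)}) Tj")
    ops.append("ET")
    return "\n".join(ops)


print(line_operators([Word("Hello", 72, 30), Word("world", 110, 32)], 700, 12))
```

Because each word advances the text position by exactly the gap to its neighbour, no per-word repositioning (and no extra BT/ET pair) is needed. A text extractor therefore sees one line of space-separated words.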
767 Commits