Commit Graph

  • 4d97dfd218 Update installation docs for modern tooling [main] James R. Barlow 2026-02-05 15:04:12 -08:00
  • a35fcc9c43 Handle Ghostscript rasterization with DPI below 10 James R. Barlow 2026-01-31 13:01:04 -08:00
  • 3dd4cde7ce Tighten plugin manager return types to non-optional James R. Barlow 2026-01-31 12:12:07 -08:00
  • 92beb474a5 Normalize unpaper_args to list at construction time James R. Barlow 2026-01-31 12:05:37 -08:00
  • 9dcd882c83 Use uv to install docs with dependency groups James R. Barlow 2026-01-31 00:05:27 -08:00
  • 9d8aa5a0c3 v17.1.0 release notes [v17.1.0] James R. Barlow 2026-01-30 16:15:50 -08:00
  • e036a902ae Add --tagged-pdf-mode option to control Tagged PDF handling James R. Barlow 2026-01-30 14:01:23 -08:00
  • 0a980fb11b Add Encoding.flate_jpeg to recognize deflated JPEG images James R. Barlow 2026-01-30 12:53:59 -08:00
  • 3abe8f71c7 v17.0.1 release notes [v17.0.1] James R. Barlow 2026-01-30 00:15:13 -08:00
  • 64f45b7fdb Fix pypdfium type checking James R. Barlow 2026-01-30 00:14:02 -08:00
  • 7e939ad44d Fix pypdfium rasterizer to respect raster_device colorspace James R. Barlow 2026-01-30 00:04:02 -08:00
  • 297fb786a0 Update uv.lock (for protobuf) [v17.0.0] James R. Barlow 2026-01-29 18:33:00 -08:00
  • ad30dd94f7 Merge branch 'release/v17' James R. Barlow 2026-01-29 18:31:11 -08:00
  • e77f79ac6f Merge branch 'main' of github.com:ocrmypdf/OCRmyPDF James R. Barlow 2026-01-29 18:30:54 -08:00
  • c84fc56e45 Update CLI completions to match current options James R. Barlow 2026-01-29 12:41:56 -08:00
  • 0a0756b33e Tidy long lines and unnested with blocks [release/v17] James R. Barlow 2026-01-27 15:28:27 -08:00
  • c5d3ef4b17 Tighten ruff rules and modernize style James R. Barlow 2026-01-27 14:04:52 -08:00
  • 6b37583674 Refactor: move ocr_element to a better location James R. Barlow 2026-01-27 13:48:38 -08:00
  • de5f2b80f0 Further patching-out of fonts [v17.0.0b1] James R. Barlow 2026-01-21 11:43:54 -08:00
  • d951b4f0f7 Improve font fallback checking James R. Barlow 2026-01-21 10:38:07 -08:00
  • b386d39b3b tests: fix test_page_boxes when verapdf unavailable James R. Barlow 2026-01-21 00:22:26 -08:00
  • ec595a395b tests: little fixes James R. Barlow 2026-01-20 23:23:43 -08:00
  • bd29269c00 Various test fixes, mainly Windows issues James R. Barlow 2026-01-20 22:28:06 -08:00
  • 6fb7c5d95f Additional build fixes James R. Barlow 2026-01-20 21:49:40 -08:00
  • d57552c4f8 test: For Windows, ensure outputs are UTF-8 James R. Barlow 2026-01-20 21:33:31 -08:00
  • f017c982cf watcher: use modern API James R. Barlow 2026-01-20 21:25:12 -08:00
  • 7ac51ac1a7 Fix type alias for Queue causing runtime TypeError James R. Barlow 2026-01-20 20:38:33 -08:00
  • db9f94de14 Ensure Noto font is installed where needed James R. Barlow 2026-01-20 19:50:47 -08:00
  • 37e7131a01 Drop support for Python 3.10, require Python 3.11+ James R. Barlow 2026-01-20 11:54:55 -08:00
  • bc745d4d81 Replace magic Ghostscript raster device strings with StrEnum James R. Barlow 2025-11-11 14:03:37 -08:00
  • c818ad5e75 Drop deprecated NeverRaise exception James R. Barlow 2025-11-11 14:02:45 -08:00
  • 4b16228a4a docs: minor adjustments James R. Barlow 2026-01-20 10:41:55 -08:00
  • d40fca2590 Add verapdf to build for macOS James R. Barlow 2026-01-20 10:41:43 -08:00
  • 99f8106936 Update API documentation for OcrOptions-first calling convention James R. Barlow 2026-01-20 10:30:33 -08:00
  • ef88ba3f95 Add OcrOptions as first-class argument to ocr() function James R. Barlow 2026-01-20 10:20:52 -08:00
  • 2f4280b66c Comprehensive documentation update in preparation for v17 James R. Barlow 2026-01-16 01:38:47 -08:00
  • 6cf9d1c6ee Update release notes James R. Barlow 2026-01-15 23:29:29 -08:00
  • 6a7164a76c Update release notes with branch changes James R. Barlow 2026-01-15 23:25:51 -08:00
  • 3f328785f0 Fix pypdfium rasterizer to match Ghostscript dimensions James R. Barlow 2026-01-14 14:37:24 -08:00
  • 5acf21651f ruff lint and format James R. Barlow 2026-01-13 01:50:57 -08:00
  • 7bfe3ecd5b Fix double-compression of already-deflated JPEGs James R. Barlow 2026-01-13 01:41:59 -08:00
  • 5371cc5e39 Update test to match new error message James R. Barlow 2026-01-13 01:33:10 -08:00
  • 4c7086c609 Replace typer with cyclopts CLI library in misc scripts James R. Barlow 2026-01-13 00:43:14 -08:00
  • bf76c8270c Rationalize optional dependencies vs dependency groups James R. Barlow 2026-01-13 00:34:55 -08:00
  • 740f67091c Rename OCROptions to OcrOptions for consistency James R. Barlow 2026-01-12 23:37:54 -08:00
  • 36dea181e6 Update cookbook: Replace --tesseract-timeout 0 with --ocr-engine none James R. Barlow 2026-01-12 23:28:14 -08:00
  • c69f293322 Add --mode/-m CLI argument with ProcessingMode enum James R. Barlow 2026-01-12 15:23:08 -08:00
  • e9fe061c30 Format fix James R. Barlow 2026-01-12 10:25:24 -08:00
  • c9ea07e954 Reduce chattiness of fonttools James R. Barlow 2026-01-12 10:16:58 -08:00
  • 0c3745a1a4 Add OCR engine selection framework and null OCR engine James R. Barlow 2026-01-10 02:23:52 -08:00
  • 664c3e2a8e Update test cache for slow rotation tests James R. Barlow 2026-01-10 16:30:25 -08:00
  • 315d0df0e9 Fix incorrect rotation direction in pypdfium rasterizer James R. Barlow 2026-01-10 16:29:49 -08:00
  • 3c94ada857 Fix tesseract_cache plugin to properly handle cache misses James R. Barlow 2026-01-09 02:10:29 -08:00
  • fcbdbac602 Update test_page_boxes MediaBox expectations for speculative PDF/A James R. Barlow 2026-01-09 01:25:31 -08:00
  • 122450c19e Fix Ghostscript tests after default output type changed to 'auto' James R. Barlow 2026-01-09 01:02:25 -08:00
  • 0c4ee5af4e Add 'auto' output type for best-effort PDF/A without Ghostscript James R. Barlow 2026-01-09 00:56:00 -08:00
  • bdc50e9470 Add explicit word spacing for pdfminer.six compatibility James R. Barlow 2026-01-08 16:32:14 -08:00
  • 4cb488d0fc Skip speculative PDF/A when --pdfa-image-compression is set James R. Barlow 2026-01-08 15:12:35 -08:00
  • bb5238e524 Update tests to use new OcrmypdfPluginManager interface James R. Barlow 2026-01-08 13:09:19 -08:00
  • 900a60fd10 Add verapdf integration for speculative PDF/A conversion James R. Barlow 2026-01-08 10:58:01 -08:00
  • f5617ce44e Refactor OcrmypdfPluginManager to use composition over inheritance James R. Barlow 2026-01-07 17:23:13 -08:00
  • 0e946a7498 Clarify message about number of workers James R. Barlow 2026-01-07 16:41:18 -08:00
  • b2b6a7c4b1 Pass OMP_THREAD_LIMIT to Tesseract subprocesses instead of modifying parent env James R. Barlow 2026-01-06 18:43:29 -08:00
  • 75c664793e Don't share claude James R. Barlow 2026-01-06 13:46:40 -08:00
  • bbd263ff48 Add tests for fpdf2 renderer and font infrastructure James R. Barlow 2026-01-06 13:46:11 -08:00
  • 7a4b98974c Integrate fpdf2 renderer and remove legacy hOCR renderer James R. Barlow 2026-01-06 13:45:44 -08:00
  • d72a494979 Add fpdf2-based PDF text layer renderer James R. Barlow 2026-01-06 13:45:14 -08:00
  • 64726f97b3 Add font infrastructure and glyphless font James R. Barlow 2026-01-06 13:44:54 -08:00
  • 83a43408c2 Refactor tesseract thresholding to use enum type James R. Barlow 2025-12-27 13:32:56 -08:00
  • 2cb0973540 Improve Ghostscript API/CLI definitions James R. Barlow 2025-12-27 01:40:12 -08:00
  • 8930efe787 Update README with Fedora installation instructions (#1610) SuperCowProducts 2025-12-27 10:15:45 +01:00
  • 0d6e0c4560 Merge branch 'main' into dev James R. Barlow 2025-12-24 00:44:18 -08:00
  • 94d7735862 docs: missing issue ref James R. Barlow 2025-12-24 00:14:24 -08:00
  • c540967429 docs: Update release notes [v16.13.0] James R. Barlow 2025-12-23 15:44:44 -08:00
  • 195344d307 Reinstate "Work around Ghostscript 10.6.0 JPEG encoding issue by forcing optimization."" James R. Barlow 2025-12-23 15:41:27 -08:00
  • de63d6eac9 Merge remote-tracking branches 'origin/dependabot/github_actions/actions/download-artifact-7', 'origin/dependabot/github_actions/actions/upload-artifact-6', 'origin/dependabot/github_actions/sigstore/gh-action-sigstore-python-3.2.0' and 'origin/dependabot/github_actions/actions/checkout-6' James R. Barlow 2025-12-23 15:06:50 -08:00
  • 6ada11ddae docs: Update release notes James R. Barlow 2025-12-23 15:05:49 -08:00
  • fc30cb8903 Revert "Work around Ghostscript 10.6.0 JPEG encoding issue by forcing optimization." James R. Barlow 2025-12-23 15:03:51 -08:00
  • 01a3706281 docs: Add release notes for v16.13.0 James R. Barlow 2025-12-23 15:01:22 -08:00
  • e613db6a82 Fix Ghostscript 10.6 JPEG corruption by repairing truncated images James R. Barlow 2025-12-23 14:56:24 -08:00
  • 742a4bac17 Make rotation test more robust James R. Barlow 2025-12-21 14:42:14 -08:00
  • 4c1ef0b471 Also process art and bleed boxes James R. Barlow 2025-02-12 12:49:26 -08:00
  • eace567f7b Test and fix page box issues James R. Barlow 2025-02-11 00:40:01 -08:00
  • e9bfce34f1 Fix ruff linting issues James R. Barlow 2025-12-23 03:07:48 -08:00
  • 16c2604a07 Remove lossy JBIG2 support, retain lossless JBIG2 only James R. Barlow 2025-12-23 02:45:07 -08:00
  • 9ebba91466 Use plugin namespace access pattern throughout codebase James R. Barlow 2025-12-23 02:02:21 -08:00
  • aec995aced Require plugin model registration for namespace access in OCROptions James R. Barlow 2025-12-22 15:09:55 -08:00
  • be425e7405 Refactor pdfinfo: split info.py into focused modules James R. Barlow 2025-12-22 01:27:23 -08:00
  • b4f9673364 Add unit tests for HocrParser, PdfTextRenderer, and OcrElement James R. Barlow 2025-12-21 17:05:49 -08:00
  • 9ea804aff5 Refactor hocrtransform: separate parsing from rendering James R. Barlow 2025-12-21 16:17:22 -08:00
  • e162361d28 Make rotation test more robust James R. Barlow 2025-12-21 14:42:14 -08:00
  • 22d00837e3 WIP box tests James R. Barlow 2025-02-26 14:39:32 -08:00
  • 0faba42d36 test: Don't save local files James R. Barlow 2025-02-12 14:26:18 -08:00
  • 57e2600566 Also process art and bleed boxes James R. Barlow 2025-02-12 12:49:26 -08:00
  • 41758766a1 Test and fix page box issues James R. Barlow 2025-02-11 00:40:01 -08:00
  • 3e46b039ed feat: add use_cropbox parameter to align rasterizer APIs James R. Barlow 2025-12-21 11:29:16 -08:00
  • ae783b4ae6 fix: add thread safety lock to pypdfium plugin James R. Barlow 2025-12-21 10:42:41 -08:00
  • b9f488d65c test: add comprehensive tests for --rasterizer option James R. Barlow 2025-12-21 01:23:04 -08:00
  • ed813cec67 feat: add --rasterizer CLI option to select PDF rasterization backend James R. Barlow 2025-12-21 00:23:00 -08:00
  • 938ce8e285 fix: make pypdfium plugin optional with Ghostscript fallback James R. Barlow 2025-12-20 17:03:35 -08:00