Commit Graph

108 Commits

Author SHA1 Message Date
James R. Barlow
5acf21651f ruff lint and format 2026-01-13 01:50:57 -08:00
James R. Barlow
4c7086c609 Replace typer with cyclopts CLI library in misc scripts
Migrate watcher.py and pdf_text_diff.py from typer to cyclopts for
CLI argument parsing. Update pyproject.toml to reflect the dependency
change in the watcher optional feature.
2026-01-13 00:43:14 -08:00
James R. Barlow
16c2604a07 Remove lossy JBIG2 support, retain lossless JBIG2 only
Lossy JBIG2 has been removed due to well-documented risks of character
substitution errors (e.g., 6/8 confusion). The --jbig2-lossy and
--jbig2-page-group-size arguments are now deprecated and ignored with
a warning.

Changes:
- Remove jbig2_lossy and jbig2_page_group_size from OCROptions
- Simplify optimize.py to use single-image JBIG2 encoding only
  (no symbol dictionaries/JBIG2Globals)
- Remove convert_group() from jbig2enc.py
- Deprecate CLI args with warnings for backward compatibility
- Update documentation to explain lossless-only JBIG2
2025-12-23 02:45:07 -08:00
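
The "deprecated and ignored with a warning" behavior described in 16c2604a07 could be sketched roughly as follows, assuming an argparse-based command line. This is an illustration only, not OCRmyPDF's actual code: `DeprecatedIgnoredAction`, `build_parser`, and the `example` program name are hypothetical, and only the two option names come from the commit message.

```python
import argparse
import warnings


class DeprecatedIgnoredAction(argparse.Action):
    """Hypothetical action: accept a retired option, warn, store nothing."""

    def __call__(self, parser, namespace, values, option_string=None):
        # FutureWarning is shown to end users by default,
        # unlike DeprecationWarning.
        warnings.warn(
            f"{option_string} is deprecated and has no effect",
            FutureWarning,
            stacklevel=2,
        )


def build_parser():
    parser = argparse.ArgumentParser(prog="example")
    # Bare flag: nargs=0 means it consumes no value.
    parser.add_argument(
        "--jbig2-lossy",
        action=DeprecatedIgnoredAction,
        nargs=0,
        default=argparse.SUPPRESS,  # keep the name out of the namespace
        help=argparse.SUPPRESS,     # hide the option from --help
    )
    # Option with one integer value, parsed and then discarded.
    parser.add_argument(
        "--jbig2-page-group-size",
        action=DeprecatedIgnoredAction,
        type=int,
        default=argparse.SUPPRESS,
        help=argparse.SUPPRESS,
    )
    parser.add_argument("input_file")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Old command lines that still pass either option keep parsing, but the values never reach the rest of the program, which is the backward-compatibility goal the commit message states.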
James R. Barlow
1f493ba789 refactor: post-AI code cleanup 2025-12-21 12:21:47 -08:00
5HT2
650ca1c65b docs: Update screencast demo output to have corrected references to PDF/A compliance levels
See a7b0c0df6c for more information
2025-08-31 20:54:08 +01:00
James R. Barlow
4fc0c3a0d5 Add watcher test, such as it is 2025-08-13 01:04:58 -07:00
PunkPangolin
ee3da07710 Add appstream metainfo file + screenshot (#1462)
* Add io.ocrmypdf.ocrmypdf.metainfo.xml

* Create sample_screenshot.png

* Better screenshot

* Add screenshot to metainfo

* Move into /misc/flatpak

* Add screenshot URL

* Add icon and categories to metainfo

* Use installed icon instead of remote

* Add keywords to metainfo, change summary closer to Flathub Guidelines
2025-05-27 00:42:47 -07:00
James R. Barlow
6f16d0130a Clarify that ocrmypdf-compare is a testing tool 2025-04-15 00:03:14 -07:00
James R. Barlow
d84c47816c webservice: promote pages to primary option 2025-04-06 01:07:47 -07:00
James R. Barlow
6de6749062 webservice: fix download button downloads wrong file 2025-02-26 18:42:50 -08:00
James R. Barlow
b5bc1d209c Remove ttyd 2025-02-26 14:53:13 -08:00
James R. Barlow
f02353686d s/input/output 2025-01-04 12:18:07 -08:00
James R. Barlow
073a434ab3 Fix webservice interactions with Docker 2025-01-04 12:09:32 -08:00
James R. Barlow
55e7177dbe Present similar interface in webservice.py 2025-01-04 01:04:58 -08:00
James R. Barlow
36c82e0659 Add debugging helper scripts 2025-01-01 18:03:15 -08:00
James R. Barlow
dd6ed4c5f8 Switch to streamlit based web app 2025-01-01 17:26:22 -08:00
James R. Barlow
a1b8113d56 Add bisect script 2024-11-08 11:09:13 -08:00
James R. Barlow
d5ff7f7db9 batch: fix issues flagged by ruff 2024-05-21 01:52:57 -07:00
James R. Barlow
579cef3649 watcher: Ensure output files are .pdf 2024-05-21 01:51:30 -07:00
James R. Barlow
065bddbc6c Reformat with ruff format 2024-04-07 00:25:32 -07:00
NilsRo
feeb9f213f batch example: added archive, small corrections and optimizations (#1277)
* Added archive, small corrections

Added a function to archive originals and to avoid calling ocrmypdf if they are already PDF/A.

* Added Copyright
2024-03-18 13:22:24 -07:00
James R. Barlow
8d30cff4ef Undo future annotations from watcher.py till Typer fixes its issue
Fixes #1258
2024-02-20 19:14:39 -08:00
James R. Barlow
3a3635f7f9 Python 3.10 cleanup, manual fixes 2024-02-14 12:48:17 -08:00
James R. Barlow
f69267bb67 watcher: restore ability to read json from file or command line string 2023-11-07 18:05:29 -08:00
James R. Barlow
55566d9830 Fix watcher.py kwarg error 2023-11-05 13:58:24 -08:00
James R. Barlow
52d99732b1 Fix mistakes with watcher loglevel handling 2023-10-28 00:47:40 -07:00
James R. Barlow
c6be3ba076 watcher: Improve parameter validation 2023-10-20 20:11:00 -07:00
James R. Barlow
0565cb0b10 misc/watcher.py: use Typer and dotenv to improve ease of use 2023-10-20 19:56:39 -07:00
James R. Barlow
dc49906704 Improve wait_for_file_ready loop 2023-10-20 19:55:50 -07:00
James R. Barlow
0388c23ae7 Merge branch 'feature/jbig2thresh' into v15 2023-09-21 00:07:05 -07:00
James R. Barlow
be12f7a728 Make fish completion a bit smarter 2023-09-20 14:45:22 -07:00
James R. Barlow
e3c813fc67 Added support for changing color conversion strategy 2023-09-20 01:08:15 -07:00
James R. Barlow
330352aeed Update completions for jbig2 threshold 2023-09-17 14:47:46 -07:00
Srikar Sundaram
4bee7355e9 Change skip-ocr to skip-text (#1146) 2023-09-14 17:22:34 -07:00
James R. Barlow
a6ce35b13a Add argument to override digital signatures 2023-08-12 01:31:36 -07:00
James R. Barlow
e44a57aec0 Try a screencast/terminal demo 2023-06-20 00:48:42 -07:00
James R. Barlow
33b70be7d5 ruff: more fixes, mainly missing docstrings 2023-04-14 02:16:38 -07:00
James R. Barlow
4924b11b6b Additional ruff fixes 2023-04-14 01:25:16 -07:00
James R. Barlow
9b8d14d16e Accept most of ruff's delinting 2023-04-14 00:45:34 -07:00
comzine
2685f910b1 watcher: added setting RETRIES_LOADING_FILE to avoid giving up too early (#1063) 2023-01-25 17:36:54 -08:00
Doug Rinckes
d09f61d4fe log completion message (#1044)
This logs the "done" message if neither delete nor archive options are set.
2022-12-14 17:24:41 -08:00
James R. Barlow
7da4e6ca7f Address some linter warnings 2022-09-21 00:05:12 -07:00
James R. Barlow
4b9ea40a0c spdx: move identifiers to files that support them
If the apparent license changed, take this commit as correct.
2022-08-04 03:26:54 -07:00
James R. Barlow
80ed2117cc Change to SPDX license tracking 2022-07-28 01:10:07 -07:00
James R. Barlow
dc6f1a266a Modernize type annotations 2022-07-23 00:39:24 -07:00
Julius Bullinger
7cabbb125f watcher: Add an option to archive processed originals (#951)
* watcher: Add an option to archive processed originals

This adds a feature from existing OCRmyPDF watchdog Docker containers like meyay/ocrmypdf-batch and unze/ocrmypdf-watchdog. With this option, the input directory can be kept clean from already processed files, without losing the originals.

* docs: Improve watcher.py's Docker parameters documentation
2022-06-17 15:17:03 -07:00
James Barlow
776ada6713 Upgrade pre-commit and associated tools; various lints 2022-04-03 20:53:01 -07:00
James R. Barlow
0323738ada ocrmypdf.fish: fix indents
[ci skip]
2021-12-06 15:38:27 -08:00
FPille
aae5591f7e Update ocrmypdf.bash completion
Squashed commit of the following:

commit 974de2e8ccad7fd34694f2c3a7a17c64bb52cdab
Merge: a8d7f969 ee04aa72
Author: James R. Barlow <james@purplerock.ca>
Date:   Sat Dec 4 20:22:50 2021 -0800

    Merge branch 'update_bash-completion' of git://github.com/FPille/OCRmyPDF into FPille-update_bash-completion

commit ee04aa7225
Author: FPille <f.pille@gmail.com>
Date:   Thu Oct 14 11:09:23 2021 +0200

    update

commit 76f64537aa
Author: FPille <f.pille@gmail.com>
Date:   Thu Oct 14 11:04:10 2021 +0200

    updated and descriptions for arguments and choices added
    deprecated arguments removed
    bug fix: typo "_init_completion" instead of "_init_completions"

commit de9b93e852
Merge: c23374de 42713b77
Author: Frank <50119297+FPille@users.noreply.github.com>
Date:   Thu Oct 14 08:08:11 2021 +0200

    Merge branch 'jbarlow83:master' into master

commit c23374de81
Merge: 40b2ebcb c409fa58
Author: Frank <50119297+FPille@users.noreply.github.com>
Date:   Wed May 26 20:31:00 2021 +0200

    Merge branch 'jbarlow83:master' into master

commit 40b2ebcb37
Merge: 79c84eef 7e388f59
Author: Frank <50119297+FPille@users.noreply.github.com>
Date:   Sat Jun 1 11:09:07 2019 +0200

    Merge pull request #1 from jbarlow83/master

    update master
2021-12-06 15:38:26 -08:00
James R. Barlow
f91faf9795 Add new argument --tesseract-thresholding to control tesseract thresholding where available
Also add missing test for --tesseract-oem
2021-12-06 15:38:14 -08:00