106 Commits

Author SHA1 Message Date
James R. Barlow
16c2604a07 Remove lossy JBIG2 support, retain lossless JBIG2 only
Lossy JBIG2 has been removed due to well-documented risks of character
substitution errors (e.g., 6/8 confusion). The --jbig2-lossy and
--jbig2-page-group-size arguments are now deprecated and ignored with
a warning.

Changes:
- Remove jbig2_lossy and jbig2_page_group_size from OCROptions
- Simplify optimize.py to use single-image JBIG2 encoding only
  (no symbol dictionaries/JBIG2Globals)
- Remove convert_group() from jbig2enc.py
- Deprecate CLI args with warnings for backward compatibility
- Update documentation to explain lossless-only JBIG2
2025-12-23 02:45:07 -08:00
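
A minimal sketch of the "deprecated and ignored with a warning" behaviour described in 16c2604a07, assuming a plain argparse parser. The class name DeprecatedIgnoredAction, the build_parser() helper, and the use of FutureWarning are illustrative assumptions, not OCRmyPDF's actual implementation; only the two option names come from the commit message.

```python
import argparse
import warnings


class DeprecatedIgnoredAction(argparse.Action):
    """Hypothetical helper: accept a removed option, warn, and discard it."""

    def __init__(self, option_strings, dest, **kwargs):
        # SUPPRESS keeps the removed option out of the parsed namespace.
        kwargs.setdefault("default", argparse.SUPPRESS)
        super().__init__(option_strings, dest, **kwargs)

    def __call__(self, parser, namespace, values, option_string=None):
        # Deliberately do not store anything on the namespace.
        warnings.warn(
            f"{option_string} is deprecated and ignored; "
            "JBIG2 encoding is lossless only",
            FutureWarning,
            stacklevel=2,
        )


def build_parser():
    """Build an example parser (hypothetical; not the real OCRmyPDF CLI)."""
    parser = argparse.ArgumentParser(prog="example")
    # Former flag: takes no value.
    parser.add_argument(
        "--jbig2-lossy",
        action=DeprecatedIgnoredAction,
        nargs=0,
        help=argparse.SUPPRESS,
    )
    # Former option: still consumes one integer so old command lines parse.
    parser.add_argument(
        "--jbig2-page-group-size",
        action=DeprecatedIgnoredAction,
        type=int,
        help=argparse.SUPPRESS,
    )
    return parser
```

With this sketch, an old command line such as `--jbig2-lossy --jbig2-page-group-size 10` still parses, emits one warning per deprecated option, and leaves neither option in the resulting namespace.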
James R. Barlow
1f493ba789 refactor: post-AI code cleanup 2025-12-21 12:21:47 -08:00
5HT2
650ca1c65b docs: Update screencast demo output to have corrected references to PDF/A compliance levels
See a7b0c0df6c for more information
2025-08-31 20:54:08 +01:00
James R. Barlow
4fc0c3a0d5 Add watcher test, such as it is 2025-08-13 01:04:58 -07:00
PunkPangolin
ee3da07710 Add appstream metainfo file + screenshot (#1462)
* Add io.ocrmypdf.ocrmypdf.metainfo.xml

* Create sample_screenshot.png

* Better screenshot

* Add screenshot to metainfo

* Move into /misc/flatpak

* Add screenshot URL

* Add icon and categories to metainfo

* Use installed icon instead of remote

* Add keywords to metainfo, change summary closer to Flathub Guidelines
2025-05-27 00:42:47 -07:00
James R. Barlow
6f16d0130a Clarify that ocrmypdf-compare is a testing tool 2025-04-15 00:03:14 -07:00
James R. Barlow
d84c47816c webservice: promote pages to primary option 2025-04-06 01:07:47 -07:00
James R. Barlow
6de6749062 webservice: fix download button downloads wrong file 2025-02-26 18:42:50 -08:00
James R. Barlow
b5bc1d209c Remove ttyd 2025-02-26 14:53:13 -08:00
James R. Barlow
f02353686d s/input/output 2025-01-04 12:18:07 -08:00
James R. Barlow
073a434ab3 Fix webservice interactions with Docker 2025-01-04 12:09:32 -08:00
James R. Barlow
55e7177dbe Present similar interface in webservice.py 2025-01-04 01:04:58 -08:00
James R. Barlow
36c82e0659 Add debugging helper scripts 2025-01-01 18:03:15 -08:00
James R. Barlow
dd6ed4c5f8 Switch to streamlit based web app 2025-01-01 17:26:22 -08:00
James R. Barlow
a1b8113d56 Add bisect script 2024-11-08 11:09:13 -08:00
James R. Barlow
d5ff7f7db9 batch: fix issues flagged by ruff 2024-05-21 01:52:57 -07:00
James R. Barlow
579cef3649 watcher: Ensure output files are .pdf 2024-05-21 01:51:30 -07:00
James R. Barlow
065bddbc6c Reformat with ruff format 2024-04-07 00:25:32 -07:00
NilsRo
feeb9f213f batch example: added archive, small corrections and optimizations (#1277)
* Added archive, small corrections

Added a function to archive originals and to avoid calling ocrmypdf if they are already PDF/A.

* Added Copyright
2024-03-18 13:22:24 -07:00
James R. Barlow
8d30cff4ef Undo future annotations from watcher.py till Typer fixes its issue
Fixes #1258
2024-02-20 19:14:39 -08:00
James R. Barlow
3a3635f7f9 Python 3.10 cleanup, manual fixes 2024-02-14 12:48:17 -08:00
James R. Barlow
f69267bb67 watcher: restore ability to read json from file or command line string 2023-11-07 18:05:29 -08:00
James R. Barlow
55566d9830 Fix watcher.py kwarg error 2023-11-05 13:58:24 -08:00
James R. Barlow
52d99732b1 Fix mistakes with watcher loglevel handling 2023-10-28 00:47:40 -07:00
James R. Barlow
c6be3ba076 watcher: Improve parameter validation 2023-10-20 20:11:00 -07:00
James R. Barlow
0565cb0b10 misc/watcher.py: use Typer and dotenv to improve ease of use 2023-10-20 19:56:39 -07:00
James R. Barlow
dc49906704 Improve wait_for_file_ready loop 2023-10-20 19:55:50 -07:00
James R. Barlow
0388c23ae7 Merge branch 'feature/jbig2thresh' into v15 2023-09-21 00:07:05 -07:00
James R. Barlow
be12f7a728 Make fish completion a bit smarter 2023-09-20 14:45:22 -07:00
James R. Barlow
e3c813fc67 Added support for changing color conversion strategy 2023-09-20 01:08:15 -07:00
James R. Barlow
330352aeed Update completions for jbig2 threshold 2023-09-17 14:47:46 -07:00
Srikar Sundaram
4bee7355e9 Change skip-ocr to skip-text (#1146) 2023-09-14 17:22:34 -07:00
James R. Barlow
a6ce35b13a Add argument to override digital signatures 2023-08-12 01:31:36 -07:00
James R. Barlow
e44a57aec0 Try a screencast/terminal demo 2023-06-20 00:48:42 -07:00
James R. Barlow
33b70be7d5 ruff: more fixes, mainly missing docstrings 2023-04-14 02:16:38 -07:00
James R. Barlow
4924b11b6b Additional ruff fixes 2023-04-14 01:25:16 -07:00
James R. Barlow
9b8d14d16e Accept most of ruff's delinting 2023-04-14 00:45:34 -07:00
comzine
2685f910b1 watcher: added setting RETRIES_LOADING_FILE to avoid giving up too early (#1063) 2023-01-25 17:36:54 -08:00
Doug Rinckes
d09f61d4fe log completion message (#1044)
This logs the "done" message if neither delete nor archive options are set.
2022-12-14 17:24:41 -08:00
James R. Barlow
7da4e6ca7f Address some linter warnings 2022-09-21 00:05:12 -07:00
James R. Barlow
4b9ea40a0c spdx: move identifiers to files that support them
If the apparent license changed, take this commit as correct.
2022-08-04 03:26:54 -07:00
James R. Barlow
80ed2117cc Change to SPDX license tracking 2022-07-28 01:10:07 -07:00
James R. Barlow
dc6f1a266a Modernize type annotations 2022-07-23 00:39:24 -07:00
Julius Bullinger
7cabbb125f watcher: Add an option to archive processed originals (#951)
* watcher: Add an option to archive processed originals

This adds a feature from existing OCRmyPDF watchdog Docker containers like meyay/ocrmypdf-batch and unze/ocrmypdf-watchdog. With this option, the input directory can be kept clean from already processed files, without losing the originals.

* docs: Improve watcher.py's Docker parameters documentation
2022-06-17 15:17:03 -07:00
James Barlow
776ada6713 Upgrade pre-commit and associated tools; various lints 2022-04-03 20:53:01 -07:00
James R. Barlow
0323738ada ocrmypdf.fish: fix indents
[ci skip]
2021-12-06 15:38:27 -08:00
FPille
aae5591f7e Update ocrmypdf.bash completion
Squashed commit of the following:

commit 974de2e8ccad7fd34694f2c3a7a17c64bb52cdab
Merge: a8d7f969 ee04aa72
Author: James R. Barlow <james@purplerock.ca>
Date:   Sat Dec 4 20:22:50 2021 -0800

    Merge branch 'update_bash-completion' of git://github.com/FPille/OCRmyPDF into FPille-update_bash-completion

commit ee04aa7225
Author: FPille <f.pille@gmail.com>
Date:   Thu Oct 14 11:09:23 2021 +0200

    update

commit 76f64537aa
Author: FPille <f.pille@gmail.com>
Date:   Thu Oct 14 11:04:10 2021 +0200

    updated and descriptions for arguments and choices added
    deprecated arguments removed
    bug fix: typo "_init_completion" instead of "_init_completions"

commit de9b93e852
Merge: c23374de 42713b77
Author: Frank <50119297+FPille@users.noreply.github.com>
Date:   Thu Oct 14 08:08:11 2021 +0200

    Merge branch 'jbarlow83:master' into master

commit c23374de81
Merge: 40b2ebcb c409fa58
Author: Frank <50119297+FPille@users.noreply.github.com>
Date:   Wed May 26 20:31:00 2021 +0200

    Merge branch 'jbarlow83:master' into master

commit 40b2ebcb37
Merge: 79c84eef 7e388f59
Author: Frank <50119297+FPille@users.noreply.github.com>
Date:   Sat Jun 1 11:09:07 2019 +0200

    Merge pull request #1 from jbarlow83/master

    update master
2021-12-06 15:38:26 -08:00
James R. Barlow
f91faf9795 Add new argument --tesseract-thresholding to control tesseract thresholding where available
Also add missing test for --tesseract-oem
2021-12-06 15:38:14 -08:00
James R. Barlow
59642a98b2 Disable --remove-background so we can remove leptonica 2021-11-12 23:56:52 -08:00
James R. Barlow
30440104ba Remove --threshold argument
Tesseract now includes better thresholding (binarization) in v5. Users that have
thresholding issues should try that first. If we find further problems
this can be brought back as a plugin.
2021-11-12 20:09:55 -08:00