Commit Graph

  • 9898904be7 Fix pikepdf PdfMatrix deprecation warning; v15.4.3 release notes v15.4.3 James R. Barlow 2023-11-15 20:27:16 -08:00
  • 27d5229842 Make logger names unique v15.4.2 James R. Barlow 2023-11-09 23:03:39 -08:00
  • 4a9a575ef0 ghostscript: better comments James R. Barlow 2023-11-09 22:39:49 -08:00
  • 52fd9a630d v15.4.2 release notes James R. Barlow 2023-11-09 22:35:51 -08:00
  • a596ccf844 Raise exception if resulting PDF might appear blank in a known in some PDF viewers James R. Barlow 2023-11-09 22:33:22 -08:00
  • e7fa97731f ghostscript duplicate filter: filter within a window of previous messages James R. Barlow 2023-11-09 22:32:39 -08:00
  • 290aa28108 Fix error on attempt to write to debug log after removing debug log handler James R. Barlow 2023-11-09 16:02:41 -08:00
  • a95640ed9e v15.4.1 release notes v15.4.1 James R. Barlow 2023-11-07 23:55:14 -08:00
  • f69267bb67 watcher: restore ability to read json from file or command line string James R. Barlow 2023-11-07 13:32:50 -08:00
  • e36d5a309f Make grafting a little bit more configurable James R. Barlow 2023-11-05 14:01:01 -08:00
  • 55566d9830 Fix watcher.py kwarg error James R. Barlow 2023-11-05 13:58:24 -08:00
  • f02ea20678 docs: plugin documentation missing key special members James R. Barlow 2023-11-05 00:10:51 -07:00
  • 372c22d42b docs: improve James R. Barlow 2023-11-04 02:43:08 -07:00
  • 949265bbd0 graft: improve typing and remove procset tracking James R. Barlow 2023-11-04 02:32:39 -07:00
  • 916106733c Skip semfree unless on Linux v15.4.0 James R. Barlow 2023-10-30 00:33:21 -07:00
  • 44bcafd3aa Add missing file header James R. Barlow 2023-10-30 00:28:09 -07:00
  • 71166f7be8 Make hocr API experimental for now James R. Barlow 2023-10-30 00:05:34 -07:00
  • 580252a1a0 Merge branch 'feature/gscan2pdf' James R. Barlow 2023-10-29 23:49:44 -07:00
  • c0b60dae6a build: add repository -y James R. Barlow 2023-10-29 23:41:36 -07:00
  • ae123fd209 Try to retain/copy xattrs v15.3.1 James R. Barlow 2023-10-28 01:42:06 -07:00
  • 454ad0acc5 build/macos: add openssl James R. Barlow 2023-10-28 01:41:28 -07:00
  • 0c306ac328 v15.3.1 release notes James R. Barlow 2023-10-28 00:49:37 -07:00
  • 52d99732b1 Fix mistakes with watcher loglevel handling James R. Barlow 2023-10-28 00:47:40 -07:00
  • 5b5827983b Tweak documentation of --output-type James R. Barlow 2023-10-26 23:57:08 -07:00
  • 56f9bc311d Improve verbosity of colorspace selection James R. Barlow 2023-10-25 00:38:56 -07:00
  • eb17dc1ecf Fix pdf save settings at metadata_fixup James R. Barlow 2023-10-25 00:13:17 -07:00
  • 6f8115a052 Fix import of metadata_fixup James R. Barlow 2023-10-25 00:06:26 -07:00
  • aac913c666 tesseract: EAFP James R. Barlow 2023-10-24 13:50:04 -07:00
  • b5e73ac4e4 Drop check for obsolete .dockerinit file James R. Barlow 2023-10-24 13:49:46 -07:00
  • 9e98c90891 docs: note on docker performance James R. Barlow 2023-10-24 13:34:26 -07:00
  • ca2592c1d9 Update draft release notes James R. Barlow 2023-10-21 14:37:27 -07:00
  • a31f17bb9d Update comments and make worker functions private James R. Barlow 2023-10-21 01:34:41 -07:00
  • 1cb46afa94 Update pluginspec docs James R. Barlow 2023-10-19 12:58:24 -07:00
  • 5a759947dd Update release notes so far James R. Barlow 2023-10-19 02:40:54 -07:00
  • db3df13e95 Remove ocrmypdf._sync James R. Barlow 2023-10-19 02:34:56 -07:00
  • 2a8bc03167 optimize: typing James R. Barlow 2023-10-19 01:50:06 -07:00
  • d2297b39d0 info: clarify ICC -> components checking James R. Barlow 2023-10-19 01:49:54 -07:00
  • e4cd081d4d info: clarify pageinfo context management James R. Barlow 2023-10-19 01:49:32 -07:00
  • d2dbea6cf8 Reorganize progress bars so they can be typed properly James R. Barlow 2023-10-19 00:42:10 -07:00
  • 46a279a49a Improve passing of arguments to workers James R. Barlow 2023-10-19 00:20:28 -07:00
  • 299f0c4003 Update dep5 James R. Barlow 2023-10-18 23:31:28 -07:00
  • 9ffb45f283 Remove public domain congress.jpg and replace with baiona_color.jpg James R. Barlow 2023-10-18 23:27:33 -07:00
  • cd61c4efd9 pngquant: remove unused ability to quantize a non-PNG James R. Barlow 2023-10-18 23:22:29 -07:00
  • a06ab2a1c5 unpaper: Remove format conversion James R. Barlow 2023-10-17 03:15:59 -07:00
  • dfa4ebf1a6 Simplify function signature of extract_image_filter James R. Barlow 2023-10-17 03:06:13 -07:00
  • 58f388c69d optimize: better coverage James R. Barlow 2023-10-17 02:41:40 -07:00
  • 990b462a94 Fix coverage settings and cover semfree James R. Barlow 2023-10-17 01:58:37 -07:00
  • b928dc0808 Skip fewer tests James R. Barlow 2023-10-16 00:41:25 -07:00
  • 8916955f45 Convert many run_ocrmypdf -> run_ocrmypdf_api James R. Barlow 2023-10-16 00:30:59 -07:00
  • 82bef40aa6 Eliminate more run_ocrmypdf calls James R. Barlow 2023-10-16 00:06:15 -07:00
  • 1c45f32941 tests: replace many run_ocrmypdf -> run_ocrmypdf_api James R. Barlow 2023-10-15 23:41:35 -07:00
  • fadc0cf69b Replace cryptic test error messages with more informative ones James R. Barlow 2023-10-15 23:25:35 -07:00
  • 7ce9d08b2d Define progress bar plugins formally instead of "tqdm-like" James R. Barlow 2023-10-15 23:21:15 -07:00
  • eb3a51e33a Prefer pikepdf's newer Page.mediabox accessor over .MediaBox James R. Barlow 2023-10-15 23:03:13 -07:00
  • f3dd733773 optimize: explore page container as objects instead of page helpers James R. Barlow 2023-10-15 23:02:01 -07:00
  • 4dbc5e1dba Fix some typing issues James R. Barlow 2023-10-15 22:48:02 -07:00
  • c0637c287e vscode isn't ready for black py312, revert James R. Barlow 2023-10-15 02:01:55 -07:00
  • 6127f7abd6 tqdm_kwargs to progress_kwargs James R. Barlow 2023-10-15 02:01:38 -07:00
  • a4059762e6 Fix hocrtransform test to generate blank hocr James R. Barlow 2023-10-15 01:57:48 -07:00
  • 40afcd68a7 pluginspec: spacing James R. Barlow 2023-10-15 01:37:42 -07:00
  • f238e721ed Improve documentation of new public hOCR APIs James R. Barlow 2023-10-15 01:14:15 -07:00
  • 16eb5627a7 Fix unused imports and other trivia James R. Barlow 2023-10-15 01:13:54 -07:00
  • fbf0674189 hocr_to_ocr_pdf: handle missing hocr json file James R. Barlow 2023-10-15 00:49:58 -07:00
  • 62c4f65fc3 Remove duplicate thread local storage of page numbers James R. Barlow 2023-10-14 20:39:29 -07:00
  • e400112f32 Improve ._pipelines naming James R. Barlow 2023-10-14 20:25:04 -07:00
  • 7935914f55 Use empty .hocr file instead of dummy template for symmetry with sandwich James R. Barlow 2023-10-14 14:41:11 -07:00
  • ad3a1dbbad deps: update PyMuPDF req James R. Barlow 2023-10-14 01:15:49 -07:00
  • 0655f8e7ae Add py312 to black coverage James R. Barlow 2023-10-14 01:03:28 -07:00
  • 04a9372584 docs: some copyediting James R. Barlow 2023-10-14 00:45:04 -07:00
  • b9646b6f85 Enable multiprocessing freeze_support (for Windows) and enable forkserver James R. Barlow 2023-10-14 00:02:54 -07:00
  • 53c953a561 Fix use_threads logic for get_pdfinfo James R. Barlow 2023-10-13 23:41:58 -07:00
  • c278fecb34 Rename post_process -> postprocess James R. Barlow 2023-10-13 19:50:31 -07:00
  • 23951c9e38 Working HOCR folder to PDF converter James R. Barlow 2023-10-13 03:25:12 -07:00
  • e8ae370ceb Eliminate api= kwarg and implicit creation of pluginmanager James R. Barlow 2023-10-13 02:19:08 -07:00
  • 67be4d1904 Refactor CLI exception handling James R. Barlow 2023-10-13 00:25:17 -07:00
  • 6f82097d14 Refactor setup_pipeline to decouple manage_work_folder James R. Barlow 2023-10-12 23:43:49 -07:00
  • fc6f959d21 Refactor debug log and work folder context cleanup James R. Barlow 2023-10-12 23:28:10 -07:00
  • e38d569d8f languages: kwargs are overkill James R. Barlow 2023-10-12 23:27:28 -07:00
  • 0856750ee2 Fix exit code error on Ghostscript failure James R. Barlow 2023-10-12 17:19:05 -07:00
  • 05721ba84a Fix error on no languages available James R. Barlow 2023-10-12 17:04:39 -07:00
  • 38c3422e5e Automatically set document language to OCR language James R. Barlow 2023-10-12 16:28:38 -07:00
  • d153a6f6df Refactor metadata handling James R. Barlow 2023-10-12 16:18:22 -07:00
  • 1a7738a925 Refactor -migrate metadata repair to new module James R. Barlow 2023-10-12 15:52:19 -07:00
  • 8985c0dfe9 Refactor setup_pipeline to context manager James R. Barlow 2023-10-12 16:30:16 -07:00
  • ebfe008432 Refactor logging record thread local storage James R. Barlow 2023-10-12 16:30:16 -07:00
  • 1f16eb6f50 Refactor main pipeline into discrete pipelines James R. Barlow 2023-10-12 16:30:16 -07:00
  • cbb0868ae3 Add hocr to ocr pdf pipeline James R. Barlow 2023-10-12 16:30:16 -07:00
  • 68bb38d0ad pdf_to_hocr: improve plugin handling James R. Barlow 2023-10-12 16:30:16 -07:00
  • 0443e87345 Introduce pdf_to_hocr API James R. Barlow 2023-10-12 16:30:16 -07:00
  • b3de5833d3 Refactor conversion of ocrmypdf.ocr() arguments to cmdline James R. Barlow 2023-10-12 16:30:16 -07:00
  • 95b14ee282 Refactor lossless reconstruction setter into separate function James R. Barlow 2023-10-12 16:30:16 -07:00
  • 07b89e6a19 Plugin manager: set reasonable default when called without params James R. Barlow 2023-10-12 16:30:16 -07:00
  • 8991d2cb33 Refactor main pipeline and start hocr pipeline James R. Barlow 2023-10-12 16:30:16 -07:00
  • 86a20c4130 Refactor exec_page_sync to outputs James R. Barlow 2023-10-12 16:30:16 -07:00
  • 6827a6efe8 Refactor exec_page_sync -> extract _process_page James R. Barlow 2023-10-12 16:30:16 -07:00
  • c6b5332699 Test on release py312 not py312-rc v15.3.0 James R. Barlow 2023-10-21 15:17:07 -07:00
  • 68610046c6 Correct the archive dir name in Watched folders with Docker (#1173) Michael Flagg 2023-10-21 16:59:33 -05:00
  • 880326868d v15.3.0 release notes James R. Barlow 2023-10-21 14:59:09 -07:00
  • c6be3ba076 watcher: Improve parameter validation James R. Barlow 2023-10-20 20:11:00 -07:00
  • 0565cb0b10 misc/watcher.py: use Typer and dotenv to improve ease of use James R. Barlow 2023-10-20 19:56:39 -07:00