70 Commits

Author SHA1 Message Date
James R. Barlow
80ed2117cc Change to SPDX license tracking 2022-07-28 01:10:07 -07:00
James R. Barlow
dc6f1a266a Modernize type annotations 2022-07-23 00:39:24 -07:00
James R. Barlow
13d11e76e5 optimize plugin: solve linearization and "is optimization enabled?" issues 2022-06-13 00:59:41 -07:00
James R. Barlow
b17fb61389 Configure pylint in pyproject and delint 2022-06-12 00:30:44 -07:00
James R. Barlow
0ac15dd0b2 Suppress libxmp DeprecationWarning during test 2022-06-01 00:46:16 -07:00
James R. Barlow
b00fe3dc5d pytest.skip() - remove kwarg entirely, to avoid breaking older pytest and not getting warns from newer pytest 2022-04-14 20:15:00 -07:00
James Barlow
f29fe7f23e Fix Pillow deprecation warnings 2022-04-03 13:30:50 -07:00
James R. Barlow
13af3252ff tests: simplify run_ocrmypdf API 2021-12-06 17:00:25 -08:00
James R. Barlow
f51164aff8 Upgrade test version of pymupdf 2021-11-13 00:53:41 -08:00
James R. Barlow
6f58a14351 pdfa: remove deprecated pkg_resources based access and tests 2021-11-13 00:52:03 -08:00
James R. Barlow
7ba04267b1 Remove shims supporting old versions of pikepdf < 4 2021-11-13 00:43:20 -08:00
James R. Barlow
c725bf79da flake8 delinting 2021-09-21 16:37:03 -07:00
James R. Barlow
e788dde607 tests: eliminate unnecessary mmap 2021-04-07 02:11:31 -07:00
James R. Barlow
aa115a8be3 Remove pytest_helpers_namespace 2021-04-07 01:56:51 -07:00
James R. Barlow
72fa347c38 tests: skip metadata test for two pikepdf versions that warn incorrectly 2020-12-29 01:47:52 -08:00
James R. Barlow
babc76fa74 tests: assert that most patched functions are called
We were not actually checking if functions we patched were called when
expected.
2020-12-28 23:58:33 -08:00
James R. Barlow
3707af3b74 Change pdf.root to pdf.Root 2020-11-03 01:30:31 -08:00
James R. Barlow
aa0ec40102 Change license of all GPLv3 files to MPL-2.0
https://github.com/jbarlow83/OCRmyPDF/issues/600
2020-08-05 00:44:42 -07:00
James R. Barlow
ebfe4f0d29 Fix issue #582 - PDF/A acquires title "Untitled" after conversion 2020-06-20 02:01:16 -07:00
James R. Barlow
7b9025f397 Convert generate_pdfa to plugin 2020-06-08 22:28:38 -07:00
James R. Barlow
b109445215 Move Ghostscript rasterize_pdf to plugin 2020-06-08 17:10:27 -07:00
James R. Barlow
1598f2f0e5 Abolish spoof_tesseract_noop 2020-06-01 03:07:53 -07:00
James R. Barlow
9af94ac9b7 pipeline: use OCR engine abstraction instead of Tesseract 2020-05-16 01:28:56 -07:00
James R. Barlow
977665d2b6 Delint some tests 2020-05-08 03:49:33 -07:00
James R. Barlow
c85278b31d Delinting 2020-05-03 00:53:29 -07:00
James R. Barlow
5dbc080fa0 Rename PDFContext->PdfContext 2020-05-02 04:32:46 -07:00
James R. Barlow
e02f6c1e97 Support plugin invocation with API 2020-05-02 03:34:31 -07:00
James R. Barlow
b3b61c152c Handle malformed DocumentInfo (#497)
User submitted a PDF in which /Trailer /Info pointed to the XMP metadata
block instead of a DocumentInfo dictionary. Fix and add test.
2020-03-03 03:27:01 -08:00
James R. Barlow
4a27124eab Simplify metadata for invalid xml in output
Removes possibly non-free resource enron1.pdf.
2020-02-12 00:07:18 -08:00
James R. Barlow
c5edff2c2f Sort imports 2019-12-19 15:31:18 -08:00
James R. Barlow
a3726e4ce3 Fix test_metadata: use mmap in a Windows and POSIX compatible way 2019-12-04 17:13:52 -08:00
James R. Barlow
6fbeb6347d Merge api (without plugins) 2019-07-27 02:04:01 -07:00
James R. Barlow
12769b96e5 Drop support for omitting pdfminer.six 2019-07-10 13:37:01 -07:00
James R. Barlow
fb933edc0f Use newer pytest tmp_path API 2019-06-01 01:55:51 -07:00
James R. Barlow
ef1ef1cdf0 Fix test invalidated by Python 3.6 logging fixes 2019-05-17 15:20:07 -07:00
James R. Barlow
c904b430b6 Merge master into api branch; all tests pass 2019-05-14 16:33:02 -07:00
James R. Barlow
482cb788ed Don't use MagicMock() as a dummy logger in pytest 2019-05-11 12:44:17 -07:00
mawi
c92ccc6134 fix: tests 2019-04-08 14:57:42 +02:00
mawi
783a128bd1 feat: move to sync (non-ETL) implementation - remove ruffus 2019-04-04 21:02:38 +02:00
James R. Barlow
3f1d9ef99c Fix tests for move to Alpine dockerfile 2019-02-26 12:30:21 -08:00
James R. Barlow
f34b3015b2 Prevent Ghostscript from generating invalid XMP metadata
If DocumentInfo contains NULs Ghostscript will generate XMP with
NULs which is not allowed. Repair DocumentInfo before Ghostscript sees it.
2019-01-04 13:20:41 -08:00
James R. Barlow
7d330afd81 Delinting 2019-01-02 13:34:45 -08:00
James R. Barlow
c771938907 Convert to f-strings where it makes sense 2018-12-31 15:01:19 -08:00
James R. Barlow
cfc5cdf47d pdfa: remove a pile of deprecated code
It's now handled in pikepdf.
2018-12-31 00:05:13 -08:00
James R. Barlow
0880b16491 Sort imports with isort 2018-12-30 01:28:15 -08:00
James R. Barlow
06308a22ce Reformat with black 2018-12-30 01:27:49 -08:00
James R. Barlow
72b920eb16 Drop support for Python 3.5 2018-12-30 00:23:26 -08:00
James R. Barlow
b4a51907d6 Detect when metadata is dropped during PDF/A conversion 2018-12-30 00:13:25 -08:00
James R. Barlow
ed9bb985e2 Fix pikepdf 0.9.0 2018-12-14 23:21:13 -08:00
James R. Barlow
632dab2cc0 Replace Ghostscript DOCINFO and fix 9.25 metadata date regression
We no longer use Ghostscript to manage PDF metadata, instead
omitting the DOCINFO segment from the pdfmark file we generate.

Instead all of the relevant metadata code has been migrated to pikepdf,
and we use that API. This should be more consistent and fixes the
Ghostscript version-dependent quirks.

Also removes our python-xmp-toolkit dependency, except for
testing.
2018-12-13 18:13:30 -08:00