OCRmyPDF

mirror of https://github.com/ocrmypdf/OCRmyPDF.git synced 2026-06-11 15:36:11 -04:00

Author	SHA1	Message	Date
James R. Barlow	74d2a156c4	Update cache	2024-01-07 01:35:05 -08:00
James R. Barlow	445617a1a5	Rebuild cache for hocr default case	2023-12-03 15:16:18 -08:00
James R. Barlow	68bb38d0ad	pdf_to_hocr: improve plugin handling	2023-10-24 00:52:31 -07:00
James R. Barlow	146da79c00	Regenerate test cache	2023-09-21 00:24:55 -07:00
James R. Barlow	a5efc4af9b	unpaper: replace input pnm with png. Unpaper or its underlying libraries don't seem to accept pnms with an odd integer width. Although it's not clear if this is the issue at all. In any case, keeping the image a PNG works around the issue. unpaper only accepted PNM input in the past, which is why we send it PNM. Since it now accepts PNG, we might as well use PNG. Unpaper can write PNG as output too, but this added a few seconds to the test suite, so it was not committed. Related issues: https://github.com/ocrmypdf/OCRmyPDF/issues/887 https://github.com/ocrmypdf/OCRmyPDF/issues/665 https://github.com/unpaper/unpaper/issues/82	2022-07-03 15:32:16 -07:00
James R. Barlow	ee21bf9ef6	Update cache	2021-12-13 20:45:30 -08:00
James R. Barlow	4c1ff1086c	tess cache: don't include full platform - could be sensitive	2021-12-06 15:38:26 -08:00
James R. Barlow	f91faf9795	Add new argument --tesseract-thresholding to control tesseract thresholding where available. Also add missing test for --tesseract-oem	2021-12-06 15:38:14 -08:00
James R. Barlow	036afc4d88	Update cache, related to previous apparently	2021-11-12 23:57:50 -08:00
James R. Barlow	a55ab05d16	Replace leptonica deskew with tesseract find skew and pillow rotate. Also rebuild the cache.	2021-11-12 16:35:08 -08:00
James R. Barlow	aa10a70d70	Rebuild test cache due to hocr output change	2021-08-01 01:00:05 -07:00
James R. Barlow	390fdf8c05	Package OCR in Form XObject. Should improve results in some situations where the initial content stream is messy or not well-formed.	2021-01-31 19:27:25 -08:00
James R. Barlow	06ab114aa8	Update test cache	2020-06-22 16:31:34 -07:00
James R. Barlow	991db17fde	Remove Ghostscript-based text extraction. While faster than Python based methods, we've outgrown the limited amount of information Ghostscript provides with this feature, and it repeats an analysis we have to do anyway to learn what images are present.	2020-04-26 04:02:07 -07:00
James R. Barlow	5e2a7f8a56	tests: speed up several slow tests	2019-12-09 16:17:57 -08:00
James R. Barlow	5f00e4f9d8	Sort imports	2019-07-27 04:51:52 -07:00
James R. Barlow	eb5200d26a	Change most tests to use ocrmypdf API instead of subprocess. The main benefit of this is code coverage gains can actually follow it. Also removes most ugly os.environ hacks.	2019-06-03 01:45:27 -07:00
James R. Barlow	4340ad9f12	Update test cache	2019-05-17 01:45:06 -07:00
James R. Barlow	58e6663806	Update test cache for french->german change	2019-03-03 03:23:59 -08:00
James R. Barlow	80bd7de580	Generate test cache	2018-12-30 01:02:37 -08:00
James R. Barlow	d4cbef9457	Update test cache with naming rule change	2018-06-29 12:04:20 -07:00
James R. Barlow	b81daf71d1	Regenerate test cache	2018-06-23 02:02:58 -07:00
James R. Barlow	3254315127	Update test cache	2018-05-11 12:19:50 -07:00
James R. Barlow	ba0535e3fb	Update test cache to account for unpaper --layout none change	2018-04-12 00:48:21 -07:00
James R. Barlow	49fa7f6b5c	tesseract_cache: don't reveal host system file paths in manifest file	2018-04-12 00:47:28 -07:00
James R. Barlow	ca51514046	Add test cache	2018-03-24 23:50:41 -07:00

26 Commits