Commit Graph

16 Commits

Author  SHA1  Message  Date
James R. Barlow  445617a1a5  Rebuild cache for hocr default case  2023-12-03 15:16:18 -08:00
James R. Barlow  68bb38d0ad  pdf_to_hocr: improve plugin handling  2023-10-24 00:52:31 -07:00
James R. Barlow  146da79c00  Regenerate test cache  2023-09-21 00:24:55 -07:00
James R. Barlow  036afc4d88  Update cache, related to previous apparently  2021-11-12 23:57:50 -08:00
James R. Barlow  a55ab05d16  Replace leptonica deskew with tesseract find skew and pillow rotate  2021-11-12 16:35:08 -08:00
    Also rebuild the cache.
James R. Barlow  aa10a70d70  Rebuild test cache due to hocr output change  2021-08-01 01:00:05 -07:00
James R. Barlow  390fdf8c05  Package OCR in Form XObject  2021-01-31 19:27:25 -08:00
    Should improve results in some situations where the initial content
    stream is messy or not well-formed.
James R. Barlow  06ab114aa8  Update test cache  2020-06-22 16:31:34 -07:00
James R. Barlow  991db17fde  Remove Ghostscript-based text extraction  2020-04-26 04:02:07 -07:00
    While faster than Python based methods, we've outgrown the limited
    amount of information Ghostscript provides with this feature, and it
    repeats an analysis we have to do anyway to learn what images are
    present.
James R. Barlow  4340ad9f12  Update test cache  2019-05-17 01:45:06 -07:00
James R. Barlow  80bd7de580  Generate test cache  2018-12-30 01:02:37 -08:00
James R. Barlow  d4cbef9457  Update test cache with naming rule change  2018-06-29 12:04:20 -07:00
James R. Barlow  b81daf71d1  Regenerate test cache  2018-06-23 02:02:58 -07:00
James R. Barlow  3254315127  Update test cache  2018-05-11 12:19:50 -07:00
James R. Barlow  ba0535e3fb  Update test cache to account for unpaper --layout none change  2018-04-12 00:48:21 -07:00
James R. Barlow  ca51514046  Add test cache  2018-03-24 23:50:41 -07:00