James R. Barlow
445617a1a5
Rebuild cache for hocr default case
2023-12-03 15:16:18 -08:00
James R. Barlow
68bb38d0ad
pdf_to_hocr: improve plugin handling
2023-10-24 00:52:31 -07:00
James R. Barlow
146da79c00
Regenerate test cache
2023-09-21 00:24:55 -07:00
James R. Barlow
036afc4d88
Update cache, related to previous apparently
2021-11-12 23:57:50 -08:00
James R. Barlow
a55ab05d16
Replace leptonica deskew with tesseract find skew and pillow rotate
Also rebuild the cache.
2021-11-12 16:35:08 -08:00
James R. Barlow
aa10a70d70
Rebuild test cache due to hocr output change
2021-08-01 01:00:05 -07:00
James R. Barlow
390fdf8c05
Package OCR in Form XObject
Should improve results in some situations where the initial content
stream is messy or not well-formed.
2021-01-31 19:27:25 -08:00
James R. Barlow
06ab114aa8
Update test cache
2020-06-22 16:31:34 -07:00
James R. Barlow
991db17fde
Remove Ghostscript-based text extraction
While faster than Python-based methods, we've outgrown the limited
amount of information Ghostscript provides with this feature, and it
repeats an analysis we have to do anyway to learn what images are
present.
2020-04-26 04:02:07 -07:00
James R. Barlow
4340ad9f12
Update test cache
2019-05-17 01:45:06 -07:00
James R. Barlow
80bd7de580
Generate test cache
2018-12-30 01:02:37 -08:00
James R. Barlow
d4cbef9457
Update test cache with naming rule change
2018-06-29 12:04:20 -07:00
James R. Barlow
b81daf71d1
Regenerate test cache
2018-06-23 02:02:58 -07:00
James R. Barlow
3254315127
Update test cache
2018-05-11 12:19:50 -07:00
James R. Barlow
ba0535e3fb
Update test cache to account for unpaper --layout none change
2018-04-12 00:48:21 -07:00
James R. Barlow
ca51514046
Add test cache
2018-03-24 23:50:41 -07:00