11 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| James R. Barlow | aa10a70d70 | Rebuild test cache due to hocr output change | 2021-08-01 01:00:05 -07:00 |
| James R. Barlow | 390fdf8c05 | Package OCR in Form XObject. Should improve results in some situations where the initial content stream is messy or not well-formed. | 2021-01-31 19:27:25 -08:00 |
| James R. Barlow | 06ab114aa8 | Update test cache | 2020-06-22 16:31:34 -07:00 |
| James R. Barlow | 991db17fde | Remove Ghostscript-based text extraction. While faster than Python based methods, we've outgrown the limited amount of information Ghostscript provides with this feature, and it repeats an analysis we have to do anyway to learn what images are present. | 2020-04-26 04:02:07 -07:00 |
| James R. Barlow | 4340ad9f12 | Update test cache | 2019-05-17 01:45:06 -07:00 |
| James R. Barlow | 80bd7de580 | Generate test cache | 2018-12-30 01:02:37 -08:00 |
| James R. Barlow | d4cbef9457 | Update test cache with naming rule change | 2018-06-29 12:04:20 -07:00 |
| James R. Barlow | b81daf71d1 | Regenerate test cache | 2018-06-23 02:02:58 -07:00 |
| James R. Barlow | 3254315127 | Update test cache | 2018-05-11 12:19:50 -07:00 |
| James R. Barlow | ba0535e3fb | Update test cache to account for unpaper --layout none change | 2018-04-12 00:48:21 -07:00 |
| James R. Barlow | ca51514046 | Add test cache | 2018-03-24 23:50:41 -07:00 |