On 6/29/2015 3:11 PM, Randall wrote:
> Feel free to reprocess it as you see fit, and repost.
>
> Just for grins, I tried converting all the scans to PNG and making another
> PDF with them. It was 157mb, even without OCR.
>
> Converting to characters (OCR), with images only for the drawings and photos,
> is a very real possibility and will greatly reduce the size. I've done that
> with other documents. But it is a LOT of work, especially if you try to keep
> the formatting something reasonable (tables, titles, headings and bold type
> all tend to get lost) and then proofreading is a pain as well. I've already
> spent far more time on this than it is worth to me, especially since I'm
> giving away the result for free and hard drive prices are down to somewhere
> around 4 _cents_ per gigabyte.
>
>
I can attest that the size can be controlled, but only with
considerable massaging. For the bulletins I published, each one was
individually republished on a template with all the bitmaps replaced by
vectors. The exceptions were the technical drawings, and even those were
individually reduced in size by erasing the accumulated black areas
left by repeated photocopying and by using just one bit per pixel
instead of grayscale. All the bitmapped text was converted by OCR,
pasted into the documents and set in an Interleaf font close to the
original, or inserted by hand where it didn't convert well. All that
required manipulating four documents per bulletin: the primary scan,
the OCR version, the one based on one of several templates, and the
converted PDF. But it made the entire collection searchable and kept
the size manageable (~20-25K per page). A big pain in the ass to do,
though.
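
To give a rough idea of what the one-bit-per-pixel step buys, here is a
minimal Python sketch. It is not the tool used on the bulletins; the
tiny 16x4 "page", the threshold of 128 and the function name
pack_bilevel are all made up for illustration. It thresholds 8-bit
grayscale values to black or white and packs eight pixels into each
byte, so the raw bitmap shrinks by a factor of eight before any
compression is applied.

```python
def pack_bilevel(gray_rows, threshold=128):
    """Threshold 8-bit grayscale rows to 1 bit per pixel, 8 pixels per byte."""
    packed = bytearray()
    for row in gray_rows:
        byte = 0
        nbits = 0
        for value in row:
            # 1 = white (at or above the assumed threshold), 0 = black
            byte = (byte << 1) | (1 if value >= threshold else 0)
            nbits += 1
            if nbits == 8:
                packed.append(byte)
                byte, nbits = 0, 0
        if nbits:
            # Pad the last partial byte of the row with zero bits
            packed.append(byte << (8 - nbits))
    return bytes(packed)


# Hypothetical page: light-gray "paper" with one short dark stroke
page = [[200] * 16 for _ in range(4)]
page[1][3:6] = [30, 30, 30]

raw_gray = sum(len(row) for row in page)  # grayscale: one byte per pixel
bilevel = pack_bilevel(page)              # bilevel: one bit per pixel

print("grayscale bytes:", raw_gray)
print("bilevel bytes:  ", len(bilevel))
```

On a real scan the same idea applies at full page size, and bilevel
images also tend to compress much better than grayscale ones, which is
where most of the remaining savings would come from.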
Cheers.
--
Michael Porter
Roswell, NM
Never let anyone drive you crazy when you know it's within walking distance....
** triumphs@autox.team.net **
Archive: http://www.team.net/archive