digitizing books



parkerjfil
03-05-2010, 07:03 AM
I am going to digitize my book collection and have some questions about the process: clip the binding, scan the pages, then use OCR to convert the raster images to text.

1. Is it really necessary to destroy the book in order to get a scan that OCR can convert reliably? I realize that depends on the binding, scanner, print size, font, etc. But generally speaking, are OCR programs good readers? As much as I appreciate the transportation paradox, I would rather give the physical book away after digitizing it...

2. Many of my books are rich in graphic content. Any suggestions on how to minimize file size while keeping good resolution in the graphic elements? I guess I am wondering whether OCR programs handle text, images, and white space correctly, and whether they are all created equal.

3. OK, lastly, many of my books are loaded with annotations, doodles, notes in the margins, etc. I would seriously pull a face like this: :w00t: if that content can be parsed too.

Thank you very much for your help,

MorphysGhost
06-22-2010, 02:31 PM
Many OCR tools already come with compression algorithms, so you don't have to worry about that. I'm not familiar with the actual digitization process, so I can't help you much beyond that.
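For a rough sense of why compression matters less than you might expect for text pages: a black-and-white (1-bit) scan is mostly white space, and lossless compression shrinks that enormously. The sketch below assumes a US Letter page scanned at 300 dpi and uses a synthetic striped pattern as a stand-in for lines of type (not real scan data). It uses Deflate via Python's `zlib` (the algorithm PNG uses) purely as an illustration; dedicated bitonal codecs such as CCITT Group 4 or JBIG2 are what scanning tools typically use and usually do better.

```python
import zlib

# Assumed page geometry: US Letter (8.5 x 11 in) scanned at 300 dpi, 1 bit per pixel.
WIDTH_PX, HEIGHT_PX = 2550, 3300
ROW_BYTES = (WIDTH_PX + 7) // 8  # 8 pixels packed into each byte


def make_bitonal_page(text_height_px: int) -> bytes:
    """Build a synthetic page: bands of 'ink' separated by blank rows and margins.

    The pattern is a crude stand-in for lines of type, not real scan data.
    A zero bit is treated as white here (an arbitrary convention for this sketch).
    """
    blank_row = bytes(ROW_BYTES)
    inked_row = bytes([0xAA if i % 2 == 0 else 0x00 for i in range(ROW_BYTES)])
    rows = []
    for y in range(HEIGHT_PX):
        # Within the text block, 20 inked rows out of every 40 (a "line" plus leading).
        in_text_line = y < text_height_px and y % 40 < 20
        rows.append(inked_row if in_text_line else blank_row)
    return b"".join(rows)


raw = make_bitonal_page(text_height_px=3000)
packed = zlib.compress(raw, 9)  # Deflate at maximum compression level

print(f"raw: {len(raw):,} bytes, deflated: {len(packed):,} bytes")
```

Real scans contain noise and photos, so actual ratios will be far less dramatic; graphic-heavy pages are usually better stored as separate lossy (e.g. JPEG) image layers.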

As an alternative to your strategy, you could go to the library, rent an ebook, remove the DRM, and rip it from the CD.