Tuesday, December 05, 2006

Simple OCR

If you are a slow typist like me & want to put text from a document into your genealogy, you might want to think about an OCR (Optical Character Recognition) program. The commercial versions can be pricey for people on a limited income. The only free program I have found so far that works with Windows is SimpleOCR http://www.simpleocr.com/ .

One thing I DON’T like about this program is that there is no way to select the text area you want; you can only select an area to ignore. But you can choose the area you want when scanning the document, which is probably easier. “Selecting picture region” doesn’t give very good results, so if you need to include a picture, scan it separately & insert it into your document later. While converting the image into text it does show the original image very well, which I do like. It supports TIFF, BMP & JPG if you already have the document as an image file.

If you can afford a good OCR program, I do recommend that. If you want free, this is the best I have found so far.

Note: Good news! Google has released Tesseract OCR as open source. I don’t see any way to run it in Windows yet; let’s hope that comes soon. More info here: http://google-code-updates.blogspot.com/2006/08/announcing-tesseract-ocr.html

1 comment:

Anonymous said...

SimpleOCR is now rather obsolete. You can now get a MUCH more advanced OCR system called TopOCR. TopOCR can be used with a scanner or even a digital camera. It can produce text output as searchable PDF, HTML or plain text, and even has a Text To Speech interface. This is by FAR the best free OCR available today. Check it out at www.topocr.com