May 3, 2010 by Christoff Truter C#
Over the past few weeks we've been looking for a suitable OCR (optical character recognition) solution to
integrate into our document management system.
One option we came across is MODI (Microsoft Office Document Imaging),
a component included with Microsoft Office 2003 and 2007 (it is not available in Microsoft Office 2010).
Simply add a reference to the MODI type library (via COM interop) and convert image(s) to text like this:
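```csharp
using System;
using MODI; // Microsoft Office Document Imaging type library (COM interop)

class Program
{
    static void Main(string[] args)
    {
        DocumentClass doc = new DocumentClass();
        try
        {
            // Load the TIFF image into the MODI document
            doc.Create(@"some.tiff");

            // Run OCR in English, auto-orienting and straightening each page
            doc.OCR(MiLANGUAGES.miLANG_ENGLISH, true, true);

            // Each page is exposed as an Image; print the recognised text of every page
            foreach (Image image in doc.Images)
            {
                Console.WriteLine(image.Layout.Text);
            }
        }
        finally
        {
            // Close without saving changes so the file handle is released
            doc.Close(false);
        }
    }
}
```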
OCR API September 17, 2011 by OCR API
Well, coding for all fonts and languages is not easy. I think using the OCR Cloud 2.0 platform is a good idea. It can convert virtually any image (TIF, JPG, PNG, BMP) or PDF to any standard text-based document type (TXT, DOC, RTF, XLS, PPT, XML, HTML) or to a searchable PDF. It also has automatic language detection and supports over 200 languages, including Latin-based languages, Cyrillic-based languages, Chinese, Japanese, Korean, Thai, and Hebrew. To sign up for a free developer account, go to http://www.ocr-it.com/ocr-cloud-2-0-api