C#: OCR (Optical Character Recognition)

May 3, 2010 by C#  

For the past few weeks, we've been looking for a suitable OCR solution to integrate into our document management system.

One option we came across is MODI (Microsoft Office Document Imaging), a tool that ships with Microsoft Office 2003 and 2007 but is no longer available in Microsoft Office 2010.

Simply add a reference to the MODI type library (COM interop) and convert your image(s) to text like this:

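using MODI;
using System;

class Program
{
    static void Main(string[] args)
    {
        // Load the scanned image into a MODI document.
        DocumentClass doc = new DocumentClass();
        doc.Create(@"some.tiff");

        // Run OCR in English; the two flags enable auto-orientation
        // and straightening (deskewing) of the image.
        doc.OCR(MiLANGUAGES.miLANG_ENGLISH, true, true);

        // Each page of the TIFF is exposed as a separate MODI.Image.
        foreach (MODI.Image image in doc.Images)
        {
            Console.WriteLine(image.Layout.Text);
        }

        // Close without saving so the file handle is released.
        doc.Close(false);
    }
}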

It's quite a powerful OCR engine, but the engine behind MODI wasn't developed by Microsoft - it is licensed from ScanSoft Inc. (now Nuance).

There is one part I do find a bit dodgy, though: we found quite a few rather expensive OCR tools out there (from $600) that integrate with MODI - which obviously requires Microsoft Office.

I almost feel that those applications belong in the freeware realm, since you have already bought a license to the core OCR functionality (via MS Office), and most of the non-OCR functionality (the part you are actually paying for) seems rather mediocre.

My personal opinion though... ;)




Captiva September 30, 2017 by Josh

Is it possible to convert images scanned with Captiva into PDF files so that data can be pulled from them? Otherwise, it takes time to enter the data shown by hand, whereas a PDF version that is text-selectable can be copied into Excel, sorted, and then split with Text to Columns to push certain data into another column and group the data that's needed.

a suitable OCR solution for C# May 29, 2015 by Susanna Moore

This .NET OCR library source is a nice share. My project is in C#, and C# OCR examples are provided: http://www.rasteredge.com/how-to/csharp-imaging/ocr-sdk/

french script mt OCR June 14, 2014 by Anonymous

BeyondOCR OCR: Hi, you can try our new software at http://beyondocr.cloudapp.net. It is specially built for converting French Script fonts. Though it is in beta stage now, it converts most French Script images properly.

french script mt scanned tiff images December 27, 2013 by justin

I am looking for a converter to Word for scanned images whose text is in the French Script MT font. If anyone has a solution, please reply as soon as possible.

February 6, 2012 by Christoff Truter

Isn't this exactly what the example code does?

February 6, 2012 by Moiz

Can anyone tell me how I can do this? How should I start the code in C# for image-to-text conversion?

February 1, 2012 by Zamir

I have a similar sample here: http://zamirsblog.blogspot.com/2010/12/ocr-using-ms-office.html

OCR for French Script MT convert successfully January 2, 2012 by Sahil

Hi friends, I am converting French Script MT scanned TIFF images successfully with 99% accuracy. For more details, contact: dirtymind635@yahoo.com or: 08290729527

Professional Services September 18, 2011 by EMC Visitor

Companies like EMC (Captiva or InputAccel) can convert paper into digital images. I know one company, Alicka Inc., that has a professional services team to configure a high-volume scanning/OCR/data capture system at rates much cheaper than EMC's PSG. Their direct link (Captiva Development): http://www.alicka.com/professionalservices.html