Office automation: Converting doc to docx

August 18, 2008 | C#, Microsoft Office

With the advent of Office 2007, Microsoft switched over to its OpenXML standard for Office documents, which is quite a subject in itself and one I will blog about sometime in the future.

This post, however, is about converting older Word documents to this new format. I've seen a few sites that offer a conversion service for a fee, and I wonder whether that's even legal, seeing as Microsoft provides a free tool (ofc.exe) as part of its Office Migration Planning Manager (OMPM), which is available as a download from Microsoft.

It's a rather funny little utility that works in conjunction with the Office 2007 Compatibility Pack.

The Compatibility Pack mainly enables older versions of Office to open OpenXML documents, minus all the new functionality introduced in Office 2007.

Getting back to the ofc tool, you will notice a file called ofc.ini, which contains a number of settings you need to configure, most notably the following options in the [ConversionOptions] and [FoldersToConvert] sections:
[ConversionOptions]
; FullUpgradeOnOpen: if set to 1, Word documents will be fully converted to the OpenXML format
;                    if set to 0 (default), Word documents will be saved in the OpenXML format in compatibility mode
; Not applicable to Excel or PowerPoint files.
FullUpgradeOnOpen=0

[FoldersToConvert]
; The Converter will attempt to convert all supported files in the specified folders
; (do not include if specifying FileListFolder)
;fldr=C:\Documents and Settings\Administrator\My Documents
fldr=c:\abc

Alternatively, if we want to do a bit more than merely convert documents to the new standard, we can do this programmatically using the Office 2007 Primary Interop Assemblies, which Microsoft provides as a free download.

In this example, we simply convert all the older .doc documents in a folder (including its subfolders) to the new .docx format:

using Word = Microsoft.Office.Interop.Word;
using System.Reflection;
using System.IO;
class Program
{
    static void Main(string[] args)
    {
        Word._Application application = new Word.Application();
        object missing = Missing.Value;
        object fileformat = Word.WdSaveFormat.wdFormatXMLDocument;
        object doNotSave = Word.WdSaveOptions.wdDoNotSaveChanges;
        try
        {
            DirectoryInfo directory = new DirectoryInfo(@"c:\abc");
            // On Windows the "*.doc" pattern also matches *.docx files, hence the extension check
            foreach (FileInfo file in directory.GetFiles("*.doc", SearchOption.AllDirectories))
            {
                if (file.Extension.ToLower() != ".doc")
                    continue;

                object filename = file.FullName;
                // Swap only the extension; a Replace(".doc", ".docx") on the whole
                // (lowercased) path would also alter names such as "my.doctor.doc"
                object newfilename = Path.ChangeExtension(file.FullName, ".docx");
                Word._Document document = application.Documents.Open(ref filename,
                    ref missing, ref missing, ref missing, ref missing, ref missing,
                    ref missing, ref missing, ref missing, ref missing, ref missing,
                    ref missing, ref missing, ref missing, ref missing, ref missing);
                try
                {
                    // Fully upgrade to OpenXML; omit to keep the file in compatibility mode
                    document.Convert();
                    document.SaveAs(ref newfilename, ref fileformat, ref missing, ref missing,
                        ref missing, ref missing, ref missing, ref missing,
                        ref missing, ref missing, ref missing, ref missing,
                        ref missing, ref missing, ref missing, ref missing);
                }
                finally
                {
                    // Close without prompting, even if SaveAs failed
                    document.Close(ref doNotSave, ref missing, ref missing);
                }
            }
        }
        finally
        {
            // Shut Word down even if one of the conversions throws
            application.Quit(ref doNotSave, ref missing, ref missing);
        }
    }
}
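The file filtering and renaming in the loop can be pulled out into plain .NET code and checked without Word installed. The sketch below is a hypothetical helper (DocFileNames, IsLegacyDoc and ToDocxPath are names made up for illustration); it relies only on Path.GetExtension and Path.ChangeExtension from System.IO and shows why a string Replace over the whole path is risky.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the original program): the file filtering and
// renaming used in the conversion loop, kept free of any Word interop calls.
static class DocFileNames
{
    // Directory.GetFiles(folder, "*.doc") on Windows also returns *.docx files,
    // so only paths whose extension is exactly ".doc" (any casing) count as legacy documents.
    public static bool IsLegacyDoc(string path)
    {
        return string.Equals(Path.GetExtension(path), ".doc", StringComparison.OrdinalIgnoreCase);
    }

    // Replaces only the final extension; directory names, other dots and the
    // original casing of the path are left untouched.
    public static string ToDocxPath(string path)
    {
        return Path.ChangeExtension(path, ".docx");
    }
}
```

For a path such as /data/Old.Docs/My.Doctor.doc, ToDocxPath yields /data/Old.Docs/My.Doctor.docx, whereas lowercasing the path and replacing every ".doc" yields /data/old.docxs/my.docxtor.docx.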

Notice the call to document.Convert(): this method tells Word that the document needs to be fully converted to the new OpenXML format. You might want to omit it, leaving the file in compatibility mode, if you're planning to support previous versions of Office through the Compatibility Pack.
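If you drive the conversion from code but would still like to keep that choice in ofc.ini, a small reader for the two settings shown earlier could decide whether to call Convert(). This is a minimal sketch, assuming ofc.ini is a plain INI file with ';' comments as in the excerpt above; the OfcSettings class is hypothetical and not part of ofc.exe or the interop assemblies.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: reads FullUpgradeOnOpen and the fldr entries from ofc.ini-style
// text so a programmatic converter could honour the same settings.
class OfcSettings
{
    public bool FullUpgradeOnOpen { get; private set; }
    public List<string> Folders { get; } = new List<string>();

    public static OfcSettings Parse(string iniText)
    {
        var settings = new OfcSettings();
        string section = "";
        using (var reader = new StringReader(iniText))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                line = line.Trim();
                if (line.Length == 0 || line[0] == ';')
                    continue; // blank line or comment (this also skips ";fldr=..." entries)
                if (line[0] == '[' && line[line.Length - 1] == ']')
                {
                    section = line.Substring(1, line.Length - 2);
                    continue;
                }
                int eq = line.IndexOf('=');
                if (eq < 0)
                    continue; // not a key=value line
                string key = line.Substring(0, eq).Trim();
                string value = line.Substring(eq + 1).Trim();
                if (section.Equals("ConversionOptions", StringComparison.OrdinalIgnoreCase) &&
                    key.Equals("FullUpgradeOnOpen", StringComparison.OrdinalIgnoreCase))
                    settings.FullUpgradeOnOpen = value == "1";
                else if (section.Equals("FoldersToConvert", StringComparison.OrdinalIgnoreCase) &&
                         key.Equals("fldr", StringComparison.OrdinalIgnoreCase))
                    settings.Folders.Add(value); // every uncommented fldr= line is collected
            }
        }
        return settings;
    }
}
```

In the conversion loop you would then load the file with File.ReadAllText, iterate over settings.Folders instead of the hard-coded c:\abc, and call document.Convert() only when settings.FullUpgradeOnOpen is true.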


Update 2010/09/18
C# 4.0 brings several improvements to COM interop (optional parameters can be omitted, and the ref keyword is no longer required when calling COM methods), so the preceding snippet can be rewritten like this:
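static void Main(string[] args)
{
    Word._Application application = new Word.Application();
    object fileformat = Word.WdSaveFormat.wdFormatXMLDocument;
    object doNotSave = Word.WdSaveOptions.wdDoNotSaveChanges;
    try
    {
        DirectoryInfo directory = new DirectoryInfo(@"c:\abc");
        // On Windows the "*.doc" pattern also matches *.docx files, hence the extension check
        foreach (FileInfo file in directory.GetFiles("*.doc", SearchOption.AllDirectories))
        {
            if (file.Extension.ToLower() != ".doc")
                continue;

            object filename = file.FullName;
            // Swap only the extension, leaving the rest of the path untouched
            object newfilename = Path.ChangeExtension(file.FullName, ".docx");
            // Optional COM parameters can simply be left out in C# 4.0
            Word._Document document = application.Documents.Open(filename);
            try
            {
                document.Convert();
                document.SaveAs(newfilename, fileformat);
            }
            finally
            {
                // Close without prompting, even if SaveAs failed
                document.Close(doNotSave);
            }
        }
    }
    finally
    {
        // Shut Word down even if one of the conversions throws
        application.Quit(doNotSave);
    }
}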




August 20, 2011 by Christoff Truter

This snippet relies on the MS Office interop assemblies, which require MS Office to be installed on the same machine. So if you're planning to run this code on a server, Office needs to be installed on that server. You may also need to adjust the DCOM security settings if you're planning to use this in some kind of service context.

Converting doc to docx August 20, 2011 by Sushant

Hi, will this code work on a server? If yes, what changes do we need to make on the server for it to work?

Some help September 24, 2010 by Silvana

Hey! Congratulations, it's a good article, and the code works! I'm trying something similar, the difference being .docx to .doc, and I need some help with this. I read your comment about document.Convert(); I understood that this method is for the docx format, but what about the doc format? Can you help me? What do I have to do to convert docx into doc? I tried some variants of your code, but I don't know whether I have to use the same interface, _Document, for example. Can you give me a link to read about this topic? Thank you so much.