Thursday, May 21, 2009

Convert PDF documents to Word

I occasionally get inquiries about how to convert a PDF document to the Microsoft Word format. This can be a tricky problem.

Going in the other direction (Word to PDF) is easy: upload the document to iSITE, then right-click it on the Site Builder screen and select the Convert > PDF option.

But if you're starting with a PDF file, then it's not so simple:
  • First, was the original document (before it was first converted to PDF format) a word-processed document (i.e., an electronic file), or was it a paper document that was scanned and saved in the PDF format? If it was a paper original, you cannot convert it directly. This type of PDF file is only an image of each page, and the only way to get word-processable text out of it is to use optical character recognition (OCR) software. (Microsoft Office 2007 does include OCR capabilities through its Document Imaging tool, but that is a topic for another article.)

  • If the PDF was created from a word-processing file (like a Microsoft Word doc), you can use the Adobe Acrobat application (either the Standard or the Professional version) to convert the PDF file back to the .doc format. (Please note: the full Adobe Acrobat application is not the same as the free Acrobat Reader utility that is installed on most computers. Reader only lets you open and read PDF files; it cannot convert them to a different format.)
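
If you are not sure which kind of PDF you have, a quick check is whether any text can be extracted from it. The following is only a rough sketch in Python, assuming the third-party pypdf library is installed; the function names and the 20-character threshold are my own illustrative choices, not part of any tool mentioned in this post.

```python
def has_text_layer(page_texts, min_chars=20):
    """Return True if at least one page yields a meaningful amount of text.

    page_texts: list of strings, one per page (empty for image-only pages).
    min_chars: hypothetical threshold; stray characters below it are ignored.
    """
    return any(len((text or "").strip()) >= min_chars for text in page_texts)


def extract_page_texts(pdf_path):
    """Extract the text of each page using pypdf (assumed to be installed)."""
    # Imported here so has_text_layer() can be used without pypdf.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        texts = extract_page_texts(sys.argv[1])
        if has_text_layer(texts):
            print("Text found: probably created from an electronic file.")
        else:
            print("No text found: probably a scan that needs OCR.")
```

A scanned PDF that has already been run through OCR will also report text, so treat the result as a hint rather than a definitive answer.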

If you don't have the full Adobe Acrobat application on your computer (it's expensive, so don't bother requesting it unless you have an ongoing need for its added features), there is another option: go to www.pdftoword.com/ and use the free online service. Just upload your PDF document and enter your email address, and the service will email the converted Word version back to you.

I tried this with a 103-page PDF file, and the results, although not perfect, were better than a similar conversion performed by Adobe Acrobat. Here are some of the issues I encountered:
  • First, it took some time: I uploaded my file in the middle of the day and did not receive my converted file until the following morning.

  • Then, there were some font substitutions. The original contained checkboxes (typically rendered using the Wingdings font), but the conversion utility apparently did not recognize the font or character and substituted something else.

  • Finally, the page breaks didn't match, so the converted document had more pages than the PDF. This happened because the original author had used blank lines to push text to the next page (instead of using Word's page break feature). Because the number of lines that fit on a page depends on the printer and its settings, the extra lines did not fit the page on my computer, which added extra blank pages.
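
The page break problem in the last item can be located automatically. Here is a minimal sketch in Python that finds runs of empty paragraphs, which are the likely stand-ins for page breaks. It assumes the converted file has been saved as .docx and that the third-party python-docx library is installed; the function names and the three-line threshold are hypothetical choices for illustration.

```python
def find_blank_runs(paragraphs, min_run=3):
    """Return (start_index, length) for each run of empty paragraphs.

    paragraphs: list of paragraph texts in document order.
    min_run: hypothetical threshold; shorter runs are treated as normal spacing.
    """
    runs = []
    start = None
    for i, text in enumerate(paragraphs):
        if text.strip() == "":
            if start is None:
                start = i  # a new run of blank paragraphs begins here
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - start))
            start = None
    # Handle a run that continues to the end of the document.
    if start is not None and len(paragraphs) - start >= min_run:
        runs.append((start, len(paragraphs) - start))
    return runs


def load_paragraph_texts(docx_path):
    """Read paragraph texts with python-docx (assumed to be installed)."""
    # Imported here so find_blank_runs() can be used without python-docx.
    from docx import Document

    return [p.text for p in Document(docx_path).paragraphs]
```

Calling find_blank_runs(load_paragraph_texts("converted.docx")) lists the places to inspect, where "converted.docx" is a placeholder file name. The sketch only reports the runs; replacing each one with a real page break is still best done by hand in Word.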
