How to Convert Scanned PDF Documents to Plain Text using Google Docs

If you have lots of invoices, business letters, contracts and other important paper documents taking up space in your office, then the scanner is your best friend. You can scan any paper document to convert it to an electronic format and save it to your computer. One of the most popular formats for archiving and storing documents is PDF.

But if you need to search or edit these scanned PDFs, you will need OCR (optical character recognition) software to convert them to formats such as Word, Excel or PowerPoint. The same goes for extracting text from images or PDF documents: OCR software is what turns the image into a plain text file. There are many powerful paid OCR products out there, but if you only need OCR from time to time, for the occasional conversion, you may want to consider a free solution that gets the job done acceptably.

One simple, free way to convert your scanned PDFs to text is by using Google Docs.

Convert Your Scanned PDFs To Plain Text With Google Docs

To make your scanned PDFs usable and searchable, follow these steps:

1. Visit Google Docs and log in to your Google account.

2. Click on the upload button, choose Settings and make sure the following two options are checked:

  • Convert documents, presentations, spreadsheets and drawings to the corresponding Google Docs format
  • Convert Text from uploaded PDF and image files.

3. Then simply upload your file and wait a few seconds to get the converted file. Look for the file in All Items; it will probably be at the top.

That’s all. If you want to see the result, here is a screenshot of the scanned PDF that I needed to convert to text format:

Example of a scanned paper document

And here is a screenshot of the same file after conversion to plain text:

Convert a text file into Google Docs

As you can see, the result of the conversion is not perfect, but it is certainly worth the money I paid for it. To be more precise, it was free.

You can download your document from Google Docs in a variety of formats, such as Microsoft Word, plain text, rich text, OpenDocument (.odt) or even PDF.
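
If you download the result as plain text, some of the typical OCR blemishes can be tidied up with a short script. Here is a minimal Python sketch; the function name `tidy_ocr_text` and the three cleanup rules are my own assumptions about what usually goes wrong, not something Google Docs provides:

```python
import re


def tidy_ocr_text(raw: str) -> str:
    """Tidy a few common blemishes in OCR output (illustrative rules only)."""
    # Rejoin words hyphenated across a line break: "docu-\nment" -> "document".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Strip stray whitespace from the ends of every line.
    text = "\n".join(line.strip() for line in text.split("\n"))
    # Squeeze three or more consecutive newlines down to one blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


if __name__ == "__main__":
    sample = "Invoice  num-\nber 42 \n\n\n\nTotal:\t100"
    print(tidy_ocr_text(sample))
```

Note that the hyphen rule will also glue together genuine compounds that happen to break at the end of a line (for example "well-" followed by "known"), so it is worth proofreading the output afterwards.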

Once you have downloaded the file in the editable format you want, you can rework and reformat it, as well as copy and paste from it. Or you can simply archive it as a searchable PDF. Archiving important business documents, textbooks or interesting articles in electronic format helps you declutter your desk and shelves, and it also saves you time when you have to find a particular contract or invoice.
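
To illustrate the time saving, here is a small Python sketch that searches a folder of converted documents for a phrase. It assumes you saved the conversions as UTF-8 `.txt` files in a single folder; the function name `find_documents` and the folder and phrase in the usage line are hypothetical:

```python
from pathlib import Path


def find_documents(folder: str, phrase: str) -> list[str]:
    """Return the names of .txt files in `folder` whose text contains `phrase`."""
    needle = phrase.casefold()  # case-insensitive comparison
    matches = []
    for path in sorted(Path(folder).glob("*.txt")):
        # errors="replace" keeps the search going if a file contains odd bytes.
        text = path.read_text(encoding="utf-8", errors="replace")
        if needle in text.casefold():
            matches.append(path.name)
    return matches
```

For example, `find_documents("scans", "invoice 1042")` would list every text file in a `scans` folder that mentions that invoice number. The match is a plain substring test, so a phrase that OCR misread or split across two lines will not be found.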

Using Optical Character Recognition (OCR) In Google Docs To Extract Text From PDF And Scanned Documents

If you are using optical character recognition (OCR) in Google Docs to extract the text from your PDFs, keep the following five things in mind to get the best possible results:

  • The maximum size of an uploaded image file is 2 MB. For PDF documents, Google Docs extracts text from the first 10 pages only.
  • The Google Docs OCR engine's support for non-Latin character sets is pretty limited, as it is still in an experimental phase.
  • Files that use common fonts such as Times New Roman and Arial will produce better results.
  • The best conversion results are achieved with high-resolution files (a height of at least 10 pixels for each line of text is highly recommended).
  • If your image PDF includes bulleted and numbered lists, tables, footnotes, text columns and similar formatting elements, they are probably going to be lost in the conversion process.
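
The numeric limits in this list are easy to check before you upload. Here is a minimal Python sketch, assuming you already know the file size, the page count and the approximate height of a line of text in the scan. The function name `preflight` and the reading of 2 MB as 2 × 1024 × 1024 bytes are my own assumptions, not anything published by Google:

```python
MAX_IMAGE_BYTES = 2 * 1024 * 1024  # 2 MB limit from the list above (assumed binary MB)
MAX_OCR_PAGES = 10                 # only the first 10 pages are converted
MIN_LINE_HEIGHT_PX = 10            # recommended minimum height of a text line


def preflight(size_bytes: int, pages: int, line_height_px: float) -> list[str]:
    """Return a list of warnings; an empty list means no limit is exceeded."""
    warnings = []
    if size_bytes > MAX_IMAGE_BYTES:
        warnings.append("File is larger than 2 MB.")
    if pages > MAX_OCR_PAGES:
        warnings.append(f"Only the first {MAX_OCR_PAGES} of {pages} pages will be converted.")
    if line_height_px < MIN_LINE_HEIGHT_PX:
        warnings.append("Text lines are under 10 px tall; consider rescanning at a higher resolution.")
    return warnings
```

Calling `preflight(1_500_000, 4, 14)` returns an empty list, while a 25-page scan gets a warning that only the first 10 pages will be converted.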

This is a guest article by David Lazar who blogs at PdfConverter.com. With a background in journalism, he enjoys writing and following all the latest trends related to technology and new media. 
