Recently a coworker of mine was doing some filing work and needed to extract text from several PDF documents. He was hired only two months ago, so this was the first time he had run into this kind of task, and as a newcomer he was a bit confused. He went through some tutorials and tried to extract the text with OCR software. But OCR could not meet his needs: the text he got was full of spelling mistakes, ridiculously wrong words, and weird marks. Finally, he turned to me for help.
In my experience, most OCR tools are not that reliable, and when a PDF looks like it is made of editable text rather than scanned images, OCR should be the last option. Here is the advice I gave him; maybe you can pick something useful out of it.
PDF Editor or Reader
There are many PDF editor apps (or PDF readers with editing features) out there. If you only need a few passages from a PDF, try these tools first. Open the PDF in the editor you chose, use the search feature (usually Ctrl + F) to find the target text, then select it with your cursor and copy it. Now create a new Word file, paste the copied text, and save. You can search download.com for PDF editors.
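Copy and paste only works when the PDF actually contains a text layer. If you would rather check that with a script than by trying to select text, here is a minimal Python sketch. It assumes the third-party pypdf package is installed; the function names and the 20-character threshold are my own illustrative choices, not part of any tool mentioned here.

```python
def has_text_layer(page_texts, min_chars=20):
    """Return True if the extracted page texts contain enough real characters.

    min_chars is an arbitrary threshold (an assumption) so that stray
    whitespace or a lone page number does not count as a text layer.
    """
    total = sum(len("".join(text.split())) for text in page_texts)
    return total >= min_chars


def extract_page_texts(pdf_path):
    """Extract the text of each page; requires the third-party pypdf package."""
    # Imported lazily so has_text_layer() itself has no dependencies.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    # Pages without a text layer typically yield an empty string.
    return [page.extract_text() or "" for page in reader.pages]
```

For example, `has_text_layer(extract_page_texts("report.pdf"))` (where `report.pdf` is a hypothetical file name) returning False suggests the file is probably a scan, in which case copying text out of it will not work.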
PDF to Word Converter Software
If you need to convert a batch of PDF files into Word format, a PDF editor or reader may not be efficient enough, because you usually have to open the files one by one to copy the content you want. A PDF to Word converter is the better choice for a heavy conversion task. Watch out for malware if you use freeware, and with paid software, do not pay without trying it first.
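To show what "batch" means in practice, here is a rough Python sketch of the loop such a converter runs over a folder. It does not convert anything by itself: `convert_one` is a hypothetical placeholder for whatever converter you end up using, and the function names and the `.docx` suffix are assumptions for illustration.

```python
from pathlib import Path


def plan_batch(input_dir, output_dir, suffix=".docx"):
    """Pair every PDF in input_dir with the output path its conversion should use."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    jobs = []
    # Note: "*.pdf" is matched case-sensitively on most non-Windows systems.
    for pdf in sorted(input_dir.glob("*.pdf")):
        jobs.append((pdf, output_dir / (pdf.stem + suffix)))
    return jobs


def convert_all(input_dir, output_dir, convert_one):
    """Call convert_one(src, dst) for each PDF and collect the failures.

    convert_one is a placeholder for the real converter (an assumption).
    """
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for src, dst in plan_batch(input_dir, output_dir):
        try:
            convert_one(src, dst)
        except Exception as exc:
            # Keep going so one bad file does not stop the whole batch.
            failed.append((src, exc))
    return failed
```

Collecting failures instead of stopping at the first one matters with large batches: you get a list of files to retry or handle by hand once the run is over.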
Online PDF to Word Converter Service
Free online PDF to Word converter services are increasingly popular these days. Using them requires no software installation, so there are no local security issues to worry about. They usually come with some limitations, and a few of them also offer a downloadable version. Just Google "online PDF to Word converter" and you will get enough results to handle your PDF.
OCR Tools
As I said, OCR is the last option. Proofreading and correcting its output is inefficient and time-consuming. However, it is still irreplaceable if your PDFs were created from images that contain text. Both online OCR tools and offline OCR software are available. No OCR tool, online or offline, free or paid, can guarantee 100% accurate text extraction, so if you have to use OCR, check the result word by word and correct it against the original file.
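A script cannot replace that word-by-word check, but it can point you at the most likely trouble spots first. The Python sketch below is a crude heuristic of my own, not a feature of any OCR tool: it flags tokens that mix letters with digits or contain symbols that rarely appear inside ordinary words. The patterns and the function name are assumptions, and legitimate tokens such as "A4" or "2nd" will be flagged too.

```python
import re

# Letters and digits mixed in one token, e.g. "c0rrect" or "l1ne".
MIXED = re.compile(r"(?=.*[A-Za-z])(?=.*\d)")
# Symbols that rarely appear inside ordinary words.
ODD = re.compile(r"[|~^`{}<>\\_]")


def flag_suspicious(text):
    """Return (line_number, token) pairs worth checking first against the original."""
    flagged = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for token in line.split():
            # Drop surrounding punctuation so "here." is treated as "here".
            word = token.strip(".,;:!?()\"'")
            if not word:
                continue
            if MIXED.search(word) or ODD.search(word):
                flagged.append((line_no, word))
    return flagged
```

Run it on the OCR output and look up each flagged line in the original PDF before you start the full read-through.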