Conducting a Keyword Search inside PDF files

Nov 22, 2008 17:17

Sam Miller

PDF files are a common format for publishing documents on the web. But how do you search for a keyword inside a PDF file, the way you would in a DOC file?

PDF stands for “Portable Document Format”. It was created by Adobe Systems in 1993 for use with its Acrobat family of programs, which includes the free Adobe Reader. It makes publishing documents on the Internet much easier and can help protect the owner's intellectual property. As with any other document, sooner or later you will need to search a PDF file for a keyword. How exactly do you do that, since PDF files are read-only in Adobe Reader?

Surprisingly, searching for keywords within a PDF document follows the same procedure as searching a DOC document, or a document in almost any other format for that matter.

To search for a keyword, click Edit on the menu bar and choose Find, or press Ctrl+F. If you feel like doing a little nimble finger work, you can also press Ctrl+Shift+F to open the fuller Search window. If you have a sharp eye, you will notice a shortcut search box on one of the toolbars in the reader window, although this shortcut only appears in certain versions of Adobe Reader. If your version lacks it, a newer one can be downloaded for free from the Adobe Systems website.

Sounds simple, right? Wrong. Keyword searches work in PDF files only if the software can actually read the text. Any document can be scanned and saved as a PDF file, including photocopies of hard-copy documents, handwritten papers, pictures, pages of books, and a lot more. A scanned page is stored as a picture of the text rather than as text itself, so clear, legible lettering is essential if your keywords are to be found.

Adobe Reader can only search text that is already stored in the file, so a scanned page must first be run through text recognition, known as optical character recognition (OCR), for example with the Recognize Text feature of Adobe Acrobat. OCR software locates and identifies the characters and letters in the page image and turns them into searchable text. It can handle characters in many fonts, sizes, and styles because it is built around standard printed letterforms. This is why scanned handwriting usually cannot be searched: handwriting varies from person to person, and recognizing all of it would mean recognizing billions of different hands. Okay, not really, but it would mean recognizing the handwriting of every user.

Also, Adobe Reader cannot edit PDF documents, so if you need to make changes you will have to convert the file into another format, such as DOC. Many free conversion programs are available for download on the Internet. Alternatively, you can save the PDF file as plain text by clicking File on the menu bar and selecting “Save as Text”. This converts the PDF file into a text document that you can save and edit later.
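Once the text has been exported this way, it can also be searched outside the reader. The following is a minimal Python sketch, not a feature of Adobe Reader: it assumes the “Save as Text” output is a plain-text file that can be read as UTF-8, and the function name `find_keyword` and its `whole_word` and `case_sensitive` options are hypothetical, loosely modeled on the options found in common Find dialogs.

```python
import re
from pathlib import Path


def find_keyword(path, keyword, whole_word=True, case_sensitive=False):
    """Return (line_number, line_text) pairs for lines containing keyword.

    Hypothetical helper: assumes `path` is a plain-text file exported
    from a PDF, e.g. with the "Save as Text" command.
    """
    # Escape the keyword so characters like "." or "+" are matched literally.
    pattern = re.escape(keyword)
    if whole_word:
        # \b anchors the match at word boundaries ("format" will not
        # match inside "formats" or "information").
        pattern = r"\b" + pattern + r"\b"
    flags = 0 if case_sensitive else re.IGNORECASE
    regex = re.compile(pattern, flags)

    # errors="replace" keeps going if the export is not valid UTF-8.
    text = Path(path).read_text(encoding="utf-8", errors="replace")

    matches = []
    for number, line in enumerate(text.splitlines(), start=1):
        if regex.search(line):
            matches.append((number, line.strip()))
    return matches
```

For example, `find_keyword("report.txt", "invoice")` returns a list of line numbers and matching lines. Because it searches line by line, a keyword broken across two lines (or hyphenated at a line end) in the exported text will not be found.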

When searching for a keyword inside a scanned PDF file, make sure the document was scanned cleanly and that its letters and characters appear clearly. If they do not, you will just be wasting your time and would be better off searching for your keyword painstakingly by hand.