What is “robots.txt” and what does it do?

Oct 15, 2010 07:23

John M Arthur

Learn what the robots.txt file is and how to use it. It is a small but important file for search engine optimization.

“Robots.txt” is probably one of the simplest things you will deal with on your search engine optimization journey. At its most basic level, it is a plain text file (as its extension suggests) that sits in the root folder of your website, so it can be reached at an address such as www.example.com/robots.txt. The file tells search engine crawlers which parts of your website they should not crawl. It is not a legal permission or a security measure: well-behaved crawlers such as Google’s and Yahoo’s follow it voluntarily. Since you usually want most of your website to be indexed and only a small part kept out of search results, you list in robots.txt only the pages you want crawlers to skip. Creating or editing robots.txt is extremely easy; it is probably the easiest file you will edit for search engine optimization.
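
Because crawlers look for the file only at the root of a site, the robots.txt address can be derived from any page address. The short Python sketch below shows this with the standard `urllib.parse` module; the function name `robots_url` and the example URLs are hypothetical. Each host has its own file: a robots.txt on www.example.com does not cover blog.example.com.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the address where crawlers look for the robots.txt covering page_url.

    Only the scheme and host (including any port) are kept; the page's
    path, query string and fragment are replaced by /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.example.com/blog/2010/10/post.html?ref=home"))
# https://www.example.com/robots.txt
print(robots_url("http://blog.example.com:8080/articles/seo.html"))
# http://blog.example.com:8080/robots.txt
```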

There are often pages on your website that you don’t want ordinary search engine users to find. You may be selling premium content that you don’t want anyone to reach through a search engine, or you may have printer-friendly versions of pages that you don’t want to show up in search results. In these cases you need to create or edit your robots.txt file. If your hosting service already provides this file, simply edit it; however, not every host creates one by default. In that case, create a new plain text file in Notepad, save it with the exact name “robots.txt”, and upload it to the root folder of your website. Keep in mind that robots.txt is publicly readable and only asks crawlers to stay away, so premium content that must stay private also needs password protection.
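
If you have script access to your host, the file can also be generated rather than typed by hand. The Python sketch below is one hypothetical way to do it: `write_robots_txt`, the `site_root` folder and the `/premium/` and `/print/` paths are illustrative assumptions, and most people will simply upload the file through their hosting control panel or FTP instead. It overwrites any existing robots.txt, so carry over your old rules first.

```python
import tempfile
from pathlib import Path


def write_robots_txt(site_root: str, disallowed_paths: list[str]) -> Path:
    """Write robots.txt into site_root, replacing any existing copy.

    All crawlers ("*") are asked to skip every path in disallowed_paths.
    With an empty list the file blocks nothing.
    """
    for path in disallowed_paths:
        # Each Disallow value is a path on the site, so it must start with "/".
        if not path.startswith("/"):
            raise ValueError(f"Disallow path must start with '/': {path!r}")
    lines = ["User-Agent: *"]
    if disallowed_paths:
        lines += [f"Disallow: {path}" for path in disallowed_paths]
    else:
        lines.append("Disallow:")  # empty value: nothing is blocked
    target = Path(site_root) / "robots.txt"  # the name must be exactly robots.txt
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target


# Demo in a throwaway folder standing in for the hypothetical site root.
with tempfile.TemporaryDirectory() as site_root:
    robots = write_robots_txt(site_root, ["/premium/", "/print/"])
    print(robots.read_text(encoding="utf-8"))
```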

So what do you need to write in robots.txt to keep search engines away from these pages? The most basic robots.txt file looks like this:

User-Agent: *
Disallow:
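
One way to see what this minimal file permits is to feed it to a robots.txt parser. Python’s standard library happens to include one, `urllib.robotparser`; search engines use their own parsers, so treat this only as an approximation of their behavior. The crawler names and page URL are examples. Keep the User-Agent and Disallow lines together: some parsers, Python’s included, treat a blank line as the end of a group and would ignore the rule.

```python
from urllib.robotparser import RobotFileParser

# The basic file from the article, with no blank line inside the group.
BASIC_ROBOTS_TXT = """\
User-Agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(BASIC_ROBOTS_TXT.splitlines())

# An empty Disallow value blocks nothing, so every crawler may fetch any page.
for agent in ("Googlebot", "Slurp", "SomeOtherBot"):
    print(agent, parser.can_fetch(agent, "https://www.example.com/any/page.html"))
# Googlebot True
# Slurp True
# SomeOtherBot True
```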

The asterisk after “User-Agent:” means that the rules apply to every search engine crawler rather than to one specific search engine. Nothing follows “Disallow:”, which means you are not blocking any part of your website. If you want to keep search engines away from a specific page, change the file to the following format:

User-Agent: *
Disallow: /directory1/specificpage.html

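Make sure the value starts with a forward slash, followed by the directory or page name. Everything after “Disallow:” is a path relative to your domain name: on www.example.com, the rule above asks crawlers not to fetch www.example.com/directory1/specificpage.html, and compliant crawlers such as Google’s and Yahoo’s will not request that page.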
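
Python’s `urllib.robotparser` can again serve as a stand-in crawler to confirm that the rule blocks only the page it names; the domain www.example.com and the two extra paths are illustrative. Because Disallow values are matched as path prefixes, a rule such as `Disallow: /directory1/` would instead cover every page in that folder.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-Agent: *
Disallow: /directory1/specificpage.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

site = "https://www.example.com"  # hypothetical domain
for path in ("/directory1/specificpage.html",  # named by the rule: blocked
             "/directory1/otherpage.html",     # same folder, not named: allowed
             "/index.html"):                   # not mentioned: allowed
    print(path, parser.can_fetch("Googlebot", site + path))
# /directory1/specificpage.html False
# /directory1/otherpage.html True
# /index.html True
```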

To address a specific search engine, write the name of its crawler (its “robot”) after “User-Agent:” instead of the asterisk. Google’s crawler is called “Googlebot” and Yahoo’s is called “Slurp”. In this way, you can control how each search engine treats the hidden parts of your website.
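
As a sketch of per-crawler rules, the hypothetical file below gives Googlebot and Slurp different restrictions and leaves all other crawlers unrestricted; the `/printer-friendly/` and `/premium/` folders are invented for the example. A crawler that finds a group naming it follows only that group and ignores the `*` group, and Python’s `urllib.robotparser` reproduces that here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: one group per crawler, separated by blank lines.
ROBOTS_TXT = """\
User-Agent: Googlebot
Disallow: /printer-friendly/

User-Agent: Slurp
Disallow: /premium/

User-Agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

site = "https://www.example.com"  # hypothetical domain
pages = ("/printer-friendly/page1.html", "/premium/article.html")
for agent in ("Googlebot", "Slurp", "SomeOtherBot"):
    allowed = [p for p in pages if parser.can_fetch(agent, site + p)]
    print(agent, "may fetch:", allowed)
# Googlebot may fetch: ['/premium/article.html']
# Slurp may fetch: ['/printer-friendly/page1.html']
# SomeOtherBot may fetch: ['/printer-friendly/page1.html', '/premium/article.html']
```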