Web Crawlers – How do they know what you’re looking for?

Jan 19, 2008, 08:11

Moe Tamani

The internet has truly revolutionized the communications industry. It’s mind-boggling how much is available at the touch of a keyboard. It’s fun to use a search engine and see the list of websites available, websites that originate both in the U.S. and around the world.

Have you ever wondered how a search engine ranks one site over another? It starts with something called a web crawler. The term is descriptive, but only partially accurate. While “web crawler” suggests a little critter scurrying around your computer, it is really only a program that sends an electronic request to a web server for a specific page. The web crawler then passes the web page data to the search engine’s indexer.

In other words, web crawlers are programs that methodically browse the internet looking for content and create copies of the pages they visit. Search engines later index these copies into a huge database so the information can be found quickly. A query processor then compares a search term against the database and returns a list of the websites most likely to match it.
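
To make the crawl, index and query steps concrete, here is a minimal sketch in Python. It is not how any particular search engine works: the page URLs and text are made up, the function names (`tokenize`, `build_index`, `search`) are hypothetical, and the "database" is a plain in-memory dictionary that maps each word to the pages containing it (often called an inverted index).

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Indexer: map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Query processor: return the URLs containing every query word, sorted."""
    words = tokenize(query)
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)


# Hypothetical copies of pages a crawler might have collected.
PAGES = {
    "http://example.com/a": "Web crawlers copy pages for the indexer",
    "http://example.com/b": "The indexer builds a database of pages",
    "http://example.com/c": "Cooking recipes for busy people",
}

INDEX = build_index(PAGES)
```

With this sample data, `search(INDEX, "indexer pages")` returns the first two URLs and `search(INDEX, "recipes")` returns only the third. A real engine would also rank the matches, which this sketch does not attempt.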

Web crawlers are sometimes also used for automated tasks, such as gathering email addresses or other information, checking links, or validating URLs.

A web crawler generally starts by visiting the URLs in a list. It identifies all the links on each page it visits and adds them to its list. It’s easy to see that this list quickly becomes massive, and that one web crawler cannot possibly visit every URL that exists. Each search engine has therefore developed a method for its web crawlers to visit URLs efficiently, so as to cover as many as possible. Among the factors considered are how many links a page contains and how popular the site is among web users. There is also a procedure that determines how often a web crawler revisits a website to check for changes.
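
The "start from a list, collect links, add them to the list" loop can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any search engine's actual method: the names `LinkExtractor`, `crawl` and `FAKE_WEB` are hypothetical, pages are visited in simple first-come order with no popularity weighting, and the "web" is a small dictionary of made-up pages so the example runs without a network connection.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, max_pages=100):
    """Visit pages breadth-first from the seed URLs; return the visit order.

    `fetch` is any function that takes a URL and returns its HTML,
    or None if the page cannot be retrieved.
    """
    frontier = deque(seeds)   # URLs waiting to be visited
    seen = set(seeds)         # URLs already queued, to avoid repeats
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# A made-up three-page site standing in for the real web.
FAKE_WEB = {
    "http://example.com/": '<a href="/about">About</a> <a href="/news">News</a>',
    "http://example.com/about": '<a href="/">Home</a>',
    "http://example.com/news": '<a href="/about">About</a> <a href="/missing">Gone</a>',
}
```

Calling `crawl(["http://example.com/"], FAKE_WEB.get)` visits the home page, then the about page, then the news page, and skips the missing one. The `max_pages` limit reflects the point above: no crawler can visit everything, so it has to stop or prioritize somewhere.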

Of course, each search engine has a great many web crawlers at work at any one time. With all these web crawlers exploring and revisiting websites, search engines had to develop methods to avoid overloading individual websites. Such a method is called a “politeness policy” and is intended to keep websites up and running despite the amount of crawler traffic they receive. Some web crawlers are also programmed to gather multiple types of data at one time.
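
One simple form a politeness policy can take is a minimum delay between two requests to the same host. The sketch below assumes that rule only; the class name `PolitenessScheduler`, its method and the delay value are hypothetical, and real crawlers use more elaborate policies. The current time is passed in as a number so the logic can be followed without actually waiting.

```python
from urllib.parse import urlparse


class PolitenessScheduler:
    """Space out requests so each host is contacted at most once per delay."""

    def __init__(self, delay_seconds=1.0):
        self.delay = delay_seconds
        self.next_allowed = {}  # host -> earliest time of the next request

    def wait_time(self, url, now):
        """Reserve the next free slot for the URL's host.

        Returns how many seconds the crawler should wait, counted
        from `now`, before fetching `url`.
        """
        host = urlparse(url).netloc
        start = max(now, self.next_allowed.get(host, now))
        self.next_allowed[host] = start + self.delay
        return start - now
```

In a running crawler, `now` would typically come from `time.monotonic()` and the returned value would be passed to `time.sleep()` before the fetch. Requests to different hosts do not delay one another, so the crawler stays busy while each individual site sees only a trickle of traffic.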

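If you do not want a website crawled (for example, because it contains personal or private information), you can tell web crawlers to stay out, most commonly with a robots.txt file that lists the parts of the site crawlers should not visit. Well-behaved crawlers honor these rules, but compliance is voluntary, so truly private information should also sit behind a password or a firewall.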
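
As an illustration of how a well-behaved crawler might check a site's robots.txt rules before fetching a page, here is a short sketch using Python's standard `urllib.robotparser` module. The rules, the crawler name "ExampleBot" and the URLs are made up for the example, and the helper `load_rules` is hypothetical. The rules are parsed from a string here so the example runs offline; a real crawler would download them from the site's `/robots.txt` address.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: every crawler is asked to skip /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def load_rules(robots_text):
    """Parse robots.txt text and return a parser that can answer queries."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


RULES = load_rules(ROBOTS_TXT)

# A polite crawler asks before every fetch.
ALLOWED_HOME = RULES.can_fetch("ExampleBot", "http://example.com/index.html")
ALLOWED_DIARY = RULES.can_fetch("ExampleBot", "http://example.com/private/diary.html")
```

Here `ALLOWED_HOME` is `True` and `ALLOWED_DIARY` is `False`. Note that the file only expresses a request: nothing in it technically prevents a crawler from ignoring the rules.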

While the actual workings of web crawlers are often highly technical and confusing to casual web users, it can be fun to learn what the various search engines “name” their web crawlers. For instance, Yahoo calls its web crawler “Slurp”, AltaVista calls its crawler “Scooter”, and Google’s is called “Googlebot”.

If you have a website, it is worthwhile to research each search engine to determine how to increase the ranking of your website there. By reading and following each site’s tips and guidelines, you can improve the chances of a web crawler visiting your site and improve how your site meets each search engine’s rating parameters.

While search engines won’t divulge every parameter they use (Google says it uses more than 200 of them), each engine has pages, blogs and other materials to help you improve your rankings. After all, search engines are businesses too. By helping you improve your website, they are also helping their own business.

There are also professional consultants and companies who can help improve your website’s rankings. While they will consider a variety of factors, they will also compare your site against each web crawler’s operating parameters. Sometimes fixing a simple but technical problem can improve how a web crawler reacts to your site.

 Isn’t it incredible that electronic signals can be used to compile information you need in a matter of seconds? Slurp, Scooter and Googlebot are electronic friends that help each of us do our work each day. And you don’t even have to feed them.