Search Engine Indexing and Crawling

Feb 8, 2010, 15:47

Youssef Edward

Search engines are tools that search their index for the keywords entered by users. The index is built by two sub-processes called crawling (or spidering) and indexing. Here is an overview of these processes and how they are done.

When a user enters keywords into a search engine, it returns a list of results that best match those keywords. These results are drawn from a huge number of pages stored in the search engine's database. To produce such results, the database must first be built by processes that run before the user ever searches. Two basic sub-processes are mainly used to build the database:

1. Crawling: In the crawling process (also called spidering or robotics), the search engine discovers the web, that is, the pages of the sites on the web. It does this by downloading pages from some sites. Typically, it begins with a small set of sites already stored in its database. When it crawls an initial site, it looks at the links on that site's pages. If the search engine discovers a link that is not yet in its database, it appends it to the list of pages to be crawled later. The newly discovered pages may in turn contain further new links, which are also appended to the list of pages to be crawled. As sites are crawled, the search engine keeps updating its list of newly discovered sites. This process repeats continuously, without stopping, so that the engine keeps up with the changing content of the web. Thus every newly discovered link leads to the crawling of that page and may lead to the crawling of the entire site, because when the spider crawls a page of a site, it also looks for links to other pages on the same site as well as links to external sites. It is therefore important for website owners to build such links to make their sites visible to search engines: the more links they build, the more frequently their sites will be crawled and, if already indexed, updated.

2. Indexing: Once the search engine has collected the pages from the sites visited in the crawling process, it feeds them to the indexing algorithm. Essentially, the indexing algorithm compares and ranks related pages against each other, so that when a user searches for a keyword, the engine can return the pages with the highest rank. Each search engine has its own algorithm for indexing and ranking. Once a page is ranked, it is stored in the database with its assigned rank. One might imagine that only the keywords on a page control its rank, but more recently a key factor in ranking has been the backlink.
The concept of backlinks is mainly related to voting and reachability. A link to a page means that the linking site considers the page good, so the link effectively votes for it. The page also becomes reachable from that site. Search engines have recently become concerned with reachability: if someone browses randomly through the sites on the web, what is the probability of reaching a certain site? A high probability means a higher ranking, and a low probability means a lower one. Thus ranking also depends on backlinks.

3. Searching: Once the database is built, users can search, and the results are extracted from the search engine's database.
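The crawling step in item 1 (start from known sites, follow links, and queue any link not seen before) can be sketched as a breadth-first traversal. This is a minimal sketch under stated assumptions: the `WEB` dictionary is a hypothetical stand-in for the real web (a real crawler would download each page and parse its links), `crawl` and `max_pages` are illustrative names, and breadth-first order is a choice made here, not something the article specifies.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
# A real crawler would download each page over HTTP and extract its links instead.
WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["b.example/news", "c.example/"],
    "b.example/news": [],
    "c.example/": ["a.example/"],
}


def crawl(seeds, fetch_links, max_pages=100):
    """Breadth-first crawl starting from the seed URLs.

    fetch_links(url) returns the outgoing links of a page. Any link not seen
    before is appended to the frontier so it is crawled later.
    """
    frontier = deque(seeds)  # URLs waiting to be crawled
    seen = set(seeds)        # URLs already discovered (crawled or queued)
    crawled = []             # URLs in the order they were crawled
    while frontier and len(crawled) < max_pages:
        url = frontier.popleft()
        crawled.append(url)
        for link in fetch_links(url):
            if link not in seen:  # a newly discovered link
                seen.add(link)
                frontier.append(link)
    return crawled


order = crawl(["a.example/"], lambda url: WEB.get(url, []))
print(order)
# ['a.example/', 'a.example/about', 'b.example/', 'b.example/news', 'c.example/']
```

The `seen` set is what keeps the crawler from re-queuing pages that link back to each other (here `c.example/` links back to the seed), while `max_pages` simply bounds an otherwise never-ending process.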
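The reachability idea in item 2 (the probability that someone browsing randomly ends up on a given page) is the "random surfer" model behind PageRank. The sketch below estimates that probability by power iteration. It is an illustration only: the damping factor of 0.85 comes from the published PageRank description, not from this article, the `pagerank` function and the small link graph are hypothetical, and real search engines combine many other signals.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate the probability that a random surfer is on each page.

    links maps each page to the pages it links to. At every step the surfer
    follows a random outgoing link with probability `damping`, and otherwise
    jumps to a page chosen uniformly at random.
    """
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with a uniform distribution
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}  # random jumps
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each backlink passes an equal share of this page's probability.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no links: spread its probability over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical site graph: three pages link to "hub", so it gets the most "votes".
graph = {
    "home": ["hub"],
    "blog": ["hub", "home"],
    "shop": ["hub"],
    "hub": ["home"],
}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # hub
```

Pages such as `blog` and `shop`, which nothing links to, keep only the small random-jump probability, which matches the article's point that ranking depends on backlinks.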
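For the searching step in item 3, the article does not say how the database is organized. A common structure is an inverted index that maps each word to the pages containing it. The sketch below swaps in that structure as an assumption: `build_index`, `search`, the sample pages, and the precomputed `ranks` values are all hypothetical. It returns the pages that contain every query keyword, highest rank first.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: word -> set of page URLs containing that word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, ranks, query):
    """Return the pages containing every query keyword, highest rank first."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches, key=lambda url: ranks.get(url, 0.0), reverse=True)


# Hypothetical crawled pages and hypothetical ranks from the indexing step.
pages = {
    "a.example/": "Fresh coffee beans and brewing tips",
    "b.example/": "Coffee brewing guide for beginners",
    "c.example/": "Tea brewing guide",
}
ranks = {"a.example/": 0.2, "b.example/": 0.5, "c.example/": 0.3}
index = build_index(pages)
print(search(index, ranks, "coffee brewing"))  # ['b.example/', 'a.example/']
```

Because the index is built ahead of time, answering a query only requires looking up each keyword and sorting the matches, rather than scanning every stored page.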