Poor Search Engine Rankings Caused by Duplicate Content Issues

Jul 25, 2007 19:08
Carsten Cumbrowski



What are Canonical URLs? What is duplicate content? How does it happen? How can you detect it and what can you do about it? A short primer for website owners and webmasters.


Introduction

It is a problem to have identical or near-duplicate versions of a webpage available at more than one URL, on one website or many.

This can be an unintentional issue, caused by site architecture and the like, or something the webmaster is well aware of, such as syndicated content and scraped or illegally copied content from other websites.

Canonical URLs

The canonical URL problem, which refers to the issue where a site's URLs both with and without "www." are valid and return the same content, is not too much of an issue today, because many webmasters are aware of it.

Google even has a tool in its Webmaster Central application that lets webmasters specify which version is the primary one, the one with "www." or the one without. The webmaster does not have to make any changes to the site's code for that. He should, though, because other search engines do not provide a mechanism like Google's.

The solution is to pick one of the two versions as the "primary" one and 301-redirect requests for the other version to it. You can accomplish this with code inside the web site application, with special ISAPI filters such as Helicon's "ISAPI Rewrite" on Microsoft IIS web servers, or by specifying URL rewrites (mod_rewrite) in the ".htaccess" file on Apache web servers.
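As a rough sketch, assuming an Apache web server with mod_rewrite enabled and the "www." version chosen as the primary one (the hostname "www.example.com" is only a placeholder), the ".htaccess" rules could look something like this:

  # Send any request whose hostname is not www.example.com (placeholder)
  # to the same path on www.example.com with a permanent (301) redirect.
  RewriteEngine On
  RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
  RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

The condition matches every hostname other than the preferred one, so requests for the bare domain (and any stray host names pointing at the same site) all end up on a single canonical hostname.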

A bigger concern is pages that carry the same content but are reachable at more than one URL because of the site's architecture. Duplicate URLs for the same content can have many causes, for example when a page answers both with and without a trailing slash, when session IDs are appended to URLs, or when different sort and filter parameters return the same page (one common case is sketched below). I recommend consulting an SEO firm for an evaluation of your site if you suspect duplicate content on your own website.
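To stick with the trailing-slash case as a hedged example, again assuming Apache with mod_rewrite and that the version without the slash is the preferred one, a rule along these lines consolidates the two URLs with a 301 redirect:

  # If the request is not a real directory but ends in a trailing slash,
  # redirect it permanently to the same URL without the slash.
  RewriteEngine On
  RewriteCond %{REQUEST_FILENAME} !-d
  RewriteRule ^(.+)/$ /$1 [R=301,L]

Session IDs and duplicate query parameters usually have to be handled inside the web application itself, which is one more reason an evaluation of the whole site architecture is worthwhile.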

Website Scrapers and Content Theft

It also became much easier for black-hat SEOs to create scraper sites with the increased popularity of RSS and the ease of syndication, aggregation, and "mash-ups".

Scraper sites are sites thrown together as quickly and with as much automation as possible, either to rank well directly or to get users to click on contextual ads such as Google AdSense and generate revenue. The chances of a click are high, because the ads are often the only text that makes any sense compared to the gibberish produced by the scraper.

Another goal of a scraper site can be to indirectly boost the ranking of another, more hidden site: the scraper site simply links to that other website from multiple pages.

In most cases those sites are a bad experience for the user, and search engines try to remove them from their indexes as well as they can. Because of this struggle, legitimate webmasters become victims of the circumstances more often than they should, which increases fear and mistrust between search engines and webmasters.

Duplicate Content Detection

There are duplicate content detection tools available to webmasters, not just to detect duplicates on your own site, but also to find content stolen from your site by scrapers and other webmasters. One popular and free web-based tool is Copyscape.com.

Legal Steps against Content Theft

There is not very much you can do about scraped and stolen content from your site, but in some cases it makes sense to have your lawyer send a DMCA notice to the infringing webmaster and also to his hosting provider (if known).

You can download the full text of the federal Digital Millennium Copyright Act (DMCA) from Loc.gov at the following URL: http://www.loc.gov/copyright/legislation/dmca.pdf