Duplicate content filter: how to avoid it

Apr 1, 2010, 07:07
Fabio Pizzolitto


In this article I offer some practical tips to prevent your website's pages from being penalized by Google for unoriginal content. Duplicate content can infringe copyright, and it can also earn your website a heavy penalty. Follow these tips and your site will be listed by all the search engines and will receive more genuine traffic.


More and more webmasters create sites based on publicly available information, like data feeds, articles, news or blogs. The result is a proliferation of sites with duplicate content, which provide information already available elsewhere on the web.

When these sites are based solely on news or data pulled in via RSS feeds, their content can be identical to that of the source sites, apart from the website's structure and design.

Many copies of the same material in a search engine's index are not a good thing, so Google has decided to remove duplicate content in order to provide more valuable search results for its users.

Entirely copied and duplicated sites were hit hardest. If a webmaster publishes the same content on multiple domains (even inadvertently, through a simple active domain alias), he can be penalized, or all the duplicate pages of his website can be removed by the search engines. Sites built around affiliate programs have suffered a significant decline in their Google rankings. The phenomenon began to worry webmasters on all the specialized forums, most of whom agreed that new filters were being applied against duplicate content.

Duplicate content is not always illegal or useless. Reproducing awareness material and press releases, and spreading news via RSS, are entirely lawful practices. Google even has a dedicated advertising program for feeds, so it has a direct interest in not penalizing aggregated pages. Experience, however, tells a different story: pages built from news feeds are hard to get indexed, and they are often penalized, dragging down the overall ranking of the website.

How do you know if a page has been flagged as "duplicate page"?


There is a simple way to find out whether your pages have been flagged as duplicates: copy a whole paragraph from the page and paste it into Google's search box. If you find the original text among the first results but your page does not appear, either your page has not been indexed yet or it has been removed. Also click the link "repeat the search with the omitted results included". There is a good chance that your duplicate page appears among those results; this means your page has been removed from the main index: it has been penalized and hidden by the anti-duplication filter.



How to avoid the duplicate content filter?


Without defending plagiarism, there are several ways to acquire and reuse existing content to create something new; this can be useful not only for the reader, but also effective against duplicate content filters.


1. Add unique content to pages that include duplicate content


On pages where you use duplicate content, try to add unique content. Not just a few keywords or a navigation bar with links: you need to add enough original information to cover about 20-30% of the article or post. This reduces the risk of your pages being flagged as duplicates.


2. Random Content


You have surely seen those boxes labeled "Tip of the Day". They are driven by a script that rotates content at fixed intervals: when the crawler returns to your page, it finds new content. A script like this can be used to display many other kinds of information, and by changing a few lines of source code you can use it to give the impression that the page is frequently updated. This can be another way to get around Google's duplicate content filter.
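As an illustration only, here is a minimal sketch of such a rotator in TypeScript; the tips array and the "tip-box" element id are hypothetical placeholders you would adapt to your own pages.

    // Minimal "Tip of the Day" rotator (a sketch, not any particular site's script).
    // Placeholder tips; replace with your own content.
    const tips: string[] = [
      "Add 20-30% original text to every syndicated article.",
      "Keep duplicate-only pages out of the index with robots.txt.",
      "Serve your site from a single canonical hostname.",
    ];

    // Pick a tip from the current day number, so the displayed text
    // changes once per day between crawler visits.
    function tipOfTheDay(date: Date = new Date()): string {
      const dayIndex = Math.floor(date.getTime() / 86_400_000); // ms in a day
      return tips[dayIndex % tips.length];
    }

    // Write the tip into the page once the DOM is ready
    // (assumes an element like <div id="tip-box"></div> exists).
    document.addEventListener("DOMContentLoaded", () => {
      const box = document.getElementById("tip-box");
      if (box) box.textContent = tipOfTheDay();
    });

Because the tip is derived from the date rather than from Math.random(), the page stays stable within a day but changes on a fixed schedule, which matches the "fixed intervals" behavior described above.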



3. Unique content

Certainly, unique content is the surest route to success for your website, above all for its placement in the search engines. Even if you sometimes cannot avoid adding information already present on the web, the target should be to have at least half of your site made up of original content. If the proportion between original and duplicate content is well balanced, the anti-duplication filter's penalties will have less effect: some pages may still be penalized, but their negative effect will not weigh on the website too much.



Finally, not only to avoid duplicate content but also to protect the Google PageRank of your pages, don't let your site be indexed both with and without the www prefix; use mod_rewrite or another redirection method to send every request to a single canonical version.
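For example, a common Apache .htaccess sketch (assuming mod_rewrite is enabled; example.com is a placeholder for your own domain) that sends non-www requests to the www hostname with a permanent redirect:

    # Redirect non-www requests to the www hostname with a 301,
    # so search engines index only one version of each page.
    RewriteEngine On
    RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
    RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]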


Will this guarantee a good ranking? Nobody can say for certain. A good website should offer mostly original content; anything that can be found elsewhere has no particular reason to be online. And if your goal is not to attract search engine visits through duplicate pages, remember to keep those pages in a separate area and use the robots.txt file to keep them out of the index.
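As a sketch, assuming the duplicate pages live under a /feeds/ directory (a placeholder path), the corresponding robots.txt entry would be:

    # Keep all crawlers out of the directory that holds duplicate pages.
    User-agent: *
    Disallow: /feeds/

Note that the robots.txt file must sit at the root of the site (for example www.example.com/robots.txt) for crawlers to find it.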