Duplicate web content: 4 views on why you should care.

Mar 14, 2006

Martin Avis

Should you be concerned about having duplicate content on your web pages? How clever are the search engines at spotting duplicate web content, and what can you do about it? This article examines the latest theories.

One of the biggest questions in Internet marketing at the moment is what exactly constitutes duplicate web content, and how people using private label articles can avoid being penalized.

As more and more people come to realize that content truly is king online these days, the issue of content and whether it's been used before by other websites has become much more critical. Nobody knows for sure just how much of a penalty Google and the other search engines place upon what they interpret to be duplicate web page content, but the fact that there is some penalty is without question.

Many people fear that putting articles or content on their web site without making any changes -- in other words, risking that the same web page content is duplicated elsewhere -- will cause the search engines to ban their site, blacklist their domain or impose other drastic measures. The reality appears to be less severe, but still damaging if search engine traffic is important to you.

It would seem that when the major search engines find duplicate web content, they currently do one of two things: either they downgrade the page it appears on in their index (in other words, cause your page to appear lower in their rankings) or, certainly in Google's case, they simply don't display it in the normal search results at all, lumping it together with all the other similar pages under a catch-all "25 other sites with similar content."

So what can we do about it?

Just as nobody is certain exactly how much of a penalty the search engines will apply to web pages carrying duplicate content, equally, there is no absolute consensus on how we can go about avoiding such a penalty. There appear to be four different approaches currently in use. These are:

1. Ignore the problem.

Not, perhaps, the most proactive of solutions, but still a fairly practical one. There does appear to be some evidence to suggest that even though the search engines are on the lookout for duplicate web content, they still take some time to find it. On this basis, many people choose to ignore the issue altogether and are happy to put duplicate content on their web pages, on the understanding that although they will eventually, almost certainly, be delisted or downgraded, they're still likely to enjoy several months of free traffic before that time comes.

2. Make sure that around 30% of your web page has content that is different to anybody else's.

This theory holds that the search engine isn't particularly interested in the article on your web page, per se, but is more interested in the totality of the copy that appears on the page. This means that you can create introductory paragraphs, conclusion paragraphs and other copy embedded within the article to boost the number of words on the page so that the article itself represents 70% or less of the page's total.

This idea has many followers, not least because it is far easier to add new, and often randomized, content to a web page than it is to rewrite a whole article. There are several popular pieces of software available that automate the process.
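If you want to sanity-check a page against this theory, the arithmetic is trivial. The sketch below (in Python) is purely illustrative -- the word-count method and the 70% figure come from the theory above, not from anything the search engines have confirmed.

```python
def article_share(article_text: str, page_text: str) -> float:
    """Fraction of the page's words that come from the syndicated article."""
    article_words = len(article_text.split())
    page_words = len(page_text.split())
    return article_words / page_words if page_words else 0.0

# Example: a 500-word article plus 250 words of your own introduction
# and conclusion keeps the article at roughly 67% of the page.
article = "word " * 500
page = ("intro " * 125) + article + ("summary " * 125)
print(f"Article share: {article_share(article, page):.0%}")  # ~67%
```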

3. Search engines check each sentence for duplication.

The idea here is that the search engines are rather more sophisticated: they check each sentence on a page to see whether it appears elsewhere on the Internet, and if it does, or if enough sentences within a page match other web pages, the entire page is deemed to be duplicate content.
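Nobody outside the search engines knows whether they really work this way, but a sentence-level check is easy to picture. The Python sketch below is an assumption-laden illustration: the crude sentence splitting, the imaginary web_index set and the idea of a match ratio are all mine, not a documented algorithm.

```python
import re

def split_sentences(text: str) -> list[str]:
    """Very crude sentence splitter, good enough for illustration."""
    return [s.strip().lower() for s in re.split(r"[.!?]+", text) if s.strip()]

def duplicate_ratio(page_text: str, web_index: set[str]) -> float:
    """Fraction of the page's sentences already seen elsewhere on the web.

    web_index stands in for whatever store of known sentences a search
    engine might keep -- a pure assumption for this sketch.
    """
    sentences = split_sentences(page_text)
    if not sentences:
        return 0.0
    matches = sum(1 for s in sentences if s in web_index)
    return matches / len(sentences)
```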

The only way to combat this is to make radical changes to the article itself. This involves substituting synonyms for as many of the words and phrases within each sentence as possible. While there are many programs available that offer synonym substitution, none of them can currently create human-readable versions of articles totally automatically. The English language is rich in alternative words but very poor in true synonyms, and blindly substituting words without reference to their context almost always results in gibberish.
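A toy example shows why. The three-entry synonym table below is invented for illustration; real substitution tools are more elaborate, but the context problem is exactly the same.

```python
# Invented, minimal synonym table -- not taken from any real tool.
SYNONYMS = {"content": "substance", "king": "sovereign", "page": "leaf"}

def blind_rewrite(text: str) -> str:
    """Swap every word for its 'synonym' with no regard for context."""
    return " ".join(SYNONYMS.get(word.lower(), word) for word in text.split())

print(blind_rewrite("Content is king on every page"))
# -> "substance is sovereign on every leaf" -- unique, but gibberish to a reader
```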

There are other, far better, programs that provide for user input to choose appropriate synonyms, and, by and large, these work very well. However, it is often quicker to simply rewrite an article by hand.

4. Phrase and gap analysis.

Those who believe the search engines have unlimited resources, both in computing and in programming, take the view that algorithms exist that can create a fingerprint for the content of each web page based on an analysis of unique phrases that appear on it and the number of characters that appear between them. If this is true, then changing only small parts of a page will not avoid duplicate web content filters and penalties.
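What such a fingerprint might look like is anybody's guess. The sketch below uses hashed word "shingles" to stand in for the unique phrases; the shingle size, the hashing and the Jaccard comparison are all my own assumptions, since the real algorithms -- if they exist -- have never been published.

```python
import hashlib

def shingles(text: str, size: int = 5) -> set[str]:
    """Hash every run of `size` consecutive words into a fingerprint set."""
    words = text.lower().split()
    return {
        hashlib.md5(" ".join(words[i:i + size]).encode()).hexdigest()
        for i in range(len(words) - size + 1)
    }

def similarity(page_a: str, page_b: str) -> float:
    """Jaccard similarity of two pages' fingerprints, from 0.0 to 1.0."""
    a, b = shingles(page_a), shingles(page_b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

# Rewording a couple of sentences only changes the shingles that touch them,
# so two pages sharing most of an article still score close to 1.0.
```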

It is by no means certain whether the search engines are at present this sophisticated, but there can be no doubt that in the future they will be -- and more so.

It would appear that substantially rewriting and rewording articles, either by hand or semi-automatically, is the only way to avoid the penalties that the third and fourth theories suggest.

Is this a lot of work?

The truth is it needn't be. Rewriting an article that somebody else has already written, in order to create totally unique web content, is merely a question of paraphrasing the original author's intent. It can be done far more quickly than writing a fresh article from scratch.

Of the two approaches I find particularly useful, the first is to open the original article in Notepad or TextPad, open a new Notepad or TextPad window next to it, and simply copy across each sentence -- rewording it on the fly.

The second approach, which I have been experimenting with recently and which is proving to be even quicker, is to use Dragon NaturallySpeaking 8 to dictate a changed version of the article. By this method, I'm able to create a completely revised 500-word article in under 10 minutes.

In conclusion, whichever theory you choose to follow, it is clear that you do risk penalties in the long term unless you make every piece of content that you display on your web site uniquely your own. There is a small amount of work involved in doing this, but the rewards are worth it.