Duplicate Content or the Original: Which Should Rank Higher?
by Antone Roundy | 4 Comments | SEO
When an original webpage is outranked by a page that copied its content, the author usually cries foul. At first glance, it seems they're right -- the original source should get the credit for the content. But when you look deeper, the question isn't nearly as clear-cut as you might think.
Let's talk about what the search engines should do with duplicate content in a perfect world, and what they're likely to do in the real world.
In the simplest case, one site rips off content from another, duplicating a webpage exactly.
What should happen: The duplicate should be ignored completely.
What's likely: How can the search engines tell which page is the original and which is the duplicate? If they found one page substantially earlier than the other, the first might be assumed to be the original. That will usually be correct, but not always. What if the content originated offline, and the copier simply put it online before the originator? Unless there are clues in the content itself to tie it to its source, it's a toss-up. And even then, will a search engine be able to analyze the text and find the clues?
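To make the "first found" heuristic concrete, here is a minimal sketch. It is not how any particular search engine works; the word-shingle comparison, the 0.9 threshold, the function names, and the example URLs and dates are all assumptions made for illustration.

```python
from datetime import date

def shingles(text, k=3):
    """Return the set of k-word sequences ("shingles") in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(text_a, text_b):
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    a, b = shingles(text_a), shingles(text_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def presumed_original(page_a, page_b, threshold=0.9):
    """Each page is (url, text, first_seen). If the texts are near-duplicates,
    return the URL of the page that was seen first; otherwise return None."""
    if jaccard(page_a[1], page_b[1]) < threshold:
        return None
    return min(page_a, page_b, key=lambda page: page[2])[0]

# Hypothetical pages: the author wrote the text offline, and the copier
# happened to get it online (and crawled) first.
text = "the quick brown fox jumps over the lazy dog"
author = ("https://example.com/original", text, date(2011, 3, 1))
copier = ("https://example.org/copy", text, date(2011, 2, 20))

print(presumed_original(author, copier))
```

With these made-up dates, the heuristic credits the copier's page, which is exactly the failure described above.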
What if the duplicate has more incoming links/better SEO?
What if somebody publishes some content and never bothers to get more than one link to it, and then someone copies it, formats the HTML better, and gets a whole bunch of links to it?
Does it matter whether they got the links using "black hat" or "white hat" methods? For example, if they somehow got traffic to the page, and a lot of people liked it and linked to it, have they earned the search engine ranking?
What should happen: If the copy is an authorized copy, then there's really no ethical issue, and the one with the best SEO can win. If it's unauthorized, then in a perfect world, the number of inbound links or other optimizations wouldn't let it get the upper hand.
What's likely: If it's not clear who's the original, then whoever's best optimized is probably going to win.
Excerpt vs. Full Content
What if someone publishes an excerpt from someone else's content? If the search keywords are all in the excerpt, which should rank higher? Your first reaction is probably that the original deserves to rank higher. But this is where things start to get muddy.
What if the search keywords don't appear anywhere else in the original content, but do appear elsewhere on the page containing the excerpt? For example, what if the duplicated content was just an aside in the original, but someone quoted and discussed it in depth? Which do you think the searcher would be more interested in?
In a case like that, the search engine user probably wants to see the duplicate. The originator deserves credit for their work. But in this case, the most appropriate "credit" is to be properly attributed and linked to by the copy, not the higher search engine ranking.
Want to get even muddier? What if both the original and the duplicate contain their own original content containing the search terms? Should the duplicate get credit for containing the duplicated content, or should the duplicate portion be excluded from the index so that the duplicator is ranked only on the original portion of its content?
That may seem like an attractive option. But what if the search query contains words that appear only in the excerpt, and words that appear only in the original portion of the page? Omitting the duplicate content would prevent the page from being found -- not what the searcher would want.
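Here's a toy sketch of that problem. It assumes a deliberately naive matcher in which a page is found only if every query word appears in its indexed text; real search engines are far more sophisticated, and the page text and query below are invented for the example.

```python
def tokens(text):
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().split())

def matches(query, indexed_text):
    """True if every word of the query appears in the indexed text."""
    return tokens(query) <= tokens(indexed_text)

# Hypothetical page: original commentary plus an excerpt quoted from elsewhere.
commentary = "my detailed review of caching strategies"
excerpt = "memcached evicts the least recently used items"

# One query word appears only in the excerpt, the other only in the commentary.
query = "memcached review"

print(matches(query, commentary + " " + excerpt))  # whole page indexed
print(matches(query, commentary))                  # excerpt left out of the index
```

The first check succeeds and the second fails: once the excerpt is dropped from the index, the page can no longer be found for a query that spans both parts.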
Again, at this point, the duplicator should be properly crediting the originator. But other ranking factors should probably be used to determine which page would best serve the searcher. Since both contain unique content related to the query, both pages could be ranked high. If the duplicator does a better job of earning a high ranking, the originator can still benefit from click-through traffic, PageRank flow, etc.
Does a site that's doing "thin curation" deserve to be indexed? Thin curation is another way of saying aggregation without significant commentary. In other words, the curator is hand-selecting content on some topic and republishing excerpts from it, without adding more than a sentence or two of original content to introduce the excerpt.
If the curator is performing a valuable service by selecting the best content on the subject, have they not earned the right to be considered for a high ranking? Unless the search engine algorithms are doing just as good a job of selecting content, a curator's site may be the best place for the searcher to go.
As you can see, the simple question of duplicate content isn't so simple after all. As long as the amount of duplication is appropriate and proper attribution is given, there are legitimate uses for duplicate content. And when done properly, pages containing duplicated content are just as worthy of search engine rankings as the pages where the original appeared.
Which will get ranked higher in practice is a different question.
March 2nd, 2011 at 2:38 pm
I have been thinking about this issue for a while now and have even been thinking of getting a script that would not allow any of my pages to be copied.
Do you think this is a good idea, and if so would the script impact the SEO of the page?
Any other thoughts would be appreciated.
March 2nd, 2011 at 2:48 pm
I personally wouldn't worry about it -- not saying that you shouldn't, but I wouldn't.
The reason is that, even if you install a script to prevent people from manually copying and pasting content from your page, they can still view the source of the page and copy it that way, or get it from your RSS feed (if you have one for the site in question).
I would guess that most people who steal content use tools that wouldn't be affected by the script anyway. They're probably not visiting your site in a web browser, selecting content, copying it, and pasting it into their sites.
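As a rough illustration (a sketch, not any real scraper's code), here's how a tool might pull the text out of a page's HTML source without ever running the page's scripts. The sample page, the `TextExtractor` class, and the `extract_text` helper are made up for this example; only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of script and style tags."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)

# Hypothetical page that tries to block copying with JavaScript.
PAGE = """<html><body oncopy="return false">
<script>document.oncontextmenu = function () { return false; };</script>
<p>My carefully written article.</p>
</body></html>"""

print(extract_text(PAGE))
```

The copy-blocking script is never executed here. It's just more text in the source, and the extractor skips right over it.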
March 3rd, 2011 at 2:03 am
I guess you're right ...
Anyone scraping, for example, is using a piece of software to do it.
Thanks for the reply ... I decided the answer is just not to worry :)
March 19th, 2011 at 12:36 pm
Well! I agree with the point that duplicate content does not matter if there are good high-PR backlinks pointing to the page. If the original content is not optimized for SEO as well as the duplicate one, then the SERPs will prefer to show the duplicate page above the original.