When an original webpage is outranked by a page that copied its content, the author usually cries foul. At first glance, they seem to be right: the original source should get the credit for the content. But look deeper, and the question is not nearly as clear-cut as you might think.
Let's talk about what the search engines should do with duplicate content in a perfect world, and what they're likely to do in the real world.
In the simplest case, one site rips off content from another, duplicating a webpage exactly.
What should happen: The duplicate should be ignored completely.
What's likely: How can the search engines tell which page is the original and which is the duplicate? If they found one page substantially earlier than the other, the first might be assumed to be the original. That assumption will usually be correct, but not always. What if the content originated offline, and the copier simply put it online before the originator? Unless there are clues in the content itself to tie it to its source, it's a toss-up. And even then, will a search engine be able to analyze the text and find the clues?
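To make the "first one found wins" guess concrete, here is a minimal sketch in Python. It is not how any particular search engine works; the CrawledPage type, the first_seen field, and the guess_originals function are hypothetical names invented for illustration, and exact-match hashing stands in for whatever duplicate detection a real engine uses.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class CrawledPage:
    url: str
    text: str
    first_seen: int  # hypothetical: day number on which the crawler first saw the page


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing and collapsing whitespace,
    so trivially reformatted copies get the same fingerprint."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def guess_originals(pages):
    """For each group of exact duplicates, assume the earliest-seen page is the original."""
    earliest = {}
    for page in pages:
        key = fingerprint(page.text)
        if key not in earliest or page.first_seen < earliest[key].first_seen:
            earliest[key] = page
    return {key: page.url for key, page in earliest.items()}


# Made-up scenario: the author wrote the piece offline, and the copier got it online first.
pages = [
    CrawledPage("https://author.example/essay", "My essay on duplicate content.", 25),
    CrawledPage("https://copier.example/essay", "My  essay on DUPLICATE content.", 10),
]
print(list(guess_originals(pages).values()))
```

In this made-up scenario the heuristic credits the copier's URL, because nothing in a crawl date says who actually wrote the text. That is exactly the failure case described above.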
What if the duplicate has more incoming links/better SEO?
What if somebody publishes some content and never bothers to get more than one link to it, and then someone copies it, formats the HTML better, and gets a whole bunch of links to it?
Does it matter whether they got the links using "black hat" or "white hat" methods? For example, if they somehow got traffic to the page, and a lot of people liked it and linked to it, have they earned the search engine ranking?
What should happen: If the copy is an authorized copy, then there's really no ethical issue, and the one with the best SEO can win. If it's unauthorized, then in a perfect world, the number of inbound links or other optimizations wouldn't let it get the upper hand.
What's likely: If it's not clear who's the original, then whoever's best optimized is probably going to win.
Excerpt vs. Full Content
What if someone publishes an excerpt from someone else's content? If the search keywords are all in the excerpt, which should rank higher? Your first reaction is probably that the original deserves to rank higher. But this is where things start to get muddy.
What if the search keywords don't appear anywhere else in the original content, but do appear elsewhere on the page containing the excerpt? For example, what if the duplicated content was just an aside in the original, but someone quoted and discussed it in depth? Which do you think the searcher would be more interested in?
In a case like that, the search engine user probably wants to see the duplicate. The originator deserves credit for their work. But in this case, the most appropriate "credit" is to be properly attributed and linked to by the copy, not the higher search engine ranking.
Want to get even muddier? What if both the original and the duplicate contain their own original content containing the search terms? Should the duplicate get credit for containing the duplicated content, or should the duplicate portion be excluded from the index so that the duplicator is ranked only on the original portion of its content?
That may seem like an attractive option. But what if the search query contains words that appear only in the excerpt, and words that appear only in the original portion of the page? Omitting the duplicate content would prevent the page from being found -- not what the searcher would want.
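A small sketch can illustrate that trade-off. It assumes a deliberately naive matcher in which a page is found only if every query word appears in its indexed text; the matches function, the sample text, and the query are all invented for illustration and do not describe any real search engine.

```python
def tokens(text):
    """Split text into a set of lowercase words."""
    return set(text.lower().split())


def matches(query, indexed_text):
    """Naive matcher (an assumption): every query word must appear in the indexed text."""
    return tokens(query) <= tokens(indexed_text)


# Made-up page: a quoted excerpt followed by the quoting author's own commentary.
excerpt = "sourdough starter needs daily feeding"
commentary = "we tested this advice in a cold kitchen"
query = "sourdough kitchen"  # one word from the excerpt, one from the commentary

# Index the whole page: the query matches.
print(matches(query, excerpt + " " + commentary))

# Index only the page's original portion: the same query no longer matches.
print(matches(query, commentary))
```

Under these assumptions the first check succeeds and the second fails: once the duplicated excerpt is dropped from the index, "sourdough" is no longer associated with the page, so a searcher using that query would never find it.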
Again, at this point, the duplicator should be properly crediting the originator. But other ranking factors should probably be used to determine which page would best serve the searcher. Since both contain unique content related to the query, both pages could rank highly. If the duplicator does a better job of earning a high ranking, the originator can still benefit from click-through traffic, PageRank flow, etc.
Does a site that's doing "thin curation" deserve to be indexed? Thin curation is another way of saying aggregation without significant commentary. In other words, the curator is hand-selecting content on some topic and republishing excerpts from it, without adding more than a sentence or two of original content to introduce the excerpt.
If the curator is performing a valuable service by selecting the best content on the subject, have they not earned the right to be considered for a high ranking? Unless the search engine algorithms are doing just as good a job of selecting content, a curator's site may be the best place for the searcher to go.
As you can see, the simple question of duplicate content isn't so simple after all. As long as the amount of duplication is appropriate and proper attribution is given, there are legitimate uses for duplicate content. And when done properly, pages containing duplicated content are just as worthy of search engine rankings as the pages where the original appeared.
Which will get ranked higher in practice is a different question.