We know that duplicate content is bad, and that Google can punish, and has punished, sites for it. Furthermore, the recent Google Panda algorithm update hit sites harder than ever for duplicate content, even when that duplicate content exists on only a minority of the site’s pages.
But there are two different kinds of duplicate content: content duplicated within a domain, and content duplicated across domains. It’s very important to know the difference, because Google treats each kind differently.
Neither is good, but only one is truly bad.
What is Not Duplicate Content
First off, it’s worth noting that by a strict definition, every single site out there has duplicate content. You have your top and/or side navigation that is the same on every page, or the footer and copyright notice, etc.
Also, any time you quote an excerpt of another page (including a quote that someone else made), you’re duplicating content from another website.
These kinds of duplicate content are a normal part of a website’s structure and, in fact, improve the user experience (users like things like navigation). Google is not going to punish your site for minor amounts of duplicate content of this sort.
In general, when we’re talking about duplicate content, we’re talking about a substantial amount of duplicated content. No one’s sure what the exact percentage is: some sources say you can have as much as 50% duplicate content and be fine, while others place that number lower. Certainly, the secret is to have as little duplicate content as possible while still maintaining a good user experience.
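If you want a rough number for how much of one page repeats another, here is a minimal sketch. It compares overlapping five-word sequences ("shingles") between two texts. The function names and the shingle size are my own choices for illustration, and nothing here is claimed to be how Google measures duplication.

```python
import re


def shingles(text: str, size: int = 5) -> set:
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat the whole thing as one unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def duplicate_share(page: str, other: str, size: int = 5) -> float:
    """Fraction (0.0 to 1.0) of page's shingles that also appear in other."""
    own = shingles(page, size)
    if not own:
        return 0.0
    return len(own & shingles(other, size)) / len(own)
```

Run it on the main body text only, with navigation and footers stripped out, since that boilerplate is the harmless kind of duplication described above.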
Duplicate Content Within Your Site
The first kind of duplicate content is duplicating your own content across several pages within your domain. This most commonly occurs on ecommerce sites that have, for example, a dozen versions of the same product in different sizes or colors, or with some other minor difference that, for whatever reason, isn’t set up as a drop-down option. Each of those products has its own page, and the descriptions are nearly identical.
This kind of within-site duplicate content is not something that Google is going to punish you for (as long as it’s all on the same domain). Google will not down-rank your site because of this kind of duplicate content.
Generally what’s going to happen is Google will gather up all the pages that it thinks are duplicates and put them in a stack. Then Google will choose which of those pages it thinks is most relevant, and that’s the only one it will return as a result.
Again, Google is almost certainly not going to hurt your site’s rankings (though, in theory, if Google thought you were doing it in a way to manipulate search engine results it might — but I’ve never seen any reported case of that).
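To get a feel for the "stack" behavior described above, here is a toy sketch that groups near-identical pages and keeps one per group. Everything in it is an assumption for illustration: the word-overlap (Jaccard) measure, the 0.8 threshold, and especially the "longest page wins" rule, which is only a stand-in for whatever relevance signal Google actually uses.

```python
import re


def word_set(text: str) -> frozenset:
    """Lowercased set of words in text."""
    return frozenset(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: frozenset, b: frozenset) -> float:
    """Shared words divided by total distinct words."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def stack_duplicates(pages: dict, threshold: float = 0.8) -> list:
    """Greedily group URLs whose text is near-identical to a stack's first page."""
    stacks = []
    for url, text in pages.items():
        words = word_set(text)
        for stack in stacks:
            if jaccard(words, word_set(pages[stack[0]])) >= threshold:
                stack.append(url)
                break
        else:
            # No existing stack was similar enough, so start a new one.
            stacks.append([url])
    return stacks


def pick_representative(stack: list, pages: dict) -> str:
    """Hypothetical tie-breaker: keep the page with the longest text."""
    return max(stack, key=lambda url: len(pages[url]))


pages = {
    "/shirt-red": "Soft cotton t-shirt with a crew neck, available in red",
    "/shirt-blue": "Soft cotton t-shirt with a crew neck, available in blue",
    "/mug": "Ceramic coffee mug that holds twelve ounces",
}
stacks = stack_duplicates(pages)
shown = [pick_representative(stack, pages) for stack in stacks]
```

Here the two shirt pages land in one stack and the mug gets its own, so only two of the three pages would be "shown".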
Not bad, but not good
While this kind of duplicate content isn’t going to get you punished by Google, it’s still not ideal. Since Google is going to pick just one of those pages to rank, you’ll only hold one position in Google’s search engine results. If the text on all of those product pages were substantially different, it’s likely that you’d see several of them show up one after the other.
The only thing better than holding the number one search result in Google is holding the number one, two, and three search results!
Duplicate Content Across Multiple Domains
The other kind of duplicate content is the scary kind that attracts the Google banhammer like the inexorable force of gravity: content duplicated across domains.
Here we’re talking about taking the content from another site, copying most or all of it, and putting it on your own site. Sure, you might have some introductory text or your own ads/sales items there, but if it’s substantially duplicated from another domain, you could be in big trouble.
In this case Google fears that your site might be a scraper site, automatically scraping the content from other sites and reposting it as your own in hopes of generating ad revenue. And even if you aren’t, Google figures that the user is better off getting that information from the source, rather than reprinted in whole somewhere else.
This is the kind of duplicate content that can hurt your rankings, and even get your site removed from Google’s index. Furthermore, in the Panda update Google has told us that having this kind of duplicate content on even a minority of pages on your site can hurt the rankings of your site as a whole.
Danger for ecommerce sites
The danger for ecommerce sites is that a lot of them actually have this kind of cross-site duplicate content without even thinking about it, specifically in their product descriptions.
If you’re a distributor or retailer that purchases products and resells them, do not just copy and paste the manufacturer’s descriptions!
Doing this will quickly make your product pages nothing but duplicate content: not just duplicated from the manufacturer (which Google will likely know is the original source), but also duplicated on every other reseller’s page that’s taking the lazy way out of writing product descriptions. You could end up getting your entire site down-ranked if you have too many of these pages.
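One way to catch copied descriptions before they go live is a quick audit script. The sketch below assumes you can export two hypothetical mappings of SKU to text, one for your own descriptions and one for the manufacturer’s. It uses Python’s standard `difflib.SequenceMatcher`, and the 0.9 threshold is an arbitrary starting point, not a figure from Google.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Match ratio in [0, 1], ignoring case and extra whitespace."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def flag_copied_descriptions(ours: dict, manufacturer: dict,
                             threshold: float = 0.9) -> list:
    """Return SKUs whose description nearly matches the manufacturer's text."""
    flagged = []
    for sku, text in ours.items():
        source = manufacturer.get(sku)
        if source is not None and similarity(text, source) >= threshold:
            flagged.append(sku)
    return flagged


# Made-up sample data, for illustration only.
ours = {
    "A1": "Durable steel  water bottle.",
    "B2": "Our favorite bottle for long hikes.",
}
manufacturer = {
    "A1": "Durable steel water bottle.",
    "B2": "Insulated 32 oz bottle with screw cap.",
}
to_rewrite = flag_copied_descriptions(ours, manufacturer)
```

Any SKU that comes back flagged is a candidate for a rewritten description.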
Be sure that your content is unique. That means making sure your product descriptions are unique too, and not just copied from the manufacturer, lest your whole site suffer.