While Google’s Matt Cutts has certainly provided a wealth of helpful tips via the company’s Webmaster Central YouTube channel, he is not the only one to do so. Greg Grothaus of the Search Quality Team has posted a video (along with a presentation on the Webmaster Central Blog) covering duplicate content and multiple site issues that webmasters continue to face when trying to rank well in Google.
Greg begins by clearing up a popular myth about duplicate content: that Google penalizes sites for having it. This is not the case. That’s not to say that duplicate content can’t have a negative impact on your rankings, but Google itself is not penalizing you for it.
Have you believed that Google penalizes sites for having duplicate content? Comment here.
Greg says people see a message from Google about omitted results and conclude that their content has been dropped from Google’s results altogether, when in fact it may only be omitted for that particular query. Greg stresses that duplicate content is simply a factor on a "by query" basis.

"What’s actually happening, is that we’re looking at the query that the user’s doing, and we’re saying that we want diversity in the results we’re going to show a user," says Grothaus. He says those who think their content is being omitted because it is duplicated will likely find that it shows up in results after all if they adjust their query to reflect the missing piece more specifically.
Google recognizes that most duplicate content is not created to be deceptive. There are of course exceptions, which are considered spam. Even spam sites aren’t being penalized for having duplicate content, though, Grothaus says. They’re being penalized for being spam. He compares it to bold tags: some spammers use them, but Google doesn’t penalize people just for using them. And it doesn’t penalize people just for having duplicate content.
Duplicate Content:
- example.com/
- example.com/?
- example.com/index.html
- example.com/Home.aspx
- www.example.com/
- www.example.com/?
- www.example.com/index.html
- www.example.com/Home.aspx
The above list from Grothaus’s presentation shows examples of URLs that are different but show the same content. Google will recognize that they’re the same and will try to pick the right one (although sometimes it picks the wrong one). Greg says webmasters are in the best position to know which one is best, so it helps to use only one.
You will not be penalized for using more than one, but some issues can arise that may negatively affect your rankings. For one, your link popularity will be diluted: backlinks pointing to several different URL versions of the same content make it harder to accumulate link juice for one URL. Greg says that user-unfriendly URLs in search results may offset branding efforts and decrease usability as well. Plus, with multiple versions of the same thing, Google will spend more time crawling the same content, meaning it will have less time to go deeper into your site, and you run the risk of some content not getting indexed.
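To make the idea of "only use one" concrete, here is a minimal Python sketch of how a site owner might map the eight variants in the list above to a single preferred URL. It is only an illustration, not how Google itself decides: the preferred host (www.example.com), the http scheme, and the set of default document names are all assumptions chosen to match the example list.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: on this hypothetical site, these file names all serve the home page.
DEFAULT_DOCUMENTS = {"index.html", "home.aspx"}
# Assumption: the site owner has chosen the www host as the preferred one.
PREFERRED_HOST = "www.example.com"


def canonicalize(url: str) -> str:
    """Map known duplicate URL variants to one preferred URL."""
    if "://" not in url:
        url = "http://" + url  # the list above omits the scheme
    parts = urlsplit(url)

    host = parts.netloc.lower()
    if host == "example.com":
        host = PREFERRED_HOST

    path = parts.path or "/"
    # Strip a trailing default document such as /index.html or /Home.aspx.
    head, _, tail = path.rpartition("/")
    if tail.lower() in DEFAULT_DOCUMENTS:
        path = head + "/"

    # A bare "?" parses to an empty query and is dropped on reassembly;
    # a non-empty query is kept because it may select different content.
    # The fragment is always dropped.
    return urlunsplit((parts.scheme, host, path, parts.query, ""))
```

Run over the eight URLs listed above, this sketch returns http://www.example.com/ for every one of them, which is the single URL you would then link to consistently.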
Fixing the Issues
To avoid such issues, Grothaus suggests using a "canonical" version of the URL, meaning the simplest, most significant form. He says to pick one for each page and link consistently within your site. You can also use the rel="canonical" link element, which Matt Cutts explains in a separate video clip.
Rules for rel="canonical"
There are rules for the rel="canonical" link element to consider. For one, it should only be used between pages that are on the same domain. It works across different hosts, so blog.webpronews.com could suggest www.webpronews.com as a canonical URL, but it doesn’t work across domains, so www.webpronews.com couldn’t suggest www.smallbusinessnewz.com.
You can use the element across protocols, such as http:// vs. https://, and you can use it across ports. Pages don’t have to be identical, but they should be similar. Slight differences are OK. You don’t have to use the rel="canonical" link element. It is just another option, or "another tool in your arsenal," as Grothaus says.
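The rules above can be sketched as a small Python check. This is a rough illustration under stated assumptions, not Google's actual logic: the helper names are made up for this example, and the "same domain" test naively compares the last two labels of the host name, which would be wrong for suffixes such as .co.uk (a real check would need the Public Suffix List). The link element itself belongs in the page's head section.

```python
from html import escape
from urllib.parse import urlsplit


def registered_domain(host: str) -> str:
    """Naive assumption: the domain is the last two labels of the host name."""
    return ".".join(host.lower().split(".")[-2:])


def may_suggest_canonical(page_url: str, canonical_url: str) -> bool:
    """True if both URLs are on the same domain, per the rule described above.

    Differences in host (subdomain), protocol, and port are allowed.
    """
    page_host = urlsplit(page_url).hostname or ""
    canonical_host = urlsplit(canonical_url).hostname or ""
    return registered_domain(page_host) == registered_domain(canonical_host)


def canonical_link_tag(canonical_url: str) -> str:
    """Build the link element to place in the duplicate page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}" />'
```

With these helpers, a page on blog.webpronews.com pointing at www.webpronews.com passes the check, while www.webpronews.com pointing at www.smallbusinessnewz.com does not.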
Another option is to make all non-canonical URLs do a permanent (301) redirect to the canonical (or preferred) URL. In addition, in Google’s Webmaster Tools, you can specify whether you prefer the www or non-www version of your domain. 301 redirects are also commonly used when moving sites.
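As a rough sketch of the redirect option, the Python below uses the standard library's http.server to answer requests for a non-preferred host with a 301 pointing at the preferred one. The preferred host (www.example.com), the http scheme, and the handler and function names are assumptions for illustration; in practice this is usually configured in the web server itself rather than in application code.

```python
from http.server import BaseHTTPRequestHandler
from typing import Optional

# Assumption: the site owner prefers the www host over plain http.
CANONICAL_HOST = "www.example.com"


def redirect_target(host: str, path: str) -> Optional[str]:
    """Return the URL to redirect to, or None if the host is already canonical."""
    hostname = host.split(":")[0].lower()  # ignore any port in the Host header
    if hostname == CANONICAL_HOST:
        return None
    return f"http://{CANONICAL_HOST}{path}"


class CanonicalRedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: 301 for non-canonical hosts, 200 otherwise."""

    def do_GET(self):
        target = redirect_target(self.headers.get("Host", ""), self.path)
        if target is not None:
            self.send_response(301)  # permanent redirect
            self.send_header("Location", target)
            self.end_headers()
            return
        body = b"canonical page\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, one could pass the handler to http.server.HTTPServer and call serve_forever(), then request the page with different Host headers.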
Multiple Domains
Lastly, Grothaus discusses multiple domains. This is in reference to when you have content for different audiences, such as by country, language, etc.
There are concerns here. You have to consider your reputation being distributed across multiple domains, and Google will only show what it perceives to be the best page for a particular query.
One factor that often goes overlooked is that with multiple domains, you’re potentially losing the advantage of Google’s tabbed user interface. You know how sometimes search results are expandable and point you to different links within the site? If your content is spread out across multiple domains, you may be missing extra clicks, because Google can’t link to another domain here.
Grothaus explains all of the above and elaborates on each point in a fifteen-minute video. The information is based on his presentation from the recent Search Engine Strategies conference in San Jose.
We also did our own interview with Grothaus at SES.
Did this information clear up any misconceptions you had about duplicate content? Let us know.

Sounds like a penalty to me… Not like a “Black Flag”, just a website which has duplicate content.
Exactly how can you start an article calling it a myth and then continue to explain how it negatively affects a website and its potential?
What’s next? Are you going to try and tell me that low page rank will not cause my carbon copies to group together as similar in the search results?
I think the point Grothaus is trying to make, is that Google is not penalizing you, but you are penalizing yourself if you don’t take the proper precautions, which are available to you.
The “myth” is that it is a “penalty.”
You know I think I finally understand… It might have been my lack of sports lingo comprehension…
Of course it’s not a penalty, as in… it’s not like the guy in stripes who judges hockey games is going to blow the whistle and put you in the penalty box…
You’re just not living up to your full potential. To me… I always thought that was a penalty… but I think my terminology or understanding has been off…
I think these debates need to start being framed by a business ethics consideration as well as a purely technical framework. What do we want our web presence to do? – We want to attract potential customers who will then buy from us if they are satisfied by our proposition. How do we best do this – we set our stall with passion but without misleading claims. I have been running my own commercial websites for 12 years now. I have always found that good design, relevant information, relentless but accurate promotion through writing, and keeping an eye open to fresh, legitimate and differentiated openings has consistently kept me very visible. “Duplicate Content” – sounds like a scam tactic to me; don’t do it!
This helps put my mind at rest. Our primary business is as a letting agent but, like a number of property portals, we draw in real estate from a large number of sources and it is inevitable that several kinds of duplication are going to occur in this type of business. It’s nice to know that this business model is not automatically scored down. Richard Francis.
Yes, as Grothaus says in the video, there are plenty of times when duplicate content is legitimate, and Google recognizes that more often than not, it is actually not being produced to be deceptive. This is right in their webmaster guidelines, which he provides a slide for in his presentation.
Nice Post,
Thank you very much for this useful information.
I think that’s still the same as Google penalizing those with duplicate content; it just proves that content is king. You should always use unique and useful content in order to rank higher in search engines. Canonical is new to me, thx for the great info… will bookmark this!
“That’s not to say that duplicate content can’t have a negative impact on your rankings, but Google itself is not penalizing you for it.”
Doesn’t ‘negative impact on rankings’ = penalizing?
…it is good to know they’re not penalizing directly.
The issue still remains, if your rankings are suffering from duplicate content, it’s definitely an issue to resolve.
Another element of duplicate content is the joys of scraper sites that use your content to outrank you (ex: in the case of a new site publishing…up against a slightly aged scraper site).
Ok, that’s my 1.5c on the topic.
“That’s not to say that duplicate content can’t have a negative impact on your rankings, but Google itself is not penalizing you for it.”
Although technically speaking, Google may not be applying a formal penalty, the fact that duplicate content can negatively affect a site’s rankings implies that at least a pseudo-penalty, or the equivalent of a penalty, is applied to a page.
The best thing that you can do is to treat duplicate content as if it will create a penalty. If it walks like a duck, and quacks like a duck…
negative impact = not good.
do you want not good?
Many people at Google have said over and over that there is no duplicate content penalty. The site that’s first or the one with the most authority will get the PR credit and the rest will get little or none. I suppose this will never go away.
If there were a duplicate content penalty Google would have almost nothing to list.
In my experience: I have 2 sites that have about the same content; some sentences are exactly the same. The older of the 2 sites had the content first, and I built the 2nd, newer site with the same content. I then changed the content on the older site, and it dropped in PR while the new one got a better PR. A strange example, but the site that had the content first should get the credit. Not in this example, though.
This is no surprise to me.
If anything Google should do a better job of recognizing this.
I often repeat information or even quote myself in my websites or blogs (often to help the reader understand a subject better); however, I often link back to the source page, the origin of most of what is written, rather than the snip.
I also link back to others I quote, both to give a reader better information and simply because this is the legal/right thing to do.
Unfortunately Google does not recognize this (by choice in my view), so as to place Google Adsense/Adwords on Plagiarized content of mine and others.
Agreed. It’s not a penalty as much as it’s a lack of optimization.
Multiple pages with the same content may not be penalized, but they will not perform as well as if each has unique content.
Sorry, but if a page is poorly optimized it would never rank well. If it is optimized enough to rank well for a series of different search terms, it should still rank for those same search terms after someone steals the content verbatim, either in whole or in part. Optimization has little to do with it. I’ve seen pages with no coding errors rank #1 one day and be pushed to the bottom of the Google deck the next after the content is stolen. After this happens, the original document ranks lower than the duplicates, even for searches restricted to specific, lengthy strings of text from the original article.
I had a site banned from Google for duplicate content. I had to get every single instance of duplication out of google, including sites that had duplicated cache showing up as part of their linking process. I fought for 6 months to get google to unban the site. I lost trusted status and have never gotten back to the #1 position that I was in before. It was a combination of people stealing my pages to usurp my position, a travel site partner that used the same title about 500 times, the weird cache linking, and a few lazy souls that just used my content. It was an awful mess. Keep track of your site.
I’ve been doing this since 1997 and frankly all any of you “experts” do is convince people their local webmaster is stupid when the truth is none of you ever do any better when you charge 100 times more.
I keep posting mostly the same comments almost everywhere about my http://www.travelmastery.com. IS THAT BAD?!
This is very interesting information and I’m glad to hear it, but how does Google find the original if there is duplicate content out there? I have over 200 articles on my safety and security web site forum that anyone can copy and paste to their blog or web site, and then my original content would be duplicated without my knowledge. Do they go by published date and exclude the subsequent copies? Just curious. Thanks, Chris.
I have to admit that I’ll always use the word penalty when talking to customers because they usually understand this. I did have one guy who I spent about a year talking into getting rid of duplicated content before he got it, and now that he’s started getting rid of it he’s reaping the rewards.
Damned right that a hell of a lot of non-original material on the web isn’t done intentionally. Just had another customer who populated his website with content provided by his franchise. Even his domain name dropped out of the SERPs, but after a content re-write and a re-inclusion request he was back in first place for what he wanted in about 10 days.
Most often it’s companies with two websites that are clones, and both websites only end up performing half as well as they should, or worse. In one pretty bad scenario recently, a customer started off with 2 websites with unique content, one of which was a subdomain of the other, and they were strongly interlinked. During a website migration their techy decided to make them clones of each other because he thought this might stop them dropping out of the SERPs during the migration. The one with the older domain is getting picked up for everything, but this was the weaker site and the newer site had all the main target keywords on it, not to mention that there is no longer any interlinking. They’ve fallen off the radar for everything that matters and I’m still picking up the pieces. Aggh.
In the context of penalty (not link juice, seo effectiveness, etc) this duplicate content scare never made sense to me. Not given what I know about information science and what search engines are ultimately trying to do.
At the lowest level the alarm goes out that submission of the same article to numerous sites will cause the submitter (presumably) to be penalized. Okay… why not everyone else who has the duplicate content? Additionally, it is contrary to the logic behind syndication.
Reuters publishes a phenomenal body of content that’s republished on news sites all over the globe. Surely they should be penalized before the webmaster who has stuff in 12 directories.
In its simplest form, Google’s mission to index the web mimics that of the US Library of Congress, just with a greater (impossible?) magnitude of complexity. That’s why a fourth grader attains functional knowledge of the library system’s information architecture as soon as s/he learns to use the card catalog, and we’re still confused about Google.
Imagine the Chicago library being penalized in some way because all the Milwaukee branch libraries have duplicate copies of the same book it has. Google sounds to me like they are only penalizing libraries that have more than one copy of the same book.
In a web context it makes sense because 1000s of readers can check out the single copy simultaneously so there’s no need for duplicate copies. Duplicate content on the same domain, obviously, has the potential for abuse.
If I’m missing something, or I’ve oversimplified this, I’m open to instruction. Until then… I still don’t buy it.
With many internet marketers writing articles and submitting them to article directories, it’s assumed that there will be tons of duplicate content. All the search engines are doing is rewarding the content that has the most juice linked to it, so it’s actually a competitive filter rather than a duplicate content penalty.
Excellent article. I’ve been a little confused about the entire ‘duplicate content’ issue up until now but this article puts it all in perspective.
From my experience, they do not penalize. They will just de-index the duplicate content if it’s blatant. As soon as you fix it, you’re back in immediately.
I’m sorry – but this doesn’t really bring anything new to the duplicate content question. It certainly doesn’t ‘bust’ any myths.
Basically if you clone a site and try to present it on multiple domains Google will only rank one iteration of that content. If Google feels like you are trying to fool the SE they may choose to ban all iterations. If it’s a simple matter of duplicate content additional versions will be placed in the supplemental index and will get no visibility. If one chooses to say that this is not a penalty – so be it, but clearly duplicate content is penalized – otherwise a single well optimized set of content, duplicated across multiple domains would result in multiple top SERP’s with each site scoring separately for the same content. As we all know, that doesn’t happen.
One note – this holds true for competitive keyword terms – but does not hold true for content specific searches. A for instance? Searching for the phrase ‘Google Busts the Duplicate Content Myth’ will show multiple iterations of this article – a search for the term ‘duplicate content’ produces 10 different page 1 results.
Hi, I saw a service called article automation where we can have our articles distributed to over 10k blogs at slow intervals, naturally, for a fee which we pay per month. So, in this case, will having the same article in, say, 10 different versions on 10k blogs get the sites linked inside it penalized by Google?
Please reply and help me. Thanks.
Hi Dude,
The question you raised is a good one. I also need the answer to the same question. I am expecting that Google will consider each link differently, but I am not sure. As I can see in the above article, it suggests that Google will not penalize us if we have links from the same duplicate content. But my suggestion is that it’s better not to go with 10K similar articles. That may lead to over-SEO, which may do some harm to your website.
My best suggestion for getting links to your website is that you create your own websites, host them in different places, and use those websites for link building. And one more thing: Google likes dynamic websites, so try to develop more dynamic websites rather than basic HTML websites.
This is a very big surprise, we have three real estate sites and trying to change the content, keywords etc of each is quite a task. However it is necessary to provide the internet searcher with more choice.
I often read articles on blogs or websites explaining that a blog with duplicate content will be removed from Google’s search index. Not only that, the blog and domain can also get a penalty from search engines if proved to have duplicate content. Yet I often find blogs with duplicate content whose ranking and PageRank are higher than those of blogs that use original content. Some say blogs with duplicate content are rated low by Google in search results. http://www.suround.com is for sale at a fixed price. If you seriously want to buy, contact me as soon as possible.
I’ve been hassling the people over at Google’s support forum for months about this issue (as well as posting here on the subject).
The website I manage publishes original content (articles), which is almost immediately stolen and republished on other websites. Within months the articles can be appearing on scores if not hundreds of websites. And, as the site has been running for over 10 years, we can only guess at how many of our articles have been plagiarised in this way.
Now, here’s the thing: the website in question (I won’t name it in case I’m accused of spamming) was a PR6 site until it began to publish articles. Then it dropped to PR5. Then PR4. And now it is a PR3 site, despite the fact that it has been improved and added to over the years, and the number of links in to it has increased (to more than 2,000).
I could never figure out why the PR was going down, while the site itself was getting better and better, with more and more content and an increasing number of inward links.
Then it occurred to me that if Google is penalizing sites for duplicate content, does this mean that it is also penalizing the sites on which this content originated? In other words, does G know how to distinguish between an original article and the ripped-off version?
No-one at Google appeared to be able or willing to answer this question for me.
So now we have this “clarification” – which, in Google’s typical obfuscatory style, is anything but clear.
We are now being told, after years of being told – or at least led to believe otherwise – that G does not penalize sites for publishing duplicate content. If DC is not penalized, G certainly did a good job of convincing us that this was the case.
And this denial (of sorts) is even more interesting, bearing in mind the numerous websites that have been de-indexed (let’s be honest, put out of business in many cases) by G on the grounds that they carried duplicate content.
As usual, G is now “finessing” (or as we say here, “Obamatizing”) the issue. Duplicate content is not penalized, but “that’s not to say that duplicate content can’t have a negative impact on your rankings”.
Now this is very interesting, because what G is essentially saying here is that the default position is to have zero G ranking. Any ranking that G gives a site is a bonus. A reward for being obedient to G. Which means that if G takes away this reward, it isn’t actually imposing a penalty. It is merely “de-rewarding” the website in question for “failing” to comply with G’s rules.
In the G scheme of things, websites have no intrinsic value other than that attributed to them by G!
So, if your website is de-indexed by G because you published duplicate content (even if, as far as we know, you wrote this content yourself), and as a result the traffic to your site ceases overnight and you go out of business within a few weeks, you can console yourself with the fact that, sure, you and your family may be starving to death, but at least you weren’t actually *penalized* by G.
Apologies for the spelling errors… rushing out to keep an appointment.
We’ve been facing the exact same issue. Our site also is quite old. Our content is regularly stolen from us. We spent several years trying to figure out why we didn’t rank in Google any more. We posted in their forums and didn’t get any kind of answer whatsoever. Then we discovered that massive amounts of our content had been replicated elsewhere. We spent a lot of time getting the duplicates removed. Our site came back. Then a bunch of things were stolen. Our site disappeared. We’ve been on this insane seesaw ride for a few years now. It just repeats endlessly every few months. Meanwhile, pages of new original content (our site is ALL original content) don’t accrue page rank while we are in exile. Because of this, our overall site page rank is going down. It just doesn’t end. We stopped publishing our site feed because people, including what appear to be legitimate aggregator businesses, were scraping it and republishing our content without our permission. Even when we went from full feed to just a link/description, they just followed the links and scraped from our site directly. We cut back our e-mail newsletter to summaries because people were taking the partial articles straight from our e-mail newsletter and putting them up on blogspot blogs and other spam sites. The duplication hurt US, not them. It doesn’t even have to be our full articles that are stolen. Often just a paragraph or two does the trick. With Google, it’s impossible to be a content-producing small business and succeed. Google will screw you at every turn.
Thanks Webpronews,
I always watch & read all your new news pages etc., as they are always up-to-the-minute fresh on ideas, especially when you’re interviewing people like Matt Cutts above.
Great stuff!
Thanks a lot.
ps: My web guy doesn’t like you though, as he has a full-time job changing & correcting things for me!
Cheers
GFS
garden furniture sets
That was a great article and clearly answered a number of nagging questions and filled in a lot of gray areas for me.
I agree with Mike about this site. Actually I’m just looking for other stuff, but I just read and read and now I’m here. Thanks a lot, guys.
1) Is it better to block Google from indexing category pages, tag pages, and archives because of the chance that Google will pick the wrong version of the data? Or is it better to let Google crawl through and figure it out?
2) Is it better to use RSS etc. to put your content on other sites for more inbound links/website traffic? Or is it better to eliminate/minimize where your RSS feeds go so that it doesn’t appear to be duplicate content?
Kenneth Udut
http://free.naplesplus.us – naples florida news, jobs, for sale, business directory, events videos
— who wants to experiment with Semantic Linking some more but is afraid of dup-content problems.
It is true, there is a myth running around the Internet that Google punishes sites for having duplicate content. So many webmasters (me!) avoid putting duplicate content on their own sites. Then it’s a long job to make fresh and unique content for each page and each site.
Thanks a lot for breaking this stupid myth.
It is a little bit ambiguous at the best of times. How do I figure out what I am supposed to do? Some guy on one site says make sure a lot of words are in bold type, and now I read above not to do that. The site thing and the canonical naming structure I was aware of, but the rest is a mystery. How does Google expect a real business with a valid product to sell to get ahead of the pack if the pack knows all the blackhat ways to get around everything, since they have been in the business so long that they are experts at getting their websites listed?