Google Penguin Update: Don’t Forget About Duplicate Content

A classic quality guideline


There has been a ton of speculation regarding Google’s Penguin update. Few know exactly what the update does or how it works with Google’s other signals. Google always keeps its cards close to its chest.

“While we can’t divulge specific signals because we don’t want to give people a way to game our search results and worsen the experience for users, our advice for webmasters is to focus on creating high quality sites that create a good user experience and employ white hat SEO methods instead of engaging in aggressive webspam tactics,” Google’s Matt Cutts said in the announcement of the update.

He also said, “The change will decrease rankings for sites that we believe are violating Google’s existing quality guidelines.”

“We see all sorts of webspam techniques every day, from keyword stuffing to link schemes that attempt to propel sites higher in rankings,” he said. To me, that indicates that this is about all webspam techniques – not just keyword stuffing and link schemes, but also everything in between.

So it’s about quality guidelines. Cutts was pretty clear about that, and that’s why we’ve been discussing some of the various things Google mentions specifically in those guidelines. So far, we’ve talked about:

Cloaking
Links
Hidden text and links
Keyword stuffing

Another thing on the quality guidelines list is: “Don’t create multiple pages, subdomains, or domains with substantially duplicate content.”

Of course, like the rest of the guidelines, this is nothing new, but in light of the Penguin update, it seems worth examining the guidelines again, if for no other reason than to provide reminders or educate those who are unfamiliar. Duplicate content seems like one of those guidelines that could get sites into trouble even when they aren’t intentionally trying to spam Google. Even Google says in its help center article on the topic, “Mostly, this is not deceptive in origin.”

“However, in some cases, content is deliberately duplicated across domains in an attempt to manipulate search engine rankings or win more traffic,” Google says. “Deceptive practices like this can result in a poor user experience, when a visitor sees substantially the same content repeated within a set of search results.”

Google lists the following as steps you can take to address any duplicate content issues you may have:

  • Use 301s: If you’ve restructured your site, use 301 redirects (“RedirectPermanent”) in your .htaccess file to smartly redirect users, Googlebot, and other spiders. (In Apache, you can do this with an .htaccess file; in IIS, you can do this through the administrative console.)
  • Be consistent: Try to keep your internal linking consistent. For example, don’t link to http://www.example.com/page/ and http://www.example.com/page and http://www.example.com/page/index.htm.
  • Use top-level domains: To help us serve the most appropriate version of a document, use top-level domains whenever possible to handle country-specific content. We’re more likely to know that http://www.example.de contains Germany-focused content, for instance, than http://www.example.com/de or http://de.example.com.
  • Syndicate carefully: If you syndicate your content on other sites, Google will always show the version we think is most appropriate for users in each given search, which may or may not be the version you’d prefer. However, it is helpful to ensure that each site on which your content is syndicated includes a link back to your original article. You can also ask those who use your syndicated material to use the noindex meta tag to prevent search engines from indexing their version of the content.
  • Use Webmaster Tools to tell us how you prefer your site to be indexed: You can tell Google your preferred domain (for example, http://www.example.com or http://example.com).
  • Minimize boilerplate repetition: For instance, instead of including lengthy copyright text on the bottom of every page, include a very brief summary and then link to a page with more details. In addition, you can use the Parameter Handling tool to specify how you would like Google to treat URL parameters.
  • Avoid publishing stubs: Users don’t like seeing “empty” pages, so avoid placeholders where possible. For example, don’t publish pages for which you don’t yet have real content. If you do create placeholder pages, use the noindex meta tag to block these pages from being indexed.
  • Understand your content management system: Make sure you’re familiar with how content is displayed on your web site. Blogs, forums, and related systems often show the same content in multiple formats. For example, a blog entry may appear on the home page of a blog, in an archive page, and in a page of other entries with the same label.
  • Minimize similar content: If you have many pages that are similar, consider expanding each page or consolidating the pages into one. For instance, if you have a travel site with separate pages for two cities, but the same information on both pages, you could either merge the pages into one page about both cities or you could expand each page to contain unique content about each city.
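
To make the “Use 301s” and “Be consistent” advice more concrete, here is a minimal Python sketch of one way a site might collapse equivalent URL variants into a single form and decide when a permanent redirect is needed. Everything named here is an assumption made for illustration: PREFERRED_HOST, BARE_HOST, INDEX_FILES, and both function names are hypothetical, and the trailing-slash policy is just one possible choice. The redirect itself would still be issued by your server (for example through .htaccess on Apache); this only computes the target.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical site policy for this sketch: one preferred host,
# directory-style paths end in a slash, and index files are dropped.
PREFERRED_HOST = "www.example.com"
BARE_HOST = "example.com"
INDEX_FILES = ("index.htm", "index.html")


def canonicalize_url(url):
    """Map equivalent URL variants onto one preferred form."""
    scheme, host, path, query, _fragment = urlsplit(url)
    host = host.lower()
    # Treat the bare domain and the www host as the same site.
    if host == BARE_HOST:
        host = PREFERRED_HOST
    # Drop a default index file name: /page/index.htm -> /page/
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break
    # End directory-style paths with a slash: /page -> /page/
    last_segment = path.rsplit("/", 1)[-1]
    if not path.endswith("/") and "." not in last_segment:
        path += "/"
    return urlunsplit((scheme, host, path, query, ""))


def redirect_target(url):
    """Return the URL to 301-redirect to, or None if already canonical."""
    canonical = canonicalize_url(url)
    return canonical if canonical != url else None
```

With this policy, the three variants from Google’s example (/page/, /page, and /page/index.htm) all resolve to http://www.example.com/page/, so internal links can be generated from canonicalize_url and any request for a non-canonical variant can be redirected permanently.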

Don’t block Google from duplicate content. Google advises against this, because it won’t be able to detect when URLs point to the same content, and will have to treat them as separate pages. Use the canonical link element (rel="canonical") instead.

Note: there are reasons why Google might skip your canonical link elements.
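
If you want to check what your own pages declare, a small script can help. The following is a minimal Python sketch, using only the standard library’s html.parser, that pulls the canonical link element out of a page’s HTML. The class and function names (CanonicalFinder, find_canonical) are hypothetical, and treating conflicting declarations as “no usable canonical” is an assumption of this sketch, not a description of how Google handles them.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated values, in any case.
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.canonicals.append(attr["href"])


def find_canonical(html):
    """Return the single canonical URL, or None if missing or conflicting."""
    finder = CanonicalFinder()
    finder.feed(html)
    unique = set(finder.canonicals)
    return unique.pop() if len(unique) == 1 else None
```

Running find_canonical over the HTML of each duplicate URL and comparing the results is a quick way to spot pages that declare no canonical at all, or that declare more than one.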

It’s important to note that Google doesn’t consider duplicate content to be grounds for a penalty unless it appears to have been used in a deceptive way or to manipulate search results. However, this seems like one of those areas where an algorithm might leave room for error.

Here are some videos with Matt Cutts (including a couple of WebProNews interviews) talking about duplicate content. You should watch them, if you are concerned that this might be affecting you:

This one comes from Google’s Greg Grothaus rather than Cutts. Also worth watching:

If you think you’ve been wrongfully hit by the Penguin update, Google has a form you can fill out to let them know.

More Penguin update coverage here.

Tell us about duplicate content issues you’ve run into in the comments.

  • http://www.computerhowtoguide.com computer how to

    Google is doing everything to penalize black hat SEO. I think this is a consolidating stage; sites that are affected by Penguin will recover soon if they are not involved in black hat SEO.

  • Dao

    It’s an unfair update. The results are far from relevant. Lots of old sites with broken links are shown first. Also, if I search for photos on some subject, Google offers some separate photos from blogs, Flickr, TripAdvisor (it’s everywhere) while big photo albums with many photos (=internal links) on the subject are set aside. But it’s not a link scheme or something. I don’t understand…

  • http://mathatube.com denzil

    One of my pages lost traffic because of Penguin, but I am not complaining. Sometimes the good guys have to suffer because of the bad guys. But like I said, I am not complaining.

  • http://www.bestaffiliateniche.com Eva

    Yes, they fine tune everything (I got fine tuned to the max), but then they kick you out of adwords if your site is too much of an affiliate site and is “selling” something. Doesn’t make ANY Sense! I got kicked out of Adwords a year ago because they wanted me to make a site I didn’t want to advertise anymore compliant before I could use Adwords for anything. What???!!

  • http://cebuanddavao.com Nonoy

    I’ve heard of this Penguin update but didn’t know what exact issues the update is targeting. Now I know.

    I had 600+ visitors for the past few weeks before the update. Now I get 500+. I have no idea whether my site has been hit, given this small decrease.

  • https://plus.google.com/u/0/115326789223681674036 Rajesh Magar

    Awesome article.
    And yes, I have one website under optimization that was just affected by this Penguin update.

    And the reason behind it is posting the same content on multiple domains.
    Thanks Again

  • http://www.loanscomplete.com brian

    “Google doesn’t consider duplicate content to be grounds for penalty”

    Ah, someone want to tell Panda that? Having duplicate content is a big red flag to Panda – one of the biggest – and as anyone who’s been pandalised can attest losing 90%+ of your rankings certainly feels like a penalty.

  • http://www.pixalweb.com/seo/ SEO Company India

    I’m all about making money from Google AdSense, so I don’t blame them for hitting me in terms of my site being not that informative, but from making around 400 euros per month, I’m now at about 65 euros… not good. Kids are now eating supermarket own brand for tea… :(

  • http://suckmytrend.com Ash

    Good stuff. I have been looking for some good news regarding the penguin update and I find this very useful.

  • http://www.colinmcdermott.co.uk/ Colin

    Would be interested to know why it was called the Penguin update. Anyone any ideas? I’m guessing from the Awkward Penguin meme.

  • Joe

    Amazing how most of you guys keep dancing to Google’s tune. It is sickening really. I haven’t played by Google’s tune since before Christ. White hat SEO is for idiots. Google doesn’t care about your good SEO conduct. They care only about their shareholders.

  • RP

    So what do you do when the content of your site gets scraped or other sites pick up your content from your RSS feeds? Sites tend to get a lot of links from these things that they have no control over. Many of the sites obtaining content this way are spam sites. Get too much of this and you are going to get Penguined even though you actually did nothing wrong…

    The problem as I see with Penguin is that there are ways to do everything right and still get penalized.

    Also, what is to prevent someone from buying a ton of bad links to a competitor’s site to take them out?

    Google is making assumptions about intent, assumptions that can be very wrong. They should not create an environment where the actions of others can cause you harm.

    If I cheat, I should get punished. If others cheat or use my content I should NOT get punished! I have no control over this.

  • Joe

    Google needs to go back to the 2003 algo. if they don’t want to disappear soon with the upcoming Facebook search engine. It’s coming, oh yeah! Facebook being a public company means the search engine is coming. They have no serious business model and there are rumors of quite a few engineers building the engine. Either that or destroy Facebook with ads all over due to pressure from the shareholders.

    Google better get its act up. This real time Twitter, G+ BS and ads all over plus the irrelevant search results will kill them. Users need relevant search results and they are not getting it. This big and sudden algo. changes to punish the bad guys, who are a minority, is killing millions of good websites every year and making the serps irrelevant. They need to change their algo slowly like before and the algo. changes can’t be so radical, non-sensical. I guess they’ll never do it. They benefit from the irrelevant serps with Adwords. Only Facebook can kill them or straighten them up.

  • http://www.bikelightsled.com Bicycle Lights

    Did not notice much change with this update on my sites. We went up in the rankings because of all the other foul play of others.

  • http://www.carsolutiononline.com/ Arbudi168

    Thanks for the explanation, Chris. I have made many changes to my sites lately, like cloaking links and keeping a reasonable number of keywords and links, so I am glad to find the answers here.

  • http://dululainsekaranglain.com/ Dulu Lain Sekarang Lain

    i can see why some people getting frustrated with google. I bet the panda got something to do with it :wink:

  • http://www.wptm.net/ Becca

    The Penguin update is bad for many. I think that Google is trying to push down small businesses in hopes of getting them to turn around and start a PPC campaign, so that Google can make more money after missing target revenue guidelines. That makes sense.

    Some of my sites were hit by this update. Reconsideration may be a solution, but how long will the process be? :)

  • http://www.livingportugalproperty.com troia

    My main site Living Portugal Property has a property for sale in Portugal page where all my listings are located. I also have homes for sale Portugal com where I also showed all my listings (same content). My property for sale in Portugal page still ranks very well, however homes for sale Portugal com has disappeared off the rankings.

    A question if I am allowed, should I change the content on my homes for sale Portugal com and instead of having duplicate content link to my property for sale Portugal page?

  • william

    Unless AMAZON is Google+, FB and Twittering “sex toys”
    don’t believe the hype.

    First, all you SEO companies trying to advertise on this post that your clients did not suffer are total posers. Unless you work for Amazon, Zappos (Amazon owned), Overstock, Sears, etc., your clients got hit.

    Google PENGUIN absolutely favors the big e-com and kills the average guy.

    1. Amazon now controls all English speaking words
    Amazon also owns imdb.com
    according to SEMRush – Amazon ranks #1 for “sex toys”
    - do the same search in Bing
    IMDB.com ranks #1 for XXX (you can figure out the traffic count for yourself)

    Amazon and Google were the top speakers at the ChannelAdvisors event days before PENGUIN

    Amazon, eBay, Sears, etc. get feeds from 3rd parties like CA
    Amazon stock jumped $30 a share same day as PENGUIN

    Google also just funded $5million for lobbyist in DC

    PENGUIN is all about product
    PENGUIN is all about $$$$$

    It’s Google and Amazon

    after researching over 100 sites and product lines it is clear that this update was about “product”

    Google uses the excuse that they do not owe anyone a living or a ranking position – but they have made it clear that they are going to put Amazon 1st for all e-com and both are going to make a lot of money doing so.

    PENGUIN = PORN + PRODUCT = $$$$$ to AMAZON

  • http://hghhealthfacts.com John Z

    It’s never right whatever you do. For years there have always been changes at Google, and it’s very hard to keep up with all their updates, even for webmasters. You always have to pay for it if you have little or no experience setting up your own small business. If you can do AdWords and spend a lot of money to promote your business, Google will be happy.

  • Tetsushi

    I feel that Google is addressing only the webmasters and how to keep them thinking about Google. Business, right? When Google talks about users’ experiences, I find it funny because there is no value addition. Rather a value subtraction. Heil Google :). Pardon my English as I am not a native English speaker

    • A.I.

      Agreed. I used to use Yahoo for any searches till 2005 and since then started to go to Google as someone said that it was the best. But Yes, it was the best but now becoming a trash can. If I have to use all my IQ to try to find what I want to find then I need to go manual and try to type at keywords in the URL bar. Useless.

  • Pat

    All of my sites have lost traffic and I do not use black hat or deceptive methods. I also hate keyword spam. I’ve had it with Google, they reckon that they are trying to get better search results yet their search pages are riddled with ads, many of which are for thin, rubbishy sites. The top spots which are the most likely to be clicked on are paid for ads, so the only real way to ensure good positioning is to pay the fat man.

    Bing and Yahoo give better results and their search pages aren’t crowded with ads.

  • http://thesupermanstamina.com/ Emily

    I just wish I knew all the Panda factors, it makes my head spin.

  • max meier

    It makes me laugh what these “clever” guys from Google try to tell us. They are the ones who created the disaster and won’t admit it, so they tell us the others are no good. Google is NOT willing to create any software to figure out what is the original and what is the copy. To do this is no big deal; they just don’t want to. The fact is, in the last 3 months I found around 200 pages and around 60 photos on various websites with stolen content they took from us. The catch is that Google elevated the thief and trashed us. The same goes for this SEO stuff: they think they are so clever and trash pages where there is no SEO. All this helter-skelter did not exist until the last Google boss got another job; after that, disaster struck. It’s a total disaster what these guys are doing every day in the search department. My advice to Google is: 1. get some qualified people, and 2. don’t change your system every month. Everyone who has worked in IT for a long time knows that the worst thing is to continuously change a working system. It looks to me like the relevant Google people don’t know much about complex IT systems.

  • http://www.zombielogicpress.com Thomas L. Vaultonburg

    So, Google is whacking my peepee because they’re not smart enough to know the difference between two or more entries in my blog that are similar. Yes, they’re similar, dilwegs, because I return to the same five categories my site SAYS it’s about, sports, politics, movies, poetry, zombies. Too hard for you, guys?

  • http://www.njpropertytaxappeal.net Peter Jordan

    Since having one of my sites deindexed because of duplicate content issues, I have been forced to rewrite any content I intend to post elsewhere and ensure it has at least 80-90% uniqueness. An arduous task, but one I believe will result in Google perceiving my sites as offering a better user experience.

  • http://www.njpropertytaxappeal.net Peter Jordan

    My site was deindexed by Google on account of duplicate content issues. Since then I have worked to ensure each and every piece of rewritten content achieves at least 80-90% uniqueness before it is posted anywhere. An arduous task, but one I am hopeful will lead to the removal of Google’s curse.

  • Clinton

    Google holds all the cards and they are letting us know that they are in charge of everything on the web and there is not a thing that we can do about it. Their arrogance is terrible and is getting worse. Power corrupts!

    Also, bear in mind that the many continual changes are not to improve search. The changes are to improve revenue from paid adverts. At the end of the day Google doesn’t want anyone to be an expert on organic search.

    I am not sure why the open source community tolerates Google’s arrogance and just takes all of this lying down. Maybe because we don’t really have any choice.

  • Clinton

    Thanks John,
    Clearly the admin of this article didn’t think the point was good because he/she removed my article – so much for an open and free web.

    • Clinton

      Sorry – article is back.

  • http://www.ezau.com Blogging Ed

    As a Google user, I am happy to see that much duplicate web spam has been filtered from Google search engine results pages.

    There is also the issue of spun duplicate content which really annoys me. After clicking-through a Google Search Result and reading this web spam, I am no more enlightened than I was before. Spun content appears original to crawlers but makes no sense because a human did not proof-read it.

    I heard a rumor that Google might be checking grammar, punctuation and style filters to try and improve search results by filtering out some of this spun content. Not sure about that because many useful original articles will also be filtered out of the search results.

    Does anyone have information about this rumor?

  • Radu

    Let’s use other Search Engine, instead of becoming used to google and depending on it.

  • http://sanjeevseoservices.blog.com/ Sanjeev Rana

    Appears site house owners are having a tough time understanding the penguin update, think about simply common enterprise house owners that don’t have this technical data. This entire hand-wringing is ridiculous. Why doesn’t google simply arrange their very personal webdesign store and for a value do our web sites in accordance to their bible and we’ll all be higher off. each web-master I’ve hired, mentioned they have been a “guru” of their field, solely to be debunked just just a few months later and just just a few thousand poorer (me)…I’m sick and bored with all this double-talk and geek lingo that does nothing to additional the true objective, which is to assist enterprise folks or informative websites of their greatest gentle. everybody is skirting the true issue, which is—google high quality tunes every little thing to make sure you do adwords as an alternative of getting a “free” journey with natural listings.

    Bookmarking List

  • http://www.knead.co.nz Jeff

    Great post.

    I agree that Google is looking to reward positive (read: unique and engaging) content over backlinked content. However, I see that the trend is different depending on what part of the globe you are from.

    Our company, based in NZ, has dealt with several companies that were penalised in the Penguin update. 3 were for keyword stuffing; the other had their links removed. Tellingly, the backlink problem was from .co.uk TLDs.

    It seems that across Australasia the online industry is filled with “keyword = good” amateur webmasters. This has led to poor quality sites and results, which makes justifying the good work SEO professionals do that much harder.

    I hope that this update scares the “keyword = good” mentality out of the industry, but I’m fearful for all those companies who have no idea about the ongoing management of their site and the Penguin effects that they haven’t even noticed yet.

    Keep up the great work!

  • http://www.voltagenewmedia.com matt magi

    Thanks for this. I’ve been studying the Panda update, and this has to be one of the best put together blogs I have read yet. Bookmarking it now so I can finish watching all those videos. Thanks.

  • http://www.musicmastilovers.com Aahana Kashyap

    thanx for this useful information provided by you…. thanx alot

  • http://www.thinkcoffeelovers.com dinesh sharma

    Google changes its policy and releases new updates from time to time… and useful information found here! Thanks a lot

  • http://www.website2conversion.com King Rosales

    Hey Chris, thanks for the post. Lots of content and I especially appreciate all the Matt Cutts videos at the bottom; they definitely enrich and reinforce your post.

    • http://www.slots4play.com Chris

      really man i really don`t know what was good and what was bad, anyway i know what problems i had and now they are fixed, but a 2 years work will not come back so fast and i`m pretty sure of this, i was in some good and strong keywords for long time, now i can see that only the luck will bring me back, can`t bulive this shit, think on building again 500-600 pages and in HTML , you will be crazy to think this is possible in 2-3 days … now i`m copying step by step things from my older version .. and improve even better where is possible …. well i hope they will upgrade as soon as possible ….

      • http://www.slots4play.com Chris

        let me give you an example : keyword: free slots uk demo

        yahoo : http://uk.search.yahoo.com/search;_ylt=Aq5MbYCzU19OUOCwsNUm8Is4hJp4;_ylc=X1MDMjAyMzM5MjMxMgRfcgMyBGZyA3lmcC10LTcwMgRuX2dwcwMwBG9yaWdpbgN1ay55YWhvby5jb20EcXVlcnkDZnJlZSBzbG90cyB1ayBkZW1vBHNhbwMx?p=free+slots+uk+demo&toggle=1&cop=mss&ei=UTF-8&fr=yfp-t-702

        google : none

        thank you google for dropping 140.000 impresions to 250 or less …

  • http://www.slots4play.com Chris

    i don`t know man, but this update messed not me but many others, so let me give u a big example : and if this is fairly true what is happening with this updates,

    ok i admit that i had a big problem with double content, well i don`t know how this was possible but google indexed my site not twice , 3-4 times www.slots4play.com , ww.slots4play.com, w.slots4play.com , reason.slots4play.com …,

    anyway assuming that i made a mistake , but the question is why this shit http://www.freeslotsonline.me.uk/ stays on google, in some keywords, with no content, and ranks, people like me or others with absolutelly nothing…

    anyway i saw that i still have pages and i’m working almost non stop, i’m making my site from scratch and only in HTML, i moved my site as well to another hosting company, and i hope i’ll come back soon, and hope google will update this shit as soon as possible …. no more comments ….

    • http://www.slots4play.com Chris

      how this happened: before google announced the last update i noticed that morning that something was wrong with my stats, i always check my stats every day …. i waited a few hours and then BOOM – i got an email: the google penguin update … , i did some fast research and saw exactly what the problem was, hope this affects us temporarily, and i hope if changes are made there is a chance of coming back … will see, only time will tell …

  • http://www.getnewsinfo.com/ Reshhia

    wow that is so impracticable tool code of google co thank you

  • http://www.slots4play.com Chris

    ok guys – let me show you what is happening with google (please rate if you are agree with me ! )

    example 1: this guy ranks with my content (stolen from me as you can see)

    i shared the picture here : www.slots4play.com/stolen-content.png

    and the proof here :

    https://www.google.co.uk/#q=free+slots+uk+demo&hl=en&prmd=imvns&ei=ifjLT_3vAorg8APosInJDw&start=20&sa=N&bav=on.2,or.r_gc.r_pw.r_cp.r_qf.,cf.osb&fp=8c94bba531e6309e&biw=1600&bih=673

    so as we can see bad guys go up, hard workers go down … if this seems right … well bravo GooGle you are the best from the worst thing possible…

    thanks again for ruining good and hard working people , this is not the only site that copied my site!!! there are more ….

  • http://weworkforcheese.com mike

    Since the update, it looks like the number of links we had indexed with Google was drastically reduced. I don’t think we’ve violated any terms since our content is 100% original and not duplicated. It’s overall annoying, but not a huge deal. It affects our search traffic yes, but we have a reader base that continues to visit directly because of the content and interaction.

  • http://jaipursightseeing.wordpress.com Mahendra Singh

    Hello Chris Crum,

    I am Mahendra Singh from India. I have a small tour business for Jaipur, India tourism, but I am facing a very critical situation at this time. After the Penguin update my blog’s (http://jaipursightseeing.wordpress.com) ranking dropped, and I did not do any black hat SEO or unethical work for my blog, so please suggest what I should do to recover. Please, it’s urgent…

  • http://gardenbay.net garden greenhouses

    Not much time has passed since most webmasters became familiar with Google Panda. However, as opposed to what most people expected, the industry giant did not stop here and on April 24 a new update was launched. Google Penguin comes with no less than 52 algorithm changes and many more penalties for web pages using over-optimization. Like in the Panda update, representatives of the company stated penalties would only be applied to pages with problems, without the risk of affecting genuine sites. A difference would be that Google Panda reassesses all sites on average once a month, while with Penguin the optimization is evaluated continuously. As a consequence, it is now important for webmasters to adapt to the new requirements in a very short time, because penalties will be canceled immediately.

  • http://www.kernmedia.com Dan

    LOL, outside of the video code, 50% of this page’s body content is duplicate content pulled from Google’s Webmaster Guidelines. Good reminder in this article, but thought this was funny.

  • http://katpro.in katpro

    That’s an important thing, duplicate content. We have to give a canonical tag for the duplicate content.

  • http://www.aboutfaces.co.in bridal makeup in delhi

    we recently recovered from penguin update after removing quite links from other website and changing internal links

  • mohmmed alsaadi

    what about RSS feeds??
    there are many sites that take from my RSS to publish on their forums
    is that good for me or not??

  • http://trendstatic.com Ramon Acot

    Good thing there are helpful tips like this. As the saying goes, prevention is better than cure

  • http://www.trendstatic.com Ramon Acot

    Thanks for good insights. My company will get better and better with this