Google Penguin Update: Don’t Forget About Duplicate Content

A classic quality guideline


There has been a ton of speculation regarding Google’s Penguin update. Few outside Google know exactly what the update does or how it interacts with Google’s other signals. Google always plays its hand close to its chest.

“While we can’t divulge specific signals because we don’t want to give people a way to game our search results and worsen the experience for users, our advice for webmasters is to focus on creating high quality sites that create a good user experience and employ white hat SEO methods instead of engaging in aggressive webspam tactics,” Google’s Matt Cutts said in the announcement of the update.

He also said, “The change will decrease rankings for sites that we believe are violating Google’s existing quality guidelines.”

“We see all sorts of webspam techniques every day, from keyword stuffing to link schemes that attempt to propel sites higher in rankings,” he said. To me, that indicates this is about all webspam techniques – not just keyword stuffing and link schemes, but everything in between.

So it’s about quality guidelines. Cutts was pretty clear about that, and that’s why we’ve been discussing some of the various things Google mentions specifically in those guidelines. So far, we’ve talked about:

Hidden text and links
Keyword stuffing

Another thing on the quality guidelines list is: “Don’t create multiple pages, subdomains, or domains with substantially duplicate content.”

Of course, like the rest of the guidelines, this is nothing new, but in light of the Penguin update, it seems worth examining the guidelines again, if for no other reason than to provide reminders or educate those who are unfamiliar. Duplicate content seems like one of those areas that could get sites into trouble even when they aren’t intentionally trying to spam Google. Even Google says in its help center article on the topic, “Mostly, this is not deceptive in origin.”

“However, in some cases, content is deliberately duplicated across domains in an attempt to manipulate search engine rankings or win more traffic,” Google says. “Deceptive practices like this can result in a poor user experience, when a visitor sees substantially the same content repeated within a set of search results.”

Google lists the following as steps you can take to address any duplicate content issues you may have:

  • Use 301s: If you’ve restructured your site, use 301 redirects (“RedirectPermanent”) in your .htaccess file to smartly redirect users, Googlebot, and other spiders. (In Apache, you can do this with an .htaccess file; in IIS, you can do this through the administrative console.)
  • Be consistent: Try to keep your internal linking consistent. For example, don’t link to http://www.example.com/page/ and http://www.example.com/page and http://www.example.com/page/index.htm.
  • Use top-level domains: To help us serve the most appropriate version of a document, use top-level domains whenever possible to handle country-specific content. We’re more likely to know that http://www.example.de contains Germany-focused content, for instance, than http://www.example.com/de or http://de.example.com.
  • Syndicate carefully: If you syndicate your content on other sites, Google will always show the version we think is most appropriate for users in each given search, which may or may not be the version you’d prefer. However, it is helpful to ensure that each site on which your content is syndicated includes a link back to your original article. You can also ask those who use your syndicated material to use the noindex meta tag to prevent search engines from indexing their version of the content.
  • Use Webmaster Tools to tell us how you prefer your site to be indexed: You can tell Google your preferred domain (for example, http://www.example.com or http://example.com).
  • Minimize boilerplate repetition: For instance, instead of including lengthy copyright text on the bottom of every page, include a very brief summary and then link to a page with more details. In addition, you can use the Parameter Handling tool to specify how you would like Google to treat URL parameters.
  • Avoid publishing stubs: Users don’t like seeing “empty” pages, so avoid placeholders where possible. For example, don’t publish pages for which you don’t yet have real content. If you do create placeholder pages, use the noindex meta tag to block these pages from being indexed.
  • Understand your content management system: Make sure you’re familiar with how content is displayed on your web site. Blogs, forums, and related systems often show the same content in multiple formats. For example, a blog entry may appear on the home page of a blog, in an archive page, and in a page of other entries with the same label.
  • Minimize similar content: If you have many pages that are similar, consider expanding each page or consolidating the pages into one. For instance, if you have a travel site with separate pages for two cities, but the same information on both pages, you could either merge the pages into one page about both cities or you could expand each page to contain unique content about each city.
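To make the “Use 301s” advice above concrete, here is a minimal sketch of what the .htaccess rules might look like on an Apache server. The paths and domain are hypothetical placeholders, not anything from Google’s guideline:

```apache
# .htaccess — permanent (301) redirects after a site restructure.
# "Redirect 301" is the mod_alias shorthand for "Redirect permanent".
Redirect 301 /old-page.htm http://www.example.com/new-page/

# Redirect an entire moved directory, preserving the rest of the path:
RedirectMatch 301 ^/archive/(.*)$ http://www.example.com/blog/$1
```

On IIS, as Google’s guideline notes, the equivalent is configured through the administrative console rather than a file.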

Don’t block Google from duplicate content. Google advises against this because, if it can’t crawl the pages, it can’t detect when URLs point to the same content, and it will have to treat them as separate pages. Instead, use the canonical link element (rel="canonical").
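As a sketch of that canonical approach, assuming the trailing-slash URL is the version you prefer (the URLs here are hypothetical examples):

```html
<!-- Placed in the <head> of each duplicate variant, e.g.
     http://www.example.com/page and http://www.example.com/page/index.htm -->
<link rel="canonical" href="http://www.example.com/page/" />

<!-- For syndicated copies you'd rather keep out of the index entirely,
     the noindex meta tag mentioned earlier: -->
<meta name="robots" content="noindex" />
```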

Note: there are reasons why Google might skip your canonical link elements.

It’s important to note that Google doesn’t consider duplicate content grounds for a penalty unless it appears to have been used deceptively or to manipulate search results. However, that seems like one of those areas where an algorithm might leave room for error.

Here are some videos of Matt Cutts (including a couple of WebProNews interviews) talking about duplicate content. They’re worth watching if you’re concerned this might be affecting you:

This one comes from Google’s Greg Grothaus rather than Cutts. Also worth watching:

If you think you’ve been wrongfully hit by the Penguin update, Google has a form you can fill out to let them know.

More Penguin update coverage here.

Tell us about duplicate content issues you’ve run into in the comments.

  • don

    What about service-oriented businesses that serve multiple cities? They lack consistency in enforcing this, if this is really a “Penguin” update.

  • http://blogsnewsreviews.com/2011/the-great-swiss-army-knife-debate-best-swiss-army-knife-tinker-swiss-army-knife-which-swiss-army-knife-is-best-biggest-swiss-army-knife Astro Gremlin

    Nice penguin photo. I think I know the one in the 14th row, 45th from the left. Seriously, can a search engine jump the shark? Going from panda to penguin . . . . Epic content, Chris, you worked hard to put this together.

  • http://www.vertexplus.com/ Custom Software Development India

    It is definitely a good update by Google. I really appreciate the Penguin updates, and I think keyword stuffing, duplicate content, hidden text and links, and cloaking will be shut down quickly. It was a necessity for natural linking and quality linking… Please keep up the excellent updates.

  • Batman OfGotham

    This update is bull_sh*t. I looked at the data closely, as well as the site, yet I cannot find any of the issues mentioned above. The site got a 45% traffic loss because of this stupid update. Time to get armed vs. Google!

    Much worse, I can see a lot of error pages ranking above truthful ones. This is how Google executes its idiotic work.

    • http://www.iquantum.com.au/search-seo/pay-per-click.php Adwords Management

      First I would like to say thanks for this great explanation regarding the Google Penguin update. But Google had done approximately the same updates in Google Panda. One thing is keyword stuffing.
      In your detail, one thing mentioned is the canonical URL. Can you please magnify this stuff?


  • http://www.noelcunningham.net Noel Cunningham

    I think this latest update has given us all some serious “food for thought”…. I dunno about anybody else but I think the days of solely relying on SEO for all your traffic needs are over. Multiple streams are needed if you are to really protect your online business..

    Just my 2.0 cents


  • http://www.lovepromos.com Jason

    There seem to be several components of Penguin. If you’ve been penalized, take a look at your anchor text. Avoid a high % of anchor text = your keywords. Make sure LOTS of your links are things like “click here” and your URL.

  • http://www.ilianalola.com Iliana Lola

    I think the Penguin update will prove very helpful to sites with great and unique content. Duplicate content was always bad, but I hope now more webmasters will understand this.

    • henkl

      JC, 2 popups on your homepage Illiana!

  • http://www.theclixgroup.com/ James McMinn

    My question is how do you explain two sites that are similar, covering different topics and with unique content. Both used similar SEO tactics, however one is flourishing after the Penguin update and the other one has fallen off the map. I wrote about the Penguin Update as well. If you want to take a look and comment, please do. I would love to start some conversation about what I wrote.

  • Steve Westing

    Before, I could search Google for a local business using the name Tottenham and the service description, and it would show local businesses in Tottenham, Ontario. Now when I search the same, I get listings for Tottenham, England. Whatever Google changed today is absolutely stupid, and they are really going to the dogs. Time to start trying good ol’ Yahoo.

    • http://www.itrustgodonly.com John P

      I stopped using Google search a long time ago. Could not stand fishing through all the garbage sites. Bing is much better. I get a lot better results.

    • http://www.dovertocalais.org.uk/ Brian

      I’ve found the same, Steve, but from UK. Recently Google results seem to be including plenty of Australian and US sites, even though I’m using Google UK etc.

  • http://makemoneyonlinegetstarted.blogspot.in Javed Shaikh (make money online)

    I think there is not much to fear about Penguin. People are afraid because they don’t know much about it; as time goes on and a clearer picture arrives, people will find their way and everything will go back to normal.

  • http://www.inboundmarketingexpert.org Ganesh Behera

    Exactly. I came across many posts saying that a site got affected because of the Penguin update, but they are not actually considering what black-hat activities they have performed on their sites.

    Now some people even say to practice black hat techniques after the Penguin update; today I read a news item from Search Engine Roundtable.

  • http://www.konstantinfo.com konstantinfo

    Here is a reason why I believe in this update: I have been regularly watching a site ranked #1 in Google search results for a highly competitive keyword for the last 4-5 months, but after this update that site is not in the search results anymore.

  • http://www.konstantinfo.com konstantinfo

    That site placed invisible links on irrelevant sites.

  • http://www.windows7port.com/ Ranganath Basha

    The Google Penguin update is for sites that use too many “spamming methods” like keyword stuffing. I really love the videos of Matt Cutts, and you’ve presented them wonderfully in order.

    Now there’s confusion over whether one has been hit by the Google Panda or Google Penguin update. Several SEMs have proposed various views and tips to counter these updates. Read these recovery tips for the Google Penguin update.

  • Jeno

    You write: Don’t block Google from duplicate content.

    Any reference for this? Why would it be bad?

    • http://www.webpronews.com/author/chris-crum Chris Crum

      From Google’s help center:

      Google does not recommend blocking crawler access to duplicate content on your website, whether with a robots.txt file or other methods. If search engines can’t crawl pages with duplicate content, they can’t automatically detect that these URLs point to the same content and will therefore effectively have to treat them as separate, unique pages. A better solution is to allow search engines to crawl these URLs, but mark them as duplicates by using the rel=”canonical” link element, the URL parameter handling tool, or 301 redirects. In cases where duplicate content leads to us crawling too much of your website, you can also adjust the crawl rate setting in Webmaster Tools.

      • Jeno

        Thanks for the link Chris! Still I am having a hard time understanding this. So even if I block Google bot from accessing some pages that are low quality, it may still be bad for me? Even though the bot has no idea what is on that page because I have blocked it?

      • BatmanofGotham

        Funny how you recommend on-site SEO technicals when in fact this Penguin update punishes sites that implement SEO. This nonsense is getting confusing.

        I do have sites that implement what you have stated, yet they got slapped mercilessly by Penguin, and there is no black hat on the site, while the links are natural.

        Why? Simply because I implement SEO on the site and this update doesn’t like it.

        This update should be reverted!

  • http://sanjeevseoservices.blog.com Sanjeev Rana

    Thanks so much for this post. You’ve shared so much insight and information that is greatly appreciated.

  • http://thebrickblogger.com Thita

    Google is one of the most arrogant companies I have ever seen in my life! Quote from Matt Cutts from above: “The change will decrease rankings for sites that we believe are violating Google’s existing quality guidelines.”

    Violation is an extremely strong word! It means that there is some rule or law that somebody breaks. Does Google own the Internet? Do they own every single website? Or are they the Internet police? By whose appointment?

    They may say something like a website is not in line with their quality guidelines, but to say that a website VIOLATES their guidelines implies that somebody did something illegal. Google comes up with a bunch of random guidelines (which they constantly change) in their plush headquarters, then forces those guidelines down the throats of millions of webmasters and web users as if they were law.

    The saddest part is that people tolerate this BS and keep bowing to this out-of-control company. :(

    • http://goodcontentwebsites.com Paul

      Thita…Excellent observation. I too have those same thoughts. Webpronews gives me some great insight as to the latest google updates, and algorithm changes. My head spins trying to implement those changes on our site.

      For the most part these changes work, but I often wonder why? Why are we under constant pressure from the google police.

      Most of us are trying to provide great content and at the same time keeping up with these changes is a BIG chore.

      The “Panda, Penguin Update”? What will the next update be called? Maybe the…”Skunk Update”?

    • http://guzmansgreenhouse.com Paul

      Wow…your thoughts are almost identical to mine. As it stands now there is no alternative! Or is there? Google does not own my website and should not “Violate” my rights as to how I conduct my online business.

      I commend webpronews.com for sending me the latest google updates. Even tho it means another round of editing, researching and finding SEO experts who might agree or disagree.

      First there was panda, then penguin…what’s next – Skunk!

      I would like to know what enforcement agency they belong to?

      • Denise

        Skunk should have been the 1st update name not Panda. Funny how they use cute innocent animals to name their updates. They pretend their updates are to combat spam techniques. I’ve heard that story before. I have some other names they should call their updates like Snake, Satan, Black Widow Spider, Lucifer, and so forth.

  • http://google-guide.blogspot.co.uk/ Jonathan

    I’ve found one major issue with a site that it hosted on Adobe’s Business Catalyst system.

    It’s for a site at http://www.globalaircharter.co.uk. Google seems to have decided to index the default ‘test’ url, treating the real domain as a duplicate, except for (oddly) the home page.

    This has caused the site to lose a lot of places in Google over the last 4 days – from second page to out of top 10 pages. It’s a major issue for a commercial site.

    • Richard

      100% agree,
      Mr. Cutts has a very arrogant attitude in one of those videos; basically he says like it or lump it.

      I don’t like how Google has been forcing their products on people over the last couple of months! Is it related to the recent updates?

      1. Google+
      2. Crappy YouTube videos on page 1.
      3. AdWords also picked up business recently.

      Previously I was on page 1 for a keyword search on google.co.uk for rentals in Thailand, but now 2 UK-based sites displaying my OWN LINK TEXT are above me!! Must admit that’s funny but also a bit stupid.

      • Richard

        Oops this should be answer to Thita above

  • http://www.organikseo.com DJ

    Chris, thank you for the info. I get nervous every time I hear a new update is out but after studying what the updates are focused on targeting there is never really anything to worry about. For the most part I think this update is only geared towards targeting the users who are truly abusing the system.

  • john

    Any idea about linking to other sites? E.g., if we sell cameras, instead of listing all the tech details, what if we say “check out canon.com” with a link for further details?


  • http://whiztoy.com Conan H

    If we use the “no follow” tag in the link from a page with unique content to a page with duplicate content, will that be penalized? It seems a bit too much work to “fiddle around” with “canonical” and URL parameters in the webmasters panel.

    Sometimes webmasters who are not clear about how “URL parameters” work in the webmasters panel can make mistakes and improperly configure their links.

    I hope I make sense?

    It would be simpler to just use a “no follow” tag.

  • http://seoblogger4.blogspot.com subhan

    Google must be careful; if they keep making idiotic algorithm updates like this, it will make another search engine big. SEOs, bloggers and internet marketers contributed a lot to making Google as big as it is now.

  • http://www.insurance4all.co.uk Anna

    Hit and hit again… I’m all about making money from Google AdSense, so I don’t blame them for hitting me in terms of my site being not that informative, but from making around 400 euros per month, I’m now at about 65 euros… not good. Kids are now eating supermarket own brand for tea… :(

  • http://www.venuepages.co.za Conference venues South Africa

    Thanks, great article. I run a database site, so each page is the same with different data for each entry, approx 2,000 entries. Would this constitute duplicate content?

    Also, the site duplicates content every 3 days by taking each dynamic entry and turning it into a static page. This was done to get Google to index the static pages rather than battle with trying to index the dynamic pages. Any direction here would be appreciated, as I am afraid this exercise might be considered duplicating content for black hat purposes.

  • http://www.peoplesinnovation.com zia

    Nice article with good info..

  • http://www.ukblackdating.co.uk Reece Coleman

    I’ve noticed MORE out-of-date, spammy sites appearing in my search results since this update than before! I don’t want sites that haven’t been updated or used since 2007 in my results, unless I’m looking for historical information! My own site, previously ranking 4th for the phrase “black dating”, has vanished out of the top 200, despite no guideline warnings from Google, no errors in Webmaster Tools and no issues that I can detect. As a result, I now have to spend more on PPC to keep my traffic up, and who benefits from that? Why, Google of course! Can anyone else see the problem with the company that controls search, with no transparency in how it manipulates results, also making vast amounts of money from advertising on said search engine? Call me cynical, but I don’t trust them one little bit!

    • http://www.itrustgodonly.com John P

      You’re right about that. I needed some up to date info and I had a heck of a time trying to find some that was up to date. I forget that I was in Google search (was looking to see where some of my sites were) because I usually use Bing.

  • http://www.techvelvety.com Soniya Kapoor

    It seems that again Google is going to give a high-voltage shock to SEO companies using black hat techniques…


    It will definitely impact the rankings of existing websites.

    Thanks!

    Warm regards,
  • http://netmedia-uk.com Alan Johnston

    I just keep writing fresh new content and I am chuffed to bits where my sites appear on Google!

  • http://www.sixthsensemarketing/ Bradley

    Interesting that the post is about quality SEO and there is keyword rich names/spam in the comments.

  • joe david

    Seems webmasters are having a hard time understanding the penguin update, imagine just regular business owners that don’t have this technical knowledge. This whole hand-wringing is ridiculous. Why doesn’t google just set up their own webdesign shop and for a price do our websites according to their bible and we’ll all be better off. Every webmaster I’ve hired, said they were a “guru” in their field, only to be debunked a few months later and a few thousand dollars poorer (me)…I’m sick and tired of all this double-talk and geek lingo that does nothing to further the real objective, which is to help business people or informative sites in their best light. Everyone is skirting the real issue, which is—google fine tunes everything to ensure you do adwords instead of getting a “free” ride with organic listings.

    • John P

      “google fine tunes everything to ensure you do adwords instead of getting a “free” ride with organic listings”. Never thought of that, Good Point!

  • http://softnwords.com Sam John

    I wonder why duplicate content makes such noise in the search engine rankings.

  • http://www.UseFreeSMS.com Send Free SMS

    Listen, at the end of the day, everyone who wants to run a legit website or business needed to do these things way before they became “actual” updates to Google’s algo.

    Think about it: would you want to see the same identical store in the same area on the same street in the same mall 25 times? JCPenney, for example. NO!

    It’s the same thing with search results.

    Just follow their rules and regs, be patient, and you will come out on top.

    Thanks for your insight man!

  • http://www.decktropic.com Ömer Feyyazoğlu

    Don’t try to fix it if it’s working: an unnecessary move by Google.
    First they encourage linking, then they try to discourage spam linking as a result. Then people go against Google, no wonder; time for a new search engine to be born, and it will be pretty quickly.

  • http://dollarstips.com Krishna

    White hat SEO for all, and black hat SEO, and other questionable methods for those who have monopolistic and autocratic rule over the internet.

  • bRad

    What I’m surprised no one is talking about is how this update is going to affect antitrust proceedings in Europe and potential antitrust violations in the USA. To me it’s pretty clear: if someone was developing a search engine specifically looking for the flags that Google is “guidelining” down now, and that new “SEO search engine’s” results were harmed because Google is forcing everyone to play by their rules, to me that is clearly antitrust, anti-competitive behavior.

    How will these guidelines affect Ask and Bing results?

    Google is playing fast and loose on this update.

    Personally, if someone is paying to SEO a site, then that tells me that site is at least marginally relevant to the keywords being SEO’d for.

    After this new update, Google’s SERPs are a freaking mess. So much so, I’ve been wandering over to Bing and liking it.

  • http://cek-info.blogspot.com Asmara Susanto

    Oh, my visitors are down 90%; finally my blog is quiet, with no visitors.

    So it makes me disappointed :'(

  • http://www.mpbcontest.com/ssylvia Sylvia

    OK… I am getting sick of this CIRCUS…

    It kills the “normal IM people” and favors the BIG GUYS!!

    By the way, I get “pissed off” that Google tries to think for me and control my actions!!!

    OK… spamming with BOTs is not a good thing… BUT all these rules are killers too!!

    WHO among “normal online workers” can really afford buying 100 DOMAINS??? to try to make an “independent living” and not live at the “government’s mercy”???

    In these days, they should support the “little guys” and not the BIG ONES!!

    Anyway… I don’t give myself a headache anymore over this BIG G!

    I am more focused now on YAHOO… and hope they stay fair and clean forever!

    … a lesson from history: in any “war” where 2 big powers are fighting (like Google + FB!), nobody will win in the end…
    and SOME NEW BETTER ONES will grow out of this…!

    RESULTS are already on the horizon, like Pinterest and others!

    Wish everyone a relaxing and stress-free life!

    • Richard

      The old god was Microsoft; the new one is GOOGLE, with much bigger muscles.
      So bow down!!
      They have the power to affect the world’s economy should they wish.
      Can’t believe I linked to these guys 15 years ago; I wouldn’t do it anymore. In fact, give me a -1 button!!

  • http://www.captaincyberzone.com Cap’n Cyberzone

    I employ none of the stated new Penguin update targets, yet I have seen a fall-off within the last week.
    What is with this roller-coaster ride we webmasters have been on for the past year? Does the Goo really know what it’s doing, or is it “trying” (beta) different tweaks at the content providers’ expense?!

  • sam goten

    I’m a bit confused. How does this affect link building and anchor text exactly? Anybody?????

  • Richard

    I don’t think I got penalized directly by the update, as it were, but I was hit by the localization update, dropping international hits by about 40%. Did anyone else notice this?

    Just one example: previously we were on page 1 in the UK for the keywords “Houses For Rent in MyCity Thailand”; now sites such as holidaylettings.xx.uk, TripAdvisor.xx.uk etc. are on page 1 of google.co.uk.

    Not an expert myself, just wondering if that is possible?

    Regarding duplicate content, we run a CMS site in 4 languages, so duplicate content is unavoidable, from what I have been told.

  • http://compsoul.com Jack Malone

    This definitely is much to look into. Webmasters have to worry even more about other sites copying their content and losing rankings through no fault of their own. Hopefully Penguin can tell which one is original and which is not, but I just don’t have that much blind faith.

  • http://neilyamit.com/ neil

    I wonder if Authorship Markup makes duplicate content acceptable.

    What do you think guys?

  • http://www.blogsonblogger.com/ Blogger Blogs Directory

    The Internet is filled with so many spammy sites, and I think this new update and these actions will be able to address this and help real unique-content sites get their fair share of traffic from Google’s search engine.

  • T

    These punks should get girlfriends or something, or did google lock them into a matrix tank. So much fukin bullshit on do’s and don’ts that we’re supposed to keep up on? How about just letting us run our goddamned businesses the best we can and google coming to us for a change! Hard enough to keep our noses above the waterline, but this is just more bullshit!

  • http://www.seonorthamerica.com Tom Aikins

    I thought Google was always trying to get rid of low-quality, spammy pages? Why is this any different from what they’ve been doing for years?

  • http://www.thepowersellersunion.com/ Jason Gammon

    What about redirects to the home page? For example, my Webmaster Tools was showing I had 2 websites, one with the www and one without. I did a 301 redirect through my cPanel, and my site has vanished. Was this a bad idea?

  • http://game-stalkers.com/ Stalker

    Note: “For example, a blog entry may appear on the home page of a blog, in an archive page, and in a page of other entries with the same label.”

    Still, isn’t Google the one that also advises in its forums that “a webpage should be linked from more than one single place” for better indexing? That is an internal white-hat SEO method. I also admit that archives should be noindexed, but an article/page really should be linked from both the home page and a category of a blog.

  • http://soft-navigator.com/ Navi

    Indeed, Stalker, but there is a duplicate-content problem if you’re serving the same page from multiple categories. There should be only one link, from a category and from the home page with an excerpt, taking WordPress for example.

  • http://www.nortad.com/ Alex

    It’s very hard, maybe impossible (unfortunately), for Google to win its battle against unethical SEO practices without (1) harming a good number of decent sites and (2) allowing poor results to float to the top, thus providing the opposite of the user experience they aimed at in the first place. I wonder why Google isn’t taking notice of Bing’s performance; they seem to strike that balance.

  • http://mastokkenari.page4.me mastok

    Google is more and more uneasy; it does more and more stupid things. Google is not the internet police. The Internet is a free world, and anyone who wants to do business in it must respect this freedom, including Google, even though it is one of many large companies on the internet.
