Duplicate Content on Google, Bing & Yahoo

Google: Cross-Domain Canonical Tag This Year


Duplicate content is a common occurrence on the web and in many cases can hurt search engine rankings. While the search engines may not always technically penalize webmasters for duplicate content, there are still a lot of ways it can hurt.

WebProNews is covering the Search Marketing Expo (SMX) East in New York, where representatives from the three major search engines (Google, Yahoo, and Bing) discussed how their respective web properties handle duplicate content issues. Following are some takeaways from each.

Duplicate Content in Google

The way Google handles duplicate content has been discussed a lot recently, largely due to a video uploaded by Google’s Greg Grothaus, in which he discusses at length how Google handles a variety of elements of the duplicate content conversation.

Joachim Kupke, Senior Software Engineer on Google’s indexing team, reiterated much of what Grothaus said. He also said that Google has substantial infrastructure for eliminating duplicate content:

- redirects
- detection of recurrent URL patterns (the ability to "learn" recurrent URL patterns to find duplicated content)
- actual contents
- most recently crawled version
- earlier content
- contents minus things that don’t change on a site

Kupke said to avoid dynamic URLs when possible (although Google is "rather good" at eliminating dupes). If all else fails, use the canonical link element. Kupke calls this a "Swiss Army Knife" for duplicate content issues.
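As a minimal sketch of that "Swiss Army Knife" (the URLs here are hypothetical), the canonical link element goes in the head of each duplicate version and points at the one preferred URL:

```html
<!-- On a dynamic duplicate such as http://www.example.com/product?sessionid=1234&sort=price, -->
<!-- the canonical link element names the preferred version of the page: -->
<head>
  <link rel="canonical" href="http://www.example.com/product" />
</head>
```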

Google says the canonical link element has been tremendously successful. It didn’t even exist a year ago, and it has grown exponentially since. It has had a huge impact on Google’s canonicalization decisions: two out of three times, the canonical tag actually alters the organic decision in Google.

Google says a common mistake is designating a 404 page as canonical, typically caused by unnecessary relative links. Also avoid changing rel="canonical" designations once set, and avoid designating permanent redirects as canonical.
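To illustrate the relative-link pitfall (with hypothetical URLs): a relative href resolves against the current URL, so the same markup can end up pointing at a page that doesn’t exist.

```html
<!-- Risky: a relative canonical resolves differently depending on where the page is served. -->
<!-- On /shop/widgets/ this resolves to /shop/widgets/page.html, but on a deeper or -->
<!-- rewritten URL it may resolve to a non-existent page, i.e. a 404 canonical. -->
<link rel="canonical" href="page.html" />

<!-- Safer: an absolute URL resolves the same way everywhere. -->
<link rel="canonical" href="http://www.example.com/shop/widgets/page.html" />
```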

Also, do not use disallow directives in robots.txt to annotate duplicate content. It makes it harder for Google to detect dupes, and disallowed 404s are a nuisance. There is an exception, however: interstitial login pages may be a good candidate to "robot out," according to Kupke.
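A robots.txt sketch of that exception (the path is hypothetical): leave duplicate pages crawlable so Google can detect and collapse the dupes itself, and disallow only something like an interstitial login page.

```
# Leave duplicate content crawlable -- do not Disallow it just to annotate dupes.
User-agent: *
Disallow: /login-interstitial/
```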

Kupke says that canonical works, but indexing takes time. "Be patient and we WILL use your designated canonicals." Cleaning up an existing part of the index takes even longer, and this may leave dupes serving for a while despite rel=canonical, Kupke adds.

At SMX, Google announced that cross-domain rel=canonical is coming within the year. So, for example, if a Chicago Tribune article also appears on the New York Times site, and the New York Times copy’s rel=canonical points to the Chicago Tribune, then Google will credit only the Chicago Tribune with the content.
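Under that example, the syndicated copy on the New York Times site would carry a cross-domain canonical pointing back at the original (the path here is hypothetical):

```html
<!-- In the <head> of the syndicated copy on nytimes.com -->
<link rel="canonical" href="http://www.chicagotribune.com/news/original-story.html" />
```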

Duplicate Content in Bing

Sasi Parthasarathy

As far as how Bing views duplicate content, intention is key. If your intent is to manipulate the search engine, you will be penalized.

Sasi Parthasarathy, a Program Manager at Bing, says to consolidate all versions of a page under one URL. "Less is more, in terms of duplicate content." If possible, use only one URL per piece of content.

Bing doesn’t support the canonical link element as a ranking factor yet, but support is coming. Bing still says to use it; it just isn’t really a ranking factor there yet. Bing says usage of canonical tags has increased in the past six months, but adoption issues still exist. According to Parthasarathy, 30% of canonical tags point to the same domain (which is fine), and 9% point to other domains. That could be a mistake or it could be manipulative, so Bing will look at other factors to try to determine which it is.

Bing says canonical tags are hints and not directives. "Use it with caution," and not as an alternative to good web design.

With regard to www vs. non-www, just pick one and stick with it consistently. Remove default filenames from the end of your URLs. Bing also says 301 redirects are your best friend for redirecting, use rel="nofollow" on useless pages, and use robots.txt to keep content you don’t want crawled out of the index.
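A sketch of how those two tips might look on an Apache server with mod_rewrite, assuming www is the version you picked (the domain is hypothetical):

```apache
RewriteEngine On

# 301-redirect the non-www host to the www version.
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# Strip the default filename from the end of URLs.
RewriteCond %{THE_REQUEST} /index\.html [NC]
RewriteRule ^(.*)index\.html$ /$1 [R=301,L]
```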

Duplicate Content in Yahoo

Cris Pierry

If everything goes according to plan with the Microsoft deal, worrying about how Yahoo handles duplicate content will soon mean worrying about how Bing handles it, but Yahoo’s Cris Pierry, Sr. Director of Search, offered a few additional tips.

Pierry says descriptive URLs should be easily readable, and it’s not a good idea to change URLs every year. In addition, use canonical, avoid case sensitivity, and avoid session IDs and parameters.

Pierry also says to use sitemaps and submit them to Yahoo Site Explorer. Improve indexing through proper robots.txt usage, and use Site Explorer to delete URLs you don’t want Yahoo to index. Finally, provide feeds to Yahoo Site Explorer, and report spam sites linking to you in Site Explorer.
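A minimal XML sitemap of the kind Pierry recommends submitting, with one hypothetical URL:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/golf-shoes</loc>
    <lastmod>2009-10-06</lastmod>
  </url>
</urlset>
```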

Yahoo says metadata and SearchMonkey are enhancing the presentation of search results.

WebProNews reporter Mike McDonald contributed to this article from SMX East.


  • http://www.lexolutionit.com Maneet Puri

    Thanks! This was indeed very informative. The issue of duplicate content is a serious one. Often you end up being a culprit even when you did not intend any malicious tricks.

    • http://careawaycarrent.plazathai.com weera

There may be some mistake; please excuse me. Yours sincerely.

  • http://www.marketsitepro.com MNK

    What constitutes duplicate content? Is there a percentage of similar content that is then tagged as duplicate content? and how important is it, if it is on the same website? For example, you write original content for something like “golf shoes” . Then you write another page for “ladies golf shoes”. What percentage of the content must be different?

    • http://www.homebusinessmall.biz Johannes

This is all Greek to me. I am thankful to find something like this so that people like us get educated and will be careful, or better yet use professionals to handle our sites.

      • http://careawaycarrent.plazathai.com weera

I duplicate content in both Thai and English so that Thai people in your country can understand it; it isn’t Chinese. Excuse me if that causes any confusion.

    • http://careawaycarrent.plazathai.com weera

I don’t truly know why it happened; maybe I don’t understand. Please excuse me.

    • http://www.google.com/profiles/StanleyMathis Stanley Mathis

Does your Congressman or Congresswoman’s message change because they are in a different market? The way I view each account I’ve established throughout the internet is: "These members, here, must hear the same exact successful message I’ve shared with others in the environment where they dwell most often." There’s nothing wrong with duplicating a successful message. That’s what news media do all the time. And reporting (with outbound links) where the major search engines are indexing your website used to be referred to as "truth in advertising." I’m not penalized by Google, Yahoo!, or Bing, because they receive many opportunities for their advertisers to prospect my prospects, and that translates into more revenue opportunities for everyone.

It’s about staying on message with what works for me. But if I didn’t make sense in the expression of my ideas, and didn’t have a few credentials (2002 Marquis Who’s Who in America) in my pocket, my websites wouldn’t be as relevant to Google, Yahoo!, and Bing. Bottom line.

      That’s free enterprise.

    • http://www.hargate-hall.co.uk Hargate

I have a similar issue. On our website I talk about self-catering weddings on one page, self-catering holidays on another, and also self-catering accommodation, and of course this leads to a lot of duplication and repetition between the pages. However, I found that the information needed to be on separate pages, as they target different audiences; when it was all on one page, people found it confusing.

However, I assume that in doing this I will be diluting my search ranking for "self catering" on its own. The concept of being able to direct the search engines and spiders to one page using the canonical tag seemed like a possibility for me at first, but the content is not identical and also needs to be found separately anyway. Still, I found two pages that were the same and had different URLs due to ease of navigating around the site, so I will use it there.


    • http://careawaycarrent.plazathai.com weera

I can’t read the language you write in.

  • http://www.suround.com suround

Duplicate content still happens on both beginner and professional blogs. I often find it when visiting friends’ blogs; frequent duplicate content in blog posts suggests blog owners who are less creative and have less appreciation for the work of others. Duplicate content greatly affects search results, making them inaccurate, because sometimes the duplicate gets the better position. Duplicate content can also lower visitor loyalty to a website. For now, I’m still learning from the big blogs to increase my knowledge.

  • Gelly

Hi, what steps should be taken if a site gets banned in Google?

  • publisher

    In the first place, Google lies to everyone on a regular basis. Every time I’ve tried to implement the latest “you gotta do it this way” statements from Google I’ve gotten burned.

    I’ve given up paying any heed to their statements and my primary site continues to be #1 in their results solely because it’s aged in.

    As for site maps, I have sites with and without them. I see no benefit at all. As near as I can tell, site maps are basically useless. With or without them, the spiders still need to crawl and follow the links to index the content.

    Most important though, what right does Google have to dictate to publishers what content is on their sites, how that content is organized, or how many times it might appear? In sales, repetition is often the most important element to making a sale. You need many pages to present much of the same information to reinforce your message to readers and to reach different market segments. We don’t all have Google’s billions of dollars in marketing money or the ability to have the media write a story every time we sneeze. Google’s actions equate to an unreasonable restraint of trade.

    Personally, I think some greedy enterprising law firm should file a class action against Google and demand that they either 1) specifically state clearly and openly what their indexing methods are; or 2) index everything and exclude nothing.

    Their “not evil” motto is just marketing crap. They are just as evil as all corporations that get a little too big and think they can dictate to the world for the benefit of nobody but themselves.

  • http://www.seovisions.com Todd

    “Kupke said to avoid dynamic URLs when possible (although Google is “rather good” at eliminating dupes). If all else fails, use the canonical link element”

    “It has had a huge impact on Google’s canonicalization decisions, and 2 out of 3 times, the canonical tag actually alters the organic decision in Google. ”

How can these two statements be true at the same time? If two out of three times the canonical link tag alters Google’s organic decision, and if we presume most people are using the tag properly, then how can the first statement also be true: that Google is rather good at eliminating dupes?

  • http://gehspace.com/meusitenaprimeirapaginadogoogle Alexis Kauffmann – SEO (Otimiza

    They want me to add nofollow here, canonical there, according to rules they never clearly explain because they don’t want us to figure out their algos.

    Nonsense. My job is to create content, Google’s job is index and rank that content. I don’t ask Google to create content, they don’t ask me to help index and rank the web. If they wanted my help, they would clearly state what they want me to do, like: “Nofollow your category and tag URLs in WordPress blogs”, “add rel=canonical to product search pages in your OS Commerce store” and so on.

To 99% of bloggers and editors out there, statements like "detection of recurrent URL patterns (the ability to ‘learn’ recurrent URL patterns to find duplicated content)" are meaningless.

  • Guest

    I found this presentation so full of BS that I am tempted to cancel. When are you going to actually address SEO???

  • The Truth

What a load of crap coming out of Google. What about RSS/XML feeds? How many sites use the same feed? Is that not duplicate content?

Google is only good at one thing: scare tactics.

1. Sandbox scare tactics (don’t optimize sites): no such thing

2. Cloaking (10 years undetected): catch it if you can

3. Looking at CSS: simple

    # All Bots Disallow
    User-agent: *
    Disallow: /CSS/

    All rubbish

  • http://www.howtogetsuccess.com Yehuda R.

My site www.howtogetsuccess.com has a system that brings in articles from several directories about how to get success, and Google is not indexing it. Just to let you know. I’m thinking of starting with another directory, but with only unique articles, because the domain is very good to be developed.

  • blaker19

You write an article for an industry trade magazine and they put it on their website, but we also include it on ours. Or is this only about duplication within a single website?

  • http://storecomp21.blogspot.com Guest

I think there is a lot of duplicate content in Google that doesn’t really get indexed; maybe it needs some editing by the owners.

  • http://ajemailmarketing-software.blogspot.com Email Marketing Tools

Thanks for sharing this useful post. I like to visit your blog, and it has interesting writing about how to find duplicate content. Keep posting!

  • http://www.seoshop.org seo

Chris, thanks for sharing this useful post.

  • http://www.gocompareremovals.co.uk Removals Company

You lost me after the sixth paragraph! Ideally, a less techy version regarding duplicate content would be appreciated.

  • http://www.vicktrade.com Fania

    Thanks for your posting. It is really useful.

  • http://www.wc-news.com/ Igor

If we sell computers or other tech products, we need to publish technical details that will be similar on every website. Are we going to be penalized for such "duplicate content"? What about selling variations of the same product, differing only in memory, processor, or screen size?

  • Guest

    In the above article you mention the following;

    So for example, if the Chicago Tribune has an article on the New York Times, and the rel=canonical points to the Chicago Tribune then Google will only credit the Chicago Tribune with the content.

    That is a great tool for article marketing as many of my articles are syndicated.

Would I have to ask everyone who syndicates my articles to add a canonical link element for me, or is there an easier way?

    Hope someone can answer this for me.



  • http://www.seoworkblog.com/ SEO Services| SEOWorkBlog

Search engines are meant to help people, and we should respect their guidelines and act accordingly rather than confusing them. Content should always be original, and canonical tags should be used when it’s really important to use them. Yeah, it’s also right that we should use a sitemap, as it helps search engines easily crawl the website and generate great results.

Thanks for the useful information, but I personally suggest going the ethical way with original content. It may be the hard way, but your work will definitely be respected by the search engines. Get help from quality SEO services if you have problems writing original content.
