Beware The Duplicate Content Curse

Cached content may draw Googlebot's wrath

One webmaster found Google unwilling to index pictures located in an images directory, but some extra content apparently left the site afoul of Google’s guidelines.

Here’s the short version: don’t stick cached content in a directory you want Google to index. Chances are the Googlebot will freak out and run screaming from your server.

Michael VanDeMar wrote at Smackdown about how a simple test of indexing images in a subdirectory ended up with Googly accusations of webmaster malfeasance.

A thread he opened in the Google Groups webmaster help forum eventually attracted the attention of a Google staffer, John Mueller, who took a peek into VanDeMar’s images subdirectory and found some terrifying creepy-crawlies therein:

In particular regarding your /images/ subdirectory I noticed that there are some things which could be somewhat problematic. These are just two examples:

– You appear have copies of other people

  • http://ez-onlinemoney.com/blog/ Josh Spaulding

    There is no penalty for duplicate content. Cutts has confirmed it as well as others at Google.

    I’ve seen PLR articles rank well for some relatively competitive terms. I’ve seen WordPress category pages (100% dupe) rank for some very competitive terms.

    Duplicate content MAY not get quite as much weight as unique content, although even that’s not for sure, but duplicate content definitely won’t hurt you.

    It sounds like the example in this article was caused by shady practices. Why did the guy have cached copies of other people’s pages? He should have been penalized!

    • Guest

      Yeah, Cutts and the Google staff call it a filter instead of a penalty.  And I think they believe that’s what it’s supposed to be.  However, I beg to disagree.  When your site tanks because other people take your content, it’s not a filter.  It’s a penalty.  If it walks like a duck and talks like a duck….



    • Guest

      And yes, I do think this guy was up to something shady, so it’s a bad example to bring this issue to light.  I would never defend him for what he did.

    • http://www.beforeyoutattoo.com Marsha

      There is a penalty. While I admit to being an idiot, I watched my SERP rankings sink from positions one to five for my keywords to pages 5, 6 and 7 (if they showed up at all) when I duplicated my own articles on Zimbio and Qassia.

      Since Google has the articles on Zimbio cached in their results, even though they’ve been deleted since early May, the Zimbio articles still show up in the results before my site does.

      While the site’s been up for a while, I’m a newbie to promotion and driving traffic. It’s painful keying in a phrase and seeing site after site referring back to mine as the authority for the information, but my site is nowhere to be found. Lesson definitely learned about duplicate content.

      Using Zimbio and Qassia was the only change made during this period.

  • Guest

    Hmm, I am confused.

    Is it against Google’s guidelines or not?

    Years ago I created a page, let’s call it page.html, and it was linked to by other sites. Then I wanted to make over the site with corresponding page names (this-desciption-to-page.html) for all pages, but as the old page names had references I kept them, too. Both have the same content, naturally.  So is it OK or not?


  • http://www.hotelcalculator.com Michal Wlodarski

    It’s been confirmed that a dupe page will just be ignored or will be ranked somewhere in the dark end of the results.

    And to the first commenter: you should 301 your old pages to the new pages. That way the link juice will be transferred too.
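For readers unsure what "301 your old pages" involves, here is a minimal sketch using only Python's standard library. It is an illustration under stated assumptions, not how any commenter's site is set up: the `REDIRECTS` table, the `/new-page.html` target, the helper name `redirect_target`, and port 8000 are all hypothetical. In practice a redirect like this is usually configured in the web server rather than in application code.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

# Hypothetical mapping from a retired URL to its replacement.
# The file names are illustrative only.
REDIRECTS = {
    "/page.html": "/new-page.html",
}


def redirect_target(path: str) -> Optional[str]:
    """Return the new location for a retired path, or None if it has not moved."""
    # Ignore any query string when looking up the path.
    return REDIRECTS.get(path.split("?", 1)[0])


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        target = redirect_target(self.path)
        if target is not None:
            # 301 Moved Permanently: the old URL has been replaced for good.
            self.send_response(301)
            self.send_header("Location", target)
            self.end_headers()
            return
        # Serving real pages is out of scope for this sketch.
        self.send_error(404)


if __name__ == "__main__":
    # Port 8000 is an arbitrary choice for local experimentation.
    HTTPServer(("127.0.0.1", 8000), RedirectHandler).serve_forever()
```

With this running locally, a request for `/page.html` receives a 301 response whose `Location` header points at the new page, so the old and new URLs no longer serve two copies of the same content.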

  • Fozzyuw

    This is too bad.  I found myself thinking of doing this myself.

    I often find sites that give me directions/tutorials/research on how to solve a problem.  I started making forum posts about them with links back to the site. However, these sites can change at any time and their content can go with them (or just be moved).

    I started thinking of copy/pasting their page into a file cached on my local server.  That way, should the site ever leave, I can still reference the information.

    Funny… it’s kind of like Google’s own "cache" page in this regard.

    It’s too bad.  I can still keep the content, but now I cannot have it searched because Google will run it afoul of dupes.  So I could have solutions to people’s questions on hand, but a person looking for them might never find them because they cannot be searched.

    What can you do?

  • http://www.trishjones.com The Blogging Queen

    I agree with Josh here … it really is a myth that Google delists your site for duplicate content. The only people winning in this game right now are the scaremongers, and everyone who is anyone is jumping on the bandwagon.

    I don’t even think the issue raised here can be put into the duplicate content category … as you said, Dave, when Google had a look, which I thought was very good of them, they found some pretty scary things like cached pages of other people’s sites.

    Matt Cutts as well as Jonathan Leger are the people to go to if you want the real juice on duplicate content.



  • Guest



    "Hmm, I am confused."

    Why even risk it? 301 your old copy of the page with the same content (page.html) to new-page.htm. Why would you feel like you needed to keep the old copy with the same content? Why keep splitting link pop between the two copies? 301 the old copy and pass all that juice to existing links into the new copy for a grand total of links to one page that now has a better chance at improved rankings.

    But wait, that would mean you just tried to manipulate your rankings and, according to the fascists over at Chinoogle, that’s spamming.

    Ba-ROCK ON!

  • Guest

    We website owners resent people telling us what content we can put on our sites and how we can put it there.  Since Google owns Google, just like we own our sites, maybe they have the right to decide what content gets on their website and how.

    If you want to put content on my sites, you have to play by my rules…why is Google different?

  • http://www.cell-phone-numbers.com/ Guest


    Google has the right to have rules – if we don’t like them – TOUGH!  They are nearing owning the universe anyway.


    You said:  blaa blaa put cached pages into a different directory …


    WHY would anyone WANT to do that in the first place??  If you take a page down, REPLACE it with new content; has EVERYONE forgotten how to write these days?


    If you can’t, won’t or don’t have the time – then simply upload a 404 page to your server so it says OOPS! – try the root directory – which brings you back to your main site which….bloody well should be there.


    Or am I missing some point here?




  • http://www.healthandwellnessarticles.com/ HWA

    Assuming this is true as described here, does the dupe penalty apply only to cached pages from other sites?  The reason I ask, is because most YouTube videos offer code to embed a given video into your own site.  Why would duplicate content penalties apply only to text content and not to videos?

  • http://smackdown.blogsblogsblogs.com/ Michael VanDeMar

    John didn’t say that it was due to the content being duplicated. He never said why having a cached copy of the other guy’s page (it was one page, by the way, not the entire site) would cause an issue… the other page he mentioned was a cached copy of a Google search. I know for a fact that when I discuss the searches that a) they will change (that’s not a maybe), and b) even if someone does the search right after me, they could possibly hit a different datacenter than I did.

    It irks the hell out of me when people discuss things they see in serps that they have no record of, especially when making outlandish claims. A couple of years ago I was blogging about enormous amounts of spam infiltrating Google’s index. The moment I would show a batch, Google would ban them all, making it look as if they had never actually been there in the first place. Therefore I started last year snagging a copy of whatever I was talking about to support my discussion.

    There is nothing shady or misleading about what I did in the least. Every single page is marked as being a cached copy very clearly, dated, and in the case of serps has the ads removed. It is no different whatsoever than what Google does with their cached pages. Period.

  • http://www.mercadeoporinternet.com/ Rafael Montilla

    Question: when you translate an article from English to Spanish, is it duplicate content?

  • http://www.mariocahyadi.com Mario Cahyadi

    I think it is simple: list in robots.txt every folder that you don’t want the crawler to index.
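One way to sanity-check a robots.txt rule before relying on it is Python's standard `urllib.robotparser` module. The sketch below is hedged accordingly: the `/cache/` directory name, the `example.com` URLs, and the helper `is_crawlable` are assumptions made for illustration, and the standard-library parser only approximates how any particular crawler reads the file. Note also that robots.txt asks crawlers not to fetch a path; it is not by itself a guarantee that a URL stays out of search results.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the /cache/ directory name is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cache/
"""


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Report whether the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://example.com/images/photo.jpg",
        "http://example.com/cache/saved-page.html",
    ):
        print(url, is_crawlable(url))
```

Keeping saved copies under one dedicated directory, as assumed here, means a single `Disallow` line covers all of them while the images directory stays open to crawlers.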

  • http://www.Windowoncyprus.com Mistress

    The curse of duplicate content practically ruined our small company at one point and put 12 families out of work.  The problem was that our successful sites at the top of Google were ripe for the picking, and I discovered that over 400 webmasters thought it acceptable to copy our content lock, stock and barrel to their own or their clients’ sites. As Google does not differentiate between the original content and the copies, we ended up being penalised for our success. Surely with the help of such wonderful tools as the Wayback Machine, Google and any other search engine could log the first original and penalise subsequent copies rather than rewarding the perpetrators and penalising the originals! I can hear the cries of ‘use Copyscape and tell the thieves to remove the duplicates’, but the simple truth is that when 500 pages are copied 50 times each, that is a massive job.

    The wonderful thing about the internet was that it levelled the playing field and gave small companies the opportunity to shine. Google seems to be busy putting the big battalions back at the top in more ways than this single example. I could wax lyrical for an entire book, but simply don’t have the time, as I am busy writing original content for the webmasters out there who are too lazy to write their own to simply take! I was a fan of Google for many years, but their priorities seem to have been hijacked by those who have forgotten their founders’ ethos and have sold out to pure profit.

    • Guest

      I believe you because the same exact thing happened to me.  It has happened to many, many small business owners with content-rich sites. 

      I have wondered the same things. I have no answers, but I have to say that as sad and unfortunate as it is, it is also always nice to see someone else who has faced the same issues and pondered the same solutions.  It gives me a little bit of hope that maybe if enough of us scream about it, Google will fix the problem.  It’s been going on for about 3 years now, and it’s getting worse rather than better.

      I have had things taken from my site that have been on it for a decade, and as soon as they are taken, my content disappears.  It’s absurd.  I have 100% clean and W3C verified code and a smooth, interlinked site.  My stuff does very well until it’s taken, and if enough of it is taken, my entire site gets tanked.  It’s terribly time consuming to deal with it, and it’s impacting my ability to create new content. 

      Google has the best engineers, and they can’t figure this out?  Oh, wait, when people get screwed by duplicate content, they tend to buy more advertising.  My advice?  Don’t spend it on Google.


    • http://www.bluerayripper.net Blue Ray Ripper

      Google is changing, and we are worried about the future trend.

  • Guest (WPN reader)

    What if a site that is up and running is changing drop shippers? Instead of closing down the original pages, meaning the whole site, in order to put in new products and information, could the new products and info be prepared but kept hidden until all pages are ready? The first set would then be deleted and the new set unhidden, leaving the same site with the same content and theme, but with different products and product info from the new drop shipper.

