Cache Bashing: Google API Used In SEO War

The Nigritude Ultramarine SEO contest just got nastier with the addition of an application that uses Google’s API to scrape Google’s cached copies of Nigritude Ultramarine pages. The wily creator of this app then cloaks the stolen pages as if they were his, and since he has the higher PageRank, the duplicate site penalty knocks the copied site out of the rankings instead of his.
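
For the curious, here’s a minimal sketch of what such a tool might look like. It assumes the Google Web APIs, the free SOAP interface current in 2004 (since retired), whose doGetCachedPage method returned a page straight out of Google’s cache. The license key and target URL are placeholders, and BlueFalcon’s actual code isn’t public.

    #!/usr/bin/perl
    # Hypothetical sketch of a cache scraper built on the Google Web APIs,
    # the free SOAP interface of the era (long since retired).
    # The license key and target URL are placeholders, not BlueFalcon's.
    use strict;
    use warnings;
    use SOAP::Lite;

    my $key = 'YOUR-GOOGLE-LICENSE-KEY';    # free keys were capped at 1,000 calls/day
    my $url = 'http://example.com/nigritude-ultramarine.html';

    # Build a client from Google's published WSDL and request the cached copy.
    # SOAP::Lite decodes the base64Binary response into raw HTML.
    my $google = SOAP::Lite->service('http://api.google.com/GoogleSearch.wsdl');
    my $cached = $google->doGetCachedPage($key, $url);

    print $cached;

Republish whatever comes back under a higher-PageRank domain and you have the attack described above.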

Discuss cache bashing and site jacking in WebProWorld.

There’s more than one war taking place: The war for your website…

Michael Brandon, who posts as t2dman in WebProWorld, was a victim of cache bashing for his Nigritude Ultramarine site. His site went from 3rd to 103rd in 3 days.

In his recent post he said, “you have heard of the duplicate site penalty? You have heard how if a website scrapes your pages, you can get their IP address from your logs and stop them coming back? But what if your site is scraped from Google’s cache of your page. What comeback do you have then?”
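
The conventional defense he alludes to is simple enough. Here’s a sketch, assuming an Apache server of the era, with a placeholder address standing in for whatever your logs turn up:

    # .htaccess sketch: ban a scraper once its address shows up in your logs.
    # 203.0.113.42 is a placeholder from the documentation address range.
    Order Allow,Deny
    Allow from all
    Deny from 203.0.113.42

That’s exactly what makes cache scraping so slippery: Google’s own servers fetched the page, so the copier’s IP address never appears in your logs.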

Soon after t2dman’s post, BlueFalcon, the programmer who developed the API “cache bashing” tool, posted a response.

“Nigritude-ultramarine.new-frontier.info has been developed to be automated and uses the Google API to retrieve content linked to the SEO challenge keywords. If you wish to prevent the application from using your Web site’s content via Google’s cache system, then I recommend you use the ‘noarchive’ META tags as explained by Google’s help pages.”

He said he designed his site “to show how existing companies and individuals may be using the Google API to falsely emulate better rankings than their competitors. My entry shows this vividly and I hope at the end of this competition this is shown to be a major factor of how Google is induced to show irrelevant results.”

BlueFalcon’s site, which simply reads “can a Perl programmer emulate top ranking in SERPs? Here is my try :),” is in 7th place as of 05/20/04.

T2dman replied: “the competition is about getting to the top with no rules. This includes there being no rule against reporting BLATANT blackhat to the ultimate arbitrator, Google. Google does have rules.”

BlueFalcon described his methodology: “it may interest you to know that the ‘mother’ Web site, www.new-frontier.info, contains two Web pages, the index page and a sub-contact page. Neither relies on cloaking, as they serve a non-search-engine-controlled purpose.”

It’s his dictionary.new-frontier.info subdomain that does the dirty work.

During development the dictionary site gained unexpected PR. He moved the site to “‘open-dictionary.com’ and rebranded ‘Open Dictionary’. To maintain the inherited PR of the pages residing on dictionary.new-frontier.info but without presenting users with the test Web site, I used mod_rewrite to control URL redirection depending on if Googlebot visited or a normal user.”
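
In Apache terms, that kind of user-agent redirection takes only a few lines of mod_rewrite. Here’s a sketch of the general technique (BlueFalcon’s actual rules aren’t public):

    # .htaccess sketch of user-agent cloaking, illustrative only: Googlebot
    # keeps seeing the PR-carrying pages on the old subdomain, while human
    # visitors are redirected to the rebranded site.
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} !Googlebot [NC]
    RewriteRule ^(.*)$ http://www.open-dictionary.com/$1 [R=302,L]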

If I understand correctly, it’s BlueFalcon’s dictionary site that’s responsible for the cloaked pages’ high ranking.

This tactic suggests to me at least two major dangers – the first is that your competitors can copy your site, cloak it, and possibly knock your site out with Google’s duplicate content filter. I’m looking into this more, so don’t get all worried. I’m not sure how widespread or dangerous this could be.

The other danger I see is an automated “cache basher” like the one BlueFalcon created, which lets him scrape Google’s cache and cloak copies of the sites he finds there for Nigritude Ultramarine. Is this something you should worry about? Not yet.

If this practice becomes widespread though – and I can assure you there are people out there duplicating BlueFalcon’s tool – then your best bet will be to do as he advises: “use the ‘noarchive’ META tags.”
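
For reference, the tag goes in each page’s head section. Google will still crawl and rank the page; it just won’t store or serve a cached copy, which leaves a cache basher nothing to scrape:

    <!-- keeps the page crawled and ranked, but out of Google's cache -->
    <meta name="robots" content="noarchive">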

T2dman pins blame for cache bashing directly on Google. “There needs to be as harsh a penalty plus more for the culprit – manual penalty seems the only option, since the Google dup penalty algo as it stands has penalised the wrong pages.”

Garrett French is the editor of iEntry’s eBusiness channel. You can talk to him directly at WebProWorld, the eBusiness Community Forum.

  • http://www.bulgariasfinest.com Jim Lawlor

    I guess there will always be an unscrupulous element to search, such a shame

  • http://www.giftexposure.co.uk Dave

    You are right, Jim

  • http://www.business-sale.com Chris

    Content hijacking is a huge problem; Matt Cutts recently said Google are looking at the situation

  • http://www.the-little-things-in-life.co.uk Peter

    I have had my content hijacked more than once; Copyscape helps find the culprits

  • http://www.seethrureviews.com/Rocket-Spanish-Download-Reviews.html Rocket Spanish Reviews

    This is a major problem for search engines today! So many times, my indexed website content gets ripped and copied word for word on some other website like EzineArticles, which then achieves better rankings than mine. On top of that, I, the original content creator, am the one that gets penalized.

  • http://www.londoncitylandscapes.co.uk Paul

    I believe the site first indexed is now judged the content owner

  • http://nysgo.com nycgo

    The noarchive tip was helpful. We utilize RSS feeds and had thought that was the problem, so we made sure only a summary was in the feed, which stopped some sites, but others were getting their fill.

    Hopefully, this will correct it in a few days.
