Wednesday, May 28, 2008

Beware The Duplicate Content Curse

Cached content may draw Googlebot's wrath
One webmaster found Google unwilling to index pictures located in an images directory; some extra content stored there apparently put the site afoul of Google's guidelines.

Here's the short version: don't stick cached content in a directory you want Google to index. Chances are the Googlebot will freak out and run screaming from your server.

Michael VanDeMar described at Smackdown how a simple test of indexing images in a subdirectory ended with Googly accusations of webmaster malfeasance.

A thread he opened in the Google Groups webmaster help forum eventually attracted the attention of a Google staffer, John Mueller, who took a peek into VanDeMar's images subdirectory and found some terrifying creepy-crawlies therein:

In particular regarding your /images/ subdirectory I noticed that there are some things which could be somewhat problematic. These are just two examples:

- You appear have copies of other people’s sites, eg /images/viewgcache-getafreelinkfromwired.htm
- You appear to have copies of search results in an indexable way, eg /images/viewgcache-bortlebotts.htm

I’m not sure why you would have content like that hosted on your site in an indexable way, perhaps it was just accidentally placed there or meant to be blocked from indexing. I trust you wouldn’t do that on purpose, right?

VanDeMar keeps those cached copies to support his discussions, as such pages can and will change regularly, or disappear altogether from sites. Doing this in a place where Google expects not to find such content evidently put him in a tough spot with the search engine, as Mueller suggested it ran counter to Google's webmaster guidelines.

The difficulty appears to be in the nature of the cached pages. Mueller thinks it's duplicate content; VanDeMar, based on his reading of the guidelines, believes it isn't. He further questioned why the entire subdirectory was delisted from Google.

The obvious solution, as one commenter suggested, would be to place the cached pages into a different directory and tell the Googlebot to stay out of it. Whether or not that's the fairest solution for webmasters won't figure into the decision, as Google has really dug in over the past year on the quality issues it perceives.
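For readers who want to try the commenter's suggestion, here is a minimal sketch of what "tell the Googlebot to stay out" could look like. It assumes the cached copies are moved to a directory named /cache/ (a hypothetical name, not one VanDeMar used) and uses Python's standard urllib.robotparser module to check that a robots.txt rule blocks that directory while leaving /images/ crawlable. The file names in the example are also made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the site root. "/cache/" is an assumed
# directory name for the cached copies; "/images/" is left unrestricted.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cache/
"""


def is_crawlable(path: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the rules above allow user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Illustrative paths only; neither file is taken from the article.
    for path in ("/images/photo.jpg", "/cache/saved-page.htm"):
        print(path, "crawlable:", is_crawlable(path))
```

This only checks what the robots.txt rules say. It does not show how Google will actually treat pages that were already crawled before the rule went in.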

Keeping cached copies of content sounds like a prudent course of action to take. It helps keep site visitors from clicking into a non-existent page, which makes the linking site look bad. If Google consistently dumps subdirectories that mix cached and original content because the company thinks duplication is in effect, webmasters will have to alter their linking structure to accommodate the fussy Googlebot.

Duplicate Content

There is no penalty for duplicate content. Cutts has confirmed it, as have others at Google.

I've seen PLR articles rank well for some relatively competitive terms. I've seen WordPress category pages (100% dupe) rank for some very competitive terms.

Duplicate content MAY not get quite as much weight as unique content, although even that's not for sure. Duplicate content definitely won't hurt you.

It sounds like the example in this article was caused by shady practices. Why did the guy have cached copies of other people's pages? He should have been penalized!

duplicate

Hmm, I am confused.

Is it against Google's guidelines or not?

Years ago I created a page, let's call it page.html, and other sites linked to it. Then I wanted to make over the site with descriptive page names (this-desciption-to-page.html) for all pages, but since the old page names had links pointing to them, I kept them, too. Both have the same content, naturally. So is it OK or not?

 
