SES NY: Duplicate Content Jeopardizes Your Site

The search engine rankings you want for your website could be in peril if you use duplicate content that runs afoul of search engine guidelines.

Mike McDonald observed this session from SES 2006's day two agenda, and passed along some very good information for our readers.

Plenty of websites try to pull tricks that will move them up in the rankings. People like Matt Cutts take gleeful delight in exposing the likes of BMW and Ricoh and booting them out of Google's index. Anyone thinking Google or Yahoo wouldn't ban their site should think again.

Presenters at the SES 2006 New York session, Duplicate Content Issues, discussed the dangers of duplicate content. It comes in many forms, like multiple domains for the same homepage content; multiple links to several domains for one site; and "doorway" pages, according to Anne Kennedy, managing partner at Beyond Ink.

Yahoo does not want those multiple sites, and neither does the Open Directory Project. Since these are both places where search engines tend to start looking for content to index, being kicked out of them would be a bad development. Google's webmaster guidelines specifically state one should not create duplicate content.

A robots.txt file and 301 redirects are the webmaster's friends here. Use robots.txt to keep search engines from indexing landing pages, and use 301 redirects to point all domains owned by the business to a single site. Beyond Ink provides tips on doing redirects on its site.
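As a rough illustration of the robots.txt and 301 advice above, the following Python sketch checks paths against a robots.txt rule set and computes a permanent redirect to one canonical host. The robots.txt content, the host www.example.com, and the helper names `is_crawlable` and `redirect_for` are all assumptions made for this example; they do not come from the session or from Beyond Ink.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps crawlers out of a landing-page directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /landing/
"""

# Assumed canonical host; every other domain the business owns redirects here.
CANONICAL_HOST = "www.example.com"


def is_crawlable(path, agent="Googlebot"):
    """Return True if the robots.txt rules above let `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


def redirect_for(url):
    """Return (301, location) for a non-canonical host, or None if no redirect is needed."""
    parts = urlsplit(url)
    if parts.netloc.lower() == CANONICAL_HOST:
        return None
    location = urlunsplit(
        (parts.scheme, CANONICAL_HOST, parts.path, parts.query, parts.fragment)
    )
    return 301, location


if __name__ == "__main__":
    print(is_crawlable("/landing/spring-offer.html"))
    print(redirect_for("http://example.net/about?x=1"))
```

In practice the redirect would normally be configured in the web server rather than in application code; the function only shows what the response should contain.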

Shari Thurow from GrantasticDesigns recommended reading up on Andrei Broder's papers on shingles, another way of classifying documents by their unique signature or fingerprint. Overlapping words or phrases look like shingles on a roof and can be found via a mathematical approach, like that used by a search engine algorithm.
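A minimal sketch of the shingling idea, assuming word-level shingles of four words and a plain set-overlap score (intersection divided by union). The function names and the shingle width are choices made for this example. Production search engines are understood to work with hashed samples of shingles rather than full sets, so treat this as an illustration of the concept, not of any engine's algorithm.

```python
def shingles(text, w=4):
    """Return the set of overlapping w-word sequences ("shingles") in text."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < w:
        # Short documents yield a single shingle holding all their words.
        return {tuple(words)}
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def resemblance(a, b, w=4):
    """Share of shingles two documents have in common, from 0.0 to 1.0."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0  # two empty documents are treated as identical
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    page = "duplicate content can put your search engine rankings at risk"
    copy = "duplicate content can put your search engine listings at risk"
    print(resemblance(page, copy))
```

A score near 1.0 suggests near-duplicate pages, while pages that share only a few phrases score close to 0.0.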

Jake Baillie, TrueLocal's president, described the top six duplicate content mistakes:

  1. Circular navigation - having different paths through a site should be avoided. Publishers should define a consistent way of addressing page content no matter what navigation path a user takes through a site.
  2. Printer friendly pages - if these are html pages, robots.txt should be used to block search engines from indexing them.
  3. Inconsistent linking - calling directory pages in an inconsistent manner, like /directory and /directory/, should be avoided.
  4. Product-only pages - it is not good for a site to have product pages and SKU pages; they should be consolidated if possible.
  5. Transparent serving domains - use 301 redirection instead of DNS aliasing to get users to a canonical site from multiple domains.
  6. Bad cloaking - Don't use cloaking scripts you didn't write. Make sure your cloaking script is returning separate content for each URL being cloaked.
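The inconsistent-linking mistake in the list above (/directory versus /directory/) can be avoided by passing every internal link through a single normalizing function before it is written into a page. The sketch below is one possible approach. The rule that a final path segment without a dot is a directory is an assumption of this example, as is the function name `canonical_url`.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url):
    """Normalize a URL so that each page is always linked in one form."""
    parts = urlsplit(url)
    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    # Assumption: a final segment with no file extension names a directory,
    # so it always gets a trailing slash.
    if last_segment and "." not in last_segment:
        path += "/"
    # Scheme and host are case-insensitive, so lowercase them; drop fragments.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


if __name__ == "__main__":
    for link in ("http://Example.com/directory", "http://example.com/directory/"):
        print(canonical_url(link))
```

Both example links come out in the same form, so a crawler sees a single URL for the directory page.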

Rajat Mukherjee, director of product management at Yahoo!, emphasized sites should try not to make the same content available through multiple URLs.

"Search engines are not trying to penalize content," Mukherjee said. "We're trying to find the right content to promote. Independent of how large our indexes get, there will always be capacity constraints."

"Honest site owners often worry about duplicate content when they don't really have to," Google's Cutts said. "There are also people that are a little less conscientious." He also noted that the same content on different top-level domains, like x.com and x.ca, is not a concern.

A site that has an article broken into multiple sections, along with a printer-friendly version that contains all of those sections, likewise isn't cause for worry.

Like Baillie, Cutts emphasized the importance of consistent internal linking. For example, if a site uses www. in its URLs, it should do so everywhere on the site, or nowhere.

He also made the interesting observation that using absolute links instead of relative ones reduces a site's likelihood of being "scraped" for its content. Cutts recommended using copyright notices throughout a site.
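One plausible reading of the absolute-link advice is that links copied along with a page keep pointing at the original site, whereas relative links resolve against wherever the copy is hosted. The sketch below illustrates that reading with Python's standard `urljoin`. The URLs and helper names are hypothetical, and the explanation is an interpretation for this example, not something Cutts is quoted as saying.

```python
from urllib.parse import urljoin

# Hypothetical addresses for the original article and a scraped copy of it.
ORIGINAL_PAGE = "http://www.example.com/articles/duplicate-content.html"
SCRAPED_COPY = "http://scraper.example.net/stolen/page.html"


def resolve(page_url, href):
    """Resolve a link the way a browser would for a page at page_url."""
    return urljoin(page_url, href)


def make_absolute(href):
    """Rewrite a link as an absolute URL on the original site before publishing."""
    return urljoin(ORIGINAL_PAGE, href)


if __name__ == "__main__":
    relative = "part2.html"
    absolute = make_absolute(relative)
    # On the scraped copy, the relative link points at the scraper's host...
    print(resolve(SCRAPED_COPY, relative))
    # ...while the absolute link still leads back to the original site.
    print(resolve(SCRAPED_COPY, absolute))
```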

For those users who have heard about Google's "Bigdaddy" datacenter update, Cutts said it is rolling out every 7 to 10 days and it should be done in the next 6 weeks.

David Utter is a staff writer for WebProNews covering technology and business.

Digg This! StumbleUpon This!
AddThis Social Bookmark Widget

News Tags: Search, Google, SES, content, sesny2006

Comments

Post new comment

The content of this field is kept private and will not be shown publicly.
CAPTCHA
This question is for testing whether you are a human visitor and to prevent automated spam submissions.
4 + 14 =
Solve this simple math problem and enter the result. E.g. for 1+3, enter 4.