SES NY: Duplicate Content Jeopardizes Your Site

Those all-important search engine rankings you desire for your website could be in peril if you utilize duplicate content that runs afoul of search engine guidelines.

Have you encountered duplicate content problems with getting your site indexed? Were you able to resolve those problems and get reincluded in a search index? We would like to hear about it at WebProWorld.

Mike McDonald observed this session from SES 2006’s day two agenda, and passed along some very good information for our readers.

Plenty of websites try to pull tricks that will move them up in the rankings. People like Matt Cutts take gleeful delight in exposing the likes of BMW and Ricoh and booting them out of Google’s index. Anyone thinking Google or Yahoo wouldn’t ban their site should think again.

Presenters at the SES 2006 New York session, Duplicate Content Issues, discussed the dangers of duplicate content. It comes in many forms, like multiple domains for the same homepage content; multiple links to several domains for one site; and “doorway” pages, according to Anne Kennedy, managing partner at Beyond Ink.

Yahoo does not want those multiple sites, and neither does the Open Directory Project. Since these are both places where search engines tend to start looking for content to index, being kicked out of them would be a bad development. Google’s webmaster guidelines specifically state one should not create duplicate content.

Robots.txt and 301 redirects are the webmaster’s friends here. Use robots.txt to keep search engines from indexing landing pages, and use 301 redirects to point all of the domains a business owns at a single site. Beyond Ink provides tips on doing redirects on its site.
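
As a rough illustration of that advice, a minimal Python sketch might look like the following: the standard-library http.server answers every request on a secondary domain with a permanent 301 redirect to one canonical host, and a comment shows the kind of robots.txt rule that keeps landing pages out of an index. The host name, port, and /landing/ path are placeholders, not details from the session.

    # Minimal sketch (assumed details): 301-redirect every request on a secondary
    # domain to a single canonical site, rather than relying on DNS aliasing.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    CANONICAL_HOST = "www.example.com"  # the one site all owned domains should point to

    class RedirectHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # A permanent (301) redirect that preserves the requested path and query.
            self.send_response(301)
            self.send_header("Location", f"https://{CANONICAL_HOST}{self.path}")
            self.end_headers()

    # On the canonical site, a robots.txt entry can keep landing pages unindexed, e.g.:
    #   User-agent: *
    #   Disallow: /landing/

    if __name__ == "__main__":
        HTTPServer(("", 8080), RedirectHandler).serve_forever()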

Shari Thurow from GrantasticDesigns recommended reading up on Andrei Broder’s papers on shingles, another way of classifying documents by a unique signature or fingerprint. Overlapping words or phrases look like shingles on a roof, and a mathematical approach, like the one a search engine algorithm uses, can measure how much of that overlap two documents share.
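
As a small illustration of the shingling idea (assuming a shingle width of four words; Broder’s papers describe the real technique in far more detail), overlapping word n-grams serve as a document’s fingerprint, and the overlap between two fingerprint sets approximates how much content the documents share:

    # Illustrative Broder-style shingling: overlapping word n-grams fingerprint a
    # document, and the Jaccard resemblance of two shingle sets approximates overlap.
    def shingles(text, w=4):
        words = text.lower().split()
        return {tuple(words[i:i + w]) for i in range(max(len(words) - w + 1, 1))}

    def resemblance(a, b):
        sa, sb = shingles(a), shingles(b)
        return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

    # Near-duplicate passages score close to 1.0; unrelated text scores near 0.
    print(resemblance("duplicate content can hurt your search rankings",
                      "duplicate content may hurt your search rankings"))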

Jake Baillie, TrueLocal’s president, described the top six duplicate content mistakes:

  1. Circular navigation – exposing the same content at different URLs depending on how a visitor navigates should be avoided. Publishers should define a consistent way of addressing page content no matter what navigation path a user takes through a site.
  2. Printer-friendly pages – if these are HTML pages, robots.txt should be used to block search engines from indexing them.
  3. Inconsistent linking – referencing directory pages in an inconsistent manner, like /directory and /directory/, should be avoided.
  4. Product-only pages – it is not good for a site to have both product pages and SKU pages; they should be consolidated where possible.
  5. Transparent serving domains – use 301 redirection instead of DNS aliasing to get users to a canonical site from multiple domains.
  6. Bad cloaking – don’t use cloaking scripts you didn’t write, and make sure your cloaking script returns separate content for each URL being cloaked.

Rajat Mukherjee, director of product management at Yahoo!, emphasized that sites should try not to make the same content available through multiple URLs.

“Search engines are not trying to penalize content,” Mukherjee said. “We’re trying to find the right content to promote. Independent of how large our indexes get, there will always be capacity constraints.”

“Honest site owners often worry about duplicate content when they don’t really have to,” Google’s Cutts said. “There are also people that are a little less conscientious.” He also noted that different top-level domains, like x.com and x.ca, are not a concern.

A site that has an article broken into multiple sections, along with a printer-friendly version containing all of those sections, likewise isn’t cause for worry.

Cutts, like Baillie, emphasized the importance of consistent internal linking. For example, if a site uses the www. prefix, it should be used everywhere on the site, or nowhere.
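
One way to act on that advice is to normalize every internal URL before it is written into a page. The Python sketch below is only an illustration; always adding the www. prefix and always keeping a trailing slash on directory URLs are arbitrary choices for the example, and example.com is a placeholder host.

    # Sketch: address every page one way (same host prefix, same trailing-slash rule).
    from urllib.parse import urlsplit, urlunsplit

    def canonical_url(url):
        scheme, netloc, path, query, fragment = urlsplit(url)
        if netloc and not netloc.startswith("www."):
            netloc = "www." + netloc
        # One convention for directory URLs: keep the trailing slash (file names keep their extension).
        if path and "." not in path.rsplit("/", 1)[-1]:
            path = path.rstrip("/") + "/"
        return urlunsplit((scheme, netloc, path, query, fragment))

    print(canonical_url("http://example.com/directory"))  # http://www.example.com/directory/
    print(canonical_url("http://example.com/page.html"))  # http://www.example.com/page.html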

He also made the interesting observation that using absolute links instead of relative ones reduces the likelihood of a site being “scraped” for its content. Cutts recommended using copyright notices throughout a site.
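
As a hypothetical example of what that means in practice, links resolved against the site’s own base URL at publish time stay absolute, so a scraped copy of the page still points back to the originating site. The base URL below is a placeholder.

    # Sketch: resolve relative hrefs against the site's own base so published links are absolute.
    from urllib.parse import urljoin

    BASE = "https://www.example.com/articles/"

    def absolutize(href):
        # Relative hrefs ("part2.html", "../contact.html") are resolved against BASE;
        # hrefs that are already absolute pass through unchanged.
        return urljoin(BASE, href)

    print(absolutize("part2.html"))       # https://www.example.com/articles/part2.html
    print(absolutize("../contact.html"))  # https://www.example.com/contact.html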

For those who have heard about Google’s “Bigdaddy” datacenter update, Cutts said it is rolling out every seven to ten days and should be finished within the next six weeks.

David Utter is a staff writer for WebProNews covering technology and business.
