PubCon: The Duplicate Content Zone

    November 16, 2006
    WebProNews Staff

A PubCon session entered a place beyond indexing and search traffic: The Duplicate Content Zone, where websites sometimes go and are never seen again. WebProNews tagged along as the session hosts played the Rod Serling role for the audience.

Too much duplicate content on a website will drop it in the SERPs faster than the Tower of Terror at Disney World drops its riders. Only you don't have Matt Cutts dressed in a bellhop outfit pulling the lever in Orlando.

Bill Slawski not only makes me envious with his patent coverage, but with the fact that he's a short drive from steamed blue crabs when they're in season. He touched on the topic of printer-friendly pages, which many sites make available as a convenience for their visitors. These pages should go in a separate folder and be protected from spidering by a relevant entry in the site's robots.txt file. If the same page is reachable at different URLs, be sure to use 301 redirects to send visitors and spiders to the preferred URL.

Slawski noted that duplicate content sometimes happens when one site takes content from another. This infringement could end up costing a site publisher in terms of duplicate content penalties. He recommended contacting the site owner and the site's host before embarking on more serious legal action or a DMCA notice.

While many dynamically generated sites use session IDs to track a visitor's session, these should not be served to search engine spiders. Some spiders ignore session IDs by default, but if a site's session-ID pages are showing up in a search engine, steps will need to be taken to stop this.

Yahoo's Tim Converse illustrated the point by noting that Yahoo won't even index a crawled site if it is determined to be a duplicate. Yahoo looks at approximate copies as well, not just word-for-word ones. Being similar does not necessarily mean a site will be excluded from the index; not all duplication is evil. Hosting content in both HTML and Microsoft Word formats for visitor choice would be one example, as would syndication of content. Abusive stuff, like scraper sites and weaving content from different pages to make a new page, will get one in trouble.

Google's Brian White said his company filters content in a number of pipelines. Anyone hoping for additional insight will be disappointed, as White did not provide details of how this is done. If other sites showing up in Google are scraping one's content, Google can help under the DMCA. The company provides a contact page with more information about DMCA takedowns. Using the DMCA can be more perilous than anything in the Duplicate Content Zone, though: ask a lawyer for advice before handing out a takedown notice.
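Slawski's robots.txt advice can be sketched roughly as follows, assuming the printer-friendly copies all live under a single folder (the /print/ name here is purely illustrative):

```
# robots.txt sketch — keep printer-friendly duplicates out of the index
# (the /print/ folder name is an assumption for illustration)
User-agent: *
Disallow: /print/
```

Placing all the printable versions under one path is what makes a single Disallow rule sufficient; scattering them across the site would mean listing each one.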
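For the same-page-at-different-URLs case, a 301 redirect on an Apache server might look like the following sketch, assuming the site wants to consolidate on the www hostname; the paths and domain are hypothetical:

```
# .htaccess sketch: permanently redirect a duplicate URL to the
# preferred one, and fold bare-domain requests into www
Redirect 301 /old-page.html http://www.example.com/page.html

RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

The 301 status matters here: it tells spiders the move is permanent, so the duplicate URL can be dropped from the index in favor of the target.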
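The session-ID advice amounts to canonicalizing URLs before spiders see them. A minimal sketch in Python, assuming session state rides in a query parameter (the parameter names below are common examples, not a definitive list):

```python
# Sketch: strip session-ID query parameters from a URL so that links
# emitted to spiders are session-free. Parameter names are assumptions.
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}

def canonicalize(url: str) -> str:
    """Return the URL with any session-ID query parameters removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in SESSION_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(canonicalize("http://example.com/page?id=7&PHPSESSID=abc123"))
# http://example.com/page?id=7
```

In practice a site would apply something like this when rendering links for unauthenticated visitors, or serve spiders cookie-based sessions instead of URL-based ones.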

David Utter is a staff writer for WebProNews covering technology and business.