PubCon: Getting Rid Of Duplicate Content
The issue of duplicate content is something all webmasters and site owners have to take into consideration, and the PubCon session "Getting Rid of Duplicate Content Once and For All" addresses that challenge.
Ben D'Angelo, Software Engineer at Google, spoke about duplicate content issues. Duplicate content arises in several distinct situations, including multiple URLs pointing to the same page, different countries sharing the same language, and content syndicated across other sites.
To avoid such issues you should have one URL for each piece of content. There are three reasons for this: users don't like duplicated results, search engines save resources by having room to index other content, and you save resources on your own server.
Sources of duplicate content within your own site include multiple URLs pointing to the same page, www vs. non-www versions, session IDs, URL parameters, and printable versions of your pages.
Google handles duplicate content in a number of ways. The general idea is to cluster duplicate pages and choose the best representative. Google uses different filters for different types of duplicate content. The goal is to serve one version of the content in the SERPs.
There are a variety of things you can do to prevent duplicate content. For exact duplicates, a 301 redirect is the best option. For near-duplicate content, use noindex and robots.txt.
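As a sketch of the 301 option, a www-to-non-www redirect on an Apache server might look like the following `.htaccess` fragment (assuming `mod_rewrite` is enabled; `example.com` is a placeholder for your own domain):

```apache
# .htaccess sketch: permanently redirect www.example.com to example.com
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]
```

For near-duplicates such as printable pages, the equivalent move is to either disallow the printable path in robots.txt or add a `noindex` robots meta tag to the printable template, so only the canonical version is indexed.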
For country-specific domains, keep in mind that different languages are not duplicate content. Use unique content specific to each country, use different TLDs, and use Webmaster Tools for geo-targeting.
For URL parameters, put data which does not affect the substance of the page in a cookie, not the URL.
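A minimal sketch of that idea in Python: strip the parameters that only carry visitor state out of the URL, and return them separately so they can be set as a cookie instead. The parameter names (`sessionid`, `ref`) are illustrative assumptions, not anything prescribed in the session.

```python
from urllib.parse import urlencode, urlparse, parse_qsl, urlunparse

# Hypothetical parameters that track state but don't change page content.
TRACKING_PARAMS = {"sessionid", "ref"}

def canonicalize(url):
    """Return (clean_url, stripped) where stripped holds the values
    that could be carried in a cookie instead of the URL."""
    parts = urlparse(url)
    kept, stripped = [], {}
    for key, value in parse_qsl(parts.query):
        if key.lower() in TRACKING_PARAMS:
            stripped[key] = value
        else:
            kept.append((key, value))
    clean = urlunparse(parts._replace(query=urlencode(kept)))
    return clean, stripped

clean_url, cookie_data = canonicalize(
    "http://example.com/product?id=42&sessionid=abc123&ref=homepage")
print(clean_url)    # http://example.com/product?id=42
print(cookie_data)  # {'sessionid': 'abc123', 'ref': 'homepage'}
```

The result: every visitor sees the same single URL for the product page, while the session and referral data ride along in a cookie, so crawlers never discover parameter-laden duplicates.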
When it comes to other sites, include the original absolute URL in any syndicated content, or syndicate slightly different content. And manage your expectations: if you use syndicated content, you will probably not outrank the original source.
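One common way to include that absolute URL is a link back appended to the body of each syndicated copy; a sketch (the URL and wording are placeholders):

```html
<!-- Appended to every syndicated copy of the article -->
<p>This article originally appeared at
  <a href="http://example.com/original-article">example.com</a>.</p>
</p -->
```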
Don't be too concerned about scrapers or proxies; they generally won't impact your rankings. If you are concerned, you can file a DMCA or spam report with Google.