There has been a ton of speculation regarding Google's Penguin update. Few know exactly what the update does, or how it works with Google's other signals. Google always plays its cards close to its chest.
"While we can't divulge specific signals because we don't want to give people a way to game our search results and worsen the experience for users, our advice for webmasters is to focus on creating high quality sites that create a good user experience and employ white hat SEO methods instead of engaging in aggressive webspam tactics," Google's Matt Cutts said in the announcement of the update.
He also said, "The change will decrease rankings for sites that we believe are violating Google’s existing quality guidelines."
"We see all sorts of webspam techniques every day, from keyword stuffing to link schemes that attempt to propel sites higher in rankings," he said. To me, that indicates that this is about all webspam techniques - not just keyword stuffing and link schemes, but also everything in between.
So it's about the quality guidelines. Cutts was pretty clear about that, and that's why we've been discussing some of the various things Google mentions specifically in those guidelines.
Another thing on the quality guidelines list is: "Don't create multiple pages, subdomains, or domains with substantially duplicate content."
Of course, like the rest of the guidelines, this is nothing new, but in light of the Penguin update, it seems worth examining the guidelines again, if only to provide reminders or educate those who are unfamiliar with them. Duplicate content seems like one of those things that could get sites into trouble, even when they aren't intentionally trying to spam Google. Even Google says in its help center article on the topic, "Mostly, this is not deceptive in origin."
"However, in some cases, content is deliberately duplicated across domains in an attempt to manipulate search engine rankings or win more traffic," Google says. "Deceptive practices like this can result in a poor user experience, when a visitor sees substantially the same content repeated within a set of search results."
Google lists the following as steps you can take to address any duplicate content issues you may have:
Don't block Google from duplicate content. Google advises against this, because it won't be able to detect when URLs point to the same content, and will have to treat them as separate pages.
Use the canonical link element (rel="canonical").
Note: there are reasons why Google might skip your canonical link elements.
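To make the canonical link element more concrete, here is a minimal Python sketch of how a site might map several URL variants of one page to a single preferred URL and emit the matching tag. The function names, the example.com URLs, and the list of ignored query parameters are all assumptions made for illustration; they do not come from Google, and which parameters are safe to drop depends entirely on your own site.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that do not change what the page shows.
# This set is an assumption; adjust it for your own site.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}


def canonical_url(url: str) -> str:
    """Collapse URL variants of one page into a single preferred form."""
    parts = urlsplit(url)
    # Keep only parameters that select different content, in a stable order.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    path = parts.path or "/"
    # Lowercase the scheme and host, and drop the #fragment.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return '<link rel="canonical" href="{}" />'.format(href)


if __name__ == "__main__":
    # Two variants of the same page produce the same tag.
    print(canonical_link_tag("http://Example.com/shoes?utm_source=news&color=red#top"))
    print(canonical_link_tag("http://example.com/shoes?color=red&sessionid=abc"))
```

Both example URLs yield a tag pointing at http://example.com/shoes?color=red, which is the point of the exercise: every duplicate variant names the same preferred page. Keep in mind that, as noted above, Google may still choose to skip a canonical link element.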
It's important to note that Google doesn't consider duplicate content to be grounds for a penalty, unless it appears that it was used in a deceptive way or to manipulate search results. However, that seems like one of those areas where an algorithm might leave room for error.
Here are some videos with Matt Cutts (including a couple of WebProNews interviews) talking about duplicate content. You should watch them if you are concerned that this might be affecting you:
This one comes from Google's Greg Grothaus rather than Cutts. Also worth watching:
If you think you've been wrongfully hit by the Penguin update, Google has a form you can fill out to let them know.
Tell us about duplicate content issues you've run into in the comments.