This week, Google posted a new Webmaster Help video featuring Matt Cutts talking about a potential duplicate content issue. This time, he even broke out the whiteboard to illustrate his points.
Specifically, Cutts addressed the user-submitted question:
"Many sites have a press release section, or a news section that re-posts relevant articles. Since it's all duplicate content, would they be better off removing these sections even with plenty of other unique content?"
"The answer is probably yes, but let me give you a little bit of color about the reasoning for that," Cutts says in the video. "So a lot of the times at Google, we're thinking about a continuum of content, and the quality of that content, and what defines the value add for a user. So let's draw a little bit of an axis here and think a little bit about what's the difference between high quality guys versus low quality guys? Take somebody like The New York Times. Right? They write their own original content. They think very hard about how to produce high quality stuff. They don't just reprint press releases. You can't just automatically get into The New York Times. It's relatively hard. Right?"
"At the other end of this spectrum is the sort of thing that you're talking about, where you might have a regular site, but then one part of that site, one entire section of that site, is entirely defined by maybe just doing a news search, maybe just searching for keywords in press releases," he continues. "Whatever it is, it sounds like it's pretty auto-generated. Maybe it's taking RSS feeds and just slapping that up on the site. So what's the difference between these?"
"Well, The New York Times is exercising discretion," Cutts explains. "It's at exercising curation in terms of what it selects even when it partners with other people, and whenever it puts other content up on its site. And most of its content tends to be original. Most the time it's thinking about, OK, how do we have the high quality stuff, as opposed to this notion-- even if you've got high quality stuff on the rest of your site, what is the value add of having automatically generated, say, RSS feeds or press releases, where all you do is you say, OK. I'm going to do a keyword search for Red Widgets and see everything that matches. And I'm just going to put that up on the page."
"So on one hand, you've got content that's yours, original content--there's a lot a curation. On the other hand, you've got something that's automated, something that's more towards the press release side of things, and it's not even your content. So if that's the case, if you're just looking for content to be indexed, I wouldn't go about doing it that way."
For many in the SEO realm, there are no new revelations here, but duplicate content remains a concern for many webmasters, even after so many years. It's still part of Google's quality guidelines, and as you probably know, the Penguin update is designed to algorithmically enforce those, which on its own is a good reason to exercise caution in this area.