Matt Cutts Addresses Duplicate Content Issue In New Video

    June 1, 2012
    Chris Crum

This week, Google posted a new Webmaster Help video featuring Matt Cutts talking about a potential duplicate content issue. This time, he even broke out the whiteboard to illustrate his points.

Specifically, Cutts addressed the user-submitted question:

Many sites have a press release section, or a news section that re-posts relevant articles. Since it’s all duplicate content, would they be better off removing these sections even with plenty of other unique content?

“The answer is probably yes, but let me give you a little bit of color about the reasoning for that,” Cutts says in the video. “So a lot of the times at Google, we’re thinking about a continuum of content, and the quality of that content, and what defines the value add for a user. So let’s draw a little bit of an axis here and think a little bit about what’s the difference between high quality guys versus low quality guys? Take somebody like The New York Times. Right? They write their own original content. They think very hard about how to produce high quality stuff. They don’t just reprint press releases. You can’t just automatically get into The New York Times. It’s relatively hard. Right?”

“At the other end of this spectrum is the sort of thing that you’re talking about, where you might have a regular site, but then one part of that site, one entire section of that site, is entirely defined by maybe just doing a news search, maybe just searching for keywords in press releases,” he continues. “Whatever it is, it sounds like it’s pretty auto-generated. Maybe it’s taking RSS feeds and just slapping that up on the site. So what’s the difference between these?”

“Well, The New York Times is exercising discretion,” Cutts explains. “It’s exercising curation in terms of what it selects even when it partners with other people, and whenever it puts other content up on its site. And most of its content tends to be original. Most of the time it’s thinking about, OK, how do we have the high quality stuff, as opposed to this notion– even if you’ve got high quality stuff on the rest of your site, what is the value add of having automatically generated, say, RSS feeds or press releases, where all you do is you say, OK. I’m going to do a keyword search for Red Widgets and see everything that matches. And I’m just going to put that up on the page.”

“So on one hand, you’ve got content that’s yours, original content–there’s a lot of curation. On the other hand, you’ve got something that’s automated, something that’s more towards the press release side of things, and it’s not even your content. So if that’s the case, if you’re just looking for content to be indexed, I wouldn’t go about doing it that way.”

For many in the SEO realm, there are no new revelations here, but duplicate content remains something many webmasters worry about, even after so many years. It’s still part of Google’s quality guidelines, and as you probably know, the Penguin update is designed to algorithmically enforce those, which on its own is a good reason to exercise caution in this area.

  • http://www.shapingyouth.org Amy Jussel, Shaping Youth

    I thought when the headline read MC “addresses the duplicate content issue” you were referring to what happens when there’s either:
    1.) An edited/shorter form version of a longer post (on the same blog, pulled up from its own archives linked back in recap mode) or
    2.) An updated post referencing the original post with fresh links or
    3.) A ‘crosspost’ elsewhere in the blogosphere sharing the original content by permission or syndication agreement…

    Could you address each of these in terms of the new Google ‘rules of the road?’

    p.s. To Matt’s automated vs curated points I mostly agree, but in cases like the WordPress plug-ins that find similar subject matter within one’s own archives, it makes sense to me to churn it to the surface as a value-add to the reader, so I hope Google doesn’t ‘hold this against us’ algorithmically… 😉

  • http://ultimatemarketingstrategies.net Peter Sundstrom

    Content curation is definitely the buzzword of the moment. I can’t see how Google can tell the difference between curated content that has been manually compiled and some automatically generated curated content.

  • http://DigitalAccessPass.com Ravi Jayagopal

    Vague reply as always :-). So do press releases have no value? How does Google differentiate between a genuine press release that’s become popular and picked up by many sites (e.g., an Associated Press news story) vs lower quality content duplication?

    I guess they don’t go by quality of content – they go by quality of the sites publishing the content. So it’s back to, which quality news site can you get to pick up your press release. That’s all there is. Otherwise, press releases by themselves don’t appear to have much value these days.