Can Google Solve The Scraper Problem?

    February 27, 2014
    Chris Crum

Google has a form called the Scraper Report for people to report when they see a scraper site ranking ahead of the original content that it’s scraping.

The idea is that you can let Google know about people stealing your content and your rankings, and hopefully get the situation rectified.

Do you ever see scraper sites ranking above your own original content? Do you think this form will help Google solve the problem? Let us know what you think in the comments.

Head of webspam Matt Cutts tweeted:

The form asks for the URL on the site where the content was taken from, the exact URL on the scraper site, and the Google search result URL that demonstrates the problem.

It then asks you to confirm that your site is following Google Webmaster Guidelines and is not affected by any manual actions. You confirm this with a checkbox.

Scraper Report

Danny Sullivan asks a good question:

No answer on that so far, though Sullivan suggests in an article that Google will “potentially” use it as a way to improve its ranking system.

Google actually put out something similar a few years back. That one said, “Report scraper pages…Google is testing algorithmic changes for scraper sites (especially blog scrapers). We are asking for examples, and may use data you submit to test and improve our algorithms.”

The new one is much more vague on what Google intends to do with the information it obtains. Obviously the old one didn’t make a big enough difference.

The reactions to Matt’s tweet have been interesting. One would like to see a similar tool for images.

One response in particular has gone viral:

As of the time of this writing, it’s got 14,455 retweets and 10,982 favorites. Certainly more than the average reply to a Matt Cutts tweet. The tweet is even getting coverage from publications like Mashable and Business Insider. Cutts has so far not responded.

“Google’s efforts to thwart Internet copycats known as ‘scrapers’ have backfired,” writes Mashable’s Jason Abbruzzese. “What started out with the best intentions has become Friday’s Internet joke du jour after Google was caught using its own scraper to mine content.”

He goes on to note the obvious in that it “highlights the tension between the company’s goal of providing quick answers and its role as a portal to the rest of the Internet.”

Sometimes Google’s own “scraping” doesn’t even “scrape” the right information.

It remains to be seen whether Google’s new form will do much to solve the problem of scrapers ranking above original content, but I don’t think it will do anything to keep Google from putting up its own “answers” to users’ searches – right or wrong.

Do you think Google’s efforts will improve its search results? Share your thoughts.

  • Brodie Tyler

    I think it’s awesome that Google is working on this. My favorite part though is Dan Barker’s tweet. I wonder how they will respond!

    • http://www.bloketoys.co.uk/ BlokeToys.co.uk

      They won’t respond, because they can’t without further embarrassment. That Tweet was favorited and re-tweeted by thousands of webmasters. Google is fast losing the trust and support of many webmasters around the world but they don’t want to admit it.

      • JH

        At this point I consider all from Matt a lie . . . What will happen is sites like Wal Mart or Overstock will push us farther back in SERPS calling the rest of us selling like items scrapers…Still waiting for that level playing field you talked about three years ago Matt

        • http://www.bloketoys.co.uk/ BlokeToys.co.uk

          I used to listen to him a long time ago, but after spending five years trying to build a business using only unique content, doing only white-hat SEO, creating pages upon pages of unique content, and STILL being beaten to death by massive corporations with duplicate descriptions and no unique content for their items, I can safely say that Matt and Google can go suck it.

          Maybe if Google actually starts to work by its own diktats and starts offering genuinely useful content over organic pages of corporate advertisers I’ll start to pay attention again. But until that happens, Matt is a liar and a hypocrite in my opinion.

          • http://www.dirtworks.net http://www.newenglandnatural.com Dirt Works

            I’m with you. My business is dead and my family and I have gone from riches to poverty all because of Google screwing around with their algorithm and penalties for things a web master has no real control over and google won’t fix.

    • http://www.admain.co.uk admainiac

      They won’t respond. They’ll probably end up penalising the reporter’s site for trying to manipulate search rankings by trying to get another site kicked out of results.

      They spend that much time trying to keep valid sites down, they still need to ask the very people they are forcing to use adwords to help them tidy up their own poor results. And compared to Bing, they are poor results. The worst results Google have ever offered.

      You would actually think that in certain business sectors, they have agreements in place to rank floated companies.

      At least Bing gives variety.

  • david

    Just proves that Google is aware its algo can’t detect duplicate content perfectly; anyone who tests knows this already. You can easily rank duplicate content with a few touches.

  • denny

    Now matt would say that wikipedia is copying Google’s content 😛

    • joe hill

      It’s even worse, it’s pure nepotism. Take Huffington Post: it consists almost completely of copied content but is always placed at the top position. How come? Is there someone playing a totally corrupt game? I wonder who is paid. It reminds me fatally of the games of Thai politicians; they are corrupt down to the bones, and they do the same.

  • http://wredlich.com/ny/ Warren Redlich

    Why only if they outrank? Shouldn’t Google deindex all scrapers?

    • david

      They can’t do that. Duplicate content is natural (product descriptions, news sites etc) and furthermore deciding automatically the original source of text is too risky: legit sites would end up penalised, untrustworthy sites would outrank authority sites.

      • http://wredlich.com/ny/ Warren Redlich

        They can do it. It might be difficult in some cases but in general Google can tell which was the original (the one that was indexed first).

        Of course we’d want them to be careful and maybe avoid the close calls. I have a site that’s clearly the original, and generally outranks the scraper sites. The scraping was done long after we had posted the original content. These scraper sites add no value, are updated less frequently if at all, offer less in the way of features, no opportunity to correct bad info (we have multiple ways for users to submit corrections), etc.

        Why index them at all?

        • david

          You are correct, Google can detect and will filter straight scraper sites. I should have stipulated in my comment, I’m referring to sites who know what they’re doing. With backlinking power and better optimisation you will indeed outrank an original source. I’ve tested this!

  • joe hill

    This guy (cutts) has no idea what he is talking about, like sooooo many people at google. I found about 400 texts and about 600 pics in the last 2 years copied from our sites; strangely they were all in a better place than our originals. Why? Because they had more links, although totally unrelated, but who cares? For google a link is a link, even with no context. Unfortunately google is a playground for kids bought for almost nothing right after school; they don’t know much. Now they even moved all the stuff to India and there it gets even worse: it took me 3 months to set up a client adword campaign with the guys in south Asia etc. The other day they cut off our almost 10 year adsense account without telling us even ONE reason why. They delivered a paper with 10 possible reasons; nobody knows what and why, only some computer generated unrelated textblocks were moved. Google is in a perfect mess since this Mr. Schmidt left, they don’t have any management left, only computer kids playing games.

    • chris

      I set up an adwords account last week and consequently received a call from one of their guys. Pretty much all I remember from the conversation, which I wasn’t thoroughly interested in listening to, was 3 points: 1.) I should capitalize each letter, it would improve my ctr 2.) Google wasn’t #1 in the world for nothing 3.) My ads were performing great and had a higher ctr than usual. I asked one question, to which the reply was that I should just set up a campaign to find out. I definitely have to put this in the top 5 of most useless conversations I have ever had in my life. Save yourself the phone call in future. I won’t be continuing with my account. FB is a hell of a lot cheaper at the moment! (A HELL OF A LOT!)

  • GoogleProblems

    Google put me out of business years ago b/c they can’t figure out an original source from a scraped one. I complained and complained and used every avenue possible to report scraped content to them, but nothing changed. A scraped article not only dumped my original article in the rankings, but if enough content was stolen from me, every page on my site (scraped or not) was pushed to the nether regions of Google Search. It didn’t matter if someone posted it on an obvious splog or on Wikipedia (and yes, people posted my content verbatim on Wikipedia, often without credit). This problem goes back 7 years. Google does not have an answer for it.

    • joe hill

      Yes they are in a mess and totally ignorant to other people. Yesterday after I wrote the post above I found 4 copied text blocks of one of our sites elsewhere, this is all very time consuming. We noticed that the relevant page in our site vanished. Now let’s see what happens.

    • http://www.bloketoys.co.uk/ BlokeToys.co.uk

      This is what happens when a corporation is given self-regulating control of a market and isn’t required under law to be answerable to those that company harms in the process.

      This is a problem we see across the internet. Twitter and Facebook think they don’t have to comply with European laws, because they’re American companies. Google thinks anti-competitive practices are okay all over the world because America tells them it’s okay there…

      The fundamental fact is that Google has been afforded too much power. It is a private corporation now able to police large swathes of internet traffic as it sees fit, for its own profit, and no government is willing to curtail their dominance and control.

      If Google wanted to, it could go into a meeting with ten of the biggest retail corporations in the world, sign a contract and immediately start removing all their local, national and regional competition from the results. There is absolutely nothing to stop them from destroying thousands of companies and millions of incomes all over the world in favor of a billion $ deal with an American corp that wants utter domination.

  • Linda Hernandez

    You would actually think that in certain business sectors, they have agreements in place to rank floated companies.


  • mattsmith222

    That’s only the beginning. Check this out: http://googlein10years.wordpress.com/2014/03/03/googlepedia/

  • John Tello

    I wonder if this discussion also applies to similar content, not equal content. For instance, there are rewriters, spinners, synonymizers and other content processors. What happens with those? Can you complain to Google about them?