Matt Cutts: Google Can’t Tell If It’s Crawling Databases, But Has Policies To Remove Private Info

    July 3, 2012
    Chris Crum

Ryan Satterfield at Planet Zuda posted an article about Google exposing private info by indexing other sites’ databases, which he says includes social security numbers and credit card numbers.

“If you’ve given a site your credit card number or social security number, then there is a very high chance it is in Google search,” he writes. “This information is very easy for anyone to find, especially for cyber-criminals because Google has made it so anyone can do a Google search with the words filetype: and then the extension for ‘virtual notebooks’.”
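For illustration, a "dork" query of the kind Satterfield describes combines the filetype: operator with a string likely to appear in a raw database export (the exact terms below are hypothetical examples, not taken from his article):

```
filetype:sql "INSERT INTO"
```

Any publicly reachable .sql dump that a crawler has indexed can surface for a query like this, which is why the debate below centers on whether such files should be reachable at all.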

Satterfield adds, “I contacted Google immediately when I discovered this problem believing that they would want to fix it. I was wrong. They were fully aware that people can find your info, but they feel that they can’t stop it, nor is it their job to ‘censor or curate’ their results unless they are required to do so by law. They said that it is the webmasters job to hide any information that shouldn’t be seen.”

He shares what he says is an email response he received from Google, which says:

Hi Ryan,

Thank you for your report, I apologize it was not answered sooner. We do not consider these searches (commonly called “google dorks”) to be a security risk that we can control. The amount and variety of information that is indexed on the internet precludes any sort of blacklisting system where certain information is removed. Additionally it is Google’s long standing policy to not censor or curate our results except where required by law (such requests can be viewed at http://www.chillingeffects.com).

The best way to remove these results is for the affected website owners to remove the content from their website (or restrict access via robots.txt or another mechanism) and then submit a request for the content to be removed from the Google Cache.

The Google Security Team
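The mitigation the email describes can be sketched as a robots.txt fragment (the /backups/ directory here is a hypothetical example; the $ and * pattern syntax is a Googlebot extension, not part of the original robots.txt convention):

```
# Keep crawlers out of a hypothetical directory of database exports,
# and out of any URL ending in .sql anywhere on the site.
User-agent: *
Disallow: /backups/
Disallow: /*.sql$
```

Blocking crawling alone doesn't clear what Google has already indexed; per the email, the site owner would then submit a cache-removal request so existing copies are purged.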

Planet Zuda brought the subject up with Google’s Matt Cutts on Twitter. Here is the exchange they had:


So far, that one tweet seems to be all Cutts has had to say on the matter, at least publicly.

  • http://www.ryanmjones.com Ryan Jones

    hate to say it, but Google is right. it IS the webmaster’s job to decide what to make crawlable and what not to. Google will listen to your server. If it says “don’t crawl this” it won’t crawl it.

    There ARE valid reasons why somebody WOULD want a .sql indexed – so banning them all doesn’t make sense.

    • http://www.theokaynetwork.com Steve G

      I agree, but at the same time Google should identify information that shouldn’t be available in case somebody made a mistake or a bug shows up that lets Google see information that shouldn’t be indexed. Imagine if you had a database you didn’t mean to have accessible to search engines and Google’s stance was that it’s your fault and they’re not going to do anything about it.

  • http://www.ryanmjones.com Ryan Jones

    Also, there are several cases i can find where webmasters want .sql files to be indexed. Here’s one: http://www.wikijava.org/wiki/Create_db.sql

  • http://tautweb.com/ Taut Web

Well, all I can say is, Google (like other search engines) indexes almost everything. When you are keen on being indexed, want to get traffic from search results, and so on, then you have an “agreement” with Google to have your site’s content indexed. You cannot blame Google at all. Why don’t you just block the bots from indexing your site by employing robots.txt (and stop receiving traffic, though)?
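The robots.txt approach Taut Web mentions can be checked with Python's standard-library parser; this sketch (the /backups/ rule and example URLs are hypothetical) shows how a well-behaved crawler decides what it may fetch:

```python
from urllib import robotparser

# A hypothetical robots.txt that blocks a directory of database
# exports while leaving ordinary pages crawlable.
rules = [
    "User-agent: *",
    "Disallow: /backups/",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# A compliant crawler consults can_fetch() before requesting a URL.
print(rp.can_fetch("*", "http://example.com/backups/users.sql"))  # False
print(rp.can_fetch("*", "http://example.com/index.html"))         # True
```

Note this only keeps polite crawlers out; it is not access control, which is why the other commenters argue sensitive data should be password-protected rather than merely disallowed.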

  • http://www.cxinteractive.com Brian

I agree with Ryan Jones. An SQL database that has sensitive info like credit card numbers or social security numbers should be password protected. It shouldn’t be publicly available in the first place. Google crawls URLs like Matt says, so make sure those URLs aren’t publicly accessible. That’s a serious breach of privacy on the part of the website. Just as you wouldn’t post CC numbers on your homepage, it’s up to you to block this content from users and search engines. If it’s blocked from users then the search engine bots won’t be able to index it either.

  • OSB

It’s the webmaster’s job to ensure all sensitive data is secure. You can’t go blaming Google if a webmaster has messed up and created a massive security breach. Hopefully their customers think again, though. Telling Google to block this stuff sounds quite bitter, like someone searching for a scapegoat (the data is still out in the open when it shouldn’t be).

  • http://www.planetzuda.com/news/ Ryan Satterfield

    I am the author of the article being cited. I agree with Steve G that Google should identify information that shouldn’t be available, because the scenario he described is not theoretical. To reply to Taut Web’s comment about every search engine indexing everything, that isn’t true. My article covered other search engines and how they work. Google provides a search engine that works well. I just wish they’d fix this issue.

    • http://tautweb.com/ Taut Web

      Well, I said “almost everything”

  • robby

Remember how search engines work: create only quality content. Think about it — when you visit some crappy site with your credit card, would you enter your card number in the comments box or not?

Why the comments box? Because if there’s no credit card form on the website, you’d have to enter it somewhere. You must follow the Google webmaster guidelines.

  • john

Has anybody noticed that these days webmasters do more work on Google penalty prevention than for their readers??? And it really harms your site’s usability!