Matt Cutts: Google Can’t Tell If It’s Crawling Databases, But Has Policies To Remove Private Info
Ryan Satterfield at Planet Zuda posted an article about Google exposing private info by indexing other sites’ databases, which he says includes social security numbers and credit card numbers.
“If you’ve given a site your credit card number or social security number, then there is a very high chance it is in Google search,” he writes. “This information is very easy for anyone to find, especially for cyber-criminals because Google has made it so anyone can do a Google search with the words filetype: and then the extension for ‘virtual notebooks’.”
Satterfield adds, “I contacted Google immediately when I discovered this problem believing that they would want to fix it. I was wrong. They were fully aware that people can find your info, but they feel that they can’t stop it, nor is it their job to ‘censor or curate’ their results unless they are required to do so by law. They said that it is the webmasters job to hide any information that shouldn’t be seen.”
He shares what he says is an email response he received from Google, which says:
Thank you for your report, I apologize it was not answered sooner. We do not consider these searches (commonly called “google dorks”) to be a security risk that we can control. The amount and variety of information that is indexed on the internet precludes any sort of blacklisting system where certain information is removed. Additionally it is Google’s long standing policy to not censor or curate our results except where required by law (such requests can be viewed at http://www.chillingeffects.com).
The best way to remove these results is for the affected website owners to remove the content from their website (or restrict access via robots.txt or another mechanism) and then submit a request for the content to be removed from the Google Cache.
The Google Security Team
Planet Zuda brought the subject up with Google’s Matt Cutts on Twitter. Here is the exchange they had:
@planetzuda we just crawl urls. It’s near-impossible to see if a url is really a database. We have policies to remove private info like this
@mattcutts the real problem is that Google allows people to search for filetype: then the filetype extension of databases. To be continued.
@mattcutts Kevin in security said you guys aren’t going to fix this. Bing & Yahoo made it so people can’t do that certain filetype: search.
@mattcutts I wrote some C++ to test if something’s a DB. Just use rfind, get last three characters. If .sql then don’t index.
So far, that one tweet seems to be all Cutts has had to say on the matter, at least publicly.
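For readers curious what the check described in Planet Zuda's last tweet might look like, here is a minimal sketch. It is not Planet Zuda's actual code, which was not published in the exchange; the function name looks_like_sql_dump and the handling of query strings and letter case are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not from the original tweet): returns true when the
// URL's path ends in a ".sql" extension, compared case-insensitively.
bool looks_like_sql_dump(const std::string& url) {
    // Drop any query string or fragment so "dump.sql?x=1" still matches.
    // find_first_of returns npos when neither character is present, and
    // substr(0, npos) then keeps the whole string.
    std::string path = url.substr(0, url.find_first_of("?#"));

    // rfind locates the last dot; whatever follows it is the extension.
    std::size_t dot = path.rfind('.');
    if (dot == std::string::npos) return false;

    // A slash after the last dot means the dot belongs to a host or
    // directory name, not to a file extension.
    if (path.find('/', dot) != std::string::npos) return false;

    std::string ext = path.substr(dot + 1);
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return ext == "sql";
}
```

A check like this only catches files that are literally named with a .sql extension. A database dump served under a different name or URL would pass straight through, which is consistent with Cutts' point that it is near-impossible to tell from a URL alone whether it is really a database.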