Developer Shares Story of Being Threatened by Facebook for Crawling

Robots.txt-Related Legalities Brought Into Question

Pete Warden, a former software engineer at Apple who is now working on his own start-up, posted an interesting story about how Facebook threatened to sue him for crawling the social network. I reached out to both Warden and Facebook for more details, but so far have only received a response from Facebook, which calls the incident a "violation of our terms."

But first, Warden’s story. Read the whole thing in his words here for more context about what he wanted to do with the data, but to make a long story short, he was building a tool to bring data from email and various social networks into one place to make it easier for users to manage their contacts, and he crawled Facebook. He says he checked Facebook’s robots.txt, and that "they welcome the web crawlers that search engines use to gather their data," so he wrote his own. He was able to obtain data like which pages people were fans of and links to a few of their friends. He created a map showing how different countries, states and cities were connected to each other and released it so that others could use the information. Once Facebook caught wind of this, they threatened legal action. Warden writes:

Their contention was robots.txt had no legal force and they could sue anyone for accessing their site even if they scrupulously obeyed the instructions it contained. The only legal way to access any web site with a crawler was to obtain prior written permission.

Obviously this isn’t the way the web has worked for the last 16 years since robots.txt was introduced, but my lawyer advised me that it had never been tested in court, and the legal costs alone of being a test case would bankrupt me. With that in mind, I spent the next few weeks negotiating a final agreement with their attorney. They were quite accommodating on the details, such as allowing my blog post to remain up, and initially I was hopeful that they were interested in a supervised release of the data set with privacy safeguards. Unfortunately it became clear towards the end that they wanted the whole set destroyed.
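
For readers unfamiliar with the mechanics, a crawler typically consults a site's robots.txt before requesting a page. The following is a minimal Python sketch of that check using the standard library's urllib.robotparser. It is not Warden's code: the rules shown, the `ExampleCrawler` user agent, the example.com URLs, and the helper names `build_parser` and `is_allowed` are all hypothetical and chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only.
# This is NOT Facebook's actual file.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a reusable parser object."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules do not disallow this URL for the agent."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(EXAMPLE_ROBOTS_TXT)

# A path not mentioned in the rules is treated as allowed.
public_ok = is_allowed(parser, "ExampleCrawler", "https://example.com/people/directory")

# A path under a Disallow rule is treated as off limits.
private_ok = is_allowed(parser, "ExampleCrawler", "https://example.com/private/data")
```

Passing this check only shows that the robots.txt file does not disallow a path. Whether that amounts to permission to crawl is exactly the legal question in dispute here.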

Facebook Public Policy Communications Manager Andrew Noyes tells WebProNews, "Pete Warden aggregated a large amount of data from over 200 million users without our permission, in violation of our terms. He also publicly stated he intended to make that raw data freely available to others. Warden was extremely cooperative with Facebook from the moment we contacted him and he abandoned his plans."

"We have, and will continue to, act to enforce our terms of service where appropriate," adds Noyes.

Noyes pointed to Facebook’s Statement of Rights and Responsibilities, which states that "You will not collect users’ content or information, or otherwise access Facebook, using automated means (such as harvesting bots, robots, spiders, or scrapers) without our permission." That’s under the safety section, by the way.

"I’m bummed that Facebook are taking a legal position that would cripple the web if it was adopted (how many people would Google need to hire to write letters to every single website they crawled?)," concludes Warden. "And a bit frustrated that people don’t understand that the data I was planning to release is already in the hands of lots of commercial marketing firms, but mostly I’m just looking forward to leaving the massive distraction of a legal threat behind and getting on with building my startup."

Hearing some of what both parties have to say on the issue, what are your thoughts? Discuss here.

If we hear back from Warden or if Facebook offers us more insight into the situation, which I’m told may still happen, I’ll update this article.

  • Stupidscript

    As long as Warden was not registered as a Facebook user, why would he be bound by the T&C? IANAL but AFAIKT, the T&C are only for those who wish to use the service. AFAIK, T&C are not “law” unto themselves, but rather are contracts made between those who wish to subscribe to the service and those who provide the service.

    Saying that someone who is not a customer is bound by the rules the customers must abide by is absurd. If you want a walled garden, then erect a wall. You can’t expect everyone will play by Facebook’s private rules just because Facebook says so. You have to be a signatory to the agreement … and people who do not sign up for the service are not signatories to the agreement.

    Not only that, but simply publishing a robots.txt file IMPLICITLY GIVES PERMISSION by omitting those areas where scrapers/robots/spiders are allowed access to UNLESS particular bots are included in the User-agent directive. So even if the T&C were to be found to be enforceable against people who have not agreed to its terms, Facebook, by their use of the robots.txt file, have already given their permission for the scraping.

    It is a damn shame that the fear of a lawsuit that Warden had an excellent chance of winning has caused him to abandon his work. Not that I’m a fan of scraping and selling, but the principle needs to be established.

    Otherwise, what’s to stop Facebook from including in their T&C a clause that says, “You will not make any unflattering statements about the Facebook service in any form” and then attempting to keep anyone with a negative experience from posting their comments about that experience in another forum?

  • http://www.thewebsquad.com Eric

    In the statement of rights and responsibilities, it says “users and others who interact with Facebook. By using or accessing Facebook, you agree to this statement.” So I’m guessing as soon as you interact with Facebook, you are bound by those terms.

    He may not have been a user but he was accessing it. The robots.txt file is only for those who have permission. I’m sure if he had permission from Facebook, this wouldn’t have been an issue at all. But to use a spider to crawl without permission is against the ToS, even if there is a robots.txt file.

    Privacy is a big concern when crawling Facebook. Imagine how much information could be dug and how much people wouldn’t want their information to be dug.

    Also, your last paragraph would be against free speech. Facebook would not be able to write that in due to free speech.

  • http://serpeo-seo.com Website Design Philadelphia

    Stu – Brilliant!
    I think this was a puff of smoke from Facebook which realized a chink in their armor. People are willing to put their lives and personal information out there in exchange for a second of perceived fame. Facebook saw the downfall of their entire operation if they did not stop this guy. Facebook would be held accountable for personal information being indexed in a way that they did not realize could happen. Get it together Facebook.
