Reddit Sues Perplexity AI Over Illegal Scraping of User Data for AI

Reddit has sued Perplexity AI and data-scraping firms for illegally harvesting user content from Google search results to train AI models, bypassing protections via proxies and bots. Perplexity denies wrongdoing, citing partnership talks, but the case highlights tensions over data rights in AI. This could set precedents for ethical data sourcing.

Reddit Inc. has filed a lawsuit against artificial-intelligence startup Perplexity AI Inc. and several data-scraping firms, accusing them of unlawfully harvesting vast amounts of user-generated content from Google search results. The suit, lodged in New York federal court, claims Perplexity and its partners bypassed Reddit’s protective measures to scrape data at an “industrial scale” for training AI models. It is the latest skirmish between content platforms and AI companies hungry for training data.

The complaint details how Perplexity allegedly collaborated with entities like Oxylabs UAB and SerpApi LLC to disguise bots and extract Reddit posts appearing in Google searches. Reddit argues this violates its terms of service and constitutes unauthorized access, potentially infringing on copyrights and trade secrets. The platform seeks to block such activities and demands damages, highlighting a broader fight over data rights in the AI era.

The Mechanics of Scraping and Evasion Tactics

The suit describes Perplexity’s alleged methods as sophisticated evasion. According to a report in Ars Technica, the AI firm was “caught red-handed” using proxies to mimic human users, scraping billions of search results without permission. This allegedly allowed Perplexity to feed Reddit’s discussions, which are often rich in real-time insights, directly into its search engine.

Perplexity, backed by high-profile investors like Jeff Bezos, has positioned itself as a next-generation search tool that summarizes web content using AI. However, critics say it freeloads on others’ work. The lawsuit echoes earlier accusations, such as those from Dow Jones & Co., which sued Perplexity for similar copyright issues, as noted in another Ars Technica piece.

Perplexity’s Defense and Industry Repercussions

In response, Perplexity expressed shock at the lawsuit, claiming it had been in discussions with Reddit about partnerships. A post on the subreddit r/perplexity_ai emphasized the company’s respect for content creators and suggested the suit overlooks collaborative opportunities. Yet Reddit counters that no agreements were reached and that the scraping continued unabated.

This case pulls back the curtain on a shadowy ecosystem of data brokers reselling scraped information to AI companies. As detailed in a New York Times article, startups like Perplexity rely on such intermediaries to fuel their models, often without compensating original sources. Reuters also reported on the suit, noting that Perplexity’s AI search engine directly benefits from Reddit’s user data.

Broader Implications for AI Data Practices

The litigation underscores growing legal scrutiny of AI data practices. Reddit, which has struck deals with companies like OpenAI for licensed data access, is drawing a line against unsanctioned scraping. CNBC reported that the suit expands Reddit’s battles with the AI industry, with the platform accusing Perplexity of illegally harvesting user posts for tools like its answer engine.

Experts predict this could set precedents for data ownership. Search Engine Land reported on the technical disguises used, such as bots mimicking legitimate traffic. As AI firms race to innovate, lawsuits like this may force a reckoning on ethical data sourcing, potentially reshaping how search giants like Google handle indexed content.

Looking Ahead: Potential Outcomes and Shifts

If successful, Reddit’s suit could deter similar practices and encourage more licensing agreements. Perplexity’s CEO has previously defended the company’s approach as transformative, but mounting legal pressures, from Britannica’s recent infringement claims to ongoing probes, suggest a turbulent path ahead. Industry insiders are watching closely, as resolutions here might influence global standards for AI training data.

Ultimately, this dispute reflects the high stakes in balancing innovation with intellectual property rights. As platforms fortify defenses, AI developers may need to pivot toward transparent, compensated data acquisition to avoid courtroom battles.


WebProNews is a leading publisher of business and technology email newsletters and websites.