Hugging Face announced it has acquired XetHub, a company specializing in scaling large datasets for AI and machine learning (ML) models.

XetHub was founded by former Apple engineers Yucheng Low, Ajit Banerjee, and Rajat Arya. While at Apple, the trio helped the company build out its ML infrastructure. According to Hugging Face, “XetHub has developed technologies to enable Git to scale to TB repositories and enable teams to explore, understand and work together on large evolving datasets and models.”

Julien Chaumond, Hugging Face’s CTO, says the acquisition will help the company unlock future growth for years to come.

“The XetHub team will help us unlock the next 5 years of growth of HF datasets and models by switching to our own, better version of LFS as storage backend for the Hub’s repos.”

Hugging Face says XetHub’s tech will make developers’ workloads far more efficient.

“Let’s say you have a 10GB Parquet file. You add a single row. Today you need to re-upload 10GB. With the chunked files and deduplication from XetHub, you will only need to re-upload the few chunks containing the new row. Another example for GGUF model files: let’s say @bartowski wants to update one single metadata value in the GGUF header for a Llama 3.1 405B repo. Well, in the future bartowski can only re-upload a single chunk of a few kilobytes, making the process way more efficient.”
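To make the idea of chunked files and deduplication concrete, here is a minimal Python sketch of content-defined chunking. It is not XetHub’s or Hugging Face’s actual implementation: the gear-style rolling hash, the chunk size limits, and the names `chunk` and `chunks_to_upload` are all assumptions chosen for illustration. The file is cut wherever a hash of the most recent bytes matches a pattern, each chunk is identified by its SHA-256 digest, and only chunks the store has not seen are uploaded.

```python
import hashlib
import os
import random
from typing import List, Set

# Hypothetical parameters for illustration only; real systems tune these.
MASK = (1 << 12) - 1      # cut roughly every 4 KiB past the minimum size
MIN_CHUNK = 1024          # never cut a chunk shorter than this
MAX_CHUNK = 16384         # always cut once a chunk reaches this size

# Fixed table of pseudo-random values, one per byte value (gear hash).
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunk(data: bytes) -> List[bytes]:
    """Split data at boundaries decided by the content itself."""
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # The low 12 bits of h depend only on the last 12 bytes seen.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def chunks_to_upload(data: bytes, store: Set[str]) -> List[bytes]:
    """Return only the chunks whose digest is not already in the store."""
    new_chunks = []
    for c in chunk(data):
        digest = hashlib.sha256(c).hexdigest()
        if digest not in store:
            store.add(digest)
            new_chunks.append(c)
    return new_chunks


if __name__ == "__main__":
    store: Set[str] = set()          # stands in for the remote chunk store
    original = os.urandom(200_000)   # stands in for a large file
    chunks_to_upload(original, store)

    updated = original + b"one more row"
    new_chunks = chunks_to_upload(updated, store)
    print(len(updated), "bytes in file,",
          sum(len(c) for c in new_chunks), "bytes re-uploaded")
```

Because the cut points depend on the bytes themselves rather than on fixed offsets, appending a row or editing a header value leaves the other chunks, and therefore their digests, unchanged. In this sketch an append re-uploads at most the final chunk plus the new bytes.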

All 14 of XetHub’s employees are joining Hugging Face. Financial terms of the deal were not disclosed, but CEO Clem Delangue told Forbes it was the company’s biggest acquisition to date.