Meta Caught Cheating On AI Benchmarks

Meta released two new Llama 4 models, touting their "unparalleled, industry-leading performance," but the company has been caught cheating on its benchmarks.
Written by Matt Milano

Meta is unusual in the AI space: it develops some of the leading open source AI models, whereas OpenAI, Anthropic, and others develop closed source models. Meta has been eager to prove that open source models can compete with the best the industry has to offer.

In its press release, Meta detailed the two new models.

These Llama 4 models mark the beginning of a new era for the Llama ecosystem. We designed two efficient models in the Llama 4 series, Llama 4 Scout, a 17 billion active parameter model with 16 experts, and Llama 4 Maverick, a 17 billion active parameter model with 128 experts. The former fits on a single H100 GPU (with Int4 quantization) while the latter fits on a single H100 host. We also trained a teacher model, Llama 4 Behemoth, that outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on STEM-focused benchmarks such as MATH-500 and GPQA Diamond. While we’re not yet releasing Llama 4 Behemoth as it is still training, we’re excited to share more technical details about our approach.

Meta goes on to tout the models' performance, claiming they outperform GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro. The results quickly put Maverick in second place on the LM Arena site. The only problem is that Meta now appears to have cheated on its benchmarks.

The issue was spotted by a number of researchers who posted on X. As the researchers pointed out, Meta appears to have used an “experimental” version of Maverick for its benchmarks, not the version widely available to the public.

https://twitter.com/suchenzang/status/1908938638869909724

Meta has yet to comment on the discrepancy.
