Google Splits AI Silicon: Dual TPUs Target Training and Agentic Inference in Nvidia Showdown

Google's eighth-gen TPUs split into TPU 8t for training and TPU 8i for inference, targeting agentic AI with massive scale, triple SRAM, and 2x perf/watt over Ironwood. Announced at Cloud Next 2026, they challenge Nvidia amid surging demand from Anthropic and Meta.
Written by Lucas Greene

Google just changed the AI hardware game. At Cloud Next 2026 in Las Vegas, the company unveiled its eighth-generation Tensor Processing Units as two distinct chips: TPU 8t for model training, TPU 8i for inference. No more one-size-fits-all. This split acknowledges a hard truth. Training massive models demands raw compute power. Running AI agents—those autonomous systems that reason, plan, and act—needs low latency and high throughput for millions of parallel tasks.

Amin Vahdat, Google’s senior vice president and chief technologist for AI infrastructure, put it plainly in a Google blog post. “With the rise of AI agents, we determined the community would benefit from chips individually specialized to the needs of training and serving.” Sundar Pichai, Alphabet’s CEO, echoed that in his own post, noting the TPU 8i architecture delivers “massive throughput and low latency needed to concurrently run millions of agents cost-effectively.”

Consider the numbers. TPU 8t cranks out nearly three times the compute per pod compared to Ironwood, Google’s seventh-gen TPU from 2025—specifically, 121 FP4 exaFLOPS in a 9,600-chip superpod with two petabytes of shared high-bandwidth memory. It scales linearly to over a million chips in one cluster using JAX and Pathways. Performance per dollar? Up to 2.7 times better than Ironwood for large-scale training jobs. And it’s twice as efficient per watt. That means frontier models, those behemoths taking months to train, could shrink to weeks. Google DeepMind’s Genie 3 world models already train on these setups.
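A quick back-of-the-envelope check helps put the pod figures in per-chip terms. The sketch below uses only numbers quoted in this article (the Ironwood pod figures appear a few paragraphs later); the function and constant names are illustrative, and decimal units are assumed. The article does not state the numeric precision behind Ironwood's figure, so the ratio may not be a like-for-like comparison.

```python
# Pod-level figures as quoted in this article. Names are illustrative only.
TPU8T_POD_EXAFLOPS = 121.0     # FP4, per superpod
TPU8T_POD_CHIPS = 9_600
TPU8T_POD_HBM_PB = 2.0         # shared high-bandwidth memory per superpod
IRONWOOD_POD_EXAFLOPS = 42.5   # precision not stated in the article
IRONWOOD_POD_CHIPS = 9_216


def per_chip_petaflops(pod_exaflops: float, chips: int) -> float:
    """Convert a pod-level exaFLOPS figure to petaFLOPS per chip."""
    return pod_exaflops * 1_000 / chips


def per_chip_hbm_gb(pod_petabytes: float, chips: int) -> float:
    """Convert pod-level memory in PB to GB per chip (decimal units)."""
    return pod_petabytes * 1_000_000 / chips


tpu8t_chip_pf = per_chip_petaflops(TPU8T_POD_EXAFLOPS, TPU8T_POD_CHIPS)
ironwood_chip_pf = per_chip_petaflops(IRONWOOD_POD_EXAFLOPS, IRONWOOD_POD_CHIPS)
pod_ratio = TPU8T_POD_EXAFLOPS / IRONWOOD_POD_EXAFLOPS
tpu8t_chip_hbm = per_chip_hbm_gb(TPU8T_POD_HBM_PB, TPU8T_POD_CHIPS)

print(f"TPU 8t: {tpu8t_chip_pf:.1f} PFLOPS per chip")       # about 12.6
print(f"Ironwood: {ironwood_chip_pf:.1f} PFLOPS per chip")  # about 4.6
print(f"Pod-to-pod compute ratio: {pod_ratio:.2f}x")        # about 2.85
print(f"TPU 8t: {tpu8t_chip_hbm:.0f} GB HBM per chip")      # about 208
```

The pod-to-pod ratio of roughly 2.85 matches the "nearly three times" claim.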

TPU 8i shifts focus to serving. Each chip packs 288 GB of HBM and 384 MB of on-chip SRAM—triple Ironwood’s amount. Why SRAM? It holds bigger key-value caches for long-context decoding, cutting idle time in agentic workflows. Pods scale to 1,152 chips, delivering 11.6 exaFLOPS. At low-latency targets for large mixture-of-experts models, it offers 80% better performance per dollar over Ironwood. Boardfly topology slashes communication hops by up to 50% versus 3D tori, vital for all-to-all ops in reasoning chains.
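To see why on-chip memory matters for long-context decoding, it helps to estimate how large a key-value cache gets. The sketch below uses the standard transformer KV-cache sizing formula; the model shape (32 layers, 8 KV heads, head dimension 128, 2-byte values) is a hypothetical example, not a description of any Google model. It also assumes, purely for illustration, that all SRAM could hold cache entries, which real chips would not do.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2) -> int:
    """Size of a transformer KV cache for one sequence.

    The leading factor of 2 covers one key and one value tensor per layer.
    """
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value


# Per-chip SRAM quoted in the article, treated here as binary megabytes.
TPU8I_SRAM_BYTES = 384 * 1024**2
# "Triple Ironwood's amount" implies one third of that for Ironwood.
IRONWOOD_SRAM_BYTES = TPU8I_SRAM_BYTES // 3

# Hypothetical model shape, chosen only to make the arithmetic concrete.
per_token = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, context_tokens=1)

tokens_in_tpu8i_sram = TPU8I_SRAM_BYTES // per_token
tokens_in_ironwood_sram = IRONWOOD_SRAM_BYTES // per_token

print(per_token)                # 131072 bytes, i.e. 128 KiB per token
print(tokens_in_tpu8i_sram)     # 3072
print(tokens_in_ironwood_sram)  # 1024
```

Under these assumptions, tripling SRAM triples the number of cached tokens that can stay on-chip before spilling to HBM.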

Both chips pair with custom Axion ARM CPUs—one per two TPUs—ditching x86 overheads from prior gens. Liquid cooling with smart valves adapts to workloads. Frameworks like JAX, PyTorch, vLLM, and SGLang work out of the box. General availability hits later this year.

This isn’t Google’s first rodeo. TPUs date to 2015 and have been rentable in the cloud since 2018. Ironwood, optimized for inference, scaled to 9,216 chips at 42.5 exaFLOPS. But agentic AI exploded demand. Citadel Securities runs quant research on TPUs. All 17 U.S. Energy Department labs use them for AI co-scientists. Anthropic locked in multiple gigawatts of capacity; Meta inked a multibillion-dollar deal. On-premises pilots are even under discussion.

Nvidia feels the heat. The GPU kingpin dominates, but hyperscalers push back. Amazon’s Trainium and Inferentia, Microsoft’s Maia—everyone builds custom silicon. Google’s TPU business, paired with DeepMind, could be worth $9 billion annually, per DA Davidson estimates cited by CNBC. Nvidia stock dipped 1.5% post-announcement before rebounding, as Ars Technica reported. No direct benchmarks against Nvidia GPUs, though. Google avoids that trap.

Inside the chips. TPU 8t’s SparseCore handles embedding lookups; native FP4 doubles matrix throughput. Vector units overlap softmax with multiplies. TPU 8i’s Collectives Acceleration Engine speeds reductions for chain-of-thought decoding—5x lower latency. As detailed in Google’s technical deep dive, Virgo networks boost bandwidth 4x in data centers.

Agentic AI demands this specialization. Agents simulate worlds, chain thoughts, loop feedback. Training builds them. Inference deploys swarms. Unified chips wasted cycles. Now? Efficiency soars. Google cites 97% “goodput,” the share of cluster time spent on useful work. Data centers redesigned around TPUs yield 6x more compute per unit of electricity.
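Goodput is commonly understood as the fraction of a cluster's time that goes to productive work rather than to failures, restarts, or stalls. A minimal sketch of what a 97% figure would mean in practice follows; the 30-day run length is a hypothetical example, and the function name is illustrative, not part of any Google tool.

```python
def goodput_fraction(total_hours: float, lost_hours: float) -> float:
    """Fraction of wall-clock time spent on useful work."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    if not 0 <= lost_hours <= total_hours:
        raise ValueError("lost_hours must be between 0 and total_hours")
    return (total_hours - lost_hours) / total_hours


# Hypothetical 30-day training run.
RUN_HOURS = 30 * 24
TARGET_GOODPUT = 0.97  # figure quoted in the article

# Hours that can be lost to failures, restarts, or stalls at that target.
allowed_lost_hours = RUN_HOURS * (1 - TARGET_GOODPUT)

print(f"Allowed lost time: {allowed_lost_hours:.1f} hours of {RUN_HOURS}")
print(f"Resulting goodput: {goodput_fraction(RUN_HOURS, allowed_lost_hours):.2%}")
```

On that hypothetical run, a 97% target leaves a budget of under one day of lost time across the whole month.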

Competition intensifies. Yahoo Finance notes Google’s jab at Nvidia and AMD partners. Anthropic’s multi-gigawatt TPU pact follows Amazon’s $100 billion AWS deal. X chatter buzzes: one post from Google’s account highlights the dual approach powering Gemini agents.

But challenges loom. Power grids strain under exaFLOPS-scale clusters. More SRAM enlarges dies and raises fabrication costs; rumors of a TSMC 2nm process swirl on X. Nvidia’s CUDA moat endures for developers. Google bets on its full stack, from chips to models.

Investors watch closely. Alphabet shares rose on the news. This dual-chip push signals maturity. AI shifts from hype to deployment. Google positions itself as the agentic backbone.

Specialized silicon wins.
