Google's TPU Split: Training and Inference Chips Reshape AI Compute Wars

Google Cloud just changed the AI hardware game. At Google Cloud Next 2026, the company unveiled its eighth-generation Tensor Processing Units, splitting them into two distinct chips for the first time: TPU 8t for massive model training, TPU 8i for lightning-fast inference. This move acknowledges a hard truth. Training and serving demands have diverged sharply.

Amin Vahdat, Google Cloud’s SVP and chief technologist for AI infrastructure, put it bluntly: “the demands of training and serving have completely diverged.” Google Cloud Blog. And so, Google built separate powerhouses.

TPUs trace back over a decade. Google designed these custom ASICs to crush matrix math—the core of neural networks. Unlike general-purpose GPUs, TPUs use systolic arrays for efficient, high-throughput computations on AI workloads. Early versions like TPU v1 handled inference inside Google services. Pods scaled to thousands of chips. By v5e, each chip hit 393 trillion int8 operations per second. Google Cloud Blog.

But evolution accelerated. TPU v5p targeted training with 3,672 TFLOPS and 760GB memory per eight-chip slice, matching dual H100 setups but with more capacity. V6e followed, delivering 7,344 TFLOPS at 300W TDP—rivaling quad-H100 systems on 60-65% less power. Stability AI shifted 40% of image inference to v6 in 2025. Midjourney slashed costs 65%, from $2.1 million to $700,000 monthly. Introl Blog.

TPU 8t: The Training Colossus

TPU 8t packs nearly 3x the compute of prior generations. A single superpod crams 9,600 chips, yielding 121 exaflops and 2 petabytes of shared high-bandwidth memory. Doubled inter-chip interconnect bandwidth hits near-linear scaling. Google claims 2.7x performance-per-dollar over Ironwood (v7) for large-scale training. Turn months into weeks. Orchestrate over 1 million chips via Pathways and JAX.

Power efficiency? Up to 2x per watt versus Ironwood. Broadcom designs the silicon, handling SerDes and packaging for these beasts. Pods reach 97% goodput, with auto-rerouting around failures. No operator needed. Google Cloud Blog.

Anthropic’s betting big. The AI firm inked a deal for up to 1 million TPUs—tens of billions in capacity—ramping to 3.5 gigawatts by 2027. Claude models train and serve here. Google Cloud Press.

TPU 8i flips the script for inference. Tripled on-chip SRAM to 384MB. HBM jumps to 288GB for KV caches in long-context reasoning. Dedicated Collectives Acceleration Engine slashes on-chip latency 5x. Pods of 1,152 chips deliver 11.6 FP8 exaflops. 80% better performance per dollar than Ironwood, especially for low-latency MoE models and agents.

ICI bandwidth doubles to 19.2 Tb/s. Network diameter shrinks over 50%. Run millions of concurrent agents. MediaTek reportedly crafts this chip. Google Cloud Blog; TechCrunch.

Pods, Networks, and the Nvidia Shadow

Virgo Network ties it together. This collapsed fabric quadruples bandwidth, linking 134,000 TPUs in one data center or 1 million across sites. Scale 80,000 GPUs too, via NVIDIA Vera Rubin NVL72 later this year. Google offers both worlds—no lock-in.

TPUs shine on price. V5e gave 2.7x performance per dollar over v4 on GPT-J benchmarks, per MLPerf 3.1. AssemblyAI saw 4x gains versus market rivals. YouTube serves billions of recommendations, 2.5x more queries at same cost. Google Cloud Blog.

Vs. Nvidia? V6e matches H100 throughput at 4x better value for batch inference—120 tokens/sec on LLaMA 70B. GPUs edge low-latency singles. But TPUs win scale and power. Cohere tripled throughput post-migration. Character.AI cut costs 3.8x. Introl Blog.

Broadcom locked a deal through 2031 for TPUs and racks. X post by @StockSavvyShay. Sundar Pichai tweeted: “TPU 8t, optimized for training and TPU 8i, optimized for inference. Looking good!” X.

Availability hits soon. Preview TorchTPU brings PyTorch native support, 50-100% boosts. VLLM too. AXIA Energia prevents blackouts via TPU weather models. X post by @googlecloud.

Google’s play pressures Nvidia. Custom silicon plus vertical stack yields efficiency others chase. Agents demand this split. Training builds giants. Inference deploys them at scale. TPUs deliver both—cheaper, greener, faster. The compute wars? Just heating up.

Google’s TPU Split: Training and Inference Chips Reshape AI Compute Wars

Notice an error?

Ready to get started?