Tensor Processing Units (TPUs)
Engineered for next-generation AI
Build, optimize, and scale training, inference, and reinforcement learning workloads to power autonomous reasoning agents
Overview
A decade of Tensor Processing Units (TPUs)
TPUs are custom-designed accelerators purpose-built for AI workloads such as agents, code generation, large language models, media content generation, synthetic speech, vision services, recommendation engines, and personalization models, among others. TPUs power Gemini and Google's AI-powered applications such as Search, Photos, and Maps, all of which serve over 1 billion users.
Purpose-built for agentic AI
The shift to agentic AI requires infrastructure capable of multi-step reasoning and continuous reinforcement learning. TPUs break through the inference "memory wall" by hosting massive KV caches entirely on-silicon, using the expanded on-chip SRAM of TPU 8i. Combined with our SparseCore engine, which offloads communication tasks, this architecture reduces core idle time. The result is low-latency, predictable performance for complex reasoning loops.
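To see why KV caches create a memory wall, it helps to estimate their size. The sketch below uses the standard estimate for a decoder-only transformer: each layer stores one key and one value vector per KV head for every token in the context. The model shape is hypothetical and chosen only for illustration; it does not describe any specific model or TPU 8i configuration.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bytes_per_elem: int = 2) -> int:
    """Estimate KV-cache size for a standard decoder-only transformer.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes a 16-bit format such as bfloat16.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


# Hypothetical model shape: 64 layers, 8 KV heads (grouped-query attention),
# head dimension 128, serving 8 concurrent requests with 128,000-token contexts.
per_token = kv_cache_bytes(64, 8, 128, seq_len=1, batch_size=1)
total = kv_cache_bytes(64, 8, 128, seq_len=128_000, batch_size=8)
print(f"{per_token / 2**10:.0f} KiB per token, {total / 2**30:.1f} GiB in total")
```

Because the cache grows linearly with both context length and batch size, long-context, multi-step reasoning quickly makes KV-cache reads, rather than arithmetic, the bottleneck at each decoding step.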
Performance without compromise
Shorten training timelines for frontier models and deploy sooner. Cloud TPUs maximize goodput, ensuring that nearly every compute cycle is spent on productive training. This is supported by a high-speed Inter-Chip Interconnect (ICI), Optical Circuit Switching, and the Virgo Network, so accelerators operate as a highly reliable, unified system.
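Goodput is commonly measured as the share of elapsed time that produces useful training progress, after subtracting time lost to failures, restarts, checkpoint restores, and recomputed steps. The sketch below assumes that simple definition and uses hypothetical numbers; Google's own goodput accounting may break time down differently.

```python
def goodput(wall_clock_hours: float, lost_hours: float) -> float:
    """Fraction of wall-clock time spent on productive training steps."""
    if wall_clock_hours <= 0:
        raise ValueError("wall_clock_hours must be positive")
    if not 0 <= lost_hours <= wall_clock_hours:
        raise ValueError("lost_hours must be between 0 and wall_clock_hours")
    return (wall_clock_hours - lost_hours) / wall_clock_hours


# Hypothetical time lost during a 720-hour (30-day) training run.
lost = {
    "failure detection and restart": 6.0,
    "checkpoint restore": 3.0,
    "recomputing steps since the last checkpoint": 9.0,
}
g = goodput(720.0, sum(lost.values()))
print(f"goodput = {g:.1%}")
```

The breakdown shows where reliability features pay off: faster failure detection, quicker restores, and more frequent checkpoints each shrink a different term of the lost time.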
Sustainable economics at scale
TPUs are engineered to improve cost efficiency and reduce power consumption by focusing on the computational demands of AI, eliminating the operational overhead found in general-purpose architectures. Integrated power management dynamically adjusts to real-time request volume, delivering high performance per watt and supporting complex AI workloads sustainably.
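Performance per watt is throughput divided by average power, which is the same as units of work per joule; its reciprocal is the energy spent on each unit of work. The sketch below uses a hypothetical serving node and shows how a relative efficiency gain, such as the 67% figure quoted for Trillium, translates into energy for the same workload.

```python
def perf_per_watt(throughput_per_s: float, avg_power_w: float) -> float:
    """Work completed per joule (units/s divided by J/s)."""
    return throughput_per_s / avg_power_w


def energy_per_unit_j(throughput_per_s: float, avg_power_w: float) -> float:
    """Joules consumed per unit of work; the reciprocal of perf/W."""
    return avg_power_w / throughput_per_s


def energy_ratio_after_gain(efficiency_gain: float) -> float:
    """Energy for the same work after a perf/W gain (0.67 means 67% better)."""
    return 1.0 / (1.0 + efficiency_gain)


# Hypothetical serving node: 10,000 tokens/s at an average draw of 500 W.
print(perf_per_watt(10_000, 500), "tokens per joule")
print(energy_per_unit_j(10_000, 500), "joules per token")
print(f"{energy_ratio_after_gain(0.67):.0%} of the original energy for the same work")
```

Note that a 67% efficiency gain cuts energy for a fixed workload by about 40%, not 67%.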
Open, flexible, and reliable operations
Build on an open ecosystem using familiar libraries and tools. Cloud TPUs provide native, high-performance support for PyTorch and JAX, and support the vLLM engine for fast inference. Manage and scale these deployments reliably across global clusters with Google Kubernetes Engine (GKE).
Cloud TPU versions
| Cloud TPU version | Description | Availability |
|---|---|---|
| TPU 8i | TPU 8i is optimized for post-training and inference and provides an 80% improvement in performance per dollar over previous generations for low-latency inference on large mixture-of-experts (MoE) models. | Coming soon |
| TPU 8t | TPU 8t is built for large-scale pre-training and embedding-heavy workloads, scaling to 9,600 chips in a single superpod, and provides up to 2.7x better performance per dollar than Ironwood for large-scale training. | Coming soon |
| Ironwood | Seventh-generation, energy-efficient TPU engineered for large-scale training, reasoning, and inference. Features 9,216 liquid-cooled chips per pod, provides 42.5 exaFLOPS of compute per pod, and delivers 4x better performance per chip than Trillium. | Ironwood is generally available in North America (Central region) and Europe (West region) |
| Trillium | Sixth-generation TPU featuring improved energy efficiency and peak compute performance for training and inference. Delivers 67% better energy efficiency and 4.7x higher peak compute performance per chip than the previous-generation TPU v5e. | Trillium is generally available in North America (US East region), Europe (West region), and Asia (Northeast region) |
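The headline figures in the table can be turned into per-chip and per-workload terms with simple arithmetic. The sketch below assumes that "80% performance-per-dollar improvement" means 1.8x performance per dollar, and that the Ironwood pod figure is peak compute divided evenly across chips; the page does not state the numeric precision behind the exaFLOPS number, and sustained throughput on real workloads will be lower.

```python
def per_chip_flops(pod_flops: float, chips_per_pod: int) -> float:
    """Peak FLOPS per chip, assuming pod compute is split evenly across chips."""
    return pod_flops / chips_per_pod


def relative_cost(perf_per_dollar_multiplier: float) -> float:
    """Cost of a fixed workload relative to the baseline generation."""
    return 1.0 / perf_per_dollar_multiplier


# Ironwood: 42.5 exaFLOPS across 9,216 chips per pod (figures from the table).
ironwood_chip = per_chip_flops(42.5e18, 9_216)
print(f"Ironwood peak per chip: ~{ironwood_chip / 1e15:.2f} PFLOPS")

# Assumed interpretation: an 80% improvement is a 1.8x multiplier.
print(f"TPU 8i (80% better perf/$): {relative_cost(1.8):.0%} of baseline cost")
print(f"TPU 8t (up to 2.7x perf/$): {relative_cost(2.7):.0%} of Ironwood cost")
```

Expressed this way, the same training or inference job would cost roughly 56% of the baseline on TPU 8i and as little as about 37% of the Ironwood cost on TPU 8t, if the quoted multipliers hold for that workload.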
How It Works
Get an inside look at the magic of Google Cloud TPUs, including a rare view inside the data centers. Customers use Cloud TPUs to run large-scale AI workloads, and that capacity comes from much more than just a chip. In this video, take a look at the components of the TPU system, including data center networking, optical circuit switches, water cooling systems, biometric security verification, and more.
Common Uses
Run large-scale AI pre-training workloads
Efficient post-training and reinforcement learning
Scale reinforcement learning workloads efficiently
Build base models into intelligent agents through intensive post-training workflows. The eighth-generation TPU system rapidly processes continuous reinforcement learning trials, rewarding the best reasoning paths without the cycle delays common in previous generations. This lets you efficiently fine-tune world models, so agents can refine their reasoning in simulated environments before acting in the real world.
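The page does not say which reinforcement learning algorithm is used. One widely used way to "reward the best reasoning paths" is to sample several candidate paths for the same prompt, score each with a reward function, and normalize the scores within that group, as in group-relative policy optimization (GRPO). The sketch below shows only that advantage step, in plain Python, with hypothetical rewards; it is an illustration of the general technique, not the TPU 8 training stack.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one prompt's group of sampled reasoning paths.

    Paths scoring above the group mean get a positive advantage and are
    reinforced; paths below the mean get a negative advantage.
    eps avoids division by zero when every path receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four reasoning paths sampled for the same prompt
# (for example, 1.0 = final answer verified correct, 0.0 = incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
adv = group_relative_advantages(rewards)
print([round(a, 3) for a in adv])
```

Because every trial needs several sampled paths before any update, throughput on these short generate-score-update cycles largely determines how quickly post-training converges.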
Low-latency AI inference workloads at scale
Start your proof of concept
Try Cloud TPUs for free
Get a quick intro to using Cloud TPUs
Run PyTorch on TPUs
Run JAX on TPUs
Serve using vLLM on TPUs
Business Case
Autonomous reasoning agents
TPUs provide the memory bandwidth and low-latency inference required to run continuous, multi-step reasoning loops for real-time coding assistants, autonomous customer service, and security operations.
Foundation models and multimodal generative AI
Delivering continuous, high-throughput compute, TPUs let you efficiently build and serve massive foundation models across text, image, audio, and video modalities.
Precision science and healthcare
TPUs manage complex, matrix-heavy mathematics to accelerate computationally intensive simulations for structural biology, genomic sequencing, and drug discovery.
Physical AI
Build physical agents that interact with and adapt to the real world. Simulate and train robots, autonomous agents, and industrial machines faster and more efficiently with synthetic and real-world data.
