Tensor Processing Units (TPUs)

Engineered for next-generation AI

Build, optimize, and scale training, inference, and reinforcement learning workloads to power autonomous reasoning agents

Overview

A decade of Tensor Processing Units (TPUs)

TPUs are custom-designed accelerators purpose-built for AI workloads such as agents, code generation, large language models, media content generation, synthetic speech, vision services, recommendation engines, and personalization models. TPUs power Gemini and all of Google's AI-powered applications, including Search, Photos, and Maps, each serving over 1 billion users.

Purpose-built for agentic AI

The shift to agentic AI requires infrastructure capable of multi-step reasoning and continuous reinforcement learning. TPU 8i breaks the inference "memory wall" with expanded on-chip SRAM that hosts massive key-value (KV) caches entirely on-silicon. Combined with our SparseCore engine, which offloads communication tasks, this architecture reduces idle time on the compute cores. The result is low-latency, predictable performance that powers complex reasoning loops.
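To give a sense of scale for the KV caches mentioned above, here is a minimal sketch of the standard size estimate for a decoder-only transformer. It assumes separate key and value tensors per layer, grouped-query attention, and a bf16 cache with no compression; the model shape (80 layers, 8 KV heads of dimension 128) and the function name `kv_cache_bytes` are hypothetical and not tied to any specific product.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bytes_per_elem: int = 2) -> int:
    """Estimate KV cache size in bytes for a decoder-only transformer.

    Each layer stores one key and one value vector per KV head for every
    token of every sequence in the batch, hence the leading factor of 2.
    bytes_per_elem=2 assumes a bf16/fp16 cache.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


# Hypothetical model shape: 80 layers, 8 grouped-query KV heads of size 128.
per_token = kv_cache_bytes(80, 8, 128, seq_len=1, batch_size=1)
total = kv_cache_bytes(80, 8, 128, seq_len=32_768, batch_size=8)

print(f"per token: {per_token / 1024:.0f} KiB")  # per token: 320 KiB
print(f"total: {total / 2**30:.1f} GiB")         # total: 80.0 GiB
```

Because the cache grows linearly with both context length and batch size, long multi-step reasoning traces and many concurrent sessions multiply memory demand quickly: doubling the context doubles the cache.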

Performance without compromise

Shorten training timelines for frontier models and deploy sooner. Cloud TPUs maximize goodput (the share of compute time that yields useful training progress), ensuring that nearly every compute cycle is spent on active learning. A high-speed Inter-Chip Interconnect (ICI), Optical Circuit Switching (OCS), and the Virgo Network tie the accelerators together so they operate as a highly reliable, unified system.
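Goodput can be made concrete with a simplified model. The sketch below is illustrative only and is not how Google measures goodput: it assumes that each interruption loses, on average, half a checkpoint interval of work plus a fixed restart cost, and that each checkpoint stalls training for a fixed time. The `TrainingRun` fields, the `goodput` function, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrainingRun:
    """Hypothetical summary of one training run (all times in hours)."""
    wall_clock: float           # total elapsed time for the run
    interruptions: int          # failures that forced a restart
    restart_cost: float         # time to detect, reschedule, and reload per interruption
    checkpoint_interval: float  # time between checkpoints
    checkpoint_cost: float      # time training is stalled writing each checkpoint


def goodput(run: TrainingRun) -> float:
    """Fraction of wall-clock time that produced durable training progress."""
    # Work done since the last checkpoint is lost; assume half an interval on average.
    lost_work = run.interruptions * run.checkpoint_interval / 2
    restarts = run.interruptions * run.restart_cost
    checkpointing = (run.wall_clock / run.checkpoint_interval) * run.checkpoint_cost
    productive = run.wall_clock - lost_work - restarts - checkpointing
    return max(productive, 0.0) / run.wall_clock


# A hypothetical 30-day run: 6 interruptions, 30-minute restarts,
# and checkpoints every 2 hours that each stall training for 3 minutes.
run = TrainingRun(wall_clock=720, interruptions=6, restart_cost=0.5,
                  checkpoint_interval=2, checkpoint_cost=0.05)
print(f"goodput: {goodput(run):.1%}")
```

In this model, checkpointing more often reduces lost work but adds save overhead, while faster restarts and fewer interruptions raise goodput directly, which is why system reliability matters as much as raw chip speed.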

Sustainable economics at scale

TPUs are engineered to improve cost efficiency and reduce power consumption by focusing on the computational demands of AI, avoiding the operational overhead of multi-purpose architectures. Integrated power management dynamically adjusts to real-time request volume, delivering high performance per watt and supporting complex AI workloads sustainably.

Open, flexible, and reliable operations

Build on an open ecosystem using familiar libraries and tools. Cloud TPUs provide native, high-performance support for PyTorch and JAX, and support the vLLM engine for fast inference. Manage and scale these deployments reliably across global clusters with Google Kubernetes Engine (GKE).
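As a minimal sketch of the JAX workflow, assuming JAX is installed with TPU support on a Cloud TPU VM (for example via the `jax[tpu]` pip extra), the code below reports the active backend and runs a jit-compiled dense layer. XLA compiles the function for whichever backend is available, so the same code also runs on GPU or CPU. The function name `dense_layer` and the layer sizes are hypothetical.

```python
import jax
import jax.numpy as jnp

# Report which backend JAX selected. On a Cloud TPU VM this is expected
# to be "tpu"; elsewhere JAX falls back to "gpu" or "cpu".
print("backend:", jax.default_backend())
print("devices:", jax.devices())


@jax.jit  # traced once, then compiled by XLA for the active backend
def dense_layer(x, w, b):
    """One dense layer: x @ w + b followed by a ReLU."""
    return jax.nn.relu(x @ w + b)


key = jax.random.PRNGKey(0)
k_x, k_w = jax.random.split(key)
x = jax.random.normal(k_x, (8, 128))    # hypothetical batch of 8 inputs
w = jax.random.normal(k_w, (128, 256))  # hypothetical weight matrix
b = jnp.zeros((256,))

y = dense_layer(x, w, b)  # first call compiles; later calls reuse the executable
print(y.shape)  # (8, 256)
```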

Cloud TPU versions

Cloud TPU version	Description	Availability
TPU 8i	TPU 8i is optimized for post-training and inference and provides an 80% performance-per-dollar improvement over previous generations for low-latency inference on large MoE models.	Coming soon
TPU 8t	TPU 8t is built for large-scale pre-training and embedding-heavy workloads, scaling to 9,600 chips in a single superpod, and provides up to 2.7x better performance per dollar than Ironwood for large-scale training.	Coming soon
Ironwood	Seventh-generation, energy-efficient TPU engineered for large-scale training, reasoning, and inference. Features 9,216 liquid-cooled chips per pod, delivers 42.5 exaFLOPS per pod, and provides 4x better performance per chip than Trillium.	Ironwood is generally available in North America (Central region) and Europe (West region)
Trillium	Sixth-generation TPU with improved energy efficiency and peak compute performance for training and inference. It is 67% more energy efficient and provides 4.7x higher peak compute performance per chip than the previous-generation TPU v5e.	Trillium is generally available in North America (US East region), Europe (West region), and Asia (Northeast region)
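The headline figures in the table can be unpacked with simple arithmetic. This worked sketch assumes that the Ironwood pod figure is an aggregate peak spread evenly across all 9,216 chips (the page does not state the numeric precision) and reads an N% performance-per-dollar improvement as (1 + N/100) times as much work per dollar, so the cost of a fixed amount of work scales by the reciprocal.

```latex
\begin{align*}
\text{Ironwood peak per chip} &= \frac{42.5 \times 10^{18}\ \text{FLOP/s}}{9{,}216\ \text{chips}}
  \approx 4.61 \times 10^{15}\ \text{FLOP/s} \\
\text{TPU 8i relative cost per unit of work} &= \frac{1}{1.8} \approx 0.56 \\
\text{TPU 8t relative cost per unit of work} &\geq \frac{1}{2.7} \approx 0.37
\end{align*}
```

Under these assumptions, the 80% figure for TPU 8i corresponds to roughly 44% lower cost for the same work, and TPU 8t's up-to-2.7x figure to as much as roughly 63% lower cost relative to Ironwood.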

How It Works

Get an inside look at the magic of Google Cloud TPUs, including a rare view inside the data centers. Customers use Cloud TPUs to run some of the largest AI workloads, and that capacity comes from much more than just a chip. In this video, take a look at the components of the TPU system, including data center networking, optical circuit switches, water cooling systems, biometric security verification, and more.

Common Uses

Run large-scale AI pre-training workloads

Efficient post-training and reinforcement learning

Scale reinforcement learning workloads efficiently

Build base models into intelligent agents through intensive post-training workflows. The eighth-generation TPU system rapidly processes continuous reinforcement learning trials, rewarding the best reasoning paths without the cycle delays common in previous generations. This lets you fine-tune world models efficiently, so agents can refine their reasoning in simulated environments before acting in the real world.

Running PyTorch Natively on TPUs at Google Scale

Low-latency AI inference workloads at scale

Start your proof of concept

Try Cloud TPUs for free

Get a quick intro to using Cloud TPUs

Run PyTorch on TPUs

Run JAX on TPUs

Serve using vLLM on TPUs

Business Case

Autonomous reasoning agents

TPUs provide the memory bandwidth and low-latency inference required to run continuous, multi-step reasoning loops for real-time coding assistants, autonomous customer service, and security operations.

Foundation models and multimodal generative AI

TPUs deliver continuous, high-throughput compute to efficiently train and serve massive foundation models across text, image, audio, and video modalities.

Precision science and healthcare

TPUs handle complex, matrix-heavy mathematics to accelerate computationally intensive workloads such as structural biology simulations, genomic sequencing, and drug discovery.

Physical AI

Build physical agents that interact with and adapt to the real world. Simulate and train robots, autonomous agents, and industrial machines faster and more efficiently with synthetic and real-world data.