Google Cloud Expands AI Infrastructure Domain With Sixth-Gen TPUs

Google Cloud will enhance AI cloud infrastructure with new TPUs and NVIDIA GPUs, the cloud division announced on Oct. 30 at the App Day & Infrastructure Summit.

Now in preview for cloud customers, Trillium, the sixth generation of Google's TPU, powers many of Google's most popular services, including Search and Maps.

“Through these advancements in AI infrastructure, Google Cloud empowers businesses and researchers to redefine the boundaries of AI innovation,” Mark Lohmeyer, VP and GM of Compute and AI Infrastructure at Google Cloud, wrote in a press release. “We are looking forward to the transformative new AI applications that will emerge from this powerful foundation.”

Trillium TPU speeds up generative AI processes

As large language models grow, so must the silicon to support them.

Trillium, the sixth-generation TPU, supports training, inference, and delivery of large language model applications at up to 91 exaflops in a single TPU cluster. Google Cloud reports that it offers a 4.7-times increase in peak compute performance per chip compared to the fifth generation. It also doubles both the High Bandwidth Memory capacity and the Interchip Interconnect bandwidth.

Trillium meets the high compute demands of large-scale diffusion models like Stable Diffusion XL. At its peak, Trillium infrastructure can link tens of thousands of chips, creating what Google Cloud describes as “a building-scale supercomputer.”

Enterprise customers have been asking for more cost-effective AI acceleration and increased inference performance, said Mohan Pichika, group product manager of AI infrastructure at Google Cloud, in an email to TechRepublic.

In the press release, Google Cloud customer Deniz Tuna, head of development at mobile app development company HubX, noted: “We used Trillium TPU for text-to-image creation with MaxDiffusion & FLUX.1 and the results are amazing! We were able to generate four images in 7 seconds — that’s a 35% improvement in response latency and ~45% reduction in cost/image against our current system!”
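
As a rough reading of the HubX figures, the arithmetic below works out the time per image and the baseline latency implied by the quoted percentage. It assumes that a "35% improvement in response latency" means the new latency is 35% lower than the old one, which the quote does not spell out. The function and variable names are illustrative only.

```python
# Back-of-the-envelope arithmetic for the HubX quote.
# Assumption: "35% improvement" means new latency = old latency * (1 - 0.35).

def implied_baseline(new_value: float, reduction: float) -> float:
    """Return the old value implied by a fractional reduction."""
    return new_value / (1.0 - reduction)

BATCH_SECONDS = 7.0        # from the quote: four images in 7 seconds
IMAGES_PER_BATCH = 4       # from the quote
LATENCY_REDUCTION = 0.35   # from the quote, interpreted as a reduction

seconds_per_image = BATCH_SECONDS / IMAGES_PER_BATCH
baseline_batch_seconds = implied_baseline(BATCH_SECONDS, LATENCY_REDUCTION)

print(f"Time per image on Trillium: {seconds_per_image:.2f} s")
print(f"Implied previous batch latency: {baseline_batch_seconds:.1f} s")
```

Under that interpretation, HubX's previous system would have needed a little under 11 seconds for the same four images. A different reading of "improvement" (for example, a throughput gain) would give a different baseline.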

New Virtual Machines anticipate NVIDIA Blackwell chip delivery

In November, Google will add A3 Ultra VMs powered by NVIDIA H200 Tensor Core GPUs to its cloud services. The A3 Ultra VMs run AI or high-performance computing (HPC) workloads on Google Cloud’s data center-wide network at 3.2 Tbps of GPU-to-GPU traffic. They also offer customers:

Integration with NVIDIA ConnectX-7 hardware.
2x the GPU-to-GPU networking bandwidth compared to the previous benchmark, A3 Mega.
Up to 2x higher LLM inferencing performance.
Nearly double the memory capacity.
1.4x more memory bandwidth.

The new VMs will be available through Google Cloud or Google Kubernetes Engine.
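
To put the 3.2 Tbps figure in more familiar units, the sketch below converts it to gigabytes per second and estimates a best-case transfer time. It assumes decimal units and ignores protocol overhead, so real throughput would be lower. The 100 GB payload is a hypothetical value chosen for illustration, and the A3 Mega figure is only what the "2x" comparison implies, not a number stated in the announcement.

```python
# Unit conversion for the quoted A3 Ultra GPU-to-GPU network figure.
# Assumptions: decimal units (1 Tbps = 1e12 bits/s), no protocol overhead,
# and a hypothetical 100 GB payload chosen only for illustration.

BITS_PER_BYTE = 8

def tbps_to_gigabytes_per_second(tbps: float) -> float:
    """Convert terabits per second to gigabytes per second (decimal units)."""
    return tbps * 1e12 / BITS_PER_BYTE / 1e9

def ideal_transfer_seconds(payload_gb: float, tbps: float) -> float:
    """Best-case time to move payload_gb gigabytes at the given line rate."""
    return payload_gb / tbps_to_gigabytes_per_second(tbps)

A3_ULTRA_TBPS = 3.2                        # from the article
A3_MEGA_TBPS_IMPLIED = A3_ULTRA_TBPS / 2   # implied by the "2x" comparison

ultra_gbs = tbps_to_gigabytes_per_second(A3_ULTRA_TBPS)
mega_gbs = tbps_to_gigabytes_per_second(A3_MEGA_TBPS_IMPLIED)
seconds = ideal_transfer_seconds(100.0, A3_ULTRA_TBPS)

print(f"A3 Ultra: {ultra_gbs:.0f} GB/s, implied A3 Mega: {mega_gbs:.0f} GB/s")
print(f"Ideal time to move 100 GB on A3 Ultra: {seconds:.2f} s")
```

At the quoted line rate, 3.2 Tbps works out to roughly 400 GB/s, so the hypothetical 100 GB payload would move in about a quarter of a second under ideal conditions.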

SEE: Blackwell GPUs are sold out for the next year, Nvidia CEO Jensen Huang said at an investors’ meeting in October.

Additional Google Cloud infrastructure updates support the growing enterprise LLM industry

Naturally, Google Cloud’s infrastructure offerings interoperate. For example, the A3 Mega is supported by the Jupiter data center network, which will soon see its own AI-workload-focused enhancement.

With its new network adapter, Titanium’s host offload capability now adapts more effectively to the diverse demands of AI workloads. The Titanium ML network adapter uses NVIDIA ConnectX-7 hardware and Google Cloud’s data-center-wide 4-way rail-aligned network to deliver 3.2 Tbps of GPU-to-GPU traffic. The benefits of this combination flow up to Jupiter, Google Cloud’s optical circuit switching network fabric.

Another key element of Google Cloud’s AI infrastructure is the processing power required for AI training and inference. Hypercompute Cluster, which includes A3 Ultra VMs, brings large numbers of AI accelerators together. It can be configured via an API call, leverages reference libraries like JAX or PyTorch, and supports open AI models like Gemma 2 and Llama 3 for benchmarking.

Google Cloud customers can access Hypercompute Cluster with A3 Ultra VMs and Titanium ML network adapters in November.

These products address enterprise customer requests for optimized GPU utilization and simplified access to high-performance AI Infrastructure, said Pichika.

“Hypercompute Cluster provides an easy-to-use solution for enterprises to leverage the power of AI Hypercomputer for large-scale AI training and inference,” he said by email.

Google Cloud is also preparing racks for NVIDIA’s upcoming Blackwell GB200 NVL72 GPUs, anticipated for adoption by hyperscalers in early 2025. Once available, these GPUs will connect to Google’s Axion-processor-based VM series, leveraging Google’s custom Arm processors.

Pichika declined to directly address whether the timing of Hypercompute Cluster or Titanium ML was connected to delays in the delivery of Blackwell GPUs: “We’re excited to continue our work together to bring customers the best of both technologies.”

Two more services, the Hyperdisk ML block storage service for AI/ML workloads and the Parallelstore parallel file system for AI/HPC workloads, are now generally available.

Google Cloud services can be reached across numerous international regions.

Competitors to Google Cloud for AI hosting

Google Cloud competes primarily with Amazon Web Services and Microsoft Azure in cloud hosting of large language models. Alibaba, IBM, Oracle, VMware, and others offer similar stables of large language model resources, although not always at the same scale.

According to Statista, Google Cloud held 10% of the cloud infrastructure services market worldwide in Q1 2024. Amazon Web Services held 34% and Microsoft Azure held 25%.

Megan Crouse
