Alibaba unveils the network and datacenter design it uses for large language model training



Alibaba has revealed its datacenter design for LLM training, which consists of an Ethernet-based network in which each host contains eight GPUs and nine NICs, each with two 200 Gb/sec ports.

The tech giant, which also offers one of the best large language models (LLMs) around via its 110-billion-parameter Qwen model, says this design has been used in production for eight months. It aims to maximize the utilization of a GPU's PCIe capabilities, increasing the send/receive capacity of the network.
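To put those figures in perspective, a quick back-of-envelope calculation (a sketch based only on the numbers stated above: nine NICs per host, two 200 Gb/sec ports per NIC) shows the aggregate network bandwidth available to each eight-GPU host:

```python
# Back-of-envelope aggregate NIC bandwidth per host, using the figures
# reported in the article: 9 NICs per host, 2 ports per NIC, 200 Gb/s per port.
NICS_PER_HOST = 9
PORTS_PER_NIC = 2
PORT_SPEED_GBPS = 200

total_gbps = NICS_PER_HOST * PORTS_PER_NIC * PORT_SPEED_GBPS
print(f"Aggregate per-host bandwidth: {total_gbps} Gb/s "
      f"({total_gbps / 1000:.1f} Tb/s)")
# Aggregate per-host bandwidth: 3600 Gb/s (3.6 Tb/s)
```

That works out to 3.6 Tb/sec of raw network capacity per host, or 400 Gb/sec per NIC, matching the dual-port design described above.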
