Now, witness the firepower of this fully armed and operational AI supercluster
Elon Musk's expensive new project, the xAI Colossus AI supercomputer, has been detailed for the first time. YouTuber ServeTheHome was granted access to the Supermicro servers within the 100,000-GPU beast and showed off several facets of the supercomputer. Musk's xAI Colossus supercluster has been online for almost two months, after a 122-day assembly.
Patrick from ServeTheHome takes a camera around several parts of the facility, providing a bird's-eye view of its operations. The finer details of the supercomputer, like its power draw and pump sizes, are covered by a non-disclosure agreement and could not be revealed, and xAI blurred and censored parts of the video before its release. The most important things, like the Supermicro GPU servers, were left mostly intact in the footage.
Each GPU server is an Nvidia HGX H100, a server platform containing eight H100 GPUs. The HGX H100 platform is packaged inside Supermicro's 4U Universal GPU Liquid Cooled system, providing easy hot-swappable liquid cooling to each GPU. These servers are loaded into racks that hold eight servers each, making 64 GPUs per rack. 1U manifolds are sandwiched between the HGX H100 servers, providing the liquid cooling the servers need. At the bottom of each rack is another Supermicro 4U unit, this time with a redundant pump system and rack monitoring system.
These racks are grouped in sets of eight, making 512 GPUs per array. Each server has four redundant power supplies, and the rear of the GPU racks reveals 3-phase power supplies, Ethernet switches, and a rack-sized manifold that supplies all of the liquid cooling. There are over 1,500 GPU racks within the Colossus cluster, or close to 200 arrays of racks. According to Nvidia CEO Jensen Huang, the GPUs for these 200 arrays were fully installed in only three weeks.
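The GPU counts above can be checked with some quick arithmetic. The sketch below is only an illustration: the per-server, per-rack, and per-array figures come from the article, while the function names are made up for this example and the 100,000 GPU total is treated as a round number rather than an exact count.

```python
import math

# Figures quoted in the article
GPUS_PER_SERVER = 8    # one HGX H100 server holds eight H100 GPUs
SERVERS_PER_RACK = 8   # eight 4U servers per rack
RACKS_PER_ARRAY = 8    # eight racks per array

GPUS_PER_RACK = GPUS_PER_SERVER * SERVERS_PER_RACK   # 64
GPUS_PER_ARRAY = GPUS_PER_RACK * RACKS_PER_ARRAY     # 512


def racks_needed(total_gpus: int) -> int:
    """Smallest number of full racks that can hold total_gpus (hypothetical helper)."""
    return math.ceil(total_gpus / GPUS_PER_RACK)


def arrays_needed(total_gpus: int) -> int:
    """Smallest number of full arrays that can hold total_gpus (hypothetical helper)."""
    return math.ceil(total_gpus / GPUS_PER_ARRAY)


if __name__ == "__main__":
    total = 100_000  # assumption: the headline figure taken as exact
    print(GPUS_PER_RACK, GPUS_PER_ARRAY)
    print(racks_needed(total), arrays_needed(total))
```

Under that assumption, 100,000 GPUs works out to 1,563 racks and 196 arrays, which lines up with the "over 1,500 GPU racks" and "close to 200 arrays" quoted above.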
www.tomshardware.com
First in-depth look at Elon Musk's 100,000 GPU AI cluster — xAI Colossus reveals its secrets


