Pulling Back the Curtain: The Infrastructure Behind the Magic of AI


The science fiction writer Arthur C. Clarke famously said, “Any sufficiently advanced technology is indistinguishable from magic.” When we command ChatGPT to create an image or an essay, we may as well be casting a spell. Given the speed with which creation occurs, the “magical” quality of artificial intelligence is unavoidable.  

Think of a great magic trick by Penn and Teller, Siegfried and Roy, or David Blaine. We see only the outcome, which appears effortless, though we intuitively suspect it is harder than it looks.

With artificial intelligence firmly in the zeitgeist for consumers and businesses alike, it's worth unpacking the physical realities underpinning the recent leaps in AI technology. Those leaps are a testament to innovation in hardware and computing, and a reminder that physical enablers and bottlenecks both make AI a reality and constrain its growth.

Here’s what isn’t magic. Many of the machine learning algorithms foundational to current generative AI go back to the 1980s and have relatively simple beginnings, with statistical, probabilistic groundwork and weighting at their core. The biggest difference is that current technology enables the training and analysis of billion- and trillion-parameter models, compared with thousand-parameter models in the 1990s and million-parameter models in the 2000s.


It starts with the semiconductor chip, whose technological advancement remains staggering and foundational to making much of this innovation possible. Whereas 60 years ago a single transistor was visible to the human eye, the latest transistors are 10,000 times smaller than a human hair, built at a microscopic scale. That density is why the current iPhone 15, with 19 billion transistors on a single A17 chip, is as powerful as the world’s fastest supercomputer was in 1998.

GPUs push this density even further, with 80 billion transistors on a single Nvidia H100 chip and 208 billion on the next-generation Nvidia Blackwell chips. Greater transistor density translates to greater computing power, particularly when coupled with the shift from serialized computing on CPUs to accelerated, parallelized computing on GPUs.
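As a loose illustration of that serial-versus-parallel distinction, the Python sketch below contrasts an element-by-element loop with a single vectorized NumPy operation. The vectorized call stands in for GPU-style data parallelism; it is a CPU-side analogy under that assumption, not a benchmark of the chips named here.

```python
import time
import numpy as np

# Serial: one multiply-add at a time, the way a scalar loop processes data.
def serial_scale(xs, w, b):
    return [w * x + b for x in xs]

# Data-parallel: the same arithmetic expressed as one bulk array operation,
# loosely analogous to a GPU applying an instruction across many values at once.
def parallel_scale(xs, w, b):
    return w * xs + b

data = np.random.rand(10_000_000)

start = time.perf_counter()
serial_scale(data, 2.0, 1.0)
print("serial loop:", time.perf_counter() - start)

start = time.perf_counter()
parallel_scale(data, 2.0, 1.0)
print("vectorized: ", time.perf_counter() - start)
```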

The evolution of transistor density and GPU chip sizes comes with direct physical demands. Per-chip power draw is rising rapidly: from 400 watts for an Nvidia A100, to 700 watts for an H100, to a projected 1,000 watts for the future B200 and 1,200 watts for the GB200. This is a major increase over the 200- to 400-watt CPU servers more typical of a cloud computing environment.


As individual chips and servers, these power requirements aren’t too daunting. However, the jump in power needs comes from the fact that the latest Nvidia servers typically require at least eight GPUs per unit, along with attached storage and CPUs that add a 1.25-1.5x multiplier to power requirements. In 2022, training GPT-4, the 1.8-trillion-parameter large language model that powers OpenAI’s ChatGPT, required 20,000 highly networked A100 GPUs running for 90 days, with a steady-state IT power load of approximately 12 megawatts and another 3-6 megawatts of power to cool that supercluster.
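To make that arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python that reconstructs the roughly 12-megawatt figure from the per-chip numbers above; the 1.25-1.5x overhead multiplier and 25-50% cooling fraction are the ranges cited in this piece, not measured values.

```python
# Back-of-the-envelope reconstruction of the ~12 MW figure cited above.
gpu_count = 20_000        # A100 GPUs in the GPT-4 training cluster
watts_per_gpu = 400       # per-chip draw for an A100

for overhead in (1.25, 1.5):
    it_load_mw = gpu_count * watts_per_gpu * overhead / 1e6
    print(f"overhead {overhead}x -> IT load ~{it_load_mw:.0f} MW")
# -> roughly 10-12 MW of IT load, plus 3-6 MW (25-50%) of cooling on top
```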

Fast forward to today. In June, Oracle announced a new OCI Supercluster able to scale up to 64,000 GB200 chips: that’s potentially 115 megawatts of IT load, with another 29-58 megawatts required to cool it. On July 22, Elon Musk announced that xAI will have a 100,000-GPU H100 cluster online by the end of 2024: potentially 105 megawatts of IT load, with another 26-52 megawatts to cool it. OpenAI’s GPT-5 is expected to have 10 trillion parameters and to be trained on a cluster of this size.
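Applying the same back-of-the-envelope math to these two announcements reproduces the megawatt figures above; the 1.5x server-level overhead in the sketch is an assumed value consistent with the earlier multiplier range.

```python
# Estimated steady-state IT load for the two announced superclusters.
def cluster_it_load_mw(chips: int, watts_per_chip: float, overhead: float = 1.5) -> float:
    """Chip count x per-chip watts x server overhead, converted to megawatts."""
    return chips * watts_per_chip * overhead / 1e6

print(cluster_it_load_mw(64_000, 1_200))   # Oracle OCI GB200 cluster -> ~115 MW
print(cluster_it_load_mw(100_000, 700))    # xAI H100 cluster -> ~105 MW
```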

Let’s put this into context. Each of those two clusters alone would require approximately 150-175 megawatts to support a 90-day AI training operation. For a single cluster, that’s more than the vast majority of data centers in operation today. With the average nuclear power plant in the US sized at 1.8 gigawatts, a single cluster would consume nearly 10% of one plant’s capacity. There are still plenty of smaller clusters in operation at the 5K, 10K, and 20K GPU levels, but the dynamic remains similar: These are 10- to 40-megawatt clusters with the potential to use all the power in a typical data center.
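A quick scale check, using only the figures above, shows where the "nearly 10%" comparison comes from:

```python
# Scale check against the ~1.8-gigawatt average US nuclear plant cited above.
plant_mw = 1_800
for cluster_mw in (150, 175):              # IT load plus cooling, per cluster
    share = cluster_mw / plant_mw
    print(f"{cluster_mw} MW cluster -> {share:.1%} of one plant")
# -> 8.3% and 9.7%: close to a tenth of a nuclear plant for one training cluster
```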


The availability of large blocks of consistent, reliable power remains a bottleneck for deploying foundational GPU superclusters, yet it’s not as simple as plugging into the grid, because these training clusters require high availability. A power disruption on day 75 of a 90-day training run could corrupt the run or force a restart altogether. When even the best utilities see 4-6 power events (from blackouts to brownouts) per year, high-availability data center design and operations remain critical to AI enablement.
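To illustrate the exposure, the sketch below scales the cited 4-6 annual grid events to a 90-day window and, under the assumption that events arrive independently (a simplification for illustration, not a claim from the article), estimates the odds of at least one disruption landing during a training run.

```python
import math

run_days = 90
for events_per_year in (4, 6):             # grid events cited for even the best utilities
    expected = events_per_year * run_days / 365
    # Poisson assumption: probability of at least one event during the run.
    p_hit = 1 - math.exp(-expected)
    print(f"{events_per_year}/yr -> {expected:.1f} expected events, "
          f"P(>=1 during run) ~ {p_hit:.0%}")
# -> roughly 1.0-1.5 expected events and a ~63-77% chance of at least one,
#    which is why backup power and high-availability design matter.
```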

The magic of AI is grounded in simple concepts repeated at previously unfathomable scale. While there is emerging skepticism about the inherent value of AI, Big Tech is doubling down on its investment based on the promise of AI inference, which is the process of using a trained machine learning model to analyze and evaluate new data. As innovation continues to uncover meaningful ways to unleash the power of AI in real time, the wave of required underlying infrastructure expansion will only continue.