Data Science Central


Although it’s rarely publicized in the media, not everything about deploying—and certainly not about training or fine-tuning—advanced machine learning models is readily accessible through an API. For certain implementations, the success of enterprise-scale applications of language models hinges on hardware, supporting infrastructure, and other practicalities that require more than just a cloud service provider.

Graphics Processing Units (GPUs) are arguably at the forefront of these considerations, particularly for organizations with their own data centers. Although the compute resources GPUs furnish are indispensable for certain AI use cases (particularly for building models and other data science requisites), they’re not always readily available.

“They seem to be hard to find, or they come with a lot of strings attached,” admitted Eli Lahr, Leaseweb Senior Solutions Engineer. “Long commitments, you know, high amounts of upfront costs. Or, companies just can’t get them, no matter how much they ask or how much money they offer. Their providers don’t have them available.”

Leaseweb’s recent announcement of the availability of a series of NVIDIA GPUs was designed to mitigate these issues and fortify the infrastructure necessary for employing advanced machine learning. The company is offering the NVIDIA L4, NVIDIA L40S, and NVIDIA H100 NVL, which are made available through its line of dedicated servers.

“They basically span everything from small to large use cases,” mentioned Devon Rutherford, Sales Solutions and Operations Manager at Leaseweb. “We can do multiple GPUs per server. We work with a number of different clients that utilize our GPUs for everything from language models to video analytics.”

Scalable enterprise processing

The worth of GPUs to enterprise applications of language models and other advanced machine learning use cases is inestimable. Moreover, these use cases transcend the basic summarization and question-answering paradigm popularized through RAG and other prompt augmentation methods. Instead, GPUs power applications that come close to realizing the full scale of enterprise AI for mission-critical use cases. Lahr articulated a healthcare use case that likely wouldn’t be possible sans the scalable processing supplied by GPUs.

The healthcare provider culls data from a number of sources, including medical records, analyzes the data, and employs the results to personalize treatment for patients. With this approach, the organization “makes sure people aren’t taking medications that they aren’t supposed to take, or taking their history into account—family history, genetics,” Lahr commented. “It’s just an incredible amount of data. And then, they’re able to advise local healthcare workers on what they should do and shouldn’t do with any particular patient.”

The GPU selection process

The healthcare organization Lahr mentioned employs Leaseweb’s NVIDIA GPUs for this application. However, there are numerous GPUs available, which puts the onus on the enterprise to select the right one to fit its particular needs. The specific application almost always proves determinative in this respect. “The more important, larger conversation we have to have with any of our customers is where are you with things?” Lahr revealed. “Is this an inference model? Are we in production? Are we training?”

The data science requirements for building models from scratch, or even fine-tuning models in some cases, are typically more stringent than those for operationalizing models. However, as the data science cycle inevitably involves both ends of this spectrum, organizations should strategize about which types of GPUs they access. “If you’re training, you probably need a little bit more power,” Lahr noted. “That’s where the H100s really come in. By the time you get to the production workload, you might not need as much power and the L4 or L40S might do.”
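Lahr’s guidance—more powerful cards such as the H100 for training, the L4 or L40S for production inference—can be read as a simple decision rule. The sketch below is purely illustrative (it is not a Leaseweb tool, and the stage-to-tier mapping is an assumption drawn from the quotes above, not an official sizing guide):

```python
# Illustrative heuristic only: maps a machine learning workload stage to one
# of the NVIDIA GPU tiers mentioned in the article. The mapping and the
# function name are assumptions for illustration, not vendor guidance.

def recommend_gpu(stage: str, multi_gpu: bool = False) -> str:
    """Suggest a GPU tier for a given workload stage.

    stage: one of "training", "fine-tuning", or "inference".
    multi_gpu: whether the dedicated server should combine several cards
    (the multi-card strategy Lahr describes).
    """
    stage = stage.lower()
    if stage == "training":
        # Building models from scratch typically demands the most compute.
        gpu = "NVIDIA H100 NVL"
    elif stage == "fine-tuning":
        # Fine-tuning is often lighter than full training (assumed tier).
        gpu = "NVIDIA L40S"
    elif stage == "inference":
        # Production inference can frequently run on the entry-level card.
        gpu = "NVIDIA L4"
    else:
        raise ValueError(f"unknown workload stage: {stage!r}")
    return f"multiple {gpu} cards" if multi_gpu else gpu

print(recommend_gpu("training"))         # NVIDIA H100 NVL
print(recommend_gpu("inference", True))  # multiple NVIDIA L4 cards
```

In practice, of course, the choice also depends on model size, memory footprint, and budget—the “larger conversation” Lahr refers to.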

GPU strategy

The H100s Lahr mentioned are the most capable of the GPUs Leaseweb currently offers, while the L4 is the least. Nonetheless, the L4 can power any number of workloads, including those fundamental to data science, and it hints at the evolution of these processors and their overall merit to the enterprise. “The L4, we call it the entry-level now, but it does a lot more than the mid-level did in the last generation,” Lahr commented. “Things change quite a bit as we go from group to group.”

When employing GPUs, therefore, it may behoove organizations to select less capable (and less expensive) ones, and consider combining them in workloads—particularly since more than one can be accessed through dedicated servers. “That’s a good strategy for some, to stay with multiple small cards instead of one big one,” Lahr said. “We can build to suit. We’re flexible that way.”

Scalable AI processing

GPUs are pivotal for providing the computational resources organizations require to scale their cognitive computing needs. They can accelerate these deployments for all facets of AI, from building and fine-tuning models to putting them in production. Consequently, they’ll likely remain a priority for a number of use cases, particularly those involving facets of high-performance computing.