When You Don’t Need GPUs to Run AI: A ‘Farm Fresh’ Case Study
How the latest generation of Intel AMX-enabled Xeon processors can power many AI jobs that previously could only be done with GPUs.
Sep 7th, 2024 7:00am by Joab Jackson
While Nvidia enjoys unprecedented demand for GPUs as companies ramp up their AI projects, quiet progress from Intel and other chipmakers has reduced the need for GPUs in the first place, allowing at least some AI workloads to run without graphics processing units at all.
This was the takeaway of a talk at the VMware Explore conference, held last week in Las Vegas.
It was a presentation from Ontario-based Nature Fresh Farms. The farming operation has 250 acres of greenhouses in the U.S. and Canada, growing 1.6 million plants at any one time throughout the year: bell peppers, tomatoes, cucumbers and strawberries.
All aspects of the grow cycle are managed by AI. With a handful of servers, every aspect of a plant’s life is monitored and controlled: water, light (both natural and greenhouse-generated), temperature, CO2 levels, humidity, and nutrition in the soil.
Number of GPUs used? Zero.
Keith Bradley, the company’s vice president of IT and security, and his team started with a simple three-node Kubernetes cluster, along with Intel’s OpenVINO, an open source toolkit for running inference and other AI jobs.
The system pulls in information not only from greenhouse sensors, but also from video, photos and nearby weather stations. This allows the system to fine-tune operations, such as watering a plant only when it is sunny, or kicking on artificial light when it is cloudy outside.
“We’ve learned that, just like a human being, if you feed us when we’re hungry, we do better,” Bradley said.
Overall, the company collects about 12MB per plant for the entire life of the plant, or 23TB a year for the whole operation.
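A quick back-of-the-envelope check (not from the talk) shows those figures are in the same ballpark, using the 1.6 million plant count cited earlier:

```python
# Sanity-check the article's data figures:
# 1.6 million plants, ~12 MB collected per plant over its life.
plants = 1_600_000
mb_per_plant = 12
total_tb = plants * mb_per_plant / 1_000_000  # MB -> TB (decimal units)
print(total_tb)  # 19.2
```

That lands close to the stated 23 TB a year, with the gap plausibly covered by plants cycling through more than once per year plus overhead data.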
“We analyze everything and use AI/ML to improve each day’s growth to get more optimal results,” Bradley said.
All this data is used not only to optimize the health of the crops but also to give sales projections to the sales team.
“AI helps us do that prediction because there are so many variables you don’t see,” Bradley said.
AI is also used to scan the crates to ensure all the produce is ready for sale, preventing rotten pieces of produce from contaminating an entire box.
Because of AI, the company has seen a 2-3% increase in yields over successive years, and has increased the amount of farming that can be done by a small crew.
What’s surprising here is that Nature Fresh is a relatively small operation. “We’re not a multi-billion dollar company that can build a platform for an AI farm,” Bradley said.
Nature Fresh uses OneAPI to run workloads on different CPUs, so the IT team doesn’t have to worry about tailoring specific workloads to specific CPUs or hardware accelerators. Along with Kubernetes, the company also uses OneAPI to optimize workloads, ensuring that live data used for real-time decisions gets priority over less-urgent analysis jobs.
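The talk didn’t show configuration, but in Kubernetes the real-time-first scheduling Bradley describes is typically expressed with PriorityClass objects. A minimal sketch (the class names here are hypothetical, not Nature Fresh’s):

```yaml
# Hypothetical priority classes: real-time inference preempts batch analysis.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: realtime-inference   # hypothetical name
value: 1000000               # higher value = scheduled first
globalDefault: false
description: "Live sensor-driven decisions (water, light, CO2)."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: batch-analysis       # hypothetical name
value: 1000
globalDefault: false
description: "Less-urgent historical analysis jobs."
```

Pods then opt into a class by setting `priorityClassName` in their spec.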
Because it is cross-platform, OneAPI gave Nature Fresh the ability to ramp up AI operations without a lot of capital for the latest hardware, Bradley said.
The OpenVINO site is a good place to learn which AI jobs can be done by CPUs alone, and it turns out to be quite a few. It includes a robust set of Jupyter Notebook-based tutorials covering a variety of AI jobs, including chatbots, LLMs, text-to-image generation, video analysis, image colorization, noise reduction, gesture detection, object recognition and classification, facial recognition, handwriting-to-text, next-word suggestion, medical image analysis, sound classification, gaze detection and defect recognition.
GPUs or CPUs
GPUs do one thing: matrix math.
The advantage of GPUs is that they can perform mathematical operations on matrices in parallel, very quickly. Although they were originally designed for rendering graphics, parallel matrix math turns out to be just as useful for AI.
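To make that concrete, here is a minimal sketch (using NumPy, not from the talk) of why matrix multiplication is the workhorse: a dense neural-network layer is exactly one matrix multiply plus a bias add.

```python
import numpy as np

# A dense (fully connected) layer is y = x @ W + b: one matrix multiply.
# Most neural-network inference is stacks of operations shaped like this,
# which is why hardware that multiplies matrices fast (GPUs, or CPUs with
# AMX) accelerates AI workloads across the board.
rng = np.random.default_rng(0)
x = rng.standard_normal((32, 512))   # a batch of 32 inputs, 512 features each
W = rng.standard_normal((512, 256))  # layer weights
b = rng.standard_normal(256)         # layer bias
y = x @ W + b                        # the whole layer: matrix math
print(y.shape)  # (32, 256)
```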
But GPUs are no longer the only silicon available that can do matrix math. AMD has enhanced matrix capabilities in its latest EPYC line of processors, and a growing number of hardware accelerators are coming on the market.
In the latest round of fourth-generation (“Sapphire Rapids”) and fifth-generation (“Emerald Rapids”) Intel Xeon CPUs, Intel has included Advanced Matrix Extensions (AMX), which put matrix math instructions in each core of the CPU itself.
Intel has estimated that AMX can increase PyTorch performance tenfold, and AMX works out of the box with TensorFlow and OpenVINO (and VMware’s vSphere 8 virtual machine platform).
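A quick way to see whether a Linux host’s CPU advertises AMX is to look at the kernel’s reported CPU flags — `amx_tile`, `amx_bf16` and `amx_int8` are the flags exposed on AMX-capable Xeons. A stdlib-only sketch:

```python
def amx_flags(cpuinfo_path="/proc/cpuinfo"):
    """Return the set of AMX-related CPU flags the kernel reports, if any."""
    wanted = {"amx_tile", "amx_bf16", "amx_int8"}
    try:
        with open(cpuinfo_path) as f:
            for line in f:
                if line.startswith("flags"):
                    return wanted & set(line.split())
    except OSError:
        pass  # not Linux, or /proc unavailable
    return set()

print(amx_flags() or "no AMX support detected")
```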
AMX, introduced in 2023, added “two-dimensional registers called tiles upon which accelerators can perform operations,” said Earl Ruby, a principal R&D engineer at Broadcom, who also spoke at this session.
“It is intended as an extensible architecture. So the first accelerator implemented is called the tile matrix multiply unit, or TMUL, and data comes directly into the tiles. The host runs ahead and dispatches the loads for the tiles, and TMUL operates on the data the moment it’s ready,” he said.
“At the end of each multiplication round, the tile moves to cache. This enables the processing of multiple data elements with a single instruction. The goal on the software side is to make sure that both the host and the AMX unit are running simultaneously, which maximizes throughput and performance.”
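The tile idea can be sketched in plain code: a blocked matrix multiply loads small sub-matrices (“tiles”) and multiply-accumulates them, which is essentially what the tile multiply unit does in hardware. The tile size below is illustrative, not AMX’s actual 16-row-by-64-byte tile limit.

```python
import numpy as np

def tiled_matmul(A, B, tile=16):
    """Blocked matrix multiply: a software analogue of AMX's tile unit."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m))
    for i in range(0, n, tile):          # rows of the output tile
        for j in range(0, m, tile):      # columns of the output tile
            for p in range(0, k, tile):  # walk the inner dimension
                # multiply one pair of tiles, accumulate into the output tile
                C[i:i+tile, j:j+tile] += (
                    A[i:i+tile, p:p+tile] @ B[p:p+tile, j:j+tile]
                )
    return C

rng = np.random.default_rng(42)
A = rng.standard_normal((32, 48))
B = rng.standard_normal((48, 24))
assert np.allclose(tiled_matmul(A, B), A @ B)  # matches a plain matmul
```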
Demo, Please
With this new hardware and supporting software, there are many AI workloads that, contrary to popular belief, can be run without GPUs, Ruby said.
As an example, Ruby ran a demo. He loaded a 7-billion-parameter model from Hugging Face on a cluster of older third-generation (“Ice Lake”) Intel Xeon server processors and compared the speed of that initialization with an identical ramp-up on a cluster of AMX-equipped processors.
The Ice Lake chip performance was sluggish at best.
“This is why people were thinking, ‘Oh, you have to have GPUs,’ because if you want to do this sort of thing, performance is just not that great on CPU,” Ruby said.
In comparison, booting the Hugging Face model on the Sapphire Rapids processor was quite snappy.
“It’s responsive enough to get real work done,” Ruby said.
Ruby then showed an example of fine-tuning the Hugging Face model with about 17,000 finance questions; the AMX cluster knocked out the job in about 3.5 hours.
“So if you’ve got AMX sitting around in a data center and you want to do some fine-tuning overnight, you can do it,” Ruby said. “You don’t need GPUs.”
Ruby also used that model to run three chatbots, all on a single fourth-generation Xeon.
There are still cases where GPUs are essential: Where low latency or immediate responses are required, for fine-tuning huge models, or for creating models from scratch, Ruby said.
But there are an increasing number of cases where the latest CPUs will work just fine: Batch processing ML workloads, light inferencing, or when keeping operating costs down and power requirements in check, Ruby said.
“You don’t have to use GPUs. CPUs will cover [many] use cases. And if you’re trying to get started, and you don’t want to have to buy a bunch of hardware just to get started, you can get started today with what you’ve got,” Ruby said.
“We don’t need to be instantaneously responding all the time. A three- or four-minute lag doesn’t affect us,” Bradley added. “But once we hit that point, it’s great to know we can start to convert.”
“I don’t know when we will hit that point, but it’s great to know we keep building the platform and use what we have.”
You can enjoy the entire talk here.
Disclosure: Broadcom paid travel/lodging for the reporter to attend this conference.