How to cut the cost of GPU instances for AI

Nick Chase

March 20, 2024

When you are building a product, you always want to get the most bang for your buck. Even a few cents can be the difference between a viable product and one that only profits the people you’re getting your resources from. The guys who got rich in the gold rush were the ones selling shovels.

The same is true when you’re looking for compute resources for your AI application. In many cases, the cheapest way is to use a hosted AI model such as those from OpenAI. It’s dirt cheap, it performs well, and you don’t have to dance around all those “sorry, we’ve never faced that before” problems.

But what if third-party AI is not an option for you? Maybe you need to compare different models, or your workload far exceeds the API limits, or you just don’t want to share your data with anyone else. Have you met that OpenAI GPT personally? Are you sure it’s not a snitch?

Anyway, you’ve got your reasons, so let’s see how much running your own resources will cost you. There are a lot of platforms where you can rent a GPU instance, but let’s stick to AWS because it’s such a common choice.

The g4dn.xlarge instance is a common starting configuration for AI applications. It costs $0.526/hour on demand, which comes to about $380/month if it runs around the clock. Not terribly extravagant (unless you’re running a few thousand machines), but let’s try to cut it down.

First, we need to understand our workload schedule. If we plan to use our model 24/7, it makes sense to switch to a 1-yr Reserved Instance, which brings the cost down to $230/month. If the workload runs less than about 14 hours per day, though, you may want to stay on the on-demand plan. You can cut your expenses even further with the 3-yr Reserved Instance plan, but 3 years is a long time in the hardware market, and you’re almost guaranteed to see much better options long before those 3 years are up.
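Where does the 14-hour figure come from? Here is a minimal sketch of the arithmetic, using only the two prices quoted above and assuming a 30-day month. The function names are just for illustration.

```python
# Prices quoted in the article (USD). The 30-day month is an assumption.
ON_DEMAND_HOURLY = 0.526   # g4dn.xlarge, on demand, per hour
RESERVED_MONTHLY = 230.0   # g4dn.xlarge, 1-yr Reserved Instance, per month


def on_demand_monthly(hours_per_day: float, days_per_month: float = 30.0) -> float:
    """Monthly on-demand bill for an instance that runs hours_per_day each day."""
    return ON_DEMAND_HOURLY * hours_per_day * days_per_month


def break_even_hours_per_day(days_per_month: float = 30.0) -> float:
    """Daily usage at which the on-demand bill equals the reserved price."""
    return RESERVED_MONTHLY / (ON_DEMAND_HOURLY * days_per_month)


print(f"24/7 on demand: ${on_demand_monthly(24):.2f}/month")
print(f"8 h/day on demand: ${on_demand_monthly(8):.2f}/month")
print(f"Break-even: {break_even_hours_per_day():.1f} hours/day")
```

Below the break-even point, paying by the hour is cheaper. Above it, the reservation wins. Plug in your own rates, since AWS prices vary by region and change over time.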

Speaking of which, g4ad.xlarge is one of those options. It uses an AMD GPU instead of NVidia, has minor hardware differences from the g4dn.xlarge (SSD, network bandwidth, and so on), and comes at an even better price: $170/month.

Wow, it’s almost 30% cheaper, so where's the catch? Well, there are many, actually. Let’s dive into a little history to understand why and what to expect in the future.
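To see how the discount depends on what you compare against, here is a small sketch using the monthly figures quoted so far. It assumes the $170 figure is best compared with the $230 reserved price, which the numbers suggest but the text does not state.

```python
def percent_cheaper(baseline: float, alternative: float) -> float:
    """How much cheaper `alternative` is than `baseline`, in percent."""
    return (baseline - alternative) / baseline * 100.0


# Monthly prices quoted in the article (USD).
G4DN_ON_DEMAND = 380.0
G4DN_RESERVED_1YR = 230.0
G4AD = 170.0

print(f"vs g4dn 1-yr reserved: {percent_cheaper(G4DN_RESERVED_1YR, G4AD):.0f}% cheaper")
print(f"vs g4dn on demand:     {percent_cheaper(G4DN_ON_DEMAND, G4AD):.0f}% cheaper")
```

Against the reserved price the saving is a bit over a quarter. Against the on-demand price it is more than half, so make sure you compare like with like.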

NVidia released CUDA (Compute Unified Device Architecture) back in 2007, the year the Crysis video game hit the charts. What a year! CUDA made it possible to process parallel workloads much faster than any CPU could, and brought that power to mere mortals. That had a huge impact on machine learning and in fact kicked off what would later become LLMs (and Bitcoin, of course).

AMD responded with its “Close to Metal” campaign in 2008, but no one batted an eye. Most users were using NVidia GPUs already, so there was no good reason to switch to AMD, especially since you were setting yourself up for a lot of software issues. Later, in 2014, AMD released an open standard for GPU computing called HSA (Heterogeneous System Architecture). But as with many other “new world standard” interfaces, it became standard only for AMD. (People who have more than 3 AC adapters know what I’m talking about!) The current product, ROCm (Radeon Open Compute platform), is AMD’s third attempt to take its share of the market. So why is it different?

Well, the product itself is not very different, but the market is now waiting for better options, and that changes everything. GPU computation has become mandatory for many software giants and even for smaller companies. Having a 30% price advantage is just not fair. There’s a very good chance for AMD to finally get what it has been aiming for.

Unfortunately, AMD still has to overcome the same old problems, from a perceived lack of documentation to performance drops that may trace back to software written and tuned for NVidia, because you can’t test CUDA on AMD hardware.

Want to give it a try? You can use the preconfigured rocm/pytorch:latest image, or install PyTorch with pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.7.
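Once PyTorch is in place, it’s worth checking that it actually sees the AMD GPU. The sketch below is one way to do that. The function name is made up for this example, and it relies on two things: ROCm builds of PyTorch report a version string in torch.version.hip, and they expose the GPU through the usual torch.cuda namespace.

```python
def describe_gpu_backend() -> str:
    """Return a short, human-readable summary of the GPU backend PyTorch sees."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"

    # ROCm builds set torch.version.hip; CUDA and CPU-only builds leave it as None.
    hip_version = getattr(torch.version, "hip", None)
    backend = f"ROCm/HIP {hip_version}" if hip_version else "CUDA or CPU-only build"

    if not torch.cuda.is_available():
        return f"{backend}: no GPU visible"

    # ROCm reuses the torch.cuda namespace, so this call also works on AMD GPUs.
    return f"{backend}: {torch.cuda.get_device_name(0)}"


print(describe_gpu_backend())
```

If this reports that no GPU is visible inside the container, the usual suspects are the device not being passed through to Docker or a mismatch between the driver and the ROCm version.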

Image: DALL-E 3, “intel joining amd and nvidia fight”

So what should we expect in the future? Having more competitors in the market should push prices down. Intel will probably release its own product for the most popular platforms. It can’t compete with AMD and NVidia in the gaming market, because it would have to support thousands of configurations, but for GPU computation you only need to put together a few prebuilt configurations and sell them through third-party platforms.

And who knows? The community may come up with a non-hardware-specific solution so you can focus on the business logic of your app again, just like in 2016. Keep an eye on the news and don’t forget to support open source!
