Why are GPUs so memory bound?

I think it’s telling that the optional tags here don’t even include “VRAM” or “Memory”, given the current situation in the world of GPUs. As a content creator I’ve struggled to understand why my GPU memory is 100% utilized while the GPU itself shows 0% utilization for 30 hours. I’ve got an RTX 3070 with 8GB of VRAM.
Now I’m trying to use Stable Diffusion with CUDA and PyTorch. I can’t render anything over 512x512 without getting out-of-memory errors. Other people say they have no issues, but I can’t find anyone who really understands this topic of GPU memory. With system RAM we can set up a virtual memory file on an SSD and at least watch the SSD utilization go up, proving that we really are memory bound and that more memory would let us compute faster. It also lets us continue working, just at a slower pace. With no ability to add more GPU memory, I’m simply stuck, unable to complete a task, whenever CUDA runs out of memory.
Where can I learn all about GPU memory: how to manage it, and how to get past out-of-memory errors?
Most of all, why are GPU manufacturers adding more cores rather than focusing on adding more memory? I’d love to see an RTX 3070 with 256GB of VRAM rather than a 4070 that is still memory bound. We 3D creators need a ton of memory to hold our maps, textures, and models. It’s pretty sad that I can render an 8K image in DAZ3D, but I can’t render bigger than 512x512 with Stable Diffusion. At 2048x2048 it says it needs 256GiB of GPU memory.
I’m so frustrated with the direction of GPUs. People with an RTX 3060 and 12GB can render things that I can’t with an RTX 3070 and 8GB.
Any help understanding GPU memory management with CUDA will be appreciated.

Speaking in generalities, from an engineering perspective it is moderately hard to create high-capacity memory (think TB SSDs), and it is moderately hard to create high-speed memory (think 5 GHz SRAMs inside the CPU). What is very hard is to create memory that provides both high capacity and high speed. Solving very hard problems is generally also very expensive. There are analogies in other engineering disciplines, for example aircraft construction. That is why large supersonic aircraft are not a thing: no A380 at Mach 2+ expected anytime soon.

GPUs provide massive amounts of computational horsepower. This is something that can be provided in relatively straightforward ways by massively parallel architectures, and is largely limited by die size. As a consequence, these days FLOPS are “too cheap to meter”.

In order to keep all those execution units busy, they need to be fed with massive amounts of data. That means high-speed memory (specifically: high-throughput memory) is absolutely essential to GPUs. Making that high-speed memory large is difficult, but doable up to a point. Any such high-performance solution will be expensive. NVIDIA will happily sell you an A6000 with 48GB of memory, or an A100 with 80GB of memory (the latter via a system integrator, presumably); you just have to pony up the necessary cash.
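To put rough numbers on “fed with massive amounts of data”, here is a quick arithmetic-intensity sketch. The two figures are approximate published specs for the RTX 3070 mentioned above, used here purely as illustrative assumptions:

```python
# How many FLOPs a kernel must perform per byte fetched from VRAM to
# keep the ALUs busy, using approximate RTX 3070 spec-sheet numbers.
peak_flops = 20.3e12      # ~20.3 TFLOPS FP32 (approximate)
peak_bandwidth = 448e9    # ~448 GB/s GDDR6 (approximate)

balance = peak_flops / peak_bandwidth
print(f"~{balance:.0f} FLOPs per byte needed to stay compute-bound")

# An elementwise op like y = a * x + b does ~2 FLOPs per 12 bytes moved,
# far below that balance point, so it runs at memory speed, not ALU speed.
```

Anything below that ratio is limited by memory bandwidth rather than by the cores, which is why GPUs pair their cores with such exotic memory in the first place.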

As a user, it is up to you to select the hardware that is most suitable for your use case(s). Some software assumes copious amounts of memory are available. Maybe this is largely unavoidable. Maybe this is due to a conscious trade-off on the part of the creators to keep code complexity low. Maybe nobody paid attention to using memory efficiently because people in the relevant target market have traditionally used hardware with lots of memory. You can try to complain to the software makers, but if the product is free (as in beer) that is probably not a fruitful use of your time.
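To illustrate how quickly such memory demands can grow: a back-of-the-envelope sketch of naive self-attention memory in a Stable-Diffusion-style model. All constants here (8x latent downsampling, 8 heads, fp16) are illustrative assumptions, not figures from any particular release:

```python
def attention_matrix_bytes(side_px, downsample=8, heads=8, dtype_bytes=2):
    """Bytes for one naive N x N attention score matrix across all heads."""
    tokens = (side_px // downsample) ** 2           # flattened latent H*W
    return tokens * tokens * heads * dtype_bytes    # N^2 scores per head

gib = 1024 ** 3
print(attention_matrix_bytes(512) / gib)    # 0.25 GiB at 512x512
print(attention_matrix_bytes(2048) / gib)   # 64.0 GiB at 2048x2048
```

Quadrupling the image side multiplies the pixel count by 16 but the attention matrix by 256, which is how a model that is comfortable at 512x512 can demand hundreds of GiB at 2048x2048. Tiled and memory-efficient attention implementations exist precisely to avoid materializing this matrix.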

If your use case requires lots of memory, but has modest requirements for FLOPS and memory throughput, purchase hardware accordingly. That applies to GPUs as well as host systems.


Thank you for the thoughtful reply. I’ve seen conspiracy rumors that NVIDIA won’t offer large amounts of memory in consumer products in order to push people toward its commercial products. Not sure how true that is. Are they going to release an RTX 4090 Ti with 48GB?

On the point of memory cost, I was thinking in terms of laptop memory upgrades. For example, going from 64GB to 128GB in a laptop costs about $500 more. On further investigation I see that is for DDR4 memory, while these GPUs use GDDR6, which is much more expensive. That perhaps makes the $3000 price increase from the RTX 4090 24GB to the RTX A6000 48GB more reasonable, though it still doesn’t feel reasonable. It will be interesting to see how they price the RTX 4090 Ti if it ships with 48GB. The 4090 and A6000 have comparable specs other than the amount of VRAM.
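Putting the numbers from the paragraph above side by side (these are the list-price figures quoted in this thread, so a rough comparison only):

```python
# $/GB using the figures quoted above: $500 for a 64GB -> 128GB DDR4
# laptop upgrade vs. a $3000 gap between a 24GB RTX 4090 and a 48GB
# RTX A6000.
ddr4_per_gb = 500 / (128 - 64)
gddr6_per_gb = 3000 / (48 - 24)

print(f"DDR4:  ${ddr4_per_gb:.2f}/GB")    # $7.81/GB
print(f"GDDR6: ${gddr6_per_gb:.2f}/GB")   # $125.00/GB
print(f"premium: {gddr6_per_gb / ddr4_per_gb:.0f}x")  # 16x
```

Of course, some of that A6000 premium is market segmentation as well, not pure DRAM cost, so the 16x factor is an upper bound on the actual GDDR6-vs-DDR4 price gap.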

I just wish I had known this before buying a 3070 with 8GB instead of a 3060 with 12GB. Little did I know back then how critical those extra 4GB would be to enjoying my experience with content creation. I’ve sort of given up on it due to the constant GPU out-of-memory errors, with no way to proceed other than buying a new laptop that I can’t afford. C’est la vie.

Rumors and conspiracy theories are rarely worth the bits expended on them. Does NVIDIA practice market segmentation in their product line? Sure, as do car manufacturers, airlines, IP providers, etc. Generally, consumers pay more for more features, better performance, and better service/support. Also, when approaching the halo product the price curve is typically not linear.

Personally, I doubt there is much of a market in the consumer segment for an RTX 4090 with 48GB, so I would consider it unlikely for such a product to materialize any time soon. From what I see online, consumers are complaining loudly about the “high” price of the RTX 4090. Guess what doubling the amount of memory would do to the price of such a SKU? GDDR6 is a high-performance product for which vendors charge a premium. It is also a more specialized market compared to DDR4, with very few competitors. As a consequence one cannot draw conclusions from the DDR4 market with regard to GDDR6 (or HBM2, an even more niche market).

GPU chips are typically designed with a specific number of pins used for the memory interface. Each pin and associated I/O driver costs die size, power consumption, and money. Presumably the RTX 4090 already uses the full width of the memory interface as designed. That means doubling the on-board memory would require using DRAM chips with twice the capacity of current memory chips. However, DRAM manufacturers only increase the maximum capacity of individual chips every few years, and to my knowledge the RTX 4090 already uses the highest-density DRAM available.
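The pin-count constraint can be made concrete with a little arithmetic. The bus width below is the RTX 4090’s published 384-bit interface; the per-chip width and density are typical GDDR6X figures, used here as assumptions:

```python
bus_width_bits = 384       # RTX 4090 memory interface width
chip_width_bits = 32       # one GDDR6X chip drives a 32-bit channel
chip_capacity_gb = 2       # 16 Gbit per chip (assumed densest available)

chips = bus_width_bits // chip_width_bits   # 12 chips fill the bus
total_gb = chips * chip_capacity_gb
print(f"{chips} chips x {chip_capacity_gb} GB = {total_gb} GB")  # 24 GB
```

Getting to 48GB on the same bus would mean either 32 Gbit chips, or running two chips per 32-bit channel in “clamshell” mode, which is how some professional boards reach their higher capacities.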

Buyer’s remorse is a common occurrence. You could purchase a different GPU (possibly a used one), and sell your current one. For future purchases, you may want to adopt a checklist that clearly specifies the “must have” and “nice to have” features based on the requirements of your use case(s).