I have code that runs fast on the GPU but requires much more RAM than it does when it runs on the CPU. For example, the code runs in 8 GB of CPU RAM but requires 36 GB of GPU RAM. I'm now trying to run it on an A10G instance (which has 4x GPUs with 24 GB of RAM each), so my program won't run there, because 24 GB is less than the 36 GB of RAM I was using on the T4.
So the next question is: can I modify my code to use less GPU RAM?
There is no magic expansion of memory requirements when porting host-based code to the GPU. What you are observing would appear to be specific to this particular application of yours.
In the absence of further information, it appears that you chose to use more RAM in the GPU version of your code, presumably as a trade-off between memory usage and performance. Since we have not been told anything about this code, offering advice on how to shrink its memory footprint is not really possible.
If this were my code, I would revisit the original design decisions that caused RAM usage to bloat by a factor of more than 4 when porting the code to the GPU. This could involve an examination of the data structures and data types involved, for example.
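To make the data-type angle concrete: the footprint of a dense array scales directly with element width, so a double-to-float (or float-to-half) conversion halves it, provided the algorithm tolerates the reduced precision. A minimal sketch of that arithmetic (the element count N is made up purely for illustration):

```python
import numpy as np

N = 1_000_000_000  # hypothetical element count, for illustration only

def footprint_gb(dtype, n=N):
    """Bytes a dense array of n elements of the given dtype would occupy, in GB."""
    return np.dtype(dtype).itemsize * n / 1e9

print(footprint_gb(np.float64))  # 8.0
print(footprint_gb(np.float32))  # 4.0 -- half the footprint for the same data
print(footprint_gb(np.float16))  # 2.0 -- if the algorithm tolerates fp16
```

This is only back-of-the-envelope accounting, of course; whether the narrower type is usable depends on the numerical requirements of the algorithm.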
Here are some general references that I acquired while working on embedded and mobile products in the early 2000s. I do not have any particular recollection of the contents of any of them, and it is entirely possible that some or even much of the advice they provide is now outdated and/or no longer applicable.
David Loshin, Efficient Memory Programming, McGraw-Hill, 1999
Rene Alexander & Graham Bensley, C++ Footprint and Performance Optimization, Sams Publishing, 2000
James Noble & Charles Weir, Small Memory Software: Patterns for Systems with Limited Memory, Pearson Education, 2001
Kris Kaspersky, Code Optimization: Effective Memory Usage, A-List LLC, 2003
Frantisek Franek, Memory as a Programming Concept in C and C++, Cambridge University Press, 2004
The question probably still remains unanswered:
Why does the same program require less RAM on the CPU than on the GPU?
It depends on how the CPU program was ported to the GPU. There are several reasons why GPU programs typically use more memory:
- They are often optimized for a certain GPU, or at least for a certain minimum amount of memory, and can often assume that no other program is using the GPU at the same time. That is typically not true on the CPU.
- GPUs use pipelining for memory copies and for the different stages of an algorithm. Whereas CPU code more often modifies data in place, GPU code often reads one block of memory and writes into a different one. That often makes it easier to avoid performance-costly synchronizations.
- GPUs often process more data at the same time to make better use of parallelization. E.g., in a video editing program, the GPU might process 8 frames at once while the CPU processes 1 frame (just an example).
- Nvidia GPUs benefit greatly from coalesced memory accesses, where a warp reads blocks of 32 or 128 bytes. Sometimes it is possible to improve memory coalescing at the cost of higher overall memory consumption.
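The bullets above can be combined into a back-of-the-envelope estimate of why a GPU port's working set grows. The function below is a sketch with entirely made-up numbers (frame_bytes, batch, staging_buffers are hypothetical parameters, not measurements from any real program); it just shows how batching, out-of-place processing, and staging buffers multiply together:

```python
def gpu_working_set_bytes(frame_bytes, batch=8, out_of_place=True, staging_buffers=2):
    """Rough, assumption-laden estimate of a working set.

    frame_bytes:     size of one unit of work (e.g., one video frame)
    batch:           units processed concurrently (GPU bullet 3 above)
    out_of_place:    separate input and output blocks (GPU bullet 2 above)
    staging_buffers: extra buffers for pipelined host<->device copies
    """
    copies = 2 if out_of_place else 1  # input block + separate output block
    return frame_bytes * batch * copies + frame_bytes * staging_buffers

# Hypothetical comparison: CPU path (in place, one frame, no staging)
# versus GPU path (out of place, 8 frames, double-buffered copies).
cpu = gpu_working_set_bytes(100_000_000, batch=1, out_of_place=False, staging_buffers=0)
gpu = gpu_working_set_bytes(100_000_000, batch=8, out_of_place=True, staging_buffers=2)
print(gpu / cpu)  # 18.0
```

The point is not the specific ratio, which depends entirely on the invented inputs, but that several independent, individually reasonable design choices multiply, which is how a 4x-plus blowup can arise without any single obviously wasteful allocation.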