Why does GPU code require more RAM than same code on CPU?

I have code which runs fast on the GPU but requires much more RAM than it does when it runs on the CPU. For example, the code runs in 8 GB of CPU RAM, but requires 36 GB of GPU RAM. I am now trying to run on an A10G system (4 GPUs with 24 GB of RAM each), and my program won't run on it because 24 GB is less than the 36 GB of RAM I was using on the T4.
So the next question is, can I modify my code to use less GPU RAM?

There is no magic expansion of memory requirements when porting host-based code to the GPU. What you are observing would appear to be specific to this particular application of yours.

In the absence of further information, it appears that you chose to use more RAM in the GPU version of your code, presumably as a trade-off between memory usage and performance. Since we have not been told anything about the code, offering advice on how to shrink its memory footprint is not really possible.

If this were my code, I would revisit the original design decisions that caused RAM usage to grow by a factor of more than four when porting the code to the GPU. This could involve, for example, an examination of the data structures and data types involved.

Here are some general references that I acquired while working on embedded and mobile products in the early 2000s. I do not have any particular recollection of the contents of any of them, and it is entirely possible that some or even much of the advice they provide is now outdated and/or no longer applicable.

David Loshin, Efficient Memory Programming, McGraw-Hill, 1999
Rene Alexander and Graham Bensley, C++ Footprint and Performance Optimization, Sams Publishing, 2000
James Noble and Charles Weir, Small Memory Software: Patterns for Systems with Limited Memory, Pearson Education, 2001
Kris Kaspersky, Code Optimization: Effective Memory Usage, A-List, 2003
Frantisek Franek, Memory as a Programming Concept in C and C++, Cambridge University Press, 2004