CUDA - Out of Memory Area



  • NVIDIA RTX A4000
  • Global memory: 16 GB
  • Using cuFFT (creating FFT plans)
  • A kernel that uses a very large array (exceeds the register limit)

Our app uses around 7 GB of global memory and runs successfully.

A new feature was added to the app, which takes around 3 GB.
Now the kernel (mentioned above) fails to run with an out-of-memory error, even though at least 5 GB are unused.
Since there is a lot of unused memory, I assume that part of the memory used by the new feature somehow occupies the memory that was needed to run the kernel.
But which memory?

Can someone please explain this issue to me?

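An authoritative answer cannot be given without code. However, this:

“Kernel which uses a very large array (exceeds register limitation)”

may be a factor. An array that does not fit in registers is placed in local memory, which is backed by device memory and counts toward the kernel's overall memory consumption. You can find forum posts discussing the effect of local memory usage on the overall memory consumption of a kernel. Here is one example.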

It's the same issue.

Can you please explain how you did the calculation?

“The calculation will show that your 210816 byte per thread request requires 34540093440 bytes when considered device-wide (for V100 device)”

Thank you

The calculation follows the formula I linked to where I said “njuffa has described it here”. To find it:


  1. Click on the link above where I said “Here is one example”.
  2. Then in the article it takes you to, locate the passage where I said “njuffa has described it here”
  3. Click on the link provided there.
  4. Read the linked article