We have a kernel which uses a very large array (it exceeds the register limit).
Our app uses around 7 GB of global memory and executes successfully.
A new feature was added to the app, and it takes around 3 GB.
Now the kernel mentioned above fails to execute with an out-of-memory error, even though at least 5 GB is still unused.
Since there is a lot of unused memory, I assume that part of the memory used by the new feature somehow occupies memory that was needed to run the kernel.
But which memory?
An authoritative answer cannot be given without code. However, the very large array that exceeds the register limit may be a factor, since an array like that is typically placed in local memory. You can find forum posts discussing the effect of local memory usage on the overall memory consumption of a kernel. Here is one example.
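To give a feel for the sizes involved, here is a rough back-of-the-envelope sketch. It assumes the commonly described model in which local memory is reserved for every thread the GPU could have resident at once, not only for the threads actually launched. The function name and all three input figures are hypothetical and chosen only for illustration; they are not taken from the poster's app or GPU.

```python
def estimate_local_memory_reservation(local_bytes_per_thread, num_sms, max_threads_per_sm):
    """Rough estimate of the device memory set aside for local memory.

    Assumed model (not confirmed for any particular driver or GPU):
    reservation = per-thread local bytes * maximum resident threads on the device.
    """
    max_resident_threads = num_sms * max_threads_per_sm
    return local_bytes_per_thread * max_resident_threads


# Hypothetical figures, for illustration only.
LOCAL_BYTES_PER_THREAD = 32 * 1024   # a 32 KiB per-thread array that spills to local memory
NUM_SMS = 80                         # streaming multiprocessors on the assumed GPU
MAX_THREADS_PER_SM = 2048            # maximum resident threads per SM on the assumed GPU

reserved = estimate_local_memory_reservation(
    LOCAL_BYTES_PER_THREAD, NUM_SMS, MAX_THREADS_PER_SM
)
print(f"Estimated local memory reservation: {reserved / 2**30:.1f} GiB")
# prints "Estimated local memory reservation: 5.0 GiB"
```

Under these assumed numbers the kernel would need roughly 5 GiB on top of the app's ordinary allocations, which is the kind of hidden demand that could fail once another feature takes 3 GB. For the real values, `cudaFuncGetAttributes` reports a kernel's `localSizeBytes`, and `cudaGetDeviceProperties` reports `multiProcessorCount` and `maxThreadsPerMultiProcessor`.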