Hi, I get the compiler error:
ptxas error : Entry function ‘ppush_kernel’ uses too much local data (0x30d40 bytes, 0x4000 max)
‘ppush_kernel’ is a global subroutine (a kernel) that is called from the main host program.
It takes 11 largish device arrays (50,000 elements each, type real) as parameters. These arrays are declared in a .h file, which is included in a module file that the main program uses. However, the ppush_kernel subroutine itself does not use this module.
Within the ppush_kernel subroutine I declare another (local) array of 50,000 elements. I am assuming it is automatically shared among the threads, so that it is stored in memory only once, though I never explicitly declared it as a shared array.
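To make the numbers in the error message easier to read, here is a small Python check that converts the two hex sizes to decimal and compares them with the size of one 50,000-element array. It is only a sketch: the 4-byte element size is my assumption (default real), and the variable names are just for illustration.

```python
# Sizes reported by ptxas, copied from the error message above.
local_bytes_used = 0x30d40
local_bytes_max = 0x4000

# Assumption: the arrays use the default 4-byte real.
BYTES_PER_REAL = 4
N_ELEMENTS = 50_000

# Size of a single 50,000-element real array under that assumption.
one_array_bytes = N_ELEMENTS * BYTES_PER_REAL

print("local data used :", local_bytes_used, "bytes")
print("local data limit:", local_bytes_max, "bytes")
print("one array       :", one_array_bytes, "bytes")
print("used == one array?", local_bytes_used == one_array_bytes)
```

If the 4-byte assumption holds, the reported local data is exactly the size of one 50,000-element array, which would point at the array declared inside the kernel, though I am not certain that is the cause.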
I’m wondering which arrays would be causing the problem, and what are the solutions?
Another question: how does the compiler know the memory limits of the GPU I am using? And if I compile on a machine different from the one I actually run on, could this cause a problem (e.g. when submitting a job to a cluster)?