Hi, I think I am running into an excessive local memory allocation problem inside a device function I am trying to implement. I understand that I should not put too many arrays inside device functions, as they eat up too much memory. What are some techniques for allocating large arrays inside device functions?
Basically, I am trying to use a boolean array of size 500^2 (that shouldn’t be too big, it’s only about 30KB). I declare it as bool matrix[500*500]; and there is a loop that runs about 2000-4000 iterations, storing to and reading from the array on every iteration. Compiling the device function fails with the following error:
ptxas error : Entry function ‘_globfunc__Z12fooPiS_PfS0_fS0_S_S’ uses too much local data (0x7d0 bytes + 0x3ebe8 bytes system, 0x4000 max)
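For reference, the hex sizes in this message convert to decimal as follows. The constant names are made up for illustration, and the bool size assumes sizeof(bool) == 1, which holds on common compilers:

```cuda
#include <cstddef>

// Values copied from the ptxas message above.
constexpr std::size_t kUserLocal   = 0x7d0;    // 2,000 bytes
constexpr std::size_t kSystemLocal = 0x3ebe8;  // 257,000 bytes
constexpr std::size_t kMaxLocal    = 0x4000;   // 16,384 bytes (16KB)

// bool matrix[500*500] at one byte per bool needs 250,000 bytes on its own.
constexpr std::size_t kBoolMatrix = 500 * 500 * sizeof(bool);

static_assert(kSystemLocal == 257000, "0x3ebe8 is 257,000 bytes");
static_assert(kMaxLocal == 16384, "0x4000 is 16KB");
static_assert(kBoolMatrix <= kSystemLocal, "the bool matrix fits inside the reported figure");
static_assert(kBoolMatrix > kMaxLocal, "and by itself is roughly 15x over the limit");
```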
If my question is too vague to answer, could you at least give me some pointers on how to use as many variables as possible inside a device function?
The best thing to do is to post your specific kernel. Then we will be able to reproduce your situation and see where the problem is.
Now, back to your problem: a bool is 1 byte (not 1 bit), so 500x500 booleans take about 250KB of local memory, which is far more than a single thread is allowed (the 0x4000 = 16KB maximum in your error message). You’ll be much better off packing the flags into bit fields instead.
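A minimal sketch of the bit-packing idea follows. It assumes row-major indexing and 32-bit words, and the names kN, kWords, get_bit, set_bit and the HD macro are hypothetical, not from the original kernel. Packing 500 x 500 = 250,000 flags at one bit each needs 7,813 32-bit words, i.e. 31,252 bytes, which is roughly the 30KB the original post estimated. That is still above the 0x4000 (16KB) maximum in the ptxas message, so at this size the packed matrix would probably still have to live in global memory rather than in a local array.

```cuda
#include <cstddef>
#include <cstdint>

// Let the same helpers compile as plain host C++ and as CUDA host/device code.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

constexpr int kN = 500;                               // matrix is kN x kN
constexpr std::size_t kBits = std::size_t(kN) * kN;   // 250,000 flags
constexpr std::size_t kWords = (kBits + 31) / 32;     // 7,813 words = 31,252 bytes

// Read flag (row, col) from a packed, row-major bit matrix.
HD inline bool get_bit(const std::uint32_t* bits, int row, int col) {
    const std::size_t i = std::size_t(row) * kN + col;
    return (bits[i >> 5] >> (i & 31)) & 1u;
}

// Set (value == true) or clear (value == false) flag (row, col).
HD inline void set_bit(std::uint32_t* bits, int row, int col, bool value) {
    const std::size_t i = std::size_t(row) * kN + col;
    const std::uint32_t mask = 1u << (i & 31);
    if (value) {
        bits[i >> 5] |= mask;
    } else {
        bits[i >> 5] &= ~mask;
    }
}
```

The helpers take a pointer, so the same code works whether the words sit in a local array, in shared memory, or in a global-memory buffer passed into the kernel.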
I think you’ll find that your bool array is nowhere near 30KB in size, since you can’t create an array of individual bits. A 500^2 bool array has a minimum size of 250KB, assuming each bool is represented by a single byte. (That is exactly what the error says if you interpret the hex.) If you want arrays larger than a few items, use shared memory, and if the array is larger than 16KB, use global memory.
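A minimal sketch of the global-memory route, assuming each thread needs its own private 500x500 flag matrix. The kernel, run_demo, and the one-bit-per-flag layout are hypothetical, not taken from the original code. The host allocates one buffer with cudaMalloc, sized for every thread, and each thread works on its own slice, so the device code receives a pointer instead of declaring a large local array.

```cuda
#include <cstddef>
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

constexpr int kN = 500;                                           // each matrix is kN x kN
constexpr std::size_t kWords = (std::size_t(kN) * kN + 31) / 32;  // 7,813 words per matrix

// Read bit i of a packed matrix.
__device__ inline bool get_bit(const std::uint32_t* bits, std::size_t i) {
    return (bits[i >> 5] >> (i & 31)) & 1u;
}

// Set bit i of a packed matrix.
__device__ inline void set_bit(std::uint32_t* bits, std::size_t i) {
    bits[i >> 5] |= 1u << (i & 31);
}

// Hypothetical kernel: each thread owns one 500x500 bit matrix stored in
// its own slice of a single global-memory buffer, instead of a local array.
__global__ void demo_kernel(std::uint32_t* scratch, int* ok, int n_threads) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n_threads) return;
    std::uint32_t* matrix = scratch + std::size_t(tid) * kWords;  // this thread's slice
    std::size_t i = std::size_t(tid % kN) * kN + (tid % kN);      // cell (tid % 500, tid % 500)
    set_bit(matrix, i);
    // The bit just set should read back as 1; a neighbouring bit should still be 0.
    ok[tid] = (get_bit(matrix, i) && !get_bit(matrix, i ^ 1)) ? 1 : 0;
}

// Host side: allocate and zero the buffer, launch, and count threads whose check passed.
// Returns -1 if allocation fails; other error checking is omitted for brevity.
int run_demo(int n_threads) {
    std::uint32_t* scratch = nullptr;
    int* ok = nullptr;
    const std::size_t scratch_bytes = std::size_t(n_threads) * kWords * sizeof(std::uint32_t);
    if (cudaMalloc(&scratch, scratch_bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&ok, n_threads * sizeof(int)) != cudaSuccess) {
        cudaFree(scratch);
        return -1;
    }
    cudaMemset(scratch, 0, scratch_bytes);  // every flag starts as false

    const int threads = 128;
    const int blocks = (n_threads + threads - 1) / threads;
    demo_kernel<<<blocks, threads>>>(scratch, ok, n_threads);

    std::vector<int> host_ok(n_threads);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(host_ok.data(), ok, n_threads * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(scratch);
    cudaFree(ok);

    int passed = 0;
    for (int v : host_ok) passed += v;
    return passed;
}
```

This uses about 31KB of global memory per thread (250KB if you keep one byte per bool), so the total grows quickly with the number of threads you launch.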