I have this simple piece of code:
dim3 gridDim(count_useful / numGroupPerBlock,1);
The variables have the following values:
count_useful = 32
numGroupPerBlock = 32
The program prints:
Error:out of memory
In the kernel code there is a variable declared as follows:
extern __shared__ u64 balanced[];
which I use for computation. As far as I know, shared memory is limited to 16 KB, so why does the program give me such an error?
Thanks for replies.
EDIT: The message is not “out of memory” but “too many resources requested for launch”.
You are asking for 8*32 = 256 threads per block. With 8192 registers available to a block, each thread cannot use more than 8192/256 = 32 registers. Your kernel probably uses more than that.
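To make the arithmetic concrete, here is a minimal host-side sketch of that register budget check. The helper names (`maxRegistersPerThread`, `launchFits`) and the constant `kRegistersPerBlock` are hypothetical, and the 8192 figure is simply the one quoted above, so treat it as an assumption about the device. The sketch also ignores the hardware's register allocation granularity, so the real limit may be slightly stricter.

```cpp
// Register file size available to one block.
// Assumption: 8192, the figure quoted in the reply; other devices differ.
constexpr int kRegistersPerBlock = 8192;

// Largest per-thread register count that still fits when
// `threadsPerBlock` threads are launched in a single block.
constexpr int maxRegistersPerThread(int threadsPerBlock,
                                    int registersPerBlock = kRegistersPerBlock) {
    return registersPerBlock / threadsPerBlock;
}

// True when the whole block's register demand fits in the register file.
// Simplified: allocation granularity is not modelled.
constexpr bool launchFits(int registersPerThread, int threadsPerBlock,
                          int registersPerBlock = kRegistersPerBlock) {
    return registersPerThread * threadsPerBlock <= registersPerBlock;
}
```

With 8*32 = 256 threads, `maxRegistersPerThread(256)` gives 32, and `launchFits(40, 256)` is false because 40*256 = 10240 exceeds 8192. The actual register count of a kernel can be read from the compiler output of `nvcc --ptxas-options=-v`.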
OK, I solved it just before reading your answer: my kernel uses 40 registers per thread.
Thanks for the help.