I have this simple piece of code:
dim3 gridDim(count_useful / numGroupPerBlock,1);
dim3 blockDim(numGroupPerBlock,8);
printf("Size:%zu\n",5*numGroupPerBlock*8*sizeof(u64));
processParallelMessagesBalanced<<<gridDim,blockDim,5*numGroupPerBlock*8*sizeof(u64)>>>
(d_chunkPointer,numBlocks_loc[0],d_numText,numGroupPerBlock);
printf("Error: %s\n",cudaGetErrorString(cudaGetLastError()));
The variables have the following values:
count_useful = 32
numGroupPerBlock = 32
The program prints:
Size: 10240
Error:out of memory
In the kernel code there is a variable declared as follows:
extern __shared__ u64 balanced[];
which I use for computation. As far as I know, the size limit on shared memory is 16KB, and I am only requesting 10240 bytes, so why does the program give me this error?
Thanks for replies.
EDIT: the message is not “out of memory” but “too many resources requested for launch”
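You are asking for 8*32 = 256 threads per block. With a register file of 8192 registers, each thread can use at most 8192/256 = 32 registers. Your kernel probably uses more than that, which is what triggers “too many resources requested for launch” even though the shared memory request is within the limit.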
OK, I solved it just before reading your answer: my kernel uses 40 registers per thread.
Thanks for the help.