I have narrowed down the problem in my code to the malloc calls in my kernel. They do not report an error, but the values of other variables in the kernel change, which I suspect is memory corruption from using too much of the heap. My code calls cudaThreadGetLimit, which returns 8MB. My kernel call looks as follows:
dim3 dimGrid (100,100);
dim3 dimBlock (1,1);
kernel <<< dimGrid, dimBlock >>> (…arguments…);
So I want 10000 threads (I am just trying to keep the code I am working with simple). There are two places inside the kernel that call malloc. The first allocates 2 char sequences (at most 500 chars each) and a matrix of at most 500*500 ints. By my calculations, that is less than the limit given by cudaThreadGetLimit. Am I looking at this incorrectly? Is that value telling me something different from what I think? Does this 8MB apply per thread, or is it the maximum memory that can be allocated by all threads together? Thanks for the help. I am a beginning CUDA programmer.
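To make the arithmetic concrete, here is a rough host-side sketch of the first set of allocations only. It assumes a 4-byte int, assumes the worst case of 500 chars per sequence, and ignores allocator overhead and alignment. The names perThreadBytes, totalBytes, kIntBytes and kHeapLimitBytes are made up for illustration and are not part of my kernel. As far as I can tell from the CUDA programming guide, the malloc heap is a single heap shared by every thread on the device, so the second function is the one to compare against the 8MB figure.

```cpp
#include <cassert>
#include <cstddef>

// Heap limit reported by cudaThreadGetLimit in the post: 8 MB.
constexpr std::size_t kHeapLimitBytes = 8ull * 1024 * 1024;

// Assumption: int is 4 bytes, as it is in CUDA device code.
constexpr std::size_t kIntBytes = 4;

// Worst-case bytes requested by one thread: two char sequences of
// maxLen chars each, plus a maxLen x maxLen matrix of ints.
std::size_t perThreadBytes(std::size_t maxLen) {
    assert(maxLen > 0);
    return 2 * maxLen * sizeof(char) + maxLen * maxLen * kIntBytes;
}

// Worst-case bytes if every thread in the launch holds its
// allocations at the same time (an upper bound, not a measurement).
std::size_t totalBytes(std::size_t gridX, std::size_t gridY,
                       std::size_t threadsPerBlock, std::size_t maxLen) {
    return gridX * gridY * threadsPerBlock * perThreadBytes(maxLen);
}
```

With maxLen = 500, perThreadBytes gives 1,001,000 bytes, which is under 8MB for a single thread. totalBytes(100, 100, 1, 500) gives 10,010,000,000 bytes, far above it, so under the shared-heap reading only about 8 threads' worth of these allocations would fit at once. The real peak depends on how many threads are resident at the same time and on whether the kernel frees its memory, so this is only an upper bound.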