malloc in kernel

Is there any size limit when using malloc in a kernel? Here is my code.

int index = threadIdx.x;

float* zz_r = (float*)malloc(sizeof(float)* 100000);
for(int i=0; i<100000; i++)
    zz_r[i] = 0;

printf("%d\n", index);

When I used a size of 10000, all the “index” values were printed. However, when I increase the size to 100000, it doesn’t print anything. I just want to make sure that all threads are done with their own initialization.

Two questions:

  • How many threads are in your grid?
  • Are any error codes returned by the CUDA functions?

  • I have only 1 block with 130 threads.
  • No, it compiled and ran without any error.
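
A kernel that fails at run time does not necessarily report anything unless the host asks for it, so "ran without any error" is worth verifying explicitly. Below is a minimal sketch of how the return codes could be checked. The names `initKernel` and `launchAndCheck` are hypothetical and not from the original post. The kernel also tests the pointer returned by the in-kernel malloc, which is NULL when the device heap cannot satisfy the request.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel mirroring the code in the question.
__global__ void initKernel(int n)
{
    int index = threadIdx.x;

    float* zz_r = (float*)malloc(sizeof(float) * n);
    if (zz_r == NULL) {                 // device heap could not satisfy the request
        printf("thread %d: malloc failed\n", index);
        return;
    }
    for (int i = 0; i < n; i++)
        zz_r[i] = 0;

    printf("%d\n", index);
    free(zz_r);                         // return the block to the device heap
}

// Hypothetical wrapper: launches 1 block of 130 threads and
// returns the first error seen, printing it if there is one.
cudaError_t launchAndCheck(int n)
{
    initKernel<<<1, 130>>>(n);
    cudaError_t err = cudaGetLastError();   // launch-time errors
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();      // errors raised while the kernel ran
    if (err != cudaSuccess)
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    return err;
}
```

Calling `launchAndCheck(100000)` from `main` should then show either the thread indices or a per-thread "malloc failed" message, rather than silence.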

Are you using a card that lets you call cudaThreadSetLimit() to change the device malloc heap size? (Compute capability 2.x and higher, I think.)

I’m not sure of the behaviour if you allocate past the heap size, but the kernel may fail to execute without reporting an error. You can check the current limit with cudaThreadGetLimit(&limit, cudaLimitMallocHeapSize), then raise it if needed with cudaThreadSetLimit(cudaLimitMallocHeapSize, newlimitsize).
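
For the numbers in this thread, 130 threads each allocating 100000 floats need 130 × 100000 × 4 = 52,000,000 bytes, while 10000 floats per thread need only 5,200,000 bytes. Assuming the heap is still at its documented default of 8 MB, that would be consistent with the smaller size working and the larger one failing. The sketch below shows one way the limit could be raised. The helper names `requiredHeapBytes` and `ensureHeapSize` are hypothetical. It uses cudaDeviceGetLimit() and cudaDeviceSetLimit(), which replace the cudaThreadGetLimit() and cudaThreadSetLimit() calls (deprecated in newer toolkits).

```cuda
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: bytes needed if every thread mallocs its own float array.
size_t requiredHeapBytes(size_t threads, size_t floatsPerThread)
{
    return threads * floatsPerThread * sizeof(float);
}

// Hypothetical helper: grow the device malloc heap to at least `bytes`.
// The limit has to be set before the first kernel that calls malloc is launched.
cudaError_t ensureHeapSize(size_t bytes)
{
    size_t limit = 0;
    cudaError_t err = cudaDeviceGetLimit(&limit, cudaLimitMallocHeapSize);
    if (err != cudaSuccess)
        return err;

    printf("current heap: %zu bytes, requested: %zu bytes\n", limit, bytes);
    if (limit >= bytes)
        return cudaSuccess;             // already large enough

    return cudaDeviceSetLimit(cudaLimitMallocHeapSize, bytes);
}
```

A call such as `ensureHeapSize(2 * requiredHeapBytes(130, 100000))` before the kernel launch leaves some headroom for allocator overhead. The factor of 2 is an arbitrary assumption, not a documented requirement.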