The document CUDA_C_Programming_Guide.pdf (page 135) describes how to allocate memory with malloc/free inside kernel (__global__) functions.
The example given there is:
__global__ void mallocTest()
{
    char* ptr = (char*)malloc(123);
    printf("Thread %d got pointer: %p\n", threadIdx.x, ptr);
    free(ptr);
}
void main()
{
    // Set a heap size of 128 megabytes. Note that this must
    // be done before any kernel is launched.
    cudaThreadSetLimit(cudaLimitMallocHeapSize, 128*1024*1024);
    mallocTest<<<1, 5>>>();
    cudaThreadSynchronize();
}
I used the same example, but compiling it fails with: error: calling a host function from a device/global function is not allowed
Since Toolkit version 3.2 it has been possible to allocate memory with malloc inside kernels.
From the Toolkit 3.2 changelog: "Support for memory management using malloc() and free() in CUDA C compute kernels"
Do you have a compute capability 2.0 or 2.1 card (i.e. Fermi), and are you passing a code generation option to nvcc to build for compute capability 2.0 or 2.1?
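To make both of those checks concrete, here is a minimal sketch of the guide's example with a device query and error checking added. It assumes a Toolkit 3.2-era nvcc, a file saved as malloc_test.cu (a hypothetical name), and a build command along the lines of nvcc -arch=sm_20 malloc_test.cu -o malloc_test. Without a 2.x target, nvcc of that era builds for compute capability 1.x, where malloc, free and printf are host-only functions, which would explain the error above. The device index 0 and the error-handling layout are illustrative choices, not part of the guide's example.

```cuda
// malloc_test.cu (hypothetical file name)
// Assumed build command for a Fermi target:
//   nvcc -arch=sm_20 malloc_test.cu -o malloc_test
#include <cstdio>
#include <cuda_runtime.h>

__global__ void mallocTest()
{
    // In-kernel malloc draws from the device heap sized via cudaLimitMallocHeapSize.
    char* ptr = (char*)malloc(123);
    printf("Thread %d got pointer: %p\n", threadIdx.x, ptr);
    free(ptr);
}

int main()
{
    // Query device 0 (an assumption; adjust if several GPUs are installed).
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Device 0: %s, compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    if (prop.major < 2) {
        fprintf(stderr, "In-kernel malloc/free needs compute capability 2.0 or higher.\n");
        return 1;
    }

    // Set a 128 MB device heap; this must happen before any kernel is launched.
    // Later toolkits rename this call to cudaDeviceSetLimit.
    err = cudaThreadSetLimit(cudaLimitMallocHeapSize, 128 * 1024 * 1024);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaThreadSetLimit failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    mallocTest<<<1, 5>>>();

    // Wait for the kernel so its printf output is flushed and errors surface.
    // Later toolkits rename this call to cudaDeviceSynchronize.
    err = cudaThreadSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "Kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

If the same source still produces the "calling a host function" error, the code generation option is most likely not reaching nvcc, for example because an IDE build rule overrides it.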