malloc-ing in either host or device code

Hi-

I solved the issues in my earlier opening post and now have running code (parallel ray intersections in a height-field array, for what it's worth). So naturally I'm ready to break it and try something that might be faster. As part of that, I'd like to dynamically malloc some data within routines that should be usable from either host or device code. So the first part of my question is "is it OK to use cudaMalloc within kernel-executed code?" If so, can I write a function like this:

[codebox]__host__ __device__ void *Malloc(uint sz)
{
#ifdef COMPILING_FOR_HOST   /* placeholder: "compiling for host" */
    return malloc(sz);
#else
    void *p;
    cudaMalloc(&p, sz);
    return p;
#endif
}[/codebox]

that aims to abstract memory allocation. The problem I first face is that I don't see a macro for "compiling for host" as I allude to above. From forum and other searches, I see there are __CUDACC__ and __CUDABE__ macros; are these what I would use? Or do I need to 'split' the function? I did try splitting the function using the same name and argument list like this:

[codebox]__host__ void *Malloc(uint sz)
{
    return malloc(sz);
}

__device__ void *Malloc(uint sz)
{
    void *p;
    cudaMalloc(&p, sz);
    return p;
}[/codebox]

but that doesn't work: it results in "error: function "Malloc" has already been defined", so the __host__ and __device__ decorations don't distinguish these functions.
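To make the single-function version concrete, here's the shape I'm after, written so it also builds with a plain host compiler. COMPILING_FOR_DEVICE is only a stand-in for whatever real macro (if any) marks the device compilation pass — that's exactly the part I don't know:

[codebox]#include <stdlib.h>

/* Empty stand-ins so a plain host compiler accepts the qualifiers. */
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

__host__ __device__ void *Malloc(unsigned int sz)
{
#ifdef COMPILING_FOR_DEVICE   /* stand-in, not a real CUDA macro */
    void *p;
    cudaMalloc(&p, sz);       /* the device-side branch I'd like, if it's legal */
    return p;
#else
    return malloc(sz);        /* ordinary host allocation */
#endif
}[/codebox]

Built with gcc, this reduces to the plain malloc path, which is what I'd want on the host side anyway.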

Thanks!

Allocating device memory is done on the host!

__device__ functions are only callable from inside your kernels (__global__ functions), and you cannot allocate device memory from a __device__ function.
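For reference, the usual pattern is to cudaMalloc on the host and pass the pointer into the kernel. A minimal sketch — the kernel name, buffer size, and launch configuration here are made up for illustration:

[codebox]__global__ void MyKernel(float *buf, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = 0.0f;   /* work on the preallocated buffer */
}

void RunKernel(int n)
{
    float *d_buf;
    /* allocate device memory from host code via the runtime API */
    cudaMalloc((void **)&d_buf, n * sizeof(float));
    MyKernel<<<(n + 255) / 256, 256>>>(d_buf, n);
    cudaFree(d_buf);
}[/codebox]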

Thanks, I was afraid of that. I’ll have to preallocate from the host.

But to add: is there a compiler define to switch code inside __host__ __device__ functions?