Hi-
I solved the issues in my earlier opening post and now have running code (parallel ray intersections in a height-field array, for what it's worth). So naturally I'm ready to break it and try something that might be faster. As part of that, I'd like to dynamically allocate some data inside routines that should be usable from either host or device code. So the first part of my question is: "is it OK to use cudaMalloc within kernel-executed code?" If so, can I write a function like this:
[codebox]__host__ __device__ void *Malloc(uint sz)
{
#ifdef COMPILING_FOR_HOST /* placeholder -- this is the macro I'm looking for */
    return malloc(sz);
#else
    void *p;
    cudaMalloc(&p, sz);
    return p;
#endif
}[/codebox]
that aims to abstract memory allocation. The first problem I face is that I don't see a macro for "compiling for host" as I allude to above. From forum and other searches I see there are __CUDACC__ and __CUDABE__ macros; are these what I would use? Or do I need to split the function? I did try splitting it into two overloads with the same name and argument list, like this:
[codebox]__host__ void *Malloc(uint sz)
{
    return malloc(sz);
}

__device__ void *Malloc(uint sz)
{
    void *p;
    cudaMalloc(&p, sz);
    return p;
}[/codebox]
but that doesn't work: it fails with "error: function "Malloc" has already been defined", so the __host__ and __device__ decorations don't distinguish these functions.
Thanks!