Allocation of double pointers in cudaHostGetDevicePointer

Hello Guys,

I’m trying to allocate a double pointer using CUDA. I’ve allocated a double pointer (char**) using cudaHostAlloc with the mapped flag on, but when I try to get the device pointer that corresponds to it using the following line of code

cudaSafeCall(cudaHostGetDevicePointer((void **)&(inputDevice[chunkIndex]), (void *)(inputHost[chunkIndex]), 0));

it gives me a bus error code: cudaErrorInvalidValue…

but when I change the char** to char* it works just fine…

so how can I allocate double pointers in the device using cudaHostGetDevicePointer?

Thank you for your time, guys…

I am pretty sure that the answer is that you can’t, because it requires dereferencing a device pointer on the host, which is illegal. What you can do is assign pointer values inside GPU kernels, so a workaround is to allocate the storage for the pointers, then the storage for what they will point to, and assign the pointer values inside a small initialization kernel. The kernel argument list in CUDA has a limited size, so if there are a lot of pointers to assign, you might have to do something else (like write them into constant memory and read from that).
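A minimal sketch of that workaround, under some assumptions: it uses plain cudaMalloc rather than mapped host memory, and it carves all the rows out of one contiguous device buffer so that only the base address has to be passed to the kernel. The names kNumRows, kRowLen, initRowPointers, fillRows and runPointerTableExample are made up for illustration and do not come from the original code.

```cuda
#include <cuda_runtime.h>

// Hypothetical sizes, chosen only for illustration.
constexpr int kNumRows = 4;
constexpr int kRowLen  = 8;

// Runs on the device, so writing into the device-resident pointer table is legal.
__global__ void initRowPointers(char **rows, char *storage, int numRows, int rowLen)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numRows)
        rows[i] = storage + i * rowLen;
}

// Writes through the double pointer to show that the table is usable.
__global__ void fillRows(char **rows, int numRows, int rowLen)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numRows)
        for (int j = 0; j < rowLen; ++j)
            rows[i][j] = (char)('a' + i);
}

// hostOut must hold kNumRows * kRowLen bytes. Returns 0 on success, 1 on failure.
int runPointerTableExample(char *hostOut)
{
    char **dRows    = nullptr;  // device array of row pointers
    char  *dStorage = nullptr;  // device storage the pointers refer to

    bool ok =
        cudaMalloc((void **)&dRows, kNumRows * sizeof(char *)) == cudaSuccess &&
        cudaMalloc((void **)&dStorage, kNumRows * kRowLen) == cudaSuccess;

    if (ok) {
        // The host never dereferences dRows; it only passes the address along.
        initRowPointers<<<1, kNumRows>>>(dRows, dStorage, kNumRows, kRowLen);
        fillRows<<<1, kNumRows>>>(dRows, kNumRows, kRowLen);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(hostOut, dStorage, kNumRows * kRowLen,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(dRows);
    cudaFree(dStorage);
    return ok ? 0 : 1;
}
```

If the rows come from separate allocations instead of one buffer, each address has to reach the kernel some other way, which is where the argument list limit mentioned above starts to matter.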

A less elegant, but more manageable way to do things is to work with indices rather than pointers. It winds up looking and feeling a lot like Fortran 77, but it works and can be considerably faster because you lose one level of pointer indirection in the GPU MMU. Or maybe wait for Fermi, because it will probably fix it (apparently memory assignment in GPU code will be supported via the C++ new operator).
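A rough sketch of the index-based layout, assuming the rows can be packed into one flat array with an offsets table marking where each row starts. The names sumRows and runIndexExample and the sample data are hypothetical and only there to make the idea concrete.

```cuda
#include <cuda_runtime.h>

// Row r occupies data[offsets[r]] .. data[offsets[r + 1] - 1].
// Only flat arrays are passed in, so there is no pointer table to build.
__global__ void sumRows(const int *data, const int *offsets, int *rowSums, int numRows)
{
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r < numRows) {
        int s = 0;
        for (int k = offsets[r]; k < offsets[r + 1]; ++k)
            s += data[k];
        rowSums[r] = s;
    }
}

// rowSumsOut must hold 3 ints. Returns 0 on success, 1 on failure.
int runIndexExample(int *rowSumsOut)
{
    const int numRows    = 3;
    const int hData[]    = {1, 2, 3, 4, 5, 6};
    const int hOffsets[] = {0, 2, 3, 6};  // rows: {1,2}, {3}, {4,5,6}

    int *dData = nullptr, *dOffsets = nullptr, *dSums = nullptr;

    bool ok =
        cudaMalloc((void **)&dData, sizeof(hData)) == cudaSuccess &&
        cudaMalloc((void **)&dOffsets, sizeof(hOffsets)) == cudaSuccess &&
        cudaMalloc((void **)&dSums, numRows * sizeof(int)) == cudaSuccess &&
        cudaMemcpy(dData, hData, sizeof(hData), cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemcpy(dOffsets, hOffsets, sizeof(hOffsets), cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        sumRows<<<1, numRows>>>(dData, dOffsets, dSums, numRows);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(rowSumsOut, dSums, numRows * sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(dData);
    cudaFree(dOffsets);
    cudaFree(dSums);
    return ok ? 0 : 1;
}
```

Everything here can be allocated and copied from the host with ordinary cudaMalloc and cudaMemcpy calls, because the host only ever handles integer offsets, never device addresses stored inside device memory.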

I’m also facing this problem, and I can’t seem to code it right. Avidday, could you elaborate more? How would I allocate the storage to the pointers then assign their values?

This is a little piece of code that keeps giving me “bus error” whenever I run it. I’m guessing the problem is like what you said, dereferencing a device pointer.

cudaSafeCall(cudaMalloc((void **)&intermediate_values, NthreadsPerBlock*NblocksPerGrid*sizeof(interValues)));

for(int i=0; i<NthreadsPerBlock*NblocksPerGrid; i++)
	cudaSafeCall(cudaMalloc((void **)&intermediate_values[i].threadData, (NData/NthreadsPerBlock/NblocksPerGrid)*sizeof(tuple)));

intermediate_values should be an array of interValues, each of which holds a pointer (threadData) to an array of tuples.

Any help please?