Passing a pointer from C++ into CUDA host code and copying results to that pointer: am I crazy?


I am stumped. I have code that works most of the time but I am afraid I have made some assumptions that are totally wrong.

I have a specific question about memory allocation.

I have C++ code and a CUDA interface class that is supposed to pass my GPU data back and forth. It works most of the time, but I run into trouble here:

I allocate memory on the C++ side with a statement like:

pH_partVolResult = (float4*)calloc((umaxAndVmax[0]*umaxAndVmax[1]), sizeof(float4));

Then I pass this to my CUDA interface class, which calls something like:

extern "C" float cuGpuFunction(float4* pH_partVolResult);

Then I do some work on the host side of the GPU code and launch a kernel like this:

k_castRaysIntoBlankLinearMemWBisection<<<dimGrid, dimBlock>>>(d_gridB, myPlane, scale, d_blankTopSurface, volDimensions, d_blankVolume);

Then I copy the result into the C++-allocated memory pointer that I passed in at the beginning, like this:

cudaMemcpy(pH_blankVolResult, d_blankTopSurface, gridSize*sizeof(float4), cudaMemcpyDeviceToHost);

This works most of the time but… sometimes it crashes.

I am asking you guys about how I am doing this: is this the right approach, or am I missing something?



I hope you are releasing GPU memory in your GPU class destructors. Add proper error checking to each CUDA API call so you can see which one is going wrong.
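
One more thing that might be worth ruling out: in the question, the host buffer is allocated with umaxAndVmax[0]*umaxAndVmax[1] elements, while the copy transfers gridSize elements. If gridSize is ever larger than that product, the copy would write past the end of the calloc'd block, which could explain a crash that only shows up sometimes. Below is a minimal host-only sketch of a guard for that case. Float4 and copyResultChecked are hypothetical names that are not in the original code, and std::memcpy stands in for cudaMemcpy so that the sketch compiles without the CUDA toolkit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Stand-in for CUDA's float4 so this sketch builds without the CUDA toolkit.
struct Float4 { float x, y, z, w; };

// Hypothetical helper: copies `count` elements into a host buffer that can
// hold `hostCapacity` elements, and refuses to overrun it.
// std::memcpy is an assumption standing in for
// cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToHost).
void copyResultChecked(Float4* hostDst, std::size_t hostCapacity,
                       const Float4* src, std::size_t count) {
    assert(hostDst != nullptr && src != nullptr);
    if (count > hostCapacity) {
        // The caller allocated fewer elements than the copy wants to write.
        throw std::length_error("host buffer too small for requested copy");
    }
    std::memcpy(hostDst, src, count * sizeof(Float4));
}
```

In the real code this would mean passing the element count across the extern "C" boundary together with the pointer, so the host side of the GPU code can compare it against gridSize before calling cudaMemcpy.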