I am stumped. I have code that works most of the time, but I am afraid some of my assumptions are totally wrong.
I have a specific question about memory allocation.
I have C++ code and a CUDA interface class that is supposed to pass my GPU data back and forth. It works most of the time, but I run into trouble here:
I allocate memory on the C++ side with a statement like:
pH_partVolResult = (float4*)calloc((umaxAndVmax*umaxAndVmax), sizeof(float4));
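To make the allocation step concrete, here is a minimal host-only sketch of one way to keep the element count together with the pointer, so the same count can be reused later when sizing the copy. `Float4` is a stand-in for CUDA's `float4` so the sketch compiles without the CUDA headers, and `HostBuffer` and `allocSquareBuffer` are hypothetical names, not part of my real code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Stand-in for CUDA's float4 (assumption: four packed floats).
struct Float4 { float x, y, z, w; };

// Hypothetical wrapper: the element count travels with the pointer.
struct HostBuffer {
    Float4* data;
    std::size_t count;
};

// Allocates side * side zeroed elements.
// Returns {nullptr, 0} if side * side overflows or the allocation fails.
inline HostBuffer allocSquareBuffer(std::size_t side) {
    if (side != 0 && side > SIZE_MAX / side) {
        return {nullptr, 0};
    }
    const std::size_t count = side * side;
    Float4* p = static_cast<Float4*>(std::calloc(count, sizeof(Float4)));
    if (p == nullptr) {
        return {nullptr, 0};
    }
    return {p, count};
}
```

With something like this, the interface function could take both `data` and `count` instead of a bare pointer, and the caller releases the memory with `std::free(buffer.data)`.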
Then I pass this to my CUDA interface class, which calls something like:
extern "C" float cuGpuFunction(float4* pH_partVolResult);
Then I do some setup on the host side of the GPU code and launch a kernel like this:
k_castRaysIntoBlankLinearMemWBisection<<<dimGrid, dimBlock>>>(d_gridB, myPlane, scale, d_blankTopSurface, volDimensions, d_blankVolume);
Then I copy the result into the C++-allocated memory that I passed in at the beginning, like this:
cudaMemcpy(pH_blankVolResult, d_blankTopSurface, gridSize*sizeof(float4), cudaMemcpyDeviceToHost);
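The copy is only safe if the host buffer holds at least as many elements as are being copied, which here means comparing `gridSize` against the `umaxAndVmax*umaxAndVmax` used at allocation time. Below is a minimal host-only sketch of that guard. `Float4Like` is a stand-in for `float4`, `checkedCopyToHost` is a hypothetical helper, and `std::memcpy` stands in for the `cudaMemcpy` call so the sketch runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Stand-in for CUDA's float4 (assumption: four packed floats).
struct Float4Like { float x, y, z, w; };

// Copies srcCount elements into dst only when dst has room for them.
// std::memcpy is a placeholder for cudaMemcpy(..., cudaMemcpyDeviceToHost).
// Returns false and copies nothing if a pointer is null or the copy would overrun dst.
inline bool checkedCopyToHost(Float4Like* dst, std::size_t dstCount,
                              const Float4Like* src, std::size_t srcCount) {
    if (dst == nullptr || src == nullptr) {
        return false;
    }
    if (srcCount > dstCount) {
        return false;
    }
    std::memcpy(dst, src, srcCount * sizeof(Float4Like));
    return true;
}
```

In the real code, the same comparison could sit directly in front of the `cudaMemcpy` call, and the value returned by `cudaMemcpy` could be compared against `cudaSuccess` to catch failures from the copy or from the preceding kernel.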
This works most of the time, but sometimes it crashes.
So I am asking you guys about how I am doing this: is this the right approach, or am I missing something?