Crash with cudaMemcpy3D

My application crashes during a device-to-host cudaMemcpy3D when I also have some OpenGL textures in memory. Does this have to do with how much texture memory has been used? How can I find out how much of it is left on my GPU? I can already query the global memory stats.

OS Version: Mac OS X 10.5.6 (9G55)
Report Version: 6

Exception Codes: KERN_INVALID_ADDRESS at 0x000000002bce80c8
Crashed Thread: 0

Thread 0 Crashed:
0 libSystem.B.dylib 0xffff07c7 __memcpy + 39 (cpu_capabilities.h:246)
1 libcuda.dylib 0x008d178a cuTexRefGetFlags + 121850
2 libcuda.dylib 0x008bf2e1 cuTexRefGetFlags + 46929
3 libcuda.dylib 0x008b137a cuMemcpy3D + 138
4 libcudart.dylib 0x00899bb3 __cudaRegisterFunction + 87907
5 libcudart.dylib 0x0087e772 cudaMemcpy3D + 146


It most likely has to do with a bug in the way you set up memcpy3D(). The error you're receiving, SIGSEGV, is a segmentation fault, meaning you're trying to access a memory location on the host that is not assigned to your program. If your textures were at play, you'd be getting an out-of-memory error from CUDA.

I would suspect so, but the same setup works just fine in a console (no OpenGL) app I had before. Further, in my current app it works for a host array of size 128^3 (which also happens to be the GPU array size), but the moment my host array size increases (GPU array size still the same) I get this crash, even though I only copy within the first 128^3 region :(

Here is my copy3D setup:

void copy3DMemToHost(cudaPitchedPtr _src, float *_dst, cudaExtent copy_extent, cudaExtent dst_extent, cudaPos src_offset, cudaPos dst_offset)
{
  cudaMemcpy3DParms copyParams = {0};

  copyParams.srcPtr = _src;

  // For some reason, using copyParams.dstPos doesn't give correct results, so we set the offset here.
  float *h_target = _dst + dst_offset.x + dst_offset.y*dst_extent.width + dst_offset.z*dst_extent.width*dst_extent.height;
  fprintf(stderr, "Target mem location on host: %p\n", h_target);

  copyParams.dstPtr = make_cudaPitchedPtr((void*)h_target, dst_extent.width*sizeof(float), dst_extent.width, dst_extent.height);
  copyParams.kind = cudaMemcpyDeviceToHost;
  copyParams.extent = make_cudaExtent(copy_extent.width*sizeof(float), copy_extent.height, copy_extent.depth);

  // We want to copy a copy_extent sized volume starting at (x_off, y_off, z_off).
  copyParams.srcPos = make_cudaPos(src_offset.x*sizeof(float), src_offset.y, src_offset.z);

  CUDA_SAFE_CALL(cudaMemcpy3D(&copyParams));
  CUT_CHECK_ERROR("Mem -> Host Memcpy failed\n");
}


Furthermore, if I reduce the copy extent to, say, 100^3:

-sometimes I get no error

-at times I get "unspecified launch failure" at line 152, which is: ‘CUDA_SAFE_CALL(cudaMemcpy3D(&copyParams));’

-sometimes the operation succeeds in one part of my program but fails in another, and the call trace only goes up to cuTexRefGetFlags.

I also don't think it is coming from texture memory, since I'm copying from global memory to host memory. I have no idea what's causing it.


Many thanks Mr_Nuke,

I was indeed accessing host memory outside its bounds.

I've got it working now :)
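
For anyone hitting the same crash, an out-of-bounds host access like the one described above can be caught before calling cudaMemcpy3D with a plain host-side check. The following is only a minimal sketch: Vec3, copyFitsInHost and lastTouchedIndex are hypothetical names made up for illustration (they are not part of CUDA), and all sizes are assumed to be in elements, laid out as x + y*width + z*width*height like h_target in copy3DMemToHost.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for element-based extents and offsets; not a CUDA type.
struct Vec3 {
    std::size_t x, y, z;
};

// Returns true if a sub-volume of size `copy`, placed at `offset` inside a
// host array of size `dst`, stays inside that array (all values in elements).
bool copyFitsInHost(Vec3 dst, Vec3 offset, Vec3 copy) {
    return offset.x <= dst.x && copy.x <= dst.x - offset.x &&
           offset.y <= dst.y && copy.y <= dst.y - offset.y &&
           offset.z <= dst.z && copy.z <= dst.z - offset.z;
}

// Linear index (in elements) of the last host element the copy would write,
// assuming the x + y*width + z*width*height layout used for h_target.
std::size_t lastTouchedIndex(Vec3 dst, Vec3 offset, Vec3 copy) {
    // A copy with a zero dimension touches nothing, so there is no last index.
    assert(copy.x > 0 && copy.y > 0 && copy.z > 0);
    std::size_t x = offset.x + copy.x - 1;
    std::size_t y = offset.y + copy.y - 1;
    std::size_t z = offset.z + copy.z - 1;
    return x + y * dst.x + z * dst.x * dst.y;
}
```

Calling copyFitsInHost with the host array size, dst_offset and copy_extent before the copy, and comparing lastTouchedIndex against the number of floats actually allocated for _dst, should flag a bad setup without relying on the crash log.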