cudaMemcpy issue

I have a CUDA app that goes something like this:

  1. create variables
  2. make global variables and cudaMalloc some space for them
  3. take in an input matrix
  4. copy input matrix to a matrix in global space (using cudaMallocPitch)
  5. KERNEL: do some functionality on the global matrix
  6. copy global matrix back to local matrix (using cudaMemcpy)
  7. print solution using local matrix
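
For anyone following along, steps 2, 4 and 6 can be sketched roughly as below. This is only a guess at the shape of the code, since the actual source is not posted. The helper name `roundTripPitched` is made up, and it uses cudaMemcpy2D for both copies on the assumption that the device matrix was allocated with cudaMallocPitch, so the device rows are `pitch` bytes apart while the host rows are tightly packed.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): copies a dense
// rows x cols host matrix into pitched device memory and straight back,
// with no kernel in between. Returns the first CUDA error encountered.
cudaError_t roundTripPitched(const float* h_in, float* h_out, int rows, int cols)
{
    float* g_matrix = nullptr;
    size_t pitch = 0;                               // device row stride in bytes
    const size_t rowBytes = cols * sizeof(float);   // host row stride in bytes

    // Step 2: allocate the pitched "global" matrix.
    cudaError_t err = cudaMallocPitch((void**)&g_matrix, &pitch, rowBytes, rows);
    if (err != cudaSuccess) return err;

    // Step 4: host -> device, converting from rowBytes stride to pitch stride.
    err = cudaMemcpy2D(g_matrix, pitch, h_in, rowBytes,
                       rowBytes, rows, cudaMemcpyHostToDevice);

    // Step 6: device -> host, converting back to the tightly packed layout.
    if (err == cudaSuccess) {
        err = cudaMemcpy2D(h_out, rowBytes, g_matrix, pitch,
                           rowBytes, rows, cudaMemcpyDeviceToHost);
    }

    cudaFree(g_matrix);
    return err;
}
```

If a round trip like this works with the kernel left out, the allocation and copies are probably fine and the kernel is the place to look.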

The problem is that it freezes/hangs when it gets to this step:
6) copy global matrix back to local matrix (using cudaMemcpy)

But if I comment out the line that launches the kernel
5) KERNEL: do some functionality on the global matrix
then the cudaMemcpy works fine.

In one line: why does running my kernel make the host freeze when I try to do a memcpy (after the kernel has finished)?

It hangs as if it were waiting for some memory to be freed up.


Is it possible that something in my kernel is holding on to a resource even after the kernel has finished executing?

There is most likely a problem inside the kernel. Kernel launches are asynchronous, so the error only shows up when you try to do the memcpy, because there is an implicit barrier before the copy.
Add a CUT_CHECK_ERROR("Kernel execution failed"); between steps 5 and 6 to see if this is the case.
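
CUT_CHECK_ERROR comes from the SDK's cutil helper header rather than from the CUDA runtime itself, so here is a rough sketch of the same check using only runtime calls. The kernel `scaleKernel` and the helper `launchAndCheck` are placeholders invented for this example, not the poster's code. cudaDeviceSynchronize is the current name for the blocking call; older toolkits used cudaThreadSynchronize.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Placeholder kernel standing in for step 5; NOT the kernel from the post.
__global__ void scaleKernel(float* data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;   // guard against threads past the end
}

// Hypothetical helper: launches the kernel (n must be > 0), then reports
// launch errors and execution errors separately.
cudaError_t launchAndCheck(float* d_data, int n)
{
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scaleKernel<<<blocks, threads>>>(d_data, n, 2.0f);

    // Errors detected at launch time, e.g. an invalid grid or block size.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "Kernel launch failed: %s\n", cudaGetErrorString(err));
        return err;
    }

    // Block until the kernel has finished; errors raised while it was
    // running (e.g. an out-of-bounds access) are reported here.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "Kernel execution failed: %s\n", cudaGetErrorString(err));
    }
    return err;
}
```

With a check like this between steps 5 and 6, a hang or failure shows up at the synchronize call instead of being blamed on the later cudaMemcpy.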

Thanks for the tidbit… I knew nothing about that function.

Still, it didn't yield anything for me.

To make things worse, it seems to be semi-random.

That is, it happens when I feed the program big matrices (6000x3000), but sometimes it doesn't freeze and lets the program run normally.

It freezes on this line:

float*row = (float*)((char*)g_matrix + i * pitch);

(Yes, this is for a global matrix that I have cudaMallocPitch'd.)
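
One thing worth ruling out with that row computation is `i` (or the column index) running past the end of the matrix, since the grid usually has more threads than elements. Below is a minimal sketch of a guarded kernel built around the same pointer arithmetic. The names `addOnePitched` and `launchAddOne`, the 16x16 block size, and the one-thread-per-element layout are all assumptions for illustration; the real kernel is not shown in this thread.

```cuda
#include <cuda_runtime.h>

// Illustrative kernel: one thread per element, adds 1 to each entry of a
// pitched rows x cols matrix. Threads that fall outside the matrix exit early.
__global__ void addOnePitched(float* g_matrix, size_t pitch, int rows, int cols)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;   // column index
    int i = blockIdx.y * blockDim.y + threadIdx.y;   // row index
    if (i >= rows || j >= cols) return;              // bounds guard

    // pitch is in bytes, so step through the buffer as char* before casting.
    float* row = (float*)((char*)g_matrix + i * pitch);
    row[j] += 1.0f;
}

// Hypothetical launcher: rounds the grid up so it covers the whole matrix,
// then checks for launch and execution errors.
cudaError_t launchAddOne(float* g_matrix, size_t pitch, int rows, int cols)
{
    dim3 block(16, 16);
    dim3 grid((cols + block.x - 1) / block.x,
              (rows + block.y - 1) / block.y);
    addOnePitched<<<grid, block>>>(g_matrix, pitch, rows, cols);

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();
}
```

The guard matters because the rounded-up grid almost always contains threads with `i >= rows` or `j >= cols`, and without it those threads would read and write outside the allocation.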

What's more, the whole ssh session locks up and I have to start another terminal. The frozen process uses 99% of the CPU time and is un-killable. It times itself out after about 10 minutes.