I have a CUDA app that goes something like this:
1) create variables
2) make global variables and cudaMalloc some space for them
3) take in an input matrix
4) copy the input matrix to a matrix in global memory (allocated with cudaMallocPitch)
5) KERNEL: do some functionality on the global matrix
6) copy global matrix back to local matrix (using cudaMemcpy)
7) print solution using local matrix
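For reference, the copy-in, kernel, and copy-back steps can be sketched roughly as below. This is only a minimal sketch, not my actual code: `scale_kernel`, `roundtrip_ok`, and the `CUDA_CHECK` macro are hypothetical names, the kernel body is a placeholder, and the sketch uses cudaMemcpy2D in both directions (instead of the plain cudaMemcpy mentioned above) because rows allocated with cudaMallocPitch are padded to `pitch` bytes.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Placeholder kernel (assumption): doubles every element of a pitched matrix.
__global__ void scale_kernel(float *d_mat, size_t pitch, int rows, int cols)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < rows && col < cols) {
        // Rows are `pitch` bytes apart, so the row offset is computed in bytes.
        float *row_ptr = (float *)((char *)d_mat + (size_t)row * pitch);
        row_ptr[col] *= 2.0f;
    }
}

// Copies a rows x cols matrix to the device, runs the kernel, copies it back,
// and returns true if every element came back doubled.
bool roundtrip_ok(int rows, int cols)
{
    std::vector<float> h_in(rows * cols), h_out(rows * cols);
    for (int i = 0; i < rows * cols; ++i) h_in[i] = (float)i;

    float *d_mat = nullptr;
    size_t pitch = 0;
    size_t row_bytes = cols * sizeof(float);
    CUDA_CHECK(cudaMallocPitch((void **)&d_mat, &pitch, row_bytes, rows));
    CUDA_CHECK(cudaMemcpy2D(d_mat, pitch, h_in.data(), row_bytes,
                            row_bytes, rows, cudaMemcpyHostToDevice));

    dim3 block(16, 16);
    dim3 grid((cols + block.x - 1) / block.x, (rows + block.y - 1) / block.y);
    scale_kernel<<<grid, block>>>(d_mat, pitch, rows, cols);
    CUDA_CHECK(cudaGetLastError());       // reports launch errors
    CUDA_CHECK(cudaDeviceSynchronize());  // waits for the kernel, reports execution errors

    CUDA_CHECK(cudaMemcpy2D(h_out.data(), row_bytes, d_mat, pitch,
                            row_bytes, rows, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_mat));

    for (int i = 0; i < rows * cols; ++i)
        if (h_out[i] != 2.0f * h_in[i]) return false;
    return true;
}
```

Calling `roundtrip_ok(64, 48)` from `main` exercises the whole path; checking every return value this way should at least show which call fails or never returns.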
The problem is that it freezes/hangs when it reaches this step:
6) copy global matrix back to local matrix (using cudaMemcpy)
But if I comment out the line that launches the kernel:
5) KERNEL: do some functionality on the global matrix
then the cudaMemcpy works fine.
In one line: why does running my kernel make the host freeze when I try to do a memcpy (after the kernel has finished)?
It hangs as if it were waiting for some memory to be freed up.
Is it possible that something in my kernel is holding on to a resource even after the kernel has finished executing?
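One thing worth checking is the "kernel has finished" assumption itself: a kernel launch returns control to the host right away, and a blocking cudaMemcpy waits for earlier work on the device, so a kernel that is still running would look like a hanging copy. Below is a minimal sketch of a timed wait that tells the two cases apart. `device_idle_within`, `launch_and_wait`, and `placeholder_kernel` are hypothetical names, and the 5 second timeout is an arbitrary choice.

```cuda
#include <chrono>
#include <thread>
#include <cuda_runtime.h>

// Placeholder kernel (assumption): stands in for the real kernel.
__global__ void placeholder_kernel(int *d_flag) { *d_flag = 1; }

// Returns true if all work already queued on the default stream
// finishes within timeout_ms, false if it is still running or failed.
bool device_idle_within(int timeout_ms)
{
    cudaEvent_t done;
    if (cudaEventCreate(&done) != cudaSuccess) return false;
    if (cudaEventRecord(done, 0) != cudaSuccess) {
        cudaEventDestroy(done);
        return false;
    }

    auto deadline = std::chrono::steady_clock::now() +
                    std::chrono::milliseconds(timeout_ms);
    cudaError_t status;
    // cudaEventQuery does not block: it returns cudaErrorNotReady
    // while earlier work (such as the kernel) is still in flight.
    while ((status = cudaEventQuery(done)) == cudaErrorNotReady &&
           std::chrono::steady_clock::now() < deadline) {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    cudaEventDestroy(done);
    return status == cudaSuccess;
}

// Launches the placeholder kernel and only copies back once it is known to be done.
bool launch_and_wait()
{
    int *d_flag = nullptr;
    if (cudaMalloc((void **)&d_flag, sizeof(int)) != cudaSuccess) return false;

    placeholder_kernel<<<1, 1>>>(d_flag);
    bool finished = device_idle_within(5000);

    int h_flag = 0;
    if (finished)
        finished = cudaMemcpy(&h_flag, d_flag, sizeof(int),
                              cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_flag);
    return finished && h_flag == 1;
}
```

If the timed wait gives up while the copy without the kernel works, that would point at the kernel never terminating rather than at cudaMemcpy or a held resource.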