Kernel Execution Time

Hey there,

As I just read, kernel launches are asynchronous.

Imagine the following:

float *A, *A_dev;

// allocation and so on

// ...

kernel<<<1, 1>>>(A_dev);

HANDLE_ERROR(cudaMemcpy(A, A_dev, byteInA, cudaMemcpyDeviceToHost));

The kernel execution time is quite high, around 2 seconds. What would you expect to happen if I call cudaMemcpy() immediately after the kernel launch, while the kernel is still running?

Could this be causing the graphics driver to crash?

When I do this, HANDLE_ERROR reports a cudaErrorUnknown. Or could the cudaErrorUnknown have some other cause?

I am using CUDA SDK 3.2.

Best regards and thanks, tdhd

cudaMemcpy() waits until all previously launched kernels have finished before performing the memory copy, so you always get the finished result. Because of that, an error that occurs while the kernel is running is typically reported by the cudaMemcpy() call that follows it. Unfortunately cudaErrorUnknown is not very descriptive (I definitely wish CUDA would improve this), so some guess-and-check will be required to figure out what the problem is.
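One way to narrow down the guess-and-check is to test for errors at each stage separately, so that a failure inside the kernel is not attributed to the copy. The following is only a minimal sketch: the kernel body, the report() helper and the runKernelAndCopy() function are placeholders invented for illustration, since the original kernel and HANDLE_ERROR macro are not shown. It uses cudaDeviceSynchronize(), which is available from CUDA 4.0 on; with CUDA 3.2, cudaThreadSynchronize() plays the same role.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel (assumption): the real kernel from the question is not shown.
__global__ void kernel(float *data)
{
    data[0] = 42.0f;
}

// Hypothetical helper: prints which stage failed and passes the error through.
static cudaError_t report(const char *stage, cudaError_t err)
{
    if (err != cudaSuccess)
        fprintf(stderr, "%s: %s\n", stage, cudaGetErrorString(err));
    return err;
}

// Runs the kernel and copies one float back, returning the first error seen.
cudaError_t runKernelAndCopy(float *hostOut)
{
    float *A_dev = NULL;
    cudaError_t err = report("cudaMalloc",
                             cudaMalloc((void **)&A_dev, sizeof(float)));
    if (err != cudaSuccess)
        return err;

    kernel<<<1, 1>>>(A_dev);

    // Launch problems (e.g. an invalid configuration) show up here.
    err = report("kernel launch", cudaGetLastError());

    // Blocks until the kernel has finished. Errors raised during execution,
    // including a watchdog timeout, are reported here instead of at the copy.
    if (err == cudaSuccess)
        err = report("kernel execution", cudaDeviceSynchronize());

    if (err == cudaSuccess)
        err = report("cudaMemcpy",
                     cudaMemcpy(hostOut, A_dev, sizeof(float),
                                cudaMemcpyDeviceToHost));

    cudaFree(A_dev);
    return err;
}

int main()
{
    float A = 0.0f;
    if (runKernelAndCopy(&A) != cudaSuccess)
        return 1;
    printf("A = %f\n", A);
    return 0;
}
```

If the message appears at the "kernel execution" stage, the copy itself is fine and the problem lies in the kernel or in how long it runs.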

Two seconds is rather long for a kernel that has 1 thread and 1 block. Is there a substantial amount of work going on in the device code? Perhaps an infinite loop? Two seconds is in the ballpark of the watchdog timeout, after which your kernel is terminated if it is running on a GPU that is also rendering your GUI desktop.
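Whether the watchdog applies to a given GPU can be checked from the device properties. This is a small sketch that reads the kernelExecTimeoutEnabled field of cudaDeviceProp for every device; the watchdogEnabled() wrapper is a hypothetical name used here only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical wrapper: returns 1 if the device enforces a run-time limit on
// kernels, 0 if it does not, and -1 if the properties could not be queried.
int watchdogEnabled(int dev)
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, dev);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties(%d): %s\n",
                dev, cudaGetErrorString(err));
        return -1;
    }
    return prop.kernelExecTimeoutEnabled ? 1 : 0;
}

int main()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        int limited = watchdogEnabled(dev);
        if (limited < 0)
            continue;
        printf("Device %d: run-time limit on kernels: %s\n",
               dev, limited ? "yes" : "no");
    }
    return 0;
}
```

A device that reports "yes" is subject to the timeout, so a kernel that runs for seconds on it is a likely candidate for being killed.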

I posted about that problem in another thread. My GPU is doing both the calculation and the GUI rendering. In my case the display driver even crashes: my screen goes black and recovers after a few seconds.

That’s quite normal when your kernel is killed by the watchdog timer.