I am doing some matrix computations using CUDA. I follow the usual sequence:
1. Copy data to device
2. Do computations on device
3. Copy data from device to host
The third step gives me an error, so I cannot copy the data from the device back to the host.
This happens only for larger matrices, e.g. 5000x5000. For 500x500 and 1000x1000 matrices, there is no problem.
The error code returned is 6, which is cudaErrorLaunchTimeout. The description in my driver_types.h file is as follows:
I placed a cudaGetLastError() statement right after the kernel launch and it returned cudaSuccess.
Immediately after this, there is the cudaMemcpy() call, which returns the above error.
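To narrow down where the error really comes from, here is a minimal sketch of the same three steps with a check after each one. Kernel launches are asynchronous, so cudaGetLastError() right after the launch only reports launch problems (bad configuration and the like); an error that occurs while the kernel is running shows up at the next synchronizing call, which in my code is cudaMemcpy(). The kernel scaleKernel, the helper check(), the function runPipeline() and the 256x256 launch configuration are all hypothetical placeholders, not my real code. It also assumes a toolkit that provides cudaDeviceSynchronize(); older toolkits used cudaThreadSynchronize() for the same purpose.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical placeholder kernel standing in for the real matrix kernel.
// A grid-stride loop lets a fixed-size grid cover any number of elements.
__global__ void scaleKernel(float *data, size_t n, float factor)
{
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        data[i] *= factor;
}

// Report a failed call with a label; returns true when the call succeeded.
static bool check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s (%d)\n", what, cudaGetErrorString(err), (int)err);
        return false;
    }
    return true;
}

// Runs copy-in, kernel, copy-out on a dim x dim matrix, checking every stage.
static bool runPipeline(size_t dim)
{
    const size_t n = dim * dim;
    const size_t bytes = n * sizeof(float);
    std::vector<float> host(n, 1.0f);
    float *dev = NULL;

    if (!check(cudaMalloc((void **)&dev, bytes), "cudaMalloc"))
        return false;

    bool ok = check(cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice),
                    "copy to device");
    if (ok) {
        scaleKernel<<<256, 256>>>(dev, n, 2.0f);
        // Only catches launch-time errors; the kernel may still be running.
        ok = check(cudaGetLastError(), "kernel launch");
    }
    // Blocks until the kernel finishes; execution errors are reported here.
    if (ok) ok = check(cudaDeviceSynchronize(), "kernel execution");
    if (ok) ok = check(cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost),
                       "copy to host");

    cudaFree(dev);
    return ok;
}
```

Calling runPipeline(5000) from main() should print which stage fails. If the message comes from "kernel execution" rather than "copy to host", the copy itself is fine and the problem is in the kernel run.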
Any idea what is happening? There is some discussion in this thread: http://forums.nvidia.com/index.php?showtopic=70109. But I don't have a monitor connected to the device.
Any help will be appreciated.