I implemented a simple ray tracer. Everything works fine until I increase the data size. The program gets through several cudaMalloc calls, host-to-device cudaMemcpy calls and a kernel launch without any problem. But when I then call cudaMemcpy (device to host), the whole display driver shuts down and restarts, and I get a cudaErrorLaunchTimeout error.
I check the CUDA API return value at every step (including the kernel launch), so I assume this error is caused by the cudaMemcpy() call.
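For reference, per-step error checking could look like the minimal sketch below. `traceRays` and `renderOnce` are hypothetical stand-ins for the real kernel and host code. One detail matters here: kernel launches are asynchronous, so the status read right after the launch (via cudaGetLastError()) only covers launch errors such as a bad configuration. An error that happens while the kernel is running, such as cudaErrorLaunchTimeout, is reported by the next call that waits for the kernel, which is often the device-to-host cudaMemcpy. Calling cudaDeviceSynchronize() directly after the launch separates the kernel's errors from the copy's.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the ray-tracing kernel: writes one value per element.
__global__ void traceRays(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 0.5f * i;
}

// Prints a message naming the step if `err` is not cudaSuccess, then returns it.
static cudaError_t check(cudaError_t err, const char *what) {
    if (err != cudaSuccess)
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
    return err;
}

// Hypothetical host routine: runs the kernel on n elements and copies the
// result back into `host`. Returns the first error encountered.
cudaError_t renderOnce(float *host, int n) {
    float *dev = nullptr;
    cudaError_t err = check(cudaMalloc(&dev, n * sizeof(float)), "cudaMalloc");
    if (err != cudaSuccess) return err;

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    traceRays<<<blocks, threads>>>(dev, n);

    // Reports launch errors only; the kernel may still be running here.
    err = check(cudaGetLastError(), "kernel launch");

    // Waits for the kernel to finish, so execution errors
    // (e.g. cudaErrorLaunchTimeout) are attributed to the kernel itself.
    if (err == cudaSuccess)
        err = check(cudaDeviceSynchronize(), "kernel execution");

    // With the kernel already finished, an error here belongs to the copy.
    if (err == cudaSuccess)
        err = check(cudaMemcpy(host, dev, n * sizeof(float),
                               cudaMemcpyDeviceToHost),
                    "cudaMemcpy (device to host)");

    cudaFree(dev);
    return err;
}
```

With this layout, if the "kernel execution" step is the one that fails, the timeout comes from the kernel rather than from the copy.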
If I decrease the data size, the error doesn't happen. But I don't think I'm running out of memory.
Does anyone have an idea?