Unknown Error

I get an “unknown error” on any CUDA code line following my kernel function call. For example, if the first CUDA code following my kernel call is:

CUT_CHECK_ERROR("Kernel execution failed");

I get the error:

“Cuda error: Kernel execution failed in file ‘template.cu’ in line 124 : unknown error.”

If I instead replace the CUT_CHECK_ERROR with:

CUDA_SAFE_CALL(cudaMemcpy(h_num_c, d_num_c, sizeof(unsigned int), cudaMemcpyDeviceToHost));

I get a similar error:

“Cuda error in file ‘template.cu’ in line 132 : unknown error.”
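A note on why the error appears on whichever CUDA call follows the kernel: kernel launches are asynchronous, so a failure during execution is typically reported by the next runtime call instead of by the launch itself. Below is a minimal sketch of separating launch errors from execution errors. `myKernel` and its launch configuration are placeholders and not the poster's code, and `cudaDeviceSynchronize` is the current name for what CUDA 2.0-era code called `cudaThreadSynchronize`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel; it stands in for the real one.
__global__ void myKernel(unsigned int *out)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        *out = 1u;
    }
}

int main()
{
    unsigned int *d_out = NULL;
    cudaError_t err = cudaMalloc((void **)&d_out, sizeof(unsigned int));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    myKernel<<<118, 256>>>(d_out);

    // Catches problems with the launch itself, such as an invalid
    // configuration or a kernel that needs too many resources.
    err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Blocks until the kernel has finished, so errors raised while the
    // kernel was running are reported here and not by a later cudaMemcpy.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "execution failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaFree(d_out);
    return 0;
}
```

If the first check passes and the second one fails, the launch configuration was accepted and the problem occurred while the kernel was running.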

My code attempts to launch the kernel with 118 blocks of 256 threads each. I have checked the .cubin file to make sure I am not exceeding the shared memory or register limits (smem = 40, reg = 8).

DeviceEmu gives me the result I expect, so I do not believe there is a problem in the kernel code (such as a hang or an out-of-bounds array index).

Has anyone else seen a similar “unknown error”?

Thanks.

Unless combined with valgrind, DeviceEmu is useless for checking for out-of-bounds accesses. On the CPU, most out-of-bounds accesses do not lead to a visible error.
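Since emulation will not flag such accesses, one option is to validate indices inside the kernel and report violations through a flag. The following is only a sketch under assumptions: the gather kernel, the array names, and the `CHECK` macro are hypothetical and not taken from the code discussed here. Later CUDA toolkits also ship memory checkers (cuda-memcheck, and more recently compute-sanitizer) that can catch these accesses without code changes.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a runtime call fails.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t e = (call);                                        \
        if (e != cudaSuccess) {                                        \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,         \
                    cudaGetErrorString(e));                            \
            exit(1);                                                   \
        }                                                              \
    } while (0)

// Hypothetical gather kernel: out[i] = data[indices[i]].
// An index outside [0, dataLen) is skipped and reported via errorFlag.
__global__ void checkedGather(const int *indices, const float *data,
                              float *out, int n, int dataLen,
                              int *errorFlag)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;                 // guard the thread index itself

    int idx = indices[i];
    if (idx < 0 || idx >= dataLen) {
        *errorFlag = 1;                 // all writers store the same value
        return;
    }
    out[i] = data[idx];
}

int main()
{
    const int n = 4, dataLen = 3;
    int   h_indices[n]    = {0, 2, 5, 1};   // 5 is deliberately out of range
    float h_data[dataLen] = {10.f, 20.f, 30.f};
    int   h_flag = 0;

    int *d_indices, *d_flag;
    float *d_data, *d_out;
    CHECK(cudaMalloc((void **)&d_indices, n * sizeof(int)));
    CHECK(cudaMalloc((void **)&d_data, dataLen * sizeof(float)));
    CHECK(cudaMalloc((void **)&d_out, n * sizeof(float)));
    CHECK(cudaMalloc((void **)&d_flag, sizeof(int)));
    CHECK(cudaMemcpy(d_indices, h_indices, n * sizeof(int), cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(d_data, h_data, dataLen * sizeof(float), cudaMemcpyHostToDevice));
    CHECK(cudaMemset(d_out, 0, n * sizeof(float)));
    CHECK(cudaMemset(d_flag, 0, sizeof(int)));

    checkedGather<<<1, 32>>>(d_indices, d_data, d_out, n, dataLen, d_flag);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaMemcpy(&h_flag, d_flag, sizeof(int), cudaMemcpyDeviceToHost));
    printf("out-of-bounds index seen: %s\n", h_flag ? "yes" : "no");

    cudaFree(d_indices); cudaFree(d_data); cudaFree(d_out); cudaFree(d_flag);
    return 0;
}
```

The same pattern applies to any kernel that reads indices from memory: check before dereferencing, and copy the flag back after the launch.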

Thanks for the information, Reimar.

This error has returned for me. With the 177.84 driver I was getting an error saying that my kernel execution had timed out and was terminated, so I installed the latest driver for CUDA 2.0 (178.08). Now I again get the “unknown error”:

Cuda error: Kernel execution failed in file ‘template.cu’ in line 76 : unknown error.

Line 76 is the kernel launch line. When this occurs, the screen flickers and then goes back to normal, and that is when I see the error printed.
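The earlier timeout message and the screen flicker are consistent with the display watchdog terminating a long-running kernel on a GPU that also drives the desktop, though that is a guess from the symptoms. A minimal sketch for checking whether a run time limit applies to the current device:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int device = 0;
    cudaError_t err = cudaGetDevice(&device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDevice failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Nonzero when the driver enforces a run time limit on kernels,
    // which is usually the case for a GPU with a display attached.
    int timeoutEnabled = 0;
    err = cudaDeviceGetAttribute(&timeoutEnabled,
                                 cudaDevAttrKernelExecTimeout, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceGetAttribute failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    printf("device %d: kernel run time limit %s\n", device,
           timeoutEnabled ? "enabled" : "disabled");
    return 0;
}
```

If the limit is enabled, inputs that make the kernel run for several seconds can be killed by the driver. Splitting the work into several shorter launches, or running on a GPU without a display attached, are the usual ways around it.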

From the .cubin file I have generated, I see no reason why the kernel launch should fail. For this launch, the number of blocks is 1101 and the block size is 128. Shared memory usage is 3240 bytes per block and only 13 registers per thread are used (my GPU is an 8800 GTS).
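The same resource numbers can also be read at run time and compared against the device limits. The sketch below assumes a stand-in kernel named `colorKernel` (hypothetical, not the actual graph coloring kernel) and a block size of 128. Note that the register count reported for a kernel is per thread, so it has to be multiplied by the block size before comparing it with the per-block limit.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel that uses some static shared memory.
__global__ void colorKernel(int *colors, int n)
{
    __shared__ int scratch[128];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    scratch[threadIdx.x] = i;
    __syncthreads();
    if (i < n) colors[i] = scratch[threadIdx.x];
}

int main()
{
    const int blockSize = 128;   // assumed launch configuration

    // Per-kernel resource usage as seen by the runtime.
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, colorKernel);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    // Per-block limits of device 0.
    int maxRegsPerBlock = 0, maxSmemPerBlock = 0;
    cudaDeviceGetAttribute(&maxRegsPerBlock, cudaDevAttrMaxRegistersPerBlock, 0);
    cudaDeviceGetAttribute(&maxSmemPerBlock, cudaDevAttrMaxSharedMemoryPerBlock, 0);

    int regsPerBlock = attr.numRegs * blockSize;
    printf("registers: %d per thread, %d per block (limit %d)\n",
           attr.numRegs, regsPerBlock, maxRegsPerBlock);
    printf("static shared memory: %zu bytes per block (limit %d)\n",
           attr.sharedSizeBytes, maxSmemPerBlock);
    printf("largest block size this kernel supports: %d\n",
           attr.maxThreadsPerBlock);

    if (regsPerBlock > maxRegsPerBlock ||
        attr.sharedSizeBytes > (size_t)maxSmemPerBlock ||
        blockSize > attr.maxThreadsPerBlock) {
        printf("this configuration would exceed a per-block limit\n");
    }
    return 0;
}
```

If all of these are within limits, as the .cubin numbers suggest, the failure is more likely to come from what the kernel does at run time than from the launch configuration.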

Also, the vertices passed to the device take up roughly 35 MB, hardly enough to cause a problem. Moreover, I would normally see an out-of-memory error if this were truly the problem.

My kernel is a graph coloring algorithm and I do not get the error for all inputs to my kernel. There are some graphs that I can input without issue. (DebugEmu and ReleaseEmu, as mentioned, also always yield an appropriate result.)

Is there anything I may be missing regarding why my kernel is failing to launch or timing out during execution? What clues should I be looking for to root-cause the issue?

Thanks for any help.

Hi,
I am having an issue with my CUDA program. When I run my code, the output window closes with an unknown error at the line where I do a memcpy. But when I run it with Nsight -> Start CUDA Debugging, it works. Can anyone help me, please?

When I roll back to the previous driver (9/1/2018), I get:

cudaErrorInsufficientDriver(35)
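`cudaErrorInsufficientDriver` means the installed driver is older than the CUDA runtime the program was built against, which would fit a driver rollback. A minimal sketch for printing both versions so they can be compared:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int driverVersion = 0, runtimeVersion = 0;

    // Latest CUDA version supported by the installed driver
    // (reported as 0 if no driver is installed).
    cudaError_t err = cudaDriverGetVersion(&driverVersion);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDriverGetVersion failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    // Version of the CUDA runtime this program is linked against.
    err = cudaRuntimeGetVersion(&runtimeVersion);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaRuntimeGetVersion failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    // Versions are encoded as 1000 * major + 10 * minor.
    printf("driver supports CUDA %d.%d\n",
           driverVersion / 1000, (driverVersion % 1000) / 10);
    printf("runtime is CUDA %d.%d\n",
           runtimeVersion / 1000, (runtimeVersion % 1000) / 10);

    if (driverVersion < runtimeVersion) {
        printf("driver is too old for this runtime; "
               "expect cudaErrorInsufficientDriver\n");
    }
    return 0;
}
```

If the driver version comes out lower than the runtime version, either a newer driver or a build against an older toolkit is needed.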