Intermittent Kernel Failures

sepia.latimanus · August 15, 2011, 7:18pm

Hi all,

I have a program that I'm pretty sure is working correctly, apart from two things: that CUDA 4/GTX bug where consistently reading from memory causes random errors (The Official NVIDIA Forums | NVIDIA), and the fact that occasionally one of my main kernel calls will fail. The failure always shows up when I increase my grid size considerably (which leads to more blocks, though always far below hardware limits). If I massage the program (recompiling a few times with some error-checking functions commented out), it will eventually run normally and correctly. The kernel error itself only breaks the program later on, when a cudaMemcpy is attempted.
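Because the failure only surfaces at a later cudaMemcpy, it may help to check for errors directly after the launch. The following is a minimal sketch, not the actual program: the kernel `scale`, the helper `launch_and_check`, and the problem size are hypothetical stand-ins, and only the CUDA runtime calls themselves are real API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the main kernel discussed above.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: launches the kernel and returns the first error seen,
// so a failure is reported at the launch rather than at a later cudaMemcpy.
cudaError_t launch_and_check(int n, int threads_per_block) {
    float *d_data = NULL;
    cudaError_t err = cudaMalloc((void **)&d_data, n * sizeof(float));
    if (err != cudaSuccess) return err;

    err = cudaMemset(d_data, 0, n * sizeof(float));
    if (err == cudaSuccess) {
        int blocks = (n + threads_per_block - 1) / threads_per_block;
        scale<<<blocks, threads_per_block>>>(d_data, n, 2.0f);

        // Reports invalid launch configurations (bad grid/block sizes etc.).
        err = cudaGetLastError();
        // Waits for the kernel and reports errors raised during execution.
        if (err == cudaSuccess) err = cudaDeviceSynchronize();
    }

    cudaFree(d_data);
    return err;
}

int main() {
    // Assumed sizes: 1M elements with 128 threads per block (8192 blocks).
    cudaError_t err = launch_and_check(1 << 20, 128);
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("kernel completed without error\n");
    return 0;
}
```

Synchronizing after every launch costs performance, so a check like this is probably best kept to debug builds.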

Has anyone ever encountered a problem like this? As far as I know, local memory limits should only apply to individual blocks, and the program runs fine with small numbers of blocks at 128 threads per block (the ratio I use). Cold reboots don't seem to affect the problem.
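One way to test assumptions about limits is to ask the runtime what the kernel uses and what the device allows. This is only a sketch under stated assumptions: `my_kernel` and `report_limits` are hypothetical placeholders, and device 0 is assumed to be the GPU in question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical placeholder; substitute the kernel that fails.
__global__ void my_kernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Hypothetical helper: prints per-kernel resource usage and device limits.
cudaError_t report_limits(int device) {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) return err;

    cudaFuncAttributes attr;
    err = cudaFuncGetAttributes(&attr, my_kernel);
    if (err != cudaSuccess) return err;

    printf("device: %s\n", prop.name);
    printf("max grid size: %d x %d x %d\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    printf("max threads per block (device): %d\n", prop.maxThreadsPerBlock);
    printf("max threads per block (kernel): %d\n", attr.maxThreadsPerBlock);
    printf("registers per thread: %d\n", attr.numRegs);
    printf("local memory per thread: %lu bytes\n",
           (unsigned long)attr.localSizeBytes);
    printf("static shared memory per block: %lu bytes\n",
           (unsigned long)attr.sharedSizeBytes);
    return cudaSuccess;
}

int main() {
    // Assumption: device 0 is the card the program runs on.
    cudaError_t err = report_limits(0);
    if (err != cudaSuccess) {
        fprintf(stderr, "query failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Comparing these numbers against the launch configuration used at the larger grid sizes would at least confirm whether the launch really stays within the reported limits.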

Thanks!

S

