CUDA kernel gets killed unexpectedly


I have been facing problems with one of my CUDA kernels failing unexpectedly. The kernel is basically doing a convolution-type operation, so there are no arbitrary memory accesses or locks. The same kernel runs on smaller data sets with no problems. On the large data set, it also runs fine for other convolution sizes; it fails only for the large data convolved with a 19x1 kernel, at one specific point in the program. I have checked whether this is a memory leak, and it is not.

Has anybody else faced similar issues? Any suggestions on what could be wrong?

Thanks for any assistance.
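One thing worth ruling out first is whether the launch or execution actually reports an error code. A minimal sketch of that kind of checking, assuming a placeholder kernel (the kernel name and arguments here are hypothetical, not from the original post):

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Check a CUDA runtime call and abort with a readable message on failure.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
                    __FILE__, __LINE__, cudaGetErrorString(err));     \
            exit(1);                                                  \
        }                                                             \
    } while (0)

// Stand-in for the real convolution kernel.
__global__ void dummyKernel(float *out) {
    out[threadIdx.x] = (float)threadIdx.x;
}

int main() {
    float *d_out;
    CUDA_CHECK(cudaMalloc(&d_out, 256 * sizeof(float)));

    dummyKernel<<<1, 256>>>(d_out);
    CUDA_CHECK(cudaGetLastError());       // launch-time errors (bad configuration, etc.)
    CUDA_CHECK(cudaDeviceSynchronize());  // execution errors; a watchdog kill typically
                                          // surfaces here as a launch-timeout error

    CUDA_CHECK(cudaFree(d_out));
    printf("kernel ran cleanly\n");
    return 0;
}
```

Checking both `cudaGetLastError()` right after the launch and the return value of a synchronize afterwards distinguishes a bad launch configuration from a kernel that started and was then killed.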

You have described this too vaguely…

By the way, what is that 19x1?

19 blocks with 1 thread per block?

If you suspect your CUDA kernel is being murdered by someone, call 007 :-)

I guess I did not explain it properly.
The threads are in 256x1 blocks.
The 19x1 kernel is the size of the convolution being performed (neighbourhood size).
I suspect the kernel is being killed by a timeout signal, though I am not sure of this, because the GPU I am running on is a Tesla C870: it is not attached to a display, and the machine is not running X.
The same sized convolution runs in other parts of the code in <1 sec and works for smaller data sizes.
Something weird is going on…
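Whether the display watchdog even applies to this GPU can be confirmed at runtime: the device properties expose a `kernelExecTimeoutEnabled` flag. A small sketch of that query (device 0 is an assumption; adjust for a multi-GPU machine):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int dev = 0;  // assumed device index
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
        fprintf(stderr, "failed to query device %d\n", dev);
        return 1;
    }
    printf("Device %d: %s\n", dev, prop.name);
    // 1 means the display watchdog can kill long-running kernels on this GPU;
    // 0 (expected for a headless Tesla C870 with no X server) means no timeout.
    printf("kernelExecTimeoutEnabled = %d\n", prop.kernelExecTimeoutEnabled);
    return 0;
}
```

If the flag reports 0, the watchdog theory can be ruled out and attention shifts back to the kernel itself.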

Since it fails only for 19x1, I suspect this is a problem with your code. How about 20x1 and other sizes (I know nothing about convolution, just asking)? Do they work fine?
Can you tell us something about the size of your large data set?