One of my CUDA kernels is failing unexpectedly. The kernel performs a convolution-type operation, so there are no arbitrary memory accesses or locks. The same kernel runs without problems on smaller data sets, and on the large data set it runs fine for other convolution sizes. It only fails when the large data set is convolved with a 19x1 kernel, at one particular point in the program. I have checked whether this is a memory leak, and it is not.
Has anybody else faced similar issues? Any suggestions on what could be wrong?
Thanks for any assistance.