I have just started working on a quadratic filter to be applied to an image with 16 bits per channel.
For now, all I am interested in is applying the filter directly from its definition, without resorting to SVD or FFT.
I launch my kernel three times in succession, once for each channel.
I am not using shared memory, only global memory and local variables.
When I set the filter dimension (the size of the square around the pixel in question, whose values are used in the computation) beyond a certain value, the first launch (for the red channel) does not report any error but does not return any results either. The other two launches (green and blue) fail with the error “the launch timed out and was terminated”.
Is there a built-in watchdog timer mechanism in CUDA that is preventing my first launch from executing and completing properly?
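A minimal sketch of how the run-time limit and the three launches could be checked is below. It is not the actual filter: the kernel `filterChannel`, the helper `checkLaunch`, the 64x64 image size, and the use of device 0 are all placeholders assumed for illustration. It reads the `kernelExecTimeoutEnabled` field of `cudaDeviceProp`, which reports whether the device enforces a run-time limit on kernels, and synchronizes after every launch so that an error raised during execution is attributed to the launch that caused it rather than surfacing on a later call.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real filter kernel: it only copies one channel.
__global__ void filterChannel(const unsigned short* in, unsigned short* out,
                              int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        out[y * width + x] = in[y * width + x];
}

// Hypothetical helper: reports launch-time errors first, then waits for the
// kernel to finish so that execution-time errors (such as a timeout) show up here.
static bool checkLaunch(const char* label)
{
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "%s channel: %s\n", label, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main()
{
    // Query device 0 (an assumption) for a run-time limit on kernels.
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "could not query device 0\n");
        return 1;
    }
    printf("Run-time limit on kernels: %s\n",
           prop.kernelExecTimeoutEnabled ? "yes" : "no");

    // Placeholder image size; one 16-bit buffer is reused for every channel.
    const int width = 64, height = 64;
    const size_t bytes = width * height * sizeof(unsigned short);
    unsigned short* dIn = nullptr;
    unsigned short* dOut = nullptr;
    if (cudaMalloc(&dIn, bytes) != cudaSuccess ||
        cudaMalloc(&dOut, bytes) != cudaSuccess) {
        fprintf(stderr, "device allocation failed\n");
        cudaFree(dIn);
        return 1;
    }
    cudaMemset(dIn, 0, bytes);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);

    // One launch per channel, stopping at the first launch that fails.
    const char* names[3] = {"red", "green", "blue"};
    bool ok = true;
    for (int c = 0; c < 3 && ok; ++c) {
        filterChannel<<<grid, block>>>(dIn, dOut, width, height);
        ok = checkLaunch(names[c]);
    }

    cudaFree(dIn);
    cudaFree(dOut);
    return ok ? 0 : 1;
}
```

Kernel launches are asynchronous, so without the synchronization step a failure in one launch may only be reported by a later API call.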
Suggestions are welcome and appreciated.