Do you have the patience to answer a novice?

I need to convolve a kernel (10x10, float) over many 2K x 2K images (float). Is there something already in cuBLAS or cuFFT for doing this? (For cuFFT I assume I would have to convert the image and the kernel to Fourier space first.) Let’s assume I can’t use OpenCV unless it is to copy the source.
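To make sure I understand the Fourier route, here is a minimal CPU-side sketch in plain Python (not CUDA, and 1D instead of 2D for brevity). The function names are my own, and the naive `dft` is only a stand-in for whatever a real FFT library would provide. It pads the kernel to the signal length, multiplies the two spectra element by element, and transforms back. Note that this gives a circular convolution, so values wrap around at the edges unless the data is zero-padded first.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; a stand-in for a real FFT."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                for t in range(n))
        out.append(s / n if inverse else s)
    return out

def fft_convolve(signal, kernel):
    """Circular convolution via the frequency domain."""
    n = len(signal)
    padded = list(kernel) + [0.0] * (n - len(kernel))  # kernel padded to signal size
    spectrum_s = dft(signal)
    spectrum_k = dft(padded)
    product = [a * b for a, b in zip(spectrum_s, spectrum_k)]  # pointwise multiply
    return [v.real for v in dft(product, inverse=True)]

def direct_circular_convolve(signal, kernel):
    """Reference: the same circular convolution computed directly."""
    n = len(signal)
    return [sum(kernel[j] * signal[(i - j) % n] for j in range(len(kernel)))
            for i in range(n)]

signal = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0]
kernel = [0.25, 0.5, 0.25]
fft_result = fft_convolve(signal, kernel)
direct_result = direct_circular_convolve(signal, kernel)
```

Both results should agree up to rounding. If this picture is right, the kernel spectrum only has to be computed once and could be reused for every image.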

Or should I roll my own along the lines of: https://www.evl.uic.edu/sjames/cs525/final.html

It might be that I can get by with a smaller kernel. Yes, it is separable. For the moment our notion is that it is Gaussian, but a matched-filter kernel is probably even more to our liking (we are looking for small star-like objects in a black field).
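Since the kernel is separable, my understanding is that the 2D convolution can be split into a row pass followed by a column pass, which for a 10x10 kernel would mean about 10 + 10 = 20 multiply-adds per pixel instead of 100. Below is a rough CPU-side sketch in plain Python (not CUDA) of what I have in mind. The function names, the odd kernel width, the sigma value, and the clamp-to-edge border handling are all my own assumptions for illustration.

```python
import math

def gaussian_kernel_1d(radius, sigma):
    """Normalized 1D Gaussian with 2*radius + 1 taps."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def clamp(v, lo, hi):
    return min(max(v, lo), hi)

def convolve_rows(image, k):
    """Filter each row with the 1D kernel, clamping reads at the border."""
    radius = len(k) // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = sum(wgt * image[y][clamp(x + j - radius, 0, w - 1)]
                            for j, wgt in enumerate(k))
    return out

def transpose(image):
    return [list(row) for row in zip(*image)]

def separable_convolve(image, k):
    """Row pass, then column pass (done as a row pass on the transpose)."""
    tmp = convolve_rows(image, k)
    return transpose(convolve_rows(transpose(tmp), k))

def full_convolve(image, k):
    """Reference: the full 2D kernel k[i] * k[j] applied directly."""
    radius = len(k) // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i, wi in enumerate(k):
                yy = clamp(y + i - radius, 0, h - 1)
                for j, wj in enumerate(k):
                    xx = clamp(x + j - radius, 0, w - 1)
                    acc += wi * wj * image[yy][xx]
            out[y][x] = acc
    return out

# A single bright "star" in a black field.
image = [[0.0] * 9 for _ in range(9)]
image[4][4] = 1.0
k = gaussian_kernel_1d(radius=2, sigma=1.0)
blurred = separable_convolve(image, k)
reference = full_convolve(image, k)
```

The two results should match up to rounding. On the GPU I imagine each pass would become its own kernel launch, but that part is not shown here.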

One image processing guy suggested first creating an integral image and then doing a box filter. What about that? (He has no idea about CUDA.) Would that be cheaper than a Fourier transform?
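As far as I understand the suggestion, it would look something like the CPU-side sketch below (plain Python, not CUDA; the function names and the shrink-the-window border handling are my own assumptions). The integral image stores, at each position, the sum of all pixels above and to the left, so the sum over any rectangle needs only four lookups regardless of the box size. One caveat I am aware of: a box filter is a flat average, not a Gaussian, so it would only approximate what we want. I also suspect the running sums over a 2K x 2K image could lose precision in single-precision float.

```python
def integral_image(image):
    """Summed-area table with one extra row and column of zeros."""
    h, w = len(image), len(image[0])
    sat = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += image[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat

def box_sum(sat, y0, x0, y1, x1):
    """Sum of pixels with y0 <= y < y1 and x0 <= x < x1, using four lookups."""
    return sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]

def box_filter(image, radius):
    """Mean over a (2*radius + 1)^2 window, shrunk where it crosses the border."""
    h, w = len(image), len(image[0])
    sat = integral_image(image)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(y - radius, 0), min(y + radius + 1, h)
        for x in range(w):
            x0, x1 = max(x - radius, 0), min(x + radius + 1, w)
            area = (y1 - y0) * (x1 - x0)
            out[y][x] = box_sum(sat, y0, x0, y1, x1) / area
    return out

image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
sat = integral_image(image)
smoothed = box_filter(image, radius=1)
```

Here `box_sum(sat, 0, 0, 3, 3)` covers the whole image, and `smoothed[1][1]` is the mean of all nine pixels.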