1D image convolution for floating-point values

I need Gaussian blurring on single-channel images with 32-bit floating-point pixels. The “convolutionSeparable” example in the SDK is informative, but it is somewhat old now (2007). The NPP library only has 1D row/column filtering for 8-bit images. Google turns up many results (convolution on the GPU seems to be a popular class project), but is there a convenient, well-tested library like NPP that does 1D convolution filtering on floats?
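For context, Gaussian blurring is separable: the 2D Gaussian is the outer product of two 1D Gaussians, so a (2r+1)x(2r+1) filter can be replaced by a row pass and a column pass of 2r+1 taps each, which is the row-then-column structure the convolutionSeparable sample uses. Below is a minimal CPU sketch in plain C++ of that idea on a float image, mainly useful as a reference for checking whatever GPU implementation you end up with. The function names (gaussianKernel, convolveRows, convolveColumns, gaussianBlur), the clamp-to-edge border handling, and the choice of radius are assumptions for illustration; they are not taken from NPP or the SDK. On the GPU, each thread would typically compute one output pixel of one pass, i.e. the body of the inner x/y loops.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Normalized 1D Gaussian kernel with 2*radius+1 taps (hypothetical helper).
std::vector<float> gaussianKernel(int radius, float sigma) {
    std::vector<float> k(2 * radius + 1);
    float sum = 0.0f;
    for (int i = -radius; i <= radius; ++i) {
        float v = std::exp(-(i * i) / (2.0f * sigma * sigma));
        k[i + radius] = v;
        sum += v;
    }
    for (float& v : k) v /= sum;  // weights sum to 1, so flat regions stay flat
    return k;
}

// Horizontal pass. The kernel is symmetric, so convolution equals correlation.
// Out-of-range taps are clamped to the nearest edge pixel (assumed border mode).
void convolveRows(const std::vector<float>& src, std::vector<float>& dst,
                  int width, int height, const std::vector<float>& k) {
    const int radius = static_cast<int>(k.size() / 2);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            float acc = 0.0f;
            for (int j = -radius; j <= radius; ++j) {
                int xx = std::min(std::max(x + j, 0), width - 1);
                acc += src[y * width + xx] * k[j + radius];
            }
            dst[y * width + x] = acc;
        }
}

// Vertical pass, same clamp-to-edge border handling.
void convolveColumns(const std::vector<float>& src, std::vector<float>& dst,
                     int width, int height, const std::vector<float>& k) {
    const int radius = static_cast<int>(k.size() / 2);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            float acc = 0.0f;
            for (int j = -radius; j <= radius; ++j) {
                int yy = std::min(std::max(y + j, 0), height - 1);
                acc += src[yy * width + x] * k[j + radius];
            }
            dst[y * width + x] = acc;
        }
}

// Separable Gaussian blur of a row-major, single-channel float image.
std::vector<float> gaussianBlur(const std::vector<float>& src, int width,
                                int height, float sigma, int radius) {
    assert(static_cast<int>(src.size()) == width * height);
    std::vector<float> k = gaussianKernel(radius, sigma);
    std::vector<float> tmp(src.size()), dst(src.size());
    convolveRows(src, tmp, width, height, k);     // 1D filter along x
    convolveColumns(tmp, dst, width, height, k);  // 1D filter along y
    return dst;
}
```

For example, `gaussianBlur(img, width, height, 2.0f, 6)` blurs with sigma = 2 using a radius of 3*sigma, a common but arbitrary cutoff. Doing the two 1D passes costs 2(2r+1) multiply-adds per pixel instead of (2r+1)^2 for a direct 2D filter.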

I am also looking for 1D convolution on the GPU. Could you recommend something, please?