Why no support for floating point convolution in NPP?

I’ve worked with image processing in CUDA for about 2.5 years now and I’ve always written my own functions. Today I started looking at NPP, but I couldn’t find any function for 2D convolution of float-valued images; I could only find support for 8-bit images. Why is that?

I would also like to see support for (non-separable) 3D and 4D convolution; so far I’ve implemented this myself.

Hi and thanks for giving NPP a try.

There is no reason for not having floating-point convolutions other than that we haven’t gotten around to implementing them yet. We are currently improving the breadth of data types and color channel configurations supported by NPP. Our plan is to do this in a “bottom-up” fashion, starting with the most basic primitives for data movement and arithmetic. As those areas get completed, we intend to move on to image statistics and filtering.

At this point we have no plans to support 3D and 4D convolutions. We think of NPP as covering signal processing (1D) and image processing (2D). I don’t see how 3D and 4D convolutions would fit into either of those areas. If you have ideas around generalized 3D volume processing (and whatever the 4D case could be), I’d love to hear them.

3D and 4D image processing is just a generalization of the 2D case; 4D means time-resolved volume data. In the medical imaging domain it is common to work with 3D and 4D data. 4D data can, for example, be generated with magnetic resonance imaging (MRI), computed tomography (CT) and ultrasound (US).
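To make the "generalization of the 2D case" point concrete, here is a minimal CPU reference sketch in Python. It is not NPP code and not the GPU implementation discussed in this thread; the names `flat_index` and `convolve_nd_valid` are made up for illustration. It assumes flat row-major float arrays and computes only the "valid" output region (no border handling). The same loop covers 2D, 3D and 4D because only the shape tuples change.

```python
from itertools import product


def flat_index(idx, shape):
    """Row-major offset of a multi-index into a flat list."""
    offset = 0
    for i, n in zip(idx, shape):
        offset = offset * n + i
    return offset


def convolve_nd_valid(data, data_shape, kernel, kernel_shape):
    """Direct (non-separable) N-D convolution, 'valid' region only.

    data and kernel are flat row-major lists of floats.
    Returns (flat output list, output shape).
    """
    out_shape = tuple(d - k + 1 for d, k in zip(data_shape, kernel_shape))
    out = []
    for o in product(*(range(n) for n in out_shape)):
        acc = 0.0
        for k in product(*(range(n) for n in kernel_shape)):
            # True convolution flips the kernel along every axis.
            d = tuple(oi + (kn - 1 - ki)
                      for oi, kn, ki in zip(o, kernel_shape, k))
            acc += data[flat_index(d, data_shape)] * kernel[flat_index(k, kernel_shape)]
        out.append(acc)
    return out, out_shape


# 4D example: a 3x3x3x3 block of ones filtered with a 2x2x2x2 averaging kernel.
data_shape = (3, 3, 3, 3)
kernel_shape = (2, 2, 2, 2)
data = [1.0] * 81
kernel = [1.0 / 16] * 16
out, out_shape = convolve_nd_valid(data, data_shape, kernel, kernel_shape)
print(out_shape, out[0])
```

The cost per output sample is the number of kernel coefficients, which is why non-separable filters become expensive so quickly as dimensions are added.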

Here is an example of 4D CT data of a beating heart: to the left is the original data, which is extremely noisy, and to the right is the denoised data. The resolution of the CT data is 512 x 512 x 445 x 20. My supervisors and I have developed the world’s first true 4D image denoising algorithm. Instead of denoising one volume at a time (3D), our algorithm uses ALL 4 dimensions to do the denoising. We first estimate the local structure tensor, a 4 x 4 matrix, in each time voxel and then use the tensor to control a set of 11 enhancement filters. The spatial support of each of the 11 enhancement filters is 11 x 11 x 11 x 11 time voxels; applying these filters to our 4D CT dataset requires about 375 000 billion multiplications.
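The multiplication count can be checked from the sizes quoted above. A quick back-of-the-envelope sketch in Python, assuming direct (non-FFT, non-separable) convolution with one multiplication per filter coefficient per time voxel, and ignoring border handling:

```python
import math

# Sizes quoted in the post.
data_shape = (512, 512, 445, 20)   # x, y, z, time
filter_shape = (11, 11, 11, 11)    # support of one enhancement filter
num_filters = 11

num_voxels = math.prod(data_shape)
coeffs_per_filter = math.prod(filter_shape)

# Assumption: one multiplication per coefficient per output time voxel.
total_multiplications = num_voxels * coeffs_per_filter * num_filters

print(f"time voxels:            {num_voxels:,}")
print(f"coefficients per filter: {coeffs_per_filter:,}")
print(f"multiplications:        {total_multiplications:,}")
# multiplications: 375,745,124,761,600 (about 375 000 billion)
```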


375000 billion ops… Ooops… My heart just skipped a beat… Think it’s time for a CT scan… ;-)

I’ve started looking at denoising of 5D data (yes, it is possible to collect 5D data), and then the number of multiplications increases to 119 million billion; storing the filter responses would require 7466 GB of memory…

3 spatial dimensions, 1 temporal dimension, so 4D. (A 3D movie). But what would the 5th dimension be for?


3 spatial dimensions and TWO temporal dimensions, one for the breathing rhythm and one for the heart rhythm (but there is only ONE real temporal dimension).

Hmm… but that’s not a ‘real’ dimension then - the 5th one isn’t ‘orthogonal’ to the rest - just a copy of the 4th… Let me put the question another way:

The 1st dimension shows how the image changes in x axis,

The 2nd dimension shows how the image changes in y axis,

The 3rd dimension shows how the image changes in z axis,

The 4th dimension shows how the image changes in time axis,

The 5th dimension shows how the image changes in _________?


This definitely sounds very interesting! I would love to hear more. Kindly drop me an email (address mentioned in my signature below)

The 4th dimension shows how the heart changes with time (the lungs are fixed),

The 5th dimension shows how the lungs change with time (the heart is fixed).

I’m also looking for convolution of fp images. I’ve written my own code, but I’d love to use an implementation where someone has put in the effort to extract all possible performance, make it portable, etc. Is anyone aware of another library that provides fp filtering?

Here is an example of 4D convolution on the GPU,