NPP expansion: Limitations in the NPP methods

I’m porting a fairly heavy IP (image processing) library to NPP and was wondering about strategy. In the interest of full disclosure, I’m new to CUDA and NPP but have a lot of shader and IP experience.

I was looking into the NPP API, and it became obvious pretty early on that I will have to implement many methods in CUDA myself. Unless there is some way to extend NPP, I will have to do this in a third library. Is there some doc describing how to extend NPP? Is this common practice? Is there any way to contact NVIDIA and ask for additions to the API?

To clarify with one example (there are many others): I need to implement an isPositive method, which transforms an image buffer into a bit mask that is 1 wherever the image is positive. This can be done, for instance, by chaining the threshold and division methods, but not with any single method, so the actual run time is multiplied by 2.
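For illustration, a single-pass version of this could look roughly like the CUDA sketch below. It is only a sketch under assumptions that are mine and not part of NPP: the image is a contiguous float buffer with no row pitch, the mask is stored as one byte per pixel rather than as packed bits, and the names isPositiveKernel, isPositiveMask and check are made up for this example.

```cuda
#include <cuda_runtime.h>
#include <stdexcept>
#include <vector>

// One thread per pixel: write 1 where the input is strictly positive, else 0.
// Assumption: float pixels, one byte of mask per pixel (not packed bits).
__global__ void isPositiveKernel(const float* src, unsigned char* mask, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        mask[i] = src[i] > 0.0f ? 1 : 0;
    }
}

// Turn any CUDA runtime error into an exception so failures are not ignored.
static void check(cudaError_t err)
{
    if (err != cudaSuccess) throw std::runtime_error(cudaGetErrorString(err));
}

// Hypothetical host wrapper: copies the image to the device, runs the kernel
// once, and copies the mask back. Device buffers are not released if a call
// throws; a real library would wrap them in RAII handles.
std::vector<unsigned char> isPositiveMask(const std::vector<float>& image)
{
    const int n = static_cast<int>(image.size());
    std::vector<unsigned char> mask(image.size());
    if (n == 0) return mask;

    float* dSrc = nullptr;
    unsigned char* dMask = nullptr;
    check(cudaMalloc(&dSrc, n * sizeof(float)));
    check(cudaMalloc(&dMask, n * sizeof(unsigned char)));
    check(cudaMemcpy(dSrc, image.data(), n * sizeof(float),
                     cudaMemcpyHostToDevice));

    const int threads = 256;                        // threads per block
    const int blocks = (n + threads - 1) / threads; // round up to cover n
    isPositiveKernel<<<blocks, threads>>>(dSrc, dMask, n);
    check(cudaGetLastError());

    // This blocking copy also waits for the kernel to finish.
    check(cudaMemcpy(mask.data(), dMask, n * sizeof(unsigned char),
                     cudaMemcpyDeviceToHost));

    cudaFree(dSrc);
    cudaFree(dMask);
    return mask;
}
```

Calling isPositiveMask on a buffer holding -1, 0 and 2.5 should give a mask of 0, 0, 1. The point is that the image is read once and the mask written once, instead of going through an intermediate buffer between two library calls.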

You might want to look at CUVI Lib, which is an add-on library for NPP. What we actually did was use the NPP framework functions and write our own functions on top of them. The complete list of functions in it that are not available in NPP is:

* Optical Flow (Horn & Schunck)
* Optical Flow (Lucas & Kanade)
* Hough Transform
* Hough Lines
* 2D Discrete Wavelet Transform (Haar)
* 2D Discrete Cosine Transform
* RGB2Gray Color Conversion

The library is essentially 3rd party with NPP DLLs redistributed with the release. It’s freeware for the time being so you can try it out anytime.

Thanks for the response. I actually went ahead and implemented the whole library in CUDA, which was OK. (In addition to actual algorithms, NPP was missing a number of input options I could not do without, which forced me to call 10 functions instead of one to convert to and from the inputs/outputs.) But about NPP: is there any clear document describing how NPP was implemented over CUDA, or what performance hits/advantages using NPP entails?