Calling NPP functions with GPU pointers

Does anyone know whether it’s possible to call NPP functions with pointers to memory that is already located on the GPU device?

I already have a library that manages transfers to and from the GPU device and keeps the number of transfers to a minimum. It seems that the NPP functions work with buffers in the host CPU’s memory space. So if I have pre-processed results located on the GPU, I have to transfer them to host memory, and then NPP will transfer the data back to the GPU again. Rather inefficient…

Yes, not only can you pass device pointers to the NPP functions, you must do so for them to work! All pointers in the NPP API are assumed to be device pointers, unless explicitly stated otherwise in the documentation.

This principle was adopted so that developers stay in charge of the memory transfers. BTW, this is also how the other NVIDIA compute middleware libraries (cuBLAS, cuFFT) work.