npp library function argument *pDeviceBuffer

I notice that some NPP functions take an argument *pDeviceBuffer. What is the purpose of this argument, and how should I set it when calling these functions? Also, the results of functions such as nppsMax_32f are written back through a pointer. Is that memory on the host or on the device? Thank you.

Thanks for bringing this up. It made me realize that our documentation is misleading and incomplete for those functions.

The result of those reduction functions is written to a device-pointer location. Generally speaking, unless explicitly noted otherwise, any pointers in the NPP API are device pointers.

To your question about the pDeviceBuffer: What you need to pass here is essentially “scratch memory” that the primitive needs in order to do its work. The matching nppsReductionGetBufferSize_XXX function tells you how big this buffer needs to be. The buffer size is always returned in bytes.

The advantages of this mechanism are:

1. The primitive doesn’t have to allocate memory internally, which is beneficial in a multi-threaded environment.

2. If the primitive is used repeatedly, the externally allocated scratch buffer can be reused, eliminating the overhead of repeated allocations and deallocations.

3. Since the scratch buffer is unstructured, different primitives can share the same scratch buffer, as long as it is big enough.

The nppsReductionGetBufferSize_XXX function is actually a good example of a function that uses host pointers. The buffer size is returned via a host pointer. This makes sense, since the scratch buffer is allocated from the host via cudaMalloc.
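To make the sequence concrete, here is a minimal sketch of the query, allocate, call, and copy-back steps for nppsMax_32f. It rests on assumptions that are not stated in this thread: it uses the per-primitive size query nppsMaxGetBufferSize_32f(int, int*) and the four-argument nppsMax_32f found in many toolkit releases. Names and integer types have changed between NPP versions, so check the headers of your toolkit before relying on it. The helper deviceMax is hypothetical and exists only for illustration.

```cuda
#include <cuda_runtime.h>
#include <npp.h>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of NPP): maximum of a host vector,
// computed on the device with nppsMax_32f.
float deviceMax(const std::vector<float>& host)
{
    if (host.empty()) throw std::invalid_argument("empty input");
    const int n = static_cast<int>(host.size());

    // Step 1: ask NPP how much scratch memory the primitive needs.
    // The size is written through a HOST pointer and is given in bytes.
    // Assumed signature: (int nLength, int *hpBufferSize); some toolkit
    // versions use a different integer type here.
    int scratchBytes = 0;
    if (nppsMaxGetBufferSize_32f(n, &scratchBytes) != NPP_SUCCESS)
        throw std::runtime_error("nppsMaxGetBufferSize_32f failed");

    // Step 2: allocate input, result, and scratch buffer in DEVICE memory.
    Npp32f* dSrc = nullptr;      // input signal
    Npp32f* dMax = nullptr;      // one-element result location
    Npp8u*  dScratch = nullptr;  // the pDeviceBuffer argument
    bool ok = cudaMalloc((void**)&dSrc, n * sizeof(Npp32f)) == cudaSuccess;
    ok = ok && cudaMalloc((void**)&dMax, sizeof(Npp32f)) == cudaSuccess;
    ok = ok && cudaMalloc((void**)&dScratch, scratchBytes) == cudaSuccess;

    // Step 3: upload the data and run the reduction. Every pointer passed
    // to nppsMax_32f is a device pointer, including the result pointer.
    ok = ok && cudaMemcpy(dSrc, host.data(), n * sizeof(Npp32f),
                          cudaMemcpyHostToDevice) == cudaSuccess;
    ok = ok && nppsMax_32f(dSrc, n, dMax, dScratch) == NPP_SUCCESS;

    // Step 4: copy the single result value back to the host.
    float result = 0.0f;
    ok = ok && cudaMemcpy(&result, dMax, sizeof(Npp32f),
                          cudaMemcpyDeviceToHost) == cudaSuccess;

    // cudaFree accepts null pointers, so cleanup is safe after a failure.
    cudaFree(dScratch);
    cudaFree(dMax);
    cudaFree(dSrc);

    if (!ok) throw std::runtime_error("CUDA or NPP call failed");
    return result;
}
```

In code that calls the primitive repeatedly, the scratch buffer would normally be allocated once outside the loop and passed to every call, rather than allocated and freed per call as in this sketch.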

On a general note, I think it would be helpful if a cross-reference link could be provided for questions that are cross-posted to Stackoverflow in addition to being posted here. Here is the cross reference for this discussion:
