Problem with NPP Signal Statistical Functions

When I use NPP signal statistical functions (such as nppsMinMax_32f or nppsSum_32f), the library runs the code without error, but the output values do not change. For example, in the following code, although the ‘pSrc’ memory is set to zero, the ‘Sum’ variable still holds 1 after summing over ‘pSrc’.
I’m running this code in Visual C++ 2010 under Windows 7 x64 and NPP library version 5. Could you please tell me what is wrong?

int		nLength = 100 ;
Npp32f *	pSrc ;
NppStatus	nError ;

// Get Memory
cudaMalloc( ( void ** ) & pSrc , nLength * sizeof( float ) ) ;
nError = nppsZero_32f( pSrc , nLength ) ;

// Compute the appropriate size of the scratch-memory buffer
int nBufferSize;
nppsSumGetBufferSize_32f(nLength, &nBufferSize);
Npp8u * pDeviceBuffer;
cudaMalloc((void **)& pDeviceBuffer, nBufferSize);

// Call the primitive with the scratch buffer
Npp32f	Sum = 1.0f ;
nError = nppsSum_32f(pSrc, nLength, & Sum, pDeviceBuffer);

What happens if you add error checking to all CUDA API calls? I am not familiar with NPP. I note that the code passes a pointer to host memory to nppsSum_32f(). Is this as intended, or is NPP expecting a pointer to a device-side variable there?

I tried your test case and got the same result. With ippsSum_32f from IPP, the sum is 0. This looks like a bug in the NPP signal functions; could you please file a bug report?

The input pointers point to device memory, but the output pointer must point to host memory. I added error checking to all calls, but no error is returned.

It seems to be a bug affecting all npps statistical functions. Could you tell me where I can file it?

To file bugs via the registered developer website, please go to

If you scroll halfway down the page, you will see

Members of the CUDA Registered Developer Program can report issues and file bugs
Login or Join Today

If you are not a registered developer yet, your registration request should normally be handled within one business day (please send me a message through this forum if you experience undue delay).

Thank you, I have filed it.

Thank you for your help.

My mistake: the output pointers must point to device memory, not host memory.

Hi RaminHalavati,

I have seen the bug you filed. Let us discuss it by mail.

Best regards!