Problem with nppsAddC_8u_ISfs

Hello,

I wrote the following simple code and ran it on a Jetson TX2:

#include <stdlib.h>
#include <cuda_runtime.h>
#include <npps.h>

#define N_ELEMENTS 1000

int version = 0;
cudaError_t err;
NppStatus stat;

err = cudaDriverGetVersion(&version);                 // version = 9000
err = cudaSetDevice(0);                               // err = 0
Npp8u *pDevice = nppsMalloc_8u(N_ELEMENTS);
Npp8u *pHost = (Npp8u *)malloc(N_ELEMENTS);
stat = nppsSet_8u(5, pDevice, N_ELEMENTS);            // stat = NPP_SUCCESS
stat = nppsAddC_8u_ISfs(1, pDevice, N_ELEMENTS, 1);   // stat = NPP_SUCCESS
cudaMemcpy(pHost, pDevice, N_ELEMENTS, cudaMemcpyDeviceToHost);

I expected the cells in pHost to contain: {6,6,6,6,…}
But pHost contains: {5,5,5,5,…}

Can you please explain why?

Thank you,
Zvika

With a scale factor of 1, the correct result in pHost is {3,3,3,3,…}

nppsAddC_8u_ISfs adds the requested constant (1) to each element and then multiplies the result by 2^(-nScaleFactor).

The scale factor is the last parameter. You are passing 1 there, so the sum is multiplied by 2^-1, i.e. by 0.5:

5+1 = 6
6 * 0.5 = 3

If you are getting 5, the buffer still holds the value written by nppsSet_8u, so something is wrong with your setup.

If you want the result to be 6, pass 0 for the scale factor (2^0 = 1, so the sum is left unscaled):

stat = nppsAddC_8u_ISfs (1, pDevice, N_ELEMENTS, 0);

https://docs.nvidia.com/cuda/npp/general_conventions_lb.html#integer_result_scaling