nppiFilterRow_32f_C1R gives an incorrect result on the border

I am using nppiFilterRow_32f_C1R to perform a convolution, but I get an incorrect result at the border. For simplicity, I have written an example for a single row of data. The row contains 10 elements (1.f) plus zero padding (0.f) on both sides. The kernel consists of 5 elements (1.f). All functions return NPP_NO_ERROR.

const int input_size=14;
const int output_size=10;
const int kernel_size=5;
int input_size_in_bytes=input_size*sizeof(float);
int output_size_in_bytes=output_size*sizeof(float);
int kernel_size_in_bytes=kernel_size*sizeof(float);

float host_input[input_size] ={0.f, 0.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 0.f, 0.f}; //with zero padding
float host_kernel[kernel_size] ={1.f, 1.f, 1.f, 1.f, 1.f};
float host_output[output_size];

float *dev_input,*dev_output,*dev_kernel;
cudaMalloc(&dev_input,input_size_in_bytes);
cudaMalloc(&dev_output,output_size_in_bytes);
cudaMalloc(&dev_kernel,kernel_size_in_bytes);

//Copy data to device
cudaMemcpy2D(dev_input, input_size_in_bytes, host_input, input_size_in_bytes,
input_size_in_bytes,1,cudaMemcpyHostToDevice);
//Copy kernel to device
cudaMemcpy2D(dev_kernel, kernel_size_in_bytes, host_kernel, kernel_size_in_bytes,
kernel_size_in_bytes,1,cudaMemcpyHostToDevice);

//Filter
int xanchor=kernel_size-1;
NppiSize roi;
roi.width=output_size;
roi.height=1;
nppiFilterRow_32f_C1R(dev_input,input_size_in_bytes,dev_output,output_size_in_bytes,roi,dev_kernel,kernel_size,xanchor);

//Copy result to host
cudaMemcpy2D(host_output, output_size_in_bytes, dev_output, output_size_in_bytes,
output_size_in_bytes,1,cudaMemcpyDeviceToHost);

Thus, I expect the output to be {3,4,5,5,5,5,5,5,4,3}, but I get {3,4,5,5,5,5,5,5,5,5}.
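For comparison, the expected values can be reproduced on the CPU. The following is a minimal sketch, not NPP's implementation: reference_row_filter is a hypothetical helper, and it assumes that output element i is the weighted sum of input elements i through i+kernel_size-1, which is the window implied by the expected values above. Because the kernel in this example is all ones, the kernel orientation (convolution versus correlation) does not affect the result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference, not part of NPP.
// Assumption: out[i] = sum over k of kernel[k] * in[i + k].
// Input elements beyond the end of the row are treated as zero.
std::vector<float> reference_row_filter(const std::vector<float>& in,
                                        const std::vector<float>& kernel,
                                        std::size_t out_size)
{
    assert(!kernel.empty());
    std::vector<float> out(out_size, 0.0f);
    for (std::size_t i = 0; i < out_size; ++i) {
        for (std::size_t k = 0; k < kernel.size(); ++k) {
            const std::size_t j = i + k;   // input index covered by this tap
            if (j < in.size()) {
                out[i] += kernel[k] * in[j];
            }
        }
    }
    return out;
}
```

Calling reference_row_filter with the 14-element host_input, the 5-element host_kernel, and out_size = 10 yields {3,4,5,5,5,5,5,5,4,3}. Comparing this element by element against host_output shows that only the last two values differ.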

I tested the function with CUDA Toolkit versions 10.1, 10.2, and 11.2, on Windows 10 and Ubuntu 20.04.

@mtar – thank you for diving into this.

Quick questions –

#1 what happens if you extend the input array further to the right with 0’s (input_size = 15, input_size = 16, etc), but keep the output_size the same?

#2 what happens if you pre-initialize host_output to some impossible value, like -1.0?

#3 have you tried other kernel sizes and do they provide similar behavior on the right edge?

Hi,
#1 I tried extending the input array further to the right with 0’s (input_size = 15, 16, 17, 18) but got the same output: {3,4,5,5,5,5,5,5,5,5}

#2 The result is the same: {3,4,5,5,5,5,5,5,5,5}

#3 I tried kernel_size=7 and got {5,6,7,7,7,7,7,7,7,7}

Hi mtar,

The following modification to the NPP API call will give you the correct results:
nppiFilterRow_32f_C1R(dev_input,input_size_in_bytes,dev_output,output_size_in_bytes,roi,dev_kernel,2*kernel_size+1,xanchor);

Hope this will resolve your issue.

Thanks, it helped with this example.
Is this behavior normal, or is it a bug? Can I count on this behavior in future NPP releases?