nppiFilterRow_32f_C1R gives an incorrect result on the border

mtar · March 16, 2021, 9:46am

I am using nppiFilterRow_32f_C1R to perform a convolution, but I get an incorrect result at the border. For simplicity, I have written an example for a single row of data. The row contains 10 elements (1.f) plus zero padding (0.f), and the kernel consists of 5 elements (1.f). All functions return NPP_NO_ERROR.

const int input_size=14;
const int output_size=10;
const int kernel_size=5;
int input_size_in_bytes=input_size*sizeof(float);
int output_size_in_bytes=output_size*sizeof(float);
int kernel_size_in_bytes=kernel_size*sizeof(float);

float host_input[input_size] ={0.f, 0.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 0.f, 0.f}; //with zero padding
float host_kernel[kernel_size] ={1.f, 1.f, 1.f, 1.f, 1.f};
float host_output[output_size];

float *dev_input,*dev_output,*dev_kernel;
cudaMalloc(&dev_input,input_size_in_bytes);
cudaMalloc(&dev_output,output_size_in_bytes);
cudaMalloc(&dev_kernel,kernel_size_in_bytes);

//Copy data to device
cudaMemcpy2D(dev_input, input_size_in_bytes, host_input, input_size_in_bytes,
input_size_in_bytes,1,cudaMemcpyHostToDevice);
//Copy kernel to device
cudaMemcpy2D(dev_kernel, kernel_size_in_bytes, host_kernel, kernel_size_in_bytes,
kernel_size_in_bytes,1,cudaMemcpyHostToDevice);

//Filter
int xanchor=kernel_size-1;
NppiSize roi;
roi.width=output_size;
roi.height=1;
nppiFilterRow_32f_C1R(dev_input,input_size_in_bytes,dev_output,output_size_in_bytes,roi,dev_kernel,kernel_size,xanchor);

//Copy result to host
cudaMemcpy2D(host_output, output_size_in_bytes, dev_output, output_size_in_bytes,
output_size_in_bytes,1,cudaMemcpyDeviceToHost);

Thus, at the output I expect (3,4,5,5,5,5,5,5,4,3), but I get (3,4,5,5,5,5,5,5,5,5).

The function was tested with toolkit versions 10.1, 10.2, and 11.2, on Windows 10 and Ubuntu 20.04.

cruss · March 16, 2021, 5:37pm

@mtar – thank you for diving into this.

Quick questions –

#1 what happens if you extend the input array further to the right with 0’s (input_size = 15, input_size = 16, etc), but keep the output_size the same?

#2 what happens if you pre-initialize host_output to some impossible value, like -1.0?

#3 have you tried other kernel sizes and do they provide similar behavior on the right edge?

mtar · March 17, 2021, 8:44am

Hi,
#1 I tried extending the input array further to the right with 0’s (input_size = 15, 16, 17, 18), but I get the same output: (3,4,5,5,5,5,5,5,5,5).

#2 The result is the same: (3,4,5,5,5,5,5,5,5,5).

#3 I tried kernel_size=7 and got (5,6,7,7,7,7,7,7,7,7).

mkhadatare · March 19, 2021, 7:58am

Hi mtar,

The following modification to the NPP API call will give you correct results:
nppiFilterRow_32f_C1R(dev_input,input_size_in_bytes,dev_output,output_size_in_bytes,roi,dev_kernel,2*kernel_size+1,xanchor);

Hope this will resolve your issue.

mtar · March 22, 2021, 9:26am

Thanks, it helped with this example.
Is this behavior normal, or is it a bug? Can I count on this behavior in future releases of NPP?
