NPP filter row result shifted


I have been trying to use an NPP filter to perform a convolution, but my results are shifted. I would understand a shift by some whole number of samples, but the results seem to be “just a bit off”. (Since new members can submit only one embedded media item, I have put links for the other plots.)

For validation I chose a minimal convolution kernel with only two elements (0.5 and 1), and I am using Matlab as my reference solution. I am calling the nppiFilterRow_32f_C1R function. The input data are stored in memory in column-major order (the fast-changing dimension represents the different signals; the second dimension represents the time samples). From this I assume that I want to filter by rows. I have also tried nppiFilterColumn_32f_C1R, but its output is even more distorted.

As for the input arguments of the function:

  NppiSize oRoi {samples_count, signals_count};

  NppStatus status = nppiFilterRow_32f_C1R(
    src_data,                       // device pointer to input (pointer names are placeholders here)
    samples_count * sizeof(float),  // nSrcStep, in bytes
    dst_data,                       // device pointer to output
    samples_count * sizeof(float),  // nDstStep, in bytes
    oRoi,
    kernel_data,                    // 1.0, 0.5
    kernel_size,                    // 2
    anchor);                        // see below
Usually the result of a convolution has length samples_count + kernel_size - 1, so my configuration could be off by one. But that would only cause an issue in the overlap between consecutive output signals, and I am currently looking only at the first signal, where it should be fine.

As I understand the documentation, the anchor should be a shift of the kernel's frame of reference. I used 0, but this produced distorted results.

Then I used 1 (which should probably be kernel_size - 1 in general), which produced the correct shape of the result, but shifted relative to the expected solution.
I have tried moving the temporal axis by various offsets to align the reference and the filtered signal, but the data do not align.

The -1 offset is close (I suspect the result is shifted by the anchor somehow?), but the absolute error is still too high for me to use the NPP function.

Is it possible that I misunderstand the interface of the function, or am I missing something? Or is the shift expected behaviour in image processing?

I think the ROI describes the data layout, i.e. the width and the height of the “image”. In the case of my signals: width = length of a signal (samples_count) and height = number of signals. For the line step I am using the signal length in bytes (samples_count * sizeof(float)). I could imagine the step being the offset between two consecutive data points instead, but that would be 1 (or rather sizeof(float)), which would violate the requirement that nLineStep >= widthROI * PixelSize, so I presume the stride is derived from the ROI somehow.

I have tried a bunch of different configurations, but this is as close as I can get to the reference solution.
I would welcome any tips on how to make it work. For now I have written my own filter, which produces the expected results, but I would still like to know what is going on and why my result is off.