nppiFilterMedian memory out of bounds with y-axis mask

I’m having trouble using the Nppi function, nppiFilterMedian_32f_C1R.

I have been reading the docs on the roi, mask, and anchor parameters (here).

What I’m trying to accomplish is to median filter a 2D array with a 1D rectangular mask, in both the x-axis and y-axis directions, e.g.:

[0  0  0] 0   # x-axis 1d rectangular median filter
 0  0  0  0 
 0  0  0  0
 0  0  0  0

[0]  0  0  0  # y-axis 1d rectangular median filter
[0]  0  0  0
[0]  0  0  0
 0   0  0  0

Everything I’ve understood about mask, anchor, roi, and start pixel offset makes sense in the x-axis case. In all cases my test matrix is 9x9, and the mask is 3x1 in the x-axis or 1x3 in the y-axis.

For the x-axis case, I tested 3 scenarios:

  1. Mask extends forward by filter_len, anchored at (0, 0); subtract filter_len from roi.width to avoid overflow at the end of the matrix
  2. Mask extends left and right by filter_len/2, anchored at (filter_len/2, 0); subtract filter_len from roi.width, start with filter_len/2 pixel offset since first pixel will be at (0-filter_len/2, 0)
  3. Mask extends backwards by filter_len, anchored at (filter_len, 0); subtract filter_len from roi.width, start with filter_len pixel offset since first pixel will be at (0-filter_len,0)

I have created a minimum viable code example in a Github gist (here: https://gist.github.com/sevagh/0e0a62faac268cf780a07db66fb88000) for the x-axis case.

My test matrix has a row of “5” in the middle, and a column of “8” in the middle. A correctly working x-axis 1D rectangular filter should preserve the row of 5, and eliminate the column of 8.

The results in the x-axis are as expected, and cuda-memcheck passes:

========= CUDA-MEMCHECK
x-axis median filter extending rightward
before
0 0 0 0 8 0 0 0 0
0 0 0 0 8 0 0 0 0
0 0 0 0 8 0 0 0 0
0 0 0 0 8 0 0 0 0
5 5 5 5 8 5 5 5 5
0 0 0 0 8 0 0 0 0
0 0 0 0 8 0 0 0 0
0 0 0 0 8 0 0 0 0
0 0 0 0 8 0 0 0 0
after
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
5 5 5 5 5 5 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0

x-axis median filter extending left and right
after
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 5 5 5 5 5 5 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0

x-axis median filter extending leftward
after
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 5 5 5 5 5 5
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0

========= ERROR SUMMARY: 0 errors

These results make sense: the rightward-facing mask with a limited roi leaves 0s at the end of each row, the centered mask leaves 0s at the beginning and end, and the leftward-facing mask leaves 0s at the beginning.
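As a cross-check of the expected x-axis results, here is a minimal CPU reference sketch in plain C++. It is not how NPP works internally. It assumes that for each output column x the mask reads columns x - anchor_x through x - anchor_x + 2, which is consistent with the three scenarios above. The names median3_rows and make_test_matrix are hypothetical helpers, not NPP calls, and float stands in for Npp32f.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// CPU reference only (assumption: 3-wide mask covering columns
// [x - anchor_x, x - anchor_x + 2] for output column x).
// Pixels outside the ROI are left at 0, like the zero-initialized output in the post.
std::vector<float> median3_rows(const std::vector<float>& src, int width, int height,
                                int start_x, int roi_width, int anchor_x) {
    std::vector<float> dst(src.size(), 0.0f);
    for (int y = 0; y < height; ++y) {
        for (int x = start_x; x < start_x + roi_width; ++x) {
            float w[3];
            for (int k = 0; k < 3; ++k) {
                int col = x - anchor_x + k;
                assert(col >= 0 && col < width);  // the mask must stay inside the row
                w[k] = src[y * width + col];
            }
            std::sort(w, w + 3);
            dst[y * width + x] = w[1];  // middle of the three sorted values
        }
    }
    return dst;
}

// Builds the 9x9 test matrix from the post: a row of 5s and a column of 8s.
std::vector<float> make_test_matrix() {
    const int n = 9;
    std::vector<float> m(n * n, 0.0f);
    for (int x = 0; x < n; ++x) m[4 * n + x] = 5.0f;
    for (int y = 0; y < n; ++y) m[y * n + 4] = 8.0f;
    return m;
}
```

Calling median3_rows(make_test_matrix(), 9, 9, 1, 6, 1) corresponds to the "left and right" scenario: row 4 keeps its 5s in columns 1 through 6, and the column of 8s disappears.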

Now, switching to the y-axis, the only case that passes cuda-memcheck is anchored at (0,0). Here is the code example for the y-axis case: https://gist.github.com/sevagh/04dd66b9ad5e205f019c05f6097ed680

I tested 3 similar scenarios to the x-axis case:

  1. Mask extends downward by filter_len, anchored at (0, 0); subtract filter_len from roi.height to avoid overflow at the end of the matrix - nb! this is the only one that succeeds cuda-memcheck
  2. Mask extends down and up by filter_len/2, anchored at (0, filter_len/2); subtract filter_len from roi.height, start with (filter_len/2 * nstep) pixel offset since first pixel will be at (0, 0-filter_len/2) - fails cuda-memcheck
  3. Mask extends upwards by filter_len, anchored at (0, filter_len); subtract filter_len from roi.height, start with filter_len pixel offset since first pixel will be at (0,0-filter_len) - fails cuda-memcheck

The results are as follows. The downward filter’s result is plausibly correct, but the middle is way off, and upwards doesn’t work at all. However, since cuda-memcheck fails, I’m concerned that I’ve understood something completely wrong about the roi/mask/anchor parameters.

y-axis median filter extending downwards
before
0 0 0 0 8 0 0 0 0
0 0 0 0 8 0 0 0 0
0 0 0 0 8 0 0 0 0
0 0 0 0 8 0 0 0 0
5 5 5 5 8 5 5 5 5
0 0 0 0 8 0 0 0 0
0 0 0 0 8 0 0 0 0
0 0 0 0 8 0 0 0 0
0 0 0 0 8 0 0 0 0
after
0 0 0 0 8 0 0 0 0
0 0 0 0 8 0 0 0 0
0 0 0 0 8 0 0 0 0
0 0 0 0 8 0 0 0 0
0 0 0 0 8 0 0 0 0
0 0 0 0 8 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0

y-axis median filter extending down and up
after
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 8 0 0 0 0
0 0 0 0 8 0 0 0 0
0 0 0 0 8 0 0 0 0
0 0 0 0 8 0 0 0 0
0 0 0 0 8 0 0 0 0

y-axis median filter extending upwards
after
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0

I appreciate any and all help. Thanks.

I figured it out with trial and error. I was obeying the following from the documentation:

In practice this means that for an image ( pSrc , nSrcStep ) and the start-pixel of the ROI being at location (x, y), one would pass
pSrcOffset = pSrc + y * nSrcStep + x * PixelSize;

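I was using yStartPixelOffset = y*nSrcStep, where nSrcStep = width*sizeof(Npp32f), i.e. yStartPixelOffset = y*width*sizeof(Npp32f), and getting the incorrect behavior above.

However, using yStartPixelOffset = y*width, or equivalently yStartPixelOffset = y*nSrcStep/sizeof(Npp32f), works correctly. The explanation appears to be that the documentation formula is a byte offset, whereas adding an offset to an Npp32f pointer already multiplies it by sizeof(Npp32f). The offset was therefore 4 times too large: for the 9x9 matrix, y = 1 skipped 36 elements (landing on row 4, which matches the "down and up" output above), and y = 3 skipped 108 elements, past the end of the 81-element matrix.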
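To make the difference concrete, here is a small host-side sketch of the two equivalent ways to compute the ROI start pointer. It uses float in place of Npp32f so it compiles without the NPP headers, and the function names roi_start_bytes and roi_start_elems are hypothetical, not part of NPP.

```cpp
#include <cassert>
#include <cstddef>

// Byte-based form, following the documentation formula
// pSrc + y * nSrcStep + x * PixelSize: the step is in bytes,
// so the arithmetic is done on a byte pointer.
inline const float* roi_start_bytes(const float* src, std::size_t step_bytes,
                                    std::size_t x, std::size_t y) {
    const unsigned char* base = reinterpret_cast<const unsigned char*>(src);
    return reinterpret_cast<const float*>(base + y * step_bytes + x * sizeof(float));
}

// Element-based form: adding to a float* already scales by sizeof(float),
// so the byte step is converted to an element count first.
inline const float* roi_start_elems(const float* src, std::size_t step_bytes,
                                    std::size_t x, std::size_t y) {
    assert(step_bytes % sizeof(float) == 0);  // rows must hold whole elements
    return src + y * (step_bytes / sizeof(float)) + x;
}
```

Both functions return the same address for the same (x, y). The mistake described above corresponds to writing src + y * step_bytes on a float pointer, which mixes the two forms.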