I am trying to embed nppiFilterMedian_16u_C1R into my real-time processing pipeline, but it runs too slowly for my needs. I am using CUDA 10.0 on a Jetson Nano.
Where can I find more information about the implementation of nppiFilterMedian_16u_C1R? In particular, I have two questions:
- What algorithm is used to find median values? Is it the most efficient algorithm possible on CUDA?
- How is the border case handled? Is the input matrix padded with zeros?