Could someone give one code example - how to use nppiSqrDistanceValid_Norm_8u32f_C1R_Ctx?

Could someone give a code example showing how to use nppiSqrDistanceValid_Norm_8u32f_C1R_Ctx (C++, Windows 10, CUDA 10.2)?
The function runs very slowly with the default stream now that I have created multiple streams on multiple threads.

Hi Jemma,
In CUDA Toolkit 10.2, NPP supports only the default stream; it has an issue with handling multiple streams.
That issue is fixed in CUDA Toolkit 11.2, so I recommend upgrading the CUDA Toolkit.

I’ve attached a sample showing how to use the nppiSqrDistanceValid_Norm_8u32f_C1R_Ctx API: SampleTestNPP.7z (3.9 KB)

ty, I will download CUDA Toolkit 11.2 and try to build it.