NppStreamContext usage for nppi_Ctx functions

lanzat.sergei · April 23, 2020, 9:37am

Hello,
I am trying to run an NPP _Ctx function concurrently in several CUDA streams, namely the pattern-matching function nppiSqrDistanceValid_Norm_8u32f_C1R_Ctx from the latest CUDA 10.2. The run is slow: it takes the same time as the non-stream version, nppiSqrDistanceValid_Norm_8u32f_C1R. My code is as follows:

int nstreams = 10;

Npp32f** pDst_array = new Npp32f * [nstreams];
int* pPitches_dst = new int[nstreams];

cudaStream_t* streams = new cudaStream_t[nstreams];
NppStreamContext* pNppStreamContext = new NppStreamContext[nstreams];

for (int i = 0; i < nstreams; i++)
{
    cudaStreamCreate(&(streams[i]));
}

for (int i = 0; i < nstreams; i++)
{
    int width = (roi_src[i]).width - (roi_pattern[i]).width + 1;
    int height = (roi_src[i]).height - (roi_pattern[i]).height + 1;

    nppSetStream(streams[i]);
    nppGetStreamContext(&(pNppStreamContext[i]));

    pDst_array[i] = nppiMalloc_32f_C1(width, height, &(pPitches_dst[i]));

    nppiSqrDistanceValid_Norm_8u32f_C1R_Ctx(d_src + (roi_src[i]).y * nSrcPitch + (roi_src[i]).x * sizeof(Npp8u),
                                            nSrcPitch,
                                            { (roi_src[i]).width, (roi_src[i]).height },
                                            d_patterns_array[i],
                                            d_patterns_pitch[i],
                                            { (roi_pattern[i]).width, (roi_pattern[i]).height },
                                            pDst_array[i],
                                            pPitches_dst[i],
                                            pNppStreamContext[i]);
}

What is wrong with this implementation, and what is the correct usage of NppStreamContext to make the function calls run concurrently?

Thanks a lot in advance.
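
For reference, the destination size computed in the loop above (source extent minus pattern extent, plus one) can be written as a small standalone helper. This is only a sketch of that arithmetic: Size2D and validResultSize are hypothetical names standing in for NppiSize and the inline computation, chosen so the snippet builds without the NPP headers. It does not touch streams or NppStreamContext.

```cpp
#include <cassert>

// Hypothetical stand-in for NppiSize, so this sketch builds without NPP.
struct Size2D {
    int width;
    int height;
};

// Extent of a "valid" match result: one output pixel per position at which
// the pattern lies entirely inside the source ROI.
// Returns {0, 0} if the pattern is empty or does not fit in the source.
inline Size2D validResultSize(Size2D src, Size2D pattern) {
    assert(src.width >= 0 && src.height >= 0);  // a negative ROI is a caller bug
    if (pattern.width <= 0 || pattern.height <= 0 ||
        pattern.width > src.width || pattern.height > src.height) {
        return {0, 0};
    }
    return {src.width - pattern.width + 1, src.height - pattern.height + 1};
}
```

With such a helper, the width and height passed to nppiMalloc_32f_C1 for stream i would come from validResultSize applied to roi_src[i] and roi_pattern[i], and a {0, 0} result would flag a pattern that does not fit before any allocation is attempted.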
