The first call to an nppiMalloc_8u_xx function takes too long to complete.


I call nppiMalloc_8u_xx functions to resize images and to convert color spaces.
Only the first nppiMalloc_8u_xx call in my code takes a long time to complete.
It takes about 640 ms.

Is this expected, or is something wrong?

NVIDIA graphics adapter: Quadro M2000 (GM206)

int nSrcStep = 0;
Npp8u * rgbSrcDev;
rgbSrcDev = nppiMalloc_8u_C3(1280, 720, &nSrcStep); // first call to an nppiMalloc_8u_xx function

CUDA itself and most of the libraries that ship with CUDA have internal state (also referred to as a “context”), so the first API call tends to trigger initialization of the entire software stack. That first API call is often a memory allocation call, for obvious reasons. 640 ms sounds a bit long, though; maybe this is a fairly slow computer?

With the correct timing methodology, this one-time startup overhead for context initialization should not matter. If this is a problem, you can trigger initialization of the CUDA context with a call to cudaFree(0) in a more opportune place; note that this doesn’t initialize NPP though, so there may still be residual startup overhead for that.
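
As a rough sketch of what such a timing methodology could look like, the host-side helper below times the very first call separately from the average of later calls, so the one-time initialization cost does not get mixed into the steady-state number. The names elapsedMs, Timing, and timeFirstAndSteady are made up for illustration and are not part of CUDA or NPP.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs f once and returns the elapsed
// wall-clock time in milliseconds.
template <typename F>
double elapsedMs(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical result type: cost of the first call versus the
// average cost of the calls that follow it.
struct Timing {
    double firstMs;
    double steadyMs;
};

// Calls f once and records that time as firstMs (this is where any
// one-time initialization shows up), then calls f `repeats` more
// times and records the average as steadyMs.
template <typename F>
Timing timeFirstAndSteady(F&& f, int repeats) {
    assert(repeats > 0);
    Timing t{};
    t.firstMs = elapsedMs(f);
    double total = 0.0;
    for (int i = 0; i < repeats; ++i) {
        total += elapsedMs(f);
    }
    t.steadyMs = total / repeats;
    return t;
}
```

In the program from the question, one would presumably call cudaFree(0) once at startup and then pass timeFirstAndSteady a lambda that calls nppiMalloc_8u_C3(1280, 720, &nSrcStep) followed by nppiFree() on the returned pointer. Allocation calls block the host, so plain host-side timing is adequate for them; asynchronous work such as kernel launches would need a cudaDeviceSynchronize() inside the lambda before the clock stops.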

Also, if you’re not using CUDA 8, switch to CUDA 8. Some modifications were made to the NPP library structure in CUDA 8 to reduce first-time initialization overhead.