Calling VPI for computation is very slow

My device is a Jetson AGX Xavier, and I have already run sudo jetson_clocks --max.
The problem is that creating the wrapper takes 20 ms:
CHECK_STATUS(vpiImageCreateWrapperOpenCVMat(mPic, 0, &image));
Waiting for the following work to complete in vpiStreamSync then takes another 20 ms:
CHECK_STATUS(vpiSubmitConvertImageFormat(stream, VPI_BACKEND_CUDA, image, imageNV12, NULL));
CHECK_STATUS(vpiSubmitRescale(stream, backend, imageNV12, outputNV12, VPI_INTERP_LINEAR, VPI_BORDER_CLAMP, 0));
CHECK_STATUS(vpiSubmitConvertImageFormat(stream, VPI_BACKEND_CUDA, outputNV12, output, NULL));
CHECK_STATUS(vpiStreamSync(stream));

The code is based on the VPI rescale example: https://

Moving to Jetson AGX forum (Jetson AGX Xavier - NVIDIA Developer Forums)


Please try it with batch execution.

Usually, the first few GPU calls tend to be slower because of one-time initialization.
A warm-up loop can help give a more representative performance measurement.
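A minimal sketch of such a warm-up measurement, in plain C++ with no VPI dependency. The helper name averageMsAfterWarmup and the work callback are hypothetical; work stands in for whatever per-frame sequence is being measured (for example the wrap, submit, and vpiStreamSync calls from the question).

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs `work` `warmup` times without timing it, then `iterations` times
// under a timer, and returns the average milliseconds per timed call.
// `work` is a hypothetical stand-in for the per-frame processing; it is
// not part of the VPI API.
double averageMsAfterWarmup(const std::function<void()> &work, int warmup, int iterations)
{
    assert(warmup >= 0 && iterations > 0);

    // Untimed calls: these absorb one-time initialization cost.
    for (int i = 0; i < warmup; ++i)
        work();

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        work();
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

Passing a lambda that submits the three operations and calls vpiStreamSync would give a steady-state figure that can be compared against the 20 ms seen on the first call.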

You can find more information in the document below:


I am running a YOLOv5 model in my project and want to use VPI for the scaling step of the pre-processing.
To be more specific: the vpiImageCreateWrapperOpenCVMat wrapper takes 20 ms and the computation takes another 20 ms. The input image is 1920 x 1920, and the scaled output is 640 x 640.

VPIImage imageNV12 = nullptr;
VPIImage outputNV12 = nullptr;
VPIImage output = nullptr;
VPIStream stream = nullptr;

All of the objects above are created once, before the online processing starts. Per frame, only the input image changes, so only its wrapper is recreated each time. In addition, destroying the wrapper with vpiImageDestroy(image) takes 4 ms.


If you use the CUDA backend, the measured time will include some memory allocation and transfer, since the OpenCV image lives in a CPU buffer.
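To get a rough idea of how much data is involved, here is a small sketch that computes buffer sizes for the image dimensions mentioned in this thread. It assumes the input cv::Mat is a tightly packed 8-bit, 3-channel image (the thread does not state the format) and ignores any row padding; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Bytes in a tightly packed 8-bit, 3-channel image (for example BGR).
// Assumption: no row padding.
std::size_t packedBgrBytes(std::size_t width, std::size_t height)
{
    return width * height * 3;
}

// Bytes in an NV12 image: one full-resolution 8-bit luma plane plus one
// half-resolution plane of interleaved 8-bit U and V samples.
// Assumption: even dimensions and no row padding.
std::size_t nv12Bytes(std::size_t width, std::size_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    return width * height + (width / 2) * (height / 2) * 2;
}
```

Under these assumptions, packedBgrBytes(1920, 1920) is 11,059,200 bytes (about 10.5 MiB) for the wrapped input and nv12Bytes(1920, 1920) is 5,529,600 bytes for the intermediate image, while the 640 x 640 outputs are 614,400 bytes (NV12) and 1,228,800 bytes (3-channel).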

What kind of pre-processing do you need?
We recommend checking the DeepStream SDK, which can do simple pre-processing such as scaling.

