VPI + DeepStream is slower than expected (only wrapping)

• Hardware Platform (Jetson Nano)
• DeepStream Version 6.0.0
• JetPack Version (4.6)
• Issue Type (questions)

I built an OpenCV prototype in Python on my workstation, and now I want to port the algorithm to DeepStream on a Jetson using VPI.

I use this patch as a reference and want to wrap DS frames for use in a VPI algorithm: Deepstream SDK + VPI on Jetson tx2 - #21 by AastaLLL

It works, but not at full speed. As a first step I want to do only the wrapping: NvBufSurface → EGLImage → CUDA interop → VPI wrapper:


    // Describe the CUDA-mapped DeepStream frame for VPI.
    memset(&data, 0, sizeof(data));
    data.format = VPI_IMAGE_FORMAT_RGBA8;
    data.numPlanes = surface->surfaceList[0].planeParams.num_planes;
    for (int i = 0; i < data.numPlanes; i++) {
        data.planes[i].width = surface->surfaceList[0].planeParams.width[i];
        data.planes[i].height = surface->surfaceList[0].planeParams.height[i];
        data.planes[i].pitchBytes = surface->surfaceList[0].planeParams.pitch[i];
        data.planes[i].data = egl_frame.frame.pPitch[i];
    }

    // Wrap that memory in a VPI image.
    CHECK_VPI_STATUS(vpiImageCreateCUDAMemWrapper(&data, 0, &img));



Without this code I get 60 fps as expected, but with this code from vpi_wrap.patch I get only 30-45 fps.
What could cause this drop? The documentation says that there is no copy (only the headers are copied), but the fps drops far too much for a step that does no processing.

Best regards,


Please make sure you have maximized the device performance first.
Can the pipeline reach 60 fps without the VPI wrapping?


I found what causes the slowdown:

    CHECK_VPI_STATUS(vpiImageCreateCUDAMemWrapper(&data, 0, &img));

I haven’t checked yet whether it is really the same image (NvSurface == vpiImage), but I changed the algorithm to this:

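    if (ds_manager->img == nullptr) {
        // First frame: create the wrapper once.
        vpiImageCreateCUDAMemWrapper(&data, 0, &ds_manager->img);
    } else {
        // Later frames: point the existing wrapper at the new buffer.
        vpiImageSetWrappedCUDAMem(ds_manager->img, &data);
    }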

One thing was not clear to me at first:

I thought this was only a wrapper, but underneath, this function creates a new image:
[out] img Pointer to memory that will receive the created image handle.

Do I understand correctly that this function:
really is wrapping, but creating the underlying image (a new one each time) is really expensive, while the copying costs little?

How should I draw lines/rectangles/circles on images? With NvOSD or CUDA?

Best regards,

There is no update from you for a period, assuming this is not an issue anymore. Hence we are closing this topic. If need further support, please open a new one.


The wrapping doesn’t create the buffer, only the wrapper handle.
vpiImageSetWrappedCUDAMem is used to redefine the wrapper so that it points to another memory buffer.

For your use case, you can create the wrapper once at initialization
and redefine the pointer with vpiImageSetWrappedCUDAMem at runtime (if the buffer pointer changes).
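
To make the create-once, rebind-per-frame pattern concrete, here is a minimal sketch that compiles without the VPI headers. Everything prefixed with Fake/fake is a hypothetical stand-in, not the VPI API: fakeCreateWrapper plays the role of vpiImageCreateCUDAMemWrapper, fakeSetWrapped the role of vpiImageSetWrappedCUDAMem, and fakeDestroy the role of vpiImageDestroy. DsManager and wrapFrame are illustrative names as well. The counters exist only to show how often each path is taken.

```cpp
#include <cassert>

// Hypothetical stand-in for VPIImageData: describes one plane of frame memory.
struct FakeImageData {
    void* plane0;
    int width;
    int height;
    int pitchBytes;
};

// Hypothetical stand-in for the object behind a VPIImage handle.
struct FakeImage {
    FakeImageData bound;
};

static int g_creates = 0;  // how many wrapper handles were created
static int g_rebinds = 0;  // how many times an existing wrapper was re-pointed

// Stand-in for vpiImageCreateCUDAMemWrapper: allocates a new wrapper handle.
int fakeCreateWrapper(const FakeImageData* d, FakeImage** out) {
    assert(d != nullptr && out != nullptr);
    *out = new FakeImage{*d};
    ++g_creates;
    return 0;
}

// Stand-in for vpiImageSetWrappedCUDAMem: re-points an existing wrapper.
int fakeSetWrapped(FakeImage* img, const FakeImageData* d) {
    assert(img != nullptr && d != nullptr);
    img->bound = *d;
    ++g_rebinds;
    return 0;
}

// Stand-in for vpiImageDestroy.
void fakeDestroy(FakeImage* img) { delete img; }

// Per-pipeline state that outlives individual frames (illustrative name).
struct DsManager {
    FakeImage* img = nullptr;
};

// Called once per frame: create the wrapper the first time, rebind afterwards.
int wrapFrame(DsManager& m, const FakeImageData& d) {
    if (m.img == nullptr) {
        return fakeCreateWrapper(&d, &m.img);
    }
    return fakeSetWrapped(m.img, &d);
}

// Called once at shutdown so the single wrapper handle is released.
void release(DsManager& m) {
    if (m.img != nullptr) {
        fakeDestroy(m.img);
        m.img = nullptr;
    }
}
```

In a real pipeline the three stand-ins would be replaced by the VPI calls they imitate, with the return status checked (for example through CHECK_VPI_STATUS as in the snippets above). After N frames this does one create and N-1 rebinds, instead of N creates.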