NvBuffer VPI Interoperability

busch.johannes · June 4, 2021, 12:59pm

Hey,
I want to use VPI for remapping.
Therefore I get Images from Argus EGL Stream. I use the NV::IImageNativeBuffer interface to create NvBuffer fd from it. I use the fd to enque into NvDrmRenderer for display.

Now I want to have an vpi calculation step in between.
So I copy the EGL Image to an NvBuffer mapped as vpiImage for input. And the NvBuffer for display as output. But the output buffer is not effected at all. My code:

To program it faster, I used the first 3 images to create NvBuffers from them and display them right away. I also copy the first image to my vpi_input buffer. Instead of vpiImageSetWrapper I use vpiImageCreateNvBufferWrapper now, since vpiImageSetWrapper I was trying that out. I know vpiImageSetWrapper would be the better option.

if(frame_count == 0){

        vpi_fd_input = iNativeBuffer->createNvBuffer(streamSize,NvBufferColorFormat_YUV420,NvBufferLayout_Pitch);
        vpiImageCreateNvBufferWrapper(vpi_fd_input, &params, VPI_BACKEND_CUDA , &input_vpi);


    }

    VPIImageFormat 	format;
    vpiImageGetFormat(output_vpi,&format);
    printf("Finished init.\n");
    printf("%s\n", vpiImageFormatGetName(format));
    VPIStatus status_vpi;
    int current_fd;
    if(frame_count / 3 < 3) {
        grid_fd[frame_count / 3 ] = iNativeBuffer->createNvBuffer(streamSize,NvBufferColorFormat_YUV420,NvBufferLayout_Pitch);
        current_fd = grid_fd[frame_count / 3];
    }
    else {
        printf("Starting vpi.\n");
        current_fd = drm_renderer->dequeBuffer();
        iNativeBuffer->copyToNvBuffer(vpi_fd_input);
        iNativeBuffer->copyToNvBuffer(current_fd);
        status_vpi = vpiImageCreateNvBufferWrapper(current_fd, &params, VPI_BACKEND_CUDA , &output_vpi);
        if(status_vpi) printf("%s\n",vpiStatusGetName(status_vpi));
        status_vpi = vpiSubmitRemap(stream_vpi, VPI_BACKEND_CUDA, warp_vpi, input_vpi, output_vpi, VPI_INTERP_LINEAR, VPI_BORDER_ZERO, 0);
        if(status_vpi) printf("%s\n",vpiStatusGetName(status_vpi));
        vpiStreamSync(stream_vpi);
        printf("Finished vpi.\n");
    }
drm_renderer->enqueBuffer(current_fd);

I used the example from vpi remap.

int32_t w=1920, h=1080;

vpiStreamCreate(0, &stream_vpi);
memset(&map_vpi, 0, sizeof(map_vpi));
map_vpi.grid.numHorizRegions  = 1;
map_vpi.grid.numVertRegions   = 1;
map_vpi.grid.regionWidth[0]   = w;
map_vpi.grid.regionHeight[0]  = h;
map_vpi.grid.horizInterval[0] = 1;
map_vpi.grid.vertInterval[0]  = 1;
vpiWarpMapAllocData(&map_vpi);

vpiWarpMapGenerateIdentity(&map_vpi);
int i;
for (i = 0; i < map_vpi.numVertPoints; ++i)
{
    VPIKeypoint *row = (VPIKeypoint *)((uint8_t *)map_vpi.keypoints + map_vpi.pitchBytes * i);
    int j;
    for (j = 0; j < map_vpi.numHorizPoints; ++j)
    {
        float x = row[j].x - w / 2.0f;
        float y = row[j].y - h / 2.0f;

        const float R = h / 8.0f; /* planet radius */

        const float r = sqrtf(x * x + y * y);

        float theta = M_PI + atan2f(y, x);
        float phi   = M_PI / 2 - 2 * atan2f(r, 2 * R);

        row[j].x = fmod((theta + M_PI) / (2 * M_PI) * (w - 1), w - 1);
        row[j].y = (phi + M_PI / 2) / M_PI * (h - 1);
    }
}

vpiCreateRemap(VPI_BACKEND_CUDA, &map_vpi, &warp_vpi);
VPIWrapNvBufferParams params;
vpiInitWrapNvBufferParams(&params);

Thanks for any help!

EDIT: I measured time and it takes 2-3ms to run, so it seems the algorithm is calculating something.

AastaLLL · June 7, 2021, 2:45am

Hi,

We have an example for EGL buffer → VPI → EGL buffer below:
https://forums.developer.nvidia.com/t/deepstream-sdk-vpi-on-jetson-tx2/166834/21

...

/* map inbuf -> EGL */
if (NvBufSurfaceMapEglImage (surface, -1) != 0) {
  g_print ("Error: Could not map EglImage from NvBufSurface for dsexample\n");
  goto error;
}
  
if (cuGraphicsEGLRegisterImage (&pResource,
      surface->surfaceList[0].mappedAddr.eglImage,
      CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE) != CUDA_SUCCESS) {
  g_print ("Error: Failed to register EGLImage in cuda\n");
  goto error;
}
 
if (cuGraphicsResourceGetMappedEglFrame (&eglFrame,
      pResource, 0, 0) != CUDA_SUCCESS) {
  g_print ("Error: Failed to get mapped EGL Frame\n");
  goto error;
}
cuCtxSynchronize();


/* inter_buf -> VPI */
memset(&data, 0, sizeof(data));
data.format = VPI_IMAGE_FORMAT_RGBA8;
data.numPlanes = surface->surfaceList[0].planeParams.num_planes;
for(i=0; i<data.numPlanes; i++) {
  data.planes[i].width = surface->surfaceList[0].planeParams.width[i];
  data.planes[i].height = surface->surfaceList[0].planeParams.height[i];
  data.planes[i].pitchBytes = surface->surfaceList[0].planeParams.pitch[i];
  data.planes[i].data = eglFrame.frame.pPitch[i];
}
CHECK_VPI_STATUS(vpiImageCreateCUDAMemWrapper(&data, 0, &img));
CHECK_VPI_STATUS(vpiImageCreate(data.planes[0].width, data.planes[0].height, VPI_IMAGE_FORMAT_RGBA8, 0, &out));
 
...

/* Apply warping */
CHECK_VPI_STATUS(vpiSubmitConvertImageFormat(dsexample->vpi_stream, VPI_BACKEND_CUDA, img, out, NULL));
CHECK_VPI_STATUS(vpiSubmitPerspectiveWarp(dsexample->vpi_stream, 0, dsexample->warp, img, xform, out
                                            , VPI_INTERP_LINEAR, VPI_BORDER_ZERO, 0));
CHECK_VPI_STATUS(vpiSubmitConvertImageFormat(dsexample->vpi_stream, VPI_BACKEND_CUDA, out, img, NULL));
CHECK_VPI_STATUS(vpiStreamSync(dsexample->vpi_stream));

...

Does this meet your requirement?
Or you prefer to wrap a VPI image with vpiImageCreateNvBufferWrapper?

Thanks.

busch.johannes · June 7, 2021, 8:56am

Hey, thanks for this example!
In the end, it is not important how it is done. But I want to have the most low latency for receiving Argus Stream in NV12 and display it with NvDrmRenderer in NV12.

So as output buffer I would need a NvBuffer. As input buffer I could also get the image with CuEGLStreamConsumerAcquireFrame, right? So there is no copy involved?

After creating my output buffer, I could also wrap it in an EGL Image and wrap it into CUDA from there or use vpi egl image wrapper functions.

Since the vpi is support nvBuffer wrapper I thought it would be most convenient, but as I understand you, you recommend wrapping in CUDA, since there are more examples and it is better tested?

Best regards,
jb

AastaLLL · June 22, 2021, 6:49am

Hi,

You can use vpiImageCreateNvBufferWrapper as well.
The implementation should be similar to the sample shared in Jun 7.

Since we don’t have an example for vpiImageCreateNvBufferWrapper.
If you meet an issue when using it, could you attach a complete source for us checking?

Thanks.

busch.johannes · June 22, 2021, 8:59pm

Hey thanks again for your response!

I need to finish another project till end of the week, but I’ll respond next week and share my code!

Thanks!

AastaLLL · July 5, 2021, 4:26am

There is no update from you for a period, assuming this is not an issue any more.
Hence we are closing this topic. If need further support, please open a new one.
Thanks

Hi,

Did you get some time to check this last week?
Thanks.

system · September 12, 2021, 1:44am

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Using VPI in a Custom DeepStream Plugin DeepStream SDK gstreamer , vpi	4	1741	October 12, 2021
vpiSubmitTemporalNoiseReduction fails with VPI_ERROR_INVALID_ARGUMENT on buffer created by vpiImageCreateWrapper/VPI_IMAGE_BUFFER_CUDA_PITCH_LINEAR Jetson AGX Orin cuda , vpi	5	34	December 30, 2024
How to prevent vpiSubmitConvertImageFormat from calling cudaGraphicsEGLRegisterImage, which kills performance? Jetson AGX Orin cuda	9	71	December 5, 2024
VPI Image wraper around NvBufSurface on Jetson AGX Xavier Jetson AGX Xavier vpi	10	796	December 26, 2023
Vpi nvbuffer question Jetson AGX Xavier jetson-inference	9	650	April 26, 2023
Using VPI in GStreamer Jetson AGX Orin camera , gstreamer , documentation , vpi	51	4931	March 8, 2023
Camera DMA buffer to VPIImage as efficiently as possible Jetson AGX Orin camera , vpi	6	422	April 23, 2024
How do I get image from cudaBayerDemosaic and connect to VPI? Jetson Nano vpi	17	1709	July 12, 2022
Converting an EGLImageKHR to a VPIImage throws an error Jetson AGX Orin vpi	11	81	December 4, 2024
VPI CUDA interop with managed memory Jetson AGX Xavier cuda , vpi	16	1702	October 18, 2021

NvBuffer VPI Interoperability

Related topics