NvBuffer VPI Interoperability

Hey,
I want to use VPI for remapping.
I get images from an Argus EGL stream and use the NV::IImageNativeBuffer interface to create an NvBuffer fd from each one. I then enqueue that fd into NvDrmRenderer for display.

Now I want to add a VPI processing step in between.
So I copy the EGL image into an NvBuffer that is wrapped as a VPIImage for input, and I wrap the NvBuffer used for display as the output. But the output buffer is not affected at all. My code:

To get something running quickly, I create NvBuffers from the first 3 images and display them right away. I also copy the first image to my vpi_input buffer. For now I use vpiImageCreateNvBufferWrapper instead of vpiImageSetWrapper, just to try it out. I know vpiImageSetWrapper would be the better option.

    if(frame_count == 0){
        vpi_fd_input = iNativeBuffer->createNvBuffer(streamSize,NvBufferColorFormat_YUV420,NvBufferLayout_Pitch);
        vpiImageCreateNvBufferWrapper(vpi_fd_input, &params, VPI_BACKEND_CUDA , &input_vpi);
    }

    VPIImageFormat format;
    vpiImageGetFormat(output_vpi,&format);
    printf("Finished init.\n");
    printf("%s\n", vpiImageFormatGetName(format));
    VPIStatus status_vpi;
    int current_fd;
    if(frame_count / 3 < 3) {
        grid_fd[frame_count / 3 ] = iNativeBuffer->createNvBuffer(streamSize,NvBufferColorFormat_YUV420,NvBufferLayout_Pitch);
        current_fd = grid_fd[frame_count / 3];
    }
    else {
        printf("Starting vpi.\n");
        current_fd = drm_renderer->dequeBuffer();
        iNativeBuffer->copyToNvBuffer(vpi_fd_input);
        iNativeBuffer->copyToNvBuffer(current_fd);
        status_vpi = vpiImageCreateNvBufferWrapper(current_fd, &params, VPI_BACKEND_CUDA , &output_vpi);
        if(status_vpi) printf("%s\n",vpiStatusGetName(status_vpi));
        status_vpi = vpiSubmitRemap(stream_vpi, VPI_BACKEND_CUDA, warp_vpi, input_vpi, output_vpi, VPI_INTERP_LINEAR, VPI_BORDER_ZERO, 0);
        if(status_vpi) printf("%s\n",vpiStatusGetName(status_vpi));
        vpiStreamSync(stream_vpi);
        printf("Finished vpi.\n");
    }
    drm_renderer->enqueBuffer(current_fd);

The warp map setup follows the VPI remap sample:

int32_t w=1920, h=1080;

vpiStreamCreate(0, &stream_vpi);
memset(&map_vpi, 0, sizeof(map_vpi));
map_vpi.grid.numHorizRegions  = 1;
map_vpi.grid.numVertRegions   = 1;
map_vpi.grid.regionWidth[0]   = w;
map_vpi.grid.regionHeight[0]  = h;
map_vpi.grid.horizInterval[0] = 1;
map_vpi.grid.vertInterval[0]  = 1;
vpiWarpMapAllocData(&map_vpi);

vpiWarpMapGenerateIdentity(&map_vpi);
int i;
for (i = 0; i < map_vpi.numVertPoints; ++i)
{
    VPIKeypoint *row = (VPIKeypoint *)((uint8_t *)map_vpi.keypoints + map_vpi.pitchBytes * i);
    int j;
    for (j = 0; j < map_vpi.numHorizPoints; ++j)
    {
        float x = row[j].x - w / 2.0f;
        float y = row[j].y - h / 2.0f;

        const float R = h / 8.0f; /* planet radius */

        const float r = sqrtf(x * x + y * y);

        float theta = M_PI + atan2f(y, x);
        float phi   = M_PI / 2 - 2 * atan2f(r, 2 * R);

        row[j].x = fmod((theta + M_PI) / (2 * M_PI) * (w - 1), w - 1);
        row[j].y = (phi + M_PI / 2) / M_PI * (h - 1);
    }
}

vpiCreateRemap(VPI_BACKEND_CUDA, &map_vpi, &warp_vpi);
VPIWrapNvBufferParams params;
vpiInitWrapNvBufferParams(&params);

Thanks for any help!

EDIT: I measured the time and the VPI step takes 2-3 ms to run, so the algorithm does seem to be computing something.
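
One way to rule out the warp map itself as the cause is to evaluate the per-point formula from the loop above on the CPU for a few pixels. The following is only a sketch: `Point2f`, `planetMap` and `kPi` are hypothetical names that are not part of VPI, and the body simply repeats the arithmetic of the loop with standard C++ math functions.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper type, not a VPI type.
struct Point2f { float x, y; };

// Maps one output pixel (px, py) to its source coordinate using the same
// formulas as the warp map loop in the post. w and h are the frame size.
Point2f planetMap(float px, float py, int w, int h)
{
    assert(w > 1 && h > 1);
    const float kPi = 3.14159265358979f;

    const float x = px - w / 2.0f;
    const float y = py - h / 2.0f;

    const float R = h / 8.0f;                 // planet radius
    const float r = std::sqrt(x * x + y * y); // distance from the frame centre

    const float theta = kPi + std::atan2(y, x);
    const float phi   = kPi / 2 - 2 * std::atan2(r, 2 * R);

    Point2f src;
    src.x = std::fmod((theta + kPi) / (2 * kPi) * (w - 1), float(w - 1));
    src.y = (phi + kPi / 2) / kPi * (h - 1);
    return src;
}
```

For a 1920x1080 frame, the centre pixel (960, 540) maps to roughly (0, 1079), i.e. the bottom row of the input. So if the remap were actually writing to the displayed buffer, the output would look clearly different from the input.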

Hi,

We have an example for EGL buffer → VPI → EGL buffer below:
https://forums.developer.nvidia.com/t/deepstream-sdk-vpi-on-jetson-tx2/166834/21

...

/* map inbuf -> EGL */
if (NvBufSurfaceMapEglImage (surface, -1) != 0) {
  g_print ("Error: Could not map EglImage from NvBufSurface for dsexample\n");
  goto error;
}
  
if (cuGraphicsEGLRegisterImage (&pResource,
      surface->surfaceList[0].mappedAddr.eglImage,
      CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE) != CUDA_SUCCESS) {
  g_print ("Error: Failed to register EGLImage in cuda\n");
  goto error;
}
 
if (cuGraphicsResourceGetMappedEglFrame (&eglFrame,
      pResource, 0, 0) != CUDA_SUCCESS) {
  g_print ("Error: Failed to get mapped EGL Frame\n");
  goto error;
}
cuCtxSynchronize();


/* inter_buf -> VPI */
memset(&data, 0, sizeof(data));
data.format = VPI_IMAGE_FORMAT_RGBA8;
data.numPlanes = surface->surfaceList[0].planeParams.num_planes;
for(i=0; i<data.numPlanes; i++) {
  data.planes[i].width = surface->surfaceList[0].planeParams.width[i];
  data.planes[i].height = surface->surfaceList[0].planeParams.height[i];
  data.planes[i].pitchBytes = surface->surfaceList[0].planeParams.pitch[i];
  data.planes[i].data = eglFrame.frame.pPitch[i];
}
CHECK_VPI_STATUS(vpiImageCreateCUDAMemWrapper(&data, 0, &img));
CHECK_VPI_STATUS(vpiImageCreate(data.planes[0].width, data.planes[0].height, VPI_IMAGE_FORMAT_RGBA8, 0, &out));
 
...

/* Apply warping */
CHECK_VPI_STATUS(vpiSubmitConvertImageFormat(dsexample->vpi_stream, VPI_BACKEND_CUDA, img, out, NULL));
CHECK_VPI_STATUS(vpiSubmitPerspectiveWarp(dsexample->vpi_stream, 0, dsexample->warp, img, xform, out,
                                          VPI_INTERP_LINEAR, VPI_BORDER_ZERO, 0));
CHECK_VPI_STATUS(vpiSubmitConvertImageFormat(dsexample->vpi_stream, VPI_BACKEND_CUDA, out, img, NULL));
CHECK_VPI_STATUS(vpiStreamSync(dsexample->vpi_stream));

...

Does this meet your requirement?
Or would you prefer to wrap a VPI image with vpiImageCreateNvBufferWrapper?

Thanks.
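
The snippet above uses CHECK_VPI_STATUS without showing its definition, while the code in the first post only prints some of the returned statuses. A status-checking macro of this kind might look roughly like the sketch below. So that it builds without the VPI headers, it uses stand-ins: `DemoStatus`, `demoStatusName`, `demoWrapBuffer` and `CHECK_DEMO_STATUS` are all hypothetical, and in real code they would correspond to VPIStatus, vpiStatusGetName and the actual VPI calls.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Stand-in for VPIStatus so the sketch builds without VPI (assumption).
enum DemoStatus { DEMO_SUCCESS = 0, DEMO_ERROR_INVALID_ARGUMENT = 1 };

// Stand-in for vpiStatusGetName().
inline const char *demoStatusName(DemoStatus s)
{
    return s == DEMO_SUCCESS ? "DEMO_SUCCESS" : "DEMO_ERROR_INVALID_ARGUMENT";
}

// Evaluates the call exactly once and throws, with file and line,
// if the returned status is not the success value.
#define CHECK_DEMO_STATUS(call)                                        \
    do {                                                               \
        const DemoStatus status_ = (call);                             \
        if (status_ != DEMO_SUCCESS) {                                 \
            std::ostringstream msg_;                                   \
            msg_ << __FILE__ << ":" << __LINE__ << ": " << #call       \
                 << " failed with " << demoStatusName(status_);        \
            throw std::runtime_error(msg_.str());                      \
        }                                                              \
    } while (0)

// Hypothetical operation that rejects a negative file descriptor.
inline DemoStatus demoWrapBuffer(int fd)
{
    return fd < 0 ? DEMO_ERROR_INVALID_ARGUMENT : DEMO_SUCCESS;
}
```

Wrapping every call this way, including the wrapper creation and the stream sync, makes it harder for a failed step to go unnoticed when the output buffer stays unchanged.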

Hey, thanks for this example!
In the end, it does not matter how it is done. But I want the lowest possible latency for receiving the Argus stream in NV12 and displaying it with NvDrmRenderer in NV12.

So as the output buffer I would need an NvBuffer. As the input buffer I could also get the image with cuEGLStreamConsumerAcquireFrame, right? Then there would be no copy involved?

After creating my output buffer, I could also wrap it in an EGL image and map it into CUDA from there, or use the VPI EGL image wrapper functions.

Since VPI supports an NvBuffer wrapper, I thought that would be the most convenient. But if I understand you correctly, you recommend wrapping through CUDA, since there are more examples and it is better tested?

Best regards,
jb
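
With NV12 on both ends, the per-plane loop in the CUDA wrapper example would have to describe two planes (Y and interleaved UV) instead of a single RGBA plane. The sketch below shows that geometry. `PlaneDims`, `Nv12Layout`, `nv12Layout` and `minPitchBytes` are hypothetical names, and the real pitch has to be read from the buffer parameters because hardware buffers may pad each row.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical plane descriptor, mirroring only the width/height fields
// that the CUDA wrapper example fills in per plane.
struct PlaneDims {
    int32_t width;          // in samples: pixels for Y, UV pairs for chroma
    int32_t height;
    int32_t bytesPerSample; // 1 for Y, 2 for interleaved UV
};

struct Nv12Layout {
    PlaneDims luma;
    PlaneDims chroma;
};

// NV12 is 4:2:0: full-resolution Y plane plus one half-resolution UV plane.
Nv12Layout nv12Layout(int32_t w, int32_t h)
{
    assert(w > 0 && h > 0 && w % 2 == 0 && h % 2 == 0);
    Nv12Layout l;
    l.luma   = {w, h, 1};
    l.chroma = {w / 2, h / 2, 2};
    return l;
}

// Smallest possible row size in bytes; the actual pitch may be larger.
int32_t minPitchBytes(const PlaneDims &p)
{
    return p.width * p.bytesPerSample;
}
```

Note that the snippets in the first post create their buffers with NvBufferColorFormat_YUV420, which as far as I know is a three-plane layout rather than NV12, so the plane count there would differ from this sketch.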

Hi,

You can use vpiImageCreateNvBufferWrapper as well.
The implementation should be similar to the sample shared on Jun 7.

We don’t have an example for vpiImageCreateNvBufferWrapper, so if you run into an issue when using it, could you attach the complete source for us to check?

Thanks.
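
One detail in the loop from the first post: it calls vpiImageCreateNvBufferWrapper for every dequeued frame and overwrites output_vpi each time. Since the renderer cycles through a small set of fds, one option is to create the wrapper once per fd and reuse it. The sketch below shows that idea with a generic cache. `FdWrapperCache` is a hypothetical helper, `Handle` stands in for VPIImage, and the create/destroy callbacks would call vpiImageCreateNvBufferWrapper and vpiImageDestroy in real code. Whether a wrapper stays valid across enqueue/dequeue cycles is an assumption that would need to be verified.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>

// Generic per-fd cache. Handle stands in for VPIImage (assumption).
template <typename Handle>
class FdWrapperCache {
public:
    using Create  = std::function<Handle(int fd)>;
    using Destroy = std::function<void(Handle)>;

    FdWrapperCache(Create create, Destroy destroy)
        : create_(std::move(create)), destroy_(std::move(destroy)) {}

    // Non-copyable so that each handle is destroyed exactly once.
    FdWrapperCache(const FdWrapperCache &) = delete;
    FdWrapperCache &operator=(const FdWrapperCache &) = delete;

    ~FdWrapperCache()
    {
        for (auto &entry : cache_) destroy_(entry.second);
    }

    // Returns the cached handle for fd, creating it on first use.
    Handle get(int fd)
    {
        assert(fd >= 0);
        auto it = cache_.find(fd);
        if (it == cache_.end())
            it = cache_.emplace(fd, create_(fd)).first;
        return it->second;
    }

    std::size_t size() const { return cache_.size(); }

private:
    Create create_;
    Destroy destroy_;
    std::unordered_map<int, Handle> cache_;
};
```

In the frame loop, `cache.get(current_fd)` would then replace the per-frame wrapper creation, which also keeps that work out of the latency-critical path after the first pass through the buffers.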

Hey, thanks again for your response!

I need to finish another project by the end of the week, but I’ll respond next week and share my code!

Thanks!

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks.

Hi,

Did you get some time to check this last week?
Thanks.
