DeepStream SDK + VPI on Jetson TX2

Hi, I am trying to use VPI (0.4.4) with DeepStream 5.0 on a Jetson TX2. I have modified the dsexample GStreamer plugin to call the VPI library. The problem seems to be that the buffer has the wrong memory format: the data comes out of nvvideoconvert as VPI_IMAGE_FORMAT_NV12_BL instead of VPI_IMAGE_FORMAT_NV12.

From a previous thread (Using VPI with VIC backend in Deepstream pipeline on AGX Xavier), it looks like the format should be changed to pitch linear. However, that does not seem to be the case. Below is my command line, which is the one recommended in the previous thread.

gst-launch-1.0 filesrc location=/opt/nvidia/deepstream/deepstream-5.0/samples/streams/sample_720p.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=NV12,colorimetry=bt709' ! dsexample ! nvegltransform ! nveglglessink

Hi,

Could you tell us which error or problem you are seeing?

In general, you can get a pitch linear buffer with the pipeline shared in the topic 161278.
Then wrap it with vpiImageCreateCUDAMemWrapper.

If there is an issue, please share the detailed information and the corresponding source with us.

Thanks.

Hi,

First, we recommend upgrading your device to JetPack 4.5 to get our latest VPI v1.0 release.

To get pitch-linear RGBA data, you can try the pipeline below:

gst-launch-1.0 filesrc location=/opt/nvidia/deepstream/deepstream-5.0/samples/streams/sample_720p.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! dsexample ! nvegltransform ! nveglglessink

If you want an NV12 pitch-linear buffer, please try creating the buffer through NvBufSurface before mapping it to a VPI image.
You can find a demo of NvBufSurface API below:
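
Independently of that demo, the memory layout of a pitch-linear NV12 frame can be sketched as follows. This is a minimal illustration in plain C++, not NvBufSurface or VPI code: the struct names, the back-to-back plane placement, and the pitch alignment passed in by the caller are all assumptions made for the example. On a real surface the pitch and offsets should be read from the plane parameters rather than computed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical description of one plane of a pitch-linear image.
struct PlaneLayout {
    std::uint32_t width;       // pixels per row (for chroma: UV pairs per row)
    std::uint32_t height;      // number of rows
    std::uint32_t pitchBytes;  // bytes from the start of one row to the next
    std::size_t offsetBytes;   // plane start, relative to the buffer start
};

// Hypothetical description of a whole NV12 frame.
struct Nv12Layout {
    PlaneLayout luma;
    PlaneLayout chroma;
    std::size_t totalBytes;
};

// Round value up to the next multiple of alignment.
inline std::uint32_t alignUp(std::uint32_t value, std::uint32_t alignment) {
    assert(alignment > 0);
    return (value + alignment - 1) / alignment * alignment;
}

// Layout of an NV12 frame with even dimensions, assuming the two planes are
// stored back to back and share one row pitch (an assumption of this sketch).
inline Nv12Layout computeNv12Layout(std::uint32_t width, std::uint32_t height,
                                    std::uint32_t pitchAlign) {
    // Luma uses 1 byte per pixel, so a row needs at least `width` bytes.
    const std::uint32_t pitch = alignUp(width, pitchAlign);

    Nv12Layout layout{};
    layout.luma = {width, height, pitch, 0};
    // Chroma holds width/2 interleaved UV pairs per row (2 bytes each, so
    // `width` bytes again) and half as many rows as luma.
    layout.chroma = {width / 2, height / 2, pitch,
                     static_cast<std::size_t>(pitch) * height};
    layout.totalBytes = layout.chroma.offsetBytes +
                        static_cast<std::size_t>(pitch) * (height / 2);
    return layout;
}
```

For example, `computeNv12Layout(1280, 720, 256)` describes a 720p frame under an assumed 256-byte pitch alignment. A block-linear buffer (the `_BL` format mentioned earlier) cannot be described this way, since its rows are not stored contiguously.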

Thanks

Thanks AastaLLL. I will do that today and get back to you soon.

I have tried your suggested command line, but I still get an error “Error: VPI_ERROR_INTERNAL in gstdsexample.cpp at line 522 ((cudaErrorInvalidResourceHandle))” when I call the vpiImageCreateCUDAMemWrapper() function. This is probably something simple. I have attached the relevant code below.

/* Map the buffer so that it can be accessed by CPU */
if (NvBufSurfaceMap (surface, 0, -1, NVBUF_MAP_READ_WRITE) != 0){
  goto error;
}

/* Cache the mapped data for CPU access */
NvBufSurfaceSyncForCpu (surface, 0, -1);

memset(&img_data, 0, sizeof(img_data));

img_data.format = VPI_IMAGE_FORMAT_RGBA8;
img_data.numPlanes = surface->surfaceList[0].planeParams.num_planes;

for(i=0; i<img_data.numPlanes; i++) {
  img_data.planes[i].width = surface->surfaceList[0].planeParams.width[i];
  img_data.planes[i].height = surface->surfaceList[0].planeParams.height[i];
  img_data.planes[i].pitchBytes = surface->surfaceList[0].planeParams.pitch[i];
  img_data.planes[i].data = (void *)&((char *)surface->surfaceList[0].mappedAddr.addr[i])[surface->surfaceList[0].planeParams.offset[1]];
}

CHECK_VPI_STATUS(vpiImageCreateCUDAMemWrapper(&img_data, 0, &img));

CHECK_VPI_STATUS(vpiSubmitConvertImageFormat(dsexample->vpi_stream, VPI_BACKEND_CPU, img, img, NULL));

CHECK_VPI_STATUS(vpiStreamSync(dsexample->vpi_stream));

if (NvBufSurfaceUnMap (surface, 0, -1)){
  goto error;
}
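
The plane loop in the code above reads offset[1] on every iteration rather than offset[i]. As a rough sketch of per-plane addressing, the helper below fills one descriptor per plane from a single base pointer plus that plane's own byte offset. The `SurfacePlanes` and `PlaneDesc` structs are hypothetical stand-ins for the surface and image-data fields, and the single-base-pointer model is an assumption of this example, not a statement about how NvBufSurfaceMap lays out its mapped addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the per-plane fields read from a surface.
struct SurfacePlanes {
    std::uint32_t numPlanes;
    std::uint32_t width[4];
    std::uint32_t height[4];
    std::uint32_t pitch[4];
    std::uint32_t offset[4];  // byte offset of each plane from the base pointer
};

// Hypothetical stand-in for one plane entry of an image descriptor.
struct PlaneDesc {
    std::uint32_t width;
    std::uint32_t height;
    std::uint32_t pitchBytes;
    void *data;
};

// Build one descriptor per plane, indexing every field (including the
// offset) with the plane's own index.
inline std::vector<PlaneDesc> describePlanes(void *base, const SurfacePlanes &s) {
    assert(base != nullptr);
    assert(s.numPlanes <= 4);

    auto *bytes = static_cast<std::uint8_t *>(base);
    std::vector<PlaneDesc> planes;
    planes.reserve(s.numPlanes);
    for (std::uint32_t i = 0; i < s.numPlanes; ++i) {
        planes.push_back({s.width[i], s.height[i], s.pitch[i],
                          bytes + s.offset[i]});
    }
    return planes;
}
```

With a single-plane RGBA surface there is only one plane to describe; the index matters once a multi-plane format such as NV12 is wrapped.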

Hi,

The buffer from vpiImageCreateCUDAMemWrapper is a GPU buffer.
But the backend in the above source is set to VPI_BACKEND_CPU.

Please note that you need a GPU backend to access a GPU buffer.

Thanks.

Yes, sorry, it appears I missed that. I did change it to VPI_BACKEND_CUDA but the result was still the same. I am running it on a TX2. Do I need to map the buffers differently?

Hi,

Would you mind sharing the complete source with us?
We want to reproduce this in our environment and check with our internal team.

Thanks.

No problem. I have been trying different things, so you may need to uncomment some things. I most recently took out the NvBufSurfaceSyncForCpu since you mentioned it was GPU memory. Let me know if you need anything else.

gst-dsexample.tar.gz (23.4 KB)

Hi,

Thanks for your source.

We are going to reproduce this internally.
We will share more information with you later.

Thanks.

Hi, just seeing if there is an update on this issue.

Hi,

Sorry, we are still checking this.
We will share more information with you later.

Thanks.

Hi,

Sorry for keeping you waiting.
We tried to reproduce this issue but got a VPI_ERROR_INVALID_ARGUMENT error rather than VPI_ERROR_INTERNAL.

We are not sure whether something differs in our deepstream-app pipeline.
Could you also share the pipeline configuration with us?

Thanks.

Thanks, I will look at that this week and get back to you.

Sorry for the delay, here is the pipeline

gst-launch-1.0 filesrc location=/opt/nvidia/deepstream/deepstream-5.0/samples/streams/sample_720p.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! dsexample ! nvegltransform ! nveglglessink

Here is the output:
Setting pipeline to PAUSED …

Using winsys: x11
Opening in BLOCKING MODE
Opening in BLOCKING MODE
Pipeline is PREROLLING …
Got context from element ‘eglglessink0’: gst.egl.EGLDisplay=context, display=(GstEGLDisplay)NULL;
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
NVMEDIA_ARRAY: 53, Version 2.1
NVMEDIA_VPI : 172, Version 2.4
Error: VPI_ERROR_INTERNAL in gstdsexample.cpp at line 505 ((cudaErrorInvalidResourceHandle))
ERROR: from element /GstPipeline:pipeline0/GstQTDemux:qtdemux0: Internal data stream error.
Additional debug info:
qtdemux.c(6073): gst_qtdemux_loop (): /GstPipeline:pipeline0/GstQTDemux:qtdemux0:
streaming stopped, reason error (-5)
ERROR: pipeline doesn’t want to preroll.
Setting pipeline to NULL …
Freeing pipeline …

Hi,

Confirmed that we can reproduce the VPI_ERROR_INTERNAL error with the pipeline you shared.
Will share more information with you later.

Thanks.

Hi,

Thanks for your patience.

DeepStream + VPI can work with vpiImageCreateHostMemWrapper, as shown below:

diff --git a/gstdsexample.cpp b/gstdsexample.cpp
index 98df7c9..c521c46 100644
--- a/gstdsexample.cpp
+++ b/gstdsexample.cpp
@@ -442,6 +442,7 @@ gst_dsexample_transform_ip (GstBaseTransform * btrans, GstBuffer * inbuf)

   VPIImageData img_data;
   VPIImage img = NULL;
+  VPIImage out = NULL;

   cv::Mat in_mat;
   cv::Mat out_mat;
@@ -488,25 +489,42 @@ gst_dsexample_transform_ip (GstBaseTransform * btrans, GstBuffer * inbuf)
   //printf("Num planes %d, Width %d, Height %d, Pitch %d, Offset %d, Size %d, bytesPerPix %d \n", surface->surfaceList[0].planeParams.num_planes, surface->surfaceList[0].planeParams.width[1],surface->surfaceList[0].planeParams.height[1], surface->surfaceList[0].planeParams.pitch[1], surface->surfaceList[0].planeParams.offset[1], surface->surfaceList[0].planeParams.psize[1],surface->surfaceList[0].planeParams.bytesPerPix[1]);
   //Make every other frame average grey.

+  if (surface->surfaceList[0].mappedAddr.addr[0] == NULL){
+    if (NvBufSurfaceMap (surface, 0, 0, NVBUF_MAP_READ_WRITE) != 0){
+      GST_ELEMENT_ERROR (dsexample, STREAM, FAILED,
+        ("%s:buffer map to be accessed by CPU failed", __func__), (NULL));
+      return GST_FLOW_ERROR;
+    }
+  }
+
+  NvBufSurfaceSyncForCpu (dsexample->inter_buf, 0, 0);
+
   memset(&img_data, 0, sizeof(img_data));

   img_data.format = VPI_IMAGE_FORMAT_RGBA8;
   img_data.numPlanes = surface->surfaceList[0].planeParams.num_planes;

+  if(dsexample->inter_buf->memType == NVBUF_MEM_SURFACE_ARRAY)
+    NvBufSurfaceSyncForCpu (surface, 0, 0);

   for(i=0; i<img_data.numPlanes; i++) {
     img_data.planes[i].width = surface->surfaceList[0].planeParams.width[i];
     img_data.planes[i].height = surface->surfaceList[0].planeParams.height[i];
     img_data.planes[i].pitchBytes = surface->surfaceList[0].planeParams.pitch[i];
-    img_data.planes[i].data = surface->surfaceList[0].dataPtr;//(void *)&((char *)surface->surfaceList[0].mappedAddr.addr[i])[surface->surfaceList[0].planeParams.offset[1]];
+    img_data.planes[i].data = surface->surfaceList[0].mappedAddr.addr[0];
   }

-  CHECK_VPI_STATUS(vpiImageCreateCUDAMemWrapper(&img_data, 0, &img));
+  CHECK_VPI_STATUS(vpiImageCreate(img_data.planes[0].width, img_data.planes[0].height, VPI_IMAGE_FORMAT_BGR8, 0, &out));
+  CHECK_VPI_STATUS(vpiImageCreateHostMemWrapper(&img_data, 0, &img));
+  CHECK_VPI_STATUS(vpiSubmitConvertImageFormat(dsexample->vpi_stream, VPI_BACKEND_CUDA, img, out, NULL));
+  CHECK_VPI_STATUS(vpiStreamSync(dsexample->vpi_stream));

-  CHECK_VPI_STATUS(vpiSubmitConvertImageFormat(dsexample->vpi_stream, VPI_BACKEND_CUDA, img, img, NULL));
+  VPIImageData data;
+  CHECK_VPI_STATUS(vpiImageLock(out, VPI_LOCK_READ, &data));
+
+  out_mat = cv::Mat (data.planes[0].height, data.planes[0].width, CV_8UC3, data.planes[0].data, data.planes[0].pitchBytes);
+  cv::imwrite("/home/nvidia/output.png", out_mat);

-  CHECK_VPI_STATUS(vpiStreamSync(dsexample->vpi_stream));
-

   //if (NvBufSurfaceUnMap (surface, 0, -1)){
   //  goto error;
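
The patch passes data.planes[0].pitchBytes to the cv::Mat constructor as the row step, because a row in the locked buffer may occupy more bytes than width times bytes-per-pixel. The same pitch-aware addressing can be sketched without OpenCV or VPI. The function below is a hypothetical helper (`packRows` is not part of either library) that copies a pitched image into a tightly packed buffer, assuming the caller supplies the pitch reported for the plane.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copy a pitched image into a tightly packed buffer, dropping the padding
// bytes at the end of each source row.
inline std::vector<std::uint8_t> packRows(const std::uint8_t *src,
                                          std::size_t width,
                                          std::size_t height,
                                          std::size_t bytesPerPixel,
                                          std::size_t pitchBytes) {
    const std::size_t rowBytes = width * bytesPerPixel;
    assert(src != nullptr);
    assert(pitchBytes >= rowBytes);  // the pitch can never be shorter than a row

    std::vector<std::uint8_t> packed(rowBytes * height);
    if (rowBytes == 0) {
        return packed;  // nothing to copy
    }
    for (std::size_t y = 0; y < height; ++y) {
        // Source rows start every pitchBytes; destination rows every rowBytes.
        std::memcpy(packed.data() + y * rowBytes, src + y * pitchBytes, rowBytes);
    }
    return packed;
}
```

For the BGR8 output in the patch, bytesPerPixel would be 3. Treating the buffer as if the pitch equalled width * 3 would shear the image whenever the rows are padded.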

We are working on feeding the CUDA buffer to VPI directly with vpiImageCreateCUDAMemWrapper.
We hope to give you an update soon.

Thanks.