Jetson Nano: DeepStream Plugin Memory Management for OpenVX

My setup:

• Jetson Nano
• DeepStream 5.0
• JetPack 4.4
• TensorRT 7.1

Hello fellow NVIDIA developers,
I have read the excellent DeepStream, "CUDA for Tegra" and GStreamer documentation but failed to grasp the key concepts of shared SoC DRAM management. As far as I understand, even though the memory is the same hardware-wise, the cache usage and addressing differ between device memory, pageable host memory, pinned memory, unified memory and surface arrays. I gather that I should choose the memory type according to the processing device, for example a surface array for Jetson iGPU operations. Please correct me if I am wrong so far.
So with these in mind I am trying to write a custom DeepStream plugin, and while inspecting the provided gst-dsexample I have failed to understand which type of memory the GstBuffer uses initially. I also want to process frames on the iGPU using OpenVX, and I know OpenVX runs on the GPU, but since I don't know the initial memory type, I am clueless about how to pass the frame data.
So basically my questions are: how do I pass a frame from a GstBuffer to a vx_image while maximizing performance on the iGPU? Do I need to use an EGLImage instance, or is an NvBufSurface instance enough?
I have also come across the concept of zero-copy, but I don't know whether it can be applied here.
Thanks in advance. :^)

Hi,
We don’t have experience with vx_image. Is it a CUDA buffer? If yes, you may refer to the sample code:

It demonstrates calling NvBufSurfaceMapEglImage() to get an EGLImage, then cuGraphicsEGLRegisterImage() and cuGraphicsResourceGetMappedEglFrame() to get a CUDA pointer. If vx_image is a CUDA buffer, you can move data through that pointer.

Hello again,
I have managed to create an OpenVX image using the surface address pointers with the code below:

  NvBufSurfaceMap(surface, frame_meta->batch_id, 0, NVBUF_MAP_READ_WRITE);
  NvBufSurfaceSyncForDevice(surface, frame_meta->batch_id, 0);

  mat_addr.dim_x = dsexample->processing_width;
  mat_addr.dim_y = dsexample->processing_height;
  mat_addr.stride_x = RGBA_BYTES_PER_PIXEL;
  mat_addr.stride_y = RGBA_BYTES_PER_PIXEL * dsexample->processing_width;

  src1 = vxCreateImage(dsexample->context, dsexample->processing_width,
                       dsexample->processing_height, VX_DF_IMAGE_U8);
  dsexample->vxInp = vxCreateImageFromHandle(dsexample->context, VX_DF_IMAGE_RGBX,
                                             &mat_addr,
                                             (void* const*)surface->surfaceList[0].mappedAddr.addr[0],
                                             VX_IMPORT_TYPE_HOST);

  NvBufSurfaceSyncForDevice(surface, frame_meta->batch_id, 0);

  status = vxGetStatus((vx_reference)dsexample->vxInp);
  status1 = vxGetStatus((vx_reference)src1);

Both status values are "0", which indicates everything is OK, but when I try to manipulate the data using OpenVX functions I get a segmentation fault. I suspect it is because I am failing to map the GstBuffer data to an NvBufSurface instance as a read/write buffer accessible by the GPU. Is it because I don't use an EGLImage? Or am I doing everything wrong?

Hi,
Not sure how vxCreateImageFromHandle() works. Does it work if you allocate a buffer with malloc(), like:

void *ptr = malloc(width * height * 4); // RGBA, tightly packed
// Note: the ptrs argument is an array of per-plane pointers, hence &ptr.
vx_image img = vxCreateImageFromHandle(dsexample->context, VX_DF_IMAGE_RGBX,
                                       &mat_addr, (void* const*)&ptr,
                                       VX_IMPORT_TYPE_HOST);

Hello @DaneLLL,
Thank you for your time. I have figured it out thanks to one of your older posts. For those who come across the same issue, here is the proper way to create a vx_image from an NvBufSurface.

First comes the conversion from NvBufSurface to EGLImage. The original code is from DaneLLL's older post:

1-> NvBufSurfaceMemSet()
2-> NvBufSurfaceMapEglImage()
3-> cuGraphicsEGLRegisterImage()
4-> cuGraphicsResourceGetMappedEglFrame()
5-> cuCtxSynchronize()

then:

vx_image vxInp = vxCreateImageFromHandle(context, VX_DF_IMAGE_RGBX,
                                         &vx_imagepatch_addressing,
                                         (void* const*)eglFrame.frame.pPitch,
                                         NVX_MEMORY_TYPE_CUDA);

After this, the NvBufSurface should be accessible to OpenVX for processing without any need for CPU access or OpenCV conversion.
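Putting the steps together, the whole mapping looks roughly like the sketch below. This is only my understanding, not tested as-is: the header names are a guess, the single-plane pitch-linear RGBA layout is assumed, and all error checking is omitted (every one of these calls can fail on a real pipeline).

```cpp
#include "nvbufsurface.h"  // NvBufSurface, NvBufSurfaceMapEglImage()
#include <cudaEGL.h>       // cuGraphicsEGLRegisterImage(), CUeglFrame
#include <VX/vx.h>
#include <NVX/nvx.h>       // NVX_MEMORY_TYPE_CUDA (VisionWorks extension)

// Sketch: wrap one RGBA surface of a batched NvBufSurface as a
// CUDA-backed vx_image, with no CPU copy in between.
vx_image wrap_surface(vx_context context, NvBufSurface *surface, int batch_id)
{
    // Map the buffer to an EGLImage (fills mappedAddr.eglImage).
    NvBufSurfaceMapEglImage(surface, batch_id);

    // Register the EGLImage with CUDA and fetch the frame description.
    CUgraphicsResource resource = NULL;
    cuGraphicsEGLRegisterImage(&resource,
        (EGLImageKHR)surface->surfaceList[batch_id].mappedAddr.eglImage,
        CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
    CUeglFrame eglFrame;
    cuGraphicsResourceGetMappedEglFrame(&eglFrame, resource, 0, 0);
    cuCtxSynchronize();

    // Describe the layout for OpenVX: one pitch-linear RGBA plane.
    vx_imagepatch_addressing_t addr = {};
    addr.dim_x    = surface->surfaceList[batch_id].width;
    addr.dim_y    = surface->surfaceList[batch_id].height;
    addr.stride_x = 4;  // bytes per RGBA pixel
    addr.stride_y = surface->surfaceList[batch_id].pitch;

    // Import the CUDA device pointer directly; no CPU access needed.
    return vxCreateImageFromHandle(context, VX_DF_IMAGE_RGBX, &addr,
                                   (void* const*)eglFrame.frame.pPitch,
                                   NVX_MEMORY_TYPE_CUDA);
}
```

When you are done, remember to release in reverse order: vxReleaseImage(), cuGraphicsUnregisterResource(), then NvBufSurfaceUnMapEglImage().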


Many thanks for sharing.