Are there any examples for pyds.NvBufSurface objects and methods?

I’d like to know how these new bindings could be used. From the description of pyds.NvBufSurfaceMap() I got the impression that it is possible in a Python app to edit frame data located in NVMM memory. Am I correct? Right now I’m moving frames to the host and doing my custom drawing through a standard GStreamer buffer map, but it would be nice to avoid host-device copies.

I saw the deepstream-imagedata-multistream example, but it seems that changes made to the frame in the array returned from pyds.get_nvds_buf_surface() are not synced back to the device and do not propagate downstream.

Hi,
We currently have a C sample in gst-dsexample that demonstrates NvBufSurfaceMap(). We don’t have a Python sample yet. We will check and update.

Thank you. I’ve looked at the gst-dsexample plugin and the deepstream-opencv-test sample app, and it all works great.

Then I tried to map and read/write the NvBufSurface in a Python plugin using ctypes mappings, since I couldn’t find a way to get an NvBufSurface bindings object from a Gst.Buffer, but I wasn’t able to successfully sync the surface for the CPU.

Here’s my plugin code

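import ctypes

import gi
gi.require_version('Gst', '1.0')
gi.require_version('GstBase', '1.0')
from gi.repository import Gst, GstBase

import numpy as np  # used by the array snippet further down
import pyds

_GST_PADDING = 4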
_NVBUF_MAX_PLANES = 4


class _GstMapInfo(ctypes.Structure):
    _fields_ = [
        ("memory", ctypes.c_void_p),
        ("flags", ctypes.c_int),
        ("data", ctypes.POINTER(ctypes.c_ubyte)),
        ("size", ctypes.c_size_t),
        ("maxsize", ctypes.c_size_t),
        ("user_data", ctypes.c_void_p * 4),
        ("_gst_reserved", ctypes.c_void_p * _GST_PADDING)
    ]


class _NvBufSurfaceMappedAddr(ctypes.Structure):
    _fields_ = [
        ('addr', ctypes.c_void_p * _NVBUF_MAX_PLANES),
        ('eglImage', ctypes.c_void_p),
        ("_reserved", ctypes.c_void_p * _GST_PADDING)
    ]


class _NvBufSurfacePlaneParams(ctypes.Structure):
    _fields_ = [
        ('num_planes', ctypes.c_uint32),
        ('width', ctypes.c_uint32 * _NVBUF_MAX_PLANES),
        ('height', ctypes.c_uint32 * _NVBUF_MAX_PLANES),
        ('pitch', ctypes.c_uint32 * _NVBUF_MAX_PLANES),
        ('offset', ctypes.c_uint32 * _NVBUF_MAX_PLANES),
        ('psize', ctypes.c_uint32 * _NVBUF_MAX_PLANES),
        ('bytesPerPix', ctypes.c_uint32 * _NVBUF_MAX_PLANES),
        ("_reserved", ctypes.c_void_p * (_GST_PADDING * _NVBUF_MAX_PLANES))
    ]


class _NvBufSurfaceParams(ctypes.Structure):
    _fields_ = [
        ('width', ctypes.c_uint32),
        ('height', ctypes.c_uint32),
        ('pitch', ctypes.c_uint32),
        ('colorFormat', ctypes.c_int),
        ('layout', ctypes.c_int),
        ('bufferDesc', ctypes.c_uint64),
        ('dataSize', ctypes.c_uint32),
        ('dataPtr', ctypes.c_void_p),
        ('planeParams', _NvBufSurfacePlaneParams),
        ('mappedAddr', _NvBufSurfaceMappedAddr),
        ("_reserved", ctypes.c_void_p * _GST_PADDING)
    ]


class _NvBufSurface(ctypes.Structure):
    _fields_ = [
        ('gpuId', ctypes.c_uint32),
        ('batchSize', ctypes.c_uint32),
        ('numFilled', ctypes.c_uint32),
        ('isContiguous', ctypes.c_bool),
        ('memType', ctypes.c_int),
        ('surfaceList', ctypes.POINTER(_NvBufSurfaceParams)),
        ("_gst_reserved", ctypes.c_void_p * _GST_PADDING)
    ]


_GST_MAP_INFO_POINTER = ctypes.POINTER(_GstMapInfo)
_libgst = ctypes.CDLL("libgstreamer-1.0.so.0")
_libgst.gst_buffer_map.argtypes = [ctypes.c_void_p, _GST_MAP_INFO_POINTER, ctypes.c_int]
_libgst.gst_buffer_map.restype = ctypes.c_bool

_libgst.gst_buffer_unmap.argtypes = [ctypes.c_void_p, _GST_MAP_INFO_POINTER]
_libgst.gst_buffer_unmap.restype = None

_libnvbufsurface = ctypes.CDLL('libnvbufsurface.so')
_libnvbufsurface.NvBufSurfaceMap.argtypes = [
    ctypes.POINTER(_NvBufSurface),
    ctypes.c_int,  # index     Index of a buffer in the batch (frame_meta.batch_id), -1 for all
    ctypes.c_int,  # plane     Index of a plane in the buffer (0?), -1 for all
    ctypes.c_int  # NvBufSurfaceMemMapFlags type, 0 READ, 1 WRITE, 2 READ_WRITE
]
_libnvbufsurface.NvBufSurfaceMap.restype = ctypes.c_int  # 0 if successful, or -1 otherwise

_libnvbufsurface.NvBufSurfaceSyncForCpu.argtypes = [
    ctypes.POINTER(_NvBufSurface),
    ctypes.c_int,  # index     Index of a buffer in the batch (frame_meta.batch_id), -1 for all
    ctypes.c_int,  # plane     Index of a plane in the buffer (0?), -1 for all
]
_libnvbufsurface.NvBufSurfaceSyncForCpu.restype = ctypes.c_int  # 0 if successful, or -1 otherwise

class SurfTest(GstBase.BaseTransform):
    GST_PLUGIN_NAME = 'surf'

    __gstmetadata__ = (
        'test',
        'test',
        'test',
        'test')

    __gsttemplates__ = (
        Gst.PadTemplate.new('src', Gst.PadDirection.SRC, Gst.PadPresence.ALWAYS, Gst.Caps.new_any()),
        Gst.PadTemplate.new('sink', Gst.PadDirection.SINK, Gst.PadPresence.ALWAYS, Gst.Caps.new_any())
    )

    def do_transform_ip(self, buffer):

        map_info = _GstMapInfo()
        res = _libgst.gst_buffer_map(hash(buffer), map_info, Gst.MapFlags.READ)
        Gst.log(f'gst buffer map {res}')  # prints True

        nvbuf_surf_p = ctypes.cast(map_info.data, ctypes.POINTER(_NvBufSurface))
        nvbuf_surf = nvbuf_surf_p.contents
        Gst.log(f'surface mem type {nvbuf_surf.memType}')  # prints 3

        batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(buffer))
        l_frame = batch_meta.frame_meta_list
        while l_frame is not None:
            try:
                frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
            except StopIteration:
                break

            nvbuf_surf_params = nvbuf_surf.surfaceList[frame_meta.batch_id]
            nvbuf_surf_mappedaddr = nvbuf_surf_params.mappedAddr

            if nvbuf_surf_mappedaddr.addr[0] is None:
                res = _libnvbufsurface.NvBufSurfaceMap(nvbuf_surf_p, frame_meta.batch_id, 0, 2)
                Gst.log(f'NvBufSurfaceMap {res}')  # prints 0
                if res == 0:
                    res = _libnvbufsurface.NvBufSurfaceSyncForCpu(nvbuf_surf_p, frame_meta.batch_id, 0)
                    Gst.log(f'sync for cpu {res}')  # prints -1

            try:
                l_frame = l_frame.next
            except StopIteration:
                break

        _libgst.gst_buffer_unmap(hash(buffer), map_info)
        return Gst.FlowReturn.OK

And my run command

GST_DEBUG=python:6 gst-launch-1.0 \
uridecodebin uri=file:///opt/app/data/sample_720p.mp4 \
! nvvideoconvert nvbuf-memory-type=3 \
! video/x-raw\(memory:NVMM\), format=RGBA \
! m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 nvbuf-memory-type=3 \
! surf \
! fakesink sync=false

I understand this is hardly a supported case, but if there’s any advice I’d very much appreciate it.

Hi,
We will check with the teams and see if we can work out a Python sample demonstrating the APIs.

Hi,
You should be using DeepStream SDK 5.0; please confirm this. Please also let us know whether you are using a Jetson platform or a desktop GPU.

Yes, this is inside the nvcr.io/nvidia/deepstream:5.0-dp-20.04-base container on a desktop with an RTX 2080.

It turned out that NvBufSurfaceSyncForCpu() returning -1 doesn’t hurt in my case. In fact, the same happens in the gst-dsexample plugin; it just never checks the return value. So the code listed above basically works, and it’s possible to access the mapped memory in Python with code like this (place it below the NvBufSurfaceMap() call):

shape = (
    nvbuf_surf_params.planeParams.height[0],
    nvbuf_surf_params.planeParams.width[0],
    nvbuf_surf_params.planeParams.pitch[0] // nvbuf_surf_params.planeParams.width[0]
)
ctypes_arr = ctypes.cast(
    nvbuf_surf_mappedaddr.addr[0],
    ctypes.POINTER(ctypes.c_uint8 * shape[2] * shape[1] * shape[0])
).contents
np_arr = np.ctypeslib.as_array(ctypes_arr)
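
One caveat with deriving the channel count from pitch // width: if the rows are padded (pitch larger than width * bytesPerPix), the contiguous cast above would misalign every row after the first. A minimal sketch of a pitch-aware view follows, assuming the plane is plain byte-addressable memory. The helper name pitched_view and the bytearray standing in for the mapped plane are hypothetical, used only so the example runs without DeepStream.

```python
import numpy as np


def pitched_view(buf, height, width, pitch, bytes_per_pix):
    """Return a (height, width, bytes_per_pix) uint8 view over a pitched buffer.

    Consecutive rows start `pitch` bytes apart, so any padding bytes at the
    end of a row are skipped instead of being treated as pixels.
    """
    return np.ndarray(
        shape=(height, width, bytes_per_pix),
        dtype=np.uint8,
        buffer=buf,
        strides=(pitch, bytes_per_pix, 1),
    )


# Stand-in for mapped memory: 2 rows of 3 RGBA pixels, rows padded to 16 bytes.
height, width, bpp, pitch = 2, 3, 4, 16
fake_plane = bytearray(height * pitch)

frame = pitched_view(fake_plane, height, width, pitch, bpp)
frame[1, 2] = (10, 20, 30, 255)  # write the last pixel of the second row
```

With a real mapped address, the buffer argument could be built with (ctypes.c_uint8 * (pitch * height)).from_address(addr), and bytes_per_pix taken from planeParams.bytesPerPix[0] rather than inferred from the pitch.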

I’ve also found out that calling NvBufSurfaceSyncForCpu() isn’t necessary at all, at least for unified memory. The documentation for this functionality is rather confusing. I guess it’s correct in stating that NvBufSurfaceSyncForCpu() is “Valid only for NVBUF_MEM_SURFACE_ARRAY and NVBUF_MEM_HANDLE memory types”, which explains the -1 return value in my case. But then I don’t understand why the NvBufSurfaceMap() doc says that “The client must call NvBufSurfaceSyncForCpu() with the virtual address populated by this function before accessing the mapped memory in CPU”. Is unified memory meant to be an exception?
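
If the documentation is taken at face value, the sync call could be guarded by memory type, roughly like the sketch below. The enum values are what I read in nvbufsurface.h (3 matches the memType printed by my plugin), so treat them as an assumption and verify them against your SDK headers. The helper name sync_for_cpu_applies is hypothetical.

```python
from enum import IntEnum


class NvBufSurfaceMemType(IntEnum):
    # Assumed to mirror the NvBufSurfaceMemType enum in nvbufsurface.h;
    # check the header shipped with your DeepStream version.
    NVBUF_MEM_DEFAULT = 0
    NVBUF_MEM_CUDA_PINNED = 1
    NVBUF_MEM_CUDA_DEVICE = 2
    NVBUF_MEM_CUDA_UNIFIED = 3
    NVBUF_MEM_SURFACE_ARRAY = 4
    NVBUF_MEM_HANDLE = 5
    NVBUF_MEM_SYSTEM = 6


# Memory types for which the docs say NvBufSurfaceSyncForCpu() is valid.
_SYNC_VALID_TYPES = frozenset({
    NvBufSurfaceMemType.NVBUF_MEM_SURFACE_ARRAY,
    NvBufSurfaceMemType.NVBUF_MEM_HANDLE,
})


def sync_for_cpu_applies(mem_type):
    """Return True if a CPU sync is documented as valid for this memory type."""
    return NvBufSurfaceMemType(mem_type) in _SYNC_VALID_TYPES
```

In the plugin this would wrap the NvBufSurfaceSyncForCpu() call, e.g. only calling it when sync_for_cpu_applies(nvbuf_surf.memType) is true, which would skip it for unified memory.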

Another thing I don’t understand about the example usage of this API in the gst-dsexample plugin is when I should call NvBufSurfaceUnMap(). As far as I can see in the code for the blur-object type of processing, NvBufSurfaceUnMap() gets called only if blur_objects() returns an error; otherwise we go on and map the next frame in the batch without unmapping the previous one. Is this usage correct? What does NvBufSurfaceUnMap() do?
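
For comparison, if map and unmap are supposed to be strictly paired, a context manager would make that explicit. This is only a sketch of the pairing pattern and does not settle whether the gst-dsexample usage is correct. The name mapped_surface is hypothetical, and the fake_map / fake_unmap functions stand in for NvBufSurfaceMap() and NvBufSurfaceUnMap() so the example runs without the library.

```python
from contextlib import contextmanager


@contextmanager
def mapped_surface(map_fn, unmap_fn, surf, index, plane, flags):
    """Map a surface on entry and always unmap it on exit, even on error."""
    res = map_fn(surf, index, plane, flags)
    if res != 0:
        raise RuntimeError(f'map failed with {res}')
    try:
        yield
    finally:
        unmap_fn(surf, index, plane)


# Stand-ins that only record the order of calls.
calls = []


def fake_map(surf, index, plane, flags):
    calls.append(('map', index, plane, flags))
    return 0


def fake_unmap(surf, index, plane):
    calls.append(('unmap', index, plane))
    return 0


with mapped_surface(fake_map, fake_unmap, None, 0, 0, 2):
    calls.append(('work',))  # frame access would happen here
```

In the plugin, map_fn and unmap_fn would be _libnvbufsurface.NvBufSurfaceMap and _libnvbufsurface.NvBufSurfaceUnMap. Whether unmapping on every buffer is desirable is an open question to me, since my code above deliberately skips the map call when addr[0] is already populated.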

Hi,
We are working on giving out an example. Will update.

Hi @DaneLLL,

Any updates on the python example?

Thanks.

Hi,
The sample is under development. It is not included in the latest DS5.0.1 release.