How can I access the metadata for each stream in parallel?

Please provide complete information as applicable to your setup.

• Hardware Platform: GPU
• DeepStream Version: 7.0
• TensorRT Version: 8.9
• NVIDIA GPU Driver Version: 545

In DeepStream, every test app uses “add_probe” to access the metadata. My question: if I have 100 streams and extract the metadata (“frame_meta.pad_index” or “frame_meta.source_id”) for all 100 cameras sequentially, it takes a long time.
Can you suggest a good approach for processing all streams in parallel to reduce the total time?

After the streams are combined into a batch inside nvstreammux, there is only one batch meta in the pipeline, and reading the frame metas from the batch meta takes very little time. Why do you think “taking of 100 cameras metadata sequentially It would take more time !”?

Hi @Fiona.Chen

For each stream I am extracting the frame. This takes much longer than processing without the frame. Because it runs sequentially, by the end of the 100 streams it has taken a very long time, and frames are dropped as well.
This is how I am accessing the frame:
# Imports needed by this snippet (it runs inside the probe callback)
import ctypes
import cupy as cp
import pyds
# if user_data==0:
# owner was undefined in the snippet as posted; None is assumed here as a dummy owner for UnownedMemory
owner = None
data_type, shape, strides, dataptr, size = pyds.get_nvds_buf_surface_gpu(hash(gst_buffer), frame_meta.batch_id)
# dataptr is of type PyCapsule -> Use ctypes to retrieve the pointer as an int to pass into cupy
ctypes.pythonapi.PyCapsule_GetPointer.restype = ctypes.c_void_p
ctypes.pythonapi.PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]
# Get pointer to buffer and create UnownedMemory object from the gpu buffer
c_data_ptr = ctypes.pythonapi.PyCapsule_GetPointer(dataptr, None)
unownedmem = cp.cuda.UnownedMemory(c_data_ptr, size, owner)
# Create MemoryPointer object from unownedmem, at index 0
memptr = cp.cuda.MemoryPointer(unownedmem, 0)
# Create cupy array over the GPU buffer; .get() then copies it to host memory (one device-to-host transfer per frame)
n_frame_gpu = cp.ndarray(shape=shape, dtype=data_type, memptr=memptr, strides=strides, order='C').get()

This is how I access the frame for each stream, and accessing the frame is necessary for us.
Could you suggest a faster way to get the frame, or some other way to optimise this?

Copying the video frames one by one out of multiple streams will not be fast, even if you copy the frames in parallel.

There are multiple frames in the batch if you set “batch-size” to a value greater than 1. You could consider copying the frames in parallel threads. See: threading (Thread-based parallelism), Python 3.12.4 documentation
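As a rough illustration of the "parallel threads" suggestion, the sketch below fans the per-frame copy out over a small thread pool that is created once and reused for every batch. `copy_frame`, `copy_batch`, and the pool size are hypothetical names and values chosen for the example; in a real probe `copy_frame` would wrap the `pyds.get_nvds_buf_surface_gpu(...)` code shown earlier. Whether threads actually shorten the copy depends on whether the underlying call releases the Python GIL, which this thread does not establish.

```python
from concurrent.futures import ThreadPoolExecutor


def copy_frame(batch_id: int) -> bytes:
    """Hypothetical stand-in for the per-frame copy done in the probe.

    It only fabricates a small buffer so the example runs without DeepStream.
    """
    return bytes([batch_id % 256]) * 16


# Created once and reused for every batch, so no thread is started per frame.
POOL = ThreadPoolExecutor(max_workers=4)


def copy_batch(batch_ids):
    """Copy every frame of one batch; map() keeps results in input order."""
    return list(POOL.map(copy_frame, batch_ids))


frames = copy_batch(range(8))
print(len(frames), frames[3][:4])
POOL.shutdown()
```

The pool size is independent of the number of cameras: 100 sources do not require 100 threads.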

Hi, regarding pgie_src_pad_buffer_probe in the example deepstream_test_3.py: is there any way to copy the NvDsBatchMeta, rather than only calling pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer)), then return Gst.PadProbeReturn.OK and use that copied BatchMeta for our business logic afterwards?

Consideration: we are handling 100 cameras. Looping through all the frames in each batch and then through all the objects in each frame is delaying the whole pipeline.

Also, regarding the suggested approach: using threads adds extra overhead when applied to frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data).

@snehashish.debnath
Are you a co-worker of the poster of this topic? If not, please create your own topic. Thank you!

Hi @Fiona.Chen

@snehashish.debnath is my co-worker. Answering his question would help solve our problem.

The batch meta is inside the GstBuffer. pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer)) just gets the batch meta from the GstBuffer. It is very fast, and it is the correct and efficient way to get the batch meta.

@debjit.adak
I think your request is to get the frame data from the 100 video streams. What will you do with these frames?

I have a requirement to run the pipeline at 25 fps with 100 cameras, and each camera has a minimum of 20 objects.

Taking the batch meta sequentially means looping through each frame → each object → and then, for each object, looping again over classifier_meta_list and obj_user_meta_list.
This is a big bottleneck, and only after it finishes can I return Gst.PadProbeReturn.OK.

So I want to bypass this entire process by copying the batch meta and returning Gst.PadProbeReturn.OK.
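To put the stated requirement (100 cameras, 25 fps, at least 20 objects per camera) in numbers, here is a back-of-the-envelope budget. It assumes, purely for illustration, that one batch holds one frame from each camera and that the probe runs once per batch; the result is an upper bound, since the rest of the pipeline also needs part of each interval.

```python
cameras = 100
target_fps = 25
objects_per_frame = 20

# Time available per batch if the pipeline is to sustain the target rate.
batch_interval_ms = 1000 / target_fps

# Objects the probe must visit per batch and per second.
objects_per_batch = cameras * objects_per_frame
objects_per_second = objects_per_batch * target_fps

# Upper bound on the time the probe may spend per object, in microseconds.
per_object_budget_us = batch_interval_ms * 1000 / objects_per_batch

print(f"batch interval: {batch_interval_ms} ms")
print(f"objects per batch: {objects_per_batch}, per second: {objects_per_second}")
print(f"per-object budget: {per_object_budget_us} us")
```

Under these assumptions the probe has at most 40 ms per batch and about 20 µs per object, which leaves little room for nested Python loops over classifier_meta_list and obj_user_meta_list plus business logic in the same callback.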

What is the GPU? What is the video format in the streams? The resolution and framerate?

What do you want to do with the object meta you get? The loop that reads the object meta from the batch meta will not take much time.

Once we have the object meta, we run our business logic … which uses the object data.

Our fps drops from 25 fps to 18 fps.

Is there any way to copy the NvDsBatchMeta and use it later, after returning Gst.PadProbeReturn.OK?

If you do your business logic in the GStreamer probe function (callback), please make sure the operation is fast enough, because the probe function blocks the pipeline.

It is better to implement your business logic in threads other than the pipeline thread.
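One common way to keep the probe short is a producer/consumer hand-off: the probe only enqueues plain Python data and returns, while a worker thread runs the slow logic. The sketch below shows that pattern with the standard library only. `on_batch`, `business_logic`, the queue size, and the dictionary layout are all hypothetical; in a real application `on_batch` would be called from the probe just before it returns Gst.PadProbeReturn.OK.

```python
import queue
import threading

# Bounded, so a slow consumer cannot grow memory without limit.
work_queue = queue.Queue(maxsize=64)
results = []


def business_logic(item):
    """Hypothetical placeholder for the slow per-batch processing."""
    results.append((item["batch"], len(item["objects"])))


def worker():
    """Single consumer thread shared by all sources."""
    while True:
        item = work_queue.get()
        if item is None:  # sentinel: stop the worker
            break
        business_logic(item)


def on_batch(batch_number, objects):
    """Enqueue a copy of the data and return at once.

    Returns False when the queue is full and the batch was dropped.
    """
    try:
        work_queue.put_nowait({"batch": batch_number, "objects": list(objects)})
        return True
    except queue.Full:
        return False


thread = threading.Thread(target=worker, daemon=True)
thread.start()
accepted = [on_batch(n, [{"class_id": 0}] * 20) for n in range(5)]
work_queue.put(None)
thread.join()
print(accepted, results)
```

Here one worker serves every source, so the thread count does not grow with the number of cameras. Dropping batches when the queue is full is a design choice for the sketch; blocking or sampling are alternatives.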

I do understand this … but a thread is also extra overhead: creating and managing these threads means everything is done in a round-robin way.

So I want to avoid this bottleneck and eliminate the whole wait by copying the BatchMeta and returning Gst.PadProbeReturn.OK.

Getting the object meta and frame meta from the batch meta will not itself be the bottleneck. You need to make sure your business logic on the metadata does not take too much time; otherwise, do the business logic in another thread so that the metadata and the GstBuffer are released back to the pipeline before your business logic finishes.

@debjit.adak mentioned in another topic that you will get every frame's data from the pipeline for the 100 streams. I’m wondering why you want to operate on every frame outside the pipeline. Can the operation be done inside a customized plugin with CUDA acceleration?

I understand that for my business logic, 100 cameras will lead to 100 threads and do everything in a round-robin way.

I want an approach which copies the BatchMeta and returns Gst.PadProbeReturn.OK, and use the BatchMeta later.

@debjit.adak mentioned getting “every frame data from the pipeline of the 100 streams”. We are trying a different approach for that. As you mentioned, “Can the operation be done inside some customized plugin with CUDA acceleration?” Please suggest some links for this approach.

We do not encourage copying the whole batch meta. You may copy only the parts that are useful to you, using the APIs in the DeepStream Python bindings documentation (link: configure_source_for_ntp_sync, Deepstream Version: 7.0 documentation)
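A minimal sketch of "copy only the parts you need": pull a few fields out of each frame and object meta into immutable plain-Python records that remain valid after the probe returns. The record classes and helper functions are hypothetical, and the attribute names (`source_id`, `frame_num`, `class_id`, `object_id`, `confidence`, `rect_params`) follow the pyds bindings as I understand them; please check them against the API documentation. `SimpleNamespace` objects stand in for the real meta so the example runs without DeepStream.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass(frozen=True)
class ObjectSnapshot:
    class_id: int
    object_id: int
    confidence: float
    left: float
    top: float
    width: float
    height: float


@dataclass(frozen=True)
class FrameSnapshot:
    source_id: int
    frame_num: int
    objects: tuple


def snapshot_object(obj_meta) -> ObjectSnapshot:
    """Copy selected fields by value so nothing refers back to the buffer."""
    rect = obj_meta.rect_params
    return ObjectSnapshot(
        int(obj_meta.class_id), int(obj_meta.object_id),
        float(obj_meta.confidence),
        float(rect.left), float(rect.top),
        float(rect.width), float(rect.height),
    )


def snapshot_frame(frame_meta, obj_metas) -> FrameSnapshot:
    """obj_metas: already-cast object metas belonging to this frame."""
    return FrameSnapshot(
        int(frame_meta.source_id), int(frame_meta.frame_num),
        tuple(snapshot_object(o) for o in obj_metas),
    )


# Stand-ins for real NvDsFrameMeta / NvDsObjectMeta instances (assumption).
fake_obj = SimpleNamespace(
    class_id=2, object_id=17, confidence=0.9,
    rect_params=SimpleNamespace(left=10.0, top=20.0, width=30.0, height=40.0),
)
fake_frame = SimpleNamespace(source_id=5, frame_num=123)

snap = snapshot_frame(fake_frame, [fake_obj])
print(snap)
```

Records like these can be passed to another thread safely, because they hold values rather than references into the GstBuffer.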

For operations on the frame data, the Python API is of no use. You need to customize in C/C++ with Gst-nvdsvideotemplate (DeepStream 6.4 documentation); it is open source.