Please provide complete information as applicable to your setup.
**• Hardware Platform** ---------> GPU
**• DeepStream Version** --------> 7.0
**• TensorRT Version** ----------> 8.9
**• NVIDIA GPU Driver Version** -> 545
In DeepStream, every test app calls `add_probe` to show how we can access metadata. My question is: if I have 100 streams and I extract the metadata from them (`frame_meta.pad_index` or `frame_meta.source_id`), reading the metadata of 100 cameras sequentially would take more time!
Can you suggest a good approach where I can process all streams in parallel and reduce the total time?
After the streams are combined into the batch inside nvstreammux, there is only one batch meta in the pipeline, and reading the frame metas from the batch meta takes very little time. Why do you think reading the metadata of 100 cameras sequentially would take more time?
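For reference, the traversal in question is just a linked-list walk. A minimal pure-Python sketch of the same pattern (the node class here is a mock, since the real `NvDsFrameMeta` objects from pyds only exist inside a running pipeline) shows that even 100 frame metas per batch is a trivial amount of work:

```python
class FrameMetaNode:
    """Mock of one frame meta node in the batch's linked list."""
    def __init__(self, source_id, nxt=None):
        self.data = {"source_id": source_id}
        self.next = nxt

# Build a mock frame_meta_list of 100 nodes, like a 100-stream batch
head = None
for source_id in reversed(range(100)):
    head = FrameMetaNode(source_id, head)

# The same traversal pattern used on batch_meta.frame_meta_list in pyds
l_frame = head
source_ids = []
while l_frame is not None:
    source_ids.append(l_frame.data["source_id"])
    l_frame = l_frame.next

print(len(source_ids))  # 100 frame metas visited
```

The walk itself is O(number of frames); the cost only becomes significant when heavy per-frame work (such as copying image data) is done inside the loop.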
For each stream I am also taking the frame out of it, which takes much more time than metadata alone. When this is done sequentially, by the end of 100 streams it takes a huge amount of time and frames start to drop.
This is how I am accessing the frame:
```python
import ctypes
import cupy as cp
import pyds

# if user_data==0:
data_type, shape, strides, dataptr, size = pyds.get_nvds_buf_surface_gpu(
    hash(gst_buffer), frame_meta.batch_id)
# dataptr is of type PyCapsule -> use ctypes to retrieve the pointer
# as an int so it can be passed into cupy
ctypes.pythonapi.PyCapsule_GetPointer.restype = ctypes.c_void_p
ctypes.pythonapi.PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]
# Get the pointer to the buffer and wrap it in an UnownedMemory object
c_data_ptr = ctypes.pythonapi.PyCapsule_GetPointer(dataptr, None)
owner = None  # no Python object owns this GPU memory; the pipeline does
unownedmem = cp.cuda.UnownedMemory(c_data_ptr, size, owner)
# Create a MemoryPointer object from unownedmem, at offset 0
memptr = cp.cuda.MemoryPointer(unownedmem, 0)
# Create a cupy array over the image data in the GPU buffer;
# .get() then copies the frame to host memory
n_frame_gpu = cp.ndarray(shape=shape, dtype=data_type, memptr=memptr,
                         strides=strides, order='C').get()
```
This is how I am accessing the frame for each stream, and accessing the frame is necessary for us.
Can you suggest a faster way to take the frame, or some other way to optimize this?
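For scale, a back-of-the-envelope estimate (assuming 1080p RGBA frames, which is an assumption; the actual resolution and format in this setup may differ) shows why the final `.get()` call, which copies each frame from device to host, is the likely bottleneck:

```python
# Assumed frame size: 1920x1080 at 4 bytes/pixel (RGBA); adjust to your setup
frame_bytes = 1920 * 1080 * 4          # ~8.3 MB per frame
streams, fps = 100, 25                 # the requirement stated in this thread
throughput_gb_s = frame_bytes * streams * fps / 1e9
print(round(throughput_gb_s, 1))       # device-to-host copy traffic in GB/s
```

Roughly 20 GB/s of sustained device-to-host traffic is close to or beyond what a PCIe link can deliver in practice, so the first optimization to try is keeping the processing on the GPU (operating on the cupy array directly) instead of calling `.get()` for every frame.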
Hi, as per the example deepstream_test_3.py, in pgie_src_pad_buffer_probe: is there any way we can copy the NvDsBatchMeta obtained from pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer)), return Gst.PadProbeReturn.OK immediately, and use that BatchMeta for our business logic afterwards?
Consideration: we are handling 100 cameras, and looping through all frames in a batch and then through all the objects in each frame delays the whole pipeline.
Also, with the suggested approach, the use of a thread creates extra overhead when it is used for frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data).
The batch meta is inside the GstBuffer. pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer)) just gets the batch meta from the GstBuffer. It is very fast, and it is the correct and efficient way to get the batch meta.
I have a requirement to run the pipeline at 25 fps with 100 cameras, and each camera has a minimum of 20 objects.
While taking the batch meta sequentially, I loop through each frame → each object → and then, for each object, loop again over classifier_meta_list and obj_user_meta_list.
This is a big bottleneck, and only after it can I return Gst.PadProbeReturn.OK.
So I want to bypass this entire process by copying the batch meta and returning Gst.PadProbeReturn.OK right away.
If you do your business logic in the GStreamer probe function (callback), please make sure the operation is fast enough, because the probe function blocks the pipeline.
You had better implement your business logic in a thread other than the pipeline thread.
Getting the object meta and frame meta from the batch meta itself will not be the bottleneck. You need to make sure your business logic with the metadata does not take too much time; otherwise, you need to do the business logic in another thread so that the metadata and the GstBuffer are released back to the pipeline before your business logic finishes.
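A minimal sketch of that hand-off pattern (the names `probe_callback_body` and the snapshot dict fields are illustrative, not DeepStream API): inside the probe, copy only the plain-Python fields you need from the metas, enqueue them, and return immediately, while a worker thread runs the business logic.

```python
import queue
import threading

# Bounded queue passing metadata snapshots from the probe to a worker,
# so the probe returns at once and the GstBuffer goes back to the pipeline.
work_q = queue.Queue(maxsize=256)
results = []

def worker():
    while True:
        snapshot = work_q.get()
        if snapshot is None:       # shutdown sentinel
            break
        # ... business logic on plain-Python data, never on pyds objects ...
        results.append(snapshot["source_id"])
        work_q.task_done()

t = threading.Thread(target=worker, daemon=True)
t.start()

def probe_callback_body(frame_snapshots):
    # In the real probe you would build these dicts from frame_meta /
    # obj_meta fields, enqueue them, and return Gst.PadProbeReturn.OK.
    for snap in frame_snapshots:
        try:
            work_q.put_nowait(snap)
        except queue.Full:
            pass                   # drop rather than block the pipeline

# Simulated probe invocation for two frames of a batch
probe_callback_body([{"source_id": 0}, {"source_id": 1}])
work_q.join()                      # wait until the worker has drained the queue
work_q.put(None)
t.join()
print(sorted(results))
```

The important detail is that the snapshots hold ordinary Python values (ints, strings, lists), not pyds wrapper objects, since those are only valid while the GstBuffer is held by the probe.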
As @debjit.adak mentioned in another topic, you will get every frame's data from the pipeline for the 100 streams. I'm wondering why you want to do some operation on every frame outside the pipeline. Could the operation be done inside a customized plugin with CUDA acceleration?
I understand that, for my business logic, 100 cameras will lead to 100 threads doing everything in a round-robin way.
I want an approach that copies the BatchMeta, returns Gst.PadProbeReturn.OK, and uses the BatchMeta later.
As @debjit.adak mentioned, we will get "every frame data from the pipeline of the 100 streams", and we are trying a different approach for it. As you mentioned, "Can the operation be done inside some customized plugin with CUDA acceleration?" Please suggest some links for this approach.
There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.