Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU): GeForce RTX 3090
• DeepStream Version: 5.1
• TensorRT Version: 7.2
• NVIDIA GPU Driver Version (valid for GPU only): 470
• Issue Type( questions, new requirements, bugs): question
I currently have a pipeline that runs a single detector followed by multiple classifiers in sequence:
sources -> muxer -> detector -> classifier1 -> classifier2 -> classifier3 -> [process NvDsBatchMeta]
Is it possible to modify the pipeline as follows, so that the classifiers run in parallel after the detector?
sources -> muxer -> detector |-> classifier1 |
                             |-> classifier2 | -> [process NvDsBatchMeta]
                             |-> classifier3 |
It’s technically possible, but the details may get very ugly.
To my understanding, copy-on-write strategies are used everywhere in DeepStream’s pipeline, so all classifiers attach their inference results to the same copy of the downstream metadata.
If you attach this metadata one classifier at a time, everything should be OK. But when multiple threads (tee + queue) modify the same copy of the downstream metadata, there will be a problem. This has to be done with caution.
I have also found that using tee can cause big problems. tee is evil, but that’s another story :)
Thanks for the insights @neoragex2002. I was thinking along the lines of using tee + queue as well, but I don’t know how to merge the 3 “versions” of NvDsBatchMeta. My plan was:
sources -> muxer -> detector -> tee |-> queue -> classifier1 | \
                                    |-> queue -> classifier2 | ---- > [using GstAggregator?] -> [subsequent processing]
                                    |-> queue -> classifier3 | /
Thanks for mentioning some of the gotchas associated with the tee + queue approach. You mention that tee is evil; would you mind elaborating or pointing me to some resources that I can read?
We are working on a parallel inference pipeline, but it is not available yet. For now, please add a video convert before each classifier so that the parallel classifiers do not modify the same batch meta.
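The suggestion above can be sketched as a gst-launch style description built in Python. This is only a sketch under assumptions: the config file names are hypothetical, each branch ends in a fakesink because merging the branches is not solved in this thread, and the requested width and height are placeholders that would need to differ from the muxer output size for the converter to produce a new buffer (as the later reply on nvvideoconvert describes). The element names (tee, queue, nvvideoconvert, nvinfer) and the nvinfer config-file-path property are the standard ones; everything else is illustrative.

```python
def build_parallel_branches(classifier_configs, width=1280, height=720):
    """Return a gst-launch style description of the fan-out after the detector.

    classifier_configs: nvinfer config file paths (hypothetical names).
    width, height: output size requested from nvvideoconvert in each branch
                   (assumption: chosen to differ from the muxer output size).
    """
    parts = ["tee name=t"]
    for config in classifier_configs:
        # One branch per classifier: its own queue (thread) and its own converter.
        parts.append(
            "t. ! queue ! nvvideoconvert ! "
            f"video/x-raw(memory:NVMM),width={width},height={height} ! "
            f"nvinfer config-file-path={config} ! fakesink"
        )
    return " ".join(parts)


# Hypothetical config file names, for illustration only.
description = build_parallel_branches(
    ["classifier1.txt", "classifier2.txt", "classifier3.txt"]
)
print(description)
```

The resulting string is meant to be appended after the detector in a full pipeline description; whether a given caps change really forces a fresh buffer on a particular DeepStream version should be verified rather than assumed.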
Sorry, there is very little information on the internet about the limits of tee in the DeepStream scenario.
The original GStreamer elements do not seem to support batched buffers natively, but in DeepStream this feature is a must. So there are some complex issues to consider.
In my experience, if you tee-queue single frames, everything is OK. But if you tee-queue batched frames in DeepStream, weird things happen. I don’t know exactly why, because DeepStream elements are basically black boxes, but maybe the trouble lies in the reference counting of the batched buffer.
“add video convert before branch to avoid modify the same batch meta”. Very interesting hint!
Does this mean that we can duplicate (instead of ref-count) the frame buffers (batched or not) by using nvvideoconvert to convert them into a different format?
Yes. Using nvvideoconvert to convert to a different resolution generates a new buffer instead of adding a refcount to the GstBuffer. Tee adds a refcount to the GstBuffer; gst_buffer_make_writable() then unrefs the GstBuffer but shares the GstMemory, which causes the GstBuffer to return to the buffer pool. This is a GStreamer limitation.
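To make the refcount behaviour described above easier to follow, here is a toy model in plain Python. It is not GStreamer code: the Memory and Buffer classes and the ref, unref and make_writable helpers are hypothetical stand-ins that only mimic the documented rule that making a shared buffer writable yields a new buffer wrapper that still points at the same memory, and drops the caller's reference on the original. How DeepStream's batch meta behaves on such a copy is not modelled.

```python
class Memory:
    """Stand-in for GstMemory: the actual frame data."""
    def __init__(self, data):
        self.data = data


class Buffer:
    """Stand-in for GstBuffer: a refcounted wrapper around Memory."""
    def __init__(self, memory):
        self.memory = memory
        self.refcount = 1
        self.released = False  # True once the wrapper has gone back to its pool


def ref(buf):
    buf.refcount += 1
    return buf


def unref(buf):
    buf.refcount -= 1
    if buf.refcount == 0:
        buf.released = True  # toy version of "returned to the buffer pool"


def make_writable(buf):
    """Toy version of gst_buffer_make_writable()."""
    if buf.refcount == 1:
        return buf                 # sole owner: already writable
    copy = Buffer(buf.memory)      # new wrapper, same Memory object
    unref(buf)                     # the caller's reference on the original is dropped
    return copy


# tee hands the same buffer to two branches by taking an extra reference.
original = Buffer(Memory(b"batched frames"))
branch_a = original
branch_b = ref(original)           # refcount is now 2

writable_a = make_writable(branch_a)   # branch A gets a new wrapper
unref(branch_b)                        # branch B finishes with the original

print(writable_a is original)                # a different wrapper
print(writable_a.memory is original.memory)  # but the same memory
print(original.released)                     # original wrapper is back in the pool
```

In this toy, branch A still holds memory that belongs to a wrapper which has already been released, which is the situation the reply above attributes to tee plus gst_buffer_make_writable().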
It’s great to know that a parallel inference pipeline is in the works. My motivation for moving from linear to parallel is a performance boost. I need the inference results from all 3 classifiers to make a decision on a certain task. In linear mode, the total wait time is (t1 + t2 + t3), where ti is the time taken by classifier i to finish its inference. In parallel mode, the total wait time is theoretically reduced to max(t1, t2, t3).
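As a rough illustration of that arithmetic, the sketch below replaces each classifier with a sleep of a hypothetical duration and compares running them back to back with running them in threads. The timings and the fake_classifier helper are made up for the example; real classifiers on a single GPU compete for the same device, so the measured gain may be smaller than the ideal max(t1, t2, t3).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_classifier(seconds):
    """Pretend to run inference by sleeping for the given time."""
    time.sleep(seconds)
    return seconds


def run_sequential(times):
    start = time.perf_counter()
    for t in times:
        fake_classifier(t)
    return time.perf_counter() - start


def run_parallel(times):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(times)) as pool:
        list(pool.map(fake_classifier, times))  # wait for all branches
    return time.perf_counter() - start


times = [0.05, 0.08, 0.03]  # hypothetical t1, t2, t3 in seconds

sequential = run_sequential(times)
parallel = run_parallel(times)

print(f"ideal sequential wait: {sum(times):.2f}s, measured: {sequential:.2f}s")
print(f"ideal parallel wait:   {max(times):.2f}s, measured: {parallel:.2f}s")
```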
Currently, is there any way to combine the results from the 3 parallel branches and then proceed as if the 3 classifiers were linked linearly?
We are working on a metamux plugin to combine the results from the 3 parallel inference branches.
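Until such a plugin exists, the merge step itself can be pictured with a small plain-Python sketch. This is not the metamux plugin, whose design is not described in this thread, and it does not touch NvDsBatchMeta. It assumes, purely for illustration, that each branch has already been reduced to a dictionary keyed by a hypothetical object id, with one label per classifier.

```python
def merge_branch_results(branches):
    """Combine per-branch classifier labels into one record per object.

    branches: list of dicts mapping object_id -> {classifier_name: label}.
    Returns a single dict with the labels of all branches merged per object.
    """
    merged = {}
    for branch in branches:
        for object_id, labels in branch.items():
            merged.setdefault(object_id, {}).update(labels)
    return merged


# Hypothetical results for two detected objects (ids 7 and 9).
branch1 = {7: {"classifier1": "car"}, 9: {"classifier1": "truck"}}
branch2 = {7: {"classifier2": "red"}, 9: {"classifier2": "white"}}
branch3 = {7: {"classifier3": "sedan"}}

combined = merge_branch_results([branch1, branch2, branch3])
print(combined)
```

Downstream logic can then read all labels for an object from one record, as it would after a linear chain of classifiers. An object missing from a branch (id 9 in branch3 here) simply has no entry for that classifier.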