If we use DeepStream in a real system, these questions must be answered:
- Since DeepStream manages the frame pool, how can we customize the batch size (number of frames) coming from the input module? We really need this feature, because we analyze the video stream with a particular batch size; otherwise we have to invent another tiny frame pool in our custom module.
- How do we calculate or estimate the memory usage of the PacketCache and FRAME POOL in DeepStream? Can they scale in a production environment?
- There is only one Flexible Pipeline to process the frames generated by multiple video streams. Is there a strategy for real overflow control? That is, if the Flexible Pipeline processes frames more slowly than the video streams generate them, how is that handled, and by whom?
Could you share more information about the batch size mentioned here?
Is it for deep learning inference or the number of decoded frames?
You can use nvprof to monitor GPU memory usage.
In our sample, we show how to obtain pipeline performance information.
We do not provide an implementation for the slow-pipeline case you mentioned.
We have a deep network that detects human actions; it analyzes 10 frames per batch, and the video FPS is set to 10. Yes, we analyze once every second. The batch size is for the deep learning inference module in the DeepStream library.
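The timing above works out as a simple calculation (the numbers are from our setup, not anything dictated by DeepStream):

```python
# Back-of-envelope timing for the action-detection setup described above.
fps = 10         # video frames per second (our stream setting)
batch_size = 10  # frames fed to the network per inference call

seconds_per_batch = batch_size / fps
print(seconds_per_batch)  # 1.0 -> one inference call per second of video
```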
We want to control the host and GPU memory usage, not only profile it. We hope the PacketCache and FRAME POOL do not consume too much memory.
Also, in your detection sample, how many channels of 1080p video can a single inference instance support at most?
We can control the inference module's GPU memory consumption via inferenceParams.workspaceSize_. What does the PacketCache and FRAME POOL memory consumption depend on, and is it GPU or host memory? Let me guess: the PacketCache uses host memory, and it depends on the number of channels and the packet size (average, min, max?). A more precise form: PacketCache size = size of one video packet * N packets per video stream * number of channels. How about the FRAME POOL?
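Under my guess above, the PacketCache estimate could be sketched like this (the formula is my assumption, not documented DeepStream behavior, and all numbers are hypothetical):

```python
def packet_cache_bytes(avg_packet_bytes, packets_per_stream, channels):
    """Guessed PacketCache bound: per-packet size * cache depth * channels."""
    return avg_packet_bytes * packets_per_stream * channels

# Hypothetical numbers: ~256 KB per compressed video packet,
# 32 packets cached per stream, 16 channels.
est = packet_cache_bytes(256 * 1024, 32, 16)
print(est / (1024 ** 2), "MiB")  # -> 128.0 MiB
```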
We need to calculate this because we have to spec the hardware and report to the boss: this is the hardware and software configuration (price $), and it delivers this performance: bla, bla, bla…
For the inference module, we have several implementations and can automatically select one based on the maximum workspace value provided by the user.
For the decoder module, the implementation is fixed.
Moreover, the decoder uses GPU memory.
But if the input stream is located in CPU memory, it also consumes some CPU memory.
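Combining the replies above, a frame-pool estimate for decoded frames might look like this. The NV12 pixel layout and the pool depth are assumptions on my part, not confirmed DeepStream internals:

```python
def frame_pool_bytes(width, height, frames_per_channel, channels,
                     bytes_per_pixel=1.5):
    """Rough GPU memory for pooled decoded frames.

    1.5 bytes/pixel assumes NV12 (YUV 4:2:0); the pool depth per channel
    is a guess, since DeepStream does not document it here.
    """
    return int(width * height * bytes_per_pixel * frames_per_channel * channels)

# Example: 1080p frames, an assumed pool of 8 frames per channel, 16 channels.
est = frame_pool_bytes(1920, 1080, 8, 16)
print(round(est / (1024 ** 2)), "MiB")  # ~380 MiB
```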
Thanks, that gives me a clue for estimating DeepStream's resource requirements.