I am using an AGX Xavier 32GB. The datasheet lists the maximum number of encoder streams as 32 at 1080p30.
So I ran 32 encoder streams of 1080p30, and it works well.
But when I simultaneously ran 30 streams of 1080p30 NvBufferComposite on the VIC, the encoder throughput dropped significantly. The composition runs in a different process.
So does the VIC composition really affect the encoder throughput? If so, why?
In each of the 32 encoder streams there is a decoder in front of the encoder; both the decoder capture plane and the encoder output plane use V4L2_MEMORY_DMABUF. After the decoder, an NvBufferTransform converts the frames from block linear to pitch linear.
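To see why two workloads in separate processes might still interact, it helps to put rough numbers on the DRAM traffic they share. A back-of-envelope sketch, assuming NV12 frames (1.5 bytes/pixel) and one full-frame read or write per engine pass; the pass counts and resulting figures are illustrative assumptions, not measured values:

```cpp
// Rough DRAM-traffic estimate for 32 x 1080p30 NV12 streams.
// Each engine pass (decoder write, NvBufferTransform read + write,
// encoder read, composite read + write) moves roughly one frame of
// data; real traffic differs with caching, reference-frame reads and
// frame compression, so treat this as an order-of-magnitude sketch.
constexpr double kBytesPerFrame = 1920.0 * 1080.0 * 1.5;  // NV12
constexpr double kFps = 30.0;
constexpr int kStreams = 32;

// Aggregate traffic in GB/s for a given number of full-frame passes.
constexpr double gb_per_s(double passes) {
    return kStreams * kBytesPerFrame * kFps * passes / 1e9;
}

// gb_per_s(4) ~= 11.9 GB/s (decoder write + transform r/w + encoder read)
// gb_per_s(6) ~= 17.9 GB/s (adding composite read + write)
```

Xavier's theoretical DRAM bandwidth is well above these figures, so raw bandwidth alone may not be the limit; the point is only that the encoder and VIC sit on the same memory fabric and power/clock management, so contention between them is plausible.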
Hi,
The tables in the Xavier module data sheet are for decoding only and encoding only. For decoding + encoding, it may not reach 32 instances. Have you tried decoding + encoding without VIC, and can it run up to 32 instances?
Hi,
We don’t have an existing sample to check this, so we need your help with some information. Can you run 32 instances of decoding + encoding, and of decoding + NvBufferComposite, but not reach 32 instances of decoding + NvBufferComposite + encoding?
Generally NvBufferComposite() is called to composite frames from each source into one video frame. Is this your use case, or do you use it in a different way?
Our complete use case is that in every instance we want to run decoding + resampling/conversion + compositing + (some other image processing) + encoding. We want to run as many instances as possible, and we expect every instance to sustain a stable 30 fps. NvBufferComposite() is used to composite different frames into one frame.
When I ran all the components together, I could not get a stable result matching the datasheet numbers. So I tried removing one or more of the components to check which one has the biggest impact, which leads to my question.
I even tried running 32 instances of encoding only in one process (each instance as a separate thread), and 30 instances of NvBufferComposite() only in another process, simultaneously. The performance still dropped.
So I wonder whether there is some internal resource/buffer/dmabuf management shared by both engines? This is just my guess.
Any advice to improve the performance is highly appreciated. Thanks a lot.
Hi,
Please execute sudo nvpmodel -m 0 and sudo jetson_clocks for a try. This is the MAXN mode listed in the development guide.
There might also be an improvement if you keep all buffers in block linear format. If you are sending pitch linear buffers to the encoder, please try sending block linear buffers instead.
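For readers unfamiliar with the two layouts: pitch linear stores rows one after another, while block linear groups pixels into small 2D tiles so that vertically adjacent pixels stay close in memory, which suits the hardware engines' 2D access patterns. The tile scheme below is a generic illustration only; NVIDIA's actual block-linear layout is different and not publicly documented:

```cpp
#include <cstddef>

// Pitch-linear: rows stored consecutively, with a stride ("pitch")
// that may exceed the row width for alignment.
inline std::size_t pitch_linear_offset(std::size_t x, std::size_t y,
                                       std::size_t pitch) {
    return y * pitch + x;
}

// Generic tiled layout, as an *illustration* of the block-linear idea:
// pixels are grouped into tile_w x tile_h tiles stored contiguously,
// so a vertical neighbour is only tile_w bytes away instead of a full
// row pitch. The real NVIDIA block-linear format differs from this.
inline std::size_t tiled_offset(std::size_t x, std::size_t y,
                                std::size_t width,
                                std::size_t tile_w, std::size_t tile_h) {
    std::size_t tiles_per_row = width / tile_w;
    std::size_t tile_index = (y / tile_h) * tiles_per_row + (x / tile_w);
    std::size_t in_tile = (y % tile_h) * tile_w + (x % tile_w);
    return tile_index * tile_w * tile_h + in_tile;
}
```

For example, with a 16-pixel-wide image and 4x4 tiles, the pixel directly below (0,0) is 4 bytes away in the tiled layout but a full 16-byte pitch away in the pitch-linear one.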
I tried sending block linear buffers to the encoder. The results were the same. I wonder whether there is some conflict between the encoder and NvBufferComposite, or whether I am using something incorrectly somewhere.
Hi,
So the use case may be hitting a system limitation. Please execute sudo tegrastats to check the system load. If there is still headroom on the GPU, you can try implementing the downscale/resampling/compositing functions in CUDA, so that the load shifts from the VIC to the GPU. This might bring some improvement.
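As a sketch of the work that would move to the GPU, here is a CPU reference for a simple grid composite: each source frame is downscaled with nearest-neighbour sampling into one cell of the destination (luma plane only, for brevity). This only illustrates the arithmetic a CUDA kernel would perform per destination pixel; it is not the NvBufferComposite API or the actual VIC behaviour:

```cpp
#include <cstdint>
#include <vector>

// Minimal single-plane frame; a real pipeline would carry all planes
// and a pitch, which are omitted here for clarity.
struct Frame {
    int w, h;
    std::vector<uint8_t> y;  // luma plane, w * h bytes
};

// Scale each source into one cell of a cols-wide grid in dst using
// nearest-neighbour sampling. A CUDA port would map one thread per
// destination pixel and do the same index arithmetic.
void composite_grid(const std::vector<Frame>& srcs, Frame& dst, int cols) {
    int rows = (static_cast<int>(srcs.size()) + cols - 1) / cols;
    int cw = dst.w / cols, ch = dst.h / rows;  // cell size
    for (std::size_t i = 0; i < srcs.size(); ++i) {
        int ox = static_cast<int>(i) % cols * cw;  // cell origin x
        int oy = static_cast<int>(i) / cols * ch;  // cell origin y
        const Frame& s = srcs[i];
        for (int y = 0; y < ch; ++y)
            for (int x = 0; x < cw; ++x)
                dst.y[(oy + y) * dst.w + ox + x] =
                    s.y[(y * s.h / ch) * s.w + (x * s.w / cw)];
    }
}
```

Running this on the CPU for 30 x 1080p streams would of course be far too slow; it is only meant to show that the operation is embarrassingly parallel and therefore a good fit for CUDA if tegrastats shows GPU headroom.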
Thanks for the advice. We are actually evaluating possible ways to run the largest number of stable instances. Given this problem, I simply want to understand the cause so that we can make better use of the encoder and VIC compositing together.