Good morning!
I am working on a Jetson AGX application for my client. The core of it consists of four cameras. Each one is captured at 30 FPS using Argus, and the YUV images of the frames are fed into NvEncoder. There is a shared frame pool, so the interface is zero-copy. NvEncoder is configured to produce an H.265 stream with our custom configuration (slowest encoding preset, among other specifics). The application is very latency-sensitive; we strive for the lowest latency possible.
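For reference, a minimal sketch of the encoder setup, assuming the NvVideoEncoder wrapper from jetson_multimedia_api (if your NvEncoder is a different wrapper the exact calls will differ; the width/height/bitrate values are placeholders):

```cpp
// Sketch only: NvVideoEncoder from jetson_multimedia_api, H.265 with the
// slowest HW preset. Real buffer setup / DMA plumbing omitted.
#include "NvVideoEncoder.h"

static NvVideoEncoder *create_encoder(uint32_t width, uint32_t height)
{
    NvVideoEncoder *enc = NvVideoEncoder::createVideoEncoder("enc0");
    if (!enc)
        return nullptr;

    // Capture plane carries the compressed H.265 bitstream,
    // output plane receives the YUV frames from the shared pool.
    enc->setCapturePlaneFormat(V4L2_PIX_FMT_H265, width, height,
                               2 * 1024 * 1024 /* bitstream buffer size */);
    enc->setOutputPlaneFormat(V4L2_PIX_FMT_YUV420M, width, height);

    enc->setBitrate(8 * 1000 * 1000);              // placeholder bitrate
    enc->setHWPresetType(V4L2_ENC_HW_PRESET_SLOW); // our "slowest" preset
    return enc;
}
```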
Now, the first version of the application consisted of four separate processes, each one capturing video from one camera using one Argus session, and feeding one encoder. Like this:
- Wait for the next frame from the camera session
- Feed the frame into the encoder
- Wait for the bitstream to be available at the encoder output
When we measured the encoder latency (basically the time it took to complete the last two steps), it was pretty stable at around 10 ms.
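In condensed form, the per-process loop with our latency probe looks roughly like this (acquire_frame, feed_encoder and wait_bitstream are hypothetical stand-ins for our Argus/NvEncoder glue):

```cpp
#include <chrono>
#include <cstdio>

struct Frame;                 // opaque handle into the shared frame pool
Frame *acquire_frame();       // hypothetical: block for the next Argus frame
void   feed_encoder(Frame *); // hypothetical: zero-copy hand-off to NvEncoder
void   wait_bitstream();      // hypothetical: block until bitstream is ready

void capture_encode_loop()
{
    using clock = std::chrono::steady_clock;
    for (;;) {
        Frame *frame = acquire_frame();
        auto t0 = clock::now();
        feed_encoder(frame);
        wait_bitstream();
        auto ms = std::chrono::duration<double, std::milli>(clock::now() - t0).count();
        printf("encode latency: %.2f ms\n", ms); // ~10 ms in this single-stream case
    }
}
```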
Right now we are working on the second version of the application. Among other things, we would like the option to software-synchronise the cameras, so the process can now create an Argus session with two or three cameras, and the frames are fed to separate encoder instances at the same time. For example, with two cameras I have:
- Wait for the next set of frames from the recording session
- Feed frame 0 into encoder 0
- Feed frame 1 into encoder 1
- Wait for the bitstream to be available from encoder 0 output
- Wait for the bitstream to be available from encoder 1 output
Now, when we measure the encoding latency for stream 0 and stream 1, we get about 15 ms between feed 0 and output 0, and some 4+ ms more before output 1 becomes available. To be honest, this is a serious problem for us, because we thought the H.265 video encoder of the Jetson AGX could operate in parallel, i.e. we expected both the stream 0 and stream 1 bitstream outputs to be available after 10 ms.
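For clarity, the two-encoder loop with the per-stream timing we use (same kind of hypothetical helpers as above) looks like this:

```cpp
#include <chrono>
#include <cstdio>

struct FrameSet;                        // one synchronised frame per camera
FrameSet *acquire_frame_set();          // hypothetical: block for the next frame set
void feed_encoder(int idx, FrameSet *); // hypothetical: feed frame idx to encoder idx
void wait_bitstream(int idx);           // hypothetical: block for stream idx output

void dual_encode_loop()
{
    using clock = std::chrono::steady_clock;
    for (;;) {
        FrameSet *set = acquire_frame_set();
        auto t0 = clock::now();
        feed_encoder(0, set);
        feed_encoder(1, set);
        wait_bitstream(0);
        auto ms0 = std::chrono::duration<double, std::milli>(clock::now() - t0).count();
        wait_bitstream(1);
        auto ms1 = std::chrono::duration<double, std::milli>(clock::now() - t0).count();
        printf("stream 0: %.2f ms, stream 1: %.2f ms\n", ms0, ms1); // ~15 ms / ~19 ms
    }
}
```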
At first, we thought we were doing something wrong (the single-process camera encoders reported 10 ms, after all). We did a lot of testing, and it turned out that the 10 ms latency is only achievable if the hardware is dedicated solely to one process. When the single-process encoders work at the same time, we get a very similar latency pattern, with one of the encoders taking more time to encode.
We also performed throughput testing, and it seems that the AGX can perform some 230 frames/second for our stream configuration, i.e. 1000 / 230 ≈ 4.3 ms per frame (obviously when working in parallel, in a saturated fashion).
So our conclusions are:
- At least part of the hardware encoder is “shared” between separate NvEncoder instances, even if they live in separate processes. That shared part causes the physical encoding to become “serialised”, and the encoder that starts encoding later has to wait for the encoder that started earlier.
- Therefore, in order to get the lowest possible latency for a single frame, only a single encoder instance should be encoding at any given time (see the sketch after this list).
- On the other hand, getting maximum throughput requires several encoders with enough frames supplied that the encoding engine is kept busy all the time.
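If the serialisation hypothesis is right, the lowest single-frame latency should come from never having two encodes in flight at once. Within one process, a sketch of that idea (reusing the hypothetical helpers from above; across processes you would need a process-shared primitive such as a named semaphore instead of a mutex):

```cpp
#include <mutex>

struct FrameSet;                        // as in the sketch above
void feed_encoder(int idx, FrameSet *); // hypothetical helpers as above
void wait_bitstream(int idx);

std::mutex g_encode_mutex;              // allow only one in-flight encode

void encode_one(int idx, FrameSet *set)
{
    // Holding the lock across feed + wait means the encoding engine only
    // ever sees one of our frames at a time; the price is that stream 1's
    // latency becomes latency(0) + latency(1) instead of overlapping.
    std::lock_guard<std::mutex> lock(g_encode_mutex);
    feed_encoder(idx, set);
    wait_bitstream(idx);
}
```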
I will be very grateful for any comments, hints or explanations related to the above, especially with respect to how to get the lowest possible per-frame latency for several H.265 streams at the same time.
Best
Michal