Inconsistent and very long H.264 encoding latency

This is a follow-up post from the previous post: https://devtalk.nvidia.com/default/topic/1028317/jetson-tx2/h264-encoder-latency/post/5260699/#5260699. I actually replied in that post, but for some reasons, it mentions “hidden posts not numbered”, so I need to start a new thread here.

Follow this previous question, I’ve run the camera recording sample from the tegra multimedia api for several times and keep getting different encoding profiling results (I runned the jetson_clock.sh script.):

The first one:
----------- Element = enc0 -----------
Total Profiling time = 0.865153
Average FPS = 31.2084
Total units processed = 28
Average latency(usec) = 1703
Minimum latency(usec) = 110
Maximum latency(usec) = 3464

The result seems pretty nice. Based on the profiling time, the average latency is 1.7ms. However, based on the log, it seems that the encoding latency is still larger than a frame period (33 ms). To help us understand the problem, I add another log in the encoderCapturePlaneDqCallback function, which should be called when a frame is finished encoding and waits inside a capture plane. We got the following result (I put some orange comments to help us understand):

The log is:

CONSUMER: Acquired Frame. 1828717843//1st frame acquired from the camera.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 22.//This is probably some metadata because it is only 22 bytes.
CONSUMER: Acquired Frame. 1828717844//2nd frame acquired from the camera.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 22020.//1st frame get encoded, which happens after the 2nd frame acquired from the camera.
CONSUMER: Acquired Frame. 1828717864//3rd frame acquired from the camera.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 25105.//2nd frame get encoded.
CONSUMER: Acquired Frame. 1828717866//4th frame acquired from the camera.

CONSUMER: Acquired Frame. 1828717844//nst frame acquired from the camera.
CONSUMER: Released frame. 1828717864
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 19179.//(n-1)th frame get encoded.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 19732.//nth frame get encoded.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 0.
CONSUMER: Got EOS, exiting…
CONSUMER: Done.

We find that the encoding latency from the time frame acquired from the camera to the time the frame finishes encoding is longer than a frame time (33ms).

The second result:
----------- Element = enc0 -----------
Total Profiling time = 0.861473
Average FPS = 31.3417
Total units processed = 28
Average latency(usec) = 32612
Minimum latency(usec) = 260
Maximum latency(usec) = 35255

The average encoding latency is 32.6ms, which is around a frame period, but the log shows the latency is actually larger than 2 frame periods (66.7ms).

CONSUMER: Acquired Frame. 1828717843//1st frame.
CONSUMER: Acquired Frame. 1828717844//2nd frame.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 22.//This is probably some metadata because it is only 22 bytes.
CONSUMER: Acquired Frame. 1828717864//3rd frame.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 22020.//1st frame get encoded.
CONSUMER: Acquired Frame. 1828717866//4th frame.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 25008.//2nd frame get encoded.

CONSUMER: Acquired Frame. 1828717844//nst frame acquired from the camera.
CONSUMER: Released frame. 1828717864
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 24758.//(n-2)th frame get encoded.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 25853.//(n-1)th frame get encoded.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 18977.//nth frame get encoded.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 0.
CONSUMER: Got EOS, exiting…
CONSUMER: Done.
PRODUCER: Done – exiting.

[b]From this result, we verify the argument that each frame will finish encoding after 1 or 2 frame periods + some time. Since I’m capturing 30fps video with 640x480 resolution, the time of 2 frame periods is 66.7ms.

So the results are really wired, here are three main questions:

  1. Why does the encoding profiling result keep changing?
  2. Why does the log show longer encoding latency than the profiling result?
  3. Can we really force every frame to be decoded straight away without waiting for the next frame?

Current high-end smartphones can encode a 640*480 frame within 10ms, it will be embarrassing if Jetson requires more than 66ms to do so…[/b]

Hi,
We have profiler open source tegra_multimedia_api\samples\common\classes\NvApplicationProfiler.cpp

You should get consistent result after jetson_clocks.sh being correctly run. Please check tegrastats.

Hi Danell,

Let’s leave the consistency problem away for now. I found that there’s a logging issue exist here. I dug into the detail of the startProcessing and stopProcessing function in the NvElementProfiler.cpp file. The latency of encoding a single frame is actually more than one frame based on my logs, but the profiler gives the wrong result. I’ll explain the reason for that using the following logs.

The Encoder log is here, result looks perfect. The minimum latency is only 110 microsecond, which is increadible.
----------- Element = enc0 -----------
Total Profiling time = 0.865153
Average FPS = 31.2084
Total units processed = 28
Average latency(usec) = 1703
Minimum latency(usec) = 110
Maximum latency(usec) = 3464

However, the reason for this perfect result is the starting time and ending time used to calculate the latency does not correspond. Here’s my own log.

CONSUMER: Acquired Frame. 1828717843//1st frame acquired from the camera.

CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 22.//This is probably some metadata because it is only 22 bytes. However, the profiler uses this as the ending time of the 1st frame, which gives us a very good result.

CONSUMER: Acquired Frame. 1828717844//2nd frame acquired from the camera.

CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 22020.//1st frame get encoded, which happens after the 2nd frame acquired from the camera. This is the correct encoding end time of 1st frame, which is more than 33ms after the frame is queued into the encoder.

CONSUMER: Acquired Frame. 1828717864//3rd frame acquired from the camera.

CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 25105.//2nd frame get encoded.

CONSUMER: Acquired Frame. 1828717866//4th frame acquired from the camera.


CONSUMER: Acquired Frame. 1828717844//nst frame acquired from the camera.

CONSUMER: Released frame. 1828717864
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 19179.//(n-1)th frame get encoded.

CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 19732.//nth frame get encoded. This also proves our hypothesis, since we get two encoded frame out after the last frame is filled into the encoder.

CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 0.
CONSUMER: Got EOS, exiting…
CONSUMER: Done.

This is a big problem that should be solved in the next version…

I’ll keep checking what is the reason to have this large latency…

Hi,

The time between the two should be very small and neglectable. Are you sure it is over 33ms?

Hi Danel,

Here’s another log I get when running this camera recording sample. To clarify, I add three more kinds of logs start with ********.

“startProcessing” is logged in the NvElementProfiler::startProcessing() function to record the unit_id_counter and time.tv_usec(the unit is microsecond) respectively.
“finishProcessing” is logged in the NvElementProfiler::finishProcessing() function to record the id and stop_time.tv_usec.
“Latency” is the latency calculated in the finishProcessing() function using TIMESPEC_DIFF_USEC(…) function.

I do see that the first two “encoderCapturePlaneDqCallback” logs have a time difference of around 124245 - 120718 = 3.527ms.

CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 22.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 25593.

However, start from the second frame, we see that the time difference is pretty huge. And the latency recorded by the profiler is fake. I’ll explain in the following logs.

Still the same, I show the profiling result first, which looks perfect.

----------- Element = enc0 -----------
Total Profiling time = 0.86948
Average FPS = 32.2032
Total units processed = 29
Average latency(usec) = 1738
Minimum latency(usec) = 1520
Maximum latency(usec) = 3474

Here’s my log.

CONSUMER: Acquired Frame. 1828717843//1st frame queue.
*************startProcessing Id 1, time: 117244(unit is microsecond).
*************finishProcessing Id 0, time: 120718.
**************Latency: 3474 us.//This is the time difference between 1st frame in queue to the metadata comes out, which is definitly not the real latency. But it is used by the profiler to provide the profiling result…
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 22.//Metadata.

CONSUMER: Acquired Frame. 1828717844//2nd frame queue.
*************startProcessing Id 2, time: 122644.
*************finishProcessing Id 0, time: 124245.
**************Latency: 1601 us.//Fake latency.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 25593.//1st frame dequeue. Real Latency: 124.245 - 117.244 = 7.001ms

CONSUMER: Acquired Frame. 1828717864//3rd frame queue.
*************startProcessing Id 3, time: 155451.
*************finishProcessing Id 0, time: 157110.
**************Latency: 1659 us.//Fake latency.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 24465.//2nd frame dequeue. Real Latency: 157.110 - 122.644 = 34.466ms

CONSUMER: Acquired Frame. 1828717866//4th frame queue.
*************startProcessing Id 4, time: 189096.
*************finishProcessing Id 0, time: 190748.
**************Latency: 1652 us.//Fake latency.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 25907.//3rd frame dequeue. Real Latency: 190.748 - 155.451 = 35.297ms

CONSUMER: Acquired Frame. 1828717867//5th frame queue.
*************startProcessing Id 5, time: 222298.
CONSUMER: Released frame. 1828717843
*************finishProcessing Id 0, time: 223907.
**************Latency: 1609 us.//Fake latency.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 24719.//4th frame dequeue. Real Latency: 223.907 - 189.096 = 34.811ms

CONSUMER: Acquired Frame. 1828717843
*************startProcessing Id 6, time: 255481.
CONSUMER: Released frame. 1828717844
*************finishProcessing Id 0, time: 257001.
**************Latency: 1520 us.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 19659.

CONSUMER: Acquired Frame. 1828717844
*************startProcessing Id 7, time: 289321.
CONSUMER: Released frame. 1828717864
*************finishProcessing Id 0, time: 290879.
**************Latency: 1558 us.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 6418.

CONSUMER: Acquired Frame. 1828717864
*************startProcessing Id 8, time: 322140.
CONSUMER: Released frame. 1828717866
*************finishProcessing Id 0, time: 323795.
**************Latency: 1655 us.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 12378.

CONSUMER: Acquired Frame. 1828717866
*************startProcessing Id 9, time: 355505.
CONSUMER: Released frame. 1828717867
*************finishProcessing Id 0, time: 357263.
**************Latency: 1758 us.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 23309.

CONSUMER: Acquired Frame. 1828717867
*************startProcessing Id 10, time: 388823.
CONSUMER: Released frame. 1828717843
*************finishProcessing Id 0, time: 390519.
**************Latency: 1696 us.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 32957.

CONSUMER: Acquired Frame. 1828717843
*************startProcessing Id 11, time: 422157.
CONSUMER: Released frame. 1828717844
*************finishProcessing Id 0, time: 423957.
**************Latency: 1800 us.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 28651.

CONSUMER: Acquired Frame. 1828717844
*************startProcessing Id 12, time: 455404.
CONSUMER: Released frame. 1828717864
*************finishProcessing Id 0, time: 457164.
**************Latency: 1760 us.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 37629.

CONSUMER: Acquired Frame. 1828717864
*************startProcessing Id 13, time: 488762.
CONSUMER: Released frame. 1828717866
*************finishProcessing Id 0, time: 490514.
**************Latency: 1752 us.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 34089.

CONSUMER: Acquired Frame. 1828717866
*************startProcessing Id 14, time: 522157.
CONSUMER: Released frame. 1828717867
*************finishProcessing Id 0, time: 523938.
**************Latency: 1781 us.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 33791.

CONSUMER: Acquired Frame. 1828717867
*************startProcessing Id 15, time: 555366.
CONSUMER: Released frame. 1828717843
*************finishProcessing Id 0, time: 557080.
**************Latency: 1714 us.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 33963.

CONSUMER: Acquired Frame. 1828717843
*************startProcessing Id 16, time: 588730.
CONSUMER: Released frame. 1828717844
*************finishProcessing Id 0, time: 590457.
**************Latency: 1727 us.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 30010.

CONSUMER: Acquired Frame. 1828717844
*************startProcessing Id 17, time: 621895.
CONSUMER: Released frame. 1828717864
*************finishProcessing Id 0, time: 623616.
**************Latency: 1721 us.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 30679.

CONSUMER: Acquired Frame. 1828717864
*************startProcessing Id 18, time: 655377.
CONSUMER: Released frame. 1828717866
*************finishProcessing Id 0, time: 657152.
**************Latency: 1775 us.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 30062.

CONSUMER: Acquired Frame. 1828717866
*************startProcessing Id 19, time: 688628.
CONSUMER: Released frame. 1828717867
*************finishProcessing Id 0, time: 690288.
**************Latency: 1660 us.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 35747.

CONSUMER: Acquired Frame. 1828717867
*************startProcessing Id 20, time: 722028.
CONSUMER: Released frame. 1828717843
*************finishProcessing Id 0, time: 723700.
**************Latency: 1672 us.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 23973.

CONSUMER: Acquired Frame. 1828717843
*************startProcessing Id 21, time: 755001.
CONSUMER: Released frame. 1828717844
*************finishProcessing Id 0, time: 756684.
**************Latency: 1683 us.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 25421.

CONSUMER: Acquired Frame. 1828717844
*************startProcessing Id 22, time: 788415.
CONSUMER: Released frame. 1828717864
*************finishProcessing Id 0, time: 790080.
**************Latency: 1665 us.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 26108.

CONSUMER: Acquired Frame. 1828717864
*************startProcessing Id 23, time: 821699.
CONSUMER: Released frame. 1828717866
*************finishProcessing Id 0, time: 823355.
**************Latency: 1656 us.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 24217.

CONSUMER: Acquired Frame. 1828717866
*************startProcessing Id 24, time: 854978.
CONSUMER: Released frame. 1828717867
*************finishProcessing Id 0, time: 856644.
**************Latency: 1666 us.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 24523.

CONSUMER: Acquired Frame. 1828717867
*************startProcessing Id 25, time: 888413.
CONSUMER: Released frame. 1828717843
*************finishProcessing Id 0, time: 890098.
**************Latency: 1685 us.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 25526.

CONSUMER: Acquired Frame. 1828717843
*************startProcessing Id 26, time: 921591.
CONSUMER: Released frame. 1828717844
*************finishProcessing Id 0, time: 923217.
**************Latency: 1626 us.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 26859.

CONSUMER: Acquired Frame. 1828717844
*************startProcessing Id 27, time: 954942.
CONSUMER: Released frame. 1828717864
*************finishProcessing Id 0, time: 956574.
**************Latency: 1632 us.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 22028.

CONSUMER: Acquired Frame. 1828717864
*************startProcessing Id 28, time: 988390.
CONSUMER: Released frame. 1828717866
*************startProcessing Id 29, time: 988609.
*************finishProcessing Id 0, time: 990049.
**************Latency: 1659 us.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 22829.

*************finishProcessing Id 0, time: 990198.
**************Latency: 1589 us.
CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 23752.

CONSUMER: encoderCapturePlaneDqCallback, encoded frame size: 0.
CONSUMER: Got EOS, exiting…
CONSUMER: Done.
PRODUCER: Done – exiting.

Hi,
In default configuration, it encodes in I P P P …

So for encoding (n) th frame, encoder keeps (n-1) th frame as reference frame. What you have observed is normal and not an issue.

I definitely understand the encoding mechanism, but you do not answer the question.

What I care is latency. In my understanding, encoding latency means the time interval from the moment a raw camera frame is queued into the encoder to the time the encoded frame is dequeued from the encoder.

What I observe is that when I insert the (n) th frame into the encoder, I got the corresponded encoded frame after more than 33ms. And more importantly, your profiler still reports me an average 1.7ms latency, which sound pretty bad to me…

I don’t understand how does this problem related to reference frames.

Hi luyangijj9c,

For your reference.

My application log (3840x2160/30fps):

1524655407.351970: [ENC] inBuf_to_enc():  qbuf output  plane[00] :1524655406.330847 q:00
1524655407.378639: [ENC] inBuf_to_enc():  qbuf output  plane[01] :1524655406.370724 q:01
1524655407.383165: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[00] :1524655406.330847 q:02 bytesused=[23]
1524655407.383526: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[01] :1524655406.330847 q:02 bytesused=[36770]
1524655407.395200: [ENC] inBuf_to_enc():  qbuf output  plane[02] :1524655406.404069 q:01
1524655407.409742: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[02] :1524655406.370724 q:02 bytesused=[209633]
1524655407.422859: [ENC] inBuf_to_enc():  qbuf output  plane[03] :1524655407.370778 q:00
1524655407.436245: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[00] :1524655406.404069 q:02 bytesused=[9320]
1524655407.454680: [ENC] inBuf_to_enc():  qbuf output  plane[04] :1524655407.404085 q:00
1524655407.467590: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[01] :1524655407.370778 q:02 bytesused=[8036]
1524655407.487062: [ENC] inBuf_to_enc():  qbuf output  plane[05] :1524655407.437429 q:00
1524655407.499965: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[02] :1524655407.404085 q:02 bytesused=[7144]
1524655407.520027: [ENC] inBuf_to_enc():  qbuf output  plane[00] :1524655407.470776 q:00
1524655407.532914: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[00] :1524655407.437429 q:02 bytesused=[4590]
1524655407.553424: [ENC] inBuf_to_enc():  qbuf output  plane[01] :1524655407.504121 q:00
1524655407.566291: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[01] :1524655407.470776 q:02 bytesused=[4764]
1524655407.586249: [ENC] inBuf_to_enc():  qbuf output  plane[02] :1524655407.537466 q:00
1524655407.599134: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[02] :1524655407.504121 q:02 bytesused=[4862]
1524655407.620334: [ENC] inBuf_to_enc():  qbuf output  plane[03] :1524655407.570760 q:00
1524655407.633219: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[00] :1524655407.537466 q:02 bytesused=[6420]
1524655407.645873: [ENC] inBuf_to_enc():  qbuf output  plane[04] :1524655407.604105 q:00
1524655407.658788: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[01] :1524655407.570760 q:02 bytesused=[6198]
1524655407.678935: [ENC] inBuf_to_enc():  qbuf output  plane[05] :1524655407.637451 q:00
1524655407.691857: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[02] :1524655407.604105 q:02 bytesused=[6245]
1524655407.712065: [ENC] inBuf_to_enc():  qbuf output  plane[00] :1524655407.670798 q:00
1524655407.725022: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[00] :1524655407.637451 q:02 bytesused=[7074]
1524655407.745870: [ENC] inBuf_to_enc():  qbuf output  plane[01] :1524655407.704092 q:00
1524655407.758822: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[01] :1524655407.670798 q:02 bytesused=[7695]
1524655407.806912: [ENC] controlEnc(): sendEOStoEncoder(): index=[2]
1524655407.807111: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[02] :1524655407.704092 q:02 bytesused=[7589]
1524655407.807282: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[00] :         0.000000 q:02 bytesused=[0]

The output NAL unit that bytesused is 22 or 23 bytes are StartCode/SPS/StartCode/PPS.
Its next NAL unit is an IDR that has same timestamp.
The other NAL units are P-pictures.

The latency for IDR is larger than 33msec, but the interval for P-picture is almost less than 33msec.
I assume that 1 frame is kept internally as a result (wating for the end of encoding the previous picture).

Best Regards

Hi mynaemi,

Thank you very much for your reply!
Now we have your log. And I observe the same issue in your log. I marked the latency of each frame in the log. Several main takeaways:

  1. all frames have encoding latency larger than 30msec.
  2. Frame 3-14 have longer than 40msec latency which is so long.
  3. I still observe the problem that the (n)th frame will dq after the (n+1)th frame is queued into the encoder, which is really an overhead of encoding latency.
  4. For each item in your log, I can find two timestamps (one in the front and another one in the end). I assume the front timestamp to be the correct timestamp because it follows the correct time sequence (the second timestamp is jumping back and forth).
1524655407.351970: [ENC] inBuf_to_enc():  qbuf output  plane[00] :1524655406.330847 q:00//Frame 1 in queue.
1524655407.378639: [ENC] inBuf_to_enc():  qbuf output  plane[01] :1524655406.370724 q:01//Frame 2 in queue.
1524655407.383165: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[00] :1524655406.330847 q:02 bytesused=[23]//NAL unit dequeue. 
1524655407.383526: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[01] :1524655406.330847 q:02 bytesused=[36770]//Frame 1  (the IDR frame) dequeue, <b>Latency is 1524655407.383526 - 1524655407.351970 = 31.556ms.</b>
1524655407.395200: [ENC] inBuf_to_enc():  qbuf output  plane[02] :1524655406.404069 q:01//Frame 3 in queue.
1524655407.409742: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[02] :1524655406.370724 q:02 bytesused=[209633]//Frame 2 (a P frame) dequeue, <b>Latency is 1524655407.409742 - 1524655407.378639 = 31.103ms.</b>
1524655407.422859: [ENC] inBuf_to_enc():  qbuf output  plane[03] :1524655407.370778 q:00//Frame 4 in queue.
1524655407.436245: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[00] :1524655406.404069 q:02 bytesused=[9320]//Frame 3 (a P frame) dequeue, <b>Latency is 1524655407.436245 - 1524655407.395200 = 41.045ms.</b>
1524655407.454680: [ENC] inBuf_to_enc():  qbuf output  plane[04] :1524655407.404085 q:00//Frame 5 in queue.
1524655407.467590: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[01] :1524655407.370778 q:02 bytesused=[8036]//Frame 4 (a P frame) dequeue, <b>Latency is 1524655407.436245 - 1524655407.395200 = 44.731ms.</b>
1524655407.487062: [ENC] inBuf_to_enc():  qbuf output  plane[05] :1524655407.437429 q:00//Frame 6 in queue.
1524655407.499965: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[02] :1524655407.404085 q:02 bytesused=[7144]//Frame 5 (a P frame) dequeue, <b>Latency is 1524655407.499965 - 1524655407.454680 = 45.285ms.</b>
1524655407.520027: [ENC] inBuf_to_enc():  qbuf output  plane[00] :1524655407.470776 q:00//Frame 7 in queue.
1524655407.532914: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[00] :1524655407.437429 q:02 bytesused=[4590]//Frame 6 (a P frame) dequeue, <b>Latency is 1524655407.532914 - 1524655407.487062 = 45.852ms.</b>
1524655407.553424: [ENC] inBuf_to_enc():  qbuf output  plane[01] :1524655407.504121 q:00//Frame 8 in queue.
1524655407.566291: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[01] :1524655407.470776 q:02 bytesused=[4764]//Frame 7 (a P frame) dequeue, <b>Latency is 1524655407.566291 - 1524655407.520027 = 46.264ms.</b>
1524655407.586249: [ENC] inBuf_to_enc():  qbuf output  plane[02] :1524655407.537466 q:00//Frame 9 in queue.
1524655407.599134: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[02] :1524655407.504121 q:02 bytesused=[4862]//Frame 8 (a P frame) dequeue, <b>Latency is 1524655407.599134 - 1524655407.553424 = 45.710ms.</b>
1524655407.620334: [ENC] inBuf_to_enc():  qbuf output  plane[03] :1524655407.570760 q:00//Frame 10 in queue.
1524655407.633219: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[00] :1524655407.537466 q:02 bytesused=[6420]//Frame 9 (a P frame) dequeue, <b>Latency is 1524655407.633219 - 1524655407.586249 = 46.970ms.</b>
1524655407.645873: [ENC] inBuf_to_enc():  qbuf output  plane[04] :1524655407.604105 q:00//Frame 11 in queue.
1524655407.658788: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[01] :1524655407.570760 q:02 bytesused=[6198]//Frame 10 (a P frame) dequeue, <b>Latency is 1524655407.658788 - 1524655407.620334 = 38.454ms.</b>
1524655407.678935: [ENC] inBuf_to_enc():  qbuf output  plane[05] :1524655407.637451 q:00//Frame 12 in queue.
1524655407.691857: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[02] :1524655407.604105 q:02 bytesused=[6245]//Frame 11 (a P frame) dequeue, <b>Latency is 1524655407.691857 - 1524655407.645873 = 45.984ms.</b>
1524655407.712065: [ENC] inBuf_to_enc():  qbuf output  plane[00] :1524655407.670798 q:00//Frame 13 in queue.
1524655407.725022: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[00] :1524655407.637451 q:02 bytesused=[7074]//Frame 12 (a P frame) dequeue, <b>Latency is 1524655407.725022 - 1524655407.678935 = 46.087ms.</b>
1524655407.745870: [ENC] inBuf_to_enc():  qbuf output  plane[01] :1524655407.704092 q:00//Frame 14 in queue.
1524655407.758822: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[01] :1524655407.670798 q:02 bytesused=[7695]//Frame 13 (a P frame) dequeue, <b>Latency is 1524655407.758822 - 1524655407.712065 = 46.757ms.</b>
1524655407.806912: [ENC] controlEnc(): sendEOStoEncoder(): index=[2]
1524655407.807111: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[02] :1524655407.704092 q:02 bytesused=[7589]//Frame 14 (a P frame) dequeue, <b>Latency is 1524655407.807111 - 1524655407.745870 = 61.241ms.</b>
1524655407.807282: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[00] :         0.000000 q:02 bytesused=[0]//EOS Frame

Sorry that I could not understand your answer. Actually, I can already accept the poor performance of Jetson’s video encoding capability. But the main issue is that the profiler is not reporting this poor performance. I need to repeat this again, encoding latency means the time interval from the moment a raw camera frame is queued into the encoder to the time the encoded frame of this raw frame is dequeued from the encoder. So the key thing is that the start time and end time should be one frame. The profiler should not use the frame (n)th in queue time and frame (n-1)th dequeue time to calculate the latency of encoding.

Best regards,

Please check the case of IPBBPBBPBB…

diff --git a/multimedia_api/ll_samples/samples/10_camera_recording/main.cpp b/multimedia_api/ll_samples/samples/10_camera_recording/main.cpp
index 6f531b8..8d78d30 100644
--- a/multimedia_api/ll_samples/samples/10_camera_recording/main.cpp
+++ b/multimedia_api/ll_samples/samples/10_camera_recording/main.cpp
@@ -72,6 +72,10 @@ static bool         DO_STAT = false;
 static bool         VERBOSE_ENABLE = false;
 static bool         DO_CPU_PROCESS = false;
 
+#define SLOT 6
+static uint64_t timestamp[SLOT];
+static int32_t capture_plane_count = 0;
+
 // Debug print macros.
 #define PRODUCER_PRINT(...) printf("PRODUCER: " __VA_ARGS__)
 #define CONSUMER_PRINT(...) printf("CONSUMER: " __VA_ARGS__)
@@ -214,6 +218,7 @@ bool ConsumerThread::threadExecute()
 
     bufferIndex = 0;
 
+int frameCount = 0;
     // Keep acquire frames and queue into encoder
     while (!m_gotError)
     {
@@ -298,6 +303,11 @@ bool ConsumerThread::threadExecute()
             NvBufferMemUnMap(fd, Y_INDEX, &ptr_y);
         }
 
+struct timeval start_time;
+gettimeofday(&start_time, NULL);
+timestamp[(frameCount%SLOT)] = start_time.tv_sec * 1000000 + start_time.tv_usec;
+frameCount ++;
+
         // Push the frame into V4L2.
         v4l2_buf.m.planes[0].m.fd = fd;
         v4l2_buf.m.planes[0].bytesused = 1; // byteused must be non-zero
@@ -379,6 +389,8 @@ bool ConsumerThread::createVideoEncoder()
     if (ret < 0)
         ORIGINATE_ERROR("Could not set m_VideoEncoderoder HW Preset");
 
+m_VideoEncoder->setNumBFrames(2);
+
     // Query, Export and Map the output plane buffers so that we can read
     // raw data into the buffers
     ret = m_VideoEncoder->output_plane.setupPlane(V4L2_MEMORY_DMABUF, 10, true, false);
@@ -414,8 +426,21 @@ bool ConsumerThread::encoderCapturePlaneDqCallback(struct v4l2_buffer *v4l2_buf,
         ORIGINATE_ERROR("Failed to dequeue buffer from encoder capture plane");
     }
 
-    thiz->m_outputFile->write((char *) buffer->planes[0].data,
-                              buffer->planes[0].bytesused);
+if (capture_plane_count > 0) {
+    struct timeval end_time;
+    gettimeofday(&end_time, NULL);
+    uint64_t now_ts = end_time.tv_sec * 1000000 + end_time.tv_usec;
+    int32_t index;
+
+    index = (capture_plane_count - 1) % SLOT;
+    printf("[%d]timediff %lu \n", capture_plane_count, now_ts - timestamp[index]);
+}
+if (capture_plane_count == 0)
+    printf("SPSPPS size %u \n", buffer->planes[0].bytesused);
+capture_plane_count ++;
+
+//    thiz->m_outputFile->write((char *) buffer->planes[0].data,
+//                              buffer->planes[0].bytesused);
 
     if (thiz->m_VideoEncoder->capture_plane.qBuffer(*v4l2_buf, NULL) < 0)
     {

You shall see different GOP affects latency.

Hi luyangijj9c,

4. For each item in your log, I can find two timestamps (one in the front and another one in the end). I assume the front timestamp to be the correct timestamp because it follows the correct time sequence (the second timestamp is jumping back and forth).

Our system:
UVC Camera --(USB)–> V4L2 Caputure —> NvVideoConverter —> NvVideoEncoder —> RTP Packetise —> Ethernet

1524655407.351970: [ENC] inBuf_to_enc():  qbuf output  plane[00] :1524655406.330847

The first timestamp “1524655407.351970” is logging time.
The second timestamp “1524655406.330847” is picture captured time at V4L2 Capture.

I check timestamps between the output plane and the capture plane of NvVideoEncoder by picture captured time.
If picture captured timestamps are same, I think that they are same pictures as input and output.

1524655407.383165: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[00] :1524655406.330847 q:02 bytesused=[23] 
1524655407.383526: [ENC] encoder_capture_plane_dq_callback(): dqbuf capture plane[01] :1524655406.330847 q:02 bytesused=[36770]

These second timestamps “1524655406.330847” are same as above qbuf log.
Then I think that these are encoded log of the above picture to qbuf.

At start of GOP, input picture is 1, but output H.264 code are 3 (SPS/PPS/IDR).
Before R28.1, NvVideoEncoder has an timestamp issue, we could not compare timestamps of input and output.

I cannot understand that your system has large latency because my system use larger resolution and larger bit rate.

I assume that …
qbuf —> output plane queue —> encoding engine —> capture plane queue —> dqCallBack

Encoding the first picture spends more than frame interval (33msec),
the next picture will wait in the output plane queue.
In the capture plane queue, H.264 code (NAL unit) will not wait for so long time.
After dqBuff of SPS/PPS/IDR, the second picture will go into the encode engine, then encode time of the 2nd picture is difference between log time of dqBuff, I think.

At the system performance, encoding latency is between qbuf and dqbuf include waiting time in the both queue.
But at the encoder performance, encoding lantency is between dqbuf and previous dequbuf without waiting time in the output plane queue.

Best Regared

Hi Danel,

Since I’m doing real-time video streaming, I’ll not allow B frame. All frames will be I or P frame. So I’ve no reason to test the latency when having B frame. It will be definitely very large due to the bidirectional inter-frame prediction.

I still need to repeat my question. I observe that the real latency for encoding a frame is much large than the latency reported by the profiler. I want you to check this issue.

The main mistake you guys made is that you use the StartCode/SPS/StartCode/PPS NAL unit (22 bytes or 23 bytes) as the ending time of the 1st frame, which breaks all following latency calculation.

If you look at my log, you will find that the first latency is calculated using the StartCode/SPS/StartCode/PPS NAL unit dequeuing time minus the 1st frame enqueuing time.

The second latency is calculated using the 1st frame dequeuing time minus the 2nd frame enqueuing time. The same thing happens continuously. Thus, the profiler always calculates the (n)th latency using the (n-1)th frame dequeuing time minus the (n)th enqueuing time.

If you understand to this point, you will know that the latency you calculated are not the real latency. The real latency is calculated using the (n)th frame dequeuing time minus the (n)th enqueuing time.

Please let me know if there’s any problem with all the above statements. Let’s forget your GOP stuff for now. Thanks.

Hi mynaemi,

Thank you so much for your reply. It’s much better.

I can understand the encoding latency you mentioned previous, namely the time from dqbuf of frame n to dqbuf of frame n+1.

But my question is about the encoder profiler. The encoder profiler is actually using the time between qbuf and dqbuf to calculate the latency for each frame, which is the system perspective of encoding latency you mentioned.

However, the issue is that the pair of qbuf and dqbuf time used to calculate the encoding latency are not from the same frame. If you look at my log in the previous post, you’ll find the (n)th latency is calcuated using the qbuf time of the (n)th frame and the dqbuf time of the (n-1)th frame, which gives us a very low latency, but the actual latency is much longer(1 frame time longer) than it.

Another question, is it the case that the (n)th frame will dq only after the (n+1)th frame is queued? Can I get the encoded (n)th frame before queue the (n+1)th frame? Thanks.

Hi luyangijj9c,

Can I get the encoded (n)th frame before queue the (n+1)th frame?

You can make it a psude BLOCKING mode by using a semaphore.

  • Lock when qBuf output plane
  • Unlock when dqBuf capture plane

Best Regards

Hi mynaemi,

Thank you very much for your reply.

Can you please explain this in detail? I’m not quite understand how this can help me dq the frame earlier. Thanks.

Hi luyangijj9c,

This is a simple way.

#include <mutex>
std::mutex mtx_encode;
mtx_encode.lock();
enc->output_plane.qbuff(...);
printLog("%lu.%lu: Start Encoding", nowtime.tvsec, nowtime.tvusec, ...);
capturePlane_dqCallBackFunc() {
printLog("%lu.%lu: End Encoding", nowtime.tvsec, nowtime.tvusec, ...);
mtx_encoder.unlock();
process_nal(nvbuffer);
}

An image can not be queued into output plane during encoding previous image.
When a nal unit is dequeued from capture plane, the mutex is unlocked and the next image can be queued into output plane.

There is only one image in the encoder.

Best Regards

Hi guys,

I found out the problem actually exists in the kernel function v4l2_ioctl(). I don’t really know why now. But the problem is that the (n)th frame can never dq before the (n+1)th frame enqueue. There is a lock inside the v4l2_ioctl() function that block the dqbuffer function of the (n)th frame, and after the (n+1)th frame is enqueued, the dqbuffer function will continue to run.

Is this the case for you guys?

Hi luyangijj9c,

What value do you set the second argument of createVideoEncoder() ?

In the document “L4T Multimedia API Reference 28.2”,
it is written that this object is opened in BLOCKING mode.

I have not yet compare the behavior in NON_BLOCKING mode with in BLOCKING mode.
In old versions, the NON_BLOCKING flag was ignored.

Best Regards