video encode speed

Hi ALL,

I am trying to encode 4K YUV video to an H.265 stream with the Multimedia API, and I would like to measure the video encode speed.
I record a timestamp after each read_video_frame() call and another before each write_encoder_output_frame() call, then compare each frame's write time with its read time to get the per-frame encode cost. The results show each frame takes about 20 ms~60 ms, which falls short of the TX2's stated 3840x2160 60 fps encode capability. Our encode command and log follow:

ubuntu@tegra-ubuntu:~/tegra_multimedia_api/samples/01_video_encode$ ./video_encode ShakeNDry_3840x2160_100.yuv 3840 2160 H265 ShakeNDry_3840x2160_vbr_2M.h265 --insert-spspps-idr -ifi 10000 -idri 10000 -fps 60 1 -br 2097152 -rc vbr -MinQpI 10 -MaxQpI 50 -MinQpP 10 -MaxQpP 50
Failed to query video capabilities: Inappropriate ioctl for device
NvMMLiteOpen : Block : BlockType = 8
===== MSENC =====
NvMMLiteBlockCreate : Block : BlockType = 8
892744264
842091865
Read finished! Read i 0 read_time 4201092181
Read finished! Read i 1 read_time 4201109645
Read finished! Read i 2 read_time 4201125511
write begin! frame num 0 write_time 4201134349
write begin! frame num 1 write_time 4201134400
Read finished! Read i 3 read_time 4201141992
write begin! frame num 2 write_time 4201149743
Read finished! Read i 4 read_time 4201158630
write begin! frame num 3 write_time 4201165421
Read finished! Read i 5 read_time 4201175158
write begin! frame num 4 write_time 4201181419
Read finished! Read j 6 read_time 4201187414
write begin! frame num 5 write_time 4201197464
Read finished! Read j 7 read_time 4201199554
Read finished! Read j 8 read_time 4201211634
write begin! frame num 6 write_time 4201213342
Read finished! Read j 9 read_time 4201223898
write begin! frame num 7 write_time 4201228303
Read finished! Read j 10 read_time 4201236169
write begin! frame num 8 write_time 4201243233
Read finished! Read j 11 read_time 4201248352
write begin! frame num 9 write_time 4201258166
Read finished! Read j 12 read_time 4201260536
Read finished! Read j 13 read_time 4201272381
write begin! frame num 10 write_time 4201273128
Read finished! Read j 14 read_time 4201284591
write begin! frame num 11 write_time 4201287958
Read finished! Read j 15 read_time 4201296766
write begin! frame num 12 write_time 4201302823
Read finished! Read j 16 read_time 4201308873
write begin! frame num 13 write_time 4201317702
Read finished! Read j 17 read_time 4201321005
write begin! frame num 14 write_time 4201332655
Read finished! Read j 18 read_time 4201333062
Read finished! Read j 19 read_time 4201345096
write begin! frame num 15 write_time 4201347573
Read finished! Read j 20 read_time 4201359836
write begin! frame num 16 write_time 4201362541
Read finished! Read j 21 read_time 4201374760
write begin! frame num 17 write_time 4201377438
Read finished! Read j 22 read_time 4201389544
write begin! frame num 18 write_time 4201392304
Read finished! Read j 23 read_time 4201404297
write begin! frame num 19 write_time 4201407164
Read finished! Read j 24 read_time 4201419287
write begin! frame num 20 write_time 4201422042
Read finished! Read j 25 read_time 4201434215
write begin! frame num 21 write_time 4201436939
Read finished! Read j 26 read_time 4201449251
write begin! frame num 22 write_time 4201451908
Read finished! Read j 27 read_time 4201464331
write begin! frame num 23 write_time 4201466808
Read finished! Read j 28 read_time 4201479226
write begin! frame num 24 write_time 4201481656
Read finished! Read j 29 read_time 4201493676
write begin! frame num 25 write_time 4201496508
Read finished! Read j 30 read_time 4201508551
write begin! frame num 26 write_time 4201511333
Read finished! Read j 31 read_time 4201523449
write begin! frame num 27 write_time 4201526212
Read finished! Read j 32 read_time 4201538194
write begin! frame num 28 write_time 4201541149
Read finished! Read j 33 read_time 4201553171
write begin! frame num 29 write_time 4201556080
Read finished! Read j 34 read_time 4201568130
write begin! frame num 30 write_time 4201570982
Read finished! Read j 35 read_time 4201583056
write begin! frame num 31 write_time 4201585803
Read finished! Read j 36 read_time 4201597841
write begin! frame num 32 write_time 4201600619
Read finished! Read j 37 read_time 4201612695
write begin! frame num 33 write_time 4201615453
Read finished! Read j 38 read_time 4201627489
write begin! frame num 34 write_time 4201630345
Read finished! Read j 39 read_time 4201642412
write begin! frame num 35 write_time 4201645246
Read finished! Read j 40 read_time 4201657281
write begin! frame num 36 write_time 4201660127
Read finished! Read j 41 read_time 4201672091
write begin! frame num 37 write_time 4201675025
Read finished! Read j 42 read_time 4201687035
write begin! frame num 38 write_time 4201689895
Read finished! Read j 43 read_time 4201702045
write begin! frame num 39 write_time 4201704754
Read finished! Read j 44 read_time 4201716710
write begin! frame num 40 write_time 4201719645
Read finished! Read j 45 read_time 4201731641
write begin! frame num 41 write_time 4201734557
Read finished! Read j 46 read_time 4201746652
write begin! frame num 42 write_time 4201749486
Read finished! Read j 47 read_time 4201761430
write begin! frame num 43 write_time 4201764407
Read finished! Read j 48 read_time 4201776385
write begin! frame num 44 write_time 4201779325
Read finished! Read j 49 read_time 4201791460
write begin! frame num 45 write_time 4201794188
Read finished! Read j 50 read_time 4201806232
write begin! frame num 46 write_time 4201808993
Read finished! Read j 51 read_time 4201823107
write begin! frame num 47 write_time 4201823915
Read finished! Read j 52 read_time 4201835979
write begin! frame num 48 write_time 4201838838
Read finished! Read j 53 read_time 4201851042
write begin! frame num 49 write_time 4201853756
Read finished! Read j 54 read_time 4201865894
write begin! frame num 50 write_time 4201868663
Read finished! Read j 55 read_time 4201880901
write begin! frame num 51 write_time 4201883524
Read finished! Read j 56 read_time 4201895600
write begin! frame num 52 write_time 4201898346
Read finished! Read j 57 read_time 4201910354
write begin! frame num 53 write_time 4201913185
Read finished! Read j 58 read_time 4201925352
write begin! frame num 54 write_time 4201928028
Read finished! Read j 59 read_time 4201940152
write begin! frame num 55 write_time 4201942925
Read finished! Read j 60 read_time 4201955224
write begin! frame num 56 write_time 4201957837
Read finished! Read j 61 read_time 4201969979
write begin! frame num 57 write_time 4201972720
Read finished! Read j 62 read_time 4201984787
write begin! frame num 58 write_time 4201987607
Read finished! Read j 63 read_time 4201999821
write begin! frame num 59 write_time 4202002456
Read finished! Read j 64 read_time 4202014613
write begin! frame num 60 write_time 4202017269
Read finished! Read j 65 read_time 4202029420
write begin! frame num 61 write_time 4202032090
Read finished! Read j 66 read_time 4202044167
write begin! frame num 62 write_time 4202046969
Read finished! Read j 67 read_time 4202059064
write begin! frame num 63 write_time 4202061884
Read finished! Read j 68 read_time 4202074130
write begin! frame num 64 write_time 4202076782
Read finished! Read j 69 read_time 4202088920
write begin! frame num 65 write_time 4202091680
Read finished! Read j 70 read_time 4202103829
write begin! frame num 66 write_time 4202106549
Read finished! Read j 71 read_time 4202118993
write begin! frame num 67 write_time 4202121425
Read finished! Read j 72 read_time 4202133461
write begin! frame num 68 write_time 4202136280
Read finished! Read j 73 read_time 4202148601
write begin! frame num 69 write_time 4202151146
Read finished! Read j 74 read_time 4202163317
write begin! frame num 70 write_time 4202166029
Read finished! Read j 75 read_time 4202178092
write begin! frame num 71 write_time 4202180937
Read finished! Read j 76 read_time 4202193024
write begin! frame num 72 write_time 4202195814
Read finished! Read j 77 read_time 4202208070
write begin! frame num 73 write_time 4202210686
Read finished! Read j 78 read_time 4202222798
write begin! frame num 74 write_time 4202225535
Read finished! Read j 79 read_time 4202237667
write begin! frame num 75 write_time 4202240354
Read finished! Read j 80 read_time 4202252507
write begin! frame num 76 write_time 4202255215
Read finished! Read j 81 read_time 4202267705
write begin! frame num 77 write_time 4202274166
Read finished! Read j 82 read_time 4202282422
write begin! frame num 78 write_time 4202285045
Read finished! Read j 83 read_time 4202297200
write begin! frame num 79 write_time 4202299978
Read finished! Read j 84 read_time 4202312006
write begin! frame num 80 write_time 4202314850
Read finished! Read j 85 read_time 4202326973
write begin! frame num 81 write_time 4202329685
Read finished! Read j 86 read_time 4202341828
write begin! frame num 82 write_time 4202344468
Read finished! Read j 87 read_time 4202356483
write begin! frame num 83 write_time 4202359306
Read finished! Read j 88 read_time 4202371594
write begin! frame num 84 write_time 4202374206
Read finished! Read j 89 read_time 4202386271
write begin! frame num 85 write_time 4202389098
Read finished! Read j 90 read_time 4202401175
write begin! frame num 86 write_time 4202404001
Read finished! Read j 91 read_time 4202416129
write begin! frame num 87 write_time 4202418873
Read finished! Read j 92 read_time 4202431072
write begin! frame num 88 write_time 4202433729
Read finished! Read j 93 read_time 4202445813
write begin! frame num 89 write_time 4202448533
Read finished! Read j 94 read_time 4202460668
write begin! frame num 90 write_time 4202463379
Read finished! Read j 95 read_time 4202475506
write begin! frame num 91 write_time 4202478281
Read finished! Read j 96 read_time 4202490486
write begin! frame num 92 write_time 4202493194
Read finished! Read j 97 read_time 4202505269
write begin! frame num 93 write_time 4202508090
Read finished! Read j 98 read_time 4202520129
write begin! frame num 94 write_time 4202522947
Read finished! Read j 99 read_time 4202535061
write begin! frame num 95 write_time 4202537753
Could not read complete frame from input file
write begin! frame num 96 write_time 4202552569
write begin! frame num 97 write_time 4202567376
write begin! frame num 98 write_time 4202582231
write begin! frame num 99 write_time 4202597165
write begin! frame num 100 write_time 4202612092
App run was successful

Could anyone help explain this in more detail? Thanks.

Hi Yugui,
Please apply the patch
https://devtalk.nvidia.com/default/topic/1004950/jetson-tx2/multimedia-api-scale-encode/post/5135482/#5135482
And set ‘-hpt 1’

You also need to eliminate the effect of file read/write.

Note that throughput and latency are different things.

It’s quite likely the driver and hardware are pipelined, and need a number of frames of data in flight to achieve the highest throughput.

So, if you spawn two threads, pump frames in through thread 1, read frames out through thread 2, and time how long it takes to encode, say, 1000 frames, what throughput do you get?
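The two-thread structure suggested here can be sketched as below. The "encoder" is just a bounded queue stand-in (depth six, like the sample's output_plane buffer count) — it is NOT the Multimedia API; only the producer/consumer shape and the whole-run timing are the point.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Toy pipeline standing in for the encoder: a bounded queue of "frames".
struct Pipeline {
    static constexpr size_t kDepth = 6;  // like the six output_plane buffers
    std::queue<int> q;
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    void push(int frame) {                       // thread 1: feed YUV in
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return q.size() < kDepth; });
        q.push(frame);
        cv.notify_all();
    }
    bool pop(int* frame) {                       // thread 2: drain bitstream
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return !q.empty() || done; });
        if (q.empty()) return false;             // done and drained
        *frame = q.front();
        q.pop();
        cv.notify_all();
        return true;
    }
    void finish() {
        std::lock_guard<std::mutex> lk(m);
        done = true;
        cv.notify_all();
    }
};

// Throughput over the whole run: frames consumed divided by wall time.
double measure_fps(int nframes) {
    Pipeline p;
    auto t0 = std::chrono::steady_clock::now();
    std::thread producer([&] {
        for (int i = 0; i < nframes; ++i) p.push(i);
        p.finish();
    });
    int consumed = 0, f;
    while (p.pop(&f)) ++consumed;
    producer.join();
    double sec = std::chrono::duration<double>(
        std::chrono::steady_clock::now() - t0).count();
    return consumed / sec;
}
```

Timing the full run rather than one frame is what makes this a throughput measurement: with six frames in flight, per-frame latency no longer shows up in the total.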

Hi Dane,

Thanks for your patch. I have just set ‘-hpt 1’ and re-run the encode; I got the following log:

ubuntu@tegra-ubuntu:~/tegra_multimedia_api/samples/01_video_encode$ ./video_encode ShakeNDry_3840x2160_100.yuv 3840 2160 H265 ShakeNDry_3840x2160_vbr_2M.h265 --insert-spspps-idr -ifi 10000 -idri 10000 -fps 60 1 -br 2097152 -rc vbr -MinQpI 10 -MaxQpI 50 -MinQpP 10 -MaxQpP 50 -hpt 1
Failed to query video capabilities: Inappropriate ioctl for device
NvMMLiteOpen : Block : BlockType = 8
===== MSENC =====
NvMMLiteBlockCreate : Block : BlockType = 8
892744264
842091865
Read finished! Read i 0 read_time 3832364997
===== NVENC blits (mode: 1) into block linear surfaces =====
Read finished! Read i 1 read_time 3832382515
write begin! frame num 0 write_time 3832383738
write begin! frame num 1 write_time 3832383786
write begin! frame num 2 write_time 3832395440
Read finished! Read i 2 read_time 3832399315
write begin! frame num 3 write_time 3832411361
Read finished! Read i 3 read_time 3832415955
write begin! frame num 4 write_time 3832427996
Read finished! Read i 4 read_time 3832432580
write begin! frame num 5 write_time 3832444854
Read finished! Read i 5 read_time 3832449316
write begin! frame num 6 write_time 3832461271
Read finished! Read j 6 read_time 3832461687
write begin! frame num 7 write_time 3832473016
Read finished! Read j 7 read_time 3832474065
write begin! frame num 8 write_time 3832485431
Read finished! Read j 8 read_time 3832486455
write begin! frame num 9 write_time 3832497822
Read finished! Read j 9 read_time 3832498822


write begin! frame num 97 write_time 3833583160
Read finished! Read j 97 read_time 3833584091
write begin! frame num 98 write_time 3833595481
Read finished! Read j 98 read_time 3833596392
write begin! frame num 99 write_time 3833607777
Read finished! Read j 99 read_time 3833608689
Could not read complete frame from input file
Read finished! Read j 100 read_time 3833608828
File read complete.
write begin! frame num 100 write_time 3833620063
App run was successful

What confuses me is that the encoded data (*.h265) for a frame is written out before that frame's YUV data is read. Note that I set the timestamps after each frame's YUV read finishes and before the *.h265 write, so the effect of file read/write should already be eliminated.

Hi snarky,

Do you mean that if I feed 1000 frames of YUV to the encoder, it will perform faster than with 100 frames?

I haven’t read this particular sample, but I do know that code structured to get the best possible throughput will keep more than one frame in flight at a time.

Thus, the output from such code (which is maybe not this sample?) would look like:

Write frame 0
Write frame 1
Write frame 2
Read frame 0
Write frame 3
Read frame 1
Write frame 4
Read frame 2

The time between “read frame 1” and “read frame 2” will be 16.7 milliseconds or less at a 60 Hz processing rate.

When you (or the sample code) push a frame in at one end and then wait to read it out the other end before enqueuing the next frame, you are measuring the latency of a single frame through the pipeline, which is not the same thing as the throughput of a fully occupied pipeline.

Hi snarky,

Thanks for the reminder; I do agree there is a real difference between latency and throughput.

Based on the logs in my original post and your suggestion, I find the time between write frame x and write frame x+1 is about 16 ms, which matches the 4Kp60 encode rate. But this throughput exceeds my expectation, because the 16 ms also includes reading the YUV and writing the bitstream data, which means the pure encode time must be less than 16 ms.

Do you have any idea how to measure the pure encode time?

Hi Yugui,
Please refer to the attached video_encode_main.cpp:

ubuntu@tegra-ubuntu:~/tegra_multimedia_api/samples/01_video_encode$ gst-launch-1.0 videotestsrc pattern=1 num-buffers=10 ! 'video/x-raw,format=I420,width=3840,height=2160' ! filesink location= ~/4k.yuv
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:01.394058592
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...
ubuntu@tegra-ubuntu:~/tegra_multimedia_api/samples/01_video_encode$ ./video_encode ~/4k.yuv 3840 2160 H265 4k.265 -hpt 1
Failed to query video capabilities: Inappropriate ioctl for device
NvMMLiteOpen : Block : BlockType = 8
===== MSENC =====
NvMMLiteBlockCreate : Block : BlockType = 8
892744264
842091865
===== NVENC blits (mode: 1) into block linear surfaces =====
Could not read complete frame from input file
File read complete.
----------- Element = enc0 -----------
Total Profiling time = 34.7168
Average FPS = 86.7016
Total units processed = 3011
Average latency(usec) = 68966
Minimum latency(usec) = 4563
Maximum latency(usec) = 70620
-------------------------------------
App run was successful

The result looks good

video_encode_main.cpp (34.2 KB)

Hi Dane,

Yeah, the result is good, but I'm not sure whether the average FPS includes the time to read the YUV data; if it does, the pure encode speed may be even faster.

Hi Yugui, only the first 10 frames read YUV data.

Why do the other frames not need to be read? Please help explain further, many thanks.

Hi Yugui, there are six output_plane buffers. We fill them with YUV data and then keep feeding them to the encoder.
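The reuse pattern described here can be modeled very simply: the first few frames fill the buffer pool from the file, and every later frame just re-queues an existing buffer by index, so no further file reads are needed. This is a simplified model of the attached sample's behavior, not the actual V4L2/Multimedia API calls:

```cpp
#include <cassert>
#include <cstddef>

constexpr size_t kNumBuffers = 6;  // output_plane buffer count in the sample

// Which buffer slot a given frame (re)uses: frames cycle through the
// pool round-robin, so frame 7 lands back on slot 1, and so on.
size_t buffer_for_frame(size_t frame) { return frame % kNumBuffers; }

// Only the frames that first populate the pool need a file read; after
// that, a slot is re-queued with whatever YUV data it already holds.
bool needs_file_read(size_t frame) { return frame < kNumBuffers; }
```

This also explains the profiling result above: once the pool is primed, the CPU only re-queues buffers, so disk I/O stops contributing to the measured FPS.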

Hi Dane,

Thus, we have to read the first six frames of YUV data into the output_plane buffers, but how are the remaining frames read from the input file into the output_plane buffers? And how should I understand "only the first 10 frames read YUV data"?

Hi Yugui,
Please refer to the sample and do your own profiling. I just gave an example to show the capability; if it does not fit your case, please adapt it.

Hi DaneLLL,

I'm sorry, I should have read your attached sample first.

I have adapted your sample into my test case and fed it a YUV file (300 frames in total). I observe that the CPU load differs between the first 300 frames and the last 3000 frames of the encode.

During the first 300 frames the CPU load is about 30%; after that, while encoding the last 3000 frames, it is about 2%. Does 30% or 2% represent the pure encode load?

Thanks.

The difference between 30% and 2% seems to be related to the difference between the CPU having to load-and-push the data into the driver buffers, and the CPU just telling the driver to re-use existing buffers.

Then maybe I can say the 30% includes both reading YUV data into the driver buffers and the video encode process, while the 2% is the CPU load for the video encode alone.

That seems to be what the data are saying, yes.

If the buffers come from somewhere other than the CPU (such as DMA from video capture), then I would expect the CPU wouldn't have to read/fill the buffers at all, so the specific load of a given capture/encode setup would depend on exactly how the pipeline is set up.