Xavier encode and decode performance does not match the official description

xavier
Tegra_Multimedia_API_R32.1.0_aarch64.tbz2
tx2
Tegra_Multimedia_API_R28.2.1_aarch64.tbz2

# /home/nvidia/jetson_clocks.sh
# nvpmodel -m 0
# nvpmodel -q --verbose
    NV Power Mode: MAXN
    0

encode
I modified the sample code as follows:
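#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>

// Current wall-clock time in microseconds.
static int64_t av_gettime(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (int64_t)tv.tv_sec * 1000000 + tv.tv_usec;
}

// in function encoder_capture_plane_dq_callback()
static int64_t frame1_ts = 0;   // time at which frame 101 was dequeued
static int64_t frame101_ts = 0; // time at which frame 201 was dequeued

if (num_encoded_frames == 101) {
    frame1_ts = av_gettime();
    printf("[%d] encode out time=[%lldus]\n", num_encoded_frames, (long long)frame1_ts);
}

if (num_encoded_frames == 201) {
    frame101_ts = av_gettime();
    long long diff = frame101_ts - frame1_ts; // microseconds taken by 100 frames
    printf("[%d] encode out time=[%lldus]\n", num_encoded_frames, (long long)frame101_ts);
    // diff/100000 is ms per frame, 100000*1000/diff is frames per second (integer division)
    printf("100 frames diff=[%lldus] = [%lld]ms, one frame=[%lld]ms, fps=[%lld]\n", diff, diff/1000, diff/100000, 100000*1000/diff);
}

// Output is not written, so file I/O does not affect the timing.
// write_encoder_output_frame(ctx->out_file, buffer);
num_encoded_frames++;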


I record the output time of frame 101 and of frame 201, then calculate the average FPS over those 100 frames.
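The same measurement can be written as a small helper that is fed one timestamp per dequeued frame. This is only a sketch, not part of the Multimedia API sample: the `FpsWindow` and `monotonic_us` names are hypothetical. It swaps `gettimeofday()` for `std::chrono::steady_clock`, so the interval is not affected by wall-clock adjustments, and it reports frame time and FPS in floating point because the integer division above truncates (a 469469 us interval is 4.69 ms per frame but prints as 4).

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: measures the time between two frame indices.
struct FpsWindow {
    uint32_t first_frame;   // e.g. 101
    uint32_t last_frame;    // e.g. 201
    int64_t  first_us = -1; // timestamp recorded at first_frame
    int64_t  last_us  = -1; // timestamp recorded at last_frame

    FpsWindow(uint32_t first, uint32_t last)
        : first_frame(first), last_frame(last) {}

    // Record a timestamp (microseconds) for the given frame index.
    // Returns true once both ends of the window have been seen.
    bool on_frame(uint32_t frame, int64_t now_us) {
        if (frame == first_frame) first_us = now_us;
        if (frame == last_frame)  last_us  = now_us;
        return first_us >= 0 && last_us >= 0;
    }

    int64_t elapsed_us() const { return last_us - first_us; }

    // Average time per frame in milliseconds, without integer truncation.
    double ms_per_frame() const {
        return elapsed_us() / 1000.0 / (last_frame - first_frame);
    }

    // Average frames per second over the window.
    double fps() const {
        assert(elapsed_us() > 0);
        return (last_frame - first_frame) * 1e6 / elapsed_us();
    }
};

// Monotonic timestamp in microseconds.
inline int64_t monotonic_us() {
    using namespace std::chrono;
    return duration_cast<microseconds>(
        steady_clock::now().time_since_epoch()).count();
}
```

Inside `encoder_capture_plane_dq_callback()` this would be used as `if (w.on_frame(num_encoded_frames, monotonic_us())) printf("fps=%.2f\n", w.fps());` with a static `FpsWindow w(101, 201);`.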

test case 1: h264 high slow

tx2
./video_encode /run/1_10s.yuv 1920 1080 H264 /dev/null -p high -hpt 4
Failed to query video capabilities: Inappropriate ioctl for device
NvMMLiteOpen : Block : BlockType = 4
===== MSENC =====
NvMMLiteBlockCreate : Block : BlockType = 4
875967048
842091865
===== MSENC blits (mode: 1) into tiled surfaces =====
[101] encode out time=[1563953079644380us]
[201] encode out time=[1563953080113849us]
100 frames diff=[469469us] = [469]ms, one frame=[4]ms, fps=[213]

xavier
./video_encode /run/1_10s.yuv 1920 1080 H264 /dev/null -p high -hpt 4
Creating Encoder in blocking mode
Failed to query video capabilities: Inappropriate ioctl for device
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 4
===== NVMEDIA: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 4
875967048
842091865
H264: Profile = 100, Level = 51
encoder_proc_blocking
[101] encode out time=[1563548857378434us]
[201] encode out time=[1563548857982769us]
100 frames diff=[604335us] = [604]ms, one frame=[6]ms, fps=[165]

test case 2: h265 main slow

tx2
./video_encode /run/1_10s.yuv 1920 1080 H265 /dev/null -p main -hpt 4
Failed to query video capabilities: Inappropriate ioctl for device
NvMMLiteOpen : Block : BlockType = 8
===== MSENC =====
NvMMLiteBlockCreate : Block : BlockType = 8
892744264
842091865
===== NVENC blits (mode: 1) into block linear surfaces =====
[101] encode out time=[1563953596532325us]
[201] encode out time=[1563953597289380us]
100 frames diff=[757055us] = [757]ms, one frame=[7]ms, fps=[132]

xavier
./video_encode /run/1_10s.yuv 1920 1080 H265 /dev/null -p main -hpt 4
Creating Encoder in blocking mode
Failed to query video capabilities: Inappropriate ioctl for device
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 8
===== NVMEDIA: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 8
892744264
842091865
NVMEDIA: H265 : Profile : 1
encoder_proc_blocking
[101] encode out time=[1563548959778863us]
[201] encode out time=[1563548960694426us]
100 frames diff=[915563us] = [915]ms, one frame=[9]ms, fps=[109]

test case 3: h264 baseline ultrafast

tx2
./video_encode /run/1_10s.yuv 1920 1080 H264 /dev/null -p baseline -hpt 1
Failed to query video capabilities: Inappropriate ioctl for device
NvMMLiteOpen : Block : BlockType = 4
===== MSENC =====
NvMMLiteBlockCreate : Block : BlockType = 4
875967048
842091865
===== MSENC blits (mode: 1) into tiled surfaces =====
[101] encode out time=[1563953729986357us]
[201] encode out time=[1563953730266781us]
100 frames diff=[280424us] = [280]ms, one frame=[2]ms, fps=[356]

xavier
./video_encode /run/1_10s.yuv 1920 1080 H264 /dev/null -p baseline -hpt 1
Creating Encoder in blocking mode
Failed to query video capabilities: Inappropriate ioctl for device
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 4
===== NVMEDIA: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 4
875967048
842091865
H264: Profile = 66, Level = 51
encoder_proc_blocking
[101] encode out time=[1563549100991580us]
[201] encode out time=[1563549101227567us]
100 frames diff=[235987us] = [235]ms, one frame=[2]ms, fps=[423]

Results (average FPS):

h264 high slow: tx2: 213,  xavier: 165
h265 main slow: tx2: 132,  xavier: 109
h264 baseline ultrafast: tx2: 356,  xavier: 423

These results do not match the figures given at http://connecttech.com/xavier-tx2-comparison/
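To make the gap easier to read, the summary can be turned into Xavier-to-TX2 ratios. The snippet below is just a sketch: the `Result` struct and function names are made up for illustration, and the numbers are copied from the summary above.

```cpp
#include <cstdio>

// FPS figures copied from the summary above.
struct Result {
    const char *test_case;
    double tx2_fps;
    double xavier_fps;
};

static const Result kResults[] = {
    {"h264 high slow",          213.0, 165.0},
    {"h265 main slow",          132.0, 109.0},
    {"h264 baseline ultrafast", 356.0, 423.0},
};

// Xavier throughput relative to TX2 (1.0 means equal).
inline double xavier_over_tx2(const Result &r) {
    return r.xavier_fps / r.tx2_fps;
}

// Print one line per test case.
inline void print_ratios() {
    for (const Result &r : kResults)
        std::printf("%-24s xavier/tx2 = %.2f\n",
                    r.test_case, xavier_over_tx2(r));
}
```

With these numbers Xavier reaches about 0.77x and 0.83x of the TX2 rate in the two slow-preset cases and about 1.19x in the ultrafast case.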

test video: 1_10s.h264, 1920x1080@25fps, high profile

decode:
xavier
./video_decode H264 --disable-rendering --stats -o /dev/null /run/1_10s.h264
Set governor to performance before enabling profiler
Creating decoder in blocking mode
Failed to query video capabilities: Inappropriate ioctl for device
Opening in BLOCKING MODE
Set governor to performance before enabling profiler
NvMMLiteOpen : Block : BlockType = 261
NvMMLiteBlockCreate : Block : BlockType = 261
Setting frame input mode to 1
Starting decoder capture loop thread
Input file read complete
Video Resolution: 1920x1080
supported colorspace details not available, use default
Decoder colorspace ITU-R BT.601 with standard range luma (16-235)
Query and set capture successful
----------- Element = dec0 -----------
Total Profiling time = 0.402865
Average FPS = 620.555
Total units processed = 251


Total Profiling Time = 0 sec


Exiting decoder capture loop thread
App run was successful

tx2:
./video_decode H264 --disable-rendering --stats -o /dev/null /run/1_10s.h264
Set governor to performance before enabling profiler
Failed to query video capabilities: Inappropriate ioctl for device
Set governor to performance before enabling profiler
NvMMLiteOpen : Block : BlockType = 261
TVMR: NvMMLiteTVMRDecBlockOpen: 7647: NvMMLiteBlockOpen
NvMMLiteBlockCreate : Block : BlockType = 261
Starting decoder capture loop thread
Input file read complete
TVMR: NvMMLiteTVMRDecDoWork: 6531: NVMMLITE_TVMR: EOS detected
TVMR: cbBeginSequence: 1179: BeginSequence 1920x1088, bVPR = 0
TVMR: LowCorner Frequency = 0
TVMR: cbBeginSequence: 1529: DecodeBuffers = 5, pnvsi->eCodec = 4, codec = 0
TVMR: cbBeginSequence: 1600: Display Resolution : (1920x1080)
TVMR: cbBeginSequence: 1601: Display Aspect Ratio : (1920x1080)
TVMR: cbBeginSequence: 1669: ColorFormat : 5
TVMR: cbBeginSequence:1683 ColorSpace = NvColorSpace_YCbCr601
TVMR: cbBeginSequence: 1809: SurfaceLayout = 3
TVMR: cbBeginSequence: 1902: NumOfSurfaces = 12, InteraceStream = 0, InterlaceEnabled = 0, bSecure = 0, MVC = 0 Semiplanar = 1, bReinit = 1, BitDepthForSurface = 8 LumaBitDepth = 8, ChromaBitDepth = 8, ChromaFormat = 5
TVMR: cbBeginSequence: 1904: BeginSequence ColorPrimaries = 2, TransferCharacteristics = 2, MatrixCoefficients = 2
Video Resolution: 1920x1080
Query and set capture successful
TVMR: TVMRBufferProcessing: 5486: Processing of EOS
TVMR: TVMRBufferProcessing: 5563: Processing of EOS Done
----------- Element = dec0 -----------
Total Profiling time = 0.396803
Average FPS = 630.036
Total units processed = 251


Total Profiling Time = 0 sec


Exiting decoder capture loop thread
TVMR: TVMRFrameStatusReporting: 6132: Closing TVMR Frame Status Thread -------------
TVMR: TVMRVPRFloorSizeSettingThread: 5942: Closing TVMRVPRFloorSizeSettingThread -------------
TVMR: TVMRFrameDelivery: 5982: Closing TVMR Frame Delivery Thread -------------
TVMR: NvMMLiteTVMRDecBlockClose: 7815: Done
App run was successful

Results:
Average FPS is 620.555 on Xavier and 630.036 on TX2. This result also does not match the Connect Tech Xavier & TX2 comparison page (http://connecttech.com/xavier-tx2-comparison/).
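As a sanity check on the `--stats` output, the reported figures in both logs above are consistent with Average FPS = (units processed - 1) / profiling time. That relation is inferred from the logged numbers only, not from the library documentation, so treat it as an assumption. The function names below are hypothetical.

```cpp
#include <cassert>

// Assumption inferred from the logs: Average FPS = (units - 1) / seconds.
inline double average_fps(int units_processed, double profiling_seconds) {
    assert(units_processed > 1 && profiling_seconds > 0.0);
    return (units_processed - 1) / profiling_seconds;
}

// Relative difference of b against a, in percent.
inline double percent_diff(double a, double b) {
    return (b - a) / a * 100.0;
}
```

For example, `average_fps(251, 0.402865)` reproduces the Xavier figure to within rounding, and `percent_diff(620.555, 630.036)` shows the two boards are only about 1.5% apart in this decode test.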

I reflashed the system with JetPack 4.2.1 and ran jetson_clocks and nvpmodel -m 0.

head -n 1 /etc/nv_tegra_release

R32 (release), REVISION: 2.0, GCID: 15966166, BOARD: t186ref, EABI: aarch64, DATE: Wed Jul 17 00:26:04 UTC 2019

nvpmodel -q --verbose

NV Fan Mode:quiet
NV Power Mode: MAXN
0

Retesting on Xavier, the results became worse:

encode
case 1
./video_encode /run/1_10s.yuv 1920 1080 H264 /dev/null -p high -hpt 4
Creating Encoder in blocking mode
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 4
===== NVMEDIA: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 4
875967048
842091865
H264: Profile = 100, Level = 51
[101] encode out time=[1564026133833636us]
[201] encode out time=[1564026135610796us]
100 frames diff=[1777160us] = [1777]ms, one frame=[17]ms, fps=[56]
Could not read complete frame from input file
File read complete.
Got 0 size buffer in capture
App run was successful

case 2
./video_encode /run/1_10s.yuv 1920 1080 H265 /dev/null -p main -hpt 4
Creating Encoder in blocking mode
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 8
===== NVMEDIA: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 8
892744264
842091865
NVMEDIA: H265 : Profile : 1
[101] encode out time=[1564026503292714us]
[201] encode out time=[1564026505923325us]
100 frames diff=[2630611us] = [2630]ms, one frame=[26]ms, fps=[38]
Could not read complete frame from input file
File read complete.
Got 0 size buffer in capture
App run was successful

case 3
./video_encode /run/1_10s.yuv 1920 1080 H264 /dev/null -p baseline -hpt 1
Creating Encoder in blocking mode
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 4
===== NVMEDIA: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 4
875967048
842091865
H264: Profile = 66, Level = 51
[101] encode out time=[1564026892211131us]
[201] encode out time=[1564026892521400us]
100 frames diff=[310269us] = [310]ms, one frame=[3]ms, fps=[322]
Could not read complete frame from input file
File read complete.
Got 0 size buffer in capture
App run was successful

decode
./video_decode H264 --disable-rendering --stats -o /dev/null /run/1_10s.h264
Set governor to performance before enabling profiler
Creating decoder in blocking mode
Opening in BLOCKING MODE
Set governor to performance before enabling profiler
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
Setting frame input mode to 1
Starting decoder capture loop thread
Input file read complete
Video Resolution: 1920x1080
Decoder colorspace ITU-R BT.601 with standard range luma (16-235)
Query and set capture successful
----------- Element = dec0 -----------
Total Profiling time = 0.468515
Average FPS = 533.601
Total units processed = 251


Total Profiling Time = 0 sec


Exiting decoder capture loop thread
App run was successful

Hi,
Please run ‘sudo nvpmodel -m 0’ first and then ‘sudo jetson_clocks’.

For comparing 1080p encoding, please apply the following patch:

diff --git a/multimedia_api/ll_samples/samples/01_video_encode/video_encode_main.cpp b/multimedia_api/ll_samples/samples/01_video_encode/video_encode_main.cpp
index 393ffb3..770a0ac 100644
--- a/multimedia_api/ll_samples/samples/01_video_encode/video_encode_main.cpp
+++ b/multimedia_api/ll_samples/samples/01_video_encode/video_encode_main.cpp
@@ -133,7 +133,7 @@ void CloseCrc(Crc **phCrc)
 static int
 write_encoder_output_frame(ofstream * stream, NvBuffer * buffer)
 {
-    stream->write((char *) buffer->planes[0].data, buffer->planes[0].bytesused);
+    //stream->write((char *) buffer->planes[0].data, buffer->planes[0].bytesused);
     return 0;
 }
 
@@ -1162,6 +1162,7 @@ encode_proc(context_t& ctx, int argc, char *argv[])
         ctx.enc = NvVideoEncoder::createVideoEncoder("enc0", O_NONBLOCK);
     }
     TEST_ERROR(!ctx.enc, "Could not create encoder", cleanup);
+    ctx.enc->enableProfiling();
 
     // It is necessary that Capture Plane format be set before Output Plane
     // format.
@@ -1729,6 +1730,7 @@ cleanup:
             }
         }
     }
+    ctx.enc->printProfilingStats(std::cout);
 
     delete ctx.enc;
     delete ctx.in_file;

And run 8 encoding processes:

./video_encode 1080p.yuv 1920 1080 H264 test.h264 --max-perf -hpt 1 &
./video_encode 1080p1.yuv 1920 1080 H264 test1.h264 --max-perf -hpt 1 &
./video_encode 1080p2.yuv 1920 1080 H264 test2.h264 --max-perf -hpt 1 &
./video_encode 1080p3.yuv 1920 1080 H264 test3.h264 --max-perf -hpt 1 &
./video_encode 1080p4.yuv 1920 1080 H264 test4.h264 --max-perf -hpt 1 &
./video_encode 1080p5.yuv 1920 1080 H264 test5.h264 --max-perf -hpt 1 &
./video_encode 1080p6.yuv 1920 1080 H264 test6.h264 --max-perf -hpt 1 &
./video_encode 1080p7.yuv 1920 1080 H264 test7.h264 --max-perf -hpt 1

It is better to have 1000+ frames in the input YUVs.
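With 8 processes running at once, each one prints its own profiling stats, and presumably the number to compare is the sum across processes. A rough sketch for adding them up from captured output follows. It assumes the encoder's `printProfilingStats()` prints an `Average FPS = <value>` line in the same format as the decoder stats shown earlier in this thread. The `total_average_fps` name is hypothetical.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>

// Sum every "Average FPS = <value>" line found in the combined log text.
// Assumption: each process prints exactly one such line.
inline double total_average_fps(const std::string &log) {
    static const std::string key = "Average FPS = ";
    std::istringstream in(log);
    std::string line;
    double total = 0.0;
    while (std::getline(in, line)) {
        std::size_t pos = line.find(key);
        if (pos != std::string::npos)
            total += std::strtod(line.c_str() + pos + key.size(), nullptr);
    }
    return total;
}
```

The input would be the concatenated stdout of the 8 runs, for example collected by redirecting each process to its own log file and reading the files into one string.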