Performance of trt-pose on Jetson Nano

I am trying to reproduce the performance results published in the GitHub repository NVIDIA-AI-IOT/trt_pose (real-time pose estimation accelerated with NVIDIA TensorRT).

I flashed a clean JetPack 4.3 SD card specifically to test the performance of this demo.

After installing the package and running live_demo.ipynb on my Nano, I can’t get anywhere near the throughput mentioned in the README.

For resnet18, I get 12-14 FPS at best after TensorRT optimization. The README lists the official resnet18 result as 22 FPS.

I made sure to run jetson_clocks beforehand, but beyond that I’m out of ideas about what could cause such a large gap between my results and the official benchmark.

I would appreciate any thoughts on how to reach the throughput reported in the benchmark.

I'm not sure this is the root cause, but you should make sure to switch to MAXN mode before running the jetson_clocks script:

sudo nvpmodel -m0

This is the output of sudo jetson_clocks --show:

SOC family:tegra210 Machine:NVIDIA Jetson Nano Developer Kit
Online CPUs: 0-3
CPU Cluster Switching: Disabled
cpu0: Online=1 Governor=schedutil MinFreq=1479000 MaxFreq=1479000 CurrentFreq=1479000 IdleStates: WFI=0 c7=0
cpu1: Online=1 Governor=schedutil MinFreq=1479000 MaxFreq=1479000 CurrentFreq=1479000 IdleStates: WFI=0 c7=0
cpu2: Online=1 Governor=schedutil MinFreq=1479000 MaxFreq=1479000 CurrentFreq=1479000 IdleStates: WFI=0 c7=0
cpu3: Online=1 Governor=schedutil MinFreq=1479000 MaxFreq=1479000 CurrentFreq=1479000 IdleStates: WFI=0 c7=0
GPU MinFreq=921600000 MaxFreq=921600000 CurrentFreq=921600000
EMC MinFreq=204000000 MaxFreq=1600000000 CurrentFreq=1600000000 FreqOverride=1
Fan: speed=255
NV Power Mode: MAXN

MAXN appears to be enabled. The performance is the same as above, nowhere near the 22 FPS benchmark.
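
For anyone who wants to check a report like the one above programmatically, here is a minimal sketch that parses the text and lists any unit whose current frequency is below its maximum. The helper names (parse_freqs, unpinned) are made up for illustration, and the sample lines are copied from the output above; it only reads text and does not call jetson_clocks itself.

```python
import re


def parse_freqs(report):
    """Map each unit name (cpu0, GPU, EMC, ...) to its Min/Max/Current frequency.

    Lines that do not carry all three fields are skipped.
    """
    result = {}
    for line in report.splitlines():
        fields = dict(re.findall(r"(MinFreq|MaxFreq|CurrentFreq)=(\d+)", line))
        if len(fields) == 3:
            # The unit name is the first token, up to a colon or whitespace.
            name = re.split(r"[:\s]", line.strip(), maxsplit=1)[0]
            result[name] = {key: int(value) for key, value in fields.items()}
    return result


def unpinned(report):
    """Return the names of units currently running below their maximum frequency."""
    return [
        name
        for name, freq in parse_freqs(report).items()
        if freq["CurrentFreq"] < freq["MaxFreq"]
    ]


# Sample lines taken from the report posted in this thread.
SAMPLE = """\
cpu0: Online=1 Governor=schedutil MinFreq=1479000 MaxFreq=1479000 CurrentFreq=1479000 IdleStates: WFI=0 c7=0
GPU MinFreq=921600000 MaxFreq=921600000 CurrentFreq=921600000
EMC MinFreq=204000000 MaxFreq=1600000000 CurrentFreq=1600000000 FreqOverride=1
NV Power Mode: MAXN
"""

print(unpinned(SAMPLE))  # prints []
```

An empty list means every reported unit is already at its maximum frequency, which matches the report above, so the clocks themselves do not look like the limiting factor here.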

Hi,

Thanks for reporting this.

We are checking this issue internally.
We will share an update later.

Thanks.

Hi,

Sorry to keep you waiting.
We are still checking this issue internally.

Please note that the 22 FPS figure is the performance of neural network execution alone.
Are you using the timing loop in the notebook, or some other method of benchmarking?
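
To make the distinction concrete, here is a minimal sketch of timing one step in isolation, with warmup calls excluded and an optional synchronization hook. It is not the notebook's code: measure_fps and fake_inference are hypothetical names, and the sleep is only a stand-in workload. On the Nano, one would presumably pass something like `lambda: model_trt(data)` as the step and `torch.cuda.synchronize` as the sync hook (the first two names are assumed here), since GPU calls return before the work has finished.

```python
import time


def measure_fps(step, iterations=50, warmup=10, sync=None):
    """Return average calls per second of step(), excluding warmup calls.

    sync, if given, is called before starting and before stopping the clock
    so that asynchronously queued work is included in the measurement.
    """
    for _ in range(warmup):
        step()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iterations):
        step()
    if sync is not None:
        sync()
    # Guard against a zero interval on coarse timers.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return iterations / elapsed


def fake_inference():
    """Stand-in workload (assumption): pretend one inference takes about 5 ms."""
    time.sleep(0.005)


print(f"{measure_fps(fake_inference):.1f} FPS")
```

A number measured this way covers only the step passed in. A loop that also captures frames, preprocesses them, and draws the results will report a lower figure than the network-only number.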

Thanks.

Thanks for checking on this!

That’s correct, I am using the timing loop in the notebook, which outputs an FPS measurement.

Also, it is strange that the notebook FPS measurement does not change whether jetson_clocks is run or not.

I can verify that the clock frequency does change when I run the script and set the Nano to 10W mode, but the notebook FPS measurement does not change.
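
Since the reported FPS does not move with the clocks, the loop may be limited by something other than GPU compute. A rough sketch of how each stage could be timed separately follows. StageTimer is a hypothetical helper, and the stage names and sleep durations are placeholders rather than measurements from the demo. For GPU stages, a synchronization call would need to go inside the timed block for the numbers to be meaningful.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock time per named stage of a processing loop."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def mean_ms(self):
        """Return the mean duration of each stage in milliseconds."""
        return {
            name: 1000.0 * self.totals[name] / self.counts[name]
            for name in self.totals
        }


timer = StageTimer()
for _ in range(20):
    with timer.stage("capture"):
        time.sleep(0.004)  # placeholder for reading a camera frame
    with timer.stage("inference"):
        time.sleep(0.002)  # placeholder for the network call plus a GPU sync
    with timer.stage("postprocess"):
        time.sleep(0.001)  # placeholder for parsing and drawing the results

for name, ms in timer.mean_ms().items():
    print(f"{name}: {ms:.2f} ms")
```

If a stage other than inference dominates, changing the GPU clocks would not be expected to change the overall FPS much.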

Let me know if I can help with anything else!

Hi,

Thanks for the information.
Our internal team is still checking this. We will share an update later.

Thanks.