Bad GPU performance on some Nano boards (jetpack-4.6)

Hello,

I have a devkit and some custom Nano boards running jetpack-4.6 with exactly the same kernel. On some of the custom boards, the performance measured by the tests in the NVIDIA-AI-IOT/jetson_benchmarks GitHub repository is really bad compared with the expected figures in the table on the Jetson Benchmarks page (NVIDIA Developer). I do not find any errors or differing messages in the kernel logs. Where should I look to understand and fix this performance problem?

                Model Name        FPS
0             inception_v4   5.303644
1                 vgg19_N2   3.684099
2  super_resolution_bsd500   6.972859
3        unet-segmentation   9.363284
4          pose_estimation   5.187021
5          yolov3-tiny-416  18.252407
6         ResNet50_224x224  13.376283
7         ssd-mobilenet-v1  14.952389

Hi,
Do you observe the issue on the Jetson Nano developer kit, or is it specific to the custom boards?

The issue has not happened on the devkit so far. It happens only on some custom boards, but it does not seem to be hardware-dependent: on one board the issue disappeared after a reflash, and we could not find out why. All the firmware partitions are identical (except the DTB* of course) between a ‘fast’ board and a ‘slow’ one.
The clocks in /sys/devices/57000000.gpu/devfreq/57000000.gpu are identical between a ‘fast’ board and a ‘slow’ board.
Which other debug entries or tools should I use?
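One way to make the "identical clocks" comparison repeatable is to dump the devfreq entries on each board and diff the results. The following is only a sketch: the directory is the one quoted above, the attribute names are assumed to be the standard Linux devfreq sysfs files, and `read_devfreq` and `diff_boards` are hypothetical helper names.

```python
from pathlib import Path

# Directory quoted in the post above. The attribute names are assumed to be
# the standard Linux devfreq sysfs files; some may be absent on a given kernel.
GPU_DEVFREQ = Path("/sys/devices/57000000.gpu/devfreq/57000000.gpu")
ATTRS = ("governor", "cur_freq", "min_freq", "max_freq", "available_frequencies")


def read_devfreq(root=GPU_DEVFREQ):
    """Return {attribute: value}, with None for files that cannot be read."""
    values = {}
    for name in ATTRS:
        try:
            values[name] = (Path(root) / name).read_text().strip()
        except OSError:
            values[name] = None  # missing or unreadable on this board
    return values


def diff_boards(a, b):
    """Return {attribute: (value_a, value_b)} for attributes that differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Print this board's values; save the output on each board and compare.
for name, value in read_devfreq().items():
    print(f"{name}: {value}")
```

Note that this only covers the GPU devfreq node, so matching output here does not rule out differences in the CPU or memory clocks.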

Hi,
Please run sudo tegrastats on the boards to check their status and see whether anything in the prints looks suspicious.

On a ‘fast’ one :
RAM 1239/3964MB (lfb 13x2MB) IRAM 0/252kB(lfb 252kB) CPU [12%@1479,3%@1479,9%@1479,2%@1479] EMC_FREQ 0%@1600 GR3D_FREQ 0%@76 APE 25 PLL@16.5C CPU@21C PMIC@50C GPU@20C AO@25.5C thermal@20.25C POM_5V_GPU 2667/2667 POM_5V_IN 0/0 POM_5V_CPU 875/875

On a ‘slow’ one :
RAM 300/3964MB (lfb 783x4MB) IRAM 0/252kB(lfb 252kB) CPU [2%@102,1%@204,0%@204,0%@204] EMC_FREQ 0%@204 GR3D_FREQ 0%@76 APE 25 PLL@24C CPU@26.5C PMIC@50C GPU@26C AO@36.5C thermal@26.25C POM_5V_GPU 1190/1190 POM_5V_IN 0/0 POM_5V_CPU 119/119

Those stats were taken without the tests running

Hi,
From the prints it looks like sudo jetson_clocks has been executed on one board and not on the other. Please run any model (such as YoloV3 tiny) and share the prints, to see if there is a deviation while a model is running.
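The difference between the two idle samples is easier to spot once the clock fields are pulled out of each line. Below is a minimal sketch that parses the `CPU [...]`, `EMC_FREQ` and `GR3D_FREQ` fields in the format shown in the prints above; `parse_tegrastats` is a hypothetical helper, and the two strings are excerpts of the posted lines.

```python
import re


def parse_tegrastats(line):
    """Extract the CPU, EMC and GR3D clock fields from one tegrastats line."""
    cpu = re.search(r"CPU \[([^\]]*)\]", line)
    cpu_mhz = [int(f) for f in re.findall(r"@(\d+)", cpu.group(1))] if cpu else []

    def clock(tag):
        # Fields look like "EMC_FREQ 0%@1600"; the "@MHz" part may be absent.
        m = re.search(tag + r" (\d+)%(?:@(\d+))?", line)
        if m is None:
            return None
        mhz = int(m.group(2)) if m.group(2) else None
        return {"load_pct": int(m.group(1)), "mhz": mhz}

    return {"cpu_mhz": cpu_mhz, "emc": clock("EMC_FREQ"), "gr3d": clock("GR3D_FREQ")}


# Excerpts of the idle samples posted above.
FAST = "CPU [12%@1479,3%@1479,9%@1479,2%@1479] EMC_FREQ 0%@1600 GR3D_FREQ 0%@76"
SLOW = "CPU [2%@102,1%@204,0%@204,0%@204] EMC_FREQ 0%@204 GR3D_FREQ 0%@76"

fast, slow = parse_tegrastats(FAST), parse_tegrastats(SLOW)
for key in fast:
    if fast[key] != slow[key]:
        print(f"{key}: fast={fast[key]} slow={slow[key]}")
```

On these two samples the CPU and EMC clocks differ while the GR3D clock is the same, which matches the observation that only one board appears to have had its clocks raised.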

Quick answer (yolo test results will come later)

nvidia@nano-jp46:~$ sudo tegrastats
[sudo] password for nvidia:
RAM 246/3964MB (lfb 813x4MB) IRAM 0/252kB(lfb 252kB) CPU [1%@204,0%@204,0%@204,0%@204] EMC_FREQ 0%@204 GR3D_FREQ 0%@76 APE 25 PLL@14.5C CPU@17C PMIC@50C GPU@16.5C AO@26C thermal@16.75C POM_5V_GPU 1150/1150 POM_5V_IN 0/0 POM_5V_CPU 119/119
RAM 246/3964MB (lfb 813x4MB) IRAM 0/252kB(lfb 252kB) CPU [0%@204,0%@204,0%@204,0%@204] EMC_FREQ 0%@204 GR3D_FREQ 0%@76 APE 25 PLL@14C CPU@17.5C PMIC@50C GPU@16C AO@26C thermal@16.75C POM_5V_GPU 1152/1151 POM_5V_IN 0/0 POM_5V_CPU 79/99
RAM 246/3964MB (lfb 813x4MB) IRAM 0/252kB(lfb 252kB) CPU [1%@102,1%@204,0%@204,0%@204] EMC_FREQ 0%@204 GR3D_FREQ 0%@76 APE 25 PLL@14.5C CPU@17.5C PMIC@50C GPU@16.5C AO@26.5C thermal@16.75C POM_5V_GPU 1150/1150 POM_5V_IN 0/0 POM_5V_CPU 79/92
^C
nvidia@nano-jp46:~$ sudo jetson_clocks
Can't access Fan!
nvidia@nano-jp46:~$ sudo tegrastats
RAM 246/3964MB (lfb 813x4MB) IRAM 0/252kB(lfb 252kB) CPU [0%@1479,0%@1479,0%@1479,0%@1479] EMC_FREQ 0%@204 GR3D_FREQ 0%@921 APE 25 PLL@15C CPU@17.5C PMIC@50C GPU@16.5C AO@27C thermal@17C POM_5V_GPU 1428/1428 POM_5V_IN 119/119 POM_5V_CPU 238/238
RAM 246/3964MB (lfb 813x4MB) IRAM 0/252kB(lfb 252kB) CPU [0%@1479,0%@1479,0%@1479,0%@1479] EMC_FREQ 0%@204 GR3D_FREQ 0%@921 APE 25 PLL@15C CPU@17.5C PMIC@50C GPU@17C AO@27C thermal@17.25C POM_5V_GPU 1428/1428 POM_5V_IN 119/119 POM_5V_CPU 238/238
RAM 246/3964MB (lfb 813x4MB) IRAM 0/252kB(lfb 252kB) CPU [0%@1479,0%@1479,0%@1479,0%@1479] EMC_FREQ 0%@204 GR3D_FREQ 0%@921 APE 25 PLL@15C CPU@18C PMIC@50C GPU@17C AO@27C thermal@17.25C POM_5V_GPU 1428/1428 POM_5V_IN 119/119 POM_5V_CPU 238/238
^C
nvidia@nano-jp46:~$

If not running ‘sudo jetson_clocks’ is the culprit, which startup script or service is responsible for running it?

Hi,
It is optional, and you may set up a startup script to execute it. You can set up rc.local by referring to
How to Enable /etc/rc.local with Systemd - LinuxBabe

Here are the results of YoloV3 tiny after the call to ‘sudo jetson_clocks’

------------Executing yolov3-tiny-416------------

--------------------------

Model Name: yolov3-tiny-416
FPS:18.31

--------------------------

Wall Time for running model (secs): 771.2218413352966

As you can see, there is no improvement.

I also ran ‘sudo tegrastats’ while the test was running. Here are the results :

nvidia@nano-jp46:~$ tegrastats
RAM 1873/3964MB (lfb 295x4MB) CPU [2%@1479,8%@1479,18%@1479,100%@1479] EMC_FREQ 0% GR3D_FREQ 96% PLL@42C CPU@45C PMIC@50C GPU@43.5C AO@56C thermal@44C
RAM 1874/3964MB (lfb 295x4MB) CPU [1%@1479,1%@1479,25%@1479,100%@1479] EMC_FREQ 0% GR3D_FREQ 99% PLL@41.5C CPU@44.5C PMIC@50C GPU@43C AO@55C thermal@44C
RAM 1874/3964MB (lfb 295x4MB) CPU [0%@1479,0%@1479,2%@1479,100%@1479] EMC_FREQ 0% GR3D_FREQ 99% PLL@41.5C CPU@45C PMIC@50C GPU@43.5C AO@55C thermal@43.75C
RAM 1874/3964MB (lfb 295x4MB) CPU [1%@1479,0%@1479,0%@1479,100%@1479] EMC_FREQ 0% GR3D_FREQ 99% PLL@42C CPU@44.5C PMIC@50C GPU@43.5C AO@55C thermal@44C
^C

Hi,
Please run sudo tegrastats. Without sudo it doesn’t show complete information: you can see that the EMC and GPU clocks are absent.

Sorry about the missing ‘sudo’.

Here is the result of ‘sudo tegrastats’

RAM 1777/3964MB (lfb 307x4MB) IRAM 0/252kB(lfb 252kB) CPU [1%@1479,0%@1479,0%@1479,100%@1479] EMC_FREQ 54%@204 GR3D_FREQ 99%@921 APE 25 PLL@41C CPU@44C PMIC@50C GPU@42.5C AO@54C thermal@43.25C POM_5V_GPU 3803/3803 POM_5V_IN 1267/1267 POM_5V_CPU 1030/1030
RAM 1777/3964MB (lfb 307x4MB) IRAM 0/252kB(lfb 252kB) CPU [1%@1479,0%@1479,0%@1479,100%@1479] EMC_FREQ 57%@204 GR3D_FREQ 99%@921 APE 25 PLL@41C CPU@44C PMIC@50C GPU@42.5C AO@54C thermal@43.25C POM_5V_GPU 3644/3723 POM_5V_IN 1111/1189 POM_5V_CPU 990/1010
RAM 1777/3964MB (lfb 307x4MB) IRAM 0/252kB(lfb 252kB) CPU [1%@1479,0%@1479,0%@1479,100%@1479] EMC_FREQ 60%@204 GR3D_FREQ 94%@921 APE 25 PLL@40.5C CPU@44C PMIC@50C GPU@42.5C AO@54.5C thermal@43.25C POM_5V_GPU 3803/3750 POM_5V_IN 1307/1228 POM_5V_CPU 1030/1016

Hi,
The issue looks to be the EMC frequency. It runs at only 204MHz, while on the fast board it can reach 1.6GHz. Could you check this? Probably the voltage or current supplied to the EMC is not sufficient.
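The same check can be automated over a captured tegrastats log. This is a rough sketch: `emc_samples` and `emc_held_low` are hypothetical helpers, the 1600 MHz reference is the value reported by the ‘fast’ board, and the sample text is an excerpt of the lines posted above.

```python
import re

EXPECTED_EMC_MHZ = 1600  # clock reported by the 'fast' board's tegrastats


def emc_samples(text):
    """Return (load_pct, mhz) for every 'EMC_FREQ <load>%@<MHz>' field in text."""
    return [(int(load), int(mhz))
            for load, mhz in re.findall(r"EMC_FREQ (\d+)%@(\d+)", text)]


def emc_held_low(samples, expected_mhz=EXPECTED_EMC_MHZ):
    """True if the EMC is busy in some sample yet never reaches expected_mhz."""
    busy = [mhz for load, mhz in samples if load > 0]
    return bool(busy) and max(busy) < expected_mhz


# Excerpt of the samples taken while yolov3-tiny-416 was running.
LOG = """EMC_FREQ 54%@204 GR3D_FREQ 99%@921
EMC_FREQ 57%@204 GR3D_FREQ 99%@921
EMC_FREQ 60%@204 GR3D_FREQ 94%@921"""

samples = emc_samples(LOG)
print("EMC held low under load:", emc_held_low(samples))
print(f"clock ratio fast/slow: {EXPECTED_EMC_MHZ / samples[0][1]:.1f}x")
```

Since peak memory bandwidth scales roughly with the EMC clock, a board stuck at 204 MHz has a small fraction of the bandwidth available at 1600 MHz, which would be consistent with the GPU showing 99% load while the benchmark stays slow.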

Can that be caused by some bad or missing entry in the DT, or is it the other way around: are the missing nodes (related to ‘external-memory-controller@7001b000’) caused by not getting enough voltage or current?

Hi,
If you flash all custom boards with the same image/DT and the issue happens only on specific custom boards, it is more likely a hardware issue.

Are the following messages from the low-level boot related to the problem?

[0001.355] get_emc_table_offset: failed to find EMC node
[0001.360] LPDDR4 Training: Can't find emc-table node

Is this a custom board or nv devkit?

A custom board, but of course copied from the devkit

Please test on the devkit first.
Everyone here says their custom board is a copy of the devkit, yet everyone files a topic to ask questions about their board.

Sorry.

Let me rephrase my last question :

What exactly do the following messages, issued by the low-level boot firmware before u-boot, mean?

[0001.355] get_emc_table_offset: failed to find EMC node
[0001.360] LPDDR4 Training: Can't find emc-table node

Share your whole log instead of just partial logs please.