Degraded performance on new Xavier NX 8GB 900-83668-0000-000 affected by PCN206980

Hello,

We are seeing degraded performance when running our commercial software pipeline on the Jetson Xavier NX module, with a custom-designed carrier board and 6x 4K cameras, on JetPack 4.6.1. It seems that the new board fails to handle the heavy traffic, and CPU utilization quickly caps at 100%.

After contacting our supplier’s technical contact, we were informed that our boards are affected by PCN206980, since Hynix memory and Hynix eMMC components have been introduced into the BOM.

The recommended actions state that we need to include the appropriate BCT and DVFS changes required by the Hynix memory device in the software image and re-flash. However, these changes have been included in JetPack 4.4.1 and later releases, and, as described above, we are using JetPack 4.6.1.

We would appreciate some help understanding what exactly the problem is.

PS: We noticed that after boot, the EMC clock is locked at 204 MHz instead of the expected 1600 MHz. We can’t change it, since the max frequency is also locked at 204 MHz:

$ sudo jetson_clocks --show
SOC family:tegra194 Machine:NVIDIA Jetson Xavier NX Developer Kit
Online CPUs: 0-5
cpu0: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0
cpu1: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0
cpu2: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0
cpu3: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0
cpu4: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0
cpu5: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0
GPU MinFreq=1109250000 MaxFreq=1109250000 CurrentFreq=1109250000
EMC MinFreq=204000000 MaxFreq=204000000 CurrentFreq=204000000 FreqOverride=1
Fan: PWM=0
NV Power Mode: MODE_20W_6CORE
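
For reference, the EMC rate can also be read directly from the BPMP debugfs. This is a quick cross-check assuming the usual tegra194 debugfs layout (the paths can vary between L4T releases):

$ sudo cat /sys/kernel/debug/bpmp/debug/clk/emc/rate
$ sudo cat /sys/kernel/debug/bpmp/debug/clk/emc/max_rate

Both should match the 204 MHz cap that jetson_clocks reports above.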

Hi,
Do you mean your Xavier NX is not a PCN206980 module, and performance is impacted since we enabled PCN206980 support in JetPack 4.6.1?

PCN206980 is the Product Change Notice we received from NVIDIA about the new memory components in the Xavier NX BOM.

Hello. We would appreciate a fast response on this, since it is a serious bottleneck for the production line of our product.

Hi,
Do you mean you observe the performance downgrade on all modules, or only on the PCN206980 modules? It is unclear what the issue is.

We see performance issues specific to the modules affected by PCN206980. Below is a list of modules that don’t work as expected:

1422122080122,48B02D7A929D,699-13668-0001-301
1422122054471,48B02D7A8D56,699-13668-0001-301
1422122055782,48B02D7A928B,699-13668-0001-301
1422122033874,48B02D7A9607,699-13668-0001-301
1422122033872,48B02D7A960D,699-13668-0001-301
1422122055560,48B02D7A9615,699-13668-0001-301
1422122080220,48B02D7A9176,699-13668-0001-301
1422122033870,48B02D7A95FE,699-13668-0001-301
1422122033866,48B02D7A95F9,699-13668-0001-301
1422122080244,48B02D7A91A3,699-13668-0001-301
1422122033877,48B02D7A9611,699-13668-0001-301
1422122055557,48B02D7A961A,699-13668-0001-301
1422122053476,48B02D7A9400,699-13668-0001-301
1422122080429,48B02D7A9013,699-13668-0001-301
1422122080219,48B02D7A9180,699-13668-0001-301

Hi,
Please share a method to replicate the issue on the Xavier NX developer kit. Please insert one of the modules into the developer kit and check whether the issue can be replicated there, so that we can follow the steps to reproduce and check it.

Hello,

Please check the results I get when I run the matrixMul CUDA sample (/usr/local/cuda/samples/0_Simple/matrixMul) on the old and new modules.
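
In case the sample binary is not already built, it can be compiled in place first (assuming the default CUDA samples location on JetPack 4.6.1; adjust the path for your CUDA version):

$ cd /usr/local/cuda/samples/0_Simple/matrixMul
$ sudo make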

Working module: 421821020798,48B02D384B91,699-13668-0001-300

$ sudo /usr/local/cuda/samples/0_Simple/matrixMul/matrixMul
[Matrix Multiply Using CUDA] - Starting...
GPU Device 0: "Xavier" with compute capability 7.2

MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel…
done
Performance= 207.92 GFlop/s, Time= 0.630 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

$ sudo jetson_clocks --show
SOC family:tegra194 Machine:NVIDIA Jetson Xavier NX Developer Kit
Online CPUs: 0-5
cpu0: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=1420800 CurrentFreq=1190400 IdleStates: C1=0 c6=0
cpu1: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=1420800 CurrentFreq=1190400 IdleStates: C1=0 c6=0
cpu2: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=1420800 CurrentFreq=1190400 IdleStates: C1=0 c6=0
cpu3: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=1420800 CurrentFreq=1344000 IdleStates: C1=0 c6=0
cpu4: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=1420800 CurrentFreq=1344000 IdleStates: C1=0 c6=0
cpu5: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=1420800 CurrentFreq=1190400 IdleStates: C1=0 c6=0
GPU MinFreq=1109250000 MaxFreq=1109250000 CurrentFreq=1109250000
EMC MinFreq=204000000 MaxFreq=1600000000 CurrentFreq=1600000000 FreqOverride=1
Fan: PWM=0
NV Power Mode: MODE_15W_6CORE

Non-working module: 1422122054471,48B02D7A8D56,699-13668-0001-301

$ sudo /usr/local/cuda/samples/0_Simple/matrixMul/matrixMul
[Matrix Multiply Using CUDA] - Starting...
GPU Device 0: "Xavier" with compute capability 7.2

MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel…
done
Performance= 61.73 GFlop/s, Time= 2.123 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

$ sudo jetson_clocks --show
SOC family:tegra194 Machine:NVIDIA Jetson Xavier NX Developer Kit
Online CPUs: 0-5
cpu0: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=1420800 CurrentFreq=1344000 IdleStates: C1=0 c6=0
cpu1: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=1420800 CurrentFreq=1190400 IdleStates: C1=0 c6=0
cpu2: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=1420800 CurrentFreq=1190400 IdleStates: C1=0 c6=0
cpu3: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=1420800 CurrentFreq=1190400 IdleStates: C1=0 c6=0
cpu4: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0
cpu5: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=1420800 CurrentFreq=1190400 IdleStates: C1=0 c6=0
GPU MinFreq=114750000 MaxFreq=1109250000 CurrentFreq=114750000
EMC MinFreq=204000000 MaxFreq=204000000 CurrentFreq=204000000 FreqOverride=1
Fan: PWM=0
NV Power Mode: MODE_15W_6CORE

Update 2

I forgot to run jetson_clocks on the non-working module. After running it, the EMC clock is still low, and I get the same poor performance (Performance= 64.47 GFlop/s, Time= 2.033 msec) in the matrixMul example.

$ sudo jetson_clocks --show
SOC family:tegra194 Machine:NVIDIA Jetson Xavier NX Developer Kit
Online CPUs: 0-5
cpu0: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0
cpu1: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0
cpu2: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0
cpu3: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0
cpu4: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0
cpu5: Online=1 Governor=schedutil MinFreq=1420800 MaxFreq=1420800 CurrentFreq=1420800 IdleStates: C1=0 c6=0
GPU MinFreq=1109250000 MaxFreq=1109250000 CurrentFreq=1109250000
EMC MinFreq=204000000 MaxFreq=204000000 CurrentFreq=204000000 FreqOverride=1
Fan: PWM=0
NV Power Mode: MODE_15W_6CORE
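
A further way to double-check the cap, assuming the usual tegra194 BPMP debugfs layout, would be to try setting the EMC rate directly and reading it back; on a capped module the written value should not stick:

$ sudo bash -c 'echo 1600000000 > /sys/kernel/debug/bpmp/debug/clk/emc/rate'
$ sudo cat /sys/kernel/debug/bpmp/debug/clk/emc/rate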

Hi,
Thanks for the steps. We need your help to get the UART log of this device:

1422122054471,48B02D7A8D56,699-13668-0001-301

We would like to get the RAM code of the device, which is printed in the UART log.
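
Once the log is captured, you should be able to pull the value out with something like the following (uart.log is a placeholder for your capture file, and the exact wording of the line may vary between BSP versions):

$ grep -i "ram code" uart.log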

Hi,
We tried this on JetPack 4.6.1 but did not hit the issue. The Xavier NX module we used:

$ cat /etc/nv_boot_control.conf 
TNSPEC 3668-301-0001-A.0-1-2-jetson-xavier-nx-devkit-emmc-mmcblk0p1
COMPATIBLE_SPEC 3668-301---1--jetson-xavier-nx-devkit-emmc-
TEGRA_CHIPID 0x19
TEGRA_OTA_BOOT_DEVICE /dev/mtdblock0
TEGRA_OTA_GPT_DEVICE /dev/mtdblock0

Ram Code: 0x1

We ran the flash.sh command to flash the system image.

Hello,

We managed to narrow down the issue.

We see the issue when we flash with the generated mass-flashing script ./nvmflash.sh. We produce this script with the following command:

sudo BOARDID=3668 BOARDSKU=0001 FAB=100 FUSELEVEL=fuselevel_production ./nvmassflashgen.sh jetson-xavier-nx-devkit-emmc mmcblk0p1

We don’t see the issue when we flash with the regular flash command:

sudo ./flash.sh jetson-xavier-nx-devkit-emmc mmcblk0p1

Can you help us understand whether we need to specify something different when we generate the nvmflash.sh script so that it works for all modules?
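
If it helps, we can also run the same check you showed above on a module flashed each way and compare the TNSPEC lines:

$ cat /etc/nv_boot_control.conf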

As mentioned in our previous comment, please share the UART logs for these two cases.
That will help identify the cause.

Of course.
uart.zip (2.8 KB)

Your log is not complete, and we are talking about both the working and the NG case.

You should share two logs, not only one.

Hello.

The non-working module:
48B02D7A8D56-uart.zip (3.6 KB)

And the working module:
48B02D384B91-uart.zip (3.4 KB)

I generated these logs with:

sudo minicom -D /dev/ttyUSB0 -b 115200 -C X-uart.log

I had the program open and then powered on the modules, to make sure I captured the very first messages. Then I waited until the output was stable and no more messages were generated (a few minutes).

Hi,

Are you sure the cable and board are fine? Was this done on the devkit?
The log is incomplete in both the working and the non-working case.

Could you also try another console tool instead of minicom?

For example, using picocom.
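
A capture command along these lines should work, assuming a picocom version that supports --logfile (2.x or later); the log file name is just a placeholder:

$ sudo picocom -b 115200 --logfile X-uart.log /dev/ttyUSB0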

I think I fixed the issue; I can now see the RAM code inside each log.

The non-working module:
48B02D7A8D56-uart.zip (10.0 KB)

And the working module:
48B02D384B91-uart.zip (10.0 KB)

Hi,

Thanks. The log is complete now.
I think there is a misunderstanding here: it is not a matter of “working” and “NG” modules.

The real issue should be that when you run flash.sh, it gives you one RAM code, but when you use the mass flash, it gives you another RAM code.

My point here is that you should use the same module, not two different modules, to prove whether what I said is correct.

So you want me to give you two logs generated with the same module, one for each case:

  • Flash normally with the flash.sh tool
  • Mass flash with the nvmflash.sh script

And we should see a different RAM code in each log. Is that correct?

Yes, that is what I want to prove. The different RAM code causes the performance issue you saw.
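
A minimal way to check it on one module, reusing the capture setup above (the log file names here are placeholders), would be:

$ sudo ./flash.sh jetson-xavier-nx-devkit-emmc mmcblk0p1    # case 1: normal flash
$ grep -i "ram code" flash-case-uart.log
$ sudo ./nvmflash.sh                                        # case 2: mass flash
$ grep -i "ram code" massflash-case-uart.log

If the two values differ, that confirms the mismatch described above.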
