We are seeing degraded performance when running our commercial software pipeline on a Jetson Xavier NX module with a custom-designed carrier board and six 4K cameras, on Jetpack 4.6.1. It seems that the new board fails to handle heavy traffic, and CPU utilization quickly caps at 100%.
After contacting our supplier’s technical contact, we were informed that our boards are affected by PCN206980, since Hynix memory and Hynix eMMC components have been introduced to the BOM.
The recommended actions state that we need to include the appropriate BCT and DVFS changes required by the Hynix memory device in the software image and re-flash. However, these changes have been included in Jetpack 4.4.1 and later releases, and as described above, we are using Jetpack 4.6.1.
We would appreciate some help in understanding what exactly the problem is.
PS: We noticed that after boot, the EMC clock is locked at 204 MHz instead of the expected 1600 MHz. We can’t change it, since the max frequency is also locked at 204 MHz.
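For reference, this is roughly how we read the EMC clock. The debugfs paths are the ones exposed on our Jetpack 4.x setup and may differ on other releases; the Hz-to-MHz helper is just our own convenience function for readability, not part of any NVIDIA tool.

```shell
# Read the current and maximum EMC clock rates (Jetpack 4.x debugfs
# paths on Xavier; may differ on other releases). Both report Hz:
#   sudo cat /sys/kernel/debug/bpmp/debug/clk/emc/rate
#   sudo cat /sys/kernel/debug/bpmp/debug/clk/emc/max_rate

# Small helper to print a rate read from those nodes in MHz:
hz_to_mhz() {
  echo $(( $1 / 1000000 ))
}

hz_to_mhz 204000000   # the locked rate we observe prints as 204
```

On an unaffected module we would expect both nodes to allow roughly 1600000000 (1600 MHz) rather than 204000000.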
Hi,
Please share a method to replicate the issue on a Xavier NX developer kit. Please insert either module into the developer kit and check whether the issue can be replicated there, so that we can follow the steps to reproduce it and check.
Please see the results I get when I run the matrixMul CUDA sample (/usr/local/cuda/samples/0_Simple/matrixMul) on the old and new modules.
Working module: 421821020798,48B02D384B91,699-13668-0001-300
$ sudo /usr/local/cuda/samples/0_Simple/matrixMul/matrixMul
[Matrix Multiply Using CUDA] - Starting…
GPU Device 0: "Xavier" with compute capability 7.2
MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel…
done
Performance= 207.92 GFlop/s, Time= 0.630 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
Not working module: 1422122054471,48B02D7A8D56,699-13668-0001-301
$ sudo /usr/local/cuda/samples/0_Simple/matrixMul/matrixMul
[Matrix Multiply Using CUDA] - Starting…
GPU Device 0: "Xavier" with compute capability 7.2
MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel…
done
Performance= 61.73 GFlop/s, Time= 2.123 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
I forgot to run jetson_clocks on the not-working module; even so, the EMC clock stays low. After running it, I still get the same poor performance (Performance= 64.47 GFlop/s, Time= 2.033 msec) in the matrixMul example.
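In case it helps, this is how we monitor the EMC frequency while the pipeline runs, using the output of tegrastats. The sample line below is illustrative rather than captured from this module, and the parsing is a quick sketch assuming the `EMC_FREQ percent%@MHz` field format that tegrastats prints on our Jetpack 4.x setup.

```shell
# Extract the EMC frequency (MHz) from a tegrastats line. The sample
# line is illustrative; on the device, capture real lines with:
#   sudo tegrastats
line='RAM 3067/7772MB (lfb 4x4MB) EMC_FREQ 4%@204 GR3D_FREQ 0%@114'

# Isolate the EMC_FREQ field and keep only the MHz value after '@'.
emc_mhz=$(echo "$line" | grep -o 'EMC_FREQ [0-9]*%@[0-9]*' | cut -d@ -f2)
echo "$emc_mhz"   # prints 204 for this sample line
```

On the affected module this value never rises above 204 even under load, while a healthy module should report the EMC frequency scaling up toward 1600.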
I had the program open and then powered on the modules to make sure I would capture the very first messages. Then I waited until it was stable and no more messages were generated (a few minutes).