Reduced GPU performance with latest manufacturer hardware revision

The company I work for uses the Xavier NX production modules in our line of cameras. I’ve found that the latest hardware revisions are failing our image processing GPU benchmarks (the tests are running on average 4-5 times slower than the earlier hardware revisions). Here are some more details:

Hardware part number in question: 699-13668-0001-301 G.0
I’ve tested 5 units with this part number and they all exhibit the same slow behaviour. Serial numbers for the slow units that I’ve tested are: 1422322079323, 1422322064315, 1422322079142, 1422322063973, 1422322063775

Other units I’ve tested have the part numbers 699-13668-0001-300 B.0 and 699-13668-0001-301 F.0. These hardware revisions pass our benchmarks.

All the units used the same operating system image and mounting board hardware. The only difference between the tested units is this manufacturing revision.

One test I’ve tried based on another thread is to check the output of the command

cat /sys/kernel/debug/bpmp/debug/emc/dram_info

to check the DRAM model. This command returns a result on the working systems (B.0 and F.0 manufacturing revisions), but the file doesn’t exist on the systems with the slow GPU performance (G.0).
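For anyone who wants to repeat this check across several modules, here is a minimal sketch. The function name `check_dram_info` is made up for illustration, and the default path is simply the one quoted above; whether the node exists may depend on the BSP and firmware in use.

```shell
# Sketch: report whether the BPMP debugfs DRAM info node is present.
# check_dram_info is a hypothetical helper name, not an NVIDIA tool.
# The default path is the one quoted in this post; pass another path to override it.
check_dram_info() {
    node="${1:-/sys/kernel/debug/bpmp/debug/emc/dram_info}"
    if [ -r "$node" ]; then
        echo "dram_info present:"
        cat "$node"
    else
        echo "dram_info missing or unreadable: $node"
        return 1
    fi
}
```

debugfs is normally readable only by root, so the function would need to be called from a root shell. A non-zero return status makes it easy to flag affected units in a script.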

Thanks for the help.

I had the same problem with TX2 modules. We have very custom hardware and a rather old software release, and on certain newer TX2 modules we only had 4-6 frames per second instead of 30 with the same firmware and the same carrier board…

The recommended approach is to always use the latest Jetpack. There are product change notices (PCNs) that state which hardware revision requires which Jetpack version. If you want a supported configuration, you need to follow these notices.

For my TX2 problem I found out that it was not a problem of the software on the main CPU cores where the Linux system is running, but a problem with firmware running on the other cores that are not accessible to the user, in my case the BPMP (Boot & Power Management Processor). I managed to use the firmware and the flash scripts of a later Jetpack version with our old UBoot, Kernel, Root FS and dtbs. It took me several days, but I eventually managed to get a system that also runs our very old Linux image.
Warning: If you choose this method you are completely out of support and on your own, and NVIDIA won’t help you. And there is no guarantee that you will succeed.

Lesson learnt: With Jetson modules you can’t freeze your images forever. You absolutely must apply software updates regularly in order to be able to use the latest modules.

Hi,
Please try to run this sample on the modules:

NVIDIA_CUDA-11.4_Samples/0_Simple/matrixMulCUBLAS$ sudo jetson_clocks
NVIDIA_CUDA-11.4_Samples/0_Simple/matrixMulCUBLAS$ ./matrixMulCUBLAS

And see whether it shows the performance deviation. We would like to see if there is a method for reproducing the phenomenon.

And please share which Jetpack version you are using.

Thanks, that was what I was worried had happened. I’ll test out upgrading the BSP and get our code building in the new operating system (we are still on Ubuntu 18/Jetpack 4.6).

I’ll compare the output of matrixMulCUBLAS on both Jetpack versions and get back to you with the result.

Upgrading to 4.6.4 (the latest, and possibly the last, 4.x Jetpack) might be enough.

I’ve run matrixMulCUBLAS after running jetson_clocks, with the following results:

G.0 hardware revision - Jetpack 4.6.0 - average 168 GFlops/s
E.0 hardware revision - Jetpack 4.6.0 - average 623 GFlops/s

Upgrading the new boards to 5.1.1:
G.0 hardware revision - Jetpack 5.1.1 - average 305 GFlops/s
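
For anyone repeating this comparison, here is a rough sketch of how an average over several runs could be collected. It assumes each run of the sample prints a line of the form "Performance= <number> GFlop/s, ..."; check that against the actual output on your system. `avg_gflops` is a name made up for this example.

```shell
# Sketch: average the GFlop/s figure over several benchmark runs read from stdin.
# Assumption: each run prints a line like "Performance= 623.45 GFlop/s, Time= ...".
# avg_gflops is a hypothetical helper name.
avg_gflops() {
    awk '/Performance=/ {
             # Take the field that follows the "Performance=" token.
             for (i = 1; i < NF; i++)
                 if ($i == "Performance=") { sum += $(i + 1); n++ }
         }
         END {
             if (n == 0) { print "no Performance= lines found"; exit 1 }
             printf "%.1f GFlop/s over %d runs\n", sum / n, n
         }'
}
```

A possible use, after `sudo jetson_clocks`, would be `for i in 1 2 3 4 5; do ./matrixMulCUBLAS; done | avg_gflops`, run in the same power mode on every module being compared.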

I’ll continue testing a few other configurations and post the results here. What I’m seeing so far is that the latest software improves performance, but even with the latest software package the new hardware revision is still much slower than the old revisions.

Please include Jetpack 4.6.4 in your tests.

Here is an updated summary. I’ve expanded the testing to 16 GB modules too.

Upgrading to 5.1.1 fixes the performance problem. My previous results showed roughly half the expected performance because one system was in 10W mode and the other in 20W mode.
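
Since the mismatch came down to the power mode, a quick sanity check before benchmarking may save some time. The sketch below compares the output of `sudo nvpmodel -q` saved from two modules. It assumes that output contains a line of the form "NV Power Mode: <name>"; the function names and the file arguments are made up for illustration.

```shell
# Sketch: compare the power mode reported on two modules.
# Assumption: `sudo nvpmodel -q` prints a line like "NV Power Mode: <name>".
# power_mode_name and same_power_mode are hypothetical helper names.

# Extract the mode name from saved nvpmodel output on stdin.
power_mode_name() {
    sed -n 's/^NV Power Mode:[[:space:]]*//p' | head -n 1
}

# Compare the modes found in two saved output files.
same_power_mode() {
    a=$(power_mode_name < "$1")
    b=$(power_mode_name < "$2")
    if [ -n "$a" ] && [ "$a" = "$b" ]; then
        echo "match: $a"
    else
        echo "MISMATCH: '$a' vs '$b'"
        return 1
    fi
}
```

On each module the output could be saved with `sudo nvpmodel -q > unit_a.txt` (the file name is arbitrary), then compared with `same_power_mode unit_a.txt unit_b.txt` before trusting any GFlops numbers.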

8GB Xaviers with a hardware revision of G.0 and 16GB Xaviers with a hardware revision of C.0 exhibit slow performance (~150 GFlops/s) with Jetpack 4.6.0. This is compared to ~620 GFlops/s for older board revisions of both types.

All hardware revisions perform at the expected ~620 GFlops/s when running 5.1.1.

I’ll check the 4.6.4 performance tomorrow; I have to put together an 18.04 system to run SDK Manager.

I finally got a system flashed with 4.6.4. It looks like this Jetpack version does work with the G.0 hardware revision.

I guess this is a lesson in keeping up with software updates when working with NVIDIA hardware, and in stocking up on spares when new PCNs come out.
