EMC Scaling Differences between Xavier Production and Dev Kit

I recently encountered an issue (that might actually be two issues) that was exposed when using a Xavier production SOM.
Our system uses an Ethernet-based camera connected through a Mellanox ConnectX-5 NIC, and the vendor SDK uses rivermax for data transfer. When tested with JP 5.0.2 on a Dev Kit Xavier, the system performed normally. However, when the same software was loaded onto a production Xavier SOM on a dev kit carrier board, the camera was unable to deliver frames at the normal rate. Through extensive debugging we found that setting the EMC to FreqOverride was needed on the production SOM (but is not needed or set on the Dev Kit).
Observing both systems after a clean reboot shows that the production SOM scales the EMC clock down much lower than the Dev Kit Xavier does.
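
For anyone wanting to repeat that comparison, here is a minimal sketch that prints the current EMC rate in MHz. It assumes the BPMP debugfs directory that holds mrq_rate_locked also exposes a file named "rate" containing the frequency in Hz. That file name is an assumption on my part and is not confirmed in this thread. The function name is hypothetical.

```shell
#!/bin/sh
# Print the current EMC clock rate in MHz.
# The directory defaults to the BPMP debugfs clock directory for the EMC;
# the "rate" file inside it is an ASSUMPTION (not confirmed in this thread).
emc_rate_mhz() {
    dir="${1:-/sys/kernel/debug/bpmp/debug/clk/emc}"
    if [ ! -r "$dir/rate" ]; then
        echo "cannot read $dir/rate (debugfs usually needs root)" >&2
        return 1
    fi
    hz=$(cat "$dir/rate")
    # Integer division: Hz -> MHz
    echo $((hz / 1000000))
}
```

Calling emc_rate_mhz as root on each unit shortly after boot, with nothing else running, should make any difference in the idle EMC floor visible.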

This raises 2 questions:

  1. Why is the behaviour different between the production and dev kit units? (It is possible that "production" is a misnomer here, as this could simply be a binning difference between parts.)
  2. Why is the EMC scaling unable to detect the need to scale up? Is this an issue with the vendor SDK that we need to report back to them? If so, can you direct me to any documentation they might need to consult to ensure the EMC scaling responds correctly?

Thanks!

Hi,
Do you use Jetpack 4.6.2 or 4.6.3? We would like to know whether you are using a Jetpack release.

This is using JP 5.0.2 installed from SDK manager. I’m unsure what you mean by release; this is not the ‘runtime’ variant.
Previously the production unit was running JP 4.4.1 and did not have this issue. (We did not test 4.6.2 or 4.6.3.)
We want to move to JP 5.0.2 as we have parallel development on Orins and need the environment to be as closely matched as possible.

Hi,
Please run this command on both the production module and the developer kit module and share the output for reference:

$ cat /etc/nv_boot_control.conf

Please also share the steps so that we can try to reproduce it on a Xavier production module.

Thanks, I will get this information as soon as possible. Unfortunately the production unit was needed in field testing (so it was sent out once the FreqOverride workaround was found). I have an alternative unit that showed the same symptoms. I will need to mount it on a carrier and should be able to do so early next week.

The steps to reproduce are difficult to share, as we are using a vendor SDK and I am not able to share those details on a public forum. I can share the high-level steps but not the specific details:

  1. Install JP 5.0.2 using sdkmanager according to the documentation (completing both the OS install and loading of all JP packages + Deepstream).
  2. Install the vendor SDK. This includes shared libraries and rivermax (the vendor does not yet support the newest version of rivermax available with JP 5.0.2).
  3. Attempt a speed benchmark from the camera. On the Dev Kit Xavier the camera achieves its full frame rate; on the production Xavier it achieves only 0.65 fps.
  4. Set “FreqOverride” on the EMC clock on the production Xavier using echo 1 > /sys/kernel/debug/bpmp/debug/clk/emc/mrq_rate_locked
    4.1) Re-attempt the speed benchmark and achieve full frame rate.
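
The workaround in step 4 can be wrapped in a small helper so it is easy to apply before a benchmark run. This is only a sketch: the node path is the one given in step 4, while the function name, the optional path argument, and the idea that writing 0 releases the lock again are my assumptions and are not confirmed in this thread.

```shell
#!/bin/sh
# Toggle the EMC rate lock from step 4.
# Usage: set_emc_lock 1          (lock, as in step 4)
#        set_emc_lock 0          (release; ASSUMED to undo the lock)
# An optional second argument overrides the node path, e.g. for a dry run.
set_emc_lock() {
    value="$1"
    node="${2:-/sys/kernel/debug/bpmp/debug/clk/emc/mrq_rate_locked}"
    case "$value" in
        0|1) ;;
        *) echo "usage: set_emc_lock 0|1 [node]" >&2; return 2 ;;
    esac
    if [ ! -w "$node" ]; then
        echo "cannot write $node (run as root; debugfs must be mounted)" >&2
        return 1
    fi
    echo "$value" > "$node"
}
```

Run as root, set_emc_lock 1 followed by the speed benchmark corresponds to steps 4 and 4.1.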

From the dev kit Xavier:

TNSPEC 2888-400-0001-J.0-1-2-jetson-agx-xavier-devkit-
COMPATIBLE_SPEC 2888-400-0001-E.0-1-2-jetson-agx-xavier-devkit-
TEGRA_CHIPID 0x19
TEGRA_OTA_BOOT_DEVICE /dev/mmcblk0boot0
TEGRA_OTA_GPT_DEVICE /dev/mmcblk0boot1

From the production module

TNSPEC 2888-400-0004-L.0-1-2-jetson-agx-xavier-devkit-
COMPATIBLE_SPEC 2888-400-0004--1-2-jetson-agx-xavier-devkit-
TEGRA_CHIPID 0x19
TEGRA_OTA_BOOT_DEVICE /dev/mmcblk0boot0
TEGRA_OTA_GPT_DEVICE /dev/mmcblk0boot1
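
The two TNSPEC strings above differ only in their third and fourth dash-separated fields (0001 and J.0 versus 0004 and L.0). A minimal sketch for pulling those two fields out of the file, so units can be compared quickly, is shown here. Treating the third field as the module SKU and the fourth as the board revision is my assumption, and the function name is hypothetical.

```shell
#!/bin/sh
# Print the 3rd and 4th dash-separated fields of the TNSPEC value.
# Interpreting them as "<sku> <revision>" is an ASSUMPTION based on the
# two strings posted in this thread.
tnspec_sku_rev() {
    conf="${1:-/etc/nv_boot_control.conf}"
    awk '$1 == "TNSPEC" { split($2, f, "-"); print f[3], f[4] }' "$conf"
}
```

On the dev kit output above this prints 0001 J.0, and on the production module output it prints 0004 L.0.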

Hi,
Could you share which release the vendor SDK supports? The DRAM vendor can differ from module to module, and the vendor SDK probably does not account for the newer DRAM. This could be the reason the EMC scaling does not work.

The vendor SDK was authored for JP 5.0. However, this unit was previously able to achieve full frame rates when using JP 4.4.1, so the difference is new in JP 5.0.2. As the HW has remained unchanged, the underlying DRAM has not changed.

I just want to follow up on this so the auto moderator doesn’t close the topic. I’m still looking for an answer to the original two questions.

Hi,
Please check this config file and see if there is any clue:

nvidia@tegra-ubuntu:/$ ll /etc/nvpmodel.conf
lrwxrwxrwx 1 root root 32 Sep  8 09:58 /etc/nvpmodel.conf -> /etc/nvpmodel/nvpmodel_t194.conf

I am not sure, but the vendor SDK probably sets certain device nodes. If those nodes are not present in Jetpack 5, the EMC clock would not be set correctly. The power modes are listed in the config file. Please take a look and see if there is any clue.

Hi,
I think I need to clarify that the ‘vendor SDK’ I am referring to drives our camera, not the Xavier or its carrier board. It does not make any changes to the nvpmodel. It does install the rivermax and mellanox dependencies, but these are official releases (though it’s possible the vendor is modifying these files).
I can check if there is any difference between the available dev nodes on the two different modules.

I have checked both the working dev kit and the non-working module, and nvpmodel.conf points to the same location on both:

xavier@ubuntu:~$ ll /etc/nvpmodel.conf
lrwxrwxrwx 1 root root 32 Apr 21  2022 /etc/nvpmodel.conf -> /etc/nvpmodel/nvpmodel_t194.conf

Additionally there is no diff between the two files.

We have recently identified a dev kit unit where this effect is visible.
cat /etc/nv_boot_control.conf reports:

xavier@xavier-beetle:~$ cat /etc/nv_boot_control.conf
TNSPEC 2888-400-0004-L.0-1-2-jetson-agx-xavier-devkit-
COMPATIBLE_SPEC 2888-400-0004--1-2-jetson-agx-xavier-devkit-
TEGRA_CHIPID 0x19
TEGRA_OTA_BOOT_DEVICE /dev/mmcblk0boot0
TEGRA_OTA_GPT_DEVICE /dev/mmcblk0boot1

Just keeping this thread safe from the automod.

Hi,
Are you able to try Jetpack 5.1? Certain performance issues are fixed in 5.1, and it may help here. It would be great if you could upgrade and give it a try.

I will see if we can find some time to test this. However, we recently encountered issues upgrading a platform to JP 5.1, so we may be blocked until that is resolved (issue reported in the topic “Cannot update to JP5.1 due to Multimedia API version mismatch”).

As the above issue with JP5.1 was resolved, we were able to test with JP5.1 on both units and can confirm that the issue is resolved in the newest release.
