I recently encountered an issue (that might actually be two issues) that was exposed when using a Xavier production SOM.
Our system uses an Ethernet-based camera connected through a Mellanox ConnectX-5 NIC, and the vendor SDK uses Rivermax for data transfer. When tested with JP 5.0.2 on a Xavier Dev Kit, the system performed normally. However, when the same software was loaded onto a production Xavier SOM on a Dev Kit carrier board, the camera was unable to deliver frames at the normal rate. After extensive debugging, we found that the EMC had to be set to FreqOverride on the production SOM (this is neither needed nor set on the Dev Kit).
Observing both systems after a clean reboot, I noticed that the production SOM scales the EMC clock down much lower than the Dev Kit Xavier does.
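To put numbers on that difference, the EMC clock can be sampled on both units after a clean boot. The following is a minimal sketch, assuming the debugfs directory used by the workaround in this thread (`/sys/kernel/debug/bpmp/debug/clk/emc`) also exposes `rate`, `min_rate` and `max_rate` files. Those three file names and the helper `read_emc_snapshot` are assumptions for illustration and are not confirmed in the thread.

```python
from pathlib import Path

# Assumed debugfs location; the thread only confirms mrq_rate_locked in this directory.
EMC_DIR = Path("/sys/kernel/debug/bpmp/debug/clk/emc")


def read_emc_snapshot(emc_dir=EMC_DIR):
    """Return the EMC clock values (in Hz) found in emc_dir, skipping missing files."""
    snapshot = {}
    for name in ("rate", "min_rate", "max_rate"):  # assumed file names
        node = Path(emc_dir) / name
        if node.is_file():
            snapshot[name] = int(node.read_text().strip())
    return snapshot
```

Calling `read_emc_snapshot()` as root on each unit, once at idle and once while the camera benchmark is running, should show whether the clock ever leaves its idle rate on the production SOM.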
This raises 2 questions:
1) Why is the behaviour different between the production and Dev Kit units? (It is possible that "production" is a misnomer here, as this could simply be a binning difference between parts.)
2) Why is the EMC scaling unable to detect the need to scale up? Is this an issue with the vendor SDK that we need to report back to them? If so, can you point me to any documentation they should consult to ensure the EMC scaling responds correctly?
This is using JP 5.0.2 installed from SDK Manager. I’m unsure what you mean by release; this is not the ‘runtime’ variant.
Previously the production unit was running JP 4.4.1 and did not have this issue. (We did not test 4.6.2 or 4.6.3)
We want to move to JP 5.0.2 because we have parallel development on Orins and need the environments to match as closely as possible.
Thanks, I will get this information as soon as possible. Unfortunately, the production unit was needed for field testing, so it was sent out once the FreqOverride workaround was found. I have another unit that showed the same symptoms; I need to mount it on a carrier board and should be able to do that early next week.
The steps to reproduce are difficult to share in full, as we are using a vendor SDK and I am not able to share its details on a public forum. I can share the high-level steps, but not the specifics:
1) Install JP 5.0.2 using sdkmanager according to the documentation (completing both the OS install and the loading of all JP packages + DeepStream).
2) Install the vendor SDK. This includes shared libraries and Rivermax (the vendor does not yet support the newest version of Rivermax available with JP 5.0.2).
3) Attempt a speed benchmark from the camera. On the Dev Kit, the Xavier achieves the full frame rate available from the camera; on the production Xavier, the camera only achieves 0.65 fps.
4) Set “FreqOverride” on the EMC clock on the production Xavier by running: echo 1 > /sys/kernel/debug/bpmp/debug/clk/emc/mrq_rate_locked
4.1) Re-attempt the speed benchmark and achieve the full frame rate.
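For repeatable testing, the workaround in step 4 can be wrapped in a small helper so the lock can be toggled around each benchmark run. This is only a sketch: the path is the one quoted in step 4, while the helper name `set_emc_lock` and the idea that writing `0` releases the lock again are assumptions that have not been verified here.

```python
from pathlib import Path

# Directory taken from the echo command in step 4 of the reproduction steps.
EMC_DIR = Path("/sys/kernel/debug/bpmp/debug/clk/emc")


def set_emc_lock(locked, emc_dir=EMC_DIR):
    """Write 1 (lock) or 0 (assumed: unlock) to mrq_rate_locked; return the value read back."""
    node = Path(emc_dir) / "mrq_rate_locked"
    node.write_text("1" if locked else "0")
    # Read the file back so the caller can confirm the write took effect.
    return node.read_text().strip()
```

Run as root, `set_emc_lock(True)` mirrors the `echo 1 > ...` command, and `set_emc_lock(False)` could then be used to check whether the low frame rate returns once the lock is released.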
Could you share which release the vendor SDK supports? The DRAM vendor can differ between modules, and the vendor SDK probably does not include support for the new DRAM. This could be why the EMC scaling does not work.
The vendor SDK was authored for JP 5.0. However, this unit was previously able to achieve full frame rates when using JP 4.4.1, so the difference in behaviour is new with JP 5.0.2. As the hardware is unchanged, the underlying DRAM has not changed either.
I am not sure, but the vendor SDK probably sets certain device nodes that are not present in JetPack 5, so the EMC clock is not set correctly. The power modes are listed in the config file. Please take a look and see if there is any clue.
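One way to look for clues in the power-mode config is to extract the EMC setting of each mode and compare it between the two units. The sketch below assumes the config is the nvpmodel file (commonly `/etc/nvpmodel.conf`) and that it uses `< POWER_MODEL ID=... NAME=... >` headers followed by `EMC MAX_FREQ <value>` lines. Both the path and that layout are assumptions, and `emc_caps_by_mode` is a hypothetical helper.

```python
import re

# Assumed header layout: < POWER_MODEL ID=0 NAME=MAXN >
HEADER = re.compile(r"<\s*POWER_MODEL\s+ID=(\d+)\s+NAME=(\S+)\s*>")


def emc_caps_by_mode(conf_text):
    """Map each power-mode name to its EMC MAX_FREQ value from nvpmodel-style text."""
    caps = {}
    mode = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        header = HEADER.match(line)
        if header:
            mode = header.group(2)
            continue
        fields = line.split()
        # Only record EMC entries that appear inside a power-mode section.
        if mode and len(fields) == 3 and fields[:2] == ["EMC", "MAX_FREQ"]:
            caps[mode] = int(fields[2])
    return caps
```

Passing the file contents from each module to `emc_caps_by_mode` and comparing the two dictionaries would show whether the active power mode caps the EMC differently on the production SOM.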
I think I need to clarify that the ‘vendor SDK’ I am referring to is for driving our camera, not the Xavier or its carrier board. It does not make any changes to the nvpmodel. It does install the Rivermax and Mellanox dependencies, but these are official releases (though it’s possible the vendor is modifying these files).
I can check if there is any difference between the available dev nodes on the two different modules.
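A minimal sketch of how that comparison could be done: collect a listing of paths on each module, save it to a text file, and diff the two listings on one machine. The helper names `list_nodes` and `diff_listings` are hypothetical, and which directories to walk (for example `/dev` and the BPMP debugfs tree) is an assumption.

```python
import os


def list_nodes(roots):
    """Walk each root and return the set of every directory and file path found."""
    found = set()
    for root in roots:
        # Unreadable directories are skipped rather than aborting the walk.
        for dirpath, dirnames, filenames in os.walk(root, onerror=lambda err: None):
            found.add(dirpath)
            found.update(os.path.join(dirpath, n) for n in dirnames + filenames)
    return found


def diff_listings(listing_a, listing_b):
    """Compare two newline-separated listings; return (only_in_a, only_in_b), sorted."""
    a = {line.strip() for line in listing_a.splitlines() if line.strip()}
    b = {line.strip() for line in listing_b.splitlines() if line.strip()}
    return sorted(a - b), sorted(b - a)
```

On each unit, something like `"\n".join(sorted(list_nodes(["/dev"])))` can be written to a file; feeding both files' contents to `diff_listings` then lists the nodes present on only one of the modules.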