Issue with 2888-0004-401 F.0 AGX Xavier modules

Hello,

We are facing a major production issue with AGX Xavier modules, which are still running BSP 32.7.2 with A/B redundant rootfs enabled.

The first, less critical issue is that for some time now (how long is unknown, investigation is ongoing) the AGX Xavier modules we receive have had the product number 2888-0004-401 instead of the usual 2888-0004-400.
This prevents reflashing the rootfs over OTA, because the ota_board_specs.conf config file doesn’t allow FAB 401. As a result, none of the newer machines can have their rootfs reflashed OTA (BSP 32.7.2 → BSP 32.7.2) for a system upgrade until we generate new update artifacts.
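
For anyone auditing a fleet for the same problem, here is a minimal sketch of the check that fails, assuming the product number splits into board ID, SKU and FAB fields. The names `ModuleId`, `SUPPORTED` and `is_supported` are hypothetical, and the allow-list is only a stand-in for whatever ota_board_specs.conf actually contains; it is not the real file format.

```python
from typing import NamedTuple


class ModuleId(NamedTuple):
    board_id: str
    sku: str
    fab: str


def parse_product_number(pn: str) -> ModuleId:
    """Split a product number such as '2888-0004-401' into its three fields."""
    parts = pn.strip().split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unexpected product number: {pn!r}")
    return ModuleId(*parts)


# Hypothetical allow-list standing in for the entries of ota_board_specs.conf.
SUPPORTED = {ModuleId("2888", "0004", "400")}


def is_supported(pn: str, supported=SUPPORTED) -> bool:
    """Return True if the module's board ID, SKU and FAB are all in the allow-list."""
    return parse_product_number(pn) in supported
```

With this allow-list, `is_supported("2888-0004-400")` is true and `is_supported("2888-0004-401")` is false, which mirrors why the newer modules are rejected until the update artifacts are regenerated.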

We haven’t received or seen any notification that the product number would change on newly shipped AGX Xavier modules.
We don’t know how many such modules are already installed in machines at client premises. That investigation is also ongoing.

The second, very critical issue is that the newest F.0 board revision of the 2888-0004-401 modules seems unable to run properly, to the point where it cannot run our software.
Board revision B.0 of 2888-0004-401 seems fine (apart from the first issue).
The F.0 modules appear to have performance issues. One symptom is that SSD I/O is limited to only 1 GB/s, 2 to 3 times less than usual on the same SSD. This by itself does not prevent our software from running, so it is only a symptom of some other underlying issue. Further investigation is ongoing to understand it better.
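
To compare a B.0 and an F.0 module on the same SSD, a rough sequential-read measurement along these lines can help. This is only a sketch: the function name and block size are arbitrary choices, and the test file should be larger than RAM (or the page cache dropped beforehand), otherwise the number reflects cached reads rather than the SSD.

```python
import time


def sequential_read_mb_per_s(path: str, block_size: int = 4 * 1024 * 1024) -> float:
    """Read the file at `path` front to back and return throughput in MB/s."""
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python-level buffering; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return total / elapsed / 1e6
```

Running it on the same large file on both module revisions gives a like-for-like figure to set against the 1 GB/s ceiling described above.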

Is there a known problem with the F.0 revision, or is there a PCN we didn’t receive that requires patching the rootfs / kernel / device tree to support the F.0 revision?
We haven’t received or seen a PCN for this, and we were unable to find one.

We currently still have some stock of the B.0 revision in one of our factories, so production is not yet completely halted. We don’t yet know how long that stock will last.

Best regards,

Martin

Hi,

PCN update should be this one.

Please try to upgrade the BSP and check if the perf issue is still there.

If it is, please show us how to reproduce this issue locally.

Yes, we are already on BSP 32.7.2 because of this PCN 208560, which is the last one we received.
So all our modules have either been flashed directly with this BSP or received an image-based OTA upgrade from BSP 32.6.1 to BSP 32.7.2.

We are performing more tests, including retrofitting a 2888-0004-401 F.0 AGX Xavier module onto a devkit carrier board to try to reproduce the issue there.

I would suggest trying a newer BSP such as 32.7.5 if debugging is required here.

Thanks Wayne,

Further investigation showed the following when running a 2888-0004-401 F.0 AGX Xavier module:

| Carrier board \ BSP | Nvidia’s vanilla BSP 32.7.2 | Third party’s carrier board specific BSP 32.7.2 |
| --- | --- | --- |
| Nvidia’s devkit carrier board | Full performance | Strongly degraded performance |
| Third party’s carrier board | Full performance (even if board not fully supported) | Strongly degraded performance |

So the issue clearly comes from our third-party carrier board provider’s BSP patches that add support for their carrier board.
Either they broke something in their own patches, or they didn’t properly integrate Nvidia’s updates when upgrading JetPack versions.

We now have to work with our carrier board provider as it is clearly a bug in their software.

We might have to come back to Nvidia for more details about how exactly PCN 208560 was integrated into BSP 32.7.2 (device tree change, kernel config change, kernel patch?) if our provider is unable to fix the issue and we have to fix it ourselves.

Thanks for your support !

/Martin

Edit: We can confirm that those boards are running the RAM clock 10x slower than expected, which explains the strong performance degradation.
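
For others who want to check for the same symptom, here is a minimal sketch that compares the current memory (EMC) clock to its maximum. The debugfs directory used as the default is an assumption about where L4T 32.x exposes the EMC clock on Xavier and may differ on your BSP. Reading it typically requires root, and the helper names are hypothetical.

```python
from pathlib import Path

# Assumed location of the EMC clock node on L4T 32.x (not confirmed in this thread).
EMC_DIR = Path("/sys/kernel/debug/bpmp/debug/clk/emc")


def read_hz(path) -> int:
    """Read a clock file containing a single integer frequency in Hz."""
    return int(Path(path).read_text().strip())


def emc_ratio(clk_dir=EMC_DIR) -> float:
    """Return current EMC rate divided by the maximum rate (1.0 means full speed)."""
    clk_dir = Path(clk_dir)
    rate = read_hz(clk_dir / "rate")
    max_rate = read_hz(clk_dir / "max_rate")
    return rate / max_rate
```

The memory clock scales with load, so a low ratio at idle is not by itself a fault. The comparison is more meaningful under load or with clocks pinned to maximum, where a ratio near 0.1 would match the 10x slowdown described above.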
