Hi Dane,
I have deployed an Orin NX on an official Nano devkit running stock 35.6.2 with spread spectrum disabled, as proposed by Nvidia as a “fix” for this very unfortunate hardware issue.
So far, with a limited number of reboot attempts, I have already hit 2 crashes with an unhandled exception in EL3, the kind that is usually taken care of by our counter-measures. The first occurrence came after a mere 56 reboot attempts. A bit sad for the ultimate “fix”.
As a result, the proposed fix is clearly not enough to prevent boot swaps and/or unhandled exceptions in EL3, and it also has the downside of requiring us to pass EMC validation again.
Where are you with the investigation on your end? How many reboots and modules were used on your side?
Also, another customer reports similar issues on an AGX Orin (Inquiry regarding [NvmExpressDxe] Assertion Error and Read-only File System - #4 by kayccc), with no reply at all. This means that all three flavors are affected. We also have thousands of AGX Xavier units in our fleet. Should we be worried that they might also be affected?
This issue seems to be attracting more and more developers, who have now started to stress-test reboots using an A/B scheme.
Is it possible to get a meaningful update from the internal team, including any progress and experiments made so far?
Is Nvidia taking this issue seriously? Your last reply was 37 days ago! Plenty of input from different customers on different tickets has been provided in the meantime. Many of your customers, including ourselves, have been spending a lot of effort, time and money to find a solution.
I am truly not impressed by the level of support provided. This is shameful!