Could it somehow be related to the SwReserved flags (-r) used? Is the watchdog flag interfering perhaps?
Could it somehow be related to the use of a 3072bit PKC key?
Could it somehow be related to loading KEK0 & KEK1 into the fuses?
Could it somehow be related to fusing too many things in a single go? Did it somehow damage the QSPI with a rogue current somewhere?
Would you be able to share a Known Good set of commands to fuse a Xavier NX eMMC board, including key generation? I could take one more board and do those exact steps and see whether I get a different outcome (but I am running low on available boards; we did not budget on losing that many to the process).
When you say that you’ve confirmed fuse burning & flashing works on Xavier NX devkits, do you also mean that you’ve only tested the SD-Card version of the module? Is that the key difference why I’m seeing these failures?
I will be off on holiday leave after today, but if you find a resolution to this issue I’ll see the notification and can give it a try on my remaining available board.
Fuse burning only works with production modules, i.e. those with internal eMMC.
Please also check Topic 158361 as a see-also for the steps to fuse/flash the target. We’ve also tested with JetPack-4.6 and it returned success. Thanks.
Okay, that rules out the SD-Card vs eMMC as a possible explanation then.
The instructions in the linked topic use the deprecated -c argument instead of --auth, and unless I’m mistaken, as written they are for when the board is already in PKC mode? For a blank module, should it not be --auth NS?
What size RSA key have you tested with? 2048 or 3072?
Another difference I see is that JTAG was left enabled on that thread, whereas we’ve disabled it. Not that I’d expect that to cause the failure I’m seeing, but it is a difference.
I’ve sourced a Xavier-NX DevKit carrier board now and am testing with it. I’m using the previously fused production eMMC module, and the power supply that came with the dev kit. Attempting to flash gives the same error on the debug console. With debug level output enabled I see:
Same flash command as posted above.
I have not yet fused a module using this devkit. I have one module left, and would like the clarifications I requested earlier, as otherwise I would just repeat what I have already done. At this point I have no reason to expect anything would be different if I simply execute the same commands again.
What combination were you using for the very first fuse burning / image flashing?
For example, did you have the Xavier NX SOM on the DevKit to burn the fuses? Or did you use a custom carrier board to perform those steps?
So far, this has all been with p3449 (Jetson Nano DevKit carrier) + p3668 (Jetson Xavier NX eMMC module). I will re-test once more using the p3509 (Jetson Xavier NX DevKit carrier) now that I have it.
For your reference, I have linked the complete logs - both the terminal and the serial debug console (they were too large to include here directly). The PKC key used is a 3072bit RSA key, generated as per the documentation.
I now have three boards which all are bricked in the same manner. Please advise what is wrong with the steps I’ve taken, or where I should send these boards so NVIDIA can investigate in detail why fusing breaks things. This issue is clearly 100% reproducible for me. I can share our keys if necessary to resolve this (we’ll generate new ones afterwards in that case).
Console log
Full log, start to finish, as captured by script(1):
Serial debug log
Full serial debug log, as captured by minicom(1):
If that fuse activates a watchdog which isn’t being patted, that would explain the odd failure mode. I’ll source yet another module then and burn without that fuse bit. May I suggest updating the DA-09876-001 document (Xavier NX Fuse Specification) to state that this watchdog is NOT supported by the software and DO NOT BURN?
The reason for wanting the watchdog enabled as early as possible after boot is for reliability. These devices will be located (very) remote, and when over the air upgrades are pushed to them it is imperative that the device remains functional - either running the new version, or rolling back to the previous version. We are using an A/B scheme, so as long as the device reboots on failure the system should recover and remain functional. Having the watchdog enabled as early as possible in the boot process gives us best reliability.
As you can see in the other thread, Topic 200592.
We have arranged resources to check this internally, and will share the details once we have conclusions.
Besides,
we’ve verified the fuse process on the Xavier series; please also refer to Topic 117585 as a see-also. Thanks.
I suggest you don’t touch FUSE_RESERVED_SW[23:0] until we conclude the issue.
We’ve tested several devices with PKC+SBK and also KEKs, but we haven’t tested burning sw_reserved.
I saw your post on the linked thread about DisableWdtGlobally = 1;. I didn’t know whether it would be applicable to the Xavier NX as well, but gave it a go. I still get the same error even after adding that, so either DisableWdtGlobally is not applicable to the NX, or I’m looking at a different issue here.
I tried adding it to both Linux_for_Tegra/bootloader/t186ref/BCT/tegra194-mb1-soft-fuses-l4t.cfg and Linux_for_Tegra/bootloader/tegra194-mb1-soft-fuses-l4t.cfg.
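For anyone repeating this, a quick way to confirm that the setting is actually present in a given config file is sketched below. This is only an illustrative helper: the function name `wdt_disabled` is made up, and the assumption that the setting appears on its own line in the form `DisableWdtGlobally = 1;` is taken from the text quoted above, not from any format documentation.

```python
import re

# Assumption: the setting sits on its own line as "DisableWdtGlobally = <n>;".
# The surrounding file format is not verified here.
SETTING = re.compile(r"^\s*DisableWdtGlobally\s*=\s*(\d+)\s*;", re.MULTILINE)


def wdt_disabled(cfg_text: str) -> bool:
    """Return True if the text contains a line setting DisableWdtGlobally to 1."""
    match = SETTING.search(cfg_text)
    return match is not None and match.group(1) == "1"


# Hypothetical file contents, used only to exercise the helper.
sample = "SomeOtherSetting = 0;\nDisableWdtGlobally = 1;\n"
print(wdt_disabled(sample))
```

To check the real files, read each one with `pathlib.Path(path).read_text()` and pass the result to `wdt_disabled`.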
Thanks for sharing the test results.
Would you please modify the Xavier NX configuration file, p3668.conf.common?
Please set bit 16 to zero; for example, you may configure ODMDATA as ODMDATA=B8180000 and test again.
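The bit arithmetic behind that suggestion can be sketched as follows. The starting value 0xB8190000 is an assumption, chosen because it is the value that yields B8180000 once bit 16 is cleared; check the ODMDATA actually present in your own p3668.conf.common before relying on it. The helper names are illustrative only.

```python
def clear_bit(value: int, bit: int) -> int:
    """Return value with the given bit position forced to zero."""
    return value & ~(1 << bit)


def is_bit_set(value: int, bit: int) -> bool:
    """Return True if the given bit position is set in value."""
    return bool((value >> bit) & 1)


# Assumed current ODMDATA (not confirmed in this thread).
odmdata = 0xB8190000
patched = clear_bit(odmdata, 16)

print(is_bit_set(odmdata, 16))   # True
print(f"ODMDATA={patched:08X}")  # ODMDATA=B8180000
```

Clearing a bit that is already zero leaves the value unchanged, so running this against a config that already has B8180000 is harmless.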