Flashing Xavier NX after SBKPKC fusing fails during GPT writing

How about trying with Xavier NX DevKits? We’ve confirmed that fuse burning and image flashing work on them.

I do not have one of those on hand, due to supply issues. I’m told it was hard enough sourcing the Xavier NX modules themselves.

Hi Jerry,

When you say that you’ve confirmed fuse burning & flashing works on Xavier NX devkits, do you also mean that you’ve only tested the SD-Card version of the module? Is that the key difference why I’m seeing these failures?

I will be off on holiday leave after today, but if you find a resolution to this issue I’ll see the notification and can give it a try on my remaining available board.

Thanks,
/Johny

hello jmattsson,

fuse burning only works with production modules, i.e. those with internal eMMC.
please also check Topic 158361 as a see-also for the steps to fuse/flash the target. we’ve also tested with JetPack-4.6 and it returned success, thanks

Hi Jerry,

Okay, that rules out the SD-Card vs eMMC as a possible explanation then.

The instructions in the linked topic use the deprecated -c argument instead of --auth, and unless I’m mistaken, as written they are for when the board is already in PKC mode? For a blank module, should it not be --auth NS?

What size RSA key have you tested with? 2048 or 3072?
Another difference I see is that JTAG was left enabled on that thread, whereas we’ve disabled it. Not that I’d expect that to cause the failure I’m seeing, but it is a difference.

Thanks.

Hi Jerry,

I’ve sourced a Xavier-NX DevKit carrier board now and am testing with it. I’m using the previously fused production eMMC module, and the power supply that came with the dev kit. Attempting to flash gives the same error on the debug console. With debug level output enabled I see:

entering main console loop
[0189.974] I> tegrabl_gpio_driver_register: register 'nvidia,tegra194-gpio' driver
[0190.126] D> tegrabl_gpio_driver_init: tegra gpio driver:nvidia,tegra194-gpio registered successfully
[0190.139] D> Found gpio driver 'nvidia,tegra194-gpio' in list
[0190.140] I> tegrabl_gpio_driver_register: register 'nvidia,tegra194-gpio-aon' driver
[0190.148] D> tegrabl_gpio_driver_init: tegra gpio driver:nvidia,tegra194-gpio-aon registered successfully
[0190.157] I> tegrabl_tca9539_init: i2c bus: 1, slave addr: 0x46
[0190.167] W> fetch_driver_phandle_from_dt: failed to get node with compatible ti,tca9539
[0190.175] W> fetch_driver_phandle_from_dt: failed to get node with compatible nxp,tca9539
[0190.179] W> tegrabl_tca9539_init: failed to fetch phandle from dt
[0190.185] I> tegrabl_tca9539_init: i2c bus: 1, slave addr: 0x44
[0190.195] W> fetch_driver_phandle_from_dt: failed to get node with compatible ti,tca9539
[0190.203] W> fetch_driver_phandle_from_dt: failed to get node with compatible nxp,tca9539
[0190.207] W> tegrabl_tca9539_init: failed to fetch phandle from dt
[0190.213] D> regulator framework initialized
[0190.219] I> fixed regulator driver initialized
[0190.223] D> register 'vdd-ac-bat' regulator
[0190.226] D> register 'vdd-sdmmc1-sw' regulator
[0190.230] D> 0x13 0x32 0x0
[0190.233] D> register 'vdd-1v8-sd' regulator
[0190.237] D> register 'vdd-3v3-cvb' regulator
[0190.242] D> register 'vdd-1v8-cvb' regulator
[0190.246] D> register 'vdd-epb-1v0' regulator
[0190.250] D> register 'avdd-cam-2v8' regulator
[0190.254] D> 0x13 0x68 0x0
[0190.257] D> register 'vdd-fan' regulator
[0190.261] D> register 'vdd-hdmi-5v0' regulator
[0190.265] D> register 'vdd_sys_en' regulator
[0190.269] D> register 'vdd-1v8-aud2' regulator
[0190.273] D> 0xc8 0xb 0x1
[0190.276] I> CPU: Nvidia Carmel
[0190.278] I> CPU: MIDR: 0x4e0f0040, MPIDR: 0x80000000
[0190.283] I> chip revision : A02P
[0190.286] I> Boot-device: eMMC
[0190.289] I> Boot_device: SDMMC_BOOT instance: 3
[0190.294] D> Instance: 3
[0190.296] D> sdmmc init
[0190.298] D> tegrabl_ccplex_bpmp_wait_for_slave_ack: Got ack from slave
[0190.305] D> tegrabl_ccplex_bpmp_wait_for_slave_ack: Got ack from slave
[0190.696] D> sdmmc send command failed, error = f0f0706
[0190.696] D> tegrabl_ccplex_bpmp_wait_for_slave_ack: Got ack from slave
[0190.697] D> tegrabl_ccplex_bpmp_wait_for_slave_ack: Got ack from slave
[0190.697] D> tegrabl_ccplex_bpmp_wait_for_slave_ack: Got ack from slave
[0190.698] D> tegrabl_ccplex_bpmp_wait_for_slave_ack: Got ack from slave
[0190.702] D> tegrabl_ccplex_bpmp_wait_for_slave_ack: Got ack from slave
[0190.708] D> tegrabl_ccplex_bpmp_wait_for_slave_ack: Got ack from slave
[0190.788] I> sdmmc DDR50 mode
[0190.794] D> DDR Data width = 6,[0190.799] D> sdmmc DDR50 mode enabled
[0190.799] D> Init boot device
[0190.799] D> Init user device
[0190.800] I> sdmmc-3 params source = safe params
[0190.800] D> Qspi using gpc-dma
[0190.800] D> tegrabl_ccplex_bpmp_wait_for_slave_ack: Got ack from slave
[0190.801] D> tegrabl_ccplex_bpmp_wait_for_slave_ack: Got ack from slave
[0190.803] D> tegrabl_ccplex_bpmp_wait_for_slave_ack: Got ack from slave
[0190.810] D> tegrabl_ccplex_bpmp_wait_for_slave_ack: Got ack from slave
[0190.816] D> tegrabl_ccplex_bpmp_wait_for_slave_ack: Got ack from slave
[0190.823] I> QSPI source rate = 19200 Khz
[0190.826] I> Requested rate for QSPI clock = 19000 Khz
[0190.832] D> tegrabl_ccplex_bpmp_wait_for_slave_ack: Got ack from slave
[0190.838] I> BPMP-set rate for QSPI clk = 19200 Khz
[0190.843] D> tegrabl_ccplex_bpmp_wait_for_slave_ack: Got ack from slave
[0190.849] D> tegrabl_ccplex_bpmp_wait_for_slave_ack: Got ack from slave
[0190.856] D> tegrabl_ccplex_bpmp_wait_for_slave_ack: Got ack from slave
[0190.862] I> QSPI Flash Size = 32 MB
[0190.868] E> CR3V cmd failed, (err:0x0)
[0190.871] E> CR3V: Blank Check enable failed, (err:0x0)
[0190.874] I> Qspi initialized successfully
[0190.878] I> qspi flash-0 params source = safe params
[0190.883] D> Instance: 3
[0190.885] I> sdmmc bdev is already initialized
[0190.890] I> sdmmc-3 params source = safe params
[0190.894] D> Publishing device 00000003
[0190.902] D> Selected access_region = 1
[0190.904] D> GPT Signature check failed
[0190.909] D> Selected access_region = 1
[0190.914] D> Selected access_region = 2
[0190.916] D> GPT Signature check failed
[0190.916] D> Could not find GPT
[0190.919] W> Cannot find any partition table for 00000003
[0190.924] D> Failed to publish 00000003
[0190.928] D> Publishing device 00010003
[0190.936] D> Selected access_region = 0
[0190.942] D> Selected access_region = 0
[0190.945] D> GPT(primary) successfully read
[0190.945] D> 01] Name APP
[0190.946] D> Start sector: 40
[0190.948] D> Num sectors : 29360128
[0190.952] D> Size        : 15032385536
[0190.955] D> Ptype guid  : ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[0190.961] D> 02] Name kernel
[0190.964] D> Start sector: 29360168
[0190.967] D> Num sectors : 131072
[0190.970] D> Size        : 67108864
[0190.974] D> Ptype guid  : ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[0190.979] D> 03] Name kernel_b
[0190.982] D> Start sector: 29491240
[0190.986] D> Num sectors : 131072
[0190.989] D> Size        : 67108864
[0190.992] D> Ptype guid  : ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[0190.998] D> 04] Name kernel-dtb
[0191.001] D> Start sector: 29622312
[0191.004] D> Num sectors : 896
[0191.007] D> Size        : 458752
[0191.010] D> Ptype guid  : ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[0191.016] D> 05] Name kernel-dtb_b
[0191.019] D> Start sector: 29623208
[0191.023] D> Num sectors : 896
[0191.026] D> Size        : 458752
[0191.029] D> Ptype guid  : ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[0191.034] D> 06] Name recovery
[0191.037] D> Start sector: 29624104
[0191.041] D> Num sectors : 129024
[0191.044] D> Size        : 66060288
[0191.047] D> Ptype guid  : ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[0191.053] D> 07] Name recovery-dtb
[0191.056] D> Start sector: 29753128
[0191.060] D> Num sectors : 1024
[0191.062] D> Size        : 524288
[0191.066] D> Ptype guid  : ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[0191.071] D> 08] Name kernel-bootctrl
[0191.075] D> Start sector: 29754152
[0191.078] D> Num sectors : 512
[0191.081] D> Size        : 262144
[0191.084] D> Ptype guid  : ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[0191.090] D> 09] Name kernel-bootctrl_b
[0191.094] D> Start sector: 29754664
[0191.097] D> Num sectors : 512
[0191.100] D> Size        : 262144
[0191.103] D> Ptype guid  : ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[0191.109] D> 10] Name RECROOTFS
[0191.112] D> Start sector: 29755176
[0191.115] D> Num sectors : 614400
[0191.118] D> Size        : 314572800
[0191.122] D> Ptype guid  : ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[0191.127] D> 11] Name UDA
[0191.130] D> Start sector: 30369576
[0191.133] D> Num sectors : 407735
[0191.136] D> Size        : 208760320
[0191.140] D> Ptype guid  : ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
[0191.146] D> All valid entries are found
[0191.149] I> Found 11 partitions in SDMMC_USER (instance 3)
[0191.155] D> Publishing device 00030000
[0191.159] D> DMA channel 1 is busy
[0191.162] D> GPT Signature check failed
[0191.166] D> DMA channel 1 is busy
[0191.169] D> GPT Signature check failed
[0191.172] D> Could not find GPT
[0191.175] W> Cannot find any partition table for 00030000
[0191.181] D> Failed to publish 00030000
[0191.184] I> Recovery boot_type: 0
[0191.187] I> Entering 3p server
[0191.190] D> Transport interface is USB
[0191.194] I> USB configuration success
[0191.198] D> nv3p: Enable checksum verification
[0194.237] I> Populate storage info
[0194.246] I> Erasing device 3: 0
[0194.246] I> QSPI: Erasing entire device
[0197.249] I> Writing device 3: 0.
[0197.412] D> Publishing device 00030000
[0197.412] D> DMA channel 1 is busy
[0197.413] D> GPT Signature check failed
[0197.413] D> Could not find GPT
[0197.413] W> Cannot find any partition table for 00030000
[0197.413] D> Failed to publish 00030000
[0197.414] E> NV3P_SERVER: Failed to initialize partition table from GPT.

Is that “DMA channel 1 is busy” message any hint here?
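With a debug-level capture this long, it can help to strip it down to the warning and error lines plus the "DMA channel ... is busy" messages asked about above. The following is only a sketch: the function name filter_log is made up for illustration, and the sample input is a handful of lines copied from the log above rather than a full capture.

```shell
# filter_log (hypothetical helper): read a serial log on stdin and keep only
# W> / E> lines and the "DMA channel N is busy" debug messages.
filter_log() {
    grep -E '\] (W|E)> |DMA channel [0-9]+ is busy'
}

# Sample input: a few lines copied from the capture above.
filter_log <<'EOF'
[0197.249] I> Writing device 3: 0.
[0197.412] D> Publishing device 00030000
[0197.412] D> DMA channel 1 is busy
[0197.413] D> GPT Signature check failed
[0197.413] W> Cannot find any partition table for 00030000
[0197.414] E> NV3P_SERVER: Failed to initialize partition table from GPT.
EOF
```

For a real capture, the same function could be fed from the minicom log file instead of the inline sample.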

hello jmattsson,

please share the complete steps that you used for the fuse burning and flash process.
thanks

Same flash command as posted above.
I have not yet fused a module using this devkit. I have one module left, and would like the clarifications I requested earlier, as otherwise I would just repeat what I have already done. At this point I have no reason to expect anything would be different if I simply execute the same commands again.

hello jmattsson,

what’s the combination you used for the very first fuse burning/image flashing?
for example, did you have the Xavier NX SOM on the DevKit to burn the fuse? or do you have a customized carrier board to perform those steps?

Hi Jerry,

So far, this has all been with p3449 (Jetson Nano DevKit carrier) + p3668 (Jetson Xavier NX eMMC module). I will re-test once more using the p3509 (Jetson Xavier NX DevKit carrier) now that I have it.

Brand new Xavier NX eMMC module.
Xavier NX DevKit carrier board.
Xavier NX DevKit power supply.
Clean Linux_for_Tegra setup.
Same failure.

Overview of steps taken (same as in an earlier post above):

$ tar xf jetson_linux_r32.6.1_aarch64.tbz2
$ tar xf secureboot_r32.6.1_aarch64.tbz2
$ tar -C Linux_for_Tegra/rootfs/ -xf tegra_linux_sample-root-filesystem_r32.6.1_aarch64.tbz2
$ cd Linux_for_Tegra
$ # copy our keys into keys/ directory
$ mkdir rootfs/boot/extlinux
$ cp bootloader/extlinux.conf rootfs/boot/extlinux/
$ sudo ./flash.sh jetson-xavier-nx-devkit-emmc mmcblk0p1
$ sudo env BOARDID=3668 BOARDSKU=0001 BOARDREV=N/A FAB=100 FUSELEVEL=fuselevel_production ./nvmassfusegen.sh -i 0x19 --auth NS --disable-jtag -r 0x28 -k keys/pkc-sign-key.pem -S keys/sbk.1x128.key --KEK0 keys/kek0.1x128.key --KEK1 keys/kek1.1x128.key --KEK2 keys/kek2.1x128.key -p jetson-xavier-nx-devkit-emmc
$ tar xf mfuse_jetson-xavier-nx-devkit-emmc.tbz2
$ cd mfuse_jetson-xavier-nx-devkit-emmc/
$ sudo ./nvmfuse.sh
$ cd ..
$ sudo ./flash.sh -u keys/pkc-sign-key.pem -v keys/sbk.1x128.key -s keys/pkc-sign-key.pem -y SBKPKC jetson-xavier-nx-devkit-emmc mmcblk0p1

For your reference, I have linked the complete logs - both the terminal and the serial debug console (they were too large to include here directly). The PKC key used is a 3072-bit RSA key, generated as per the documentation.

I now have three boards which all are bricked in the same manner. Please advise what is wrong with the steps I’ve taken, or where I should send these boards so NVIDIA can investigate in detail why fusing breaks things. This issue is clearly 100% reproducible for me. I can share our keys if necessary to resolve this (we’ll generate new ones afterwards in that case).

Console log
Full log, start to finish, as captured by script(1):

Serial debug log
Full serial debug log, as captured by minicom(1):

hello jmattsson,

please check this similar discussion thread, Topic 200592.
this should be the device rebooting every 15 seconds, which causes the failure.

however, may I know the use case for burning this fuse? the watchdog is enabled by default and you don’t need to program the enable_watchdog fuse.
thanks
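One way to look for periodic reboots in a captured serial log is to watch for the bracketed bootloader timestamp jumping backwards. This is a rough sketch only: it assumes the timestamp restarts from a low value after a reset, which the thread does not confirm, and the function name find_resets and the sample lines are made up for illustration.

```shell
# find_resets (hypothetical helper): read a serial log on stdin and report
# every line whose leading [seconds.millis] timestamp is lower than the
# previous one, which would suggest the device restarted in between.
find_resets() {
    awk '
        match($0, /^\[[0-9]+\.[0-9]+\]/) {
            # Strip the surrounding brackets and convert to a number.
            t = substr($0, 2, RLENGTH - 2) + 0
            if (seen && t < prev)
                printf "reset after %.3f s: %s\n", prev, $0
            prev = t
            seen = 1
        }'
}

# Made-up sample: the timestamp drops between the second and third lines.
find_resets <<'EOF'
[0001.000] I> first boot
[0016.500] I> still running
[0000.900] I> second boot
EOF
```

If resets show up at a regular interval close to 15 seconds, that would be consistent with the watchdog explanation; an interval that varies widely would point elsewhere.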

Ah, now we’re getting somewhere!

If that fuse activates a watchdog which isn’t being patted, that would explain the odd failure mode. I’ll source yet another module then and burn without that fuse bit. May I suggest updating the DA-09876-001 document (Xavier NX Fuse Specification) to state that this watchdog is NOT supported by the software and DO NOT BURN?

The reason for wanting the watchdog enabled as early as possible after boot is for reliability. These devices will be located (very) remotely, and when over the air upgrades are pushed to them it is imperative that the device remains functional - either running the new version, or rolling back to the previous version. We are using an A/B scheme, so as long as the device reboots on failure the system should recover and remain functional. Having the watchdog enabled as early as possible in the boot process gives us the best reliability.

hi jmattsson,

as you can see in the other thread, Topic 200592,
we have arranged resources to check this internally. we will also share the details once we have conclusions.

besides,
we’ve verified the fuse process on the Xavier series; please also refer to Topic 117585 as a see-also. thanks

Hi Jerry,

Would you advise waiting before I fuse another board with -r 0x8, to see whether the boards done with -r 0x28 will become usable first?

hi jmattsson,

I’d suggest you don’t touch FUSE_RESERVED_SW[23:0] before we conclude the issue.
we’ve tested several devices with PKC+SBK and also KEKs, but we haven’t tested burning sw_reserved.
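To make the difference between the two -r values discussed above concrete, the bit positions can be listed with plain shell arithmetic. This is a sketch only: the helper name set_bits is made up, and the idea that the differing bit is the watchdog enable is an inference from the thread, not something the posts state outright.

```shell
# set_bits (hypothetical helper): print the positions of the bits set in a
# value, lowest bit first, separated by spaces.
set_bits() {
    value=$(( $1 ))
    bit=0
    out=""
    while [ "$value" -gt 0 ]; do
        if [ $(( value & 1 )) -eq 1 ]; then
            out="$out $bit"
        fi
        value=$(( value >> 1 ))
        bit=$(( bit + 1 ))
    done
    echo "${out# }"
}

echo "-r 0x28 sets bits: $(set_bits 0x28)"
echo "-r 0x8  sets bits: $(set_bits 0x8)"
printf 'bits that differ: 0x%X\n' $(( 0x28 ^ 0x8 ))
```

The two values differ only in bit 5 (0x20), so burning -r 0x8 would leave that one bit unprogrammed while keeping bit 3.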

Hi Jerry,

I saw your post on the linked thread about DisableWdtGlobally = 1;. I didn’t know whether it would be applicable to the Xavier NX as well, but gave it a go. I still get the same error even after adding that, so either DisableWdtGlobally is not applicable to the NX, or I’m looking at a different issue here.

I tried adding it to both Linux_for_Tegra/bootloader/t186ref/BCT/tegra194-mb1-soft-fuses-l4t.cfg and Linux_for_Tegra/bootloader/tegra194-mb1-soft-fuses-l4t.cfg.

hello jmattsson,

thanks for sharing the test results.
would you please modify the Xavier NX configuration file, p3668.conf.common?
please set bit 16 to zero; you may configure ODMDATA as… ODMDATA=B8180000 to test again.
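The bit-16 change can be checked with shell arithmetic before editing the file. This is a sketch under an assumption: the starting value 0xB8190000 is used here only as an example of a value with bit 16 set, so check the actual ODMDATA line in your own p3668.conf.common. The helper names clear_bit and bit_is_set are made up for illustration.

```shell
# clear_bit (hypothetical helper): print VALUE with bit N cleared,
# as 8 uppercase hex digits.
clear_bit() {
    printf '%08X\n' $(( $1 & ~(1 << $2) ))
}

# bit_is_set (hypothetical helper): succeed if bit N of VALUE is 1.
bit_is_set() {
    [ $(( ($1 >> $2) & 1 )) -eq 1 ]
}

# Assumed example starting value with bit 16 set; not taken from the thread.
start=0xB8190000

echo "ODMDATA=$(clear_bit "$start" 16)"

# Confirm that the value suggested in the thread has bit 16 clear.
if bit_is_set 0xB8180000 16; then
    echo "bit 16 is still set"
else
    echo "bit 16 is clear"
fi
```

Clearing bit 16 subtracts 0x10000 from a value that has it set, which is why only the fourth hex digit changes.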

Hi Jerry,

I get the same error even after changing ODMDATA to ODMDATA=B8180000 in p3668.conf.common.

hello jmattsson,

okay, thanks for testing.
could you please gather the complete logs, i.e. $ dmesg --follow
we need to check the details to determine whether the system reboot is caused by the WDT.