Xavier NX Dev Kit booting problem (most likely because of I2C bus)

Hi,
My Xavier NX dev kit does not boot most of the time after power off. On very rare occasions it boots successfully after power off. Comparing the logs of a successful boot and a failed boot, the difference appears right after it tries to read the EEPROM on the module (I> Reading eeprom i2c=0 address=0x50).
success1.txt (27.7 KB)
fail1.txt (26.8 KB)

I’ve tried flashing different JetPack versions using SDK Manager. The behavior is the same.

The failed boot attempt ends up at “bash-4.4#” prompt.

Once the dev kit boots up successfully it survives resets and boots again the next time. But once it has been disconnected from power, the chances that it will boot successfully are very slim.

On one occasion the dev kit booted successfully and I was able to check the EEPROM content (“i2cdump -f -y 0x50” and “i2cdump -f -y 0x57”) and verify the CRC8. The EEPROM data looks good.

I believe that the problem is with I2C bus.

Is there any way to reset I2C bus while in the “bash-4.4#” prompt or to proceed to normal boot?

hello nikolay.khatuntsev,

may I know the repro steps in detail? please also share the failure rate for reference,
thanks

Hello Jerry,
First I put a jumper on pins 9 and 10 to enable recovery mode and connect the Dev Kit to the PC with a USB cable. Then I power it up and use SDK Manager to flash JetPack (tried 4.5.1, 4.6, and 5.0.2). Flashing gets stuck roughly 50% of the time. If it does not get stuck, I proceed with the Ubuntu configuration on the Dev Kit. After the initial configuration the Dev Kit resets and boots up successfully. It can be reset multiple times, booting successfully each time. But once I shut down Ubuntu and disconnect all the cables, including power, the Dev Kit fails to boot the majority of the time. So far I’ve seen only one or two occasions where a fully disconnected Dev Kit booted successfully after power was applied, so the chance of success is maybe 2-5%.
Here is another reason why I believe it has something to do with the I2C bus. Once I got through the whole process of flashing JetPack and was able to boot the Dev Kit, I played with i2cdump and tried to access an I2C address that was not present, for example 0x55. i2cdump printed XX for all the data, so it was not able to read it. Then I tried a reset, which normally succeeds, but not this time, because I had disturbed the I2C bus. The board was not able to boot properly and stopped at the bash-4.4# prompt.

Is there anything I can do with the I2C bus from the bash-4.4# prompt?

hello nikolay.khatuntsev,

just for double confirmation, may I also know the power-supply (DevKit uses 19V) you’re using?

I’m using the power supply that came with the Dev Kit. It is a LiteOn 19.0V, 2.37A unit.

hello nikolay.khatuntsev,

the boot flow is… cold-boot → mb1 → mb2 → cboot → kernel
we need to check which stage causes this failure. do you have a serial console connected? please share the bootloader logs from power-on.

BTW,
can you work around this by pressing the reset key to force a reboot of the target?

Jerry, the serial console log files are in the first message (success1.txt - Dev kit booted into Ubuntu GUI, fail1.txt - Dev Kit stuck at “bash-4.4#” prompt).
No, reset does not change the behavior. If the Dev Kit booted successfully, a reset leads to another successful boot. If the Dev Kit failed to boot, a reset will not help.

hello nikolay.khatuntsev,

here shows the failures,

[0011.755] E> Error 892665857: Failed to get CVM EEPROM contents
[0011.761] E> Booting w/o MAC ddresses for WIFI, Bluetooth & Ethernet
[0011.767] E> Failed to get WIFI MAC address
...

you may refer to Jetson EEPROM Layout.
cboot reports errors when getting the eeprom contents, so the boot process is incomplete and it stops at the bash-4.4# prompt.

may I know the complete device info,
are you using a Xavier NX developer kit? or have you plugged the Xavier NX SOM into another carrier board?

Hi Jerry,
You are right, the problem is with reading EEPROM. Before this error in the log there is actually an error with I2C reading.
Here is the successful boot log:

[0001.459] I> Find /i2c@3160000’s alias i2c0
[0001.459] I> Reading eeprom i2c=0 address=0x50
[0001.485] I> Device at /i2c@3160000:0x50
[0001.486] I> Reading eeprom i2c=0 address=0x57
[0001.510] I> Device at /i2c@3160000:0x57
[0001.512] I> Find /i2c@c240000’s alias i2c1
[0001.512] I> Reading eeprom i2c=1 address=0x50

and here is failed boot log:

[0001.455] I> Find /i2c@3160000’s alias i2c0
[0001.455] I> Reading eeprom i2c=0 address=0x50
[0001.460] E> I2C: slave not found in slaves.
[0001.463] E> I2C: Could not write 0 bytes to slave: 0x00a0 with repeat start true.
[0001.471] E> I2C_DEV: Failed to send register address 0x00000000.
[0001.476] E> I2C_DEV: Could not read 256 registers of size 1 from slave 0xa0 at 0x00000000 via instance 0.
[0001.486] E> eeprom: Failed to read I2C slave device
[0001.491] I> Eeprom read failed 0x3526070d
[0001.495] I> Reading eeprom i2c=0 address=0x57
[0001.917] E> I2C: Timeout while polling for RX Fifo full. Last value 0x00800000.
[0001.918] E> I2C: Could not read 256 bytes from slave: 0x00ae with repeat start false.
[0001.919] E> I2C_DEV: Could not read data of size 256 at register address 0x00000000.
[0001.920] E> I2C_DEV: Could not read 256 registers of size 1 from slave 0xae at 0x00000000 via instance 0.
[0001.925] E> eeprom: Failed to read I2C slave device
[0001.930] I> Eeprom read failed 0x35260606
[0001.935] I> Find /i2c@c240000’s alias i2c1
[0001.938] I> Reading eeprom i2c=1 address=0x50
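
To compare logs like these without reading them line by line, the error lines can be pulled out with a short script. This is only a sketch: the regular expressions are written against the log lines quoted above, and the function name parse_errors is made up for illustration. It also assumes that the slave values 0xa0 and 0xae in the errors are the 8-bit bus form of the 7-bit addresses 0x50 and 0x57 (shifted left by one bit), which matches the two EEPROM addresses being read.

```python
import re

# Matches cboot lines such as "[0001.460] E> I2C: slave not found in slaves."
LINE_RE = re.compile(r"^\[(\d+\.\d+)\]\s+([A-Z])>\s+(.*)$")
# Matches "slave: 0x00a0" and "slave 0xa0" inside an error message.
SLAVE_RE = re.compile(r"slave:?\s+(0x[0-9a-fA-F]+)")


def parse_errors(log_text):
    """Return (timestamp, 7-bit address or None, message) for each E> line."""
    errors = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match or match.group(2) != "E":
            continue
        timestamp, _, message = match.groups()
        slave = SLAVE_RE.search(message)
        # Assumption: the logged value is the 8-bit form, so shift right by 1.
        addr7 = int(slave.group(1), 16) >> 1 if slave else None
        errors.append((float(timestamp), addr7, message))
    return errors


SAMPLE = """\
[0001.455] I> Reading eeprom i2c=0 address=0x50
[0001.460] E> I2C: slave not found in slaves.
[0001.463] E> I2C: Could not write 0 bytes to slave: 0x00a0 with repeat start true.
[0001.918] E> I2C: Could not read 256 bytes from slave: 0x00ae with repeat start false.
"""

if __name__ == "__main__":
    for timestamp, addr7, message in parse_errors(SAMPLE):
        addr = "n/a" if addr7 is None else hex(addr7)
        print(f"{timestamp:9.3f}  addr={addr}  {message}")
```

Running it on fail1.txt and success1.txt would show at a glance which EEPROM address each failure belongs to.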

When the Dev Kit booted successfully I was able to capture the EEPROM content at addresses 0x50 and 0x57. Both look normal and have a correct CRC8. Here are the binaries of the EEPROM content:
eeprom_50.bin (256 Bytes)
eeprom_57.bin (256 Bytes)
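
For anyone who wants to repeat the CRC8 check on dumps like these, a minimal sketch follows. It assumes the layout in which byte 255 of the 256-byte image holds a CRC-8 over bytes 0 through 254. The CRC variant used here (Dallas/Maxim, reflected polynomial 0x8C, zero initial value) is an assumption, not something stated in this thread, so check it against the Jetson EEPROM Layout document before trusting the result. The function names are made up for illustration.

```python
def crc8_maxim(data: bytes) -> int:
    """CRC-8/MAXIM: reflected polynomial 0x8C, init 0x00, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Process one bit, least significant bit first.
            if crc & 1:
                crc = (crc >> 1) ^ 0x8C
            else:
                crc >>= 1
    return crc


def eeprom_crc_ok(image: bytes) -> bool:
    """Check a 256-byte image whose last byte is the CRC of the first 255."""
    if len(image) != 256:
        raise ValueError("expected a 256-byte EEPROM image")
    return crc8_maxim(image[:255]) == image[255]


if __name__ == "__main__":
    import sys

    # Usage: python check_eeprom.py eeprom_50.bin eeprom_57.bin
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            print(path, "CRC OK" if eeprom_crc_ok(handle.read()) else "CRC MISMATCH")
```

If the check fails on an image that cboot accepts, the assumed CRC variant is the first thing to question.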

I’m using Xavier NX Dev Kit.

hello nikolay.khatuntsev,

since I’ve never experienced this on DevKits, we need to double-confirm whether it’s a software or hardware issue.
is cross-validation possible? you may try swapping SOMs for confirmation.
in addition, once the Dev Kit is stuck at the “bash-4.4#” prompt, are you able to recover the target by another cold-boot, or even by re-flashing the target?

Unfortunately I don’t have any other Xavier NX module to replace it.

Resets do not help when the Dev Kit is stuck at the “bash-4.4#” prompt. Neither do power cycles.
Re-flashing helps about 50% of the time, but the fix only lasts until the next power cycle.

Hi,

Want to clarify this issue. Are you saying that “sometimes” the board is not able to read your eeprom?

If this module cannot even read its EEPROM normally with JetPack on the devkit carrier board, then I would suggest an RMA.

The issue here is that we have a plugin-manager that determines which storage device to boot from.

For example, if the NX-sd module is connected, then sdmmc1 will be enabled, while if the NX-emmc module is in use, sdmmc4 will be enabled.
And in both cases, mmc0 will be either your sd or emmc.

However, since the CVM EEPROM content is unavailable, the bootloader can no longer tell which case applies.

We can “hack” this in the device tree, of course. But this does not bring the other functionality back; the board is only “partially” back to life. I would still suggest an RMA of the module if this is a hardware defect.

Hi @WayneWWW,
Yes, it is not able to read the EEPROM most of the time, but there are rare occasions when it is able to read it. I believe it is a HW issue related to the I2C bus.
I’ll try to RMA this Dev Kit.
But out of curiosity, what would be the “hack” steps to force it to boot from SD card?

You need to modify the device tree manually.

In the source code of the device tree -
hardware/nvidia/platform/t19x/jakku/kernel-dts/common/tegra194-plugin-manager-p3668.dtsi

It contains this logic:

		fragement-tegra-sdhci-sd-dis {
			ids = ">=3668-0001-000";
			override@0 {
				target = <&sdhci_sd>;
				_overlay_ {
					status = "disabled";
				};
			};
		};

		fragement-tegra-sdhci-emmc-dis {
			ids = ">=3668-0000-000";
			override@0 {
				target = <&sdhci_emmc>;
				_overlay_ {
					status = "disabled";
				};
			};
		};

As you can see, we use the board ID to tell which node should be enabled/disabled. For the sdcard module, we enable &sdhci_sd. And for the emmc module, it would be &sdhci_emmc.

The problem here is that the device no longer provides the ids because the eeprom read fails. Thus, the behavior would be unexpected.
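
To illustrate why a missing ID leaves both nodes in an undefined state, here is a rough model of the matching in Python. It is not the plugin-manager implementation. It assumes that an `ids` pattern starting with ">=" requires the same board and SKU fields and a FAB field (the last group) that is greater than or equal to the one in the pattern. The names id_matches, FRAGMENTS and disabled_nodes are hypothetical.

```python
def id_matches(pattern, board_id):
    """Return True if board_id satisfies the pattern (assumed semantics)."""
    if board_id is None:
        # EEPROM read failed: there is no ID, so no fragment can match.
        return False
    if pattern.startswith(">="):
        want_prefix, want_fab = pattern[2:].rsplit("-", 1)
        prefix, fab = board_id.rsplit("-", 1)
        # Assumption: ">=" compares only the trailing FAB field.
        return prefix == want_prefix and int(fab) >= int(want_fab)
    return pattern == board_id


# The two fragments quoted above: node that gets disabled -> ids pattern.
FRAGMENTS = {
    "sdhci_sd": ">=3668-0001-000",
    "sdhci_emmc": ">=3668-0000-000",
}


def disabled_nodes(board_id):
    """List the nodes the two fragments would disable for this board ID."""
    return sorted(node for node, pattern in FRAGMENTS.items()
                  if id_matches(pattern, board_id))


if __name__ == "__main__":
    # The FAB value 100 is an arbitrary example, not a real board revision.
    for board_id in ("3668-0000-100", "3668-0001-100", None):
        print(board_id, "->", disabled_nodes(board_id))
```

Under these assumptions the sdcard module (3668-0000) ends up with sdhci_emmc disabled, the emmc module (3668-0001) with sdhci_sd disabled, and a board with no readable ID has neither override applied.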
