Cannot reflash AGX device - incorrect EEPROM values?

I’m using the AGX Developer Kit and installed a data acquisition PCIe card. When I ran the card’s config program, the Ethernet card started acting strange and I saw some I2C errors (tegra-i2c 3160000.i2c no acknowledgement) for addresses 0x50 through 0x57. I believe the config program inadvertently wrote to the AGX EEPROM. I tried to re-flash the AGX using the SDK Manager, but got the error “Parsing board ID (calculated 0xe9 != stored 0xb7)”.

Looking at the “Jetson Module EEPROM Layout” page online I saw several suspicious things. Firstly, the page says that the EEPROM is on I2C bus 2, but when I ran i2cdump -y -f 2 0x50 I saw XX values for every address and a flood of “tegra-i2c 3180000.i2c no acknowledgement” errors. When I tried it on bus 0 using i2cdump -y -f 0 0x50 I saw what looked to be the EEPROM values:

     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00: 00 00 ff 00 48 0b 04 00 04 4c 00 00 00 00 00 00
10: 00 00 00 00 36 39 39 2d 38 32 38 38 38 2d 30 30
20: 30 34 2d 34 30 30 20 4c 2e 30 00 00 00 00 00 00
30: 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff
40: ff ff ff ff f8 9d 3a 2d b0 48 31 35 36 30 35 32
50: 31 30 30 38 33 35 39 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 4e 56 43 42 1c 00 4d 31 00 00
a0: ff ff ff ff ff ff ff ff ff ff ff ff f8 9d 3a 2d
b0: b0 48 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 e9

I found 2 problems:

  1. Address 0x00 is 0x00 but should be 0x01
  2. Address 0x02 is 0xFF but should be 0xFC
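
For context, both bytes sit in the EEPROM header. The sketch below decodes the first eight bytes of the dump, assuming the header is laid out as major version, minor version, little-endian data length, board number, and SKU. That layout is an assumption to check against the layout page, and `decode_header` is a made-up helper name, not part of any NVIDIA tool.

```python
import struct


def decode_header(first_bytes):
    """Decode the first 8 EEPROM bytes (assumed layout, little-endian)."""
    # B = 1-byte field, H = 2-byte little-endian field
    major, minor, length, board, sku = struct.unpack_from("<BBHHH", first_bytes, 0)
    return {"version": (major, minor), "length": length, "board": board, "sku": sku}


# First 8 bytes of row 00 before and after the i2cset fixes described here
corrupted = bytes.fromhex("0000ff00480b0400")
fixed = bytes.fromhex("0100fc00480b0400")

print(decode_header(corrupted))
print(decode_header(fixed))
```

Under that assumed layout, bytes 4-5 (48 0b) decode to board number 2888, which agrees with the part number string further down in the dump.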

I used i2cset to fix both of these, and verified that the new CRC is, in fact, 0xe9 as was already set in address 0xFF. Here is the updated output from i2cdump -y -f 0 0x50:

     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00: 01 00 fc 00 48 0b 04 00 04 4c 00 00 00 00 00 00
10: 00 00 00 00 36 39 39 2d 38 32 38 38 38 2d 30 30
20: 30 34 2d 34 30 30 20 4c 2e 30 00 00 00 00 00 00
30: 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff
40: ff ff ff ff f8 9d 3a 2d b0 48 31 35 36 30 35 32
50: 31 30 30 38 33 35 39 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 4e 56 43 42 1c 00 4d 31 00 00
a0: ff ff ff ff ff ff ff ff ff ff ff ff f8 9d 3a 2d
b0: b0 48 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 e9
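
A check like this can be scripted rather than done by hand. The sketch below assumes the byte at 0xFF is a CRC-8 of bytes 0x00 through 0xFE using the Dallas/Maxim variant (polynomial 0x31, reflected, initial value 0). The variant is an assumption and is not confirmed anywhere in this thread; `crc8_maxim` and `check_eeprom` are illustrative names.

```python
def crc8_maxim(data):
    """CRC-8 Dallas/Maxim: reflected polynomial 0x8C, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; fold in the reflected polynomial when a 1 bit falls out
            crc = (crc >> 1) ^ 0x8C if crc & 1 else crc >> 1
    return crc


def check_eeprom(image):
    """Return (calculated, stored) CRC for a full 256-byte EEPROM image."""
    if len(image) != 256:
        raise ValueError("expected 256 bytes, got %d" % len(image))
    # Assumption: the CRC covers bytes 0x00..0xFE and is stored at 0xFF
    return crc8_maxim(image[:255]), image[255]


if __name__ == "__main__":
    calculated, stored = check_eeprom(bytes(256))  # placeholder all-zero image
    print(hex(calculated), hex(stored))
```

If the two returned values differ for a real dump, either the data is still corrupted or the CRC variant assumed here is not the one the flash tools use.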

However, I still cannot flash the device and get the same Parsing error. My questions:

  1. Is the EEPROM for the AGX really on I2C bus 0? If not, how would I change it to the correct bus?
  2. Are all of the values in my i2cdump listing above correct for the AGX?
  3. What I2C bus and address(es) is the SDK Manager looking at to parse the board ID, version, SKU, etc?
  4. Which I2C bus should the PCIe (C5) use? I’ll see if the card manufacturer can change their software.

I don’t have the board, so I am not able to give you the correct I2C bus mapping for it.

But you can try dumping address 0x50 on each I2C bus and checking the ASCII values in the right-hand column. Look for the keyword 2888. If it is there, that is the module EEPROM.

If it is something like 2822, then it is the carrier board EEPROM, which is not used during flash.

https://docs.nvidia.com/jetson/l4t/index.html#page/Tegra%20Linux%20Driver%20Package%20Development%20Guide/jetson_eeprom_layout.html

Thanks, WayneWWW.

I ran i2cdump -y -f [bus] 0x50 for busses 0 - 8 and here are the results:

0: Looks like the EEPROM. Bytes 20-49 show a part number of “699-82888-0004-400 L.0”
1: All values are XX
2: All values are XX
3: All values are XX (but it runs very slowly)
4: All values are XX
5: All values are XX (but it runs very slowly)
6: All values are XX (but it runs very slowly)
7: All values are XX
8: All values are XX
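
The per-bus check can also be done on saved i2cdump output. This is a minimal sketch that parses the text, reads the part number from bytes 20-49, and applies the 2888/2822 keyword rule suggested above. The function names are hypothetical, and treating any XX row as an error is a choice made for this example.

```python
def parse_i2cdump(text):
    """Return the data bytes from i2cdump output (rows look like 'NN: b0 ... bf')."""
    data = bytearray()
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        head = head.strip()
        if not sep or len(head) != 2:
            continue  # column header or message line, not a data row
        try:
            int(head, 16)
        except ValueError:
            continue
        tokens = rest.split()[:16]  # ignore the optional ASCII column
        if "XX" in tokens:
            raise ValueError("row %s is unreadable (XX)" % head)
        data.extend(int(tok, 16) for tok in tokens)
    return bytes(data)


def part_number(image):
    """ASCII part number from bytes 20-49, cut at the first NUL byte."""
    return image[20:50].split(b"\x00", 1)[0].decode("ascii", errors="replace")


def classify(image):
    """Apply the keyword rule from this thread: 2888 = module, 2822 = carrier."""
    pn = part_number(image)
    if "2888" in pn:
        return "module"
    if "2822" in pn:
        return "carrier"
    return "unknown"


# First three rows of the bus 0, address 0x50 dump from this thread
SAMPLE = """\
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00: 00 00 ff 00 48 0b 04 00 04 4c 00 00 00 00 00 00
10: 00 00 00 00 36 39 39 2d 38 32 38 38 38 2d 30 30
20: 30 34 2d 34 30 30 20 4c 2e 30 00 00 00 00 00 00
"""

image = parse_i2cdump(SAMPLE)
print(part_number(image), classify(image))
```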

Oh, sorry, I made a mistake. 0x50 should be the module EEPROM and 0x57 should be the carrier board EEPROM.

Please also check 0x57 on your I2C bus 0; it should give you the p2822 carrier board EEPROM.

First off, I very much appreciate the quick response! I’m in real trouble here and this project is for Boeing Satellite.

Hmmm, nothing on 0x57, but running i2cdetect -y -r 0 shows:

     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:          -- -- -- -- -- -- -- -- -- -- -- -- -- 
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
50: 50 -- -- -- -- -- 56 -- -- -- -- -- -- -- -- -- 
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
70: -- -- -- -- -- -- -- --

i2cdump -y -f 0 0x56 outputs:

No size specified (using byte-data access)
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f    0123456789abcdef
00: 00 00 ff 00 06 0b 00 00 07 4b 00 00 00 00 00 00    ....??..?K......
10: 00 01 d8 1a 36 39 39 2d 38 32 38 32 32 2d 30 30    .???699-82822-00
20: 30 30 2d 37 30 30 20 4b 2e 30 00 00 00 00 00 00    00-700 K.0......
30: 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ................
40: ff ff ff ff ff ff ff ff ff ff 31 35 36 30 35 32    ..........156052
50: 31 30 30 36 38 37 32 00 00 00 00 00 00 00 00 00    1006872.........
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
90: 00 00 00 00 00 00 46 46 46 46 ff ff 46 46 ff ff    ......FFFF..FF..
a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ................
b0: ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 b7    ...............?

So, this is the EEPROM on the carrier board? Should these values also follow the guidelines in the “Jetson Module EEPROM Layout” document?

Yes, it should follow the layout document. But the weird part is this should not matter during flash.

My only guess is: are you trying to flash JetPack 5 rather than JetPack 4.x?

I’ve tried both JP 4.6 and JP 4.6.1

Could you share your sdkmanager or flash.sh failure log?

Yes, but give me a little time. The Carrier Board EEPROM values are a mess. I’m going to try to straighten them out and re-flash. If that fails I’ll provide the error logs. Thanks again.

What really worries me is that the “carrier board” EEPROM should not be checked during flash at all.

It is not a mandatory part of the design. Most custom boards from other vendors don’t even have such an EEPROM.

I believe this is because I’m flashing to the NVMe on the carrier board? I’ve updated the Carrier Board’s EEPROM values and the system is currently flashing. It used to fail at 2% with the “parsing” errors, but now it’s at 38%. Fingers crossed!

Flashing finished! The ethernet card still shows some errors in dmesg, but at least I have a working base system again and can troubleshoot from there.

Thank you very much for the fast support!

Hi @chris.brahmer ,

No matter what is being flashed, carrier board eeprom should not be read.
If this is reproducible on devkit, please let me know how you reproduce this issue.

Hi @WayneWWW,

The DAQ PCIe card manufacturer confirmed that they had some offending I2C code in their configuration program, so that explains the errant EEPROM values on both the Module and Carrier Board. I’m not sure if there’s a way to find out if anything else was corrupted. We have 3 AGX Dev Kits and, unfortunately, I ran the config program on all of them, so I no longer have a clean baseline. :/

Before I fixed any of the values in either EEPROM, the flashing would fail at 2% complaining of errors “Parsing board ID” or “version” or “SKU”. In each case the error message contained a calculated CRC-8 value that differed from the stored value. Sometimes the stored value matched what was in the Module’s EEPROM CRC value (address 0xFF) and other times it would match the Carrier Board’s EEPROM CRC value.
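
Matching the stored value in an error against each EEPROM's byte at 0xFF can be scripted. The sketch below is only illustrative: the message pattern is taken from the single error quoted at the top of this thread, the CRC bytes come from the two dumps above, and the helper names are made up.

```python
import re

# Pattern based only on the one message quoted in this thread
CRC_PATTERN = re.compile(r"calculated (0x[0-9a-fA-F]+) != stored (0x[0-9a-fA-F]+)")


def parse_crc_error(message):
    """Return (calculated, stored) as ints, or None if the pattern is absent."""
    match = CRC_PATTERN.search(message)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


def match_eeprom(message, crc_bytes):
    """List the EEPROMs whose byte at 0xFF equals the stored CRC in the message."""
    parsed = parse_crc_error(message)
    if parsed is None:
        return []
    _, stored = parsed
    return [name for name, crc in crc_bytes.items() if crc == stored]


# Byte 0xFF from the two dumps posted in this thread
crc_bytes = {"module (0x50)": 0xE9, "carrier (0x56)": 0xB7}
message = "Parsing board ID (calculated 0xe9 != stored 0xb7)"
print(match_eeprom(message, crc_bytes))
```

For the message quoted in the first post, the stored value 0xb7 is the last byte of the 0x56 dump, not of the 0x50 dump.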

First, I corrected the values in the Module’s EEPROM, thinking that the Carrier Board is not a part of the flashing process. After that, the flashing failed again, but the only errors were CRC-8 mismatches against the Carrier Board’s stored value. So it seemed that the Module was good. I then fixed the Carrier Board’s values and the flash succeeded.

Again, since JP 4.6 I’ve been flashing directly to the NVMe mounted on the Dev Kit Carrier Board, so perhaps this involves reading the Carrier Board’s EEPROM? Just a guess.

Thank you for the great support.

I have to say that since you haven’t actually shared any log, I don’t know what to say.

If you don’t want to pursue this issue any further, that is fine.

Oh, I think you might have misunderstood me. That was a sincere “thank you” for helping me get my AGX back online! Once I try the flash on my other corrupted AGXs I’ll provide the flash logs. Sorry for any confusion.

Sure. No problem here. You can file a new topic for that, since this thread will be closed by the forum system soon.
