I’m using the AGX Developer Kit and installed a data acquisition (DAQ) PCIe card. When I ran the card’s config program, the Ethernet card started acting strangely and I saw I2C errors (tegra-i2c 3160000.i2c no acknowledgement) for addresses 0x50 through 0x57. I believe the config program inadvertently wrote to the AGX EEPROM. I tried to re-flash the AGX using SDK Manager, but got the error “Parsing board ID (calculated 0xe9 != stored 0xb7)”.
Looking at the “Jetson Module EEPROM Layout” page online, I saw several suspicious things. First, the page says that the EEPROM is on I2C bus 2, but when I ran i2cdump -y -f 2 0x50 I saw XX for every address and a flood of “tegra-i2c 3180000.i2c no acknowledgement” errors. When I tried bus 0 with i2cdump -y -f 0 0x50, I saw what looked like the EEPROM values:
I used i2cset to fix both of these and verified that the new CRC is, in fact, 0xe9, the value already stored at address 0xFF. Here is the updated output from i2cdump -y -f 0 0x50:
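A quick way to double-check a repaired dump before flashing is to recompute the checksum yourself. The sketch below assumes that byte 0xFF holds a CRC-8 over bytes 0x00 through 0xFE, and that the variant is the Dallas/Maxim CRC-8 (reflected polynomial 0x8C, initial value 0). Neither detail is confirmed in this thread, so compare the result against the “calculated” value the flash tool prints before relying on it. The function names are hypothetical.

```python
def crc8_maxim(data: bytes) -> int:
    """Bitwise CRC-8 with reflected polynomial 0x8C and initial value 0.

    ASSUMPTION: this is the variant used for the Jetson EEPROM checksum.
    """
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; fold in the polynomial when the low bit drops out.
            if crc & 0x01:
                crc = (crc >> 1) ^ 0x8C
            else:
                crc >>= 1
    return crc


def eeprom_crc_ok(image: bytes) -> bool:
    """Compare the CRC of bytes 0x00-0xFE with the value stored at 0xFF.

    ASSUMPTION: 256-byte image with the checksum in the last byte.
    """
    if len(image) != 256:
        raise ValueError("expected a 256-byte EEPROM image")
    return crc8_maxim(image[:0xFF]) == image[0xFF]
```

To use it, turn the 256 values from i2cdump -y -f 0 0x50 into a bytes object and call eeprom_crc_ok on it. If crc8_maxim does not reproduce the value the flash tool reports as “calculated” for the same contents, the assumed CRC variant is wrong for this board.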
I don’t have the board, so I’m not able to give you the correct I2C bus mapping for it.
But you can try dumping address 0x50 on each I2C bus and checking the ASCII column on the right. Look for a keyword such as 2888; if it is there, that is the module EEPROM.
If it is something like 2822, then it is the carrier board EEPROM, which is not used during flashing.
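That check can also be scripted. A rough sketch, assuming the usual i2cdump table layout (a two-digit row label such as 00:, then 16 hex cells, with XX for cells that did not acknowledge): it rebuilds the 256 bytes and searches them instead of the ASCII column, so a keyword split across two rows is still found. The 2888 and 2822 keywords come from the reply above; parse_i2cdump and classify_dump are hypothetical names.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def parse_i2cdump(text: str) -> list:
    """Return the cells of an i2cdump byte table in order; None marks an XX cell."""
    cells = []
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        label = label.strip()
        # Data rows start with a two-digit hex label such as "00" or "f0".
        if not sep or len(label) != 2 or not set(label) <= HEX_DIGITS:
            continue
        for tok in rest.split()[:16]:  # the first 16 tokens are the hex cells
            cells.append(None if tok.upper() == "XX" else int(tok, 16))
    return cells


def classify_dump(text: str) -> str:
    """Guess what sits at 0x50 from the keywords suggested in this thread."""
    cells = parse_i2cdump(text)
    if not cells or all(c is None for c in cells):
        return "no device (all XX)"
    raw = bytes(0 if c is None else c for c in cells)
    if b"2888" in raw:
        return "module EEPROM"
    if b"2822" in raw:
        return "carrier board EEPROM"
    return "unknown"
```

Searching the raw bytes is a deliberate choice: the ASCII column wraps every 16 bytes, so a part number like the one at bytes 20-49 can straddle two rows.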
I ran i2cdump -y -f [bus] 0x50 for buses 0 through 8, and here are the results:
0: Looks like the EEPROM. Bytes 20-49 show a part number of “699-82888-0004-400 L.0”
1: All values are XX
2: All values are XX
3: All values are XX (but it runs very slowly)
4: All values are XX
5: All values are XX (but it runs very slowly)
6: All values are XX (but it runs very slowly)
7: All values are XX
8: All values are XX
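For the other AGX units, a loop like the following could save some typing. It is only a sketch: it assumes i2c-tools is installed and the script runs as root, and it only calls i2cdump, which reads from the device. A bus is reported as “all XX” when no cell acknowledged, matching the results above. scan_buses and all_xx are hypothetical helper names; the run parameter exists only so the function can be exercised without hardware.

```python
import subprocess


def all_xx(dump_text: str) -> bool:
    """True when every data cell in an i2cdump table is XX (no ACK)."""
    cells = []
    for line in dump_text.splitlines():
        label, sep, rest = line.partition(":")
        if sep and len(label.strip()) == 2:
            cells.extend(rest.split()[:16])  # ignore the ASCII column
    return bool(cells) and all(c.upper() == "XX" for c in cells)


def scan_buses(buses=range(9), chip=0x50, run=subprocess.run) -> dict:
    """Run `i2cdump -y -f BUS CHIP` on each bus and summarise the result."""
    results = {}
    for bus in buses:
        proc = run(["i2cdump", "-y", "-f", str(bus), hex(chip)],
                   capture_output=True, text=True)
        if all_xx(proc.stdout):
            results[bus] = "all XX"
        elif proc.returncode != 0:
            results[bus] = "i2cdump failed"
        else:
            results[bus] = "data"
    return results
```

On the devkit, calling scan_buses() returns a dict keyed by bus number; any bus marked “data” is worth dumping in full and inspecting by hand.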
Yes, but give me a little time. The Carrier Board EEPROM values are a mess. I’m going to try to straighten them out and re-flash. If that fails I’ll provide the error logs. Thanks again.
Could this be because I’m flashing to the NVMe on the carrier board? I’ve updated the Carrier Board’s EEPROM values and the system is flashing now. It used to fail at 2% with the “Parsing” errors, but now it’s at 38%. Fingers crossed!
Flashing finished! The ethernet card still shows some errors in dmesg, but at least I have a working base system again and can troubleshoot from there.
No matter what is being flashed, the carrier board EEPROM should not be read.
If this is reproducible on the devkit, please let me know how to reproduce the issue.
The DAQ PCIe card manufacturer confirmed that they had some offending I2C code in their configuration program, so that explains the errant EEPROM values on both the Module and Carrier Board. I’m not sure if there’s a way to find out if anything else was corrupted. We have 3 AGX Dev Kits and, unfortunately, I ran the config program on all of them, so I no longer have a clean baseline. :/
Before I fixed any of the values in either EEPROM, flashing would fail at 2% with “Parsing board ID”, “version”, or “SKU” errors. In each case the error message contained a calculated CRC-8 value that differed from the stored value. Sometimes the stored value matched the Module EEPROM’s CRC value (address 0xFF), and other times it matched the Carrier Board EEPROM’s CRC value.
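The matching described above (comparing the “stored” value in each error against byte 0xFF of each EEPROM) can be written as a small helper. This is a hypothetical sketch that only parses the message format quoted in this thread; other errors in the real flash logs may be worded differently.

```python
import re

# Matches the pattern quoted in this thread, e.g.
# "Parsing board ID (calculated 0xe9 != stored 0xb7)"
CRC_MISMATCH = re.compile(
    r"calculated\s+(0x[0-9a-fA-F]+)\s*!=\s*stored\s+(0x[0-9a-fA-F]+)")


def which_eeprom(log_line: str, module_crc: int, carrier_crc: int) -> str:
    """Attribute a CRC mismatch to an EEPROM by matching the stored value
    against byte 0xFF of the module and carrier board EEPROMs."""
    m = CRC_MISMATCH.search(log_line)
    if m is None:
        return "no CRC mismatch in line"
    stored = int(m.group(2), 16)
    candidates = (("module", module_crc), ("carrier board", carrier_crc))
    hits = [name for name, crc in candidates if crc == stored]
    if len(hits) == 1:
        return hits[0]
    # Both match (cannot tell them apart) or neither matches.
    return "ambiguous" if hits else "unknown"
```

Here module_crc and carrier_crc would be read from byte 0xFF of each i2cdump; any numbers passed in a trial call are placeholders, not values from this board.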
First, I corrected the values in the Module’s EEPROM, thinking that the Carrier Board is not part of the flashing process. After that, flashing still failed, but the only remaining errors were CRC-8 mismatches against the Carrier Board’s value. So it seemed that the Module was good. I then fixed the Carrier Board’s values, and the flash succeeded.
Again, since JP 4.6 I’ve been flashing directly to the NVMe mounted on the Dev Kit Carrier Board, so perhaps this involves reading the Carrier Board’s EEPROM? Just a guess.
Oh, I think you might have misunderstood me. That was a sincere “thank you” for helping me get my AGX back online! Once I try the flash on my other corrupted AGXs, I’ll provide the flash logs. Sorry for any confusion.