Flashing issue with Xavier AGX modules

Hi all,

A customer has reported a number of failures while flashing their Xavier AGX modules (the 64 GB variant) on their production line. The failures seem to be associated with modules that use Hynix DRAM (the 699-level part number is -402). We flash the modules with a Yocto image based on JetPack 4.6.2, so this part revision should be supported.

Flashing fails with the error "Invalid value MemBct dram size: 0MB for slot: 0":

[   5.4822 ] Sending BCTs
[   5.4840 ] tegrarcm_v2 --download bct_bootrom br_bct_BR.bct --download bct_mb1 mb1_bct_MB1_sigheader.bct.encrypt --download bct_mem mem_rcm_sigheader.bct.encrypt
[   5.4845 ] Applet version 01.00.0000
[   5.5742 ] Sending bct_bootrom
[   5.5744 ] [................................................] 100%
[   5.5756 ] Sending bct_mb1
[   5.5804 ] [................................................] 100%
[   5.5840 ] Sending bct_mem
[   5.6303 ] [................................................] 100%
[   5.6363 ] 0000000000000102: E> NONE: Invalid value MemBct dram size: 0MB for slot: 0.
[   5.6527 ] 
[   5.6527 ] 
Error: Return value 2
Command tegrarcm_v2 --download bct_bootrom br_bct_BR.bct --download bct_mb1 mb1_bct_MB1_sigheader.bct.encrypt --download bct_mem mem_rcm_sigheader.bct.encrypt

We’ve noticed that whenever flashing fails, the board information printed by the flashing script reports RAM code 0x4:

Retrieving board information
[   3.7330 ] tegrarcm_v2 --oem platformdetails chip chip_info.bin
[   3.7335 ] MB2 Applet version 01.00.0000
[   3.8241 ] Saved platform info in chip_info.bin
[   3.8286 ] Chip minor revision: 2
[   3.8286 ] Bootrom revision: 0xf
[   3.8286 ] Ram code: 0x4
[   3.8286 ] Chip sku: 0xd0
[   3.8286 ] Chip Sample: non es
  • What is the significance of RAM code 0x4? Could it cause the flashing failures, or is it unrelated?
  • Any suggestions for debugging this issue would be much appreciated.
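For anyone comparing logs from good and failing runs, here is a minimal sketch that pulls the "key: value" board-information fields (including the RAM code) out of the flashing log text. The function name `parse_board_info` is made up for illustration, and the regex only assumes the timestamped line format shown in the excerpt above.

```python
import re

# Matches timestamped lines such as "[   3.8286 ] Ram code: 0x4".
# Keys are restricted to letters and spaces, so command lines and
# error lines that start with digits are skipped.
FIELD_RE = re.compile(
    r"^\[\s*[\d.]+\s*\]\s*(?P<key>[A-Za-z ]+?):\s*(?P<value>.+?)\s*$"
)

def parse_board_info(log_text):
    """Return a dict of lower-cased field names to raw string values."""
    info = {}
    for line in log_text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            info[match.group("key").strip().lower()] = match.group("value")
    return info
```

The RAM code can then be read as an integer with `int(parse_board_info(text)["ram code"], 16)`.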

The full log of the flashing script is attached below:

fw_update_log.txt (24.5 KB)


Hi jchaves,

Please try flashing your Xavier 64GB device with SDK Manager.
Thanks!

Hi, carolyuu.

We already tried that, but unfortunately we ran into a different issue (outside the scope of this thread).
We were wondering if you could provide any insight into the logs we shared. For example, what does RAM code 0x4 mean?

  1. Does this happen on only one module or on a whole batch of modules?

  2. Are you able to flash those modules on a devkit?

The RAM code is related, but my questions need to be answered first.

Please be aware that these two questions are important for this issue.

Hi, WayneWWW.

Regarding your questions:

  1. According to the manufacturing line, all modules fail intermittently.
  2. Unfortunately, at this point we cannot test the SOMs with a devkit and SDK Manager because the failing hardware is at a remote location on the manufacturing side. We have tried to reproduce the issue at other sites with the same modules that were reported to fail, but without success. That is why we are asking what RAM code 0x04 means: it might give us a clue on how to reproduce the issue. Interestingly, on our end the RAM code in the logs of successful flashes is 0x05.

The RAM code tells which DRAM is used on the module, and it is determined by hardware.
Because of PCN updates, not every AGX Xavier module has the same components. For example, some AGX modules may use DRAM from Micron while others use DRAM from Hynix.
The device uses the RAM code to tell which DRAM configuration should be applied.

However, a custom carrier board may affect the DRAM selection…
That is why I asked you to test this on a devkit. If the devkit can flash the modules, the problem may be with the custom board.
If the devkit cannot flash them, please share the S/N of some failing (NG) modules and some good modules. We need to check them with the factory.

Wayne, is there any documentation we can refer to in order to understand how the carrier board could affect the selection of the DRAM type?

Please refer to the hardware design guide document, especially the UART section. The hardware design related to the UART part may affect the RAM code selection.

Hi, Wayne.

Just to share some of the latest findings on this.

Our build uses a specific SDRAM .cfg file (part of the Tegra boot files). Interestingly, that file seems to contain separate entries, each referring to a particular memory type. When we tried to replicate the issue on our end, the logs showed that RAM code 5 was selected. The file we use does not seem to define an entry for RAM code 4. We therefore reproduced the fundamental issue deliberately by removing the RAM code 5 entry from the file. That is not exactly the reported case with RAM code 4, but it lets us see the same failure on our end.
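As a rough way to check which entries a given SDRAM .cfg file defines, something like the sketch below could be used. It rests on two assumptions that this thread does not confirm: that entries are written as `SDRAM[n].Field = value;` lines, and that the index `n` corresponds directly to the reported RAM code. The helper names and the field names in the usage note are made up for illustration, so please verify against the actual file before relying on it.

```python
import re

# ASSUMPTION: each entry line starts with "SDRAM[<index>]." as in
# "SDRAM[5].SomeField = 0x1234;". Check this against the real cfg file.
ENTRY_RE = re.compile(r"^\s*SDRAM\[\s*(\d+)\s*\]\.")

def sdram_indices(cfg_text):
    """Return the sorted list of distinct SDRAM[...] indices in the text."""
    found = set()
    for line in cfg_text.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            found.add(int(match.group(1)))
    return sorted(found)

def has_entry_for(cfg_text, ram_code):
    # ASSUMPTION: the SDRAM[...] index equals the RAM code from the log.
    return ram_code in sdram_indices(cfg_text)
```

For example, with a file that only contains `SDRAM[5].FieldA = 0x1;` style lines, `has_entry_for(text, 4)` would return `False`, which matches the situation described above where the file has no definition for RAM code 4.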

At this point we are not completely sure why RAM code 4 is being selected on the hardware at the manufacturing line, but we are working on a possible workaround. For now, the issue seems to point to a hardware problem, since RAM code 4 should never be selected for these modules.

Do you have any input on these observations? Could it be that we need to extend our .cfg file instead? Do you have information on how that file is constructed?

Hi @jchaves,

Sorry in advance if I didn’t understand your comment correctly.

Just to rule out that this is an issue you introduced yourselves: could you use the default BCT file?
I am not sure why you want to change the SDRAM file to another one.
DRAM is not something you can customize, so I am not sure why you want to do that.

If the default BCT file from JetPack works, then just use the default BCT file.

If you really want to know the exact behavior, please get a devkit and flash it with SDK Manager. Then check which RAM code and cfg file are in use.

Hi Wayne,

Apologies for the confusion. I confirmed with the rest of the team: we were in fact using the default BCT file. We now need to modify it as a stop-gap so that we can keep flashing the modules that show the issue, that is, when the hardware incorrectly selects RAM code 4 (these changes still need to be tested).

For now we would like to ask for additional information. Is there a way to find out, after boot, which RAM code was detected? That would let us collect sample data on how often the wrong RAM code shows up (the issue appears to be intermittent) and on which modules specifically it happens.
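Until there is a way to read the value after boot, the sample data could at least be collected from the flashing logs themselves. The sketch below tallies, per log, the reported RAM code and whether the MemBct error appeared. It only assumes the `Ram code: 0x…` line and the error text shown earlier in this thread; the function names are made up for illustration.

```python
import re
from collections import Counter

RAM_CODE_RE = re.compile(r"Ram code:\s*(0x[0-9a-fA-F]+)")
FAILURE_MARKER = "Invalid value MemBct dram size"

def summarize_log(log_text):
    """Return (ram_code or None, failed) for a single flashing log."""
    match = RAM_CODE_RE.search(log_text)
    ram_code = int(match.group(1), 16) if match else None
    return ram_code, FAILURE_MARKER in log_text

def tally(logs):
    """Count logs by (ram_code, failed); `logs` is an iterable of strings."""
    return Counter(summarize_log(text) for text in logs)
```

The log texts can be read from wherever the production line stores them, for example with `pathlib.Path(...).read_text()` per file. A result such as `{(4, True): n, (5, False): m}` would support the link between RAM code 4 and the failure.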

Hi,

The RAM code is only printed in the early boot log (MB1). Incorrect hardware may also affect this value.
So I really doubt that checking it on your custom board will give you reliable information.

You can also check our PCN announcements in the download center and compare them against the module S/N.

For example, if the S/Ns show that the modules come from different PCNs, they will have different RAM codes.
