Problem with several Jetson TX2i modules

Dear NVIDIA,

We have a problem with several Jetson TX2i modules.
We use the Jetsons on our custom carrier board. This board works properly with other Jetsons, and it worked with the faulty ones for quite some time until they recently went bad.
We were conducting temperature cycling on several Jetsons mounted on our board, and once every few weeks we took them out of the oven to check that they still worked. On previous checks they worked well, but on our most recent check (after a total of about 400 cycles) 2 of the Jetsons had gone bad.
The cycles were from -40°C to 125°C, and the boards (and consequently the Jetsons) were unbiased, i.e. not powered.
When a faulty Jetson boots, it sends the following log over UART:
“MTS error(2) : dram alias check failure
Cpu waypoint 0.5 failed
ERROR: High Layer Module = 0x32, Lowest Layer Module = 0x32, Aux info = 0x1, Reason = 0x6”
We have seen this topic come up in the forums a few times, but the cases were different and there was no clear answer.
Also, when we tried to flash (burn) the module, the flash failed at the “send blob” step and returned the error “Return value 1”.
So, our questions are:

  1. What is the storage temperature of the Jetson?
  2. Could the problem be related to the eMMC? Could we get a technical manual for the eMMC component?
  3. Do you have any idea what the problem could be and if so, how can we fix it?
  4. Do you need any additional information?

Best regards,

Hi,

The operating temperature of the TX2 module is -40°C to 85°C, so I suspect your 125°C test has already damaged components on the module.

Please check the documentation before you try such a test next time.

Also, please try to flash the faulty modules and see whether they can still be flashed. If they cannot, you can try to RMA them.

Hi Wayne,

Firstly, thank you for the quick reply.
I am aware of the operating temperature of the device, but the device was not operating. It was not biased. I assumed the storage temperature limit was 125°C, because it usually is for electronic components, and I could not find any information online on the subject.
About the flashing: as I mentioned in the post, we tried to flash/burn the device and it failed at the “send blob” step. Do you have any idea what could cause that?

Hi,

I just want to confirm two things:

  1. This board “was” able to be flashed with the same host setup and software (JetPack) that you are using now.

  2. Other modules can still be flashed with the same host as in (1), and only the faulty modules cannot.

Are both of these correct?

Does that MTS error on the UART happen when you flash the board, or during boot?

The MTS error happens when you boot the device.

Could you reply to all of my questions here?

Hi,

Sorry I missed the questions above.

  1. Yes
  2. Yes

The MTS error happens when you boot the device.
The error during flashing happens when it tries to send the “blob”.

There should also be some logs from the UART during flashing. That is, when you see “send blob failed” on the host side, the UART should print something as well. Please check what that log says.

If no log is printed, then I can only suggest an RMA.

We do not, and cannot, analyze which hardware component on the module is dead.

I will try to retype most of it by hand, because the logs are on a different network and it is a lot of typing.
The log says:
welcome to tegra flash
generating RCM messages
saves RCM
signings RCM messages
assuming zero filled SBK key
copying signature to rcm messages
boot rom communication
br_CID:
rcm version 0x18001
boot rom communication completed
applet version 0.100.0000
retrieving eeprom data
applet version 0.100.0000
saved platform info

welcome to tegra flash

generating RCM messages
saves RCM
signings RCM messages
assuming zero filled SBK key
copying signature to rcm messages
parsing partition layout
creating list of images to be signed
generating signatures
generating br-bct
updating dev and mss params in BR BCT
updating bl info
updating smd info
updating odmdata
get signed section of bct
assuming zero filled SBK key
updating BCT with signature
generating coldboot mb1-bct
mb1-bct version: 0xf
copying sdram info from 1 to 2 set
copying sdram info from 2 to 3 set
packing sdram param for instance[0]
packing sdram param for instance[1]
packing sdram param for instance[2]
packing sdram param for instance[3]
parsing config file
parsing config file :mobile_scr.cfg
parsing config file :tegra186-mb1-bct-pad-quill-p3489-1000-a00.cfg
parsing config file :tegra186-mb1-bct-pmic-quill-p3489-1000-a00.cfg
parsing config file :tegra186-mb1-bct-bootrom-quill-p3489-1000-a00.cfg
parsing config file :tegra186-mb1-bct-prod-storm-p3489-1000-a00.cfg
updating mb1-bct with firmware information
MB1-bct version: 0xf
updating mb1-bct with storage information
.
.
.
generating recovery mb1-bct
mb1-bct version: 0xf
copying sdram info from 1 to 2 set
copying sdram info from 2 to 3 set
packing sdram param for instance[0]
packing sdram param for instance[1]
packing sdram param for instance[2]
packing sdram param for instance[3]
parsing config file
parsing config file :mobile_scr.cfg
parsing config file :tegra186-mb1-bct-pad-quill-p3489-1000-a00.cfg
parsing config file :tegra186-mb1-bct-pmic-quill-p3489-1000-a00.cfg
parsing config file :tegra186-mb1-bct-bootrom-quill-p3489-1000-a00.cfg
parsing config file :tegra186-mb1-bct-prod-storm-p3489-1000-a00.cfg
updating mb1-bct with firmware information
MB1-bct version: 0xf
updating mb1-bct with storage information
copying signatures
boot rom communication
applet version 01.00.0000
sending BCTs
applet version 01.00.0000
sending bct_bootrom 100%
sending bct_mb1 100%
generating blob
tegrahost_v2 --chip 0x18 --align blob_nvtboot_recovery_cpu.bin
tegrahost_v2 --appendsigheader blob_nvtboot_recovery_cpu.bin zerobsk
tegrasign_v2 --key none --list
blob_nvtboot_recovery_cpu_sigheader.bin_list.xml --pubkeyhash pub_key.key
assuming zero filled sbk key
tegrahost_v2 --updatesigheader
blob_nvtboot_recovery_cpu_sigheader.bin.encrypt
blob_nvtboot_recovery_cpu_sigheader.bin.hash zerobsk
tegrahost_v2 --chip 0x18 --align blob_preboot_d15_prod_cpu.bin
tegrahost_v2 --appendsigheader blob_preboot_d15_prod_cpu.bin zerobsk
tegrasign_v2 --key none --list
blob_preboot_d15_prod_cpu_sigheader.bin_list.xml --pubkeyhash pub_key.key
assuming zero filled sbk key
tegrahost_v2 --updatesigheader
blob_preboot_d15_prod_cpu_sigheader.bin.encrypt
blob_preboot_d15_prod_cpu_sigheader.bin.hash zerobsk
tegrahost_v2 --chip 0x18 --align blob_mce_mts_d15_prod_cpu.bin
tegrahost_v2 --appendsigheader blob_mce_mts_d15_prod_cpu.bin zerobsk
tegrasign_v2 --key none --list
blob_mce_mts_d15_prod_cpu_sigheader.bin_list.xml --pubkeyhash pub_key.key
assuming zero filled sbk key
tegrahost_v2 --updatesigheader
blob_mce_mts_d15_prod_cpu_sigheader.bin.encrypt
blob_mce_mts_d15_prod_cpu_sigheader.bin.hash zerobsk
tegrahost_v2 --chip 0x18 --align blob_bpmp.bin
tegrahost_v2 --appendsigheader blob_bpmp.bin zerobsk
tegrasign_v2 --key none --list
blob_bpmp_sigheader.bin_list.xml --pubkeyhash pub_key.key
assuming zero filled sbk key
tegrahost_v2 --updatesigheader
blob_bpmp_sigheader.bin.encrypt
blob_bpmp_sigheader.bin.hash zerobsk
tegrahost_v2 --chip 0x18 --align blob_tegra186-a02-bpmp-storm-p3489-a00-00-ta795sa-ucm1.dtb
tegrahost_v2 --appendsigheader blob_tegra186-a02-bpmp-storm-p3489-a00-00-ta795sa-ucm1.dtb zerobsk
tegrasign_v2 --key none --list
blob_tegra186-a02-bpmp-storm-p3489-a00-00-ta795sa-ucm1_sigheader.dtb_list.xml --pubkeyhash pub_key.key
assuming zero filled sbk key
tegrahost_v2 --updatesigheader
blob_tegra186-a02-bpmp-storm-p3489-a00-00-ta795sa-ucm1_sigheader.dtb.encrypt
blob_tegra186-a02-bpmp-storm-p3489-a00-00-ta795sa-ucm1_sigheader.dtb.hash zerobsk
(it continues in this pattern with different blobs, and then:)
tegrahost_v2 --chip 0x18 --generateblob blob.xml blob.bin
number of images in blob are 9
blobsize is 3800168
added binary blob_nvtboot_recovery_signheader.bin.encrypt of size 221344
(it continues adding the binaries of the other blobs, and then:)
sending bootloader and pre-requisite binaries
tegrarcm_v2 --download blob blob.bin
applet version 01.00.0000
sending blob
Error: return value 1
command tegrarcm_v2 --download blob blob.bin

Hi,

So which part is from UART?
You just posted the host side log again, didn’t you?

I don’t understand. All the lines I just typed were from the UART. What other UART logs are there?

Actually, since you typed those logs by hand, some of the formatting is gone, so it took me some time to confirm where this log came from.

Is this user’s log similar to your UART log?

I just want to confirm what your UART log looks like.

And if that user’s log is similar to your “UART log”, then I have to tell you that your log is not from the UART. It is just output printed by flash.sh.

When you captured the MTS log during boot, you were using a console tool such as minicom, right?
You need to use that tool again while you run flash.sh.

The flash process involves two sides. The host side sends the binaries to the device side, and both sides print logs.

So far you have only shared the host-side log. That is obvious, because the device side has no need to package a bootloader binary and send it to the host.
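
For reference, the device-side capture can also be scripted instead of using minicom. The following is only a minimal sketch using the Python standard library. The function names `timestamp_lines` and `capture`, the device node `/dev/ttyUSB0`, and the 115200 baud setting are assumptions for illustration and are not taken from this thread. Adjust them to match your debug UART adapter.

```python
import io
import time
from typing import BinaryIO, Callable, Iterator


def timestamp_lines(
    stream: BinaryIO, clock: Callable[[], float] = time.monotonic
) -> Iterator[str]:
    """Yield each line read from `stream`, prefixed with seconds since start."""
    start = clock()
    for raw in stream:
        # Console output may contain garbage bytes; never fail on decoding.
        text = raw.decode("utf-8", errors="replace").rstrip("\r\n")
        yield f"[{clock() - start:9.3f}] {text}"


def capture(port_path: str, log_path: str) -> None:
    """Copy device-side console output from a serial device node to a file.

    Assumes the port has already been configured on the host, for example
    with `stty -F /dev/ttyUSB0 115200 raw -echo` (hypothetical device node
    and baud rate). Runs until the port is closed or the script is stopped.
    """
    with open(port_path, "rb", buffering=0) as port, open(log_path, "w") as log:
        for line in timestamp_lines(port):
            log.write(line + "\n")
            log.flush()  # keep the file current in case flashing hangs


# Example call (hypothetical paths), started before running flash.sh:
# capture("/dev/ttyUSB0", "uart_device_side.log")
```

Start the capture in one terminal, then run flash.sh in another. Anything the module prints while the host reports “sending blob” ends up in the log file, and an empty file means the device side printed nothing.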

The second log looks like the one I had, but mine failed at the “sending blob” step, whereas his “[ 24.2759 ] Sending blob” step succeeded.

So what you need is the device-side log from the UART? I will try to provide it as soon as possible.

Yes. Only the device-side log needs the UART to get it out, which is why we call it the UART log.

I also have to remind you that even if you share the log with us, the final result may still be an RMA of the device.
So it is up to you to decide whether it is worth spending your time capturing the UART log.

Hi, we don’t have any UART logs from the device side. The device doesn’t seem to send anything.