[35.5.0] RAS Uncorrectable Error in IOB on Xavier NX Devkit with Orin Nano SOM: cannot boot to Ubuntu after several reboots

Hi, we are running an Orin Nano 8GB SOM on a Xavier NX Devkit with 35.5.0, and we found that after several reboots it can no longer boot to Ubuntu.
The DUT is then completely dead and does not work again until I re-flash the image.

The command we used to flash the image is:
sudo ./tools/kernel_flash/l4t_initrd_flash.sh --massflash 10 --external-device nvme0n1p1 -c tools/kernel_flash/flash_l4t_external.xml -p "-c bootloader/t186ref/cfg/flash_t234_qspi.xml" --showlogs --network usb0 p3509-a02+p3767-0000 internal

Steps to reproduce:

  1. sudo reboot
  2. Repeat step 1. The failure shows up in fewer than 300 reboots. Once it occurs, the DUT is dead and does not recover.
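
For anyone repeating this reboot test with the serial console captured to a file, the cycle in which the failure first appears can be found offline. This is a minimal sketch, not the tooling used in this thread. The function name is hypothetical, and the default `boot_marker` is an assumption (it is simply the first OP-TEE line visible in the log below, taken here to print once per boot). The `failure_marker` is the error banner from the log.

```python
def first_failing_boot(
    lines,
    boot_marker="I/TC: Asynchronous notifications are disabled",  # assumed once-per-boot line
    failure_marker="RAS Uncorrectable Error in IOB",              # banner from the console log
):
    """Return the 1-based boot number in which the failure first appears, or None."""
    boot = 0
    for line in lines:
        if boot_marker in line:
            boot += 1
        if failure_marker in line:
            # A failure seen before any boot marker is attributed to boot 1.
            return max(boot, 1)
    return None
```

To use it, pass an open log file, for example `first_failing_boot(open("consolelog.txt", errors="replace"))`. If your boot chain prints a different once-per-boot line, pass that as `boot_marker` instead.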

The console log shows:
I/TC: Asynchronous notifications are disabled
E/TC:?? 00
E/TC:?? 00 User mode data-abort at address 0x40 (translation fault)
E/TC:?? 00 esr 0x92000005 ttbr0 0x200027c1ba000 ttbr1 0x00000000 cidr 0x0
E/TC:?? 00 cpu #0 cpsr 0x60000000
E/TC:?? 00 x0 00000000405e0000 x1 0000000000000000
E/TC:?? 00 x2 0000000000000020 x3 00000000405f5c58
E/TC:?? 00 x4 00000000405f5c57 x5 00000000405d9410
E/TC:?? 00 x6 00000000000000fc x7 00000000000000fc
E/TC:?? 00 x8 00000000000000fc x9 0000000000000000
E/TC:?? 00 x10 0000a00000001000 x11 0000000000000040
E/TC:?? 00 x12 0000000000000000 x13 4200004000000000
E/TC:?? 00 x14 0000000000000000 x15 0000000000000000
E/TC:?? 00 x16 00000000400491f8 x17 00000000000000f8
E/TC:?? 00 x18 0000000000000000 x19 00000000405ef160
E/TC:?? 00 x20 00000000405f5c58 x21 00000000405e0000
E/TC:?? 00 x22 0000000000000020 x23 000000000000003f
E/TC:?? 00 x24 00000000405f5c57 x25 0000000000000000
E/TC:?? 00 x26 0000000000001000 x27 0000000000040000
E/TC:?? 00 x28 0000000040508000 x29 00000000405f5bb0
E/TC:?? 00 x30 0000000040500370 elr 00000000405d9494
E/TC:?? 00 sp_el0 00000000405f5bb0
E/TC:?? 00 region 0: va 0x0000000040000000 pa 0x000000027c042000 size 0x002000 flags ---R-X
E/TC:?? 00 region 1: va 0x0000000040002000 pa 0x000000027c190000 size 0x001000 flags ---RW-
E/TC:?? 00 region 2: va 0x0000000040004000 pa 0x000000027c240000 size 0x03f000 flags r-xR--
E/TC:?? 00 region 3: va 0x0000000040043000 pa 0x000000027c27f000 size 0x001000 flags rw----
E/TC:?? 00 region 4: va 0x0000000040044000 pa 0x000000027c280000 size 0x00b000 flags r-x---
E/TC:?? 00 region 5: va 0x000000004004f000 pa 0x000000027c28b000 size 0x001000 flags rw----
E/TC:?? 00 region 6: va 0x0000000040050000 pa 0x000000027c28c000 size 0x001000 flags r-x---
E/TC:?? 00 region 7: va 0x0000000040051000 pa 0x000000027c28d000 size 0x2b3000 flags r-xR--
E/TC:?? 00 region 8: va 0x0000000040304000 pa 0x000000027c540000 size 0x1fc000 flags rw-RW-
E/TC:?? 00 region 9: va 0x0000000040500000 pa 0x000000027c73c000 size 0x008000 flags r-x---
E/TC:?? 00 region 10: va 0x0000000040508000 pa 0x000000027c744000 size 0x001000 flags rw-RW-
E/TC:?? 00 region 11: va 0x0000000040509000 pa 0x000000027c745000 size 0x001000 flags r-x---
E/TC:?? 00 region 12: va 0x000000004050a000 pa 0x000000027c746000 size 0x002000 flags rw-RW-
E/TC:?? 00 region 13: va 0x000000004050c000 pa 0x000000027c748000 size 0x005000 flags r-x---
E/TC:?? 00 region 14: va 0x0000000040511000 pa 0x000000027c74d000 size 0x001000 flags rw-RW-
E/TC:?? 00 region 15: va 0x0000000040512000 pa 0x000000027c74e000 size 0x001000 flags r-x---
E/TC:?? 00 region 16: va 0x0000000040513000 pa 0x000000027c74f000 size 0x0c3000 flags rw-RW-
E/TC:?? 00 region 17: va 0x00000000405d6000 pa 0x000000027c812000 size 0x00a000 flags r-x---
E/TC:?? 00 region 18: va 0x00000000405e0000 pa 0x000000027c81c000 size 0x001000 flags rw-RW-
E/TC:?? 00 region 19: va 0x00000000405e1000 pa 0x000000027c81d000 size 0x001000 flags r-x---
E/TC:?? 00 region 20: va 0x00000000405e2000 pa 0x000000027c81e000 size 0x002000 flags rw-RW-
E/TC:?? 00 region 21: va 0x00000000405e4000 pa 0x000000027c820000 size 0x006000 flags r-x---
E/TC:?? 00 region 22: va 0x00000000405ea000 pa 0x000000027c826000 size 0x001000 flags rw-RW-
E/TC:?? 00 region 23: va 0x00000000405eb000 pa 0x000000027c827000 size 0x001000 flags r-x---
E/TC:?? 00 region 24: va 0x00000000405ec000 pa 0x000000027c828000 size 0x01f000 flags rw-RW-
E/TC:?? 00 region 25: va 0x000000004060b000 pa 0x000000027c847000 size 0x015000 flags rw-RW-
E/TC:?? 00 region 26: va 0x0000000040620000 pa 0x000000000c198000 size 0x001000 flags rw----
E/TC:?? 00 region 27: va 0x0000000040621000 pa 0x0000000003270000 size 0x010000 flags rw----
E/TC:?? 00 region 28: va 0x0000000040631000 pa 0x000000000c390000 size 0x002000 flags rw----

ERROR: Exception reason=0 syndrome=0xbe000011
ERROR: **************************************
ERROR: RAS Uncorrectable Error in IOB, base=0xe010000:
ERROR: Status = 0xec000612
ERROR: SERR = Error response from slave: 0x12
ERROR: IERR = CBB Interface Error: 0x6
ERROR: Overflow (there may be more errors) - Uncorrectable
ERROR: MISC0 = 0xc45e0040
ERROR: MISC1 = 0x18cc860000000000
ERROR: MISC2 = 0x0
ERROR: MISC3 = 0x0
ERROR: ADDR = 0x8000000003270000
ERROR: **************************************
ERROR: sdei_dispatch_event returned -1
Unhandled Exception in EL3.
x30 = 0x0000000050011384
x0 = 0x0000000000000000
x1 = 0x0000000000000000
x2 = 0x000000005000d1b8
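
The `Status` and `ADDR` values in the dump above can be unpacked to see what the firmware is summarizing. The sketch below is illustrative only. The bit positions follow my reading of the Arm RAS extension `ERR<n>STATUS` and `ERR<n>ADDR` layouts, and the 56-bit address mask in particular is an assumption, so check both against the specification. The helper names are hypothetical. The three region entries are copied from regions 26 to 28 of the OP-TEE dump in this post.

```python
def decode_ras_status(status):
    """Split an ERR<n>STATUS value into the fields the log prints (assumed layout)."""
    return {
        "AV": bool((status >> 31) & 1),   # address register valid
        "V": bool((status >> 30) & 1),    # status register valid
        "UE": bool((status >> 29) & 1),   # uncorrectable error
        "OF": bool((status >> 27) & 1),   # overflow: more errors may have occurred
        "MV": bool((status >> 26) & 1),   # MISC registers valid
        "IERR": (status >> 8) & 0xFF,     # implementation-defined error code
        "SERR": status & 0xFF,            # architecturally defined error code
    }


# Assumption: the low 56 bits of ERR<n>ADDR hold the physical address;
# the upper bits are attribute flags.
PADDR_MASK = (1 << 56) - 1

# (pa, size) of the last three mappings listed in the OP-TEE abort dump.
REGIONS = {
    26: (0x0C198000, 0x001000),
    27: (0x03270000, 0x010000),
    28: (0x0C390000, 0x002000),
}


def find_region(paddr, regions):
    """Return the index of the region whose [pa, pa + size) range contains paddr."""
    for index, (base, size) in regions.items():
        if base <= paddr < base + size:
            return index
    return None


fields = decode_ras_status(0xEC000612)
paddr = 0x8000000003270000 & PADDR_MASK
print(fields)
print(hex(paddr), "falls in region", find_region(paddr, REGIONS))
```

Under these assumptions the decoded fields agree with the text in the log (SERR 0x12, IERR 0x6, overflow, uncorrectable), and the faulting address lands at the start of region 27 of the aborted trusted application's mappings. That is only an observation about the dump, not a root cause.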

build.log (302.7 KB)
consolelog.txt (92.2 KB)

Thanks a lot.

Hi,

Your console log does not seem to contain the full log for this crash. Are you sure you attached the right log?

Hi @WayneWWW ,
My bad. The correct log is:
consolelog.txt (42.9 KB)

We will try to reproduce this issue first. Thanks for the report.

One more question: what does the log look like after this crash has happened? I mean the log from the next boot.

Hi @WayneWWW ,
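After the problem occurs, this SOM can no longer boot to the Ubuntu desktop. The same logs appear on every subsequent boot.
The attached log file covers this sequence: power on, the crash occurs, wait a while, power off, power on again, and so on.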

consolelog_v2.txt (213.1 KB)


This issue will be fixed in the next SW release. Thanks.

Hi kayccc. It seems that we have hit the same problem here with r35.5.0.

The problem occurred after an overnight reboot test.

  ERROR:   Exception reason=0 syndrome=0xbe000011
ERROR:   **************************************
ERROR:   RAS Uncorrectable Error in IOB, base=0xe010000:
ERROR:          Status = 0xec000612
ERROR:   SERR = Error response from slave: 0x12
ERROR:          IERR = CBB Interface Error: 0x6
ERROR:          Overflow (there may be more errors) - Uncorrectable
ERROR:          MISC0 = 0xc44e0040
ERROR:          MISC1 = 0x14c860000000000
ERROR:          MISC2 = 0x0
ERROR:          MISC3 = 0x0
ERROR:          ADDR = 0x8000000003270000
ERROR:   **************************************
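
To check whether this dump really is the same failure as the one in the first post, the `NAME = 0x...` register lines of the two dumps can be compared field by field. This is a small sketch with hypothetical helper names. The two strings hold only the register lines copied from the two dumps in this thread.

```python
import re

# Matches lines such as "ERROR:   Status = 0xec000612"; lines whose value is
# not a bare hex number (for example the SERR/IERR text lines) are skipped.
FIELD = re.compile(r"ERROR:\s+(\w+)\s*=\s*(0x[0-9a-fA-F]+)")

FIRST_REPORT = """
ERROR: Status = 0xec000612
ERROR: MISC0 = 0xc45e0040
ERROR: MISC1 = 0x18cc860000000000
ERROR: MISC2 = 0x0
ERROR: MISC3 = 0x0
ERROR: ADDR = 0x8000000003270000
"""

SECOND_REPORT = """
ERROR:          Status = 0xec000612
ERROR:          MISC0 = 0xc44e0040
ERROR:          MISC1 = 0x14c860000000000
ERROR:          MISC2 = 0x0
ERROR:          MISC3 = 0x0
ERROR:          ADDR = 0x8000000003270000
"""


def parse_ras_dump(text):
    """Return {register name: integer value} for every 'NAME = 0x...' line."""
    return {m.group(1): int(m.group(2), 16) for m in FIELD.finditer(text)}


def diff_dumps(a, b):
    """Return the registers whose values differ, as {name: (value_a, value_b)}."""
    names = sorted(a.keys() | b.keys())
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


differences = diff_dumps(parse_ras_dump(FIRST_REPORT), parse_ras_dump(SECOND_REPORT))
for name, (first, second) in differences.items():
    print(f"{name}: {first:#x} vs {second:#x}")
```

`Status` and `ADDR` are identical in the two reports, and only `MISC0` and `MISC1` differ. That is consistent with both posts describing the same failure.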

Here is the full log:
orin_nx_cannot_boot_RAS_Uncorrectable_Error_in_IOB.log (40.4 KB)

Would you mind providing patches for r35.5.0?

??? You already got the answer in your own post. Why ask again?

Sorry for bothering. I didn’t realize they were the same.
