We are seeing intermittent boot failures with a Jetson AGX Xavier on our custom carrier board.
Steps to reproduce:
1. Power off the device and let it stand for a while.
2. Power on and watch the output on the debug port. If it boots normally, power-cycle the device again.
3. Repeat this four or five times and the problem reproduces.
Debug port log:
[ 0.000000] Booting Linux on physical CPU 0x0
[ 0.000000] Linux version 4.9.253+ (liyanhou@uisee-System-Product-Name) (gcc version 7.5.0 (Linaro GCC 7.5-2019.12) ) #1 SMP PREEMPT Tue Apr 26 11:17:20 CST 2022
[ 0.000000] Boot CPU: AArch64 Processor [4e0f0040]
[ 0.000000] OF: fdt:memory scan node memory, reg size 48,
[ 0.000000] OF: fdt: - 80000000 , 2c000000
[ 0.000000] OF: fdt: - ac200000 , 44800000
[ 0.000000] OF: fdt: - 100000000 , 780000000
[ 0.000000] earlycon: tegra_comb_uart0 at MMIO32 0x000000000c168000 (options '')
[ 0.000000] bootconsole [tegra_comb_uart0] enabled
[ 1.640722] ucsi_ccg 1-0008: read version failed
[ 1.640846] ucsi_ccg 1-0008: get_fw_info fail,, err=-121
[ 2.191234] rt5659 7-001a: Device with ID register ffffff80 is not rt5659
[ 7.882195] bpmp_wait_ack() returned -110 (ch 22 mrq 3 data <0x25 0x01 0x00 0x06>)
[ 7.964978] CPU1: SError detected, daif=1c0, spsr=0x60c000c5, mpidr=80000001, esr=be000000
[ 7.964986] CPU7: SError detected, daif=1c0, spsr=0x60c000c5, mpidr=80000301, esr=be000000
[ 7.964990] CPU6: SError detected, daif=1c0, spsr=0x608000c5, mpidr=80000300, esr=be000000
[ 7.964999] CPU4: SError detected, daif=1c0, spsr=0x608000c5, mpidr=80000200, esr=be000000
[ 7.965003] CPU5: SError detected, daif=1c0, spsr=0x608000c5, mpidr=80000201, esr=be000000
[ 7.965012] CPU3: SError detected, daif=1c0, spsr=0x608000c5, mpidr=80000101, esr=be000000
[ 7.965017] CPU2: SError detected, daif=1c0, spsr=0x608000c5, mpidr=80000100, esr=be000000
[ 8.129548] CPU:0, Error:BPMP-NOC@0xd600000,irq=483
[ 8.129550] **************************************
[ 8.129552] * For more Internal Decode Help
[ 8.129553] * http://nv/cbberr
[ 8.129554] * NVIDIA userID is required to access
[ 8.129555] **************************************
[ 8.129557] CPU:0, Error:BPMP-NOC
[ 8.129559] Error Logger : 1
[ 8.129568] ErrLog0 : 0x80030600
[ 8.129570] Transaction Type : RD - Read, Incrementing
[ 8.129572] Error Code : TMO
[ 8.129573] Error Source : Target NIU
[ 8.129575] Error Description : Target time-out error
[ 8.129577] Packet header Lock : 0
[ 8.129579] Packet header Len1 : 3
[ 8.129581] NOC protocol version : version >= 2.7
[ 8.129582] ErrLog1 : 0xbba00
[ 8.129584] ErrLog2 : 0x0
[ 8.129586] RouteId : 0xbba00
[ 8.129588] InitFlow : cpu_p_i/I/0
[ 8.129589] Targflow : cbb_t/T/0
[ 8.129591] TargSubRange : 13
[ 8.129592] SeqId : 0
[ 8.129594] ErrLog3 : 0x700020c
[ 8.129596] ErrLog4 : 0x0
[ 8.129623] Address : 0x1700020c (unknown device)
[ 8.129625] ErrLog5 : 0xcfa30
[ 8.129627] Master ID : BPMP
[ 8.129629] Security Group(GRPSEC): 0x7d
[ 8.129631] Cache : 0x0 -- Non-cacheable/Non-Bufferable)
[ 8.129634] Protection : 0x3 -- Privileged, Non-Secure, Data Access
[ 8.129635] FALCONSEC : 0x0
[ 8.129637] Virtual Queuing Channel(VQC): 0x0
[ 8.129640] **************************************
[ 8.129716] CPU0: SError detected, daif=1c0, spsr=0x40400045, mpidr=80000000, esr=be000000
[ 12.211802] **************************************
[ 12.211921] * For more Internal Decode Help
[ 12.211993] * http://nv/cbberr
[ 12.212052] * NVIDIA userID is required to access
[ 12.212131] **************************************
[ 12.212214] CPU:7, Error:BPMP-NOC
[ 12.212274] Error Logger : 1
[ 12.212334] ErrLog0 : 0x80030608
[ 12.212400] Transaction Type : WR - Write, Incrementing
[ 12.212496] Error Code : TMO
[ 12.212554] Error Source : Target NIU
[ 12.212624] Error Description : Target time-out error
[ 12.212718] Packet header Lock : 0
[ 12.212784] Packet header Len1 : 3
[ 12.212847] NOC protocol version : version >= 2.7
[ 12.212930] ErrLog1 : 0xbaa01
[ 12.212988] ErrLog2 : 0x0
[ 12.213066] RouteId : 0xbaa01
[ 12.213363] InitFlow : cpu_p_i/I/0
[ 12.213646] Targflow : cbb_t/T/0
[ 12.213943] TargSubRange : 5
[ 12.214186] SeqId : 0
[ 12.214425] ErrLog3 : 0x190000
[ 12.214682] ErrLog4 : 0x0
[ 12.214921] Address : 0xc190000 (unknown device)
[ 12.217904] ErrLog5 : 0xcfa30
[ 12.221144] Master ID : BPMP
[ 12.224378] Security Group(GRPSEC): 0x7d
[ 12.228404] Cache : 0x0 -- Non-cacheable/Non-Bufferable)
[ 12.234007] Protection : 0x3 -- Privileged, Non-Secure, Data Access
[ 12.240655] FALCONSEC : 0x0
[ 12.243804] Virtual Queuing Channel(VQC): 0x0
[ 12.248274] **************************************
[ 12.253108] **************************************
[ 12.258329] RAS Error in SCF:IOB, ERRSELR_EL1=1025:
[ 12.263054] Status = 0xfc009604
[ 12.266380] IERR = CBB Interface Error: 0x96
[ 12.270410] SERR = Assertion Failure: 0x4
[ 12.274690] Overflow (there may be more errors) - Uncorrectable
[ 12.280383] Uncorrectable (this is fatal)
[ 12.284847] MISC0 = 0x40
[ 12.287033] MISC1 = 0x264a4445e1
[ 12.290532] ADDR = 0x800000001404000c
[ 12.294469] **************************************
[ 12.299210] **************************************
[ 12.304442] RAS Error in L2, ERRSELR_EL1=512:
[ 12.308729] Status = 0xfc006612
[ 12.312142] IERR = SCF to L2 Slave Error Read: 0x66
[ 12.317041] SERR = Error response from slave: 0x12
[ 12.321682] Overflow (there may be more errors) - Uncorrectable
[ 12.327804] Uncorrectable (this is fatal)
[ 12.331662] MISC0 = 0x80000000400000
[ 12.335418] MISC1 = 0x20240000000
[ 12.338743] ADDR = 0x800000001404000c
[ 12.342336] **************************************
[ 12.347354] **************************************
[ 12.352480] RAS Error in L2, ERRSELR_EL1=560:
[ 12.356594] Status = 0xfc006612
[ 12.360267] IERR = SCF to L2 Slave Error Read: 0x66
[ 12.365079] SERR = Error response from slave: 0x12
[ 12.369892] Overflow (there may be more errors) - Uncorrectable
[ 12.375408] Uncorrectable (this is fatal)
[ 12.379694] MISC0 = 0x100000000400000
[ 12.383628] MISC1 = 0x40240000000
[ 12.386785] ADDR = 0x800000001404000c
[ 12.390893] **************************************
[ 12.395715] Bad mode in Error handler detected on CPU7, code 0xbe000000 -- SError
[ 12.403405] Kernel panic - not syncing: bad mode
[ 12.407874] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G W 4.9.253+ #1
[ 12.415127] Hardware name: Jetson-AGX (DT)
[ 12.418896] Call trace:
[ 12.421784] [<ffffff800808ba40>] dump_backtrace+0x0/0x198
[ 12.426774] [<ffffff800808c004>] show_stack+0x24/0x30
[ 12.432023] [<ffffff8008f660ac>] dump_stack+0xa0/0xc4
[ 12.437444] [<ffffff8008f63150>] panic+0x12c/0x2a8
[ 12.442431] [<ffffff800808c894>] bad_mode+0x7c/0x80
[ 12.447331] [<ffffff800808ca5c>] handle_serr+0x124/0x128
[ 12.452669] [<ffffff8008082d98>] el1_serr+0xb0/0x144
[ 12.457482] [<ffffff80081114b4>] cpu_startup_entry+0xfc/0x150
[ 12.462825] [<ffffff8008091cf8>] secondary_start_kernel+0x190/0x1f8
[ 12.469035] [<0000000080f731a8>] 0x80f731a8
[ 12.473247] SMP: stopping secondary CPUs
[ 13.607853] SMP: failed to stop secondary CPUs 0-7
[ 13.607965] Kernel Offset: disabled
[ 13.608030] Memory Limit: none
[ 13.608087] trusty-log panic notifier - trusty version Built: 12:20:34 Jul 26 2021
[ 13.615012] Rebooting in 5 seconds..
[ 16.642541] ras_ccplex_serr_callback: Scanning CCPLEX Error Records for Uncorrectable Errors
[ 16.642726] **************************************
[ 16.642811] RAS Error in SCF:IOB, ERRSELR_EL1=1025:
[ 16.642896] Status = 0xfc009604
[ 16.642954] IERR = CBB Interface Error: 0x96
[ 16.643030] SERR = Assertion Failure: 0x4
[ 16.643100] Overflow (there may be more errors) - Uncorrectable
[ 16.643201] Uncorrectable (this is fatal)
[ 16.643276] MISC0 = 0x40
[ 16.643323] MISC1 = 0x264e444421
[ 16.643388] ADDR = 0x800000001404000c
[ 16.643458] **************************************
[ 16.643543] ras_corecluster_serr_callback:Scanning CoreCluster Error Records for Uncorrectable Errors
[ 16.643733] **************************************
[ 16.643846] RAS Error in L2, ERRSELR_EL1=560:
[ 16.644197] Status = 0xfc006612
[ 16.644456] IERR = SCF to L2 Slave Error Read: 0x66
[ 16.644833] SERR = Error response from slave: 0x12
[ 16.645215] Overflow (there may be more errors) - Uncorrectable
[ 16.645675] Uncorrectable (this is fatal)
[ 16.648111] MISC0 = 0x80000000400000
[ 16.651783] MISC1 = 0x20240000000
[ 16.655198] ADDR = 0x800000001404000c
[ 16.658790] **************************************
[ 16.663694] ras_core_serr_callback: Scanning Core Error Records for Uncorrectable Errors
[ 16.671734] Bad mode in Error handler detected on CPU6, code 0xbe000000 -- SError
[ 18.615813] SMP: stopping secondary CPUs
[ 19.749482] SMP: failed to stop secondary CPUs 0-7
[ 20.918568] ras_ccplex_serr_callback: Scanning CCPLEX Error Records for Uncorrectable Errors
[ 20.918764] **************************************
[ 20.918854] RAS Error in SCF:IOB, ERRSELR_EL1=1025:
[ 20.918937] Status = 0xfc009604
[ 20.918997] IERR = CBB Interface Error: 0x96
[ 20.919073] SERR = Assertion Failure: 0x4
[ 20.919144] Overflow (there may be more errors) - Uncorrectable
[ 20.919247] Uncorrectable (this is fatal)
[ 20.919322] MISC0 = 0x40
[ 20.919369] MISC1 = 0x26424445a1
[ 20.919429] ADDR = 0x800000001404000c
[ 20.919500] **************************************
[ 20.919586] ras_corecluster_serr_callback:Scanning CoreCluster Error Records for Uncorrectable Errors
[ 20.919769] **************************************
[ 20.919898] RAS Error in L2, ERRSELR_EL1=544:
[ 20.920247] Status = 0xfc006612
[ 20.920492] IERR = SCF to L2 Slave Error Read: 0x66
[ 20.920886] SERR = Error response from slave: 0x12
[ 20.921271] Overflow (there may be more errors) - Uncorrectable
[ 20.921724] Uncorrectable (this is fatal)
[ 20.924314] MISC0 = 0x80000000400000
[ 20.927985] MISC1 = 0x20240000000
[ 20.931399] ADDR = 0x800000001404000c
[ 20.935252] **************************************
[ 20.940163] ras_core_serr_callback: Scanning Core Error Records for Uncorrectable Errors
[ 20.948202] Bad mode in Error handler detected on CPU4, code 0xbe000000 -- SError
[ 25.195030] ras_ccplex_serr_callback: Scanning CCPLEX Error Records for Uncorrectable Errors
[ 25.195217] **************************************
[ 25.195302] RAS Error in SCF:IOB, ERRSELR_EL1=1025:
[ 25.195387] Status = 0xfc009604
[ 25.195447] IERR = CBB Interface Error: 0x96
[ 25.195522] SERR = Assertion Failure: 0x4
[ 25.195592] Overflow (there may be more errors) - Uncorrectable
[ 25.195692] Uncorrectable (this is fatal)
[ 25.195766] MISC0 = 0x40
[ 25.195813] MISC1 = 0x26424444a3
[ 25.195877] ADDR = 0x800000001404000c
[ 25.195946] **************************************
[ 25.196031] ras_corecluster_serr_callback:Scanning CoreCluster Error Records for Uncorrectable Errors
[ 25.196212] **************************************
[ 25.196336] RAS Error in L2, ERRSELR_EL1=544:
[ 25.196688] Status = 0xfc006612
[ 25.196944] IERR = SCF to L2 Slave Error Read: 0x66
[ 25.197322] SERR = Error response from slave: 0x12
[ 25.197690] Overflow (there may be more errors) - Uncorrectable
[ 25.198159] Uncorrectable (this is fatal)
[ 25.200604] MISC0 = 0x100000000400000
[ 25.204537] MISC1 = 0x40240000000
[ 25.207953] ADDR = 0x800000001404000c
[ 25.211801] **************************************
[ 25.216716] ras_core_serr_callback: Scanning Core Error Records for Uncorrectable Errors
[ 25.224489] Bad mode in Error handler detected on CPU5, code 0xbe000000 -- SError
[ 29.471333] ras_ccplex_serr_callback: Scanning CCPLEX Error Records for Uncorrectable Errors
[ 29.471525] **************************************
[ 29.471610] RAS Error in SCF:IOB, ERRSELR_EL1=1025:
[ 29.471694] Status = 0xfc009604
[ 29.471754] IERR = CBB Interface Error: 0x96
[ 29.471830] SERR = Assertion Failure: 0x4
[ 29.471901] Overflow (there may be more errors) - Uncorrectable
[ 29.472002] Uncorrectable (this is fatal)
[ 29.472078] MISC0 = 0x40
[ 29.472125] MISC1 = 0x2646444423
[ 29.472186] ADDR = 0x800000001404000c
[ 29.472257] **************************************
[ 29.472343] ras_corecluster_serr_callback:Scanning CoreCluster Error Records for Uncorrectable Errors
[ 29.472517] **************************************
[ 29.472624] RAS Error in L2, ERRSELR_EL1=528:
[ 29.472997] Status = 0xfc006612
[ 29.473239] IERR = SCF to L2 Slave Error Read: 0x66
[ 29.473629] SERR = Error response from slave: 0x12
[ 29.474000] Overflow (there may be more errors) - Uncorrectable
[ 29.474469] Uncorrectable (this is fatal)
[ 29.476908] MISC0 = 0x100000000400000
[ 29.480843] MISC1 = 0x40240000000
[ 29.484258] ADDR = 0x800000001404000c
[ 29.487851] **************************************
[ 29.492773] ras_core_serr_callback: Scanning Core Error Records for Uncorrectable Errors
[ 29.500796] Bad mode in Error handler detected on CPU3, code 0xbe000000 -- SError
[ 33.747616] ras_ccplex_serr_callback: Scanning CCPLEX Error Records for Uncorrectable Errors
[ 33.747801] **************************************
[ 33.747886] RAS Error in SCF:IOB, ERRSELR_EL1=1025:
[ 33.747971] Status = 0xfc009604
[ 33.748030] IERR = CBB Interface Error: 0x96
[ 33.748107] SERR = Assertion Failure: 0x4
[ 33.748178] Overflow (there may be more errors) - Uncorrectable
[ 33.748278] Uncorrectable (this is fatal)
[ 33.748357] MISC0 = 0x40
[ 33.748405] MISC1 = 0x26464444e1
[ 33.748467] ADDR = 0x800000001404000c
[ 33.748539] **************************************
[ 33.748624] ras_corecluster_serr_callback:Scanning CoreCluster Error Records for Uncorrectable Errors
[ 33.748800] **************************************
[ 33.748919] RAS Error in L2, ERRSELR_EL1=528:
[ 33.749273] Status = 0xfc006612
[ 33.749532] IERR = SCF to L2 Slave Error Read: 0x66
[ 33.749908] SERR = Error response from slave: 0x12
[ 33.750275] Overflow (there may be more errors) - Uncorrectable
[ 33.750749] Uncorrectable (this is fatal)
[ 33.753187] MISC0 = 0x80000000400000
[ 33.756602] MISC1 = 0x20240000000
[ 33.760274] ADDR = 0x800000001404000c
[ 33.764123] **************************************
[ 33.769048] ras_core_serr_callback: Scanning Core Error Records for Uncorrectable Errors
[ 33.777068] Bad mode in Error handler detected on CPU2, code 0xbe000000 -- SError
[ 38.023901] ras_ccplex_serr_callback: Scanning CCPLEX Error Records for Uncorrectable Errors
[ 38.024089] **************************************
[ 38.024174] RAS Error in SCF:IOB, ERRSELR_EL1=1025:
[ 38.024259] Status = 0xfc009604
[ 38.024318] IERR = CBB Interface Error: 0x96
[ 38.024394] SERR = Assertion Failure: 0x4
[ 38.024465] Overflow (there may be more errors) - Uncorrectable
[ 38.024564] Uncorrectable (this is fatal)
[ 38.024642] MISC0 = 0x40
[ 38.024689] MISC1 = 0x264a4445a1
[ 38.024750] ADDR = 0x800000001404000c
[ 38.024819] **************************************
[ 38.024904] ras_corecluster_serr_callback:Scanning CoreCluster Error Records for Uncorrectable Errors
[ 38.025064] **************************************
[ 38.025208] RAS Error in L2, ERRSELR_EL1=512:
[ 38.025548] Status = 0xfc006612
[ 38.025822] IERR = SCF to L2 Slave Error Read: 0x66
[ 38.026201] SERR = Error response from slave: 0x12
[ 38.026566] Overflow (there may be more errors) - Uncorrectable
[ 38.027034] Uncorrectable (this is fatal)
[ 38.029215] MISC0 = 0x80000000400000
[ 38.033147] MISC1 = 0x20240000000
[ 38.036559] ADDR = 0x800000001404000c
[ 38.040151] **************************************
[ 38.045090] ras_core_serr_callback: Scanning Core Error Records for Uncorrectable Errors
[ 38.053356] Bad mode in Error handler detected on CPU0, code 0xbe000000 -- SError
[ 42.300170] ras_ccplex_serr_callback: Scanning CCPLEX Error Records for Uncorrectable Errors
[ 42.300356] **************************************
[ 42.300439] RAS Error in SCF:IOB, ERRSELR_EL1=1025:
[ 42.300525] Status = 0xfc009604
[ 42.300584] IERR = CBB Interface Error: 0x96
[ 42.300660] SERR = Assertion Failure: 0x4
[ 42.300731] Overflow (there may be more errors) - Uncorrectable
[ 42.300831] Uncorrectable (this is fatal)
[ 42.300910] MISC0 = 0x40
[ 42.300958] MISC1 = 0x264a444523
[ 42.301019] ADDR = 0x800000001404000c
[ 42.301092] **************************************
[ 42.301178] ras_corecluster_serr_callback:Scanning CoreCluster Error Records for Uncorrectable Errors
[ 42.301343] **************************************
[ 42.301476] RAS Error in L2, ERRSELR_EL1=512:
[ 42.301829] Status = 0xfc006612
[ 42.302085] IERR = SCF to L2 Slave Error Read: 0x66
[ 42.302448] SERR = Error response from slave: 0x12
[ 42.302830] Overflow (there may be more errors) - Uncorrectable
[ 42.303287] Uncorrectable (this is fatal)
[ 42.305483] MISC0 = 0x100000000400000
[ 42.309419] MISC1 = 0x40240000000
[ 42.313090] ADDR = 0x800000001404000c
[ 42.316940] **************************************
[ 42.321875] ras_core_serr_callback: Scanning Core Error Records for Uncorrectable Errors
[ 42.329626] Bad mode in Error handler detected on CPU1, code 0xbe000000 -- SError
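Because the same records repeat for each CPU, it may help to reduce the log to its distinct facts before comparing a failed boot with a good one. Below is a minimal Python sketch for that. The helper names `summarize` and `decode_esr` are hypothetical, and the regular expressions only match the line shapes seen in this particular log, so other kernel versions may need different patterns. The ESR field split (EC in bits [31:26], IL in bit 25, ISS in bits [24:0]) follows the ARMv8-A architecture definition of ESR_ELx, where EC 0x2f means an SError interrupt.

```python
import re
from collections import Counter

# Patterns for the line shapes seen in the log above (an assumption:
# other kernel builds may print these messages differently).
SERROR_RE = re.compile(r"CPU(\d+): SError detected")
NOC_TYPE_RE = re.compile(r"Transaction Type\s*:\s*(\w+)")
NOC_ADDR_RE = re.compile(r"Address\s*:\s*(0x[0-9a-f]+)")
RAS_UNIT_RE = re.compile(r"RAS Error in ([^,]+), ERRSELR_EL1=(\d+)")
RAS_ADDR_RE = re.compile(r"ADDR\s*=\s*(0x[0-9a-f]+)")


def decode_esr(esr):
    """Split an AArch64 ESR_ELx value into its architectural fields."""
    return {
        "ec": (esr >> 26) & 0x3F,   # exception class, 0x2f = SError
        "il": (esr >> 25) & 0x1,    # instruction length bit
        "iss": esr & 0x1FFFFFF,     # instruction specific syndrome
    }


def summarize(log_text):
    """Collect the distinct error facts from a captured console log."""
    summary = {
        "serror_cpus": set(),      # CPUs that reported an SError
        "noc_errors": [],          # (transaction type, address) pairs
        "ras_units": Counter(),    # RAS unit name -> number of records
        "ras_addrs": set(),        # distinct ADDR values in RAS records
    }
    pending_type = None
    for line in log_text.splitlines():
        if m := SERROR_RE.search(line):
            summary["serror_cpus"].add(int(m.group(1)))
        elif m := NOC_TYPE_RE.search(line):
            # Remember the type until the matching Address line arrives.
            pending_type = m.group(1)
        elif m := NOC_ADDR_RE.search(line):
            summary["noc_errors"].append((pending_type, m.group(1)))
            pending_type = None
        elif m := RAS_UNIT_RE.search(line):
            summary["ras_units"][m.group(1)] += 1
        elif m := RAS_ADDR_RE.search(line):
            summary["ras_addrs"].add(m.group(1))
    return summary


if __name__ == "__main__":
    import sys

    # Usage: python summarize_log.py < captured_console.log
    result = summarize(sys.stdin.read())
    for key, value in result.items():
        print(key, value)
    print("esr 0xbe000000 ->", decode_esr(0xBE000000))
```

Run against the log above, this should reduce it to the two BPMP-NOC time-outs (a read at 0x1700020c and a write at 0xc190000) and the single RAS ADDR value that every record shares, which makes it easier to check whether later failed boots hit the same addresses.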
The power-up circuit uses the non-MCU solution recommended in the OEM documentation. The circuit diagram is attached:
power up master.pdf (140.2 KB)