Xavier A no longer boots

Xavier A no longer boots. Is it a hardware or software error? Please let us know whether it can be recovered by re-flashing the firmware.

OPTIONS: I18n                                                                
Compiled on Nov 15 2018, 20:18:47.                                           
Port /dev/ttyUSB2, 11:42:49                                                  
                                                                             
Press CTRL-A Z for help on special keys                                      
                                                                             
NVG write denied!
MMFLAGS=1
All ClockCycles offsets within tolerance
Starting Safety
NVG write denied!
MMFLAGS=1
nvbpmpivc: nvpm device registered
Starting pipe manager...
[    0.000000] bootconsole [uart8250] enabled
[    0.000000] OF: fdt:Reserved memory: failed to reserve memory for node 'fb0_carveout': base 0x0000000000000000, size 0 MiB
[    0.000000] OF: fdt:Reserved memory: failed to reserve memory for node 'fb0_carveout': bas
[    0.000000] OF: fdt:Reserved memory: failed to reserve memory for node 'fb1_carveout': base 0x000000000
[    0.000000] OF: fdt:Reserved memory: failed to reserve memory for node 'fb1_carveout': basB
[    0.000000] OF: fdt:Reserved memory: failed to reserve memory for node 'fb2_carveout': basB
[    0.000000] OF: fdt:Reserved memory: failed to reserve memory for node 'fb2_carveout': basB
[    0.000000] OF: fdt:Reserved memory: failed to reserve memory for node 'fb3_carveout': basB
[    0.000000] OF: fdt:Reserved memory: failed to reserve memory for node 'fb3_carveout': basB
[    0.000000] OF: reserved mem: initialized node generic_carveout, compatible id nvidia,genet
[    0.000000] OF: reserved mem: initialized node ramoops_carveout, compatible id nvidia,ramos
[    0.000000] OF: reserved mem: initialized node grid-of-semaphores, compatible id nvidia,gom
��t��0.000000] cm��n��a:��v�� ��b��R��p��es��m��er��p��v��i��e��v��d ��c��5��:��12��         �
��0�� ��e��[ ��r��  ��e�� ��d��0.��
      ��00000] psci: probing for conduit method from DT.
[    0.000000] psci: PSCIv1.0 detected in firmware.
[    0.000000] psci: Using standard PSCI v0.2 function IDs
[    0.000000] psci: MIGRATE_INFO_TYPE not supported.
[    0.000000] psci: SMC Calling Convention v1.0
[    0.000000] percpu: Embedded 23 pages/cpu @ffffffc6695a7000 s55424 r8192 d30592 u94208
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 6359724
��:��0.00000��S��0��t��]��a�� ��r��K��t��e��i��r��n��n��g��e�� ��l��p�� ��i��c��p�           �
��1�� �� fbcon=map:9 aurixfw=AFW root=/dev/vblkdev0p1 gpt rootwait ip=off rw gpt console=ttyS�
      ��412400000009fe8240 bl_debug_data=65536@0x7f80020000 earlycon=uart8250,mmio32,0x0c28000
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] Dentry cache hash table entries: 4194304 (order: 14, 67108864 bytes)
��o��0.000000] Inode-cache hash table entries: 2097152 (order: 12, 1��U��6��n��7��k��7��n�   �
��o�� ��n��[�� �� ��t�� ��y�� ��p�� ��e��0�� ��.��=��0�� ��0��0��0��x��0��0��0��0            �
      ��ry: 24794208K/25842688K available (11838K kernel code, 2238K rwdata, 4900K rodata, 53)
��y��0.000000] Virtual kernel memor��R��y ��u��l��n��a��n��y��i��o��n��u��g��t�� ��:��       �
�� �� ��s��[��t�� ��e�� ��m�� �� �� ��c��0��l��.��o��0��c��00��k��0�� ��0��p��0��            �
      ����platform_guest_warm_boot: guest warm reboot
��: ��C��0��u��x��r��ff��r��f��e��f��n��f��t��f�� ��8��c��0��l��0��o��0��c��0��k�            t
��o��-��d�� 0��: ��x��1��f��0��f��0��f��0��f�� ��f��u��f��s��8��e��0��c��0�� ��8�            �
��r��0��A��0��p��0��p�� ��l�� ��y�� ��i��(��n�� ��g�� �� �� ��c��1��l��2��o��8��c            �
��v�� ��i��[��o�� ��d�� �� �� ��a�� ��s��0��:��.�� ��0��1��0��0��0��0��0��0��0��0            �
      ��malloc : 0xffffff8008000000 - 0xffffffbebfff0000   (   250 GB)
[    0.000000]       .text : 0xffffff8008080000 - 0xffffff8008c10000   ( 11840 KB)
[    0.000000]     .rodata : 0xffffff8008c10000 - 0xffffff80090e0000   (  4928 KB)
[    0.000000]       .init : 0xffffff80090e0000 - 0xffffff8009610000   (  5312 KB)
[    0.000000]       .data : 0xffffff8009610000 - 0xffffff800983f808   (  2239 KB)
[    0.000000]        .bss : 0xffffff800983f808 - 0xffffff8009918068   (   867 KB)
[    0.000000]     fixed   : 0xffffffbefe7fd000 - 0xffffffbefec00000   (  4108 KB)
[    0.000000]     PCI I/O : 0xffffffbefee00000 - 0xffffffbeffe00000   (    16 MB)
[    0.000000]     vmemmap : 0xffffffbf00000000 - 0xffffffc000000000   (     4 GB maximum)
[    0.000000]               0xffffffbf00000000 - 0xffffffbf19a60000   (   410 MB actual)
�� ��0.000000]     memory  : 0��s��x��t��f��a��f��r��ff��t��f��in��fc��g��0�� ��0��Nv        �
��8��  ��f��d��f��ri��f��v��f��e��ff��r��c��.��6��.��6��.��9��
      ��00000   ( 26264 MB)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=6, Nodes=1
[    0.000000] Preemptible hierarchical RCU implementation.
[    0.000000]  Build-time adjustment of leaf fanout to 64.
[    0.000000]  RCU restricting CPUs from NR_CPUS=64 to nr_cpu_ids=6.
[    0.000000] R��/��C��d��U��e��:��v�� ��/��A��i��d��v��j��c��u��8��s��4��ting geom         6
��n��0.��E��0��n��0��a��0��b��0��li��0��n��0��g��]�� �� ��W��N��D��R��T��_�� d��I�           .
      ��r_��
��irqs:64 0
[    0.000000] arm_arch_timer: Architected cp15 timer(s) running at 31.25MHz (virt).
[    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0xe6a171046,s
[    0.000003] sched_clock: 56 bits at 31MHz, resolution 32ns, wraps every 4398046511088ns
[    0.000220] Console: colour dummy device 80x25
[    0.303840] kmemleak: Kernel memory leak detector disabled
[    0.303970] Calibrating delay loop (skipped), value calculated using timer frequency.. 62.)
[    0.303974] pid_max: default: 32768 minimum: 301
[    0.304342] Security Framework initialized
��s��0.304523] Mou��ma��n��i��t��n��-��: ��c��C��a��m��c��d��h��Re��e��s�� ��p��h��          �
��(order: 7, 524288 bytes)
[    0.304529] Mountpoint-cache hash table entries: 65536 (order: 7, 524288 bytes)
[    0.307830] Unable to find CPU node for /cpus/cpu@6
��t��0.340391] /��U��c��n��p��k��u��n��s��o��/��w��c��n ��p��t��u��t��-��c��m�� ��a          �
      �� get CPU for leaf core
[    0.346280] sched-energy: Sched-energy-costs installed from DT
[    0.352877] ASID allocator initialised with 65536 entries
[    0.407540] tegra-id: chipid=21917.
[    0.409746] tegra-id: opt_subrevision=0.
[    0.411883] Tegra Revision: A02 SKU: 0x90 CPU Process: 0 SoC Process: 0
[    0.415493] DTS File Name: /dvs/git/dirty/git-master_modular/kernel/kernel-4.9/arch/arm64/s
[    0.425933] DTB Build time: May  4 2019 05:43:24
starting nvsafety
nvsafety started
starting NvGuard_Layer_1
starting NvGuard_Layer0_Safety_Srv
starting L1SS
/dev/ivc94: No such file or directory
# main: CmdRespExec_L0_Init failed 
[    0.490645] CPU1: Booted secondary processor [4e0f0040]
[    0.536529] CPU2: Booted secondary processor [4e0f0040]
[    0.580263] CPU3: Booted secondary processor [4e0f0040]
[    0.625397] CPU4: Booted secondary processor [4e0f0040]
[    0.669829] CPU5: Booted secondary processor [4e0f0040]
[    0.670277] Brought up 6 CPUs
[    0.686632] SMP: Total of 6 processors activated.
[    0.689535] CPU features: detected feature: User Access Override
[    0.692733] CPU features: detected feature: 32-bit EL0 Support
[    0.699182] CPU: All CPU(s) started at EL1
[    0.702078] alternatives: patching kernel code
[    0.708579] devtmpfs: initialized
Starting dtree-nvhvnet
comms and security are not enabled!
Configuring Static IP on hvnet interface
[    0.756481] Initilizing CustomIPI irq domain
[    0.759516] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 76s
[    0.765016] futex hash table entries: 2048 (order: 6, 262144 bytes)
Unable to start "bootprofiler" (2)
[    0.773206] pinctrl core: initialized pinctrl subsystem
[    0.776462] regulator-dummy: no parameters
[    0.778799] Initializing plugin-manager
[    0.780863] Plugin module not found
[    0.782812] Plugin-manager not available
[    0.785979] tegra_hv: adding ivc242: rx_base=ffffff8009980000 tx_base = ffffff8009980880 s*
2995194|HV/c0: CPU:0, Error:CBBNOCAXI
2999781|HV/c0:       Error Logger            : 0
3004016|HV/c0:       ErrLog0                 : 0x80000000
3008688|HV/c0:         Transaction Type      : RD  - Read, Incrementing
3015392|HV/c0:         Error Code            : SLV
3019802|HV/c0:         Error Source          : Target
3024653|HV/c0:         Error Description     : Target error detected by CBB slave
3032327|HV/c0:         Packet header Lock    : 0
3037178|HV/c0:         Packet header Len1    : 0
3042030|HV/c0:         NOC protocol version  : version >= 2.7
3048208|HV/c0:       ErrLog1                 : 0x40000
3052616|HV/c0:       ErrLog2                 : 0x0
3056673|HV/c0:         RouteId               : 0x40000
3061172|HV/c0:         InitFlow              : aon_p2ps/I/aon
3066375|HV/c0:         Targflow              : gpu_p2pm/T/gpu_p2pm
3072020|HV/c0:         TargSubRange          : 0
3076431|HV/c0:         SeqId                 : 0
3080312|HV/c0:       ErrLog3                 : 0x0
data abort, halting
chronous external abort (dfsr=0x00001008), AXI slave error on read from 0x1700020c
3093985|HV/c0:       ErrLog4                 : 0x80
3098130|HV/c0:         debug using routeid alone as below address is a joker entry
3112685|HV/c0:       ErrLog5                 : 0x0
3116741|HV/c0:         Master ID             : (null)
3121328|HV/c0:         Non-Modify            : 0x0
3125739|HV/c0:         AXI ID                : 0x0
3129796|HV/c0:         Security Group(GRPSEC): 0x0
3135090|HV/c0:         Cache                 : 0x0 -- Non-cacheable/Non-Bufferable)
3142056|HV/c0:         Protection            : 0x0 -- Unprivileged, Secure, Data As
3149730|HV/c0:         FALCONSEC             : 0x0
3154052|HV/c0:         Virtual Queuing Channel(VQC): 0x0
3159875|HV/c0: **************************************
3165872|HV/c0: CBB-ERR: Access by non-CCPLEX master
3171694|HV/c0: CBB-ERR: Forwarding Error to Safety SW!
3177825|HV/c0: **************************************
3183778|HV/c0: CPU:0, Error:CBBNOCBPMP
3188455|HV/c0:       Error Logger            : 0
3192689|HV/c0:       ErrLog0                 : 0x80030000
3197361|HV/c0:         Transaction Type      : RD  - Read, Incrementing
3204065|HV/c0:         Error Code            : SLV
3208475|HV/c0:         Error Source          : Target
3213327|HV/c0:         Error Description     : Target error detected by CBB slave
3221000|HV/c0:         Packet header Lock    : 0
3225851|HV/c0:         Packet header Len1    : 3
3230703|HV/c0:         NOC protocol version  : version >= 2.7
3236879|HV/c0:       ErrLog1                 : 0xbba00
3241287|HV/c0:       ErrLog2                 : 0x0
3245345|HV/c0:         RouteId               : 0xbba00
3249843|HV/c0:         InitFlow              : aon_p2ps/I/aon
3255049|HV/c0:         Targflow              : RESERVED
3259723|HV/c0:         TargSubRange          : 93
3264221|HV/c0:         SeqId
3268103|HV/c0:       ErrLog3                 : 0x700020c
3272689|HV/c0:       ErrLog4                 : 0x0
3276749|HV/c0:         Address               : 0x700020c
3281510|HV/c0:       ErrLog5                 : 0xcfa30
3285920|HV/c0:         Master ID             : CCPLEX
3290506|HV/c0:         Non-Modify            : 0x1
3294917|HV/c0:         AXI ID                : 0x0
3298974|HV/c0:         Security Group(GRPSEC): 0x3e
3304355|HV/c0:         Cache                 : 0x0 -- Non-cacheable/Non-Bufferable)
3311323|HV/c0:         Protection            : 0x1 -- Privileged, Secure, Data Accs
3318820|HV/c0:         FALCONSEC             : 0x2
3323143|HV/c0:         Virtual Queuing Channel(VQC): 0x2
3328965|HV/c0: **************************************
3334997|HV/c0: CBB-ERR: Error address: 0x700020c not owned by any Guest
3342548|HV/c0: CBB-ERR: Forwarding Error to Safety SW!
BUG: /dvs/git/dirty/git-master_foundation/hypervisor-t194/src/plat/t194/serror/tegra_cbb:
r0  0x0000000a r1  0x00000001 r2  0x17000000 r3  0x00101000
01062e0: 5000c5f0 00000000 50106334 50077f00 |...P.....c.P...P|
0x501062f0: 50106304 00000001 0000001b 00000001 |.c.P............|
0x50106300: 50106334 50001d88 00000000 5007c700 |.c.P...P.......P|
0x50106310: 000000dd 500df4a8 500df228 5000c5f0 |.......P...P...P|
0x50106320: 00000000 500df4b0 500df4a8 500df228 |.......P...P...P|
call stack:
0x50078868
0x500776e8
0x50077a6c
0x50077f00
0x500058a8
0x50000b08
0x50000ad0
HALT: spinning forever...
Tegra FuSaState: FUSA_UNSAFE_STATE                                            
Tegra FuSaManager Error Log:                                                    
Error 0: 0x10018a9                                                              
Error 1: 0x1010000                                                              
Error 2: 0xffffffff                                                             
Error 3: 0xffffffff                                                             
Error 4: 0xffffffff                                                             
Error 5: 0xffffffff                                                             
Error 6: 0xffffffff                                                             
Error 7: 0xffffffff                                                             
Error 8: 0xffffffff                                                             
Error 9: 0xffffffff                                                             
Error 10: 0xffffffff                                                            
Error 11: 0xffffffff                                                            
Error 12: 0xffffffff                                                            
Error 13: 0xffffffff                                                            
                                                                                
HSM ERROR 0, BACKBONE_CONTROL2HSM_FUNCTIONAL_CRITICAL_ERR                       
HSM ERROR 42, GPU                                                               
HSM ERROR 169, BPMPNOC_HSM_FUNCTIONAL_ERR                                       
[The three HSM ERROR lines above repeat 15 more times.]
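
Because the capture mixes several UART streams and contains undecodable bytes, it can help to boil it down before reading it. The following is only a minimal sketch, not an NVIDIA tool: it assumes the log was saved as text in which bad bytes appear as the Unicode replacement character, and the function name `summarize` and the sample lines are made up for illustration (the sample lines are copied from the log above).

```python
import re
from collections import Counter

# Undecodable serial bytes typically show up as U+FFFD once the capture is
# decoded as text. This is an assumption about how the log was saved.
REPLACEMENT_CHAR = "\ufffd"

HSM_RE = re.compile(r"HSM ERROR (\d+), (\w+)")
CBB_RE = re.compile(r"CBB-ERR: (.+)")


def summarize(log_text):
    """Count HSM errors by (code, name) and collect distinct CBB-ERR messages."""
    hsm_counts = Counter()
    cbb_messages = []
    for raw_line in log_text.splitlines():
        line = raw_line.replace(REPLACEMENT_CHAR, "").strip()
        match = HSM_RE.search(line)
        if match:
            hsm_counts[(int(match.group(1)), match.group(2))] += 1
            continue
        match = CBB_RE.search(line)
        if match and match.group(1) not in cbb_messages:
            cbb_messages.append(match.group(1))
    return hsm_counts, cbb_messages


# Sample lines taken from the log above; a real run would read the whole file.
sample_log = "\n".join([
    "\ufffd\ufffd 3165872|HV/c0: \ufffd\ufffdCBB-ERR: Access by non-CCPLEX master",
    "\ufffd\ufffd 3171694|HV/c0: \ufffd\ufffdCBB-ERR: Forwarding Error to Safety SW!",
    "\ufffd\ufffd 3342548|HV/c0: \ufffd\ufffdCBB-ERR: Forwarding Error to Safety SW!",
    "HSM ERROR 0, BACKBONE_CONTROL2HSM_FUNCTIONAL_CRITICAL_ERR   ",
    "HSM ERROR 42, GPU",
    "HSM ERROR 169, BPMPNOC_HSM_FUNCTIONAL_ERR",
    "HSM ERROR 42, GPU",
])

hsm_counts, cbb_messages = summarize(sample_log)
for (code, name), count in sorted(hsm_counts.items()):
    print(f"HSM {code:>3} {name}: seen {count}x")
for message in cbb_messages:
    print("CBB:", message)
```

Running the same function over captures from a good boot and a failed boot would show which error codes appear only in the failing case.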

Dear jerryc_sf,

May I ask whether you have flashed the system to Drive Software 9.0?
If not, could you please try flashing Drive Software 9.0? Thanks.

It has been flashed to 9.0.

Xavier A is no longer bootable, while Xavier B boots fine. The error cleared by itself after we left the unit powered off for a day. However, it came back at random on a later power cycle (shutdown and then reboot). Please advise.

Dear jerryc_sf,

Could you please upload the full logs from the locations below for your topic?
~/.nvsdkm/sdkm.log
~/.nvsdkm/logs
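
If it helps, the two locations can be packed into one archive for upload. This is a rough sketch using only the Python standard library; the function name `bundle_sdkm_logs` and the archive name are hypothetical, and the demo runs against a throwaway directory with placeholder files rather than a real `~/.nvsdkm`.

```python
import tarfile
import tempfile
from pathlib import Path


def bundle_sdkm_logs(home_dir, archive_path):
    """Pack <home_dir>/.nvsdkm/sdkm.log and <home_dir>/.nvsdkm/logs into a .tar.gz.

    Returns the names that were found and added; missing entries are skipped.
    """
    base = Path(home_dir) / ".nvsdkm"
    added = []
    with tarfile.open(archive_path, "w:gz") as archive:
        for name in ("sdkm.log", "logs"):
            source = base / name
            if source.exists():
                archive.add(source, arcname=f".nvsdkm/{name}")
                added.append(name)
    return added


# Demo with placeholder files in a temporary directory (not real SDK Manager logs).
with tempfile.TemporaryDirectory() as tmp:
    fake_home = Path(tmp) / "home"
    (fake_home / ".nvsdkm" / "logs").mkdir(parents=True)
    (fake_home / ".nvsdkm" / "sdkm.log").write_text("placeholder\n")
    (fake_home / ".nvsdkm" / "logs" / "run.log").write_text("placeholder\n")
    archive_file = Path(tmp) / "sdkm-logs.tar.gz"
    added = bundle_sdkm_logs(fake_home, archive_file)
    with tarfile.open(archive_file) as archive:
        members = sorted(archive.getnames())

print(added)
print(members)
```

On the host PC the call would be `bundle_sdkm_logs(Path.home(), "sdkm-logs.tar.gz")`.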

I no longer have the log files. The unit was upgraded to software 9.0 and installed in the vehicle a while ago. Even if I still had the log files, they would show nothing wrong, since I know the upgrade to software 9.0 completed successfully.

The behavior is so weird. After a few good runs, Xavier A suddenly cannot boot right after a normal power-off. Rebooting does not help. Only if we wait until the next day and turn it back on does it recover by itself and boot fine again.

Could you please just check what is going on from the log I posted? It seems to be some sort of AXI bus error that triggered a security flag?
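
For reference, the `dfsr=0x00001008` value in the "data abort" line can be decoded by hand. The sketch below assumes the halting core reports an ARMv7 short-descriptor Data Fault Status Register; the log does not say which core printed the line, so that is an assumption, and only the two external-abort status codes are named here.

```python
# Fault status codes for the ARMv7 short-descriptor DFSR format.
# Only the external-abort entries are listed; everything else is "other".
FAULT_STATUS = {
    0b01000: "synchronous external abort",
    0b10110: "asynchronous external abort",
}


def decode_dfsr(dfsr):
    """Split a short-descriptor DFSR value into its main fields."""
    if dfsr & (1 << 9):
        # Bit 9 set means the long-descriptor (LPAE) layout, which differs.
        raise ValueError("long-descriptor DFSR format is not handled in this sketch")
    # FS is bit 10 (as FS[4]) combined with bits 3:0.
    fs = (((dfsr >> 10) & 1) << 4) | (dfsr & 0xF)
    return {
        "fs": fs,
        "fault": FAULT_STATUS.get(fs, "other"),
        "write": bool(dfsr & (1 << 11)),  # WnR: True for a write, False for a read
        "ext": bool(dfsr & (1 << 12)),    # ExT: implementation-defined abort type
    }


info = decode_dfsr(0x00001008)
print(info)
```

Under that assumption the value decodes to a synchronous external abort on a read, which is consistent with the "AXI slave error on read from 0x1700020c" text on the same log line.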

Dear jerryc_sf,

If the platform is in a vehicle, could you please check items #5 to #8 in the DRIVE AGX Developer Kit Hardware Errata document? Thanks.
The document is available at https://developer.nvidia.com/drive/documentation