AGX Orin cannot boot into the Ubuntu 22.04 desktop; root file system needs repair

The root file system on our NVIDIA AGX Orin is severely damaged, so the desktop is inaccessible. The affected partition is mmcblk0p1. How can this root file system be repaired?

The startup log is here:
0714_V01.log (81.2 KB)

[   10.129950] systemd[1]: Starting Coldplug All udev Devices...
[   10.187181] systemd[1]: Started Journal Service.
[   10.238385] tegra194-pcie 14160000.pcie: Phy link never came up
[   10.269414] nvgpu: 17000000.gpu          nvgpu_nvhost_syncpt_init:122  [INFO]  syncpt_unit_base 60000000 syncpt_unit_size 4000000 size 10000
[   10.269414] 
[   10.332295] systemd-journald[261]: Received client request to flush runtime journal.
U debug prints will be routed to traces.
[   10.655113] tegra-ivc-bus bc00000.rtcpu:ivc-bus:echo@0: ivc channel driver missing
[   10.655116] tegra-ivc-bus bc00000.rtcpu:ivc-bus:dbg@1: ivc channel driver missing
[   10.655118] tegra-ivc-bus bc00000.rtcpu:ivc-bus:dbg@2: ivc channel driver missing
[   10.655120] tegra-ivc-bus bc00000.rtcpu:ivc-bus:ivccontrol@3: ivc channel driver missing
[   10.655122] tegra-ivc-bus bc00000.rtcpu:ivc-bus:ivccapture@4: ivc channel driver missing
[   10.655124] tegra-ivc-bus bc00000.rtcpu:ivc-bus:diag@5: ivc channel driver missing
[   10.799403] (NULL device *): fops function table already registered
arting RmBootstrap
Registered event_type:[0] for dce_core_ipc_type:[1]
Registered event_type:[1] for dce_core_ipc_type:[3]
dce_ipc State Initialized
RmBootstrap completed successfully
[   11.206274] EXT4-fs error (device mmcblk0p1): ext4_validate_block_bitmap:420: comm ext4lazyinit: bg 52: bad block bitmap checksum
[   13.856250] CPU:0, Error: cbb-fabric@0x13a00000, irq=184
[   13.856259] **************************************
[   13.856260] CPU:0, Error:cbb-fabric, Errmon:2
[   13.856266]    Error Code            : TIMEOUT_ERR
[   13.856267]    Overflow              : Multiple TIMEOUT_ERR
[   13.921752] 
[   13.921753]    Error Code            : TIMEOUT_ERR
[   13.921754]    MASTER_ID             : CCPLEX
[   13.921755]    Address               : 0x3e90078
[   13.921756]    Cache                 : 0x1 -- Bufferable 
[   13.921758]    Protection            : 0x2 -- Unprivileged, Non-Secure, Data Access
[   13.921760]    Access_Type           : Read
[   13.921761]    Access_ID             : 0x16
[   13.921762]    Fabric                : cbb-fabric
[   13.921763]    Slave_Id              : 0x2e
[   13.921764]    Burst_length          : 0x0
[   13.921765]    Burst_type            : 0x1
[   13.921766]    Beat_size             : 0x2
[   13.921767]    VQC                   : 0x0
[   13.921768]    GRPSEC                : 0x7e
[   13.921769]    FALCONSEC             : 0x0
[   13.987300]    AXI2APB_28_BLOCK_TMO_STATUS : 0x2
[   13.987302]    AXI2APB_28_BLOCK1_TMO : 0x10000
[   13.987303]    AXI2APB_28_BLOCK1_TMO : 0x0
[   13.987304]    AXI2APB_28_BLOCK1_TMO : 0x0
[   13.987305]    AXI2APB_28_BLOCK1_TMO : 0x0
[   13.987306]    AXI2APB_28_BLOCK1_TMO : 0x0
[   13.987308]  **************************************
[   13.987334] WARNING: CPU: 0 PID: 0 at drivers/soc/tegra/cbb/tegra234-cbb.c:608 tegra234_cbb_isr+0x144/0x190
[   13.987569] ---[ end trace d4662edc795697c2 ]---
[   14.992500] CPU:0, Error: cbb-fabric@0x13a00000, irq=184
[   14.992502] **************************************
[   14.992502] CPU:0, Error:cbb-fabric, Errmon:2
[   14.992507]    Error Code            : TIMEOUT_ERR
[   14.992508]    Overflow              : Multiple TIMEOUT_ERR
[   14.992515] 
[   14.992515]    Error Code            : TIMEOUT_ERR
[   14.992516]    MASTER_ID             : CCPLEX
[   14.992517]    Address               : 0x3ed00a4
[   14.992517]    Cache                 : 0x1 -- Bufferable 
[   14.992518]    Protection            : 0x2 -- Unprivileged, Non-Secure, Data Access
[   14.992520]    Access_Type           : Write
[   14.992520]    Access_ID             : 0x6
[   14.992521]    Fabric                : cbb-fabric
[   14.992521]    Slave_Id              : 0x2e
[   14.992522]    Burst_length          : 0x0
[   14.992522]    Burst_type            : 0x1
[   14.992523]    Beat_size             : 0x2
[   14.992523]    VQC                   : 0x0
[   14.992524]    GRPSEC                : 0x7e
[   14.992524]    FALCONSEC             : 0x0
[   14.992526]    AXI2APB_28_BLOCK_TMO_STATUS : 0x2
[   14.992528]    AXI2APB_28_BLOCK1_TMO : 0x10000
[   14.992529]    AXI2APB_28_BLOCK1_TMO : 0x0
[   14.992529]    AXI2APB_28_BLOCK1_TMO : 0x0
[   14.992530]    AXI2APB_28_BLOCK1_TMO : 0x0
[   14.992531]    AXI2APB_28_BLOCK1_TMO : 0x0
[   14.992531]    AXI2APB_28_BLOCK1_TMO : 0x0
[   14.992532]    AXI2APB_28_BLOCK1_TMO : 0x0
[   14.992532]    AXI2APB_28_BLOCK1_TMO : 0x0
[   14.992534]  **************************************
[   14.992549] WARNING: CPU: 0 PID: 0 at drivers/soc/tegra/cbb/tegra234-cbb.c:608 tegra234_cbb_isr+0x144/0x190
[   14.992686] ---[ end trace d4662edc795697c3 ]---
[   17.315777] rt5640 8-001c: Device with ID register 0xffff0000 is not rt5640/39
Press Enter for maintenance
(or press Control-D to continue): [  316.382349] EXT4-fs (mmcblk0p1): error count since last fsck: 4045
[  316.382359] EXT4-fs (mmcblk0p1): initial error at time 1700600237: ext4_validate_block_bitmap:420
[  316.382376] EXT4-fs (mmcblk0p1): last error at time 1700600238: ext4_validate_block_bitmap:420
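
The ext4 lines at the end of the log carry the key facts: the error count since the last fsck, and the Unix timestamps of the first and last recorded errors. A minimal Python sketch, assuming the log has been saved as plain text, that extracts those lines and converts the timestamps to UTC. The function name `summarize_ext4_errors` and the regular expressions are illustrative only and are not part of any NVIDIA tool.

```python
import re
from datetime import datetime, timezone

# Patterns for the ext4 summary lines as they appear in the log above.
COUNT_RE = re.compile(r"EXT4-fs \((\S+)\): error count since last fsck: (\d+)")
TIME_RE = re.compile(r"EXT4-fs \((\S+)\): (initial|last) error at time (\d+): (\S+)")


def summarize_ext4_errors(log_text):
    """Return {device: {"count": int, "initial": (utc_iso, where), "last": (utc_iso, where)}}."""
    summary = {}
    for line in log_text.splitlines():
        m = COUNT_RE.search(line)
        if m:
            summary.setdefault(m.group(1), {})["count"] = int(m.group(2))
            continue
        m = TIME_RE.search(line)
        if m:
            device, which, epoch, where = m.groups()
            # The kernel reports seconds since the Unix epoch; convert to UTC.
            when = datetime.fromtimestamp(int(epoch), tz=timezone.utc).isoformat()
            summary.setdefault(device, {})[which] = (when, where)
    return summary


# The three summary lines copied from the log above.
SAMPLE = """\
[  316.382349] EXT4-fs (mmcblk0p1): error count since last fsck: 4045
[  316.382359] EXT4-fs (mmcblk0p1): initial error at time 1700600237: ext4_validate_block_bitmap:420
[  316.382376] EXT4-fs (mmcblk0p1): last error at time 1700600238: ext4_validate_block_bitmap:420
"""

if __name__ == "__main__":
    for device, info in summarize_ext4_errors(SAMPLE).items():
        print(device, info)
```

For this log, both timestamps decode to 2023-11-21 at about 20:57 UTC, one second apart.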

Please reflash the board.

We have used dozens of AGX Orin devices so far, and only one has hit this issue. We are not sure whether others will. Is there a way to fix it without reflashing? Flashing destroys the entire root file system, which contains user programs.

  1. Can we power on again, enter a repair mode or recovery mode, and then use commands to repair partition mmcblk0p1?

  2. How can we periodically check and repair the root file system to prevent this issue and reduce the chance of failure?

Thank you

No, recovery mode is only for flashing. You cannot fix anything at this stage.

  1. How can we periodically check and repair the root file system to prevent this issue and reduce the possibility of failure.

Maybe run the e2fsck tool periodically when the system reboots.
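
One common way to get a periodic check at reboot on ext4 is to set a maximum mount count and a maximum interval with `tune2fs` (`-c` and `-i`), so the file system is marked as due for a check. A minimal Python sketch that builds such a command and only prints it by default. The helper names `periodic_fsck_command` and `apply`, and the thresholds of 20 mounts and one month, are assumptions for illustration. Whether the boot sequence on a Jetson actually runs e2fsck on the root partition when a check is due is also an assumption and should be verified on the device.

```python
import shlex
import subprocess


def periodic_fsck_command(device, max_mounts=20, interval="1m"):
    """Build a tune2fs call: -c sets the max mount count, -i the max time between checks.

    The interval takes a d (days), w (weeks) or m (months) suffix, e.g. "2w".
    """
    if max_mounts < 1:
        raise ValueError("max_mounts must be at least 1")
    return ["tune2fs", "-c", str(max_mounts), "-i", interval, device]


def apply(command, dry_run=True):
    """Print the command; execute it only when dry_run is False (requires root)."""
    print(shlex.join(command))
    if not dry_run:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Dry run: shows the command without touching the device.
    apply(periodic_fsck_command("/dev/mmcblk0p1"))
```

These settings only mark the file system as due for a check; the check itself is still performed by e2fsck, and only if the boot sequence invokes it.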

The root file system is currently mounted as "/dev/mmcblk0p1 on / type ext4 (rw,relatime)". Do we need to remount it read-only before running the repair? If the repair has to run periodically, we cannot remount it read-only each time, right? That could affect running programs.
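
The e2fsck manual page warns that it is in general not safe to run e2fsck on a mounted file system, so a repair script would normally check the mount state first. A minimal Python sketch, assuming the device appears under its own name in `/proc/mounts` as in the mount line quoted above. The function name `mount_state` and the sample text are illustrative.

```python
def mount_state(proc_mounts_text, device):
    """Classify a device as "unmounted", "read-only" or "read-write".

    Each /proc/mounts line has the fields: device, mount point, type, options, dump, pass.
    """
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == device:
            options = fields[3].split(",")
            return "read-only" if "ro" in options else "read-write"
    return "unmounted"


# Sample mirroring the mount line quoted in this thread.
SAMPLE_MOUNTS = "/dev/mmcblk0p1 / ext4 rw,relatime 0 0\n"

if __name__ == "__main__":
    with open("/proc/mounts") as f:
        state = mount_state(f.read(), "/dev/mmcblk0p1")
    print("/dev/mmcblk0p1 is", state)
    if state == "read-write":
        print("Do not run a repairing e2fsck now; the file system is mounted read-write.")
```

With the root partition mounted read-write, as it is on a running system, a repairing e2fsck pass is not appropriate, so periodic checks are usually scheduled for boot time rather than run against the live system.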

Is it possible to boot from an external Ubuntu 22.04 boot disk, choose the “Try Ubuntu” option, and then repair the mmcblk0p1 partition from there?

Can this NVIDIA Ubuntu 22.04 root file system boot into emergency (rescue) mode or single-user mode so that the mmcblk0p1 partition can be repaired from there? If so, how do we enter that mode? Are there any other ways to repair the mmcblk0p1 partition? Thank you!
