Did I kill it?

canutethegreat · June 17, 2022, 12:48am

Hello,

I bought a Jetson AGX Orin Dev Kit and got it set up this morning. Plugged it in and went through the setup process. Everything seemed to be fine for a few hours. Then while installing some deb packages (nothing special) I saw this in the console:

[kernel:[ 1137.626092] BUG: workqueue lockup - pool cpus=10 node=0 flags=0x0 nice=0 stuck for XX

Where XX is a number. This repeated several times and then the machine froze. I let it sit for about 30 minutes before I decided it wasn’t coming back. I removed the power cord, let it sit a few minutes, and then plugged it back in. It powers on, the power light is white, and I can hear the fan run once in a while. I get nothing on the attached display and the network never comes up.
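For anyone who wants to pull these lockup messages out of a saved kernel log, here is a minimal sketch. It assumes the lines look like the one quoted above and that the "stuck for" value is a number of seconds, optionally followed by "s"; the exact format may differ between kernel versions. `LOCKUP_RE` and `find_lockups` are names made up for this example.

```python
import re

# Pattern modelled on the message quoted in this post. Assumption: the
# "stuck for" value is a number of seconds, optionally followed by "s".
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: workqueue lockup - pool "
    r"cpus=(?P<cpus>[\d,-]+).*?stuck for (?P<secs>\d+)s?"
)


def find_lockups(log_text):
    """Return (timestamp, cpus, stuck_seconds) for each lockup line found."""
    hits = []
    for line in log_text.splitlines():
        match = LOCKUP_RE.search(line)
        if match:
            hits.append(
                (
                    float(match.group("ts")),
                    match.group("cpus"),
                    int(match.group("secs")),
                )
            )
    return hits
```

Feeding it the text of a saved `dmesg` or `journalctl -k` dump shows which CPU pool was stuck and whether the stall time kept growing before the freeze.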

I tried connecting the supplied USB-C to USB-A cable to the USB-C port described in "Getting Started with Jetson AGX Orin Developer Kit | NVIDIA Developer" to access the debug console. However, either nothing shows up, it shows up as a disk (for a few minutes), or it shows up as an unknown device in an error state. I can hear the fan come on periodically, so I don’t know whether it is running in an error state or stuck in a boot loop.

Any suggestions/ideas? Thanks!

DaneLLL · June 17, 2022, 6:21am

Hi,
Please connect the Orin to a host PC, put it into recovery mode, and check whether the device shows up on the host PC (in the output of the lsusb command). If you can see the device, please then try to re-flash it through SDK Manager.
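As a rough illustration of the lsusb check, the sketch below filters lsusb output on the host PC for NVIDIA's USB vendor ID (0955). It does not check for a specific product ID, since that varies by module, so any match should still be compared against NVIDIA's documentation. `nvidia_devices` and `check_host` are hypothetical helper names, and `check_host` assumes the `lsusb` tool (usbutils) is installed on the host.

```python
import re
import subprocess

NVIDIA_VENDOR_ID = "0955"  # NVIDIA's USB vendor ID

# Matches the usual lsusb line layout:
#   Bus 003 Device 007: ID vvvv:pppp Description text
LSUSB_RE = re.compile(
    r"Bus (\d+) Device (\d+): ID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\s*(.*)"
)


def nvidia_devices(lsusb_output):
    """Return (product_id, description) for each NVIDIA device listed."""
    found = []
    for line in lsusb_output.splitlines():
        match = LSUSB_RE.match(line.strip())
        if match and match.group(3).lower() == NVIDIA_VENDOR_ID:
            found.append((match.group(4).lower(), match.group(5).strip()))
    return found


def check_host():
    """Run lsusb on the host PC and return any NVIDIA devices it reports."""
    result = subprocess.run(
        ["lsusb"], capture_output=True, text=True, check=True
    )
    return nvidia_devices(result.stdout)
```

An empty list from `check_host()` would suggest the board is not in recovery mode or the cable/port is not the right one; a non-empty list means the host at least sees an NVIDIA device.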

canutethegreat · June 17, 2022, 3:22pm

Okay, update: if I leave it “on” for 7 or 8 hours (not exactly sure) it will eventually get to the desktop. I’ve successfully done this twice now. It is a bit sluggish and gnome-shell starts slowly eating more and more CPU time. If I log out of the GUI and log in remotely (SSH), the gnome-shell usage disappears and the system load seems to idle between 3.0 and 4.0 - which seems a bit high for idle to me. The only errors that I can find are:
systemctl --failed reports:
● nv_update_verifier.service loaded failed failed nv_update_verifier service

systemctl status nv_update_verifier.service reports:
● nv_update_verifier.service - nv_update_verifier service
Loaded: loaded (/etc/systemd/system/nv_update_verifier.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2022-06-16 18:22:29 PDT; 13h ago
Process: 1346 ExecStart=/usr/sbin/nv_update_engine --verify (code=exited, status=1/FAILURE)
Main PID: 1346 (code=exited, status=1/FAILURE)

Jun 16 18:22:29 horcrux nv_update_engine[1346]: Init SMD partition failed!
Jun 16 18:22:29 horcrux nv_update_engine[1346]: verifying update
Jun 16 18:22:29 horcrux nv_update_engine[1346]: Verify bootloader update begins.
Jun 16 18:22:29 horcrux nv_update_engine[1346]: Unable to find kernel cmdline paramater boot.slot_suffix=
Jun 16 18:22:29 horcrux nv_update_engine[1346]: Error: Verify bootloader update failed!
Jun 16 18:22:29 horcrux nv_update_engine[1346]: Verify rootfs update begins.
Jun 16 18:22:29 horcrux nv_update_engine[1346]: Fail to open metadata file
Jun 16 18:22:29 horcrux nv_update_engine[1346]: RootFS A/B is not enabled, verification finishes.
Jun 16 18:22:29 horcrux systemd[1]: nv_update_verifier.service: Main process exited, code=exited, status=1/FAILURE
Jun 16 18:22:29 horcrux systemd[1]: nv_update_verifier.service: Failed with result 'exit-code'.

dmesg is full of messages like this:
start_addr=(0x20000), end_addr=(0x40000), buffer_size=(0x20000), smp_number_max=(16384)

I do not see any other messages/errors.
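To put the 3.0 to 4.0 load figure mentioned above in context, one option is to divide the load average by the number of CPU cores. The sketch below is only a rough aid, not a diagnosis: `load_per_core` and `report` are names invented for this example, and on Linux the load average also counts tasks in uninterruptible sleep, so a raised value on an otherwise idle system may point at stuck I/O or driver work rather than CPU use.

```python
import os


def load_per_core(load_avg, cpu_count):
    """Normalise a load average by the number of CPU cores."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_avg / cpu_count


def report():
    """Return the 1/5/15-minute load averages and their per-core values."""
    one, five, fifteen = os.getloadavg()  # available on Unix-like systems
    cores = os.cpu_count() or 1  # cpu_count() can return None
    return {
        "cores": cores,
        "load": (one, five, fifteen),
        "per_core": tuple(
            load_per_core(value, cores) for value in (one, five, fifteen)
        ),
    }
```

For example, `load_per_core(3.0, 12)` gives 0.25, where 12 cores is an assumption about this dev kit; `report()` uses whatever `os.cpu_count()` returns on the machine it runs on.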

Is there anything that can or should be done while it is up and running? Or should I just try to get into recovery mode and re-flash?

Thank you!

system · July 1, 2022, 3:23pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.