Jetson Nano crashes after boot

I have a Jetson Nano Developer Kit (B01) that I use for computer vision (CV) workloads. All of the CV workloads run as Docker containers; apart from these, the only other service running on the device is an OpenVPN client.

The device now crashes continuously, even when I run simple commands like ls, tail or cat. When it crashes, the Nano becomes unresponsive and then restarts. Tried a different SD Card with this device and it works fine.

How do I debug this? I am unable to print syslog or dmesg either. Any help is greatly appreciated.

Thanks in advance

Hi,

What kind of “crash” do you see if you run “ls”? It sounds like the root file system is corrupted.

Nah, I don’t think root file system is corrupted. At least I’m able to check the remaining space on disk and free memory and swap details

What kind of error do you see when you run the commands “ls”, “cat” or “tail”?

I should have been clearer. The whole system crashes: nothing responds, and then the Nano restarts.

Are you able to capture the serial console log when the error happens?

https://elinux.org/Jetson/General_debug
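
For anyone who wants to keep the console output in a file rather than watch it in a terminal program, here is a minimal sketch of a host-side logger. It assumes the third-party pyserial package is installed, that the USB-to-TTL adapter appears as /dev/ttyUSB0 on the host (a guess, check your own system), and that the console runs at 115200 baud, 8N1. The function names and the output file name are made up for illustration.

```python
import time


def stamp(line: bytes, now: float) -> str:
    """Decode one raw console line and prefix it with a host-side timestamp."""
    text = line.decode("utf-8", errors="replace").rstrip("\r\n")
    return f"{time.strftime('%H:%M:%S', time.localtime(now))} {text}"


def capture(port: str = "/dev/ttyUSB0", out_path: str = "serial_console.log") -> None:
    """Append everything printed on the debug UART to out_path until interrupted.

    The port name and output path are assumptions; adjust them for your host.
    """
    import serial  # third-party: pip install pyserial

    # pyserial defaults to 8 data bits, no parity, 1 stop bit.
    with serial.Serial(port, baudrate=115200, timeout=1) as console, \
            open(out_path, "a", encoding="utf-8") as log:
        while True:  # stop with Ctrl+C
            line = console.readline()  # returns b"" when the 1 s timeout expires
            if line:
                log.write(stamp(line, time.time()) + "\n")
                log.flush()  # keep the file current even if the capture is cut short
```

Start capture() on the host before powering the Nano, so the log covers everything from boot to the reboot.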

Is there any other way to do it? I don’t have access to a PL2303HX USB-to-TTL serial cable.

Sorry, but I think this is the only way to gather a detailed log.

I finally got UART access. This is what is printed on the serial console:

[   39.304927] mmc0: Data timeout error
[   39.308607] sdhci: =========== REGISTER DUMP (mmc0)===========
[   39.314525] sdhci: Sys addr: 0x00000400 | Version:  0x00000303
[   39.320426] sdhci: Blk size: 0x00007200 | Blk cnt:  0x00000338
[   39.326320] sdhci: Argument: 0x03c51b48 | Trn mode: 0x0000003b
[   39.332215] sdhci: Present:  0x01fb0000 | Host ctl: 0x00000017
[   39.338107] sdhci: Power:    0x00000001 | Blk gap:  0x00000000
[   39.344000] sdhci: Wake-up:  0x00000000 | Clock:    0x00000007
[   39.349891] sdhci: Timeout:  0x0000000e | Int stat: 0x00000000
[   39.355785] sdhci: Int enab: 0x02ff100b | Sig enab: 0x02fc100b
[   39.361677] sdhci: AC12 err: 0x00000000 | Slot int: 0x00000000
[   39.367570] sdhci: Caps:     0x376cd08c | Caps_1:   0x10006f73
[   39.373462] sdhci: Cmd:      0x0000123a | Max curr: 0x00000000
[   39.379349] sdhci: Host ctl2: 0x0000308b
[   39.383337] sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x00000000ffefe420
[   39.389960] sdhci: ===========================================
[   39.397021] mmcblk0: error -110 transferring data, sector 63249224, nr 1024, cmd response 0x900, 0
[   50.854416] mmc0: Data timeout error
[   50.858116] sdhci: =========== REGISTER DUMP (mmc0)===========
[   50.864033] sdhci: Sys addr: 0x00000400 | Version:  0x00000303
[   50.869935] sdhci: Blk size: 0x00007200 | Blk cnt:  0x00000324
[   50.875832] sdhci: Argument: 0x03c51b48 | Trn mode: 0x0000003b
[   50.881724] sdhci: Present:  0x01fb0000 | Host ctl: 0x00000017
[   50.887618] sdhci: Power:    0x00000001 | Blk gap:  0x00000000
[   50.893492] sdhci: Wake-up:  0x00000000 | Clock:    0x00000007
[   50.899332] sdhci: Timeout:  0x0000000e | Int stat: 0x00000000
[   50.905156] sdhci: Int enab: 0x02ff100b | Sig enab: 0x02fc100b
[   50.910993] sdhci: AC12 err: 0x00000000 | Slot int: 0x00000000
[   50.916863] sdhci: Caps:     0x376cd08c | Caps_1:   0x10006f73
[   50.922744] sdhci: Cmd:      0x0000123a | Max curr: 0x00000000
[   50.928602] sdhci: Host ctl2: 0x0000300b
[   50.932548] sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x00000000ffefe420
[   50.939148] sdhci: ===========================================

The above messages repeat several times and then the device restarts. I suspect something is wrong with some sectors on the SD card, but I’m not sure how to work out what is causing it or how to fix it.

I am attaching a file with the complete log:
start_to_reboot.txt (41.6 KB)
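
To get an overview of a long capture like this, the repeated messages can be summarized with a short script. This is only a sketch: it assumes the log lines have the same layout as the excerpt above, relies on the kernel reporting sectors in 512-byte units, and hard-codes the Linux errno numbers it names. The function names are made up for illustration.

```python
import re
from collections import Counter

# Linux errno numbers (hard-coded because they differ on other operating systems).
ERRNO_NAMES = {5: "EIO", 84: "EILSEQ", 110: "ETIMEDOUT"}
SECTOR_SIZE = 512  # the kernel block layer counts sectors in 512-byte units

BLOCK_ERROR = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+(?P<dev>mmcblk\d+): error (?P<errno>-?\d+) "
    r"transferring data, sector (?P<sector>\d+), nr (?P<count>\d+)"
)


def parse_block_errors(log_text):
    """Return one dict per 'mmcblkN: error ... transferring data' line."""
    events = []
    for match in BLOCK_ERROR.finditer(log_text):
        code = abs(int(match["errno"]))
        sector = int(match["sector"])
        events.append({
            "time": float(match["time"]),  # seconds since boot
            "device": match["dev"],
            "errno": ERRNO_NAMES.get(code, str(code)),
            "sector": sector,
            "byte_offset": sector * SECTOR_SIZE,
            "sectors_requested": int(match["count"]),
        })
    return events


def summarize(log_text):
    """Count controller data timeouts and failed transfers per starting sector."""
    return {
        "data_timeouts": len(re.findall(r"mmc\d+: Data timeout error", log_text)),
        "failures_by_sector": Counter(e["sector"] for e in parse_block_errors(log_text)),
    }
```

For the excerpt above, this reports two data timeouts and one failed transfer starting at sector 63249224 (byte offset 32383602688) with errno 110, which is ETIMEDOUT on Linux. Running it over the full attached log would show whether the failures cluster around the same sectors or are spread across the card.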

That error is from the SD card driver.

Tried a different SD Card with this device and it works fine.

If this issue can be resolved by using a different SD card, I don’t think it is a hardware defect on the Nano.

Is it possible for you to format this card and reflash it, either with the SD card image or through SDK Manager?

I can do this, but that wouldn’t help me understand why this happens or help me avoid such situations in the future. Is there any way for me to find out what is happening?

I cannot tell either. If this issue can be reproduced easily with specific steps, we can try it on our device and investigate.

However, so far it seems that even you don’t know what led to the crash.
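
One way to gather more evidence before reformatting the card is to read it end to end on a separate Linux machine and note which regions cannot be read. The sketch below is not something suggested in this thread; it is a generic read-only scan. The function name is made up, the device path you pass in (something like /dev/sdX) depends on your card reader, reading a raw device usually needs root, and os.pread is only available on Unix-like systems.

```python
import os


def scan_readable(path, chunk_size=1024 * 1024, limit=None):
    """Read `path` sequentially and return (bytes_read_ok, bad_offsets).

    The target is opened read-only, so nothing is written to the card.
    `path` can be a block device or an ordinary image file.
    """
    bad_offsets = []
    bytes_ok = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # total size in bytes
        if limit is not None:
            size = min(size, limit)  # optionally scan only the first `limit` bytes
        offset = 0
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                got = len(os.pread(fd, want, offset))
            except OSError:
                bad_offsets.append(offset)  # this chunk could not be read
                got = 0
            bytes_ok += got
            offset += want  # always advance, so one bad chunk does not stop the scan
    finally:
        os.close(fd)
    return bytes_ok, bad_offsets
```

If the scan completes with no bad offsets in a USB card reader, that would point away from unreadable regions on the card itself, although it would not rule out a problem that only appears at the transfer speeds the Nano uses.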