Jetson Nano and SD corruption

Hi!

I am opening this topic because I have been experiencing SD card corruption quite often.
I have previous experience with the Raspberry Pi, and I am aware that running a Linux image from an SD card is quite delicate. This is especially true when it comes to providing a stable power supply.
I am currently powering my Jetson Nano from the 5V line of a standard ATX power supply (15A on the 5V line).

I am also very careful to shut the board down correctly.

However, I am using the Jetson for a 24/7 object detection task on 2 RTSP streams.
That is my target, but I have been experiencing file system corruption after 4-5 hours of running time.

Now, I wonder if:

  • running the Jetson Nano from an SSD through the USB3.0 port would help the situation [https://www.youtube.com/watch?v=J9EJ52Za7IE]
  • mounting the ext4 partition as read-only would still allow my setup to operate [RabbitMQ + Node-RED + deepstream-test3-app]

I hope you can give me some suggestions on how to avoid SD card corruption in the future.
Thanks!

Where do you get the uSD cards from?

Hi,

I have tried a few brands (lately Kingston and Netac).

The fact is that I plan to leave the system on 365 days a year. I am not sure that an SD card is the right storage device. My application does not need to store anything during operation, but I am not sure what the OS writes in the background (and I am also not sure about the storage activity of the AMQP broker).

I had the same problem with Armbian on a couple of Orange Pi boards that I use to control my two 3D printers. I solved the SD corruption problem by configuring the system to run with the partition mounted read-only.
I have had no problems since then, even though I turn off the Orange Pis by unplugging the power supply.

@nvidiadev1 do you have any suggestions?

Thanks again!

  1. If you plan to keep the system on all the time, you need to do something to reduce writes to the SD card. If your programs do not need to write to the SD card often, most of the writing on Linux comes from log files. Consider moving the log folder into a ramdisk. Alternatively, you can run the whole Linux OS from a ramfs, which may need more work and testing.

  2. Some programs create cache files on disk and write to them frequently, for example OpenCV and some other AI-related programs. Be careful with these programs and, if needed, use a ramdisk for them as well.

  3. Even though SD cards are not designed for continuous operation, they can still be used for quite a long time without any issue, especially if you use a larger SD card (128GB+) or an endurance SD card designed for video recording. A card should not become corrupted after a few hours of use. I strongly suspect that the SD cards you used are counterfeit.

  4. The power supply may not be the main issue, since the power for the SD card is stepped down on the Jetson Nano itself.
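
As a rough way to verify that the suggestions above took effect, the sketch below parses mount-table text in the format of /proc/mounts and reports whether a path is RAM-backed or read-only. The paths /tmp and /var/log, the helper names, and the sample text are assumptions for illustration, not something taken from this thread.

```python
from typing import Dict, List, Tuple

# Hypothetical sample in /proc/mounts format; on the device, read the real file.
SAMPLE_MOUNTS = """\
/dev/mmcblk0p1 / ext4 rw,relatime,data=ordered 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev 0 0
"""

Mounts = Dict[str, Tuple[str, List[str]]]


def parse_mounts(text: str) -> Mounts:
    """Map each mount point to (filesystem type, list of mount options)."""
    mounts: Mounts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        mounts[mount_point] = (fs_type, options.split(","))
    return mounts


def is_ram_backed(mounts: Mounts, path: str) -> bool:
    """True only if `path` is itself a tmpfs or ramfs mount point."""
    fs_type, _options = mounts.get(path, ("", []))
    return fs_type in ("tmpfs", "ramfs")


def is_read_only(mounts: Mounts, path: str) -> bool:
    """True if `path` is a mount point carrying the `ro` option."""
    _fs_type, options = mounts.get(path, ("", []))
    return "ro" in options


mounts = parse_mounts(SAMPLE_MOUNTS)
for path in ("/tmp", "/var/log"):
    print(path, "RAM-backed:", is_ram_backed(mounts, path))
print("/ read-only:", is_read_only(mounts, "/"))
```

On the board itself, the text would come from `open("/proc/mounts").read()` instead of the sample string. A directory that is not a separate mount point is reported as not RAM-backed, which is the conservative answer here.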

Thanks @sunxishan for your message.

Do you think that running the Jetson Nano from a 256GB SSD through USB3.0 port might solve the issue?
[https://www.jetsonhacks.com/2019/09/17/jetson-nano-run-from-usb-drive/]

Thanks again!

Seeing corruption after 4-5 hours suggests to me that there is something else wrong here, not just the finite life of uSD cards.

Yes, you can mount things read-only and use other storage. You should also check what disk activity is happening, and consider putting /tmp in a ramdisk and perhaps also /var/log. (The problem with having /var/log in RAM is that you want to see the logs after a crash; it would be just as useful IMO to have no logs at all.)
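
One possible way to check the disk activity, sketched here under assumptions: take two snapshots of /proc/diskstats and compare the "sectors written" counter for the card. The device name mmcblk0 is inferred from the kernel messages elsewhere in this thread, and the function names are made up for the example.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats counts 512-byte sectors regardless of the device


def sectors_written(diskstats_text: str, device: str) -> int:
    """Return the cumulative 'sectors written' counter for one block device."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Layout: major, minor, name, then the I/O counters;
        # 'sectors written' is the 10th column (index 9).
        if len(fields) >= 10 and fields[2] == device:
            return int(fields[9])
    raise KeyError(f"device {device!r} not found in diskstats")


def bytes_written_between(before: str, after: str, device: str) -> int:
    """Bytes written to `device` between two diskstats snapshots."""
    delta = sectors_written(after, device) - sectors_written(before, device)
    return delta * SECTOR_BYTES


def measure(device: str = "mmcblk0", interval_s: float = 60.0) -> int:
    """Sample /proc/diskstats twice and return the bytes written in between."""
    with open("/proc/diskstats") as f:
        before = f.read()
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        after = f.read()
    return bytes_written_between(before, after, device)
```

Calling `measure()` while the application is running gives a figure for writes per minute. A value that stays high while the application is supposedly idle would point at logs or caches.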

But I guess most people here get MUCH longer than 4-5 hours from a card before it starts to fail; if you were buying absolute-cheapest cards from eBay I would say that was the problem, but it sounds like you’re using at least somewhat respectable brands. So maybe there is some other problem. Is your power supply stable? Are there any clues in syslog?

(What size uSD cards are you using? In my experience with USB flash sticks, the unreliable one was the one that was very large by the standards of the time.)

You could try to put rootfs on external storage.

https://elinux.org/Jetson/L4T/Boot_From_External_Device

No. I don’t think so.

Check your programs and OS first to see what is writing to your disk frequently. If you really need heavy disk writes, you can consider using our PCIe-to-SATA solution with a spinning hard drive (HDD).
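
To find which programs are writing, one option is to read the `write_bytes` counter that Linux exposes per process in /proc/<pid>/io. The sketch below is only an illustration with made-up function names; reading the counters of other users' processes normally requires root.

```python
import os
from typing import List, Tuple


def parse_write_bytes(io_text: str) -> int:
    """Extract the write_bytes counter from the text of /proc/<pid>/io."""
    for line in io_text.splitlines():
        key, _sep, value = line.partition(":")
        if key.strip() == "write_bytes":
            return int(value)
    return 0


def top_writers(proc_root: str = "/proc", limit: int = 10) -> List[Tuple[int, int, str]]:
    """Return (bytes written, pid, command name) tuples, largest writers first."""
    results = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "io")) as f:
                written = parse_write_bytes(f.read())
            with open(os.path.join(proc_root, entry, "comm")) as f:
                name = f.read().strip()
        except OSError:
            continue  # process exited, or we lack permission to read it
        results.append((written, int(entry), name))
    return sorted(results, reverse=True)[:limit]
```

Running `top_writers()` as root a few minutes apart and comparing the numbers should show whether the broker, the desktop session, or the detection application is responsible for most of the writes.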

A USB-based external hard drive (HDD) is another option, but based on our testing, the stability of USB data transfer over long-term running is not good. You also need to consider whether the USB port can supply enough power for the HDD.

Thanks @sunxishan , @nvidiadev1 @WayneWWW.

I was just checking the syslog, and I believe something interesting is logged there.
First of all, this is what was logged before the crash:

Aug 14 14:29:33 jnano-desktop kernel: [20951.970291] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #782237: comm ThreadPoolForeg: No space for directory leaf checksum. Please run e2fsck -D.
Aug 14 14:29:33 jnano-desktop kernel: [20951.993920] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #782237: comm ThreadPoolForeg: No space for directory leaf checksum. Please run e2fsck -D.
Aug 14 14:29:33 jnano-desktop kernel: [20952.015255] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #782237: comm ThreadPoolForeg: No space for directory leaf checksum. Please run e2fsck -D.
Aug 14 14:29:33 jnano-desktop kernel: [20952.033880] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #782237: comm ThreadPoolForeg: No space for directory leaf checksum. Please run e2fsck -D.
Aug 14 14:29:33 jnano-desktop kernel: [20952.052526] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #782237: comm ThreadPoolForeg: No space for directory leaf checksum. Please run e2fsck -D.
Aug 14 14:29:33 jnano-desktop kernel: [20952.070419] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #782237: comm ThreadPoolForeg: No space for directory leaf checksum. Please run e2fsck -D.
Aug 14 14:29:33 jnano-desktop kernel: [20952.088838] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #782237: comm ThreadPoolForeg: No space for directory leaf checksum. Please run e2fsck -D.
Aug 14 14:29:33 jnano-desktop kernel: [20952.106888] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #782237: comm ThreadPoolForeg: No space for directory leaf checksum. Please run e2fsck -D.
Aug 14 14:29:33 jnano-desktop kernel: [20952.125841] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #782237: comm ThreadPoolForeg: No space for directory leaf checksum. Please run e2fsck -D.
Aug 14 14:29:33 jnano-desktop kernel: [20952.143954] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #782237: comm ThreadPoolForeg: No space for directory leaf checksum. Please run e2fsck -D.
Aug 14 14:29:35 jnano-desktop kernel: [20953.441030] EXT4-fs error: 1748 callbacks suppressed
Aug 14 14:29:35 jnano-desktop kernel: [20953.441041] EXT4-fs error (device mmcblk0p1): htree_dirblock_to_tree:963: inode #782237: block 31: comm ThreadPoolForeg: Directory block failed checksum
Aug 14 14:29:35 jnano-desktop kernel: [20953.462450] EXT4-fs error (device mmcblk0p1): htree_dirblock_to_tree:963: inode #782237: block 31: comm ThreadPoolForeg: Directory block failed checksum
Aug 14 14:29:35 jnano-desktop kernel: [20953.480005] EXT4-fs error (device mmcblk0p1): htree_dirblock_to_tree:963: inode #782237: block 31: comm ThreadPoolForeg: Directory block failed checksum
Aug 14 14:29:35 jnano-desktop kernel: [20953.496381] EXT4-fs error (device mmcblk0p1): htree_dirblock_to_tree:963: inode #782237: block 31: comm ThreadPoolForeg: Directory block failed checksum
Aug 14 14:29:35 jnano-desktop kernel: [20953.512694] EXT4-fs error (device mmcblk0p1): htree_dirblock_to_tree:963: inode #782237: block 31: comm ThreadPoolForeg: Directory block failed checksum
Aug 14 14:29:35 jnano-desktop kernel: [20953.529289] EXT4-fs error (device mmcblk0p1): htree_dirblock_to_tree:963: inode #782237: block 31: comm ThreadPoolForeg: Directory block failed checksum
Aug 14 14:29:35 jnano-desktop kernel: [20953.546100] EXT4-fs error (device mmcblk0p1): htree_dirblock_to_tree:963: inode #782237: block 31: comm ThreadPoolForeg: Directory block failed checksum
Aug 14 14:29:35 jnano-desktop kernel: [20953.562424] EXT4-fs error (device mmcblk0p1): htree_dirblock_to_tree:963: inode #782237: block 31: comm ThreadPoolForeg: Directory block failed checksum
Aug 14 14:29:35 jnano-desktop kernel: [20953.578812] EXT4-fs error (device mmcblk0p1): htree_dirblock_to_tree:963: inode #782237: block 31: comm ThreadPoolForeg: Directory block failed checksum
Aug 14 14:29:35 jnano-desktop kernel: [20953.596227] EXT4-fs error (device mmcblk0p1): htree_dirblock_to_tree:963: inode #782237: block 31: comm ThreadPoolForeg: Directory block failed checksum

Those error messages start 4 minutes before the crash and they are continuously repeated until the crash.
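
Because the kernel repeats these messages so often, it can help to condense them. The following sketch counts EXT4 warnings and errors per device, kernel function, and inode, and adds up the "callbacks suppressed" counters. The regular expressions are written against the lines pasted in this post and are an assumption about the format, not an official parser.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Matches e.g. "EXT4-fs error (device mmcblk0p1): htree_dirblock_to_tree:963: inode #782237"
EVENT = re.compile(r"EXT4-fs (warning|error) \(device ([^)]+)\): (\w+):\d+: inode #(\d+)")
# Matches e.g. "EXT4-fs error: 1748 callbacks suppressed"
SUPPRESSED = re.compile(r"EXT4-fs (warning|error): (\d+) callbacks suppressed")


def summarize(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count logged EXT4 events and sum rate-limited (suppressed) ones."""
    events: Counter = Counter()      # key: (level, device, function, inode)
    suppressed: Counter = Counter()  # key: level
    for line in lines:
        match = EVENT.search(line)
        if match:
            events[match.groups()] += 1
            continue
        match = SUPPRESSED.search(line)
        if match:
            suppressed[match.group(1)] += int(match.group(2))
    return events, suppressed


def print_summary(path: str = "/var/log/syslog") -> None:
    """Print one line per distinct event found in a syslog file."""
    with open(path, errors="replace") as f:
        events, suppressed = summarize(f)
    for (level, device, function, inode), count in events.most_common():
        print(f"{count:6d}  {level:7s} {device} {function} inode #{inode}")
    for level, count in suppressed.items():
        print(f"{count:6d}  {level} messages suppressed by rate limiting")
```

A summary that shows a single inode, as in the excerpt above, suggests one damaged directory rather than damage spread across the whole card.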

Those errors are continuously repeated in the syslog. You can see here that they were already occurring at 08:45:

Aug 14 08:45:58 jnano-desktop kernel: [  337.285358] EXT4-fs warning: 1431 callbacks suppressed
Aug 14 08:45:58 jnano-desktop kernel: [  337.285363] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #782237: comm ThreadPoolForeg: No space for directory leaf checksum. Please run e2fsck -D.
Aug 14 08:45:58 jnano-desktop kernel: [  337.308944] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #782237: comm ThreadPoolForeg: No space for directory leaf checksum. Please run e2fsck -D.
Aug 14 08:45:58 jnano-desktop kernel: [  337.326955] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #782237: comm ThreadPoolForeg: No space for directory leaf checksum. Please run e2fsck -D.
Aug 14 08:45:58 jnano-desktop kernel: [  337.345052] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #782237: comm ThreadPoolForeg: No space for directory leaf checksum. Please run e2fsck -D.
Aug 14 08:45:58 jnano-desktop kernel: [  337.364263] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #782237: comm ThreadPoolForeg: No space for directory leaf checksum. Please run e2fsck -D.
Aug 14 08:45:58 jnano-desktop kernel: [  337.382330] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #782237: comm ThreadPoolForeg: No space for directory leaf checksum. Please run e2fsck -D.
Aug 14 08:45:58 jnano-desktop kernel: [  337.399843] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #782237: comm ThreadPoolForeg: No space for directory leaf checksum. Please run e2fsck -D.
Aug 14 08:45:58 jnano-desktop kernel: [  337.400457] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #782237: comm ThreadPoolForeg: No space for directory leaf checksum. Please run e2fsck -D.
Aug 14 08:45:58 jnano-desktop kernel: [  337.402796] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #782237: comm ThreadPoolForeg: No space for directory leaf checksum. Please run e2fsck -D.
Aug 14 08:45:58 jnano-desktop kernel: [  337.405146] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #782237: comm ThreadPoolForeg: No space for directory leaf checksum. Please run e2fsck -D.
Aug 14 08:45:58 jnano-desktop kernel: [  337.490156] EXT4-fs error: 1459 callbacks suppressed
Aug 14 08:45:58 jnano-desktop kernel: [  337.490163] EXT4-fs error (device mmcblk0p1): htree_dirblock_to_tree:963: inode #782237: block 31: comm ThreadPoolForeg: Directory block failed checksum
Aug 14 08:45:58 jnano-desktop kernel: [  337.512205] EXT4-fs error (device mmcblk0p1): htree_dirblock_to_tree:963: inode #782237: block 31: comm ThreadPoolForeg: Directory block failed checksum
Aug 14 08:45:58 jnano-desktop kernel: [  337.530042] EXT4-fs error (device mmcblk0p1): htree_dirblock_to_tree:963: inode #782237: block 31: comm ThreadPoolForeg: Directory block failed checksum
Aug 14 08:45:58 jnano-desktop kernel: [  337.546304] EXT4-fs error (device mmcblk0p1): htree_dirblock_to_tree:963: inode #782237: block 31: comm ThreadPoolForeg: Directory block failed checksum
Aug 14 08:45:58 jnano-desktop kernel: [  337.562473] EXT4-fs error (device mmcblk0p1): htree_dirblock_to_tree:963: inode #782237: block 31: comm ThreadPoolForeg: Directory block failed checksum
Aug 14 08:45:58 jnano-desktop kernel: [  337.580126] EXT4-fs error (device mmcblk0p1): htree_dirblock_to_tree:963: inode #782237: block 31: comm ThreadPoolForeg: Directory block failed checksum
Aug 14 08:45:58 jnano-desktop kernel: [  337.601778] EXT4-fs error (device mmcblk0p1): htree_dirblock_to_tree:963: inode #782237: block 31: comm ThreadPoolForeg: Directory block failed checksum
Aug 14 08:45:58 jnano-desktop kernel: [  337.617907] EXT4-fs error (device mmcblk0p1): htree_dirblock_to_tree:963: inode #782237: block 31: comm ThreadPoolForeg: Directory block failed checksum
Aug 14 08:45:58 jnano-desktop kernel: [  337.634107] EXT4-fs error (device mmcblk0p1): htree_dirblock_to_tree:963: inode #782237: block 31: comm ThreadPoolForeg: Directory block failed checksum
Aug 14 08:45:58 jnano-desktop kernel: [  337.650259] EXT4-fs error (device mmcblk0p1): htree_dirblock_to_tree:963: inode #782237: block 31: comm ThreadPoolForeg: Directory block failed checksum
Aug 14 08:46:03 jnano-desktop kernel: [  342.290281] EXT4-fs warning: 1662 callbacks suppressed

Another error that I see is the following:

Aug 14 08:45:57 jnano-desktop compiz[6986]: Error: Can't initialize nvrm channel

This last one is also continuously repeated in the syslog.

I hope you can give me some suggestions.

Thanks!!

Hi,

Compiz is the component of the desktop environment that composites each frame shown on the monitor.

I am not sure whether this error is caused by the SD corruption or not.

Could you share the dmesg output when this error happens? If you cannot do that because the system hangs or for some other reason, please check the log from the serial console.

https://elinux.org/Jetson/General_debug

Actually, I would like to know whether your application needs to write to the SD card continuously. It looks like you didn’t tell us this in your previous comments.

Hi @WayneWWW,

thanks for your message!
First, let me state that I was able to repair the FS of the SD card by using fsck.ext4.

Then, I reproduced the same situation and I was able to log dmesg:

[   10.422832] r8168: eth0: link up
[   10.422899] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   15.284019] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[   15.284031] Bluetooth: BNEP socket layer initialized
[   15.891258] fuse init (API version 7.26)
[   16.739234] tegradc tegradc.0: unblank
[   16.739245] tegradc tegradc.1: blank - powerdown
[  318.133353] nvmap_alloc_handle: PID 10019: deepstream-test: WARNING: All NvMap Allocations must have a tag to identify the subsystem allocating memory.Please pass the tag to the API call NvRmMemHanldeAllocAttr() or relevant.
[  654.999334] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #1187096: comm python: No space for directory leaf checksum. Please run e2fsck -D.
[  655.014434] EXT4-fs error (device mmcblk0p1): ext4_find_entry:1451: inode #1187096: comm python: checksumming directory block 0
[  655.032735] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #1187096: comm python: No space for directory leaf checksum. Please run e2fsck -D.
[  655.047904] EXT4-fs error (device mmcblk0p1): ext4_find_entry:1451: inode #1187096: comm python: checksumming directory block 0
[  655.062529] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #1187096: comm python: No space for directory leaf checksum. Please run e2fsck -D.
[  655.077453] EXT4-fs error (device mmcblk0p1): ext4_find_entry:1451: inode #1187096: comm python: checksumming directory block 0
[  655.101674] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #1187096: comm python: No space for directory leaf checksum. Please run e2fsck -D.
[  655.116727] EXT4-fs error (device mmcblk0p1): ext4_find_entry:1451: inode #1187096: comm python: checksumming directory block 0
[  655.137989] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #1187096: comm python: No space for directory leaf checksum. Please run e2fsck -D.
[  655.152961] EXT4-fs error (device mmcblk0p1): ext4_find_entry:1451: inode #1187096: comm python: checksumming directory block 0
[  655.167154] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #1187096: comm python: No space for directory leaf checksum. Please run e2fsck -D.
[  655.182084] EXT4-fs error (device mmcblk0p1): ext4_find_entry:1451: inode #1187096: comm python: checksumming directory block 0
[  655.197337] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #1187096: comm python: No space for directory leaf checksum. Please run e2fsck -D.
[  655.212415] EXT4-fs error (device mmcblk0p1): ext4_find_entry:1451: inode #1187096: comm python: checksumming directory block 0
[  655.226423] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #1187096: comm python: No space for directory leaf checksum. Please run e2fsck -D.
[  655.241339] EXT4-fs error (device mmcblk0p1): ext4_find_entry:1451: inode #1187096: comm python: checksumming directory block 0
[  655.256419] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #1187096: comm python: No space for directory leaf checksum. Please run e2fsck -D.
[  655.271365] EXT4-fs error (device mmcblk0p1): ext4_find_entry:1451: inode #1187096: comm python: checksumming directory block 0
[  655.285614] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #1187096: comm python: No space for directory leaf checksum. Please run e2fsck -D.
[  655.300607] EXT4-fs error (device mmcblk0p1): ext4_find_entry:1451: inode #1187096: comm python: checksumming directory block 0
[  675.776058] nvmap_alloc_handle: PID 12492: deepstream-test: WARNING: All NvMap Allocations must have a tag to identify the subsystem allocating memory.Please pass the tag to the API call NvRmMemHanldeAllocAttr() or relevant.
[ 1041.242183] tegradc tegradc.0: read_edid_into_buffer: extension_blocks = 1, max_ext_blocks = 3
[ 1041.258877] tegradc tegradc.0: hdmi_recheck_edid: read_edid_into_buffer() returned 256
[ 1041.258938] tegradc tegradc.0: old edid len = 256
[ 1041.259030] tegradc tegradc.0: hdmi: No EDID change after HPD bounce, taking no action
[ 1041.390191] tegradc tegradc.0: read_edid_into_buffer: extension_blocks = 1, max_ext_blocks = 3
[ 1041.406875] tegradc tegradc.0: hdmi_recheck_edid: read_edid_into_buffer() returned 256
[ 1041.406935] tegradc tegradc.0: old edid len = 256
[ 1041.407027] tegradc tegradc.0: hdmi: No EDID change after HPD bounce, taking no action
[ 1045.381796] usb 2-1.1: new SuperSpeed USB device number 3 using tegra-xusb
[ 1045.417717] usb 2-1.1: New USB device found, idVendor=0951, idProduct=1666
[ 1045.417723] usb 2-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1045.417727] usb 2-1.1: Product: DataTraveler 3.0
[ 1045.417730] usb 2-1.1: Manufacturer: Kingston
[ 1045.417733] usb 2-1.1: SerialNumber: 60A44C3FABFFBE31D96C01D8
[ 1045.418731] usb-storage 2-1.1:1.0: USB Mass Storage device detected
[ 1045.419024] scsi host0: usb-storage 2-1.1:1.0
[ 1046.494754] scsi 0:0:0:0: Direct-Access     Kingston DataTraveler 3.0 PMAP PQ: 0 ANSI: 6
[ 1047.014151] sd 0:0:0:0: [sda] 30720000 512-byte logical blocks: (15.7 GB/14.6 GiB)
[ 1047.024284] sd 0:0:0:0: [sda] Write Protect is off
[ 1047.029492] sd 0:0:0:0: [sda] Mode Sense: 23 00 00 00
[ 1047.032284] sd 0:0:0:0: [sda] No Caching mode page found
[ 1047.037913] sd 0:0:0:0: [sda] Assuming drive cache: write through
[ 1047.085395]  sda: sda1
[ 1047.090977] sd 0:0:0:0: [sda] Attached SCSI removable disk
[ 1254.078532] EXT4-fs warning: 38 callbacks suppressed
[ 1254.078538] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #1187096: comm pool: No space for directory leaf checksum. Please run e2fsck -D.
[ 1254.098489] EXT4-fs error: 38 callbacks suppressed
[ 1254.098493] EXT4-fs error (device mmcblk0p1): htree_dirblock_to_tree:963: inode #1187096: comm pool: Directory block failed checksum
[ 1314.259529] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #1187096: comm pool: No space for directory leaf checksum. Please run e2fsck -D.
[ 1314.274289] EXT4-fs error (device mmcblk0p1): htree_dirblock_to_tree:963: inode #1187096: comm pool: Directory block failed checksum
[ 1361.712822] usb 2-1.1: USB disconnect, device number 3
[ 1361.903145] usb 2-1: usb_suspend_both: status 0
[ 1361.903284] usb usb2: usb_suspend_both: status 0
[ 1361.907313] FAT-fs (sda1): unable to read boot sector to mark fs as dirty
[ 1684.387514] tegradc tegradc.0: blank - powerdown
[ 1684.450478] extcon-disp-state extcon:disp-state: cable 47 state 0
[ 1684.450480] Extcon AUX1(HDMI) disable
[ 1708.189605] extcon-disp-state extcon:disp-state: cable 51 state 0
[ 1708.189651] Extcon HDMI: HPD disabled
[ 1708.190067] tegradc tegradc.0: hdmi: unplugged
[ 1708.235222] tegradc tegradc.0: blank - powerdown
[ 1708.235232] tegradc tegradc.0: unblank
[ 1708.235259] tegradc tegradc.0: unblank
[ 1708.235267] tegradc tegradc.1: blank - powerdown
[ 1708.371802] tegradc tegradc.0: blank - powerdown
[ 1708.371818] tegradc tegradc.0: unblank
[ 1708.382689] tegradc tegradc.0: nominal-pclk:148500000 parent:148500000 div:1.0 pclk:148500000 147015000~161865000
[ 1708.382793] tegradc tegradc.0: hdmi: tmds rate:148500K prod-setting:prod_c_hdmi_75m_150m
[ 1708.386562] tegradc tegradc.0: hdmi: get RGB quant from EDID.
[ 1708.386569] tegradc tegradc.0: hdmi: get YCC quant from EDID.
[ 1708.421353] extcon-disp-state extcon:disp-state: cable 47 state 1
[ 1708.421356] Extcon AUX1(HDMI) enable
[ 1708.421621] extcon-disp-state extcon:disp-state: cable 51 state 1
[ 1708.421625] Extcon HDMI: HPD enabled
[ 1708.421689] tegradc tegradc.0: hdmi: plugged
[ 1708.444913] tegradc tegradc.0: blank - powerdown
[ 1708.503635] extcon-disp-state extcon:disp-state: cable 47 state 0
[ 1708.503637] Extcon AUX1(HDMI) disable
[ 1708.524008] tegradc tegradc.0: unblank
[ 1708.534663] tegradc tegradc.0: nominal-pclk:148500000 parent:148500000 div:1.0 pclk:148500000 147015000~161865000
[ 1708.534747] tegradc tegradc.0: hdmi: tmds rate:148500K prod-setting:prod_c_hdmi_75m_150m
[ 1708.535756] tegradc tegradc.0: hdmi: get RGB quant from EDID.
[ 1708.535763] tegradc tegradc.0: hdmi: get YCC quant from EDID.
[ 1708.570919] extcon-disp-state extcon:disp-state: cable 47 state 1
[ 1708.570921] Extcon AUX1(HDMI) enable
[ 1708.571101] tegradc tegradc.0: unblank
[ 1708.571108] tegradc tegradc.1: blank - powerdown
[ 1708.695563] tegradc tegradc.0: unblank
[ 1708.695722] tegradc tegradc.1: blank - powerdown

Something very similar is also visible in the syslog:

Aug 17 12:50:05 jnano-desktop compiz[7095]: Error: Can't initialize nvrm channel
Aug 17 12:50:20 jnano-desktop kernel: [  654.999334] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #1187096: comm python: No space for directory leaf checksum. Please run e2fsck -D.
Aug 17 12:50:20 jnano-desktop kernel: [  655.014434] EXT4-fs error (device mmcblk0p1): ext4_find_entry:1451: inode #1187096: comm python: checksumming directory block 0
Aug 17 12:50:20 jnano-desktop kernel: [  655.032735] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #1187096: comm python: No space for directory leaf checksum. Please run e2fsck -D.
Aug 17 12:50:20 jnano-desktop kernel: [  655.047904] EXT4-fs error (device mmcblk0p1): ext4_find_entry:1451: inode #1187096: comm python: checksumming directory block 0
Aug 17 12:50:20 jnano-desktop kernel: [  655.062529] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #1187096: comm python: No space for directory leaf checksum. Please run e2fsck -D.
Aug 17 12:50:20 jnano-desktop kernel: [  655.077453] EXT4-fs error (device mmcblk0p1): ext4_find_entry:1451: inode #1187096: comm python: checksumming directory block 0
Aug 17 12:50:20 jnano-desktop kernel: [  655.101674] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #1187096: comm python: No space for directory leaf checksum. Please run e2fsck -D.
Aug 17 12:50:20 jnano-desktop kernel: [  655.116727] EXT4-fs error (device mmcblk0p1): ext4_find_entry:1451: inode #1187096: comm python: checksumming directory block 0
Aug 17 12:50:20 jnano-desktop kernel: [  655.137989] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #1187096: comm python: No space for directory leaf checksum. Please run e2fsck -D.
Aug 17 12:50:20 jnano-desktop kernel: [  655.152961] EXT4-fs error (device mmcblk0p1): ext4_find_entry:1451: inode #1187096: comm python: checksumming directory block 0
Aug 17 12:50:20 jnano-desktop kernel: [  655.167154] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #1187096: comm python: No space for directory leaf checksum. Please run e2fsck -D.
Aug 17 12:50:20 jnano-desktop kernel: [  655.182084] EXT4-fs error (device mmcblk0p1): ext4_find_entry:1451: inode #1187096: comm python: checksumming directory block 0
Aug 17 12:50:21 jnano-desktop kernel: [  655.197337] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #1187096: comm python: No space for directory leaf checksum. Please run e2fsck -D.
Aug 17 12:50:21 jnano-desktop kernel: [  655.212415] EXT4-fs error (device mmcblk0p1): ext4_find_entry:1451: inode #1187096: comm python: checksumming directory block 0
Aug 17 12:50:21 jnano-desktop kernel: [  655.226423] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #1187096: comm python: No space for directory leaf checksum. Please run e2fsck -D.
Aug 17 12:50:21 jnano-desktop kernel: [  655.241339] EXT4-fs error (device mmcblk0p1): ext4_find_entry:1451: inode #1187096: comm python: checksumming directory block 0
Aug 17 12:50:21 jnano-desktop kernel: [  655.256419] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #1187096: comm python: No space for directory leaf checksum. Please run e2fsck -D.
Aug 17 12:50:21 jnano-desktop kernel: [  655.271365] EXT4-fs error (device mmcblk0p1): ext4_find_entry:1451: inode #1187096: comm python: checksumming directory block 0
Aug 17 12:50:21 jnano-desktop kernel: [  655.285614] EXT4-fs warning (device mmcblk0p1): ext4_dirent_csum_verify:353: inode #1187096: comm python: No space for directory leaf checksum. Please run e2fsck -D.
Aug 17 12:50:21 jnano-desktop kernel: [  655.300607] EXT4-fs error (device mmcblk0p1): ext4_find_entry:1451: inode #1187096: comm python: checksumming directory block 0
Aug 17 12:50:41 jnano-desktop kernel: [  675.776058] nvmap_alloc_handle: PID 12492: deepstream-test: WARNING: All NvMap Allocations must have a tag to identify the subsystem allocating memory.Please pass the tag to the API call NvRmMemHanldeAllocAttr() or relevant. 
Aug 17 12:50:47 jnano-desktop compiz[7095]: message repeated 27 times: [ Error: Can't initialize nvrm channel]

Do you have any idea about this error?

Thanks again!!

The “No space for directory leaf checksum. Please run e2fsck -D” message seems to indicate a full disk or a fragmentation issue, but I guess it could also be a symptom of corruption. Did you try e2fsck -D? Is the device full?

I think you need to find an ext4 filesystem expert who can tell you what that message really means.
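
To answer the "is the device full?" question, both the block usage and the inode usage are worth checking, since a filesystem can run out of either. This is a minimal sketch using only the Python standard library on a Unix system; the function name and the returned keys are made up for the example.

```python
import os
import shutil
from typing import Dict


def usage_report(path: str = "/") -> Dict[str, float]:
    """Return the percentage of bytes and inodes used on the filesystem at `path`."""
    space = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    stats = os.statvfs(path)         # f_files = total inodes, f_ffree = free inodes
    inodes_used = stats.f_files - stats.f_ffree
    return {
        "bytes_used_pct": 100.0 * space.used / space.total if space.total else 0.0,
        # Some filesystems report zero inodes; treat that as "not applicable".
        "inodes_used_pct": 100.0 * inodes_used / stats.f_files if stats.f_files else 0.0,
    }


if __name__ == "__main__":
    for name, value in usage_report("/").items():
        print(f"{name}: {value:.1f}%")
```

If both figures are well below 100%, a full disk is an unlikely explanation for the message, and corruption becomes the more plausible reading.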

Hi @nvidiadev1,

I was not able to find any additional information regarding the error online.
Thus, I decided to go for a fresh installation.
After setting up a fresh image for the Jetson Nano, I followed this guide https://www.jetsonhacks.com/2019/09/17/jetson-nano-run-from-usb-drive/ to run the system from an SSD connected to the Jetson Nano via USB3.0.

No problem found so far. The board has been running for 24h+.
Additionally, I have measured a lower temperature on AO-therm. When running the system from the SD card, the temperature was 51-52 degrees. Now, with the SSD, the temperature is steady at 48.5 degrees.
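
For anyone who wants to log the same kind of temperature readings over time, here is a small sketch that reads the thermal zones exposed by the Linux kernel under /sys/class/thermal. The `type` and `temp` files (temperature in millidegrees Celsius) are the standard sysfs interface; the function name is made up for this example.

```python
import glob
import os
from typing import Dict


def read_thermal_zones(base: str = "/sys/class/thermal") -> Dict[str, float]:
    """Return {zone name: temperature in degrees Celsius} for all readable zones."""
    readings: Dict[str, float] = {}
    for zone in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "type")) as f:
                name = f.read().strip()
            with open(os.path.join(zone, "temp")) as f:
                # The kernel reports millidegrees Celsius.
                readings[name] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # zone not readable or currently returning no valid value
    return readings


if __name__ == "__main__":
    for name, celsius in read_thermal_zones().items():
        print(f"{name}: {celsius:.1f} C")
```

Calling this from a periodic job and appending the result to a file would make it easier to compare the SD card and SSD setups over a longer period than a single reading.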

I will keep you updated if I encounter the problem later.

Thanks again!

Dear All,

I had another crash after 72 hours. There was no filesystem corruption this time (I am running the Nano from the SSD now).
I cannot see any suspicious information in the latest part of syslog that might be related to the crash.
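
The DeepStream lines in the excerpt that follows are hard to read because syslog escapes each newline in the JSON payload as `#012`. A minimal sketch for turning such a line back into a Python dictionary is shown here; the marker string is taken from the log lines in this post, and the function name is made up. A payload that is cut off in the log will fail to parse.

```python
import json
from typing import Optional

MARKER = "Message sent = "  # text that precedes the JSON payload in the log lines


def extract_message(line: str) -> Optional[dict]:
    """Return the JSON payload of a 'Message sent' syslog line, or None if absent."""
    _prefix, found, payload = line.partition(MARKER)
    if not found:
        return None
    # syslog writes control characters as '#' plus three octal digits; 012 is newline.
    # Caveat: a literal "#012" inside a string value would be replaced as well.
    return json.loads(payload.replace("#012", "\n"))
```

Extracting fields such as "@timestamp" from the last parsed message before a crash gives a cleaner picture of when the application stopped sending than scanning the raw lines by eye.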

Aug 22 06:42:38 jnano-desktop nautilus-autostart.desktop[3884]: Error: Can't initialize nvrm channel
Aug 22 06:51:46 jnano-desktop nautilus-autostart.desktop[3884]: message repeated 2 times: [ Error: Can't initialize nvrm channel]
Aug 22 07:00:10 jnano-desktop deepstream-test3-app: DSLOG:NVDS_AMQP_PROTO: nvds_msgapi_send_async: using msg topic topicname
Aug 22 07:00:10 jnano-desktop deepstream-test3-app: DSLOG:NVDS_AMQP_PROTO: nvds_msgapi_send_async: Send success
Aug 22 07:00:10 jnano-desktop deepstream-test3-app: DSLOG:NVDS_AMQP_PROTO: nvds_msgapi_send_async: using msg topic topicname
Aug 22 07:00:10 jnano-desktop deepstream-test3-app: DSLOG:NVDS_AMQP_PROTO: nvds_msgapi_send_async: Send success
Aug 22 07:00:10 jnano-desktop deepstream-test3-app: DSLOG:NVDS_AMQP_PROTO: nvds_msgapi_do_work: Message sent = {#012  "messageid" : "907287bf-d84c-4222-891a-4cf73e20ac8f",#012  "mdsversion" : "1.0",#012  "@timestamp" : "2020-08-22T05:00:10.841Z",#012  "place" : {#012    "id" : "1",#012    "name" : "XYZ",#012    "type" : "giardino",#012    "location" : {#012      "lat" : 30.32,#012      "lon" : -40.549999999999997,#012      "alt" : 100.0#012    },#012    "entrance" : {#012      "name" : "walsh",#012      "lane" : "lane1",#012      "level" : "P2",#012      "coordinate" : {#012        "x" : 1.0,#012        "y" : 2.0,#012        "z" : 3.0#012      }#012    }#012  },#012  "sensor" : {#012    "id" : "CAMERA_0",#012    "type" : "Camera",#012    "description" : "\"Entrance of Garage Right Lane\"",#012    "location" : {#012      "lat" : 45.293701446999997,#012      "lon" : -75.830391449900006,#012      "alt" : 48.155747933800001#012    },#012    "coordinate" : {#012      "x" : 5.2000000000000002,#012      "y" : 10.1,#012      "z" : 11.199999999999999#012    }#012  },#012  "analyticsModule" : {#012    "id" : "XYZ",#012    "description" : "\"Vehicle Detection and License Plate Recognition\"",#012    "source" : "OpenALR",#012    "version" : "1.0",#012    "confidence" : -0.10000000149011612#012  },#012  "object" : {#012    "id" : "-1",#012    "speed" : 0.0,#012    "direction" : 0.0,#012    "orientation" : 0.0,#012    "person" : {#012      "age" : 45,#012      "gender" : "male",#012      "hair" : "black",#012      "cap" : "none",#012      "apparel" : "formal",#012      "confidence" : -0.10000000149011612#012    },#012    "bbox" : {#012      "topleftx" : 754,#012      "toplefty" : 460,#012      "bottomrightx" : 810,#012      "bottomrighty" : 585#012    },#012    "location" : {#012      "lat" : 0.0,#012      "lon" : 0.0,#012      "alt" : 0.0#012    },#012    "coordinate" : {#012      "x" : 0.0,#012      "y" : 0.0,#012      "z" : 0.0#012    }#012  },#012  "event" : {#012    "id" : 
"8c8264b2-b994-44b5-b315-a553062cfeb0",#012    "type" : "entry"#012  },#012  "videoPath" : ""#012}
Aug 22 07:00:10 jnano-desktop deepstream-test3-app: DSLOG:NVDS_AMQP_PROTO: nvds_msgapi_do_work: Message sent = {#012  "messageid" : "fa06027e-b406-4105-8a6f-ff09bdc8ae39",#012  "mdsversion" : "1.0",#012  "@timestamp" : "2020-08-22T05:00:10.841Z",#012  "place" : {#012    "id" : "1",#012    "name" : "XYZ",#012    "type" : "giardino",#012    "location" : {#012      "lat" : 30.32,#012      "lon" : -40.549999999999997,#012      "alt" : 100.0#012    },#012    "entrance" : {#012      "name" : "walsh",#012      "lane" : "lane1",#012      "level" : "P2",#012      "coordinate" : {#012        "x" : 1.0,#012        "y" : 2.0,#012        "z" : 3.0#012      }#012    }#012  },#012  "sensor" : {#012    "id" : "CAMERA_0",#012    "type" : "Camera",#012    "description" : "\"Entrance of Garage Right Lane\"",#012    "location" : {#012      "lat" : 45.293701446999997,#012      "lon" : -75.830391449900006,#012      "alt" : 48.155747933800001#012    },#012    "coordinate" : {#012      "x" : 5.2000000000000002,#012      "y" : 10.1,#012      "z" : 11.199999999999999#012    }#012  },#012  "analyticsModule" : {#012    "id" : "XYZ",#012    "description" : "\"Vehicle Detection and License Plate Recognition\"",#012    "source" : "OpenALR",#012    "version" : "1.0",#012    "confidence" : -0.10000000149011612#012  },#012  "object" : {#012    "id" : "-1",#012    "speed" : 0.0,#012    "direction" : 0.0,#012    "orientation" : 0.0,#012    "person" : {#012      "age" : 45,#012      "gender" : "male",#012      "hair" : "black",#012      "cap" : "none",#012      "apparel" : "formal",#012      "confidence" : -0.10000000149011612#012    },#012    "bbox" : {#012      "topleftx" : 773,#012      "toplefty" : 450,#012      "bottomrightx" : 834,#012      "bottomrighty" : 579#012    },#012    "location" : {#012      "lat" : 0.0,#012      "lon" : 0.0,#012      "alt" : 0.0#012    },#012    "coordinate" : {#012      "x" : 0.0,#012      "y" : 0.0,#012      "z" : 0.0#012    }#012  },#012  "event" : {#012    "id" : 
"0943759c-04b0-427a-ba5a-27c987c75cdb",#012    "type" : "entry"#012  },#012  "videoPath" : ""#012}
Aug 22 07:00:10 jnano-desktop deepstream-test3-app: DSLOG:NVDS_AMQP_PROTO: nvds_msgapi_send_async: using msg topic topicname
Aug 22 07:00:10 jnano-desktop deepstream-test3-app: DSLOG:NVDS_AMQP_PROTO: nvds_msgapi_send_async: Send success
Aug 22 07:00:10 jnano-desktop deepstream-test3-app: DSLOG:NVDS_AMQP_PROTO: nvds_msgapi_do_work: Message sent = {#012  "messageid" : "e02e2b5a-2649-4150-b6ab-e26a063213b1",#012  "mdsversion" : "1.0",#012  "@timestamp" : "2020-08-22T05:00:10.942Z",#012  "place" : {#012    "id" : "1",#012    "name" : "XYZ",#012    "type" : "giardino",#012    "location" : {#012      "lat" : 30.32,#012      "lon" : -40.549999999999997,#012      "alt" : 100.0#012    },#012    "entrance" : {#012      "name" : "walsh",#012      "lane" : "lane1",#012      "level" : "P2",#012      "coordinate" : {#012        "x" : 1.0,#012        "y" : 2.0,#012        "z" : 3.0#012      }#012    }#012  },#012  "sensor" : {#012    "id" : "CAMERA_0",#012    "type" : "Camera",#012    "description" : "\"Entrance of Garage Right Lane\"",#012    "location" : {#012      "lat" : 45.293701446999997,#012      "lon" : -75.830391449900006,#012      "alt" : 48.155747933800001#012    },#012    "coordinate" : {#012      "x" : 5.2000000000000002,#012      "y" : 10.1,#012      "z" : 11.199999999999999#012    }#012  },#012  "analyticsModule" : {#012    "id" : "XYZ",#012    "description" : "\"Vehicle Detection and License Plate Recognition\"",#012    "source" : "OpenALR",#012    "version" : "1.0",#012    "confidence" : -0.10000000149011612#012  },#012  "object" : {#012    "id" : "-1",#012    "speed" : 0.0,#012    "direction" : 0.0,#012    "orientation" : 0.0,#012    "person" : {#012      "age" : 45,#012      "gender" : "male",#012      "hair" : "black",#012      "cap" : "none",#012      "apparel" : "formal",#012      "confidence" : -0.10000000149011612#012    },#012    "bbox" : {#012      "topleftx" : 765,#012      "toplefty" : 460,#012      "bottomrightx" : 818,#012      "bottomrighty" : 579#012    },#012    "location" : {#012      "lat" : 0.0,#012      "lon" : 0.0,#012      "alt" : 0.0#012    },#012    "coordinate" : {#012      "x" : 0.0,#012      "y" : 0.0,#012      "z" : 0.0#012    }#012  },#012  "event" : {#012    "id" : 
"2fe14962-6d96-4b6e-a08d-8342ecd0761c",#012    "type" : "entry"#012  },#012  "videoPath" : ""#012}
Aug 22 07:00:11 jnano-desktop deepstream-test3-app: DSLOG:NVDS_AMQP_PROTO: nvds_msgapi_send_async: using msg topic topicname
Aug 22 07:00:11 jnano-desktop deepstream-test3-app: DSLOG:NVDS_AMQP_PROTO: nvds_msgapi_send_async: Send success
Aug 22 07:00:11 jnano-desktop deepstream-test3-app: DSLOG:NVDS_AMQP_PROTO: nvds_msgapi_do_work: Message sent = {#012  "messageid" : "369b1317-3fdf-4c69-8a34-310f3c93fa4d",#012  "mdsversion" : "1.0",#012  "@timestamp" : "2020-08-22T05:00:11.043Z",#012  "place" : {#012    "id" : "1",#012    "name" : "XYZ",#012    "type" : "giardino",#012    "location" : {#012      "lat" : 30.32,#012      "lon" : -40.549999999999997,#012      "alt" : 100.0#012    },#012    "entrance" : {#012      "name" : "walsh",#012      "lane" : "lane1",#012      "level" : "P2",#012      "coordinate" : {#012        "x" : 1.0,#012        "y" : 2.0,#012        "z" : 3.0#012      }#012    }#012  },#012  "sensor" : {#012    "id" : "CAMERA_0",#012    "type" : "Camera",#012    "description" : "\"Entrance of Garage Right Lane\"",#012    "location" : {#012      "lat" : 45.293701446999997,#012      "lon" : -75.830391449900006,#012      "alt" : 48.155747933800001#012    },#012    "coordinate" : {#012      "x" : 5.2000000000000002,#012      "y" : 10.1,#012      "z" : 11.199999999999999#012    }#012  },#012  "analyticsModule" : {#012    "id" : "XYZ",#012    "description" : "\"Vehicle Detection and License Plate Recognition\"",#012    "source" : "OpenALR",#012    "version" : "1.0",#012    "confidence" : -0.10000000149011612#012  },#012  "object" : {#012    "id" : "-1",#012    "speed" : 0.0,#012    "direction" : 0.0,#012    "orientation" : 0.0,#012    "person" : {#012      "age" : 45,#012      "gender" : "male",#012      "hair" : "black",#012      "cap" : "none",#012      "apparel" : "formal",#012      "confidence" : -0.10000000149011612#012    },#012    "bbox" : {#012      "topleftx" : 757,#012      "toplefty" : 465,#012      "bottomrightx" : 808,#012      "bottomrighty" : 582#012    },#012    "location" : {#012      "lat" : 0.0,#012      "lon" : 0.0,#012      "alt" : 0.0#012    },#012    "coordinate" : {#012      "x" : 0.0,#012      "y" : 0.0,#012      "z" : 0.0#012    }#012  },#012  "event" : {#012    "id" : 
"be2c9ca7-ba18-41fd-b28d-5c52dacfa437",#012    "type" : "entry"#012  },#012  "videoPath" : ""#012}
Aug 22 07:00:11 jnano-desktop deepstream-test3-app: DSLOG:NVDS_AMQP_PROTO: nvds_msgapi_send_async: using msg topic topicname
Aug 22 07:00:11 jnano-desktop deepstream-test3-app: DSLOG:NVDS_AMQP_PROTO: nvds_msgapi_send_async: Send success
Aug 22 07:00:11 jnano-desktop deepstream-test3-app: DSLOG:NVDS_AMQP_PROTO: nvds_msgapi_do_work: Message sent = {#012  "messageid" : "256e30a5-7d3a-47c8-a5fc-9a5ab28b2374",#012  "mdsversion" : "1.0",#012  "@timestamp" : "2020-08-22T05:00:11.146Z",#012  "place" : {#012    "id" : "1",#012    "name" : "XYZ",#012    "type" : "giardino",#012    "location" : {#012      "lat" : 30.32,#012      "lon" : -40.549999999999997,#012      "alt" : 100.0#012    },#012    "entrance" : {#012      "name" : "walsh",#012      "lane" : "lane1",#012      "level" : "P2",#012      "coordinate" : {#012        "x" : 1.0,#012        "y" : 2.0,#012        "z" : 3.0#012      }#012    }#012  },#012  "sensor" : {#012    "id" : "CAMERA_0",#012    "type" : "Camera",#012    "description" : "\"Entrance of Garage Right Lane\"",#012    "location" : {#012      "lat" : 45.293701446999997,#012      "lon" : -75.830391449900006,#012      "alt" : 48.155747933800001#012    },#012    "coordinate" : {#012      "x" : 5.2000000000000002,#012      "y" : 10.1,#012      "z" : 11.199999999999999#012    }#012  },#012  "analyticsModule" : {#012    "id" : "XYZ",#012    "description" : "\"Vehicle Detection and License Plate Recognition\"",#012    "source" : "OpenALR",#012    "version" : "1.0",#012    "confidence" : -0.10000000149011612#012  },#012  "object" : {#012    "id" : "-1",#012    "speed" : 0.0,#012    "direction" : 0.0,#012    "orientation" : 0.0,#012    "person" : {#012      "age" : 45,#012      "gender" : "male",#012      "hair" : "black",#012      "cap" : "none",#012      "apparel" : "formal",#012      "confidence" : -0.10000000149011612#012    },#012    "bbox" : {#012      "topleftx" : 757,#012      "toplefty" : 468,#012      "bottomrightx" : 808,#012      "bottomrighty" : 579#012    },#012    "location" : {#012      "lat" : 0.0,#012      "lon" : 0.0,#012      "alt" : 0.0#012    },#012    "coordinate" : {#012      "x" : 0.0,#012      "y" : 0.0,#012      "z" : 0.0#012    }#012  },#012  "event" : {#012    "id" : 
"23b33f9b-d33f-4de4-85c4-7e199109c079",#012    "type" : "entry"#012  },#012  "videoPath" : ""#012}
Aug 22 07:04:32 jnano-desktop systemd[1]: Started Run anacron jobs.
Aug 22 07:04:32 jnano-desktop anacron[5519]: Anacron 2.3 started on 2020-08-22
Aug 22 07:04:32 jnano-desktop anacron[5519]: Normal exit (0 jobs run)

The syslog is filled with AMQP message logs. The only error I see is the usual one about the nvrm channel:
jnano-desktop nautilus-autostart.desktop[3884]: Error: Can't initialize nvrm channel

The last entry in kern.log is from one day before the crash:

Aug 21 13:44:23 jnano-desktop kernel: [167356.820149] Extcon AUX1(HDMI) enable
Aug 21 13:44:23 jnano-desktop kernel: [167356.820352] tegradc tegradc.0: unblank
Aug 21 13:44:23 jnano-desktop kernel: [167356.820360] tegradc tegradc.1: blank - powerdown
Aug 21 13:44:24 jnano-desktop kernel: [167357.047037] tegradc tegradc.0: unblank
Aug 21 13:44:24 jnano-desktop kernel: [167357.047047] tegradc tegradc.1: blank - powerdown
Aug 21 18:35:52 jnano-desktop kernel: [184845.024389] EXT4-fs (mmcblk0p1): error count since last fsck: 6
Aug 21 18:35:52 jnano-desktop kernel: [184845.030444] EXT4-fs (mmcblk0p1): initial error at time 1597765637: htree_dirblock_to_tree:991: inode 464593: block 1592264
Aug 21 18:35:52 jnano-desktop kernel: [184845.041618] EXT4-fs (mmcblk0p1): last error at time 1597767749: htree_dirblock_to_tree:991: inode 285072: block 1059628
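
For reference, the `time` values in those EXT4-fs lines look like Unix epoch seconds. Here is a minimal sketch for turning them into readable dates; the function name is only illustrative, and the two numbers are copied from the kern.log lines above.

```python
from datetime import datetime, timezone


def ext4_error_time(epoch_seconds):
    """Convert the 'error at time N' value from an EXT4-fs message to UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# Values taken from the "initial error" and "last error" lines above.
for label, epoch in (("initial", 1597765637), ("last", 1597767749)):
    print(label, ext4_error_time(epoch).isoformat())
```

Both values fall on 2020-08-18 (15:47:17 and 16:22:29 UTC), three days before the Aug 21 entry. So that entry may just be a reminder about older errors on mmcblk0p1 rather than a new failure. The syslog clock appears to be UTC+2, judging by the `@timestamp` fields in the AMQP messages.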

Now I am running the Jetson Nano from the SSD. The SD card is still present to enable booting from USB 3.0.
I have just restarted the system. Dmesg shows a warning:

[ 3110.981631] Extcon AUX1(HDMI) enable
[ 3110.982136] tegradc tegradc.0: unblank
[ 3110.982144] tegradc tegradc.1: blank - powerdown
[ 3111.415873] tegradc tegradc.0: unblank
[ 3111.415885] tegradc tegradc.1: blank - powerdown
[ 3267.330298] nvmap_alloc_handle: PID 8237: deepstream-test: WARNING: All NvMap Allocations must have a tag to identify the subsystem allocating memory.Please pass the tag to the API call NvRmMemHanldeAllocAttr() or relevant.

Do you have any idea what might have caused the crash?
Or what I should monitor to understand the cause of a possible future crash?

Thanks!!!

Connect a serial console and see if any messages appear. (When the kernel crashes it may be unable to save the message to a disk log, but succeed in writing it to a serial console.)
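
If the console is captured on a second machine, a small script can add host timestamps to each line so a crash can be matched against wall-clock time later. This is only a sketch under some assumptions: it uses the third-party pyserial package, the port name `/dev/ttyUSB0` and the log file name are placeholders, and 115200 baud is assumed for the Nano's debug UART.

```python
from datetime import datetime


def log_console(stream, out, stop_at_eof=False):
    """Copy lines from a byte stream to `out`, prefixed with host wall-clock time."""
    while True:
        raw = stream.readline()
        if not raw:
            # Empty read: a timeout on a serial port, or end of data on a file.
            if stop_at_eof:
                break
            continue
        text = raw.decode("utf-8", errors="replace").rstrip("\r\n")
        if not text:
            continue  # skip the blank lines that CR/LF pairs tend to produce
        out.write(f"{datetime.now().isoformat(timespec='seconds')} {text}\n")
        out.flush()  # keep the file current in case the logging host is interrupted


if __name__ == "__main__":
    import serial  # pyserial (third-party), assumed to be installed

    # Port name and baud rate are assumptions; adjust them to your adapter.
    port = serial.Serial("/dev/ttyUSB0", baudrate=115200, timeout=1)
    with open("jetson-serial.log", "a") as log:
        log_console(port, log)
```

Flushing after every line is deliberate: the interesting messages are the last ones before a reset, so they should not sit in a buffer.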

Thanks @nvidiadev1,

I will do it as soon as possible.

Meanwhile, I had another crash followed by a reboot.
The system was able to record something in kern.log:

Aug 23 21:54:11 jnano-desktop kernel: [ 7350.815505] nvgpu: 57000000.gpu                  gk20a_ptimer_isr:50   [ERR]  PRI timeout: ADR 0x00400120 READ  DATA 0x00000000
Aug 23 21:54:11 jnano-desktop kernel: [ 7350.827125] nvgpu: 57000000.gpu                  gk20a_ptimer_isr:56   [ERR]  FECS_ERRCODE 0xbadf1301
Aug 24 00:55:37 jnano-desktop kernel: [18236.750629] nvgpu: 57000000.gpu                  gk20a_ptimer_isr:50   [ERR]  PRI timeout: ADR 0x00400120 READ  DATA 0x00000000
Aug 24 00:55:37 jnano-desktop kernel: [18236.762376] nvgpu: 57000000.gpu                  gk20a_ptimer_isr:56   [ERR]  FECS_ERRCODE 0xbadf1301
Aug 24 01:24:53 jnano-desktop kernel: [19992.919393] nvgpu: 57000000.gpu                  gk20a_ptimer_isr:50   [ERR]  PRI timeout: ADR 0x00400120 READ  DATA 0x00000000
Aug 24 01:24:53 jnano-desktop kernel: [19992.931149] nvgpu: 57000000.gpu                  gk20a_ptimer_isr:56   [ERR]  FECS_ERRCODE 0xbadf1301
Aug 24 03:05:01 jnano-desktop kernel: [26000.704886] nvgpu: 57000000.gpu                  gk20a_ptimer_isr:50   [ERR]  PRI timeout: ADR 0x00400120 READ  DATA 0x00000000
Aug 24 03:05:01 jnano-desktop kernel: [26000.716523] nvgpu: 57000000.gpu                  gk20a_ptimer_isr:56   [ERR]  FECS_ERRCODE 0xbadf1301
Aug 24 03:50:46 jnano-desktop kernel: [28745.698754] nvgpu: 57000000.gpu                  gk20a_ptimer_isr:50   [ERR]  PRI timeout: ADR 0x00400120 READ  DATA 0x00000000
Aug 24 03:50:46 jnano-desktop kernel: [28745.710504] nvgpu: 57000000.gpu                  gk20a_ptimer_isr:56   [ERR]  FECS_ERRCODE 0xbadf1301
Aug 24 05:07:18 jnano-desktop kernel: [    0.000000] Booting Linux on physical CPU 0x0
Aug 24 05:07:18 jnano-desktop kernel: [    0.000000] Linux version 4.9.140-tegra (buildbrain@mobile-u64-3456) (gcc version 7.3.1 20180425 [linaro-7.3-2018.05 revision

However, I find it quite strange that the reboot happens more than an hour after the error. Maybe that is not the error triggering the reboot?
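
To put a number on that gap, here is a rough sketch that parses the syslog-style prefixes of the last nvgpu error and the first boot line. The two lines are copied from the kern.log excerpt above. The year is an assumption that has to be supplied, because these timestamps omit it.

```python
from datetime import datetime

YEAR = 2020  # assumption: syslog timestamps carry no year, so it is supplied here


def syslog_time(line, year=YEAR):
    """Parse the 'Mon DD HH:MM:SS' prefix of a syslog/kern.log line."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


last_error = (
    "Aug 24 03:50:46 jnano-desktop kernel: [28745.710504] nvgpu: 57000000.gpu"
    "                  gk20a_ptimer_isr:56   [ERR]  FECS_ERRCODE 0xbadf1301"
)
first_boot = (
    "Aug 24 05:07:18 jnano-desktop kernel: [    0.000000] "
    "Booting Linux on physical CPU 0x0"
)

gap = syslog_time(first_boot) - syslog_time(last_error)
print(gap)  # time between the last logged error and the next boot message
```

For these two lines the gap is 1 hour, 16 minutes and 32 seconds, which fits the idea that whatever happened right before the reset never reached the disk log.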

Thanks again!!

That’s what I’d guess. It could have the same root cause though.

Hi @nvidiadev1,

I was able to capture some new information through the serial console, as you suggested:

[15952.473434] nvgpu: 57000000.gpu                  gk20a_ptimer_isr:50   [ERR]  PRI timeout: ADR 0x00400120 READ  DATA 0x00000000

[15952.485185] nvgpu: 57000000.gpu                  gk20a_ptimer_isr:56   [ERR]  FECS_ERRCODE 0xbadf1301

[21466.007576] nvgpu: 57000000.gpu                  gk20a_ptimer_isr:50   [ERR]  PRI timeout: ADR 0x00400120 READ  DATA 0x00000000

[21466.019349] nvgpu: 57000000.gpu                  gk20a_ptimer_isr:56   [ERR]  FECS_ERRCODE 0xbadf1301

[23897.931833] nvgpu: 57000000.gpu                  gk20a_ptimer_isr:50   [ERR]  PRI timeout: ADR 0x00400120 READ  DATA 0x00000000

[23897.943592] nvgpu: 57000000.gpu                  gk20a_ptimer_isr:56   [ERR]  FECS_ERRCODE 0xbadf1301

[24864.196782] nvgpu: 57000000.gpu                  gk20a_ptimer_isr:50   [ERR]  PRI timeout: ADR 0x00400120 READ  DATA 0x00000000

[24864.208677] nvgpu: 57000000.gpu                  gk20a_ptimer_isr:56   [ERR]  FECS_ERRCODE 0xbadf1301

[26605.713341] nvgpu: 57000000.gpu                  gk20a_ptimer_isr:50   [ERR]  PRI timeout: ADR 0x00400120 READ  DATA 0x00000000

[26605.725247] nvgpu: 57000000.gpu                  gk20a_ptimer_isr:56   [ERR]  FECS_ERRCODE 0xbadf1301

[28471.093470] nvgpu: 57000000.gpu                  gk20a_ptimer_isr:50   [ERR]  PRI timeout: ADR 0x00400120 READ  DATA 0x00000000

[28471.105101] nvgpu: 57000000.gpu                  gk20a_ptimer_isr:56   [ERR]  FECS_ERRCODE 0xbadf1301

[33548.939731] ------------[ cut here ]------------

[33548.944551] WARNING: CPU: 1 PID: 4859 at /dvs/git/dirty/git-master_linux/kernel/nvgpu/drivers/gpu/nvgpu/gk20a/gk20a.c:64 __gk20a_warn_on_no_regs+0x34/0x50 [nvgpu]

[33548.960799] ---[ end trace 6731045601169df0 ]---

[33548.970369] nvgpu: 57000000.gpu           __nvgpu_check_gpu_state:56   [ERR]  GPU has disappeared from bus!!

[33548.980210] nvgpu: 57000000.gpu           __nvgpu_check_gpu_state:57   [ERR]  Rebooting system!!

[33548.990629] EXT4-fs warning (device sda1): ext4_end_bio:313: I/O error -5 writing to inode 11141551 (offset 618496 size 4096 starting block 29526112)

[33549.003999] Buffer I/O error on device sda1, logical block 29525854

[33549.010260] Buffer I/O error on device sda1, logical block 29525855

[33549.016547] EXT4-fs warning (device sda1): ext4_end_bio:313: I/O error -5 writing to inode 11141515 (offset 9150464 size 4096 starting block 29585851)

[33549.029996] Buffer I/O error on device sda1, logical block 29585593

[33549.036250] Buffer I/O error on device sda1, logical block 29585594

[33549.042868] JBD2: Detected IO errors while flushing file data on sda1-8

[33549.049576] Aborting journal on device sda1-8.

[33549.054354] JBD2: Error -5 detected when updating journal superblock for sda1-8.

[33549.056453] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal

[33549.056457] EXT4-fs (sda1): Remounting filesystem read-only

[33549.056462] EXT4-fs (sda1): previous I/O error to superblock detected

[33549.162549] EXT4-fs warning (device sda1): dx_probe:743: inode #524291: lblock 0: comm (spawn): error -5 reading directory block

[33549.174818] EXT4-fs warning (device sda1): dx_probe:743: inode #1048577: lblock 0: comm systemd-journal: error -5 reading directory block

[33549.185662] EXT4-fs error (device sda1): ext4_find_entry:1441: inode #529938: comm colord: reading directory lblock 0

[33549.185677] EXT4-fs (sda1): previous I/O error to superblock detected

[33549.201914] EXT4-fs error (device sda1): ext4_find_entry:1441: inode #14155777: comm (umount): reading directory lblock 0

[33549.212764] nvgpu: 57000000.gpu       nvgpu_submit_channel_gpfifo:463  [ERR]  failed to host gk20a to submit gpfifo

[33549.212767] nvgpu: 57000000.gpu       nvgpu_submit_channel_gpfifo:464  [ERR]  Xorg

[33549.216156] nvgpu: 57000000.gpu       nvgpu_submit_channel_gpfifo:463  [ERR]  failed to host gk20a to submit gpfifo

[33549.216159] nvgpu: 57000000.gpu       nvgpu_submit_channel_gpfifo:464  [ERR]  compiz

[33549.216220] nvgpu: 57000000.gpu       nvgpu_submit_channel_gpfifo:463  [ERR]  failed to host gk20a to submit gpfifo

[33549.216222] nvgpu: 57000000.gpu       nvgpu_submit_channel_gpfifo:464  [ERR]  compiz

[33549.216569] nvgpu: 57000000.gpu       nvgpu_submit_channel_gpfifo:463  [ERR]  failed to host gk20a to submit gpfifo

[33549.216571] nvgpu: 57000000.gpu       nvgpu_submit_channel_gpfifo:464  [ERR]  compiz

[33549.231994] nvgpu: 57000000.gpu       nvgpu_submit_channel_gpfifo:463  [ERR]  failed to host gk20a to submit gpfifo

[33549.231997] nvgpu: 57000000.gpu       nvgpu_submit_channel_gpfifo:464  [ERR]  Xorg

[33549.252360] reboot: Restarting system

[0000.159] [L4T TegraBoot] (version 00.00.2018.01-l4t-80a468da)

[0000.165] Processing in cold boot mode Bootloader 2

[0000.169] A02 Bootrom Patch rev = 1023

[0000.173] Power-up reason: software reset

[0000.177] No Battery Present

[0000.179] pmic max77620 reset reason

[0000.183] pmic max77620 NVERC : 0x0

[0000.186] RamCode = 0

[0000.188] Platform has DDR4 type RAM

[0000.192] max77620 disabling SD1 Remote Sense

[0000.196] Setting DDR voltage to 1125mv

[0000.200] Serial Number of Pmic Max77663: 0x1235e9

[0000.208] Entering ramdump check

[0000.211] Get RamDumpCarveOut = 0x0

[0000.214] RamDumpCarveOut=0x0,  RamDumperFlag=0xe59ff3f8

[0000.219] Last reboot was clean, booting normally!

[0000.224] Sdram initialization is successful   

This seems to be the reason the Jetson Nano is rebooting.
Do you have any suggestions about what it might be related to?

Thanks again!!!

This seems to be the point at which the system reboots.

I do not really have any hints about what the cause might be.
I have not been able to find any information online.

I have also tried leaving the Jetson running for more than 48 hours with the serial logger on, to check whether any reboot happens in the idle state. Nothing happened. The Jetson is still on, without any entry in the serial log.

Could this be a problem with the drivers? @WayneWWW do you have any suggestions?

Thank you very much in advance

I have again logged a reboot after 1.5 hours.

The error message is still the same:

Ubuntu 18.04.4 LTS jnano-desktop ttyS0



jnano-desktop login: [   32.174313] EXT4-fs (mmcblk0p1): warning: mounting fs with errors, running e2fsck is recommended

[  128.097698] nvmap_alloc_handle: PID 4717: deepstream-test: WARNING: All NvMap Allocations must have a tag to identify the subsystem allocating memory.Please pass the tag to the API call NvRmMemHanldeAllocAttr() or relevant. 

[ 5533.578835] ------------[ cut here ]------------

[ 5533.585041] WARNING: CPU: 3 PID: 4918 at /dvs/git/dirty/git-master_linux/kernel/nvgpu/drivers/gpu/nvgpu/gk20a/gk20a.c:64 __gk20a_warn_on_no_regs+0x34/0x50 [nvgpu]

[ 5533.607280] ---[ end trace a8c2b4d37b753354 ]---

[ 5533.631065] nvgpu: 57000000.gpu           __nvgpu_check_gpu_state:56   [ERR]  GPU has disappeared from bus!!

[ 5533.640990] nvgpu: 57000000.gpu           __nvgpu_check_gpu_state:57   [ERR]  Rebooting system!!

[ 5533.652303] EXT4-fs warning (device sda1): ext4_end_bio:313: I/O error -5 writing to inode 11141551 (offset 913408 size 4096 starting block 29526184)

[ 5533.665681] Buffer I/O error on device sda1, logical block 29525926

[ 5533.671956] Buffer I/O error on device sda1, logical block 29525927

[ 5533.678318] EXT4-fs warning (device sda1): ext4_end_bio:313: I/O error -5 writing to inode 11141527 (offset 2703360 size 4096 starting block 29585301)

[ 5533.691789] Buffer I/O error on device sda1, logical block 29585043

[ 5533.698065] Buffer I/O error on device sda1, logical block 29585044

[ 5533.704731] JBD2: Detected IO errors while flushing file data on sda1-8

[ 5533.711476] Aborting journal on device sda1-8.

[ 5533.716028] JBD2: Error -5 detected when updating journal superblock for sda1-8.

[ 5533.718298] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal

[ 5533.718300] EXT4-fs (sda1): Remounting filesystem read-only

[ 5533.718305] EXT4-fs (sda1): previous I/O error to superblock detected

[ 5533.804221] EXT4-fs warning (device sda1): dx_probe:743: inode #524291: lblock 0: comm (spawn): error -5 reading directory block

[ 5533.816642] EXT4-fs warning (device sda1): dx_probe:743: inode #1048577: lblock 0: comm systemd-journal: error -5 reading directory block

[ 5533.826443] EXT4-fs error (device sda1): ext4_find_entry:1441: inode #14155777: comm (umount): reading directory lblock 0

[ 5533.828026] nvgpu: 57000000.gpu       nvgpu_submit_channel_gpfifo:463  [ERR]  failed to host gk20a to submit gpfifo

[ 5533.828028] nvgpu: 57000000.gpu       nvgpu_submit_channel_gpfifo:464  [ERR]  Xorg

[ 5533.839797] nvgpu: 57000000.gpu       nvgpu_submit_channel_gpfifo:463  [ERR]  failed to host gk20a to submit gpfifo

[ 5533.839800] nvgpu: 57000000.gpu       nvgpu_submit_channel_gpfifo:464  [ERR]  compiz

[ 5533.839855] nvgpu: 57000000.gpu       nvgpu_submit_channel_gpfifo:463  [ERR]  failed to host gk20a to submit gpfifo

[ 5533.839858] nvgpu: 57000000.gpu       nvgpu_submit_channel_gpfifo:464  [ERR]  compiz

[ 5533.840121] nvgpu: 57000000.gpu       nvgpu_submit_channel_gpfifo:463  [ERR]  failed to host gk20a to submit gpfifo

[ 5533.840123] nvgpu: 57000000.gpu       nvgpu_submit_channel_gpfifo:464  [ERR]  compiz

[ 5533.841617] EXT4-fs error (device sda1): ext4_find_entry:1441: inode #11141128: comm python3: reading directory lblock 0

[ 5533.842262] EXT4-fs error (device sda1): ext4_find_entry:1441: inode #2: comm python3: reading directory lblock 0

[ 5533.843270] nvgpu: 57000000.gpu       nvgpu_submit_channel_gpfifo:463  [ERR]  failed to host gk20a to submit gpfifo

[ 5533.843273] nvgpu: 57000000.gpu       nvgpu_submit_channel_gpfifo:464  [ERR]  Xorg

[ 5533.845248] EXT4-fs warning (device sda1): dx_probe:743: inode #524298: lblock 0: comm unity-panel-ser: error -5 reading directory block

[ 5533.845272] EXT4-fs error (device sda1): ext4_find_entry:1441: inode #11141128: comm unity-panel-ser: reading directory lblock 0

[ 5533.845290] EXT4-fs warning (device sda1): dx_probe:743: inode #524298: lblock 0: comm unity-panel-ser: error -5 reading directory block

[ 5533.845316] EXT4-fs error (device sda1): ext4_find_entry:1441: inode #524295: comm unity-panel-ser: reading directory lblock 0

[ 5533.845331] EXT4-fs warning (device sda1): dx_probe:743: inode #524298: lblock 0: comm unity-panel-ser: error -5 reading directory block

[ 5533.845358] EXT4-fs warning (device sda1): dx_probe:743: inode #524298: lblock 0: comm unity-panel-ser: error -5 reading directory block

[ 5533.845382] EXT4-fs error (device sda1): ext4_find_entry:1441: inode #7340034: comm unity-panel-ser: reading directory lblock 0

[ 5533.845404] EXT4-fs error (device sda1): ext4_find_entry:1441: inode #7340246: comm unity-panel-ser: reading directory lblock 0

[ 5533.845418] EXT4-fs warning (device sda1): dx_probe:743: inode #524298: lblock 0: comm unity-panel-ser: error -5 reading directory block

[ 5533.845434] EXT4-fs warning (device sda1): dx_probe:743: inode #524298: lblock 0: comm unity-panel-ser: error -5 reading directory block

[ 5533.845696] EXT4-fs error (device sda1): ext4_find_entry:1441: inode #1313044: comm unity-panel-ser: reading directory lblock 0

[ 5533.846482] EXT4-fs error (device sda1): ext4_find_entry:1441: inode #1836810: comm unity-panel-ser: reading directory lblock 0

[ 5533.846668] EXT4-fs error (device sda1): ext4_find_entry:1441: inode #1705731: comm unity-panel-ser: reading directory lblock 0

[ 5533.879162] reboot: Restarting system

It says that the GPU has disappeared from the bus…
I have not found any information about this error yet.
Do you have any suggestions?
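
To see the order of events in a capture like the one above, one option is to pull out the first kernel timestamp for each kind of message. This is a rough sketch: the event labels are made up for illustration, the matched substrings are copied from the serial log, and the sample lines are four lines from the latest capture.

```python
import re

# Substrings copied from the serial log; the labels on the left are illustrative.
EVENTS = {
    "gpu_lost": "GPU has disappeared from bus",
    "storage_io_error": "I/O error",
    "fs_remount_ro": "Remounting filesystem read-only",
    "reboot": "reboot: Restarting system",
}

# Kernel timestamp prefix such as "[ 5533.631065]".
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def first_occurrences(lines):
    """Map each event label to the kernel timestamp of its first occurrence."""
    seen = {}
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # not a timestamped kernel line (blank line, login prompt, ...)
        for label, needle in EVENTS.items():
            if label not in seen and needle in line:
                seen[label] = float(match.group(1))
    return seen


sample = [
    "[ 5533.631065] nvgpu: 57000000.gpu           __nvgpu_check_gpu_state:56   [ERR]  GPU has disappeared from bus!!",
    "[ 5533.652303] EXT4-fs warning (device sda1): ext4_end_bio:313: I/O error -5 writing to inode 11141551 (offset 913408 size 4096 starting block 29526184)",
    "[ 5533.718300] EXT4-fs (sda1): Remounting filesystem read-only",
    "[ 5533.879162] reboot: Restarting system",
]

for label, t in sorted(first_occurrences(sample).items(), key=lambda kv: kv[1]):
    print(f"{t:10.3f}s  {label}")
```

In both captures the "GPU has disappeared" message comes roughly 20 ms before the first sda1 I/O error, so the storage errors follow the GPU failure rather than precede it. The ordering alone does not prove what the root cause is, of course.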

I have found this topic on the forum: Quadro RTX 6000 GPU Cards Disappearing

I think I will try replacing the ATX power supply that currently powers the Jetson. @njuffa, I have seen you suggest that this might also occur because of local brown-outs.
Do you think that might be the case here?

Thanks again!!