TX1: Sustained reads/writes cause a crash or system hang with both an SD card and an external USB SSD

Hello. We are using an NVIDIA TX1 Dev Kit board with the latest JetPack and Ubuntu 18.04 installed. Initially we were building some files located on an SD card (with many reads and writes over a long period). At various points access to the SD card is lost, and it appears to take some of Linux with it (e.g., the df -h command hangs); only a reboot brings everything back. When I searched this forum, someone with a similar issue on a TX2 with an SD card was told that it happens because SD cards are not robust enough. We then switched to a robust external USB SSD, and the same thing happens. It appears as though the NVIDIA drivers are crashing in this scenario. Is a fix available, or what can we update in the NVIDIA BSP so that we can access either an SD card or an external USB SSD in a stable way, without your software crashing and taking enough of Linux with it that the board must be rebooted? Thank you.

How can we reproduce this issue? Are there specific steps?
Is this with JetPack 4.6.3 or another release?
Can you provide any logs?

I want to add that the serial console will be able to log even when the system otherwise crashes. Running “dmesg --follow” on the serial console prior to the “df -h” command should show something useful.

Hi. Sorry, I was out sick with the flu. I have not tried other JetPacks, only the most recent JetPack with SDK Manager, which updated the reference board to 18.04. We were using ROS2 colcon to build the software, first on the SD card and then on the external SSD. So when this happens there have been many back-to-back reads and writes of a lot of files over a long period. I think emulating the same build behavior (reading and writing many files for at least several minutes) should easily recreate the problem at NVIDIA. I can recreate it and send you logs. How do you want me to pull the logs for you (where are the files you need from Ubuntu, what are they called, or are there commands I can run to capture more information for you)? Thank you.
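A minimal sketch of such a load, in case it helps with reproduction without a full ROS2 colcon build. It is not the actual build workload, only an approximation: the function name `stress_read_write`, the file counts, and the sizes are all assumptions chosen for illustration. Note that the read-back may be served from the page cache rather than the device, so it mainly exercises the write path.

```python
import hashlib
import os
import tempfile
import time


def stress_read_write(target_dir, duration_s=300.0, files_per_round=200,
                      file_size=64 * 1024):
    """Repeatedly write, fsync, read back, and delete many small files.

    target_dir should be a directory on the drive under test (for example
    the mount point of the SD card or USB SSD). Returns a tuple
    (rounds_completed, bytes_written). Raises IOError on a checksum mismatch.
    At least one round always runs, even if duration_s is 0.
    """
    rounds = 0
    bytes_written = 0
    deadline = time.monotonic() + duration_s
    while True:
        # Each round works in its own scratch directory, removed on exit.
        with tempfile.TemporaryDirectory(dir=target_dir) as work_dir:
            expected = {}
            for i in range(files_per_round):
                payload = os.urandom(file_size)
                path = os.path.join(work_dir, "blob_{:05d}.bin".format(i))
                with open(path, "wb") as f:
                    f.write(payload)
                    f.flush()
                    os.fsync(f.fileno())  # push the data toward the device
                expected[path] = hashlib.sha256(payload).hexdigest()
                bytes_written += file_size
            # Read everything back and compare against what was written.
            for path, digest in expected.items():
                with open(path, "rb") as f:
                    if hashlib.sha256(f.read()).hexdigest() != digest:
                        raise IOError("checksum mismatch: " + path)
        rounds += 1
        if time.monotonic() >= deadline:
            return rounds, bytes_written
```

Calling something like `stress_read_write("/media/ssd", duration_s=600)` (the mount point here is hypothetical) while “dmesg --follow” runs on the serial console would approximate several minutes of heavy file access.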

df -h hangs after this happens, and the board only recovers after a reboot. I will now run dmesg --follow beforehand and see if that helps capture more information. Are there any other logs I can export and upload here, or any other commands I can run, that would help? Thank you again.
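Since a hung df -h also ties up the terminal it runs in, one option is to probe it with a timeout. This is only a sketch: the helper name `command_hangs` and the 10 second default are assumptions. If the process is stuck in uninterruptible I/O wait, the kill will not take effect until the I/O returns, so the helper deliberately does not wait for the process to exit.

```python
import subprocess


def command_hangs(cmd, timeout_s=10.0):
    """Return True if cmd does not finish within timeout_s seconds.

    The child is started with its output discarded. On timeout a kill is
    attempted as a best effort, but the function returns immediately and
    never blocks waiting for a process that may be stuck in the kernel.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    try:
        proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()  # ignored by a task in uninterruptible sleep
        return True
    return False
```

For example, `command_hangs(["df", "-h"])` could be called every few seconds during the build to note roughly when the hang begins, which can then be matched against the serial console log.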

Running this on the serial console should give you more logging messages. The serial console is very important in cases like this since it depends on very few drivers.

Hello. Please see the uploaded txt output files in the attached zip file. The tx1 results1 file is the “dmesg --follow” output from before the problem happens, that is, before the point in the middle of the read/write accesses to the drives where part of Linux on the TX1 somehow goes down and df -h hangs. The tx1 results 2 file is the “dmesg --follow” output from after the problem happens, when I run the df -h command and it hangs. The 3rd file just shows how access to the drive crashes mid-processing. Let me know if you need me to send you anything else.
archive.zip (42.5 KB)
Thank you.

Unfortunately those logs don’t add anything. One of the points of an actual serial console log (versus the command line or GUI of a regular login) is that the serial console does not depend on any kind of video or disk driver; the monitor, keyboard, and hard drive drivers can all crash and burn, and the serial console will continue to run (including log messages of failure conditions). Whatever is going on is also crashing your method of viewing log messages, so correct me if I am wrong, but it does not look like the log was from the serial UART port (if it was, then the error is so severe that everything instantly dies). It is very unusual for the serial console not to get some message (you’d run dmesg --follow just as you did, but you’d get more output on the serial UART when all is crashing and burning).