I have attached an external SSD to my Jetson Xavier, and when I try to access the SSD, the Xavier often becomes unresponsive and reboots. At some point this becomes very frequent.
The Xavier reboots in the middle of training and I am forced to start over each time.
There is a reasonable chance that you are running out of memory. However, you will want to run a serial console and post the log of what goes on just before and during the failure.
For serial console it is just the first serial device which shows up upon plugging the micro-B USB into a host PC (e.g., run “dmesg --follow” on the host, and then plug in the micro-B USB to the Xavier…the first serial device name will look something like “/dev/ttyUSB0”). I like gtkterm (on the host, “sudo apt-get install gtkterm”), and if the device is ttyUSB0, then this would start a connection to serial console:
gtkterm -b 8 -t 1 -s 115200 -p /dev/ttyUSB0
(you would have to use “sudo” if your user is not a member of group “dialout”)
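If you would rather check the group membership up front than find out from a permission error, a minimal Python sketch like this could be run on the host. The function name user_in_group is only illustrative, and it assumes a Linux host where the serial device belongs to group dialout as described above.

```python
import grp
import pwd


def user_in_group(user: str, group: str) -> bool:
    """Return True if `user` belongs to `group` (supplementary or primary)."""
    try:
        group_entry = grp.getgrnam(group)
    except KeyError:
        return False  # the group does not exist on this host
    if user in group_entry.gr_mem:
        return True  # listed as a supplementary member
    # A primary group is recorded in the passwd entry, not in gr_mem.
    try:
        return pwd.getpwnam(user).pw_gid == group_entry.gr_gid
    except KeyError:
        return False  # the user does not exist on this host


# Example use (assumption: run as the user who will open the serial port):
#   import getpass
#   print(user_in_group(getpass.getuser(), "dialout"))
```

Note that this reads the group database, so a user who was only just added to the group will show as a member here but still needs to log out and back in before the new membership applies to the running session.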
Incidentally, you could run “dmesg --follow” on the serial console itself so this would display logs up to the point of failure. The logs occurring as the program starts (or during failure) would be of use.
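If you also want a copy of that output saved on the Xavier itself, one possible approach is sketched below in Python. It mirrors "dmesg --follow" to a file and forces every line to disk, so the last lines before a sudden reboot have a better chance of surviving. The function names and the log path are hypothetical, and the log should go to internal storage, not to the SSD under suspicion.

```python
import os
import subprocess
import sys


def append_durably(log_file, line: str) -> None:
    """Write one line and force it to storage before returning."""
    log_file.write(line)
    log_file.flush()             # push Python's buffer to the OS
    os.fsync(log_file.fileno())  # ask the OS to commit it to disk


def follow_dmesg(log_path: str) -> None:
    """Mirror `dmesg --follow` to the terminal and to log_path."""
    with open(log_path, "a", encoding="utf-8") as log_file:
        proc = subprocess.Popen(
            ["dmesg", "--follow"],
            stdout=subprocess.PIPE,
            text=True,
            errors="replace",  # kernel logs may contain odd bytes
        )
        try:
            for line in proc.stdout:
                sys.stdout.write(line)
                append_durably(log_file, line)
        finally:
            proc.terminate()


# Example use (hypothetical path; runs until Ctrl+C or the reboot):
#   follow_dmesg("/home/nvidia/dmesg_before_reboot.log")
```

Calling fsync on every line is slow, but kernel logging is low volume and the point here is to lose as little as possible when the system goes down. Reading the kernel log may require sudo, depending on how the system is configured.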
Currently I have 27%/8 GB free space. Could this be an issue? If it happens to be a memory issue, is it possible to extend the memory? I cannot free up any space because all files stored here are related to my current project.
Serial console implies the software running on the host PC (which is working as the display for the embedded system the software talks to over serial UART). There should be some obvious boot messages as the system boots, but it is perfectly reasonable that there is no output during the error (disappointing, but not uncommon…it simply means that if the console is working, then the reboot cause is so sudden that no logging can occur).
I see a lot of serious USB errors, also SATA errors. This leads to these questions:
Is the SSD connected directly, or is the disk using a USB external drive housing?
Does the NUSCENES devkit use a custom carrier board, or is it just software?
If this uses a custom carrier board, did you use the board support package the carrier board comes with?
Are any external USB or disk drive devices (excluding keyboard/mouse) using their own power, or are they drawing power from the Xavier?
The errors only say that USB-C and the SSD connected over that USB-C are having serious problems. The cause could be power delivery. Assuming the SSD is not being self-powered, meaning that the SSD is drawing power from the Xavier, then if you happen to have an externally powered USB HUB, the externally powered HUB would eliminate power consumption as the issue. External power would draw from a different source than the Xavier.
Can you confirm if there are problems in the case of the SSD being independently powered via an external power source? Note that there could have been data corruption on the SSD from previous issues, but what the logs were showing were not file system errors; instead those were USB and SATA errors.
It seems like the SSD was the issue. I copied the data from the SSD to a normal HDD and it works fine without rebooting. I don’t know if the problem is specific to SSD drives when connected to the Xavier or if the one I have is at fault. I will format it and re-add the data to see if the problem persists.
I’ve tested an external power supply and the problem persists… I have run out of ideas now on how to fix this issue. I have used my external drives on other devices and they worked well without any hassle or power surges. I have now come to the conclusion that the Xavier is the culprit.
From what I can see in previous posts, the USB connection to the external drive is giving errors. You have tested external power to the drive, and so it is unlikely power delivery is the issue. Using “dmesg --follow”, can you verify that with external power to the drive you still get those same USB errors? I’m guessing you do, but want to verify.
One other possibility is running out of physical RAM. This would not necessarily mean USB is not an issue, but it could cause a sudden reboot or other failures. You may want to monitor RAM use and see what it looks like at the moment of failure. I’ll suggest installing “htop” (“sudo apt-get install htop”), and monitoring that via serial console. At the moment of failure you should basically be able to see some information on memory use.
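As an alternative to watching htop, a small script can print memory figures as plain lines, which scroll in the serial console and stay visible after the reboot. This is only a sketch: it assumes a Linux kernel recent enough to report MemAvailable in /proc/meminfo, and the function names are made up for illustration.

```python
import time


def parse_meminfo(text: str) -> dict:
    """Map each /proc/meminfo field to its number (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def available_percent(meminfo: dict) -> float:
    """Share of RAM the kernel estimates is still available, in percent."""
    return 100.0 * meminfo["MemAvailable"] / meminfo["MemTotal"]


def monitor(interval_s: float = 1.0) -> None:
    """Print one line per interval until interrupted with Ctrl+C."""
    while True:
        with open("/proc/meminfo", encoding="ascii") as f:
            info = parse_meminfo(f.read())
        print(
            f"{time.strftime('%H:%M:%S')} "
            f"MemAvailable {info['MemAvailable'] // 1024} MiB "
            f"({available_percent(info):.1f}%) "
            f"SwapFree {info.get('SwapFree', 0) // 1024} MiB",
            flush=True,
        )
        time.sleep(interval_s)


# Example use: call monitor() in the serial console, then start training
# from another session and watch the last lines printed before the reboot.
```

If the last lines before the reboot show available memory close to zero, that points toward RAM exhaustion; if there is plenty left, memory is less likely to be the cause.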
If this does not indicate anything new, then someone else will need to find out the reason for the USB errors. One clue which might help is knowing whether the actual error changes at all between a directly connected USB SSD and one attached indirectly via the powered HUB.
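To make that comparison less tedious, the two captured logs (one with the SSD connected directly, one through the powered HUB) could be reduced to their USB and SATA related messages and compared. The sketch below is one way to do that in Python. The keyword list is my own guess at what is relevant, not something taken from the logs in this thread, and the function names and file names are hypothetical.

```python
import re
from collections import Counter

# Leading dmesg timestamp such as "[  123.456789] ".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")
# Assumed keywords for USB and SATA messages; adjust to what the logs show.
PATTERN = re.compile(r"\b(usb|xusb|xhci|sata|ata\d*|scsi)\b", re.IGNORECASE)


def error_signature(log_text: str) -> Counter:
    """Count each USB/SATA related message, ignoring timestamps."""
    counts = Counter()
    for line in log_text.splitlines():
        message = TIMESTAMP.sub("", line).strip()
        if message and PATTERN.search(message):
            counts[message] += 1
    return counts


def only_in_first(first: Counter, second: Counter) -> list:
    """Messages that appear in the first log but never in the second."""
    return sorted(msg for msg in first if msg not in second)


# Example use (hypothetical file names for the two captures):
#   direct = error_signature(open("direct.log").read())
#   hub = error_signature(open("hub.log").read())
#   print(only_in_first(direct, hub))  # seen only when directly connected
#   print(only_in_first(hub, direct))  # seen only through the powered HUB
```

Messages that differ only in a port or device number will be treated as different, so the output is a starting point for reading the logs and not a verdict.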