Jetson Nano - soctherm OC Alarm

At my job, we have been using Jetson Nano 2GB boards to prototype one of our projects. One of our units has been having an issue recently, and I was wondering if anyone could help me understand what might be causing the issue.

For context, our units run a YOLO object detection network, along with a Python script that uses OpenCV to prepare camera frames for our model. We have these prototype units installed at a few of our clients' locations, so we can verify our model's performance and gather more video footage to strengthen our model.

The issue we have been facing over the past month is that one of our units keeps losing power, even though it is still plugged in to the wall outlet. Pulling the power cable out of the unit and plugging it back in makes the unit boot back up, but only temporarily; the unit loses power again after a few days to a week.

I did some digging into kern.log and syslog, and noticed a lot of these “soctherm: OC ALARM 0x00000001” warnings in the kernel log. It looks like a new instance of this warning is logged every second, up until the point where the box loses power again. I looked in the forums, and it looks like this error is telling me that this unit is receiving low voltage? If that is the case, I am trying to figure out what the point of failure could be: the power supply we are using, or the outlet our client has the unit plugged in to. For reference, we are using this CanaKit USB-C power supply.

Overall, I really want to get all the information I can before I troubleshoot this unit with our client. I am wondering if anyone has more information about what this error signifies and, if it is related to under-voltage, what the likely point of failure is (bad power supply, bad outlet, a program using too much CPU, etc.).

Also, in case it is useful, here is the most recent kernel log from our unit. As you can see, there is a new soctherm warning every second.
kern.log (42.5 KB)

So only one unit has this problem, and all the rest work normally?
If yes, I suggest doing an RMA; see Jetson FAQ | NVIDIA Developer

The OC alarm on Jetson Nano, in most cases, indicates an under-voltage issue. Please try other power adapters.

Thanks for giving me this link. Yes, this is the only unit that is going offline due to soctherm alarms, but I haven’t dug as deeply into the logs on the other units, so they may also have soctherm alarms, just not as often, or not to the point where the unit goes offline. This client has had power-related issues before, so I’m thinking it is most likely an environmental issue rather than a board issue. If, after some more troubleshooting, we determine the issue is not environmental, then I will look at the RMA. Thanks
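
For checking the other units, a rough Python sketch like the one below could count how often the warning shows up per hour in a kernel log. It is only a sketch under a couple of assumptions: that the log lines carry the traditional syslog prefix ("Mon DD HH:MM:SS host ..."), and that the log sits at /var/log/kern.log. The function names `count_oc_alarms` and `summarize` are made up for illustration.

```python
import re
from collections import Counter

# Matches the warning text quoted in this thread; the alarm mask is captured as hex.
OC_ALARM = re.compile(r"soctherm: OC ALARM (0x[0-9a-fA-F]+)")
CLOCK = re.compile(r"\d{2}:\d{2}:\d{2}")


def count_oc_alarms(lines):
    """Return (total, per_hour) for lines containing the OC ALARM warning.

    Assumption: lines start with a syslog-style "Mon DD HH:MM:SS" prefix.
    Matching lines without that prefix are counted under "unknown".
    """
    total = 0
    per_hour = Counter()
    for line in lines:
        if not OC_ALARM.search(line):
            continue
        total += 1
        parts = line.split()
        if len(parts) >= 3 and CLOCK.fullmatch(parts[2]):
            # Bucket by month, day, and hour, e.g. "Jan 5 10:00".
            bucket = f"{parts[0]} {parts[1]} {parts[2][:2]}:00"
        else:
            bucket = "unknown"
        per_hour[bucket] += 1
    return total, per_hour


def summarize(path="/var/log/kern.log"):
    """Print the alarm count per hour for one log file (path is an assumption)."""
    with open(path, errors="replace") as f:
        total, per_hour = count_oc_alarms(f)
    print(f"{total} OC ALARM lines in {path}")
    for bucket, n in per_hour.items():
        print(f"{bucket}  {n}")
```

Calling `summarize()` on each unit (or on a copied kern.log) would show whether the other boards also log the alarm, just less often, and whether the alarms cluster at particular times of day.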

Ok @WayneWWW, thanks for confirming. That’s the conclusion I reached after reading some other replies in the forum. As I mentioned in the other reply, this client has had power-related issues before, so I’m thinking we are experiencing an environmental issue. While troubleshooting, though, I will definitely have them try other power adapters, or other outlets if possible. Thanks again!
