Jetson Nano reboots when running my Python program as a systemd service

I have a Python program that I run as a systemd service. I made some changes to it yesterday, and now it seems to cause the Nano to reboot regularly. I cannot find anything in the journalctl logs about what is causing it. If I stop the service, the Nano goes back to being stable, so it must be the Python program…

Any ideas on how to track down this sort of behaviour on a Nano? I’ve also looked in /var/crash, and there is nothing there either.

P.S. I run the Nano headless (in case that matters).
P.P.S. The Python program runs a subprocess, which is a DeepStream C application. This application has not changed, so I don’t think it’s the issue.


Hi,

Sometimes power starvation or running out of memory can cause the Nano to reboot.
Could you try running the Python program from the console rather than through systemd, to see whether the issue also occurs?
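To test the out-of-memory theory from inside the service itself, one option is to log available memory at a fixed interval, so the last entries written before a reboot show whether memory was running low. This is a minimal sketch, assuming the standard Linux /proc/meminfo format (values labelled kB); the function names, logger name and 30-second interval are hypothetical choices, not part of the original program.

```python
import logging
import threading

log = logging.getLogger("memwatch")  # hypothetical logger name


def mem_available_kib(meminfo_path="/proc/meminfo"):
    """Return the MemAvailable value from /proc/meminfo (labelled kB in the file)."""
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                # Line format: "MemAvailable:    1234567 kB"
                return int(line.split()[1])
    raise ValueError(f"MemAvailable not found in {meminfo_path}")


def start_memory_logger(interval_s=30.0, meminfo_path="/proc/meminfo"):
    """Log available memory every interval_s seconds from a daemon thread.

    Returns a threading.Event; call .set() on it to stop the loop.
    """
    stop = threading.Event()

    def run():
        while not stop.is_set():
            log.info("MemAvailable: %d kB", mem_available_kib(meminfo_path))
            stop.wait(interval_s)  # sleeps, but wakes early if stop is set

    threading.Thread(target=run, name="memwatch", daemon=True).start()
    return stop
```

Running the sampler in a daemon thread keeps it independent of the main loop, so it keeps reporting even if the main logic stalls.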

Thanks.

It could always be power starvation; however, I’ve been running these programs and a DeepStream app for a long time and never had any restarts.
I would have expected there to be some crash logs or something to give a hint. Does the Nano not report anything on crashes/restarts? Other Ubuntu systems seem to.
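One possible reason nothing shows up after a sudden reset is that the last log lines were still buffered in memory (or journald was not persisting them to disk) when power was lost. This is a minimal sketch, assuming the program uses Python's standard logging module: a file handler that calls os.fsync after every record, so each line is committed to storage before the call returns. The class name, logger name and log path are hypothetical, and fsyncing every record costs I/O, so it is best kept for debugging sessions.

```python
import logging
import os


class FsyncFileHandler(logging.FileHandler):
    """FileHandler that forces each record onto disk before returning."""

    def emit(self, record):
        super().emit(record)  # writes the record and flushes Python's buffer
        if self.stream is not None:
            # Ask the kernel to commit the data to storage, not just its cache.
            os.fsync(self.stream.fileno())


def make_logger(path="/var/log/myservice-debug.log"):  # hypothetical path
    logger = logging.getLogger("myservice")  # hypothetical logger name
    handler = FsyncFileHandler(path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(threadName)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Including the thread name in each line helps when the program runs several threads, because the last line from each thread shows where it was before the reset.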

Not sure about a Nano, but do you get any information from this:
cat /sys/kernel/pmc/tegra_reset_reason
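Because a hard reset can leave nothing in /var/crash, it may help to record these values on every boot, for example when the service starts, so each reboot leaves a trace in the service's own log. This is a small sketch that reads the two files shown in this thread; the function name is hypothetical, and a missing file is reported as None rather than raising.

```python
from pathlib import Path

PMC_DIR = Path("/sys/kernel/pmc")  # directory used in this thread


def read_reset_info(pmc_dir=PMC_DIR):
    """Return the Tegra reset reason and level, or None where a file is missing."""
    info = {}
    for name in ("tegra_reset_reason", "tegra_reset_level"):
        try:
            info[name] = (Path(pmc_dir) / name).read_text().strip()
        except OSError:
            # Not every board or L4T release is guaranteed to expose both files.
            info[name] = None
    return info


if __name__ == "__main__":
    for name, value in read_reset_info().items():
        print(f"{name}: {value}")
```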

I see the following:

jason@nano-1:/sys/kernel/pmc$ cat tegra_reset_level
TEGRA_RESET_LEVEL_WARM

jason@nano-1:/sys/kernel/pmc$ cat tegra_reset_reason
TEGRA_POWER_ON_RESET

Is there somewhere where I can find the definitions of all these codes?

TEGRA_POWER_ON_RESET: is that the issue you’re describing (power starvation), or does it just reflect that I rebooted by pulling the power cable out?

Hi,

We want to check whether this issue is caused by GPU rail gating.
Could you check whether this command helps?
(Please try disabling railgating before the Nano reboots.)

echo 0 > /sys/devices/57000000.gpu/railgate_enable

If the issue persists, we will need to reproduce it in our environment.
Would you mind sharing the Python application and the changes you made to the systemd service?

Thanks.

This value is already 0.

I’ve removed some code that may have been causing a thread deadlock and added some more logging statements. I started it last night, and it’s been running fine for over 12 hours now…
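If a thread deadlock is suspected again, Python's standard faulthandler module can act as a simple watchdog: dump_traceback_later() writes the stack of every thread to a file if the timer is not reset within the timeout, and calling it again replaces the pending timer. This is a minimal sketch, assuming the main loop can call heartbeat() regularly; the function names and the 60-second timeout are hypothetical.

```python
import faulthandler
import sys

WATCHDOG_TIMEOUT_S = 60.0  # hypothetical: longest normal gap between heartbeats


def heartbeat(timeout=WATCHDOG_TIMEOUT_S, file=sys.stderr):
    """(Re)arm the watchdog; each call replaces the pending timer.

    If heartbeat() is not called again within `timeout` seconds, the stacks of
    all threads are written to `file`, showing where each one is blocked.
    The file object must stay open for as long as the watchdog is armed.
    """
    faulthandler.dump_traceback_later(timeout, repeat=False, file=file)


def stop_watchdog():
    """Cancel any pending dump, e.g. during a clean shutdown."""
    faulthandler.cancel_dump_traceback_later()
```

Under systemd, sys.stderr normally ends up in the journal; passing an open log file instead keeps the stack dump next to the program's own log lines.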

I can’t imagine why a Python app could crash the entire system, though. And what does “TEGRA_POWER_ON_RESET” actually mean?

Are you able to point me toward any documentation so I can research what these values mean for future use, and also what GPU railgating means? What value would I expect to see if my Nano was not able to draw enough power from the power supply?

Hi,

We had an internal discussion about this issue today.
Would you mind trying rt.local, as described in this comment:

We have confirmed that the multimedia pipeline works with rt.local.
Thanks.