We have an AGX module with a custom carrier board. Occasionally during field use, the AGX is resetting and I am trying to determine the source of the reset.
The carrier board has a power monitor and if the voltage falls out of an expected range, it will trigger a reset on the VDDIN_PWR_BAD_N input to the AGX.
It is not feasible to instrument this signal and monitor during field use, so I am trying to gauge what happened from the logs.
Is anyone able to offer guidance on whether there are specific log messages to look for if the VDDIN_PWR_BAD_N input to the AGX had been triggered? Additionally, are there other things to look for/implement that would give me finer granularity into the reasons an AGX may have reset?
After the reset and restart, what do you see from:
tegra-pmc: get_secure_pmc_setting: done secure_pmc=0
tegra-pmc: ### PMC reset source: TEGRA_POWER_ON_RESET
tegra-pmc: ### PMC reset level: TEGRA_RESET_LEVEL_L0
tegra-pmc: ### PMC reset status reg: 0x0
Am I correct in thinking that manually cycling power to the system would produce the same log messages?
I believe this is correct, but someone from NVIDIA will need to verify. Keep in mind that the reason for the power-on reset is not shown, but at least the issue is now narrowed down: you know it is related to actual power. I couldn’t tell you whether there is some other way to find out, for example, if the power rails shut down due to overcurrent (probably not), but it illustrates that you are also interested in why you got TEGRA_POWER_ON_RESET, and not just that there was a power reset. Hopefully there is a way to dig further.
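On the question of finer granularity, one low-cost option might be to record the tegra-pmc reset lines at every boot, so that field units build up a history of reset sources. The following is only a minimal sketch: the regular expression is written against the three log lines quoted above, and the function names and the history file path are hypothetical. It cannot tell a VDDIN_PWR_BAD_N event apart from a manual power cycle if both report the same reset source.

```python
import re
import subprocess
import time

# Written against the lines quoted in this thread, e.g.
# "tegra-pmc: ### PMC reset source: TEGRA_POWER_ON_RESET"
PMC_LINE = re.compile(
    r"tegra-pmc:\s*###\s*PMC reset (source|level|status reg):\s*(\S+)"
)


def parse_pmc_reset(log_text):
    """Return whichever of 'source', 'level', 'status reg' appear in log_text."""
    found = {}
    for line in log_text.splitlines():
        match = PMC_LINE.search(line)
        if match:
            found[match.group(1)] = match.group(2)
    return found


def record_reset_reason(history_path):
    """Append this boot's PMC reset info to history_path (hypothetical location)."""
    log_text = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    info = parse_pmc_reset(log_text)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(history_path, "a") as history:
        history.write(f"{stamp} {info}\n")
    return info
```

Calling record_reset_reason() once per boot (for example from a startup script, with a path of your choosing) would at least show whether every field reset reports TEGRA_POWER_ON_RESET or whether other sources appear.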
As for that way to dig further, I am wondering if monitoring logs over a serial console might show something, if it is logging at the moment of reset. I don’t know if you have a way to run a serial console to your custom device, but if serial console is working, then you should keep “dmesg --follow” running to see if anything shows up upon failure.
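If nobody can watch a console in the field, a variation on the same idea is to keep “dmesg --follow” running on the unit itself and force every line to disk as it arrives. This is a rough sketch under assumptions: the function names and the output path are hypothetical, and messages emitted in the last instant before power is lost may still never reach storage.

```python
import os
import subprocess
import time


def persist_lines(lines, out_path):
    """Append each line to out_path, syncing to disk before reading the next."""
    count = 0
    with open(out_path, "a") as out:
        for line in lines:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            out.write(f"{stamp} {line.rstrip()}\n")
            out.flush()             # push Python's buffer to the OS
            os.fsync(out.fileno())  # ask the OS to commit it to storage
            count += 1
    return count


def follow_dmesg(out_path):
    """Run `dmesg --follow` and persist every kernel message as it arrives."""
    proc = subprocess.Popen(
        ["dmesg", "--follow"], stdout=subprocess.PIPE, text=True
    )
    try:
        return persist_lines(proc.stdout, out_path)
    finally:
        proc.terminate()
```

After an unexpected reset, the tail of the file written by follow_dmesg() shows the last kernel messages that made it to disk, which can then be compared against the reset source reported on the next boot.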
The user in the field mentioned having to manually power cycle the system to restore functionality. I have now confirmed that this produces the same log messages, so I may have erroneously indicated a power issue.
If we have a power issue on our carrier board, it would be reflected by triggering the VDDIN_PWR_BAD_N input to the AGX. Are there any log messages associated with triggering this input that I could leverage to differentiate sources of reset?
This I do not know. Someone who works on custom carrier boards would need to answer.
Please gather the details with
$ dmesg --follow;
You may also consult the Jetson AGX Xavier OEM Product Design Guide for reference.
Referencing the OEM product design guide, triggering the VDDIN_PWR_BAD_N input will initiate a controlled shutdown of the AGX.
Is the VDDIN_PWR_BAD_N state read and stored in a location that would be accessible for me to query/log?
You may open a terminal and gather the details with
$ dmesg --follow;