In summary, we traced a brownout issue on the Jetson AGX Xavier to a voltage droop that occurs when running deep learning models that push the system near its limits. We normally operate in MAXN power mode, but we also reproduced the brownout once in MODE_30W_ALL.
To prevent this, we replaced the 65W barrel jack power supply that came with the Jetson with this one (100W USB-C).
This prevented the brownout from occurring, but the Jetson’s carrier board fried after operating near its limits overnight. Interestingly, the Jetson wasn’t running any strenuous processes at the moment of the failure (they had been terminated minutes before).
What we saw was the Jetson not responding when power was applied to the USB port, and the power light flickering on and off when power was applied to the barrel jack. We confirmed that the carrier board is at fault by swapping it for a known working one, after which the Jetson worked fine. We have checked the 5V bus, and it seems OK, but we haven’t had the chance to check the 3.3V or 1V rails.
I’m hoping to hear from NVIDIA whether MAXN mode is meant to be used in production, and whether that could be our issue.
If it is safe for production, then I’m wondering if you have any ideas about what else we could try to prevent the brownout issue.
Hi, are you testing on a devkit? The devkit is for development only, not for production. It cannot be used for long-running heavy workloads.
MAXN mode is not a problem on a module with a good thermal solution. You can first try to find out which components on the carrier board are broken.
Hi! Thank you for the response. It’s good to have confirmation that MAXN mode can be used for production.
Yes, we’re testing on devkits. We have seen the brownout after running for a short time, usually 30-45 minutes, but sometimes instantly when starting our process, and none of the thermal sensors show any abnormal levels (the highest temperature is ~70°C). Do you have any thoughts on what else we can try to prevent the devkits from browning out?
edit: Wondering if this could still be a thermal issue even with the low temps reported?
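For anyone wanting to check the thermal side the same way, here is a minimal sketch of how the sensors could be logged right up to the moment of a brownout. It assumes the standard Linux sysfs thermal interface (/sys/class/thermal/thermal_zone*/type and temp, with temp reported in millidegrees Celsius); the function names, log file name, and sampling interval are hypothetical and not taken from any NVIDIA tool.

```python
import glob
import os
import time


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone_type: temperature_in_celsius} for every readable zone."""
    readings = {}
    for zone in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "type")) as f:
                name = f.read().strip()
            with open(os.path.join(zone, "temp")) as f:
                millidegrees = int(f.read().strip())
        except (OSError, ValueError):
            continue  # skip zones that cannot be read or parsed
        readings[name] = millidegrees / 1000.0
    return readings


def log_forever(path="thermal_log.csv", interval_s=1.0):
    """Append one timestamped line per sample until the process is killed."""
    with open(path, "a") as log:
        while True:
            temps = read_thermal_zones()
            fields = [f"{time.time():.1f}"]
            fields += [f"{name}={value:.1f}" for name, value in sorted(temps.items())]
            log.write(",".join(fields) + "\n")
            # Flush and fsync so the last sample is more likely to survive
            # a sudden power loss.
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval_s)


# Call log_forever() in a separate terminal before starting the workload.
```

The fsync after every sample is deliberate: a brownout gives no warning, so buffered samples would otherwise be lost. If the last logged line before a reset still shows moderate temperatures, that would at least be consistent with a power-side cause rather than a thermal one.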
If it is not a thermal issue, then it might be a power supply capability issue. Please read chapter 5.5.1, DV/Dt Circuit Considerations, in the Design Guide. There are some suggestions there for improving power supply performance.