I’d like to ask about an issue we’re experiencing with our Nvidia Jetson TX2 units. We use these units commercially and have about 80 TX2s that have been running non-stop in the field for almost a year.
The issue our customers experience is the sudden ‘disappearance’ of TX2 units from the network. Further investigation of the software and OS logs shows that the unit is completely dead/frozen during the disappearance period. A power cycle brings the unit back to life as if it had simply been shut down.
The symptoms recorded so far seem to point, slightly, in the direction of the power supply, e.g. as if a small voltage fluctuation causes the TX2 unit to lock up completely. Even though we use very reliable power supplies, the issue keeps coming up. Perhaps anomalies in the power grid upstream of the power supply cause its output to dip or surge a bit, but we are not sure about this. Below I’ve listed the power supplies we use in the field.
Some of the reasons our gut feeling points to the power supply, or the power handling of the TX2/carrier boards, as the source of the issue:
- The units abruptly and completely disappear from the network.
- We had some experiences in the early stages of development of our product when we used cheap adapters (rated 12V/15W, i.e. on the verge of peak power usage). Sometimes our TX2 units would suddenly turn off for no apparent reason (and sometimes wouldn't even turn on again for a while, which was very strange). When we changed to proper high-wattage power supplies, these problems went away.
- We've had a situation where a customer used the unit in combination with a solar-powered power buffer. This power buffer would sometimes 'disturb' other local devices on the same power grid. We saw the issue often with this unit, and it went away when the unit was connected at a different point in the power grid.
Now, the first thing that’s not clear is whether the source of the issue is the TX2 module itself or the carrier board. We use two kinds of carrier boards in the field:
- ConnectTech Orbitty, see: http://connecttech.com/product/orbitty-carrier-for-nvidia-jetson-tx2-tx1/
- Aetina ACE-N510, see: https://www.aetina.com/products-detail.php?i=234
Some of the power supplies we use:
- Meanwell RS25-12, see: https://www.meanwell-web.com/en-gb/ac-dc-single-output-enclosed-power-supply-output-rs--25--12
- Meanwell EDR-75-12, see: https://www.meanwell-web.com/en-gb/dinrail-powersupply/all-application/ac-dc/12/ac-dc-industrial-din-rail-power-supply-output-12v-edr--75--12?returnurl=%2fen-gb%2fdinrail-powersupply%2fall-application%2fac-dc%2f12%2f%23edr-75-12
- Meanwell NDR-120-12, see: https://www.meanwell-web.com/en-gb/dinrail-powersupply/all-application/ac-dc/12/ac-dc-single-output-industrial-din-rail-power-ndr--120--12?returnurl=%2fen-gb%2fdinrail-powersupply%2fall-application%2fac-dc%2f12%2f%23ndr-120-12
Some other notable remarks:
- It can happen to any unit; it is not limited to a few specific units that fail repeatedly.
- The units share the same network yet fail intermittently, i.e. there seems to be no relation to events on the network.
- The issue has occurred in about 15 to 20% of all units, which is quite devastating for our reliability figures.
Has anyone by any chance seen similar issues with TX2 units? Or can someone point us in the right direction as to where to look for the culprit? Any help is appreciated!
Thanks in advance,