Xavier crashes when I touch it

jr.b · July 19, 2023, 4:39am

My Jetson Xavier NX eMMC module crashes when I touch or move the physical board. It also sometimes crashes spontaneously.

Setup:
JetPack 4.6 with desktop removed
Standard Dev Kit (reproduced the issue on several dev kits)
Static aluminum fin heat sink, no fan connected
1 TB SSD mounted to /mnt/ssd/ using /etc/fstab
rc.local configured to increase the USB buffer size using:
sudo sh -c 'echo 1000 > /sys/module/usbcore/parameters/usbfs_memory_mb'
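
For anyone reproducing this setup, here is a minimal sketch of how the rc.local step could set the value and then confirm it actually took effect. rc.local normally runs as root, so sudo should not be needed there. The function name set_usbfs_mb is made up for illustration; only the sysfs path and the value 1000 come from the post.

```shell
#!/bin/sh
# Sketch: write a usbfs buffer size and verify it by reading the value back.
# set_usbfs_mb is a hypothetical helper name, not part of JetPack.
set_usbfs_mb() {
    # $1 = parameter file, $2 = desired size in MB
    if [ -w "$1" ]; then
        echo "$2" > "$1"
    fi
    # Succeed only if the file now holds the requested value.
    [ "$(cat "$1" 2>/dev/null)" = "$2" ]
}
```

In rc.local it could be called as `set_usbfs_mb /sys/module/usbcore/parameters/usbfs_memory_mb 1000 || echo "usbfs_memory_mb not applied"`, so a failed write shows up in the boot output instead of passing silently.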

Logs were generated via the serial console and the dmesg --follow command. I’m new to capturing these logs, so some of them may not be complete. We have several modules and dev kits, and this is the only module experiencing the issue. The issue has been reproduced with this module on multiple dev kits. The module does have a different thermal gap pad material between the chip and the heatsink, but the material has been tested and is not electrically conductive.

Any advice on determining whether this is a hardware issue would be appreciated. I am about to reflash the board to see if the issue persists.
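
On the incomplete logs: one possible way to lose fewer lines when capturing on the device itself is to flush every line to disk as it arrives. This is only a sketch. The helper name log_lines is hypothetical, and calling sync per line is slow, so it is meant for crash hunting rather than normal use. A serial console logged on a separate host is still the more reliable capture, since it does not depend on the crashing board's storage.

```shell
#!/bin/sh
# Sketch: append each line from stdin to a file and flush it to disk right away,
# so the last messages before a crash are more likely to survive.
# log_lines is a hypothetical helper name.
log_lines() {
    # $1 = output file
    while IFS= read -r line; do
        printf '%s\n' "$line" >> "$1"
        sync    # force buffered writes out after every line
    done
}
```

Example use, assuming the SSD mount from the setup above: `dmesg --follow | log_lines "/mnt/ssd/dmesg_$(date +%Y%m%d%H%M).log"`.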

Logs from crashes triggered by mild jostling of the board:
dmesg_202307181841.log (128 KB)
dmesg_202307181849.log (141.2 KB)

spontaneous failure log:
dmesg_202307182008.log (75.5 KB)
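
A quick first pass over saved logs like these could be a search for generic kernel failure messages. The sketch below is an assumption about what is worth looking for, not an official NVIDIA checklist. The helper name scan_log and the pattern list are illustrative, and a loose pattern like this can produce false positives.

```shell
#!/bin/sh
# Sketch: print numbered lines from a saved dmesg log that match common
# Linux kernel failure messages. The pattern list is an assumption.
scan_log() {
    # $1 = log file; output format is "line-number:text"
    grep -n -i -E 'kernel panic|oops|unable to handle kernel|watchdog|i/o error' "$1"
}
```

For several files: `for f in dmesg_*.log; do echo "== $f"; scan_log "$f"; done`. A log that simply stops with no match at all would also be informative, since it would suggest a sudden power or reset event rather than a software fault.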

WayneWWW · July 19, 2023, 4:55am

Hi,

If the board crashes when you physically move it, then I think it could be a hardware problem.

linuxdev · July 19, 2023, 7:53pm

I’m also wondering if you are using one of the power supplies provided by NVIDIA, or something else? I could see these causing a problem even if the hardware is ok:

Static electricity.
Some sort of ground loop causing a change in power delivery.

The board seems overly sensitive, and it sounds as if nothing more than the capacitance of a hand near it might be enough to cause a problem (which could actually be a hardware failure if it is that sensitive, but I suggest first checking whether power delivery is correct).

jr.b · July 19, 2023, 8:10pm

I have used multiple power supplies provided by NVIDIA, and confirmed that the same power supplies and dev kit carrier boards work with other modules.

I have now reflashed the module twice with two different backup images (which work on other modules that we have) and still see the issue.

It is possible that there was some hardware damage when the module was being shipped. Do you have any advice on how to further narrow down the problem to determine what might have caused it?

linuxdev · July 19, 2023, 8:30pm

Does this issue appear on just the one module? If changing supplies and trying other units results in just that one unit failing, then it is probably RMA time. If any of the other units also have this problem, then the cause could literally be the wiring of the power socket having an incorrect ground (there are inexpensive home power socket testers that will tell you whether an outlet is wired correctly). Anything that occurs with just the one unit means it is very likely hardware.

I don’t think shipping would cause this. It could affect an electrolytic capacitor if in shipping the unit were to freeze at extremely cold temperatures. Electrolytic capacitors themselves have limited life. But I don’t think there are any electrolytic capacitors on a Jetson.

jr.b · July 19, 2023, 8:35pm

Yes, so far the issue is limited to one module out of five that we have worked with. The only differences with this module are that it was shipped across the country, that we used a different thermal gap pad between the chip and the heatsink fins, and that we flashed it using a new backup image. But I reverted to the old image and still see the issue.

linuxdev · July 19, 2023, 8:39pm

I’m guessing NVIDIA will recommend RMA, but it isn’t something I can be certain of. The only thing I can think of is that if the thermal pad were too thick, and tightening the cooling hardware down applied enough force, it might flex the module and disturb the contacts. If you’ve glued the heatsink on, or have not applied a lot of pressure at the mounting points, then it is doubtful that this is related. Maybe a bent pin or marginal connector contact exists, though that is really stretching for an answer. It could even be a cold solder joint on a ground.

jr.b · July 20, 2023, 11:43pm

Thanks for your help. I contacted support and am starting the RMA process now.

system · August 9, 2023, 6:44am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
