Jetson TK1s: seems like hardware failure

Hi, I purchased two Jetson TK1s for use on my project. After a couple of months of normal use, one of them experienced some sort of hardware failure. It was running normally under a light load when it suddenly turned off. When I try to power it back on, it runs for a little while and then turns off again. Nothing ever appears on the display; I only know it is running because the fan is on. So I stopped trying with that one and started using my second one. About a month of light use later (today), it is showing exactly the same issue as the first one: it turns on for a very short time and then turns off.

I’m not sure why this is happening. Is there anything you can do to fix this?

avlahos,

Are you able to dump any log from tk1 and see what is going on after boot up?

Hi WayneWWW,

I am not able to dump any log because I can never see anything on the display. The TK1 does not boot up successfully. It doesn’t finish booting up before it turns off.

Have you tried to reflash your device?

I am curious if you can start it in recovery mode and have it stay up?

Is this on a public network? Most people have their own router and can trust that nothing inside that router is controlled by other people. If you did not change your passwords, and if other people use the same router, then you were probably hacked. With account “ubuntu” still using the “ubuntu” password I’d actually be surprised to see it last a day.

FYI, if the TK1 can stay up in recovery mode, then you could try cloning it to see what’s going on.
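A quick way to confirm from the host that the board is actually sitting in recovery mode is to look for it in the `lsusb` output. The snippet below is a minimal sketch: `has_usb_id` is just a made-up helper name, and the ID `0955:7140` is what a TK1 in recovery mode is commonly reported as, so treat it as an assumption and compare it with what your own host shows.

```shell
# Hypothetical helper: succeeds if lsusb-style text on stdin lists
# a device with the given vendor:product ID.
has_usb_id() {
    grep -qi "ID $1"
}

# Usage on the host (the ID 0955:7140 is an assumption, verify on your system):
#   lsusb | has_usb_id 0955:7140 && echo "TK1 visible in recovery mode"
```

If the device drops off the list while the board is supposedly still in recovery mode, that points at the board or the USB link rather than the flash software.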

I can have it stay running in recovery mode. I tried to reflash it, but it fails and turns off after a lot of progress with the following message:

RCM communication completed
BCT sent successfully
sending file: tegra124-jetson_tk1-pm375-000-c00-00.dtb
- 59661/59661 bytes sent
tegra124-jetson_tk1-pm375-000-c00-00.dtb sent successfully
odm data: 0x6009c000
downloading bootloader -- load address: 0x83d88000 entry point: 0x83d88000
sending file: fastboot.bin
- 594363/594363 bytes sent
fastboot.bin sent successfully
waiting for bootloader to initialize
usb read error (71): Protocol error
bootloader failed NvError 0x0
command failure/warning: bootloader download failed 
Failed flashing ardbeg.

This was on my local network, so I don’t think it was hacked.
What do I need to do in order to clone it to see what’s going on?

The Jetson also gets VERY hot immediately after it is turned on.

If you are using a VM that sort of failure is expected. For cloning a TK1 see:
https://elinux.org/Jetson/Cloning

You might want to see about cloning “APP”, which is the rootfs. For debugging you might also see if the entire disk can be cloned:

sudo ./nvflash --rawdeviceread 0 3849216 all.img --bl ardbeg/fastboot.bin --go

If you also have the partition table cloned then it is possible to extract any partition out of the full disk clone via dd. Mostly though consider it a test of whether memory can be read from the entire eMMC. This will take a lot of disk space (rootfs is usually about 15GB, the entire disk would be about 16GB).
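For the dd step, something along these lines should work. This is only a sketch: `extract_partition` is a made-up helper name, and the start offset, block count, and 4096-byte block size are assumptions that have to come from your own cloned partition table. (If the 3849216 count in the nvflash command is in 4096-byte units, that works out to 15766388736 bytes, which matches the "about 16GB" figure.)

```shell
# Hypothetical helper: copy COUNT blocks, starting at block START,
# out of a full-disk image into its own file.
# BLOCK_SIZE defaults to 4096 bytes, which is an assumption; use the
# value that matches your partition table.
extract_partition() {
    img="$1"     # full-disk clone, e.g. all.img
    start="$2"   # first block of the partition
    count="$3"   # number of blocks in the partition
    out="$4"     # output file for the extracted partition
    bs="${BLOCK_SIZE:-4096}"
    dd if="$img" of="$out" bs="$bs" skip="$start" count="$count"
}

# Usage (fill in start and count from the partition table):
#   extract_partition all.img <start> <count> APP.img
```

The extracted file can then be loopback-mounted read-only on the host if it is the rootfs partition.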

The “very hot” is worrisome. Start with just the APP/rootfs clone. Despite how long it will take I would suggest watching it in person and not leaving it alone. Disconnect any peripherals you have on it other than the USB used for flash.

FYI, you might already have the tools through JetPack, but cloning basically just needs the driver package.

I am not using a VM.

What specifically should I do after cloning? I don’t necessarily want to extract any data or files from the device. I just want to know how to get it to work properly again.

Also, I just tried to clone and it failed with this message.

RCM communication completed
downloading bootloader -- load address: 0x83d88000 entry point: 0x83d88000
sending file: ardbeg/fastboot.bin
- 594363/594363 bytes sent
ardbeg/fastboot.bin sent successfully
waiting for bootloader to initialize
usb read error (71): Protocol error
bootloader failed NvError 0x0
command failure/warning: bootloader download failed

The clone was to see if some of the more basic services were working, along with whether all of the eMMC is readable. It looks like it isn’t. I would expect a USB error with a VM, but not with a regular Linux host.

You could try to flash it again, but if it can’t read eMMC and can’t flash, then there is probably a permanent hardware failure.

No luck with trying to flash it again. I get the same error that I got 4 replies ago. So this seems like my TK1 is defective. Is there a process for replacing defective hardware?

See “RMA” near the top of this URL:
https://devtalk.nvidia.com/default/topic/793798/embedded-systems/some-jetson-web-links/

Thank you! I have submitted an RMA.

Please submit an RMA. This issue and error message have also been seen on some other customers’ TK1s.