TX1 crashes after JetPack 2.3 eMMC upgrade

Hello all,

After powering on my TX1, it no longer boots correctly. The only thing that changed is that I had a UHD monitor attached instead of an HD monitor.

Without a monitor the TX1 also no longer boots correctly; it crashes during user-space init (kernel to userspace handover).

I then tried the latest release from JetPack 2.3, but the TX1 keeps crashing during user-space init: the HDMI display gets corrupted once user-space starts and a kernel oops occurs.

Without HDMI monitor, the kernel oops also occurs.

The JetPack 2.3 installation is on a Ubuntu 14.04.3 LTS host, the filesystem is ext4.

I have a serial cable attached to capture the boot log.

Can I re-do the flash.sh step to eMMC from the terminal (How exactly?)
Should I consider RMA? (I am providing product development support around TX1, this blocks the project).

Thanks, Leon.

tegra.log (154 KB)

The boot log was truncated when I copy/pasted it, I have now attached the full log as a file in post #1.

Since you have serial console and it sounds like it boots ok when the monitor is not connected, I’d be curious whether the nVidia-specific drivers appear to be valid:

sha1sum -c /etc/nv_tegra_release

This catches my attention near the start of the failure:

[FAILED] Failed to s[   11.648239] Library at 0x7f7a3f9a80: 0x7f7a38f000 /lib/aarch64-linux-gnu/libc-2.23.so
tart Load/Save RF Kill Switch Status.

That makes me wonder about wireless issues, which in turn makes me think about the older WiFi firmware issue. See:
https://devtalk.nvidia.com/default/topic/901180/my-jetson-tx1-is-not-able-to-connect-to-wifi/

I’m thinking that under serial console you might check that other thread to find out if perhaps you got one of the older modules with the mismatched WiFi firmware…in particular, run the check tool:
http://developer.download.nvidia.com/embedded/jetson/TX1/tools/wifi_config_check/wifi-config-check.sh

I have doubts that this would be the cause when doing a new flash with a recent L4T, but it’s easy to check and failure during WiFi init bumps the odds up for WiFi firmware being the problem.

On the other hand, I’d probably just reflash again (I’d recommend R24.2). Regardless of whether you use JetPack or not, it might be wise to download a fresh install copy and make sure it is properly unpacked (sample rootfs would require unpack with sudo or root authority). To just flash without JetPack you could unpack the driver package, then in the rootfs directory unpack sample rootfs (sudo), cd back one directory and “sudo ./apply_binaries.sh”, then flash:

sudo ./flash.sh -S 14580MiB jetson-tx1 mmcblk0p1

I will verify WiFi to make sure. However, it doesn’t really point in that direction AFAIK:

Just to make sure, here are all the facts so far:

  • The board came with L4T 24.1 and I had never updated it. When I powered it on, after one month of no-use, the kernel crashed. The only thing that had changed, was a UHD monitor instead of HD.

  • The kernel crashed somewhere into starting user-space. Up to then the HDMI output (console framebuffer) looked good, but suddenly it got corrupted, 90% still readable but garbage on-screen.

  • So I suspected UHD monitor compatibility and upgraded to JetPack 2.3 (L4T 24.2).

  • Same problem persists.

  • After writing post #1 I installed Ubuntu 14.04 64-bit cleanly on a 250GB disk on another host system and did the complete JetPack installation and eMMC flash procedure again.

  • Same problem persists.

Is there a known-good binary image (not host-generated) that I can copy to the eMMC over USB?

I would like to first exclude board problems.

There is no “supplied” binary image. The tools do allow you to create, backup, or install binary images. Initially the flash.sh program takes the rootfs subdirectory, combines it with some boot edits, and creates loopback mountable “bootloader/system.img”. This gets moved to “bootloader/system.img.raw”, and a compressed version becomes “bootloader/system.img”. If flash has the “-r” reuse system image option, then rootfs is ignored and any system.img is left in place and used exactly as it was without disturbing it. Note that the compressed/raw version is not required, the uncompressed raw image can be renamed to system.img and flash will work. Should you have a binary image, just use the “-r” reuse option and place it as “bootloader/system.img”.

To extract an image for backup and later “-r” restore, clone the root partition (you can clone other partitions as well):
https://devtalk.nvidia.com/default/topic/898999/jetson-tx1/tx1-r23-1-new-flash-structure-how-to-clone-/post/4784149/#4784149

To create your own image from scratch, you can take any file of the right size, cover it with loopback, and format it as ext4. Common sizes are 15288238080 bytes ("-S 14580MiB" == 14580*1024*1024) or 15032385536 bytes ("-S 14GiB" == 14*1024*1024*1024). This could be created by “dd” with input file “/dev/zero” and appropriate block size and count. losetup is used to cover this file with loopback, and then mkfs.ext4 is applied to the device special file. Whatever you put on that becomes an image which is placed on the Jetson with bit-for-bit exact copy.
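As a rough sketch of the dd / losetup / mkfs.ext4 sequence described above: the function name make_image and the file name in the usage comment are made up for illustration, and the function needs root, so it is only defined here and not run.

```shell
#!/bin/bash
# Byte sizes that correspond to the flash.sh "-S" argument.
size_mib=$((14580 * 1024 * 1024))        # -S 14580MiB
size_gib=$((14 * 1024 * 1024 * 1024))    # -S 14GiB
echo "14580MiB = ${size_mib} bytes"
echo "14GiB    = ${size_gib} bytes"

# make_image FILE BYTES (hypothetical helper, run as root):
# create a zero-filled file, cover it with loopback, format it as ext4.
make_image() {
    local img="$1" bytes="$2"
    local loopdev
    # bs=1M is 1048576 bytes, so count is the size in MiB.
    dd if=/dev/zero of="$img" bs=1M count=$((bytes / 1024 / 1024)) || return 1
    loopdev=$(losetup --find --show "$img") || return 1
    mkfs.ext4 "$loopdev"
    losetup -d "$loopdev"
}

# Example (hypothetical file name):
#   make_image my_system.img "$size_mib"
```

After formatting, the file can be loopback mounted, populated, and placed as “bootloader/system.img” for use with the “-r” option.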

It is unlikely that you would get any different response from a normal flash versus restore from a cloned image…something else is likely going on either from side effects of the crash or a bug in the parsing of the new monitor’s EDID data. I’d strongly suggest either testing first with the old monitor or flashing again to get a fresh image. Other than taking a lot of disk space (and time if not using the compressed image) there isn’t much to the flash process. Somewhere along the way if nothing is found flashing would still be one of the steps before deciding to RMA or not. The RF Kill message could be related to firmware, and incorrect firmware could cause odd things to happen even in other parts of the operating system. WiFi is known to have had an early version of firmware mismatch…if you have one of those older versions of firmware there is no telling how it might interact with other parts of the system.

Thank you. Too bad there is no “supplied” or “known-good” image. This would help tremendously in finding user errors (such as disk full condition, which is on the edge of what I would call a user error.)

  • I tried reflashing using a clean install of JetPack on clean Ubuntu 14.04 using the JetPack GUI default flash procedure.

  • I then connected the identical HD monitor that had always worked.

  • Same problem persists.

Let me check the WiFi firmware when I am with the device again.

Thanks for the suggestions and explanation on how to create the rootfs manually. I may want to disable some services / startup items to see which one is causing the system to fail.

  • The system also crashes without monitor.

  • The system crashes at a random point in time. I have captured a log where the system came up in Ubuntu Desktop (booting this time with monitor) but it crashes after 2 minutes of idle time.

  • I have captured /etc/nv_tegra_release

boot_crash_after_2mins.txt (84.8 KB)

  • WiFi firmware seems OK:
ubuntu@tegra-ubuntu:~$ sudo ./wifi-config-check.sh 
[sudo] password for ubuntu: 

Checking MAC address configuration
MAC address configuration is correct.
Checking WiFi settings next

=========================================
Your developer kit is functioning correctly.
=========================================
Please check file: /home/ubuntu/wifi-config-check-log for more details
=========================================

Looks like the newer attachments are not coming through yet. Meanwhile, it looks like the original log basically starts failure messages at a time when the login manager and dbus session is starting (multi-user.target for text mode likely does not have a problem, it is at the very start of the login software for GUI that the problem starts at…Ubuntu seems to do WiFi setup here as well, so this might explain why the RF Kill message shows up while firmware is validated as correct).

Although I’m going to end up suggesting it is easiest to just test by reflash using non-JetPack for purely driver plus sample rootfs, here is something to test for further debugging if you want to look closer at what is going on (and in case the issue remains even after a careful command-line-only flash). You may want to skip to just flashing as mentioned at the end of this if you do not want to explore more about the issue.

Since you have a serial console you will be able to select from multiple boot entries at startup. Even so, file edits under serial console can be difficult due to line wrap being more primitive under serial console…are you able to use ssh to log in and edit files, or does the crash prevent this? What I’d like to do is add this to the end of the “APPEND” key/value pair of “/boot/extlinux/extlinux.conf” (and this is one very long line which must remain a single very long line):

systemd.unit=multi-user.target

Or, as an alternative to editing the original boot entry, you could create a second boot entry in extlinux.conf which differs only from the original by labels and addition of that same APPEND edit, then boot to the alternate entry. What that does is tell the system to boot to purely text mode and never load anything related to GUI. If booting to text mode without ever touching GUI results in no error, then the source of the error has been narrowed down on. You can enter or leave GUI mode by telling systemd to switch to “multi-user.target” (text mode) or “graphical.target” (GUI). Kernel command line “systemd.unit=” is one way to do this, but a slight modification lets you do the same thing in an interactive shell (which doesn’t do much good if you can’t get to a shell…thus the reason for extlinux.conf edits):

sudo systemctl isolate graphical.target
# OR:
sudo systemctl isolate multi-user.target
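A minimal sketch of the APPEND edit, for anyone who would rather script it than edit by hand over serial console. The sample file below is a made-up, shortened stand-in for /boot/extlinux/extlinux.conf (the real APPEND line is much longer), and the sed expression is just one way to do it; try it on a copy first.

```shell
#!/bin/bash
# Hypothetical, shortened stand-in for /boot/extlinux/extlinux.conf.
conf=$(mktemp)
cat > "$conf" <<'EOF'
DEFAULT primary
LABEL primary
      MENU LABEL primary kernel
      LINUX /boot/Image
      APPEND console=ttyS0,115200n8 root=/dev/mmcblk0p1 rw rootwait
EOF

# Append the argument to the end of each APPEND line, keeping it a single line.
sed -i '/^[[:space:]]*APPEND /s/$/ systemd.unit=multi-user.target/' "$conf"

# Show the result.
grep APPEND "$conf"
```

On the real file, point the sed command at the actual path (with sudo) after saving a backup copy.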

If you are interested in doing a minimal flash to get as close as possible to validating hardware without use of JetPack, here are the instructions…

Download the driver package and sample rootfs. R24.2 is here:
https://developer.nvidia.com/embedded/linux-tegra

Make sure you have about 25GB of spare space after download completes (it may actually be a bit less space required, but a bit of safety for temporary files is good). Make sure the partition is ext4 (“df -H -T .” while in the relevant directory will confirm).

Unpack the driver package without sudo. cd into the Linux_for_Tegra subdirectory, then into rootfs. Unpack sample rootfs here using sudo. cd back one directory to Linux_for_Tegra, and run “sudo ./apply_binaries.sh”. This is now set up for flash once the Jetson is in recovery mode.
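The unpack steps above could be sketched roughly like this. The function name prepare_l4t and the SUDO variable are made up for illustration, and the tarball names in the usage comment are placeholders for whatever files were actually downloaded.

```shell
#!/bin/bash
# Set SUDO="" when already running as root; defaults to sudo.
SUDO=${SUDO-sudo}

# prepare_l4t DRIVER_TARBALL ROOTFS_TARBALL (hypothetical helper)
# Leaves the shell in Linux_for_Tegra, ready for flash.sh.
prepare_l4t() {
    local driver rootfs
    driver=$(readlink -f "$1") || return 1
    rootfs=$(readlink -f "$2") || return 1
    tar xpf "$driver" || return 1            # driver package: no sudo
    cd Linux_for_Tegra/rootfs || return 1
    $SUDO tar xpf "$rootfs" || return 1      # sample rootfs: needs root
    cd .. || return 1
    $SUDO ./apply_binaries.sh
}

# Example (placeholder file names):
#   prepare_l4t driver_package.tbz2 sample_rootfs.tbz2
```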

Attach the micro-B USB cable, put the Jetson in recovery mode. Verify the host sees this via “lsusb -d 0955:7721” (it should show something).

Start the flash:

sudo ./flash.sh -S 14580MiB jetson-tx1 mmcblk0p1

This will take some time. At some point “bootloader/system.img.raw” will appear, and its exact byte size should be 14580*1024*1024 bytes, or 15288238080 bytes.
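One way to check the size of the generated raw image, as a small sketch; check_raw_size is a made-up helper name, and it relies on GNU stat.

```shell
#!/bin/bash
# check_raw_size FILE MIB (hypothetical helper):
# succeed only if FILE is exactly MIB * 1024 * 1024 bytes.
check_raw_size() {
    local file="$1" mib="$2"
    local expected=$((mib * 1024 * 1024))
    local actual
    actual=$(stat -c %s "$file") || return 1
    echo "expected ${expected} bytes, found ${actual} bytes"
    [ "$actual" -eq "$expected" ]
}

# Example, run from Linux_for_Tegra once the image exists:
#   check_raw_size bootloader/system.img.raw 14580
```

A mismatch here would suggest the image was cut short, for example by a full disk on the host.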

The Jetson should be rebooted when done, and network address should be from DHCP query…ping should work. Serial console should work. Hopefully the GUI stage startup will not be an issue. If the issue remains (an actual crash versus simple video setup problems), then it might be time to RMA, but it would be unusual for this kind of problem to be hardware related when text mode works correctly (which is why a careful reflash is an important test should multi-user.target work correctly…even if multi-user.target fails you’d want to do this reflash before RMA as a final test).

Thanks, today I was already trying to modify the kernel command line by adding the “single” keyword, which I expect to boot to a root-only minimal init level. I am not very familiar with systemd yet, so thanks for the suggestions.

I cannot modify that line on the target via serial console, because typically the system has crashed before that. (I seldom get to the login at all.)

  • It crashes randomly during user-space init (so far), mostly immediately in user-space, and within seconds to minutes.

In both cases, however, I was using the ./flash.sh approach, where I had a copy of the rootfs/, but I found that ./flash.sh 1) overwrites the extlinux.conf and 2) does not respect the CMDLINE_ADD environment variable.

So I am still looking for a way to modify the kernel command line from my host.

  • I also find it strange the system crashes in user-space only. (this would suggest s/w problem).
  • I find it strange the board started to crash after one month of shelf time without any modifications to the eMMC content. It had never failed during the weeks before. (this would suggest h/w problem).

Unless you use the “-r” option in flash.sh for re-using an image, the rootfs gets boot parameters and boot-related files edited. You could flash once, and then save system.img.raw as system.img, and loopback mount system.img…followed by edit of extlinux.conf, then flashing with “-r”. Flash would take a long time since the image being sent would be uncompressed.
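Roughly, the edit-then-reuse flow above might look like the sketch below. The function name edit_and_reflash is made up, the sed edit is only an example change, and passing “-S 14580MiB” together with “-r” (so the partition size matches the image) is an assumption rather than something confirmed here. It needs root and a Jetson in recovery mode, so it is only defined, not run.

```shell
#!/bin/bash
# edit_and_reflash (hypothetical helper): run from Linux_for_Tegra after one
# normal flash has left bootloader/system.img.raw behind.
edit_and_reflash() {
    local mnt
    mnt=$(mktemp -d) || return 1
    # Use the uncompressed raw image as system.img so it can be loopback mounted.
    sudo cp bootloader/system.img.raw bootloader/system.img || return 1
    sudo mount -o loop bootloader/system.img "$mnt" || return 1
    # Example edit: boot to text mode by extending the APPEND line.
    sudo sed -i '/^[[:space:]]*APPEND /s/$/ systemd.unit=multi-user.target/' \
        "$mnt/boot/extlinux/extlinux.conf"
    sudo umount "$mnt" || return 1
    rmdir "$mnt"
    # -r reuses bootloader/system.img as-is instead of rebuilding it from rootfs/.
    sudo ./flash.sh -r -S 14580MiB jetson-tx1 mmcblk0p1
}
```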

I think when flashing to mmcblk0p1 that the template comes from “bootloader/t210ref/p2371-2180-devkit/extlinux.conf.emmc”. It might be possible to add a second boot entry there, or modify the default entry, and have this automatically be part of the flash (be sure to keep an unmodified version if you edit this).

Memory can possibly fail in such a way that re-flash solves the issue…or not. Basically you have to flash to find out. Other hardware issues tend to not be fixed by flash.

I have tried all the suggestions. Thanks again.

  • The board is randomly failing to start some services (all the earlier logs show this).

  • It doesn’t reach text mode in most cases, sometimes it does.

  • It fails within the first two minutes in almost 99% of all boots.

  • Reflashing the uncompressed (loopback generated) image from just the sample rootfs with drivers fails similarly. My host fs is ext4 with over 200 GB free.

I have submitted a RMA request.