Jetson TK1 R21.2 Upgrade Failure

Hello all, I am having some issues getting R21.2 up and running; hoping one of you can help!

Using: https://developer.nvidia.com/linux-tegra-rel-21
I followed the quick start guide and all seemed well until the reboot, which ends in a kernel panic and an exception trace.

The boot log dump is big, so find it here:
http://tempsend.com/E5FA3D562D

Common problems that I believe are not the case here:
Driver package download matches SHA1 checksum
Was driver package built as root: Yes
Used a USB 2.0 port on an Ubuntu 14.04 release system (no VM)
Used provided USB cable

Flash Cmd: (tried both of these)
./flash.sh jetson-tk1 mmcblk0p1
./flash.sh -S 14GiB jetson-tk1 mmcblk0p1

Wiring:
Serial to usb with null modem to catch console at boot
Provided USB flash cable
Power

I believe that the problem is in mounting the root due to this:
[ 7.935322] EXT4-fs (mmcblk0p1): couldn't mount as ext3 due to feature incompatibilities
[ 7.947583] EXT4-fs (mmcblk0p1): couldn't mount as ext2 due to feature incompatibilities
[ 7.968149] EXT4-fs (mmcblk0p1): recovery complete
[ 7.974804] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null)
[ 7.986736] VFS: Mounted root (ext4 filesystem) on device 179:1.
[ 7.995537] devtmpfs: error mounting -2
[ 8.001718] Freeing unused kernel memory: 504K (c0b31000 - c0baf000)
[ 8.011382] Kernel panic - not syncing: No init found. Try passing init= option to kernel. See Linux Documentation/init.txt for guidance.

Any ideas?

~Zach

It does look fine until it starts setting up file systems. Try explicitly naming u-boot, rather than using defaults. Create a log while you do this, e.g., run this command:

./flash.sh -S 14580MiB -L bootloader/ardbeg/u-boot.bin jetson-tk1 mmcblk0p1 2>&1 | tee testlog1-r21_2.txt

If the system still fails, you can make testlog1-r21_2.txt from the command above available so the install itself can be scrutinized.

FYI, some of the file system messages before the failure also appear on a working system. These lines in particular seem to be the kernel trying the file system types in order (ext3, then ext2, then ext4), failing for ext3 and ext2 because the partition is ext4, and then succeeding as ext4:

EXT4-fs (mmcblk0p1): couldn't mount as ext3 due to feature incompatibilities
EXT4-fs (mmcblk0p1): couldn't mount as ext2 due to feature incompatibilities
EXT4-fs (mmcblk0p1): recovery complete

These lines are where a deviation starts between a fresh R21.2 install and your log:

EXT4-fs (mmcblk0p1): recovery complete
EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null)
VFS: Mounted root (ext4 filesystem) on device 179:1.
devtmpfs: error mounting -2

On my fresh install there is no “recovery complete” line. Following the “Mounted root…” line a working system should show this:

devtmpfs: mounted

The devtmpfs is not a “real” file system; it is the device special file interface to many drivers. The kernel panic indicates that the content of the root file system is missing (there is no init), yet we know the partition was correctly mounted as ext4. That suggests files within the ext4 partition are missing or inaccessible. Like any other file system, devtmpfs needs a mount point (/dev) to exist, and error -2 is ENOENT, “no such file or directory”. I believe your mount point is missing or corrupt (the “recovery complete” log line tends to mean a bad file system was “fixed”).
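The comparison above can be automated when several boot logs need checking. This is a minimal sketch, assuming the serial console output was saved to a text file; rootfs_boot_status is a hypothetical helper name, and it only looks for the three messages discussed in this thread.

```shell
# Hypothetical helper: summarise the root-mount stage of a saved boot log.
# Prints one of: no-init, devtmpfs-failed, ok, unknown.
rootfs_boot_status() {
    log="$1"
    if grep 'Kernel panic - not syncing: No init found' "$log" >/dev/null; then
        # Root was mounted but the kernel could not find an init program.
        echo "no-init"
    elif grep 'devtmpfs: error mounting' "$log" >/dev/null; then
        # devtmpfs could not be mounted (for example, /dev is missing).
        echo "devtmpfs-failed"
    elif grep 'devtmpfs: mounted' "$log" >/dev/null; then
        # This is the line a working system shows after "Mounted root".
        echo "ok"
    else
        echo "unknown"
    fi
}
```

For example, `rootfs_boot_status bootlog.txt` (bootlog.txt being whatever file the serial capture was saved to) would be expected to print no-init for the log in the first post.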

Was the sample rootfs completely unpacked onto a native Linux file system as root? Was there any particularly notable USB device connected, other than simple items like a keyboard or mouse?

OK, I tried the whole process again. Everything seems to go just fine, but once again it fails at the same step.

devtmpfs showing mount error.

Attached are the install and boot logs.
BootLog: http://expirebox.com/download/6a896a874bd1b2117b2b87b857985d8d.html
InstallLog: http://expirebox.com/download/6a896a874bd1b2117b2b87b857985d8d.html

~Zach

Once again, the first difference I notice in the boot log is “EXT4-fs (mmcblk0p1): recovery complete”. For whatever reason the sample root file system was not considered correct: it has some flaw in its copy to the Jetson, which causes the Jetson to believe it must repair the rootfs. It never finds init, probably because of the file system issue. Everything up to this point is correct, so I have to believe the sample rootfs copied into your R21.2 rootfs directory is incorrect in some way. Possibilities are that it wasn’t unpacked as root, or that it isn’t on a native Linux file system. Remember that it isn’t enough to run the flash as root; the files themselves must also be unpacked as root. Every single stage should be done as root.
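A quick sanity check of the staged rootfs directory before flashing can catch this kind of problem early. This is a minimal sketch, assuming the rootfs is staged under Linux_for_Tegra/rootfs; check_rootfs and its list of expected paths are illustrative and not part of the L4T tools.

```shell
# Hypothetical helper: report whether a staged rootfs directory contains the
# entries the kernel needs at this stage of boot (an init program and /dev).
# Returns 0 if everything is present, 1 otherwise.
check_rootfs() {
    rootfs="$1"
    missing=0
    for path in sbin/init dev etc bin; do
        # -e fails on a dangling symlink, so also accept the link itself.
        if [ ! -e "$rootfs/$path" ] && [ ! -L "$rootfs/$path" ]; then
            echo "missing: $path"
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_rootfs Linux_for_Tegra/rootfs && echo "rootfs looks populated"` from the directory that holds Linux_for_Tegra is one way to use it. It does not check file ownership, so it cannot tell whether the archive was unpacked as root.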

As for the InstallLog…the link given is incorrect, it is a duplicate of the BootLog, so I can’t tell if there was an install issue.

linuxdev,

Thanks for the help; so far I am still in the same place.

In the install log I have included the output of every step, including the creation of directories, ls output, and other output that I thought might be questioned.

The only weirdish thing I see on the install log is the following:
Skipping BoardID read at miniloader level
System Information:
chip name: unknown

Besides the bootlog crash.

Host System doing the flashing / rootfs building:
Device     Boot     Start        End     Blocks  Id  System
/dev/sda1  *         2048   42969087   21483520  83  Linux
/dev/sda2        42971134   48828415    2928641   5  Extended
/dev/sda3        48828416  250068991  100620288  83  Linux
/dev/sda5        42971136   48828415    2928640  82  Linux swap / Solaris

/dev/sda1 on / type ext4 (rw,noatime,errors=remount-ro,commit=300,commit=0)

zach@blackboxM:/tegra$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.1 LTS
Release: 14.04
Codename: trusty

Attached are the install and boot logs.
BootLog: http://tempsend.com/0FE1846854
InstallLog: http://tempsend.com/D2CE3174D8

Any thoughts?
~Zach

I noticed something in the install log. Look at this:

...$ sudo tar xpf Tegra124_Linux_R21.2.0_armhf.tbz2
...$ cd Linux_for_Tegra/rootfs/
...Linux_for_Tegra/rootfs$ sudo tar xpf ../../Tegra124_Linux_R21.2.0_armhf.tbz2
...Linux_for_Tegra/rootfs$ cd ..

After you did the cd into “rootfs”, you unpacked the same tar archive (the driver package) a second time; this wasn’t the sample rootfs. This is why no init is found: the entire “/” file system is instead the flash software copied recursively into itself. :)

The archive unpacked inside rootfs should have been the sample root file system, from here:
http://developer.download.nvidia.com/mobile/tegra/l4t/r21.2.0/pm375_release_armhf/Tegra_Linux_Sample-Root-Filesystem_R21.2.0_armhf.tbz2
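One way to avoid this mix-up is to look inside an archive before unpacking it into rootfs. This is a minimal sketch, assuming the sample root file system archive lists an sbin/init entry while the driver package does not; archive_has_init is a hypothetical helper name, and the commented steps restate the unpack order using the archive names from this thread.

```shell
# Hypothetical helper: succeed only if the tar archive lists an sbin/init entry.
# tar detects the compression format by itself when reading an archive.
archive_has_init() {
    tar tf "$1" 2>/dev/null | grep -E '(^|/)sbin/init$' >/dev/null
}

# Intended order of steps (shown as comments, not executed here):
#   sudo tar xpf Tegra124_Linux_R21.2.0_armhf.tbz2
#   cd Linux_for_Tegra/rootfs
#   archive_has_init ../../Tegra_Linux_Sample-Root-Filesystem_R21.2.0_armhf.tbz2 \
#       && sudo tar xpf ../../Tegra_Linux_Sample-Root-Filesystem_R21.2.0_armhf.tbz2
#   cd ..
#   sudo ./apply_binaries.sh
```

Listing a large compressed archive takes a while, but it is much quicker than flashing and discovering the problem at boot.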

I have this issue as well, though my problem is slightly different.

  1. This is not an upgrade; the system has been running fine for some time.
  2. The file system boots as read-only, which is my problem.

I also suspect a bad boot sector, possibly boot loader corruption, or ext4 journal errors from situations where power was cut?

139-[    0.480945] Raydium - touch platform_id :  8
140-[    0.563151] platform tegradc.0: IOVA linear map 0xf8500000(1200000)
141-[    0.566050] platform tegradc.0: IOVA linear map 0xf9700000(4800000)
142-[    0.570458] platform tegradc.1: IOVA linear map 0xf8500000(1200000)
143-[    0.573343] platform tegradc.1: IOVA linear map 0xf9700000(4800000)
144-[    0.573942] tegra11_soctherem_oc_int_init(): OC interrupts are not enabled
145-[    0.574362] hw-breakpoint: found 5 (+1 reserved) breakpoint and 4 watchpoint registers.
146-[    0.574391] hw-breakpoint: maximum watchpoint size is 8 bytes.
147:[    0.574649] mc-err: Started MC error interface!

…which can’t be good.

720-[    8.683113] ALSA device list:
721-[    8.689444]   #0: HDA NVIDIA Tegra at 0x70038000 irq 113
722-[    8.698131]   #1: tegra-rt5639
723-[    8.705614] EXT4-fs (mmcblk0p1): couldn't mount as ext3 due to feature incompatibilities
724-[    8.717980] EXT4-fs (mmcblk0p1): couldn't mount as ext2 due to feature incompatibilities
725:[    8.912922] EXT4-fs (mmcblk0p1): warning: mounting fs with errors, running e2fsck is recommended
726-[    8.926571] EXT4-fs (mmcblk0p1): recovery complete
727-[    8.935919] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null)
728-[    8.947652] VFS: Mounted root (ext4 filesystem) on device 179:1.
729-[    8.961045] devtmpfs: mounted
730-[    8.967959] Freeing unused kernel memory: 504K (c0b31000 - c0baf000)
731-[    9.241759] init: plymouth-upstart-bridge main process (119) terminated with status 1
732-[    9.253752] init: plymouth-upstart-bridge main process ended, respawning
733-[    9.299323] init: plymouth-upstart-bridge main process (129) terminated with status 1
734-[    9.311087] init: plymouth-upstart-bridge main process ended, respawning
735-[    9.323739] init: ureadahead main process (122) terminated with status 5
736-[    9.369447] init: plymouth-upstart-bridge main process (133) terminated with status 1
737-[    9.381387] init: plymouth-upstart-bridge main process ended, respawning

…and again later, upstart comes back.

810-[   19.433799] nvmap_background_zero_allocator: PP alloc thread starting.
811-[   39.210727] init: plymouth-stop pre-start process (1816) terminated with status 1
812:[  309.243214] EXT4-fs (mmcblk0p1): error count: 42
813:[  309.243265] EXT4-fs (mmcblk0p1): initial error at 946685057: __ext4_journal_start_sb:62
814:[  309.243312] EXT4-fs (mmcblk0p1): last error at 946685271: ext4_lookup:1437: inode 262146
815-[ 9665.376419] r8169 0000:01:00.0 eth0: link up

…at a stretch, I was thinking the ext4 file system errors could be due to not powering down the board in a ‘graceful’ way during several HDMI connection issues over some uncounted number of initial boots? The robot would also occasionally lose power, but that simply can’t be avoided.

If the eMMC is failing, that’s a bummer.
Running e2fsck on a mounted disk is scary and inadvisable.

I don’t see anything that shows actual eMMC failure. The boot sector log seems “normal”. Any shutdown which is not clean can cause ext4 corruption, which is quite different from eMMC failure.
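The “initial error at” and “last error at” values in those ext4 messages are stored as seconds since the Unix epoch, so they can be decoded to see when the errors were recorded. This is a minimal sketch, assuming a date command that accepts the -d @SECONDS form (GNU date does); ext4_error_time is a hypothetical helper name.

```shell
# Hypothetical helper: print an ext4 error timestamp (epoch seconds) as UTC.
ext4_error_time() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC'
}

ext4_error_time 946685057   # "initial error at" value from the log
ext4_error_time 946685271   # "last error at" value from the log
```

Both values decode to the first few minutes of 2000-01-01, which would be consistent with a board clock that had not been set when the errors were recorded. They therefore say little about the real date of the first error.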

At one point there were some bugs related to init files ignoring the standard “touch /forcefsck”, and that bug seems to be back. Even then, the “ro” kernel command line parameter could be added to a u-boot entry to mount read-only, but this now seems to fail as well, which is not good (I’m testing on R21.4).

Here are some possibilities to deal with the issue until a fix is available:

  1. Flash to fix the issue.
  2. Clone mmcblk0p1 to a host, run fsck.ext4 on the clone (on the image file itself or through a loop device, not while it is mounted), and then write the repaired image back to mmcblk0p1. See: http://elinux.org/Jetson/Cloning.
  3. Broken (not possible): Touch /forcefsck. Should result in a one-time fsck, but init is ignoring this.
  4. Broken (not possible): Add boot loader option "ro" instead of "rw". This seems to cause a kernel OOPS on my test R21.4 (will still work on some versions).
  5. Boot to SD card as if it is a rescue disk. (Works only if you previously installed the SD card boot entry, or can still edit and add this entry):
    • If using u-boot and you can edit /boot/extlinux/extlinux.conf, add an entry to point to mmcblk1p1 (the SD card), and boot to this.
    • Once here, you're running on mmcblk1p1, not mmcblk0p1. You may be able to umount or remount mmcblk0p1 read-only, or at worst minimize risk since root is on the SD card.
    • If interested in this, understand that on a host you can unpack a sample rootfs onto an SD card, then run apply_binaries.sh with the --root option pointing to it.
    • Example entry:
```
LABEL SDcard
      MENU LABEL SD Card
      LINUX /boot/zImage
      FDT /boot/tegra124-jetson_tk1-pm375-000-c00-00.dtb
      APPEND console=ttyS0,115200n8 console=tty1 no_console_suspend=1 lp0_vec=2064@0xf46ff000 mem=2015M@2048M memtype=255 ddr_die=2048M@2048M section=256M pmuboard=0x0177:0x0000:0x02:0x43:0x00 tsec=32M@3913M otf_key=c75e5bb91eb3bd947560357b64422f85 usbcore.old_scheme_first=1 core_edp_mv=1150 core_edp_ma=4000 tegraid=40.1.1.0.0 debug_uartport=lsport,3 power_supply=Adapter audio_codec=rt5640 modem_id=0 android.kerneltype=normal fbcon=map:1 commchip_id=0 usb_port_owner_info=0 lane_owner_info=6 emc_max_dvfs=0 touch_id=0@0 board_info=0x0177:0x0000:0x02:0x43:0x00 root=/dev/mmcblk1p1 rw rootwait tegraboot=sdmmc gpt
```
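The fsck step of option 2 can be sketched as follows once a clone exists on the host. This assumes the clone is a raw ext4 image in a regular file and that e2fsck from e2fsprogs is installed; fsck_image and the exit status 64 for its own usage errors are hypothetical choices for this sketch. e2fsck can work directly on an image file, so nothing needs to be mounted.

```shell
# Hypothetical helper: check, and optionally repair, a cloned ext4 image file.
# Usage: fsck_image IMAGE [check|repair]
fsck_image() {
    image="$1"
    mode="${2:-check}"
    if [ ! -f "$image" ]; then
        echo "no such image: $image" >&2
        return 64
    fi
    case "$mode" in
        # -f forces a full check; -n opens read-only and answers "no" to all prompts.
        check)  e2fsck -f -n "$image" ;;
        # -p repairs automatically whatever can be fixed safely without questions.
        repair) e2fsck -f -p "$image" ;;
        *)      echo "usage: fsck_image IMAGE [check|repair]" >&2
                return 64 ;;
    esac
}
```

A cautious order would be `fsck_image system.img check` first (system.img being a placeholder for the clone's file name), then `fsck_image system.img repair` on a copy of the image, and only then writing the image back to mmcblk0p1.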

@linuxdev
Thanks for the suggestions. I can follow up and try the few extra ones there. I had already gone the route of ‘touch /forcefsck’ after I backed up everything I cared about. That only fixed the appearance of the ‘__ext4_journal_start_sb’ error in dmesg, and at least the file system is not read-only now. The rest of the above errors are still in the dmesg output. I can try the other suggestions and see if they work.