Can't recover Jetson after update to L4T 21.1 (DeleteAll failed)

By following the instructions here: https://github.com/NVIDIA/tegra-nouveau-rootfs, I've got a rootfs which I can run from an SD card. The eMMC is not flashable, but I can boot the SD card on the TK1 by running tegra-uboot-flasher on the host PC. Unless I can get the eMMC replaced, that's the best I'm going to get.

If you're not familiar with tegra-nouveau-rootfs, it is Arch Linux. Now my problem is how to install CUDA. Does anyone know where I can find it for Arch Linux ARM?

CUDA requires the NVIDIA drivers; Nouveau will not work. It is possible that some of the files which apply_binaries.sh unpacks could work on Arch Linux, but I suspect most of them won't. The NVIDIA graphics drivers expose the GPU in a direct manner which nouveau does not, and CUDA depends on that direct GPU access.

OK. It seems like my choices are to get the NVIDIA rootfs to boot, or to use OpenCL.

Here is what happens when I try to boot the NVIDIA rootfs; any suggestions?

Tegra124 (Jetson TK1) # boot
switch to partitions #0, OK
mmc1 is current device
Scanning mmc 1:1…
Found /boot/extlinux/extlinux.conf
Retrieving file: /boot/extlinux/extlinux.conf
796 bytes read in 67 ms (10.7 KiB/s)
Jetson-TK1 SD Card boot options
1: primary kernel
Enter choice: 1
1: primary kernel
Retrieving file: /boot/zImage
6199520 bytes read in 367 ms (16.1 MiB/s)
bootarg overflow 582+0+0+1 > 512
SCRIPT FAILED

Actually, it seems that if I remove enough of the bootargs to get below the 512-byte limit, the Linux_for_Tegra_tk1 rootfs boots from the SD card.

This is what remains in the SD card's extlinux.conf; I hope the missing arguments won't cause problems later:

APPEND console=ttyS0,115200n8 console=tty1 no_console_suspend=1 lp0_vec=2064@0xf46ff000 mem=2015M@2048M memtype=255 ddr_die=2048M@2048M section=256M pmuboard=0x0177:0x0000:0x02:0x43:0x00 tsec=32M@3913M otf_key=c75e5bb91eb3bd947560357b64422f85 usbcore.old_scheme_first=1 core_edp_mv=1150 core_edp_ma=4000 tegraid=40.1.1.0.0 android.kerneltype=normal fbcon=map:1 commchip_id=0 usb_port_owner_info=0 lane_owner_info=6 emc_max_dvfs=0 touch_id=0@0 board_info=0x0177:0x0000:0x02:0x43:0x00 root=/dev/mmcblk1p1 rw rootwait

The removed arguments are:

debug_uartport=lsport,3 power_supply=Adapter audio_codec=rt5640 modem_id=0
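Before rebooting, it may help to check the trimmed line against the limit. A minimal sketch (the helper name is mine, and the rough comparison against 512 does not account for exactly which bytes u-boot counts):

```shell
# Print the length of each APPEND line in an extlinux.conf (leading
# indentation stripped), for rough comparison against u-boot's
# 512-byte bootarg buffer. Helper name is hypothetical.
append_len() {
  grep '^[[:space:]]*APPEND' "$1" \
    | sed 's/^[[:space:]]*//' \
    | awk '{ print length($0) }'
}
```

Usage: `append_len /boot/extlinux/extlinux.conf` should print a number comfortably below 512 if the trimmed line will fit.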

An interesting issue; I've never looked at the max arg length. When I boot an SD card, I leave the boot loader on eMMC but point the root partition at mmcblk1p1 (flash goes to mmcblk0p1 when I do this; the SD card is simply pointed at later on). Other than that, the arguments I've used were the defaults. Were there any particular customizations or edits you added to your APPEND?

No, those are the factory defaults in Linux_for_Tegra_tk1/bootloader/ardbeg/jetson-tk1_extlinux.conf.sdcard.

Thanks for your input; I will try to solve it one day. For now, I've moved on to CUDA. Unfortunately, some of the samples provided in NVIDIA_CUDA-7.0_Samples will not link as is.

For example, in 6_Advanced/cdpLUDecomposition:

$ nvcc -ccbin g++ -m32 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o cdpLUDecomposition cdp_lu.o cdp_lu_main.o dgetf2.o dgetrf.o dlaswp.o -lcublas -lcublas_device -lcudadevrt

nvlink warning : SM Arch ('sm_37') not found in '/usr/local/cuda-7.0/bin/…/targets/armv7-linux-gnueabihf/lib/libcublas_device.a:hgemm.o'
nvlink warning : SM Arch ('sm_37') not found in '/usr/local/cuda-7.0/bin/…/targets/armv7-linux-gnueabihf/lib/libcublas_device.a:sgemmEx.o'
nvlink error : Undefined reference to 'cublasIdamax_v2' in 'dgetf2.o'
nvlink error : Undefined reference to 'cublasDswap_v2' in 'dgetf2.o'
nvlink error : Undefined reference to 'cublasDscal_v2' in 'dgetf2.o'
...

Removing all gencode arguments except 53 allows them to link, like this:

$ nvcc -ccbin g++ -m32 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o cdpLUDecomposition cdp_lu.o cdp_lu_main.o dgetf2.o dgetrf.o dlaswp.o -lcublas -lcublas_device -lcudadevrt
$
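To avoid editing each link line by hand, something like the following could filter the unwanted pairs out of a command string before running it (a sketch assuming GNU sed; the function name is mine, not part of the CUDA samples):

```shell
# Remove every "-gencode arch=compute_XX,code=..." pair except the
# compute_53 ones from an nvcc command line. Hypothetical helper.
keep_sm53() {
  printf '%s\n' "$1" \
    | sed -E 's/-gencode arch=compute_(37|50|52),code=(sm|compute)_(37|50|52) ?//g'
}
```

Usage: `eval "$(keep_sm53 "$full_nvcc_command")"`, or just inspect the filtered string first.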

I’ve also noticed that even without the manual intervention, building the samples takes far longer than the 15 minutes stated in the documentation.

One thing I noticed is that the samples you said won't link are from CUDA 7.0 (at least the directory is '/usr/local/cuda-7.0/'). The 32-bit Jetson TK1 works only up to CUDA 6.5; 7.0 is incompatible because it requires a 64-bit environment.

It would have been nice to know that before I spent hours installing and building. The documentation leaves something to be desired.

Back to Linux: I appreciated having a serial console after boot with tegra-nouveau. On the other hand, when I boot L4T, the last thing I see on the serial console is:

Starting kernel …

Then it’s useless. Is there a way to tell Linux_for_Tegra_tk1 that I want the serial console?

PS. USB keyboard and mouse do not work. At least I can log in with SSH over ethernet. If that fails, the serial console is my only backup.

If you have any suggestions for getting USB input devices to work in L4T, I would be grateful.

L4T runs a serial console by default. If the console fails, it is possible the serial port settings are mismatched, but by default this is set up in the boot parameters. The kernel itself could also be incorrectly configured, but all of the default configurations for all Jetsons and all L4T releases have this present (is this a custom kernel?).

Other than the serial console, was the Jetson booting correctly? If so, first check that the port is set to 115200 baud, 8 data bits, 1 stop bit, no parity (115200 8N1).

Assuming you flashed an R21.x L4T (which both of the recent JetPacks have), u-boot should be the boot loader. The parameters passed to the kernel are contained in the file "/boot/extlinux/extlinux.conf". In the "APPEND" line, by default the first parameters set up the console:

APPEND console=ttyS0,115200n8 console=tty1 no_console_suspend=1
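For reference, a minimal SD-card extlinux.conf keeping only the console and root settings might look like this (a sketch, not the factory file; the FDT path in particular is an assumption and may not apply to your release):

```
TIMEOUT 30
DEFAULT primary
MENU TITLE Jetson-TK1 SD Card boot options
LABEL primary
      MENU LABEL primary kernel
      LINUX /boot/zImage
      FDT /boot/tegra124-jetson-tk1.dtb
      APPEND console=ttyS0,115200n8 console=tty1 no_console_suspend=1 root=/dev/mmcblk1p1 rw rootwait
```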

To check L4T version (useful because u-boot is known installed on R21.x):

$ head -n 1 /etc/nv_tegra_release

R21 (release), REVISION: 4.0, GCID: 5650832, BOARD: ardbeg, EABI: hard, DATE: Thu Jun 25 22:38:59 UTC 2015

Scrolling up you will see my bootargs; they do in fact include console=ttyS0,115200n8 console=tty1 no_console_suspend=1. But the console is dead in L4T. I'm using the same u-boot for both nouveau and L4T, and the console works under nouveau.

What about the kernel? Is this the stock kernel shipping with L4T?

This is something of an odd situation, as these things should "just work"; serial console and boot loader strangeness like this just does not happen very often. I thought of something else which may be related, due to the SD card. As a general explanation, the Jetson boots and hands off to the boot loader, which loads any dtb firmware and then hands off to the kernel. Up until the hand-off to the kernel, the serial console is run by the boot loader. Because the Jetson can hand off to a boot loader on either eMMC or SD card, some files are searched for on the SD card instead of eMMC if and only if the boot loader itself goes to the SD card; root partition and boot loader (including firmware and kernel) locations are independent. Boot files are still referred to as "/boot", but the device they reside on changes.

Perhaps something moved when going to or from the SD card, but not everything moved. If the kernel is searched for at "/boot/zImage", then there are two possible zImages which might be loaded, two possible sets of firmware, and two extlinux.conf files: one in "/boot" of eMMC, the other in "/boot" of the SD card. Once the kernel is loaded, modules and some of the other /lib items always come from the root partition; up until that moment, the "/boot" device is a moving target. This could cause parts of kernel setup (and configuration-related files) to fail.

We know u-boot was showing the serial console, and then it stopped at kernel load. Serial console must work, and we know stock kernels work. As a test (maybe you've done this already), do a manual flash which is guaranteed to go completely to eMMC:

sudo ./flash.sh -S 14580MiB jetson-tk1 mmcblk0p1

During that flash no SD card should be involved, and no SD card extlinux.conf should be involved. This generates a reference boot known to work, and determines whether there is an actual Jetson hardware issue versus an SD card configuration issue. For completeness, I'd recommend a fresh sample rootfs and apply_binaries.sh as well; be absolutely certain the SD card extlinux.conf is not used. Understand that the flash process itself modifies the sample rootfs boot partition, so a new/clean rootfs may be required.

I thought I was using a rootfs and kernel built by JetPack, but there are so many problems that it could be that I'm not.

I just installed CUDA 6.5, built the samples (takes hours), and tried one. This is what I see:

$ 1_Utilities/deviceQuery/deviceQuery
1_Utilities/deviceQuery/deviceQuery Starting…

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 38
→ no CUDA-capable device is detected
Result = FAIL

So, I will start over. Maybe Grinch will work.

PS. The eMMC is dead, not flashable. This is why I’m using the SD card.

The reason for no CUDA-capable device is probably that one of the files from apply_binaries.sh was not installed to the SD card, or that permissions were wrong (most stages need sudo, as certain file types cannot be placed correctly without it). If you can reach a command line, then you can at least validate checksums via "sha1sum -c /etc/nv_tegra_release" (this will not validate permissions, though).
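The checksum mechanism itself is simple: nv_tegra_release doubles as a sha1sum manifest, with one "<sha1> <path>" line per installed file. A small wrapper (the function name is mine) for verifying any such manifest:

```shell
# Verify a sha1sum-style manifest such as /etc/nv_tegra_release.
# --quiet prints only failures; exit status is nonzero if any listed
# file is missing or has a bad checksum. Helper name is hypothetical.
verify_manifest() {
  sha1sum -c --quiet "$1"
}
```

Usage: `verify_manifest /etc/nv_tegra_release` (note that this checks contents, not ownership or permissions).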

To determine whether the eMMC and SD memory functions still work, attempt to clone the root partition and see how far it gets (realize that if there isn't a valid partition there will be a failure, but the goal is to see what kind of failure it is):

http://elinux.org/Jetson/Cloning

You could use clone to read the partition table, and check one by one whether all partitions are at least readable (other than rootfs; clone each partition one at a time, a very, very slow process).

Since the eMMC is likely dead (at least in part; there are non-rootfs partitions which must still work even when booting from SD card), here is what I'd suggest. Start with a fresh copy of L4T R21.4. Using sudo, unpack the sample rootfs into the L4T install area. Run apply_binaries.sh (again with sudo). Now unpack the same sample rootfs on the SD card (sudo), and run apply_binaries.sh (sudo) with the "-r " option. Tell the flash program to flash to eMMC. Clone the eMMC rootfs, edit extlinux.conf to point at the SD card, and put the edited clone back on eMMC. Hopefully the "/boot" portion will make it to eMMC. If not, you can try again going straight to mmcblk1p1 instead of mmcblk0p1 (this relies on the other partitions still functioning).

Working on it, but so far the eMMC seems dead.

Can you tell me where to get a cross toolchain for Fedora?

I use Linaro. For some versions you will need to build the entire chain from scratch, which is a very difficult job. For the most recent releases, pre-built binaries are available. You won’t find any pre-packaged rpm-based tools for Fedora. See:
https://releases.linaro.org/components/toolchain/binaries/latest-5.2/

For the JTK1, native build is usually the easiest way to go…you know it’ll be compatible with the linker and libs on the JTK1. For cross-compile, you may need to build a 4.9 version of Linaro.

Great, thanks. I already installed several packages, such as arm-none-eabi-*. I wasn't sure if those were what I needed. If that's wrong, I'll try Linaro, but I would still need to know which one (or more) to choose.

In the meantime, I bought another TK1 to help with getting the broken one working. I’m in the process of copying the eMMC of the new one to an SD card to boot on the old one.

I have a question about what is on the new TK1: is that Ubuntu or L4T?

$ cat os-release
NAME="Ubuntu"
VERSION="14.04.1 LTS, Trusty Tahr"

$ cat nv_tegra_release

R21 (release), REVISION: 4.0, GCID: 5650832, BOARD: ardbeg, EABI: hard, DATE:5

Looks like Ubuntu, but I guess L4T is Ubuntu, also?

I want CUDA, and I assume that it's not already installed. Can I install "CUDA 6.5 Toolkit for L4T Rel 21.4" on this OS, or is there a different CUDA toolkit that I should install? Or should I change the OS first?

Thanks

I found at Tegra/Downstream SW/Linux4Tegra - eLinux.org that it is L4T. Sorry for the unnecessary question.

My copy of the new board's eMMC boots from the SD card on the other board, but only if I remove boot args. I must be using the wrong u-boot; I'll try rebuilding it.

I’m currently installing CUDA on both boards. Thanks for all the help.

arm-none-eabi is for bare metal, with no hardware floating point. This works for building boot loaders, and sometimes kernels; it is incompatible with user space apps. For both kernels and user space I use arm-linux-gnueabihf (the "gnu" part of the name may vary with packaging). "linux" implies a Linux environment (instead of bare metal), and "hf" implies the hard-float calling convention (which all modern Linux ARM systems use). The "eabi" is just the GNU EABI calling convention.
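A quick way to see what a given toolchain actually targets is to ask the compiler for its triplet; something ending in "linux-gnueabihf" is a hard-float Linux userland target, while "none-eabi" is bare metal. A small sketch (the helper name is mine):

```shell
# Print the target triplet of a compiler, or a note if the command
# is not installed. Helper name is hypothetical.
triplet_of() {
  "$1" -dumpmachine 2>/dev/null || echo "not installed"
}
```

Usage: `triplet_of arm-linux-gnueabihf-gcc` versus `triplet_of arm-none-eabi-gcc`.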

L4T is Ubuntu, except that some direct hardware access files are added. The L4T sample rootfs is pure Ubuntu; the apply_binaries.sh script places direct hardware access files on top of it. System administration is still Ubuntu, as is configuration (other than perhaps some NVIDIA tools).

Those direct hardware access files make it possible to install CUDA (on the JTK1, version 6.5 is the most recent supported for 32-bit). Basically, you can take a JTK1 which arrives fresh with Ubuntu and turn it into L4T by running the script in the NVIDIA-INSTALLER directory; this is directly equivalent to flashing with the sample rootfs after apply_binaries.sh places those same files on the sample rootfs. CUDA can be installed after this; typically you install the repository file and then just use apt-get to install the rest of CUDA.

~/NVIDIA_CUDA-6.5_Samples/1_Utilities/deviceQuery$ ./deviceQuery
./deviceQuery Starting…

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 38
→ no CUDA-capable device is detected
Result = FAIL

It works OK as root or with sudo. Would you please tell me what I need to change to use CUDA as a non-root user?