Cross-compiler setup for Jetson-TK1

I am trying to follow the CUDA Getting Started guide and install a cross-build
environment for ARM.

The available packages for the Jetson TK1 are here:
https://developer.nvidia.com/jetson-tk1-support

I installed the Grinch version of L4T (19.3.6) in order to have working wireless
mouse and keyboard.
https://devtalk.nvidia.com/default/topic/766303/embedded-systems/-customkernel-the-grinch-19-3-6-for-jetson-tk1/1/

It seems that only CUDA 6.0 is available for the Jetson-TK1, as of today (https://devtalk.nvidia.com/default/topic/773186/cuda-missing-on-jetson-tk1/?offset=9#4306854). The
cross-development toolkit is only available on Ubuntu 12.04 64-bit, and I have
14.04 64-bit, but I’d like to give it a shot anyway.

I follow the Getting Started guide, section 5:
http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/index.html#cross-arm
Here are the issues I run into, and would appreciate help with:

  1. apt-get
    sudo dpkg --add-architecture armhf
    sudo apt-get update
    I get a lot of “404 Not Found” for armhf packages.
    Err http://security.ubuntu.com trusty-security/main armhf Packages
    404 Not Found [IP: 91.189.88.149 80]
    Err http://se.archive.ubuntu.com trusty/main armhf Packages
    404 Not Found [IP: 130.239.18.173 80]

  2. CUDA versions
    The .deb I used is the one for cross-development from the Jetson TK1 Support
    page. It is called cuda-repo-ubuntu1204_6.0-37_amd64.deb
    When I go through section 5.1 of the guide, and after getting all the 404s
    above, I run:
    sudo dpkg -i cuda-repo-ubuntu1204_6.0-37_amd64.deb
    sudo apt-get update
    sudo apt-get install cuda-cross-armhf
    What I get is CUDA 6.5. Not 6.0.
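For what it’s worth, the 404s in step 1 usually mean the x86 mirrors are being asked for armhf packages they don’t carry; Ubuntu hosts ARM packages on ports.ubuntu.com instead. A sketch of the usual workaround (the suites and paths are my assumptions for trusty; adjust for your release):

```shell
# Sketch only: Ubuntu's x86 mirrors don't carry armhf packages; they live on
# ports.ubuntu.com. DEST defaults to a temp dir so this sketch is safe to run;
# on a real host point it at /etc/apt/sources.list.d instead.
DEST="${DEST:-$(mktemp -d)}"
cat > "$DEST/armhf-ports.list" <<'EOF'
deb [arch=armhf] http://ports.ubuntu.com/ubuntu-ports trusty main universe multiverse
deb [arch=armhf] http://ports.ubuntu.com/ubuntu-ports trusty-updates main universe multiverse
deb [arch=armhf] http://ports.ubuntu.com/ubuntu-ports trusty-security main universe multiverse
EOF
# The entries already in /etc/apt/sources.list also need an [arch=amd64,i386]
# qualifier, or apt keeps asking the x86 mirrors for armhf and gets 404s.
```

With both changes in place, sudo apt-get update should fetch the armhf package lists without 404s.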

What should I do to get my cross-compilation to work, and be at the right
version?

Ok, I’ll rewrite my post, since devtalk.nvidia.com just trashed everything I wrote (is it 1998 or what?)

Well, that’s awkward. The solution to install 6.0 was so simple:
$ sudo apt-get install cuda-cross-armhf-6-0

So now I have the correct libs, I think.
However, the libs on my host are now 6.0.52, while the ones on my target are 6.0.42. This is probably going to cause compatibility issues, right?
I guess that’s why the guide recommends to mount the target fs onto the host and use the libs and headers there.

Anyway, I copied the target’s /usr/local/cuda to my host, and I am trying to use that to build a sample application.

Somehow, g++ does not recognize -mfloat-abi=hard. Isn’t that strange? Am I using the correct version of g++?

gauthier@sobel:~/tegra/grinch/Linux_for_Tegra/NVIDIA_CUDA-6.0_Samples/1_Utilities/deviceQuery $ make ARMv7=1 EXTRA_LDFLAGS="-L/home/gauthier/tegra/sandbox/cuda-target-installation/lib"
/usr/local/cuda-6.0/bin/nvcc -ccbin g++ -I../../common/inc  -m32 -target-cpu-arch ARM  -Xcompiler -mfloat-abi=hard  -gencode arch=compute_32,code=sm_32 -o deviceQuery.o -c deviceQuery.cpp
g++: error: unrecognized command line option ‘-mfloat-abi=hard’
make: *** [deviceQuery.o] Error 1
gauthier@sobel:~/tegra/grinch/Linux_for_Tegra/NVIDIA_CUDA-6.0_Samples/1_Utilities/deviceQuery $ which g++
/usr/bin/g++

devtalk.nvidia.com does not seem to let me write ec_ho (without underscore) in a post. So I’ll go on writing ec_ho instead:

gauthier@sobel:~/tegra/grinch/Linux_for_Tegra/NVIDIA_CUDA-6.0_Samples/1_Utilities/deviceQuery $ ec_ho $PATH
/home/gauthier/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/home/gauthier/tegra/sandbox/cuda-target-installation/bin
gauthier@sobel:~/tegra/grinch/Linux_for_Tegra/NVIDIA_CUDA-6.0_Samples/1_Utilities/deviceQuery $ ec_ho $LD_LIBRARY_PATH
:/home/gauthier/tegra/sandbox/cuda-target-installation/lib

(/home/gauthier/tegra/sandbox/cuda-target-installation is where I copied the target’s /usr/local/cuda)

CUDA 6.0 does work on 12.04 and 14.04 32/64-bit if you use the correct arch .debs.

Did you install some kind of ARMv7 toolchain (DS-5, Sourcery, Linaro, etc.)?

cuda-cross-armhf-6-0 only installs the CUDA configuration for cross-compile use.

Also, it didn’t look like you set the script environment to explicitly use the ARM toolchain.
It seems you are calling the x86 g++ instead of the armhf g++, which is why that flag doesn’t exist.

There have been a few updates on the questions, I’ll see if I can clarify a bit and then see how the question changes. And I have also had problems with forums not accepting certain words…seems to think they are markup.

FYI, lib version 6.0.52 and 6.0.42 only differ in patch level…an application compiled against them will be the same and compatible with both lib versions (the ABI linked against is exact, and the symbols in the libraries will have the same signatures). If there were a bug in the libs then it might matter in terms of a difference between running on both systems…but you can’t run ARM cross-compiled binaries on your x86 system anyway, so you will never see any difference ever, even for bugs in the libs.

About versions of CUDA: Your cross versions built on x86 host cannot run on the host. Your versions compiled and built on Jetson cannot run on the host. The added ARM architecture to apt-get will of course offer to download applications to build and output ARM applications from x86 which can only run on ARM; the cross tools themselves will run only on x86. There are cases where one architecture or the other may or may not support the options of the other architecture, but hard float is not recognized on x86; hard float option failure implies that part of your tool chain was intended for x86 output/target and does not know anything about hard float convention (restated, tools are improperly mixing x86 and ARMv7 output, as x86 target has no understanding of hard float convention). Remember, HOST (factory of software) versus TARGET (consumer of software). If this were “Ghost Busters”, I’d make a joke here about crossing the streams…

It “looks” like in the above that nvcc was running on the x86 HOST, but I’m not sure. Ignore this paragraph if you were running on Jetson/ARMv7:
nvcc running on x86 host is not a cross compiler, and is instead an x86 factory with x86 output and no understanding of ARMv7 (and thus no understanding of -mfloat-abi=hard). You would have to run the command on Jetson itself, as nvcc seems to only exist as HOST/TARGET same…both x86 or both ARMv7.

The nice thing about kernel cross compiles is that the kbuild system already understands tool chains and setup is easy. For general software compile, it is quite a different story, as different pieces of different software seem to have a desire to look in “standard locations” you never would have thought of. E.G., something linked by something you linked might have hard wired a path you didn’t want. So most environments for cross compile can be quite difficult to set up, especially since even NFS mounted TARGETs won’t contain everything you need to develop…you’d have to put dev packages on top of everything the TARGET has for execute, and then figure out how to get all of the dependencies to look at only this location. Jetson has some very easy solutions for a chunk of this…

You don’t need to build an entire environment on your x86 HOST because the entire Jetson environment, including dev packages, can be put on Jetson…and either NFS exported or loopback mounted from system.img. If you loopback mount system.img, you can even use rsync to update from Jetson and have a 100% exact bit-for-bit copy on x86 HOST…you wouldn’t even need to have Jetson running, nor would you need NFS server options added to Jetson’s kernel. Should you keep a copy of this system.img, the option to reuse system.img from flash.sh means you could also instantly restore Jetson if something ever happened…or clone it on thousands of Jetsons.

So about software in cross compile environments which wants hard wired paths that get in the way: If you either mount Jetson’s root and export all of Jetson, or (and this is faster with no network required) if you loopback mount system.img, a couple of x86 HOST symbolic links will take care of a majority of headaches related to hard wired paths (my testing is with linaro version 4.9.1 20140717 tool chain on fedora x86_64 HOST):

cd /lib
ln -s <mount_point>/lib/arm-linux-gnueabihf .
cd /lib/arm-linux-gnueabihf
ln -s <mount_point>/lib/arm-linux-gnueabihf/libc_nonshared.a .
cd /usr/lib
ln -s <mount_point>/lib/arm-linux-gnueabihf .
cd /usr/lib/arm-linux-gnueabihf
ln -s <mount_point>/lib/arm-linux-gnueabihf/libc_nonshared.a .

Then in your makefile perhaps add something like this as a lib search directory:

CRT=${ARMv7_MOUNT}/usr/lib/arm-linux-gnueabihf

You will want to have this in compile lib search directory path:

-L${CRT}

…this location is where the crt*.o files are and are basically the “magic” used to allow main() as an entry point.
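Tying the symlinks and the -L flag together, a minimal makefile fragment for cross-linking against the mounted target libs might look like this (a sketch only: the toolchain name and mount point are assumptions):

```make
# Sketch only: toolchain prefix and mount point are assumptions.
ARMv7_MOUNT ?= /L4T
CRT = ${ARMv7_MOUNT}/usr/lib/arm-linux-gnueabihf
CXX = arm-linux-gnueabihf-g++

main_armv7: main.cpp
	$(CXX) $< -L${CRT} -o $@
```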

Your explanations… it seems so obvious when I read them.
My wrong assumption was that g++ (even if running on x64) was capable of compiling code for ARM if given the right flags. The worse is that I have used a linaro cross-compiler in the past, so I really should have known better.
I would have expected g++ to complain about some other flags as well, such as -target-cpu-arch ARM.

I was also assuming that cuda-cross-armhf-6-0 would install a whole cross-compilation toolchain, which the Makefiles in the cuda samples would automatically use. I see now that it is not the case.

So it’s either nvcc on x86 for x86, or on ARMv7 for ARMv7? The Makefiles in the samples use nvcc, so this means I cannot use these makefiles for cross-compilation?

NVCC := $(CUDA_PATH)/bin/nvcc -ccbin $(GCC)
...
deviceQuery.o:deviceQuery.cpp
	$(NVCC) $(INCLUDES) $(ALL_CCFLAGS) $(GENCODE_FLAGS) -o $@ -c $<
  • How do you run rsync? I suppose you let Jetson run the command to an NFS location that is physically on the host? My understanding is that you can’t let the host run rsync since it would require having the target mounted on host, and for that you’d need an NFS server on target (and support for that in its kernel).
    Do you use the whole target’s ‘/’ as source to “rsync -a”?
    I definitely like the idea of having images of the target filesystem on my host, and the ability to restore, duplicate, and version them!
  • Thanks for the list of symlinks, this will be useful. Don’t you think it would be cleaner to solve this with environment variables?

In my case, “ec_ho” is considered a security issue, I guess some kind of injection. Even the preview stops updating, and if you try posting you get an error (security) and your whole post gets trashed with no warning.

Just a general statement…much in the way of cross compiling involves setting up an environment for a specific set of software (e.g., the linux kernel has added adjustments to its kbuild to allow a tool chain). Making a general “works for all situations” cross compile environment is much more difficult.

What I mentioned above about system.img would give you the best basis for both individual software setup or a general setup starting point. If you can use a cross compile tool chain, and thus run x86 host to ARMv7 output, system.img on host should work perfectly.

I do not know much about QEMU, but this is how you would make a general environment capable of running ARMv7 software on x86 host without a cross toolchain. You would be able to indirectly run native nvcc ARMv7 on x86 as if your x86 host were ARMv7, using the same libraries and directories as you have on Jetson.

Provided ARMv7 software does not need to run on x86 host, the general system I gave above should work for almost any software via cross compile tool chain. Should you need QEMU to execute ARMv7, the above setup could serve as the file system QEMU uses…so consider the above system.img method as something of a building block useful on its own for cross compile, or from which to build on from QEMU to completely emulate.

rsync has many methods to copy from one machine to another. You could add the rsync server, but I’ve never done this. Perhaps easiest is just an example…my assumption is that you created a directory on x86 “/L4T”, and that on the host root has mounted system.img (mount -o loop -t ext4 system.img /L4T). I also assume the two machines are networked without any firewall restrictions. Keep in mind these rsync options while looking at the example:

-c                // checksum method
-r                // recursive
-a                // archive mode
-z                // compress
-v                // verbose, if you want it
-e ssh            // tells rsync to use ssh
--delete-before   // deletes files on the destination which no longer exist on the source

From TK1, assume loopback mounted on host named “x86”, and that whatever account has root authority for /L4T writes is used (I use root, I’m using Fedora):
sudo rsync -avczr -e ssh /boot root@x86:/L4T
EDIT: sudo rsync --delete-before -avczr -e ssh /boot root@x86:/L4T

Related to rsync and loopback system.img, a subtle fact worth knowing is that the “ubuntu” admin account on Jetson is user ID 1000 (UID), group ID 1000 (GID). When you use “ls -l” on linux and it shows a user name and group, it’s simply looking those ID numbers up in /etc files. Preserving UID/GID from rsync to x86 means that when you are on x86 and “ls -l”, the user and group won’t list as ubuntu/ubuntu unless UID/GID 1000 on x86 is also user ubuntu. On my x86 host development machine I gave my regular user UID/GID 1000…whenever I copy files to/from Jetson via the x86 host developer account I never worry about permissions. Any rsync’d files on the loopback-mounted L4T owned by ubuntu show up on x86 as my developer account. I’d advise simplifying life by either making your developer account on the x86 host UID/GID 1000, or else adding an account to Jetson with a UID/GID matching your x86 developer account. UID/GID translations get messy, or are an accident waiting to happen.

Another thing about rsync is that not all files are on an actual file system…some are just pseudo systems in ram. You cannot and should not try to copy those. The /proc system is not real, nor is /sys. Also, several /dev files are generated and not truly static, but those which are not static are ok to back up anyway. If you’ve added an SD card or other storage, you probably don’t want them mounted during rsync, although mount point directory structure is fine. I personally run rsync on “/” directories bin, boot, dev, etc, home, lib, media, mnt, opt, root, sbin, srv, usr, var. If you cd to a directory and run “df -T .” it’ll tell you file system type…ext4 is a go, others should be skipped. If in doubt, copy your system.img (always keep a pristine backup anyway) and practice rsync on the copy.
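The per-directory run over that list can be sketched like this (hostname and mount point follow the earlier example; the commands are printed first so you can review them before piping to a root shell on the Jetson):

```shell
# Sketch: generate one rsync command per real (ext4) top-level directory,
# skipping the pseudo filesystems /proc and /sys entirely.
REAL_DIRS="bin boot dev etc home lib media mnt opt root sbin srv usr var"
CMDS=""
for d in $REAL_DIRS; do
    CMDS="${CMDS}rsync --delete-before -avczr -e ssh /$d root@x86:/L4T
"
done
# Review first; then pipe to "sh" as root on the Jetson to actually run them.
printf '%s' "$CMDS"
```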

As for use of environment variables, I’ve found that different software might depend on different variables…plus I develop on other non-Jetson Tegra hardware. I got tired of constantly dealing with paths and environment variables, so I use sym links. This was especially motivated by software linked to software linked to software, so on, and somewhere in all those links was some piece of software hard wired for a particular path and ignoring environment variables.

Hi Gauthier,

How did you solve the first error, i.e., the armhf packages for the cross-build environment on an Ubuntu 14.04 x86_64 system?

I also got the same issue
Err http://in.archive.ubuntu.com trusty-security/universe armhf Packages
404 Not Found [IP: 91.189.91.13 80]
Err http://in.archive.ubuntu.com trusty-security/multiverse armhf Packages
404 Not Found [IP: 91.189.91.13 80]
Please help me in solving this issue.

This is also about adding packages…

I’ve got a dev board
(an NVIDIA Jetson TK1), and I’m using an iMac running Ubuntu (kernel 3.13.0-32)
under Parallels. Within that, I’ve downloaded the kernel and source from
NVIDIA for this board, along with their recommended cross-compiler.

I’m able to cross-compile as-is and download that kernel image to my
board, and it runs as expected. I can even tweak some code here
or there, build that, and run it. Great.

My goal is to create a few kernel modules that I will run on my dev board.
I can create modules successfully for the “native” ubuntu on my Dev (virtual)
machine. These do what I expect them to do. Next, I want to cross-compile
these modules for my TK1.

I then hit the following cascade of problems:

  • the default config for the as-is TK1 kernel
    does not support my building modules
    • as I attempted to tweak the config to allow module creation, I am
      overwhelmed by the magnitude of config options. I’m feeling certain
      that most of them should not be changed…just the ~important~ one(s).
      • I have read that menuconfig is the likely best way to selectively
        make changes to the config.
    • when I attempted to build menuconfig, it complains that
      it requires ncurses
      • a little research leads me to identify (what I think is)
        the correct ncurses package to install…this I do.
        • ‘make menuconfig’ seems to be happier for a while, but then,
          while doing:
          HOSTLD scripts/kconfig/mconf
          I hit a plethora of undefined refs to symbols that should
          have been resolved with ncurses (wmove, wrefresh, for example)

…now I am doubting that I have selected the correct ncurses package,
but all my googling has not given me a better candidate.

My Initial Question: What is the correct ncurses package?
My Better Question: Is there a more straight-forward way to permanently
config my kernel to allow module development?

(Not wanting to manually edit an auto-generated config file)

About ncurses…it requires ncurses-dev (development package) and not just ncurses (“make” of menuconfig compiles the interface, so devel is needed).

It sounds like additionally your kernel is completely unconfigured. You need to start with existing config and modify that rather than starting from scratch. When the Jetson runs it should have /proc/config.gz…this can be copied to your kernel source, uncompressed, and renamed “.config”. THEN run make menuconfig and load this first.
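The /proc/config.gz step looks roughly like this (a sketch: the scp hostname and kernel tree path are assumptions, and the gzip line below just fabricates a stand-in file so the sketch runs anywhere):

```shell
# On the Jetson, the running kernel's config is available compressed:
#   scp ubuntu@tegra-ubuntu:/proc/config.gz /tmp/
# Stand-in for the copied file, so this sketch is runnable anywhere:
printf 'CONFIG_MODULES=y\n' | gzip > /tmp/config.gz
# In the kernel source tree on the build host (KSRC is a stand-in path):
KSRC="${KSRC:-$(mktemp -d)}"
zcat /tmp/config.gz > "$KSRC/.config"     # becomes the starting .config
# Then modify it interactively rather than starting from scratch:
#   make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- menuconfig
```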

The default kernel already uses plenty of modules and has no issue with modules. However, they MUST match the running kernel or cannot be insmod.

yes…sorry…I hadn’t made it clear that the ncurses dev pkg is the one I installed.
I have the default config from nVidia that I had used to originally build the kernel.
I agree that there ~should~ be no issue with compiling modules, but when I attempt it
(make module <…>) the error is very specific in pointing to the fact that the kernel was
not built to allow the use (wait…scratch that: to allow the compiling) of modules.
I was surprised by this, of course.

My next step (without better advice) is to re-install everything from scratch…starting with
Parallels, then Ubuntu, then the NVIDIA kernel & source … I think this problem is odd enough
in character to warrant being certain that I haven’t inadvertently tweaked my dev environment.
…especially since it sounds like nothing obvious is amiss.
…I’ll update when I’ve got a repro or success

Which cross tool chain are you using? What did you set environment ARCH and CROSS_COMPILE to? What is the exact compile command line sequence entered and was it from a full kernel source tree or were you compiling modules externally to the kernel tree?
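For reference, an external (out-of-tree) module build against a configured kernel tree typically looks something like this sketch (KDIR, the module name, and the toolchain prefix are assumptions for illustration):

```make
# Sketch only: KDIR and CROSS_COMPILE are assumptions for illustration.
obj-m := mymodule.o
KDIR ?= $(HOME)/tegra/kernel   # configured Jetson kernel source tree

all:
	$(MAKE) -C $(KDIR) M=$(PWD) ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- modules
```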

Okay…update: building & compiling is now working (I went through a from-the-start re-install of everything)
…I found an unexpected use of an environment variable by “make module …” for building my kernel modules.
This definition got me past that problem:

export TEGRA_KERNEL_OUT=kernel_image_OUT
export KCONFIG_CONFIG=$TEGRA_KERNEL_OUT/.config

…this, then got me to build a kernel (& modules!) from source and to flash those to the TK1.

my next task: cross-compile some kernel modules and “insmod” those on the TK1
The modules (previously tested natively on my host linux) cross-compiled w/o error.
When I copied them (scp) to the TK1 and then tried insmod, I got this error:
Unknown symbol __aeabi_dmul

My linux kernel modules reference these built-in libraries
…but they had not been resolved before link. There’s a bevy of similar errors, all
part of the built-in math alias library.

Should these not have been resolved at compile time?
…OR:
Should the TK1 kernel image have exported these?

Answering Self:
Looks like the TK1 kernel doesn’t support Floating Point (whereas my host kernel does)

…adjusting code to remove use of Floating point in kernel modules…

It just means the original compile was with a kernel exporting required functionality, but the insmod was to a kernel which was compiled without that functionality. A kernel can’t export what it was told to not build in the first place. An alternative would be to find out which features are required and to build those as modules, then insmod the requirements modules before insmod of your own.
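One way to check this up front is to look for the needed symbols in the running kernel’s symbol table before attempting insmod (a sketch; run it on the Jetson itself, since an x86 kernel naturally won’t export ARM EABI helpers):

```shell
# Sketch: check whether the running kernel exports the symbols a module needs.
# Missing symbols mean the kernel was built without that functionality.
for sym in __aeabi_dmul __aeabi_dadd; do
    if grep -qw "$sym" /proc/kallsyms; then
        echo "$sym: exported"
    else
        echo "$sym: NOT exported -- insmod of anything needing it will fail"
    fi
done
```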