Best development path


We are just getting started with CUDA app development for the Jetson TX1 board. What’s the best development path: cross-compilation or native?

If cross-compilation is used, is there a specific tutorial we can follow to set up Nsight on a 64-bit Ubuntu host?

I have searched the forum and have not found a definitive answer.

Thank you!

For kernel builds you really need to cross-compile (the next L4T release may change this). I believe CUDA development in user space is probably easiest directly on the JTX1 (my opinion…others may be set up for convenient cross-development from a desktop host).

Here’s a thread which lightly touches on the specifics of cross-compiling a kernel for JTX1:

It may be useful to know that L4T user space is currently 32-bit while the kernel is 64-bit, which is why two compilers are used during kernel builds but only one during user space builds. The next L4T release (I don’t know when that is) will move everything to 64-bit (both kernel space and user space), so you’ll be back to a single compiler for everything (I imagine there will be significant performance improvements too).
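
One quick way to see the 32-bit/64-bit split described above is to compare the machine name the kernel reports with the pointer size of a user-space process. This is a minimal sketch, assuming a Python 3 interpreter is installed on the board; the function name `describe_bitness` is made up for illustration.

```python
import platform
import struct


def describe_bitness():
    """Return (kernel_machine, userspace_pointer_bits) for this process."""
    # Machine name as reported by the running kernel (same source as `uname -m`).
    kernel_machine = platform.machine()
    # Size of a C pointer inside this interpreter: 4 bytes is 32-bit, 8 bytes is 64-bit.
    pointer_bits = struct.calcsize("P") * 8
    return kernel_machine, pointer_bits


machine, bits = describe_bitness()
print(f"kernel reports: {machine}, this process uses {bits}-bit pointers")
```

On a mixed setup like the one described, I would expect a 64-bit ARM machine name next to 32-bit pointers, but I have not verified the exact strings on a TX1.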

Thank you for that information! It is helpful to know that current user space is 32-bit.

We had an Ubuntu host cross-compilation setup for the TK1 through Nsight and were able to reuse it for the TX1. Nsight has difficulty pushing the locally built executable to the remote platform, so for now we just scp it to the TX1.

Ubuntu’s graphical desktop is currently up and running, but we would rather go back to a headless configuration for accurate benchmarking of CUDA applications. Is it possible to do so in a reversible way?
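
The manual scp step can be wrapped in a small script so it runs right after each host build. This is only a sketch: the helper names `scp_command` and `push`, and every path, user, and host name in the usage note, are placeholders rather than anything from our actual setup.

```python
import shlex
import subprocess


def scp_command(local_path, user, host, remote_path):
    """Build the scp argument list that copies local_path to user@host:remote_path."""
    return ["scp", local_path, f"{user}@{host}:{remote_path}"]


def push(local_path, user, host, remote_path):
    """Copy the binary to the board; raises CalledProcessError if scp fails."""
    cmd = scp_command(local_path, user, host, remote_path)
    print("running:", " ".join(shlex.quote(part) for part in cmd))
    subprocess.run(cmd, check=True)
```

A call would look like `push("./build/my_cuda_app", "ubuntu", "tx1.local", "/home/ubuntu/my_cuda_app")`, with all four values replaced by your own.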

Thank you!

This URL may seem unrelated, but its notes on remote execution and complications with CUDA apply to your case:

If you can run remote execution without passing through an X11 DISPLAY to your desktop, headless should be fine. If your program fails or complains when no DISPLAY variable is set (such as over ssh with neither “-X” nor “-Y”), then native execution and display on the Jetson is probably required for accurate benchmarking.
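
The DISPLAY check above can be tried locally before switching to headless. This is a rough sketch that runs a command with DISPLAY stripped from the environment and reports whether it exited cleanly; the helper names are invented for illustration, and a non-zero exit can of course have causes other than the missing display.

```python
import os
import subprocess
import sys


def env_without_display(env=None):
    """Return a copy of the environment with DISPLAY removed."""
    clean = dict(os.environ if env is None else env)
    clean.pop("DISPLAY", None)
    return clean


def runs_without_display(cmd):
    """Run cmd with no DISPLAY set; True if it exits with status 0."""
    result = subprocess.run(
        cmd,
        env=env_without_display(),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


# Only acts when a command is given, e.g. `python3 check_headless.py ./my_cuda_app`.
if __name__ == "__main__" and len(sys.argv) > 1:
    ok = runs_without_display(sys.argv[1:])
    print("ran fine without DISPLAY" if ok else "failed without DISPLAY")
```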

Thank you for the post and notes!

Our applications are mostly non-graphical, hence headless is preferred. I was able to add a Screen entry to xorg.conf to get to a terminal-based interface. When I run tegrastats, I can see the GPU staying at 0% until our application kicks in.
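
The "GPU stays at 0% until the application starts" check can be automated by parsing captured tegrastats lines. This sketch assumes the GPU load appears as a field like `GR3D 12%@76` (or `GR3D_FREQ 12%@76`); that field name and the sample lines in the usage note are assumptions to compare against your own tegrastats output, and the helper names are hypothetical.

```python
import re

# Assumed format: GPU load printed as "GR3D <n>%" or "GR3D_FREQ <n>%".
GPU_FIELD = re.compile(r"GR3D(?:_FREQ)?\s+(\d+)%")


def gpu_percent(line):
    """Return the GPU load in percent from one tegrastats line, or None if absent."""
    match = GPU_FIELD.search(line)
    return int(match.group(1)) if match else None


def first_busy_index(lines, threshold=1):
    """Index of the first line whose GPU load reaches threshold, or None."""
    for index, line in enumerate(lines):
        percent = gpu_percent(line)
        if percent is not None and percent >= threshold:
            return index
    return None
```

For example, given the made-up lines `["cpu [0%,0%]@102 GR3D 0%@76", "cpu [5%,2%]@102 GR3D 87%@998"]`, `first_busy_index` returns 1, the line where the application starts using the GPU.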

I recently got my first TX1, and I’m finding that compiling locally on the TX1, or cross-compiling even on an eight-core Linux box, is prohibitively slow. Sometime this week I’ll be setting up a proper cross-compilation environment on AWS, so I can fire up a 32-core instance, build, transfer, then shut down.

I’ve done this for the Raspberry Pi, and it has been a lifesaver; the whole kernel (+modules, of course) compiles in about 4 minutes.

If there is interest, I’ll share the AMI and/or instructions on how to set one up.
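
As a rough idea of what the build step on such an instance could look like, here is a sketch that assembles a parallel kernel `make` invocation sized to the number of cores. `ARCH`, `CROSS_COMPILE`, and `-j` are the standard kernel build knobs, but the toolchain prefix `aarch64-linux-gnu-` is an assumption about the installed cross compiler. The second, 32-bit compiler that current L4T kernel builds need (as mentioned earlier in the thread) is left out.

```python
import os


def kernel_make_command(jobs=None, cross_prefix="aarch64-linux-gnu-"):
    """Build the argument list for a parallel cross-compiled kernel build."""
    if jobs is None:
        # Use every core the build machine reports; fall back to 1 if unknown.
        jobs = os.cpu_count() or 1
    return ["make", "ARCH=arm64", f"CROSS_COMPILE={cross_prefix}", f"-j{jobs}"]


print(" ".join(kernel_make_command()))
```

On a 32-core instance the last argument comes out as `-j32`. The list can be passed to `subprocess.run(..., check=True)` from the kernel source directory.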