Brief considerations on the Nano

I’d like to quickly share my experience so far (2 weeks) with a Jetson Nano in case anyone is considering it for their prototyping or other projects.

I am powering it with a 60W Nedis ACPA016 supply and have it plugged into an HP 3440x1440 monitor. The image is Ubuntu 18.04 with CUDA 10 and my compiler of choice is Clang 5. With the standard Unity/GDM, 1.5GB of memory is in use right after boot, so I replaced it with LXDE, which uses around 400MB after boot.

This is my first SBC and I have to say I didn’t expect it to work so well as a regular desktop. Web surfing, programming and typesetting work exactly as on a desktop; you just need to give it a bit more time to compile stuff or to unpack and install OS updates/packages. But I bought it to do computation and to run a signal processing library I’m writing.

For memory-bound kernels like the omnipresent vector add, or other operations like Log and Exp, my reference machine with a 1080Ti processes 2GB of input data in around 0.6s including the data copy (the kernel itself runs in around 12ms). The Nano doesn’t have the memory for a 2GB input, so I tested with 512MB and 1GB and extrapolated to 2GB, since the time increased linearly. A hypothetical 2GB memory-bound operation would take 5.6s including data copy (CPU -> GPU -> CPU).
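For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of timing a vector add with and without the data copy, using CUDA events. It is not the code behind the numbers above: the array size, the `CUDA_CHECK` macro and the name `timed_vector_add` are assumptions made for illustration only.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            std::fprintf(stderr, "%s failed: %s\n", #call,          \
                         cudaGetErrorString(err_));                 \
            std::exit(EXIT_FAILURE);                                \
        }                                                           \
    } while (0)

// Memory-bound kernel: two loads and one store per element.
__global__ void vector_add(const float* a, const float* b, float* c, size_t n)
{
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Runs c = a + b on n floats. Writes the time including copies and the
// kernel-only time (both in ms). Returns true if the result is correct.
bool timed_vector_add(size_t n, float* total_ms, float* kernel_ms)
{
    const size_t bytes = n * sizeof(float);
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    float *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    CUDA_CHECK(cudaMalloc((void**)&d_a, bytes));
    CUDA_CHECK(cudaMalloc((void**)&d_b, bytes));
    CUDA_CHECK(cudaMalloc((void**)&d_c, bytes));

    cudaEvent_t start, k_start, k_stop, stop;
    CUDA_CHECK(cudaEventCreate(&start));
    CUDA_CHECK(cudaEventCreate(&k_start));
    CUDA_CHECK(cudaEventCreate(&k_stop));
    CUDA_CHECK(cudaEventCreate(&stop));

    const unsigned int threads = 256;
    const unsigned int blocks = static_cast<unsigned int>((n + threads - 1) / threads);

    CUDA_CHECK(cudaEventRecord(start));                       // CPU -> GPU
    CUDA_CHECK(cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaEventRecord(k_start));
    vector_add<<<blocks, threads>>>(d_a, d_b, d_c, n);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaEventRecord(k_stop));                      // GPU -> CPU
    CUDA_CHECK(cudaMemcpy(c.data(), d_c, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaEventRecord(stop));
    CUDA_CHECK(cudaEventSynchronize(stop));

    CUDA_CHECK(cudaEventElapsedTime(total_ms, start, stop));
    CUDA_CHECK(cudaEventElapsedTime(kernel_ms, k_start, k_stop));

    bool ok = true;
    for (size_t i = 0; i < n; ++i) ok = ok && (c[i] == 3.0f);

    CUDA_CHECK(cudaEventDestroy(start));
    CUDA_CHECK(cudaEventDestroy(k_start));
    CUDA_CHECK(cudaEventDestroy(k_stop));
    CUDA_CHECK(cudaEventDestroy(stop));
    CUDA_CHECK(cudaFree(d_a));
    CUDA_CHECK(cudaFree(d_b));
    CUDA_CHECK(cudaFree(d_c));
    return ok;
}

int main()
{
    const size_t n = static_cast<size_t>(1) << 24;  // assumed size: 64 MB per array
    float total_ms = 0.0f, kernel_ms = 0.0f;
    const bool ok = timed_vector_add(n, &total_ms, &kernel_ms);
    std::printf("total (with copies): %.2f ms, kernel only: %.2f ms, result %s\n",
                total_ms, kernel_ms, ok ? "OK" : "WRONG");
    return ok ? 0 : 1;
}
```

Built with something like `nvcc -O2 timing.cu -o timing`, this prints both timings, so the gap between copy time and kernel time is visible directly. Running it at two or three sizes gives the points needed for a linear extrapolation like the one described above.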

A more involved computation such as instantaneous phase takes 1.1s on the 1080Ti for 2GB including data copy (the kernels take just a few milliseconds), and the Nano would take a projected 15.6s (7.75s for successful runs of 1GB). This includes the FFT steps and the Hilbert transform.

My PC draws around 350W with the 1080Ti under full load (250W for the card plus the motherboard, CPU and some SSDs/fans). With the Nano module rated at 10W without peripherals, it consumes 35x less power but is only around 14x slower on complex data manipulation (15.6s vs 1.1s). In energy per job that is roughly 156J (10W x 15.6s) against 385J (350W x 1.1s), so it is possibly 2x or more as energy efficient as GPU processing, which is itself already energy efficient compared to CPUs.

I don’t do image/audio processing or AI; my thing is number crunching, and I have to say that this little Maxwell with 128 cores is really a surprise. I haven’t profiled kernels on the Nano yet, because data copy is something I need to take into account, so I did these timings first. I am curious to see how it fares against my Fortran routines and won’t be surprised if it at least matches the processing time of a considerably more expensive motherboard+memory+CPU combo.

Thumbs up to the guys who put this little device together, good job.

Hi saulocpp, thanks for sharing your feedback, good to know!

Was this the procedure you followed to install LXDE?

On Jetson, the memory is shared between CPU/GPU, so you could use zero copy and avoid cudaMemcpy() if you want.
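A minimal sketch of what zero copy could look like for a vector add, assuming mapped pinned host memory (`cudaHostAlloc` with `cudaHostAllocMapped`, then `cudaHostGetDevicePointer`). The `CUDA_CHECK` macro, the function name `zero_copy_vector_add` and the array size are made up for illustration; this is one possible approach, not the only way to do it.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            std::fprintf(stderr, "%s failed: %s\n", #call,          \
                         cudaGetErrorString(err_));                 \
            std::exit(EXIT_FAILURE);                                \
        }                                                           \
    } while (0)

__global__ void vector_add(const float* a, const float* b, float* c, size_t n)
{
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Runs c = a + b on n floats without any cudaMemcpy.
// Returns true if every output element is 3.0f.
bool zero_copy_vector_add(size_t n)
{
    const size_t bytes = n * sizeof(float);

    // Pinned host buffers that are also mapped into the GPU address space.
    float *h_a = nullptr, *h_b = nullptr, *h_c = nullptr;
    CUDA_CHECK(cudaHostAlloc((void**)&h_a, bytes, cudaHostAllocMapped));
    CUDA_CHECK(cudaHostAlloc((void**)&h_b, bytes, cudaHostAllocMapped));
    CUDA_CHECK(cudaHostAlloc((void**)&h_c, bytes, cudaHostAllocMapped));

    for (size_t i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Device-side aliases of the same physical buffers.
    float *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    CUDA_CHECK(cudaHostGetDevicePointer((void**)&d_a, h_a, 0));
    CUDA_CHECK(cudaHostGetDevicePointer((void**)&d_b, h_b, 0));
    CUDA_CHECK(cudaHostGetDevicePointer((void**)&d_c, h_c, 0));

    const unsigned int threads = 256;
    const unsigned int blocks = static_cast<unsigned int>((n + threads - 1) / threads);
    vector_add<<<blocks, threads>>>(d_a, d_b, d_c, n);
    CUDA_CHECK(cudaGetLastError());
    // The CPU must wait for the kernel before reading h_c.
    CUDA_CHECK(cudaDeviceSynchronize());

    bool ok = true;
    for (size_t i = 0; i < n; ++i) ok = ok && (h_c[i] == 3.0f);

    CUDA_CHECK(cudaFreeHost(h_a));
    CUDA_CHECK(cudaFreeHost(h_b));
    CUDA_CHECK(cudaFreeHost(h_c));
    return ok;
}

int main()
{
    // Request mapped host allocations before any other CUDA work.
    CUDA_CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));
    const bool ok = zero_copy_vector_add(static_cast<size_t>(1) << 24);  // assumed size
    std::printf("zero-copy vector add: %s\n", ok ? "OK" : "WRONG");
    return ok ? 0 : 1;
}
```

`cudaMallocManaged` is another way to get a single pointer usable from both sides. Whether either option is actually faster than explicit copies depends on the access pattern, since CPU access to mapped pinned memory may be slower on some Jetson modules, so it is worth timing both variants.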

Good evening, @dusty.

Yes, this is exactly the guide I used. It is not that the Nano wasn’t responsive with Unity; it is just that with LXDE I have 1GB more free.

As for the zero copy procedure, thanks for the tip, I will definitely have a look at it and change the code accordingly.