Jetson Nano cluster worth it?

I just heard of the Jetson Nano and did a back-of-the-napkin calculation.

~500 GFLOPS / 5 W, or about 100 GFLOPS per watt. A 1080 Ti is about 11,000 GFLOPS / 254 W, roughly 43 GFLOPS/W. Taking into account that the Jetson Nano is about 1/7 the cost of the 1080 Ti, are we looking at something about 14 times as powerful for the same money? Am I reasoning about this correctly?
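
Spelling the napkin math out, in case anyone wants to check it. The throughput and power numbers are the rated figures above; the ~$99 Nano / ~$700 1080 Ti prices are my own rough assumptions, not quotes:

```python
# Back-of-the-napkin perf/watt comparison. Throughput and power are rated
# figures; the prices are rough assumptions, not quotes.
nano_gflops, nano_watts, nano_price = 500.0, 5.0, 99.0      # Jetson Nano
ti_gflops, ti_watts, ti_price = 11_000.0, 254.0, 700.0      # GTX 1080 Ti

print(f"Nano:    {nano_gflops / nano_watts:6.1f} GFLOPS/W")  # ~100 GFLOPS/W
print(f"1080 Ti: {ti_gflops / ti_watts:6.1f} GFLOPS/W")      # ~43 GFLOPS/W
print(f"Price ratio: ~{ti_price / nano_price:.0f}x")         # ~7x
```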

If so, I might build a medium-sized cluster for my deep learning experiments.

I’ll state my bias up front. I have long believed, partially based on experience and partially based on theoretical considerations, that Seymour Cray got it right when he stated that it is easier to plow a field with a pair of oxen than with 1024 chickens. Looking at the current state of the art, which favors clusters of fat nodes, serves as confirmation of that notion.

What you would need to look at to analyze a competing design composed of skinny nodes is, at a minimum: (1) software scalability; (2) communication overhead; (3) communication power requirements.
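
To make (1) and (2) a bit more concrete, here is a toy Amdahl's-law style model. The serial/communication fractions below are made-up illustrative values, not measurements of any real cluster:

```python
# Toy scaling model: Amdahl's law with an assumed non-parallelizable
# (serial + communication) fraction. The fractions are illustrative guesses.
def speedup(n_nodes: int, serial_fraction: float) -> float:
    """Ideal Amdahl's-law speedup on n_nodes nodes."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_nodes)

for f in (0.01, 0.05, 0.10):                  # assumed serial/comm fraction
    print(f"fraction={f:.2f}: 100 nodes -> {speedup(100, f):5.1f}x speedup")
# fraction=0.01: 100 nodes ->  50.3x
# fraction=0.05: 100 nodes ->  16.8x
# fraction=0.10: 100 nodes ->   9.2x
```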

Interesting things to consider; I admit I am new to this. My idea involves neuro-evolution, where each member of the population could live on its own Jetson Nano. I was thinking of a population size of 100, so I figure roughly the price of a modest GPU rig and about 14 times the speed for my application?
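
For what it's worth, this is roughly the shape of the compute I have in mind. `evaluate_genome` and the genome encoding here are stand-ins for my real fitness evaluation, and on the cluster each worker would be one Nano rather than a local process:

```python
# Rough sketch of the embarrassingly parallel evaluation loop I have in mind.
# evaluate_genome() is a placeholder for the real fitness function; on the
# cluster each worker would be one Jetson Nano instead of a local process.
from multiprocessing import Pool
import random

POPULATION_SIZE = 100

def evaluate_genome(genome):
    """Placeholder fitness: sum of weights. The real one runs a network/sim."""
    return sum(genome)

def random_genome(length=64):
    return [random.uniform(-1.0, 1.0) for _ in range(length)]

if __name__ == "__main__":
    population = [random_genome() for _ in range(POPULATION_SIZE)]
    with Pool() as pool:                  # ideally one worker per node
        fitness = pool.map(evaluate_genome, population)
    best = max(range(POPULATION_SIZE), key=lambda i: fitness[i])
    print(f"best genome index: {best}, fitness: {fitness[best]:.3f}")
```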

A population size of 100 implies a capital expenditure in the $12,000 to $15,000 range, plus operational cost. Before you embark on spending that nice chunk of money, you might want to clarify your ultimate objective for this project.
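
A rough breakdown of how one arrives at that range; the unit prices below are assumptions for illustration, not quotes:

```python
# Rough capital-cost estimate for a 100-node Nano cluster.
# All unit prices are assumptions for illustration, not quotes.
n_nodes = 100
per_node = {
    "Jetson Nano dev kit": 99.0,
    "PSU + microSD card": 25.0,
    "Switch port, cabling, mounting": 15.0,
}
node_cost = sum(per_node.values())
print(f"per node: ${node_cost:.0f}, cluster: ${n_nodes * node_cost:,.0f}")
# -> per node: $139, cluster: $13,900  (roughly the $12K-$15K range)
```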

Are you conducting research, or trying to boost throughput for an existing use case? Is the idea to build a cluster with a superior throughput/cost metric, e.g. compared to NVIDIA's DGX system (which retails for about 10x the cost of the cluster you are envisioning, best I know)? Do you want to explore trade-offs between clusters based on skinny vs. fat nodes?

I came up with an idea a while back that got back-burnered and that I want to take up again, so basically research. I decided to work on it again after seeing a strikingly similar approach by these people at Columbia University.

Research into self-modeling/simulation, free will, consciousness. Calculating Phi for small evolved networks. Stuff that would probably never get funded. I figured I could blow $10K on a hobby car, or I could tinker with these ideas on 1-2 kW of power. The problem is that evolving neural networks is super slow and expensive.

I would definitely like to replicate the work I’ve seen so far:
https://robotics.sciencemag.org/content/4/26/eaau9354

I don’t have any personal experience building clusters, but I do understand the idea of doing open-ended research that would not get funded by regular funding mechanisms and may not deliver publishable results. Tinkering can definitely be intellectually stimulating and satisfying. Best of luck with your project.

I appreciate that! I just realized I made a mistake in the calculations when I started seriously considering making the purchase. A $10K system would give 50 TFLOPS, whereas a $5K system using 1080 Tis would have nearly the same performance, just twice the power consumption and heat. So I really need to see whether power consumption/cooling is that big a deal. Time to read up on the economics of GPU computing.
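
Here is the corrected comparison as I now see it; the prices, card count, and host power draw below are rough assumptions of mine, not quotes or measurements:

```python
# Revised napkin math: ~100 Nanos (~$10K of boards) vs an assumed 4x 1080 Ti box (~$5K).
# All prices, the card count, and the host power draw are assumptions for illustration.
nano_tflops, nano_watts = 0.5, 5.0            # Nano in 5 W mode
n_nanos = 100                                 # ~$10K worth of boards at ~$99 each

ti_tflops, ti_watts = 11.0, 250.0
n_tis, host_watts = 4, 200.0                  # assumed 4-card box plus host overhead

print(f"Nano cluster: {n_nanos * nano_tflops:4.1f} TFLOPS @ {n_nanos * nano_watts / 1000:.2f} kW")
print(f"4x 1080 Ti:   {n_tis * ti_tflops:4.1f} TFLOPS @ {(n_tis * ti_watts + host_watts) / 1000:.2f} kW")
# -> 50.0 TFLOPS @ 0.50 kW   vs   44.0 TFLOPS @ 1.20 kW
```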

I am excited that the Nano corresponds 1:1 with what I need for my application.

Just to add my personal experience comparing the Nano to a 1080 Ti: the former is rated at 10 W and the latter at 250 W, so the Nano uses 25x less power. In a signal processing library I wrote and tested on various GPUs, the Nano is on average 12-13x slower than a 1080 Ti, with some extrapolations. That makes it around 2x more energy-efficient. On functions that are compute-bound the performance gap widens; the 1080 Ti is still a very fast card.
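
In numbers (the ~12.5x figure is the rough average slowdown from my tests):

```python
# Energy per unit of work: rated power times runtime.
nano_watts, ti_watts = 10.0, 250.0
slowdown = 12.5                       # Nano is ~12-13x slower on average in my tests

power_ratio = ti_watts / nano_watts            # 25x less power on the Nano
energy_ratio = power_ratio / slowdown          # energy = power x time
print(f"power ratio:  {power_ratio:.0f}x")     # 25x
print(f"energy ratio: {energy_ratio:.1f}x in the Nano's favour")  # ~2.0x
```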

As @njuffa already pointed out, 2 oxen will do the job more conveniently, but if you can carry out your research with 2 chickens and have it scale as more of them are plugged in, why not? You can always fall back to discrete cards with a recompilation.

That's a good point! What do you think about using a Jetson AGX Xavier instead? It's got 16 TFLOPS for 30 W, at about $900 each. Still cheaper than most discrete GPUs. It has about the same computing resources, but uses roughly 1/8 the power. I might pitch the idea to a few angel investors I know, because the machine would be pretty powerful (1.6 PFLOPS), but pricey. Is there any way for me to incorporate the cluster into a cloud and earn some money to offset the cost in between runs?
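
(For reference, the napkin math behind that 1.6 PFLOPS figure; the $900 is the list price I have seen, so treat it as approximate:)

```python
# Napkin math for a 100-node AGX Xavier cluster; figures as quoted above,
# the $900 unit price is approximate.
n_nodes = 100
xavier_tflops, xavier_watts, xavier_price = 16.0, 30.0, 900.0

print(f"throughput: {n_nodes * xavier_tflops / 1000:.1f} PFLOPS")  # 1.6 PFLOPS
print(f"power:      {n_nodes * xavier_watts / 1000:.1f} kW")       # 3.0 kW
print(f"hardware:   ${n_nodes * xavier_price:,.0f}")               # $90,000
```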

Personally I don't have any experience with the Xavier, but if your particular need is computer vision or something that is only available on Volta, then it is the right tool for you. It still won't match the power of a discrete Volta, but you will most certainly be able to do your proof of concept.

On a side note, make sure your potential investors, or the person communicating with them, know that this technology is scalable. But you also need to properly implement load distribution according to the number of devices plugged in. This is where the little Nano comes in handy for testing scalability, because it is cheap to stack them; but if your project needs at least the Xavier and you are paying upfront, boy, it can hurt when you buy 3 of them. :)

As they say, "the cloud is really just other people's computers", but since I never deal with hardware at this level, maybe @njuffa wants to enlighten us?

I have zero experience with cloud computing (both user and provider side).