Retraining on a budget

Not sure if this is the right place…

I am looking at using a Jetson Nano for our FIRST Robotics Competition team over the summer; the plan is to use it to recognize and locate game pieces.

I looked at the tutorials on GitHub (https://github.com/dusty-nv/jetson-inference/blob/master/docs/imagenet-training.md), and the retraining section requires a DIGITS server. That looks expensive: it means either buying dedicated hardware or racking up charges in the cloud.

Did I hear on the webinar today that training would be possible in the future on the Jetson itself?

Even if it worked, you would probably end up paying more in electricity than you would for cloud training, just because dedicated hardware is a lot more efficient at that kind of thing.

I don’t think it’s possible right now, specifically because DIGITS requires the NVIDIA Docker runtime, and the version of Docker installed on the Nano by default doesn’t appear to have it (you can check by running sudo docker info and looking under Runtimes).
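If you want to script that check, here is a minimal Python sketch. It assumes the docker info output contains a single line starting with "Runtimes:" followed by space-separated runtime names; the function names and the sample text are made up for illustration and were not captured from a real Nano.

```python
def docker_runtimes(docker_info_text):
    """Return the names listed on the 'Runtimes:' line of `docker info` output."""
    for line in docker_info_text.splitlines():
        stripped = line.strip()
        # "Default Runtime:" does not start with "Runtimes:", so it is skipped.
        if stripped.startswith("Runtimes:"):
            return stripped[len("Runtimes:"):].split()
    return []


def has_nvidia_runtime(docker_info_text):
    """True if a runtime named 'nvidia' is registered with Docker."""
    return "nvidia" in docker_runtimes(docker_info_text)


# Illustrative sample only (assumed format, not real output from a Nano).
sample = " Runtimes: runc\n Default Runtime: runc\n"
print("nvidia runtime present:", has_nvidia_runtime(sample))
```

On a real system you would feed it the text printed by sudo docker info instead of the sample string.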

I personally hope this changes. I don’t care about training, but having the NVIDIA runtime for Docker on the Nano would be useful to me, as I already use nvidia-docker. Without the NVIDIA runtime, I don’t think it’s (elegantly) possible to use CUDA within Docker, unfortunately. This makes DIGITS impossible for now.

DIGITS is only supported on PC/server, so it’s not for Jetson. However, you could run transfer learning (re-training a pre-trained model) with PyTorch or TensorFlow. In my experience, training with PyTorch is less memory-hungry than with TensorFlow. I included some results from re-training with the PyTorch ImageNet example in this blog: https://devblogs.nvidia.com/jetson-nano-ai-computing/

So yes, it is possible; you could let your Nano run overnight or for a couple of days. It becomes more of an issue when you are developing new models and experimenting, but if you are only transfer learning with quality data, and not changing the network architecture and layer configuration, it should be less of an issue. You will probably want to mount a 2GB or 4GB swap file on your Nano to avoid running out of memory.
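To confirm the swap file is actually active before starting a long run, something like the following Python sketch could help. It assumes a Linux system where /proc/meminfo reports a "SwapTotal:" line in kB; the function name is just an illustration.

```python
from pathlib import Path


def swap_total_gib(meminfo_text):
    """Parse the SwapTotal line of /proc/meminfo and return it in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            kib = int(line.split()[1])  # the value is reported in kB
            return kib / (1024 * 1024)
    return 0.0  # no SwapTotal line found


meminfo = Path("/proc/meminfo")
if meminfo.exists():  # only present on Linux
    print(f"Swap available: {swap_total_gib(meminfo.read_text()):.1f} GiB")
```

If this prints 0.0 after you have set up the swap file, the file is probably not mounted.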

Thank you both for your answers. I don’t know enough yet to know whether I want to re-train a pre-trained model or develop a new one, but I will look at the tutorial Dustin mentioned.

I will post on the DIGITS board about minimum hardware recommendations. It looks like a couple hundred dollars’ worth of GTX 1060 will get me by…

Dustin: Can you point me to reference or tutorial material on using the Nano to do re-training? What steps does one go through to get the data in table 3 of https://devblogs.nvidia.com/jetson-nano-ai-computing/ ?

I have all summer to work on this, and at 10W, the cost of electricity for a few hundred hours seems trivial…
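Rough arithmetic behind that, as a Python sketch. The 10 W figure is the one above; the 300 hours and the price of 0.13 per kWh are my own placeholder assumptions, so plug in your local rate.

```python
def electricity_cost(power_watts, hours, price_per_kwh):
    """Return (energy in kWh, cost) for a constant power draw."""
    energy_kwh = power_watts * hours / 1000.0
    return energy_kwh, energy_kwh * price_per_kwh


# Assumptions: 300 hours of training and 0.13 per kWh (placeholder rate).
energy, cost = electricity_cost(power_watts=10, hours=300, price_per_kwh=0.13)
print(f"{energy:.1f} kWh, cost about {cost:.2f}")
```

At 10 W, 300 hours works out to about 3 kWh, so the bill stays small even if the real draw at the wall is somewhat higher.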

Here is a quick run-down of the procedure: install PyTorch from the sticky on this forum, mount a 4GB swap file, then run the PyTorch ImageNet example. It can use different networks; I used AlexNet and ResNet-18. Also run it in pretrained mode so you are using transfer learning and aren’t training from scratch.
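As a sketch of what that invocation might look like, here is a small Python helper that assembles the command line for main.py from the upstream pytorch/examples imagenet directory. The flags (--arch, --pretrained, --epochs, --batch-size, --workers and the positional data directory) are the ones that script defines; the default values and the dataset path below are placeholders I picked, not tuned settings.

```python
import shlex


def imagenet_example_command(data_dir, arch="resnet18", epochs=30,
                             batch_size=8, workers=2):
    """Build the argv list for the PyTorch imagenet example in pretrained mode."""
    return [
        "python3", "main.py",
        "--arch", arch,            # e.g. "alexnet" or "resnet18"
        "--pretrained",            # start from pre-trained weights
        "--epochs", str(epochs),
        "--batch-size", str(batch_size),
        "--workers", str(workers),
        data_dir,                  # folder containing train/ and val/
    ]


# Hypothetical dataset path, for illustration only.
cmd = imagenet_example_command("/media/ssd/dataset")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

A small batch size and few workers are a guess at what fits in the Nano's memory. Also, as far as I know the stock script keeps the network's original 1000-class output layer, so a dataset with a different number of classes may need that last layer replaced.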

You will also need to download and extract a dataset of images to use. I recommend using a 64GB SD card or a USB3-to-SATA dongle with an SSD. I put the dataset I used up on Google Drive here (20GB): https://drive.google.com/a/nvidia.com/file/d/1LsxHT9HX5gM2wMVqPUfILgrqVlGtqX1o/view?usp=drivesdk
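Once the images are extracted, a quick sanity check of the folder layout can save a failed overnight run. The Python sketch below assumes the layout the PyTorch ImageNet example reads: a train/ and a val/ directory, each with one subfolder per class. Whether the archive above unpacks exactly that way is an assumption, and the class names in the demo are made up.

```python
import tempfile
from pathlib import Path


def summarize_dataset(root):
    """Return {"train": [...], "val": [...]} listing the class folders per split."""
    root = Path(root)
    summary = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing directory: {split_dir}")
        summary[split] = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    # Mismatched class folders would give train and val different label indices.
    if summary["train"] != summary["val"]:
        raise ValueError("train/ and val/ contain different class folders")
    return summary


# Demo on a throwaway layout with two hypothetical classes.
with tempfile.TemporaryDirectory() as tmp:
    for split in ("train", "val"):
        for cls in ("ball", "cube"):
            (Path(tmp) / split / cls).mkdir(parents=True)
    demo_summary = summarize_dataset(tmp)
    print(demo_summary)
```

Point summarize_dataset at the extracted dataset folder on the SD card or SSD before kicking off training.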

I plan to add training with PyTorch to jetson-inference this summer, but that is the gist of it for now.

Dustin: thank you for the additional information. Let me get the additional DASD together and look at the information; I have a lot of tutorial reading to do here! I will probably need another Nano; the one I have is going onto a JetBot for the students when the last of the parts come in…

My students and I will be more than happy to beta-test any training instructions you plan to add to jetson-inference; if we can make this work, I think a lot of FRC teams will have an interest.

I requested read access to the dataset in Google Drive. Do you have any objection to my sharing it via BitTorrent?

Can you share the torrent with me when you have it, so I can include it as a download option when I update the tutorial? Thanks.

By the way, I found a blog post earlier on the same topic: https://www.zaferarican.com/post/transfer-learning-training-on-jetson-nano-with-pytorch

Let’s try

https://github.com/fovea1959/nvidia_musings/raw/fd5878791640880aa29932a2e901ecf5cd3678cb/torrents/ilsvrc12_subset.tar.gz.torrent

Right now, that’s being seeded from a residence, so a few more seeds out there would be helpful. If that doesn’t happen, using the torrent will be slower than downloading from Google Drive.