Retraining on a budget

fovea1959-nvidia · May 2, 2019, 9:09pm

Not sure if this is the right place…

I am looking at using a Jetson Nano for our FIRST Robotics Competition team over the summer; planning to use it to recognize and locate game pieces.

I looked at the tutorials on GitHub (imagenet-training.md in dusty-nv/jetson-inference), and the part on retraining requires a DIGITS server. That appears to be kind of expensive: either dedicated hardware or racking up charges in the cloud.

Did I hear on the webinar today that training would be possible in the future on the Jetson itself?

mdegans · May 2, 2019, 9:41pm

Even if it worked, you would probably end up paying more in electricity eventually than you would if you paid for cloud training, just because it’s a lot more efficient to do that kind of thing on dedicated hardware.

I don’t think it’s possible right now, specifically because DIGITS requires the NVIDIA Docker runtime, and the version of Docker installed on the Nano by default doesn’t appear to have it (run sudo docker info and look under Runtimes).

I personally hope this changes. I don’t care about training, but having the NVIDIA runtime for Docker on the Nano would be useful to me, as I already use nvidia-docker. Without the NVIDIA runtime, I don’t think it’s (elegantly) possible to use CUDA within Docker, unfortunately. This makes DIGITS impossible for now.
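The runtime check described above can also be scripted. This is only a minimal sketch, assuming that `docker info --format '{{json .Runtimes}}'` prints the registered runtimes as a JSON object on your Docker version; the function names are hypothetical and not part of any NVIDIA tooling.

```python
import json
import subprocess


def has_nvidia_runtime(runtimes_json: str) -> bool:
    """Return True if the JSON runtimes mapping contains an 'nvidia' entry."""
    # Docker may report no runtimes as JSON null, so fall back to an empty dict.
    runtimes = json.loads(runtimes_json) or {}
    return "nvidia" in runtimes


def docker_runtimes_json() -> str:
    """Ask the local Docker daemon for its registered runtimes (needs sudo)."""
    # Assumption: this Docker version supports Go-template output for `docker info`.
    result = subprocess.run(
        ["sudo", "docker", "info", "--format", "{{json .Runtimes}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Example use on the device itself:
#   print(has_nvidia_runtime(docker_runtimes_json()))
```

Keeping the parsing separate from the `docker` call means the check can be tried against saved output from another machine.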

dusty_nv · May 2, 2019, 11:01pm

DIGITS is only supported on PC/server, so it’s not for Jetson. However, you could run transfer learning (re-training a pre-trained model) with PyTorch or TensorFlow. In my experience, training with PyTorch is less memory-hungry than TensorFlow. I included some results from re-training with the PyTorch ImageNet example in this blog: https://devblogs.nvidia.com/jetson-nano-ai-computing/

So yes, it is possible; you could let your Nano run overnight or for a couple of days. Training time becomes more of an issue when you are developing new models and experimenting, but if you are only transfer learning with quality data, and not changing the network architecture and layer configuration, that should be less of an issue. You will probably want to mount a 2GB or 4GB swap file on your Nano to avoid any memory issues.
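Before starting a long run, it may be worth confirming that the swap file is actually active. The sketch below reads the `SwapTotal` line that Linux exposes in `/proc/meminfo`; the helper name and the 2 GiB threshold in the usage comment are illustrative assumptions.

```python
def swap_total_gib(meminfo_text: str) -> float:
    """Return the total swap size in GiB from the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        # The line looks like: "SwapTotal:       4194300 kB"
        if line.startswith("SwapTotal:"):
            kib = int(line.split()[1])
            return kib / (1024 * 1024)
    # No SwapTotal line found: treat as no swap configured.
    return 0.0


# Example use on the Nano:
#   with open("/proc/meminfo") as f:
#       if swap_total_gib(f.read()) < 2.0:
#           print("Less than 2 GiB of swap is active")
```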

fovea1959-nvidia · May 3, 2019, 2:17pm

Thank you both for your answers. I don’t know enough yet to know whether I want to re-train a pre-trained model or develop a new one, but I will look at the tutorial Dustin mentioned.

I will post on the DIGITS board about minimum hardware recommendations. It looks like a couple hundred dollars’ worth of GTX 1060 will get me by…

fovea1959-nvidia · May 3, 2019, 2:24pm

Dustin: Can you point me to reference or tutorial material on using the Nano to do re-training (that is, what steps does one go through to get the data in Table 3 of https://devblogs.nvidia.com/jetson-nano-ai-computing/)?

I have all summer to work on this, and at 10W, the cost of electricity for a few hundred hours seems trivial…
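As a rough check of that estimate: energy is power multiplied by time, so 10 W for 300 hours is 3 kWh. The sketch below does the arithmetic; the 300-hour figure and the price of 0.15 per kWh are illustrative assumptions, not numbers from this thread.

```python
def energy_cost(power_watts: float, hours: float, price_per_kwh: float):
    """Return (energy in kWh, cost) for a constant power draw."""
    energy_kwh = power_watts * hours / 1000.0
    return energy_kwh, energy_kwh * price_per_kwh


# Assumed values: 10 W draw, 300 hours of training, 0.15 per kWh.
kwh, cost = energy_cost(10, 300, 0.15)
print(f"{kwh:.1f} kWh, cost about {cost:.2f}")
```

Even if the assumed price were several times higher, the total would stay in the range of a few units of currency.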

dusty_nv · May 4, 2019, 1:09am

Here is a quick run-down of the procedure: install PyTorch from the sticky on this forum, then mount a 4GB swap file. Run the PyTorch imagenet example; it can use different networks, and I used AlexNet and ResNet-18. Also run it in pretrained mode so you are using transfer learning and aren’t training from scratch.
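The last step above might be launched roughly as follows. This sketch only assembles a command line for `main.py` from the PyTorch examples repository (imagenet folder), using its `--arch`, `--pretrained`, `--epochs`, `--batch-size` and `--workers` options. The helper name, the dataset path and any values passed in are assumptions, and this is not necessarily the exact invocation behind the blog results.

```python
def build_train_command(data_dir, arch="resnet18", epochs=None,
                        batch_size=None, workers=None):
    """Build an argv list for the PyTorch imagenet example in pretrained mode."""
    cmd = ["python3", "main.py", "--arch", arch, "--pretrained"]
    # Optional settings are only added when given, so the script's own
    # defaults apply otherwise.
    if epochs is not None:
        cmd += ["--epochs", str(epochs)]
    if batch_size is not None:
        cmd += ["--batch-size", str(batch_size)]
    if workers is not None:
        cmd += ["--workers", str(workers)]
    # The dataset directory is the positional argument.
    cmd.append(data_dir)
    return cmd


# Example use (hypothetical path), run from the directory containing main.py:
#   import subprocess
#   subprocess.run(build_train_command("/data/ilsvrc12_subset", arch="alexnet"),
#                  check=True)
```

A smaller batch size and fewer workers would likely help on a 4GB board, but suitable values would have to be found by experiment.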

You will also need to download and extract a dataset of images to use. I recommend using a 64GB SD card or a USB3-to-SATA dongle with an SSD. I put the dataset I used up on Google Drive (20GB): ilsvrc12_subset.tar.gz
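Extracting the archive and checking the result could look like the sketch below. The PyTorch imagenet example reads its data from `train` and `val` subdirectories; whether this particular archive unpacks into exactly that layout is an assumption, as are the helper names and paths.

```python
import tarfile
from pathlib import Path


def extract_dataset(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir (creating it if needed)."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)


def missing_splits(dataset_root, splits=("train", "val")):
    """Return the names of expected split directories that are absent."""
    root = Path(dataset_root)
    return [name for name in splits if not (root / name).is_dir()]


# Example use (hypothetical paths on the SSD or SD card):
#   extract_dataset("/mnt/ssd/ilsvrc12_subset.tar.gz", "/mnt/ssd/dataset")
#   print(missing_splits("/mnt/ssd/dataset"))
```

Note that the 20GB archive and its extracted contents both need space during extraction, which is one reason for the larger card or SSD.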

I plan to add training with PyTorch to jetson-inference this summer, but that is the gist of it for now.

fovea1959-nvidia · May 4, 2019, 4:07pm

Dustin: thank you for the additional information. Let me get the additional DASD together and look at the information; I have a lot of tutorial reading to do here! Probably need another Nano; the one I have is going onto a jetbot for the students when the last of the parts come in…

My students and I will be more than happy to beta-test any training instructions you plan to add to jetson-inference; if we can make this work, I think a lot of FRC teams will have an interest.

I requested read access to the dataset in Google Drive. Do you have any objection to my sharing it via BitTorrent?

dusty_nv · May 4, 2019, 7:20pm

Can you share the torrent with me when you have it, so I can include it as a download option when I update the tutorial? Thanks.

By the way, I found a blog post earlier on the same topic: https://www.zaferarican.com/post/transfer-learning-training-on-jetson-nano-with-pytorch

fovea1959-nvidia · May 6, 2019, 12:55pm

Let’s try

https://github.com/fovea1959/nvidia_musings/raw/fd5878791640880aa29932a2e901ecf5cd3678cb/torrents/ilsvrc12_subset.tar.gz.torrent

Right now, that’s being seeded from a residence, so a few more seeds out there would be helpful. If that doesn’t happen, using the torrent will be worse than downloading from Google.
