I have DetectNet working nicely on the Jetson Nano, and I have the trainer all set up on my PC. Unfortunately my dataset is huge and complicated, so it needs a lot of training steps.
Is it possible to use a cloud GPU to train a model that is compatible with DetectNet, and if so, how would I go about it?
I have trained models using other methods before and have never been able to convert them to work on DetectNet; only models I trained using the provided tools have worked.
So I would like to get it right from the get-go and train the model on a cloud GPU, because currently it takes me around 50 hours of training on my PC to get halfway there.
Any info and help in the right direction appreciated.
Hi @TP2049, if you are referring to the DetectNet from jetson-inference, you can run the pytorch-ssd code on a Linux PC or cloud instance with an NVIDIA GPU. It runs the same on x86 as it does on Jetson. On x86 I have only tested it under Ubuntu, and you will need to install CUDA, cuDNN, PyTorch, and the other dependencies (such as Pandas).
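Before kicking off a long run on a new machine, it can be worth confirming that PyTorch actually sees the GPU. A minimal sketch, assuming PyTorch is already installed on the instance (the `describe_gpu` helper name is just for illustration):

```python
def describe_gpu():
    """Return a one-line summary of what PyTorch can see on this machine."""
    try:
        import torch  # installed separately; not part of the standard library
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__} found, but no CUDA GPU is visible"
    # Device 0 is the first GPU that CUDA exposes to this process.
    return f"PyTorch {torch.__version__}, GPU: {torch.cuda.get_device_name(0)}"


print(describe_gpu())
```

If this reports that no CUDA GPU is visible, training would fall back to the CPU, so it is better to sort out the driver/CUDA install first.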
Yeah, I have the training working on my PC using your repo, but it still takes around 1 hour per 10 epochs on a P220 (no tensor cores :P)
I have not yet attempted cloud-based training. Given the requirements for DetectNet in jetson-inference, I was not sure how to set up a configuration and send it to a cloud GPU that would send back a workable model.
The only tutorials I find online produce h5 models, which is not what we want. I am used to the trainer producing a .pth, converting that to ONNX, then letting DetectNet build the TRT engine on first run on the Jetson.
Cloud GPU is new territory for me but I need to get to grips with it! Do you know of any how-to guides that work specifically with your repo?
Thanks a lot for the speedy response much appreciated :)
In theory it would not be very different from your PC: you would run the same pytorch-ssd code after installing the dependencies on the cloud instance.
You could try Google Colab first (free). It has a GPU and I believe it comes with PyTorch (or you can install it). Colab is an IPython notebook, so basically you would run shell commands from the notebook to invoke train_ssd.py.
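A rough sketch of what invoking the trainer from Python (or a notebook cell) could look like. The flag names are the ones I recall from the jetson-inference training docs, so check them against `python3 train_ssd.py --help`. The `data/my_dataset` and `models/my_model` paths and the `build_train_command` helper are placeholders, not anything from the repo:

```python
import shlex


def build_train_command(data_dir, model_dir, epochs=30, batch_size=4,
                        dataset_type="voc"):
    """Build the argv list for a train_ssd.py run. Nothing is executed here."""
    return [
        "python3", "train_ssd.py",
        f"--dataset-type={dataset_type}",  # assumed: dataset is in Pascal VOC layout
        f"--data={data_dir}",              # placeholder dataset path
        f"--model-dir={model_dir}",        # where the .pth checkpoints get written
        f"--batch-size={batch_size}",
        f"--epochs={epochs}",
    ]


cmd = build_train_command("data/my_dataset", "models/my_model")
print(shlex.join(cmd))

# To actually launch it, run from the directory that contains train_ssd.py:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

In a Colab cell you can also paste the printed command after a leading `!` to run it as a shell command.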
Ooh, I see, so essentially it's just a virtual PC that I set up like my own? I got confused because I see cloud GPU services where you have to select all these different GPUs and create config files, etc. I will do a test run with Colab.
There certainly are ML cloud services out there that train models for you, and typically you have to work with the model format they output. However, you can also just get a cloud instance (whether it be on AWS, Azure, Colab, etc.) and install/run your own software on it.
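For the .pth to ONNX round trip mentioned earlier in the thread, the cloud side would be the same export step as on a PC, after which you copy the model directory back to the Jetson. A sketch of the two commands involved, with flag names as I remember them from the jetson-inference docs (treat them as assumptions and check each tool's `--help`). The `ssd-mobilenet.onnx` file name, the paths, and both helper functions are illustrative:

```python
import shlex


def build_export_command(model_dir):
    """argv for exporting the trained checkpoint in model_dir to ONNX."""
    return ["python3", "onnx_export.py", f"--model-dir={model_dir}"]


def build_detectnet_command(model_dir, source, sink,
                            onnx_name="ssd-mobilenet.onnx"):
    """argv for running detectnet on the Jetson with the exported model."""
    return [
        "detectnet",
        f"--model={model_dir}/{onnx_name}",   # assumed output name of the export
        f"--labels={model_dir}/labels.txt",   # class labels saved alongside the model
        "--input-blob=input_0",
        "--output-cvg=scores",
        "--output-bbox=boxes",
        source,                               # e.g. a camera or image path
        sink,                                 # e.g. an output file or display
    ]


# Run on the cloud instance, from the training directory:
print(shlex.join(build_export_command("models/my_model")))

# Run on the Jetson after copying models/my_model back:
print(shlex.join(build_detectnet_command("models/my_model",
                                         "input.jpg", "output.jpg")))
```

The TRT engine would still be built on the Jetson the first time detectnet loads the ONNX file, so only the ONNX model and labels need to come back from the cloud.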