Using re-trained model inside Python script

I am following Dusty’s tutorials on jetson-inference (thank you in advance, you’re a lifesaver!) and I was wondering about custom re-trained models. I have written my own Python script that detects birds and people on a boat, then tracks the birds by moving stepper motors to keep them within frame. I have a working Python script that uses the webcam, detects the birds, and outputs values, but I have just re-trained my own model on pictures of only people and birds. How do I call the custom re-trained model in my Python script?

DetectNet code (pretty much the same)

It calls the ssd-mobilenet-v2 model - how do I call my custom re-trained model (the ssd-mobilenet .onnx) instead? I’ve also followed the tutorial here: jetson-inference/pytorch-ssd.md at master · dusty-nv/jetson-inference · GitHub.

Another question: the re-training tutorial uses the ssd-mobilenet-v1 model. Is it possible to retrain the ssd-mobilenet-v2 model instead, and how would you go about doing this?

Hi @ngsteven97, you can do it like this:

net = jetson.inference.detectNet(argv=['--model=models/your-model/ssd-mobilenet.onnx', '--labels=models/your-model/labels.txt', '--input-blob=input_0', '--output-cvg=scores', '--output-bbox=boxes'], threshold=0.5)
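For anyone following along, here is a minimal sketch of how that argv call might slot into a full detection loop. The model/label paths, the /dev/video0 webcam device, and the display output are assumptions about the setup, not something confirmed in this thread:

import jetson.inference
import jetson.utils

# load the custom re-trained SSD-Mobilenet ONNX model (paths are placeholders)
net = jetson.inference.detectNet(argv=['--model=models/your-model/ssd-mobilenet.onnx',
                                       '--labels=models/your-model/labels.txt',
                                       '--input-blob=input_0',
                                       '--output-cvg=scores',
                                       '--output-bbox=boxes'],
                                 threshold=0.5)

camera = jetson.utils.videoSource("/dev/video0")    # assumed USB webcam
display = jetson.utils.videoOutput("display://0")   # assumed desktop window

while display.IsStreaming():
    img = camera.Capture()
    detections = net.Detect(img)
    for detection in detections:
        print(net.GetClassDesc(detection.ClassID), detection.Confidence)
    display.Render(img)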

The pytorch-ssd repo doesn’t have ssd-mobilenet-v2; it does have ssd-mobilenet-v2-lite, but I haven’t tried it, and I’m also not sure if it works with the ONNX export and with TensorRT. The main difference between -v2 and -v1 is separable convolution vs pointwise, which can give a performance improvement on resource-limited mobile devices; however, on Jetson it’s not that big a deal because of its powerful GPU and TensorRT optimizations. So I would just stick with ssd-mobilenet-v1.


I tried using your code, but I get an error:
error: model file ‘models/animals/ssd-mobilenet.onnx’ was not found.

Do I have to point detectNet to the correct directory? Since my custom re-trained model is in the jetson-inference/python/training/detection/ssd/models/animals directory, and not the directory where the built-in models were downloaded to, I assume I have to point to the new model explicitly? I assume that’s what --model=models/your-model/ssd-mobilenet.onnx was for, but I’m not sure the path is complete?

Thanks.

I just tried to change the directory to be more explicit and point directly to the correct folder, and that seemed to work.

net = jetson.inference.detectNet(argv=['--model=jetson-inference/python/training/detection/ssd/models/animals/ssd-mobilenet.onnx', '--labels=jetson-inference/python/training/detection/ssd/models/animals/labels.txt', '--input-blob=input_0', '--output-cvg=scores', '--output-bbox=boxes'], threshold=0.5)

I got it working, but I had to write out the full path to the folder so the model could be found. Would it be easier to move the .onnx files into the folder with all the other pre-trained models and call the model the same way as normal - so just 'ssd-mobilenet-v2' or 'ssd-mobilenet-v1', or in this case 'ssd-mobilenet.onnx'? Or would that not work?

I also have a debug argument as an input to the Python script, so I would pass '--debug' to set the log level to verbose, but when I add this debug argument to the jetson.inference.detectNet call it doesn't like it and gives SyntaxError: positional argument follows keyword argument. I think this is where I have to name the argument, similar to how I have defined the threshold, instead of just passing the value on its own in my else statement.

Edit: I seem to have fixed the debug issue; I just had to append '--log-level=silent' to the main argv flags.
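In case it helps anyone hitting the same SyntaxError, here is a rough sketch of how the script's own --debug flag could be folded into the argv list instead of being passed as an extra positional argument (the model/label paths are placeholders, and argparse is just one way to do it):

import argparse
import jetson.inference

# parse the script's own command line (hypothetical --debug flag)
parser = argparse.ArgumentParser()
parser.add_argument('--debug', action='store_true', help='enable verbose jetson-inference logging')
args = parser.parse_args()

detectnet_argv = ['--model=models/your-model/ssd-mobilenet.onnx',   # placeholder paths
                  '--labels=models/your-model/labels.txt',
                  '--input-blob=input_0',
                  '--output-cvg=scores',
                  '--output-bbox=boxes']

# append the log level to the same argv list; passing it as a separate
# positional argument after threshold=0.5 is what triggers the SyntaxError
detectnet_argv.append('--log-level=verbose' if args.debug else '--log-level=silent')

net = jetson.inference.detectNet(argv=detectnet_argv, threshold=0.5)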

Hi @ngsteven97, yes, if you move your model into the jetson-inference/data/networks directory, that path is checked when looking for the model. Say you put your model at jetson-inference/data/networks/my-model/ssd-mobilenet.onnx; then you would use --model=my-model/ssd-mobilenet.onnx and it would look under the data/networks folder for it.

Thanks Dusty, I gave that a try but I seem to be getting an error where the model file is not found.

Also, a question about the re-training: I tried training the other day and left it on overnight, only to find that it had crashed at the 16th epoch (I had originally intended to do more, around 30). Is there a way to continue from that point without having to run the whole programme again? I ran it again and set 5 epochs this time, just so I could prove that it would work for my use case, but it overwrote the existing epoch 0-4 files.

Can you check if you can see your model under /usr/local/bin/networks? This should be symlinked to your jetson-inference/data/networks if you ran sudo make install. /usr/local/bin/networks is the path it actually checks.

Yes, you can use the --pretrained-ssd argument and specify the path to the previous .pth checkpoint you want to restart from. You probably want to specify a different --model-dir, because the epochs will restart at 0 (as you have found).

I can confirm that the model is under the /usr/local/bin/networks directory.

I will give the --pretrained-ssd argument a try later when I have time; is there a tutorial on this method?

Ah, try passing your model path as --model=networks/bird/ssd-mobilenet.onnx
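For example, the full call might look like this (assuming labels.txt was also copied into the same data/networks/bird folder and is found the same way):

net = jetson.inference.detectNet(argv=['--model=networks/bird/ssd-mobilenet.onnx', '--labels=networks/bird/labels.txt', '--input-blob=input_0', '--output-cvg=scores', '--output-bbox=boxes'], threshold=0.5)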

Restarting the detection training is relatively undocumented other than the source, so there isn’t a tutorial about it.

Awesome, that worked! Thanks so much! Is there also a way to cap the FPS so as not to use as much GPU overhead? With my re-trained model I'm getting around 40 FPS, but realistically I probably don't need more than real-time detection (24 FPS), so being able to cap it at 24 FPS and not have the GPU usage and temperatures ramp up would be great.

Okay, I'll give the detection training another shot and hopefully it doesn't crash this time. The error I got is shown below; do you think I need to change my settings? I used the normal batch size of 4 and 30 epochs, but didn't specify a workers argument.

Probably the easiest way would be to just put a sleep() call at the end of your main loop
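Building on the earlier loop sketch (same camera, net, and display objects), a frame-rate cap could look roughly like this; the 24 FPS target is just the figure mentioned above:

import time

TARGET_FPS = 24
frame_budget = 1.0 / TARGET_FPS

while display.IsStreaming():
    start = time.monotonic()

    img = camera.Capture()
    detections = net.Detect(img)
    display.Render(img)

    # sleep away whatever is left of the per-frame budget so the loop
    # runs at roughly 24 FPS instead of as fast as the GPU allows
    elapsed = time.monotonic() - start
    if elapsed < frame_budget:
        time.sleep(frame_budget - elapsed)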

It may be an out-of-memory error, but it's hard to tell. You could reduce the batch size and number of workers. You can also disable ZRAM with sudo systemctl disable nvzramconfig && sudo reboot, and you can disable the desktop during training with these steps:

I just restarted the training, but I had to pause it so that I could move it to another room (I had the fan on full blast to keep it cool). I then wanted to resume the training, but when I tried it didn't work. There seem to be two ways that you've mentioned.

The first is the --resume=CHECKPOINT method, for which I believe I just had to put the path of the folder I was running in, in this case models/bird, but I had difficulty with this as the script reported a directory error. I also tried the full directory path, jetson-inference/python/training/detection/models/bird, as well as moving it to a separate folder called 'old' and pointing to that; none of them worked. Did I have to point to the specific .pth file of the last epoch?

The second is the --pretrained-ssd method, where I believe I again just had to put the path of the folder? Or do I have to specify the specific .pth file of the last epoch? I think there was also another argument, --epoch-start=NUMBER, to resume from a specific epoch number?

Thanks in advance.

For both --resume and --pretrained-ssd, you need to specify the path to a .pth checkpoint (not the model directory). Generally you pick the last checkpoint (the one with the highest epoch) or the one with the lowest loss.
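For example (the checkpoint filename below is made up for illustration; use whichever .pth your run actually produced, and keep the dataset and batch-size flags matching your original training command):

python3 train_ssd.py --dataset-type=voc --data=data/birds --model-dir=models/bird-resumed --resume=models/bird/mb1-ssd-Epoch-15-Loss-2.87.pth --batch-size=4 --epochs=30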