Using re-trained model inside Python script

I am following Dusty’s tutorials on jetson-inference (thank you in advance, you’re a lifesaver!) and I was wondering about custom re-trained models. I have written my own Python script that detects birds and people on a boat, then tracks the birds by moving stepper motors to keep them within frame. I have a working Python script that uses the webcam, detects the birds, and outputs values, but I have just re-trained my own model on pictures of only people and birds. How do I call the custom re-trained model in my Python script?

DetectNet code (pretty much the same)

It calls the ssd-mobilenet-v2 model - how do I call my custom re-trained model (the ssd-mobilenet .onnx) instead? I’ve also followed the tutorial here: jetson-inference/pytorch-ssd.md at master · dusty-nv/jetson-inference · GitHub.

Another question: the re-training tutorial uses the ssd-mobilenet-v1 model. Is it possible to retrain the ssd-mobilenet-v2 model instead, and how would you go about doing this?

Hi @ngsteven97, you can do it like this:

net = jetson.inference.detectNet(argv=['--model=models/your-model/ssd-mobilenet.onnx', '--labels=models/your-model/labels.txt', '--input-blob=input_0', '--output-cvg=scores', '--output-bbox=boxes'], threshold=0.5)
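For anyone following along, here is a minimal sketch of how that argv call might slot into a full detection loop. The model/label paths, the /dev/video0 webcam device, and the display output are assumptions about the setup, not something confirmed in this thread:

import jetson.inference
import jetson.utils

# load the custom re-trained SSD-Mobilenet ONNX model (paths are placeholders)
net = jetson.inference.detectNet(argv=['--model=models/your-model/ssd-mobilenet.onnx',
                                       '--labels=models/your-model/labels.txt',
                                       '--input-blob=input_0',
                                       '--output-cvg=scores',
                                       '--output-bbox=boxes'],
                                 threshold=0.5)

camera = jetson.utils.videoSource("/dev/video0")    # assumed USB webcam
display = jetson.utils.videoOutput("display://0")   # assumed desktop window

while display.IsStreaming():
    img = camera.Capture()
    detections = net.Detect(img)
    for detection in detections:
        print(net.GetClassDesc(detection.ClassID), detection.Confidence)
    display.Render(img)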

The pytorch-ssd repo doesn’t have ssd-mobilenet-v2; it does have ssd-mobilenet-v2-lite, but I haven’t tried it, and I’m also not sure if it works with the ONNX export and with TensorRT. The main difference between -v2 and -v1 is separable convolution vs pointwise, which can give a performance improvement on resource-limited mobile devices; however, on Jetson it’s not that big a deal because of its powerful GPU and TensorRT optimizations. So I would just stick with ssd-mobilenet-v1.


I tried using your code, but I get an error:
error: model file ‘models/animals/ssd-mobilenet.onnx’ was not found.

Do I have to point detectNet to the correct directory? Since my custom re-trained model is in the jetson-inference/python/training/detection/ssd/models/animals directory, and not the directory where the built-in models were downloaded to, I assume I have to point to the new model explicitly? I assume that’s what --model=models/your-model/ssd-mobilenet.onnx was for, but I’m not sure the path is complete?

Thanks.

I just tried to change the directory to be more explicit and point directly to the correct folder, and that seemed to work.

net = jetson.inference.detectNet(argv=['--model=jetson-inference/python/training/detection/ssd/models/animals/ssd-mobilenet.onnx', '--labels=jetson-inference/python/training/detection/ssd/models/animals/labels.txt', '--input-blob=input_0', '--output-cvg=scores', '--output-bbox=boxes'], threshold=0.5)

I got it working, but I had to write out the full path to the folder so the model could be found. Would it be easier to move the .onnx files into the folder with all the other pre-trained models and call the model the same way as normal - so just 'ssd-mobilenet-v2' or 'ssd-mobilenet-v1', or in this case 'ssd-mobilenet.onnx'? Or would that not work?

I also have a debug argument as an input to the Python script, so I would pass '--debug' to set the log level to verbose, but when I add this debug argument to the jetson.inference.detectNet call it doesn't like it and gives SyntaxError: positional argument follows keyword argument. I think this is where I have to name the argument, similar to how I have defined the threshold, instead of just passing the value on its own in my else statement.

Edit: I seem to have fixed the debug issue; I just had to append '--log-level=silent' to the main argv flags.
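In case it helps anyone hitting the same SyntaxError, here is a rough sketch of how the script's own --debug flag could be folded into the argv list instead of being passed as an extra positional argument (the model/label paths are placeholders, and argparse is just one way to do it):

import argparse
import jetson.inference

# parse the script's own command line (hypothetical --debug flag)
parser = argparse.ArgumentParser()
parser.add_argument('--debug', action='store_true', help='enable verbose jetson-inference logging')
args = parser.parse_args()

detectnet_argv = ['--model=models/your-model/ssd-mobilenet.onnx',   # placeholder paths
                  '--labels=models/your-model/labels.txt',
                  '--input-blob=input_0',
                  '--output-cvg=scores',
                  '--output-bbox=boxes']

# append the log level to the same argv list; passing it as a separate
# positional argument after threshold=0.5 is what triggers the SyntaxError
detectnet_argv.append('--log-level=verbose' if args.debug else '--log-level=silent')

net = jetson.inference.detectNet(argv=detectnet_argv, threshold=0.5)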

Hi @ngsteven97, yes, if you move your model into the jetson-inference/data/networks directory, that path is checked when looking for the model. Say you put your model at jetson-inference/data/networks/my-model/ssd-mobilenet.onnx; then you would use --model=my-model/ssd-mobilenet.onnx and it would look under the data/networks folder for it.

Thanks Dusty, I gave that a try but I seem to be getting an error where the model file is not found.

Also, a question about the re-training: I tried training the other day and left it on overnight, only to find that it had crashed at the 16th epoch (I had originally intended to do more, around 30). Is there a way to continue from that point without having to run the whole programme again? I ran it again and set 5 epochs this time, just so I could prove that it would work for my use case, but it overwrote the existing epoch 0-4 files.

Can you check if you can see your model under /usr/local/bin/networks? This should be symlinked to your jetson-inference/data/networks if you ran sudo make install. /usr/local/bin/networks is the path it actually checks.

Yes, you can use the --pretrained-ssd argument and specify the path to the previous .pth checkpoint you want to restart from. You probably want to specify a different --model-dir, because the epochs will restart at 0 (as you have found).

I can confirm that the model is under the /usr/local/bin/networks directory.

I will give the --pretrained-ssd argument a try later when I have time; is there a tutorial on this method?

Ah, try passing your model path as --model=networks/bird/ssd-mobilenet.onnx
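For example, the full call might look like this (assuming labels.txt was also copied into the same data/networks/bird folder and is found the same way):

net = jetson.inference.detectNet(argv=['--model=networks/bird/ssd-mobilenet.onnx', '--labels=networks/bird/labels.txt', '--input-blob=input_0', '--output-cvg=scores', '--output-bbox=boxes'], threshold=0.5)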

Restarting the detection training is relatively undocumented other than the source, so there isn’t a tutorial about it.

Awesome, that worked! Thanks so much! Is there also a way to cap the FPS so as not to use as much GPU overhead? With my re-trained model I'm getting around 40 FPS, but realistically I probably don't need more than real-time detection (24 FPS), so being able to cap it at 24 FPS and not have the GPU usage and temperatures ramp up would be great.

Okay, I'll give the detection training another shot and hopefully it doesn't crash this time. The error I got is shown below; do you think I need to change my settings? I used the normal batch size of 4 and 30 epochs, but didn't specify a workers argument.

Probably the easiest way would be to just put a sleep() call at the end of your main loop
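Building on the earlier loop sketch (same camera, net, and display objects), a frame-rate cap could look roughly like this; the 24 FPS target is just the figure mentioned above:

import time

TARGET_FPS = 24
frame_budget = 1.0 / TARGET_FPS

while display.IsStreaming():
    start = time.monotonic()

    img = camera.Capture()
    detections = net.Detect(img)
    display.Render(img)

    # sleep away whatever is left of the per-frame budget so the loop
    # runs at roughly 24 FPS instead of as fast as the GPU allows
    elapsed = time.monotonic() - start
    if elapsed < frame_budget:
        time.sleep(frame_budget - elapsed)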

It may be an out-of-memory error, but it's hard to tell. You could reduce the batch size and number of workers. You can also disable ZRAM with sudo systemctl disable nvzramconfig && sudo reboot, and you can disable the desktop during training with these steps:

I just restarted the training, but I had to pause it so that I could move it to another room (I had the fan on full blast to keep it cool). I then wanted to resume the training, but when I tried it didn't work. There seem to be two ways that you've mentioned.

The first is the --resume=CHECKPOINT method, for which I believe I just had to put the path of the folder I was running in, in this case models/bird, but I had difficulty with this as the script reported a directory error. I also tried the full directory path, jetson-inference/python/training/detection/models/bird, as well as moving it to a separate folder called 'old' and pointing to that; none of them worked. Did I have to point to the specific .pth file of the last epoch?

The second is the --pretrained-ssd method, where I believe I again just had to put the path of the folder? Or do I have to specify the specific .pth file of the last epoch? I think there was also another argument, --epoch-start=NUMBER, to resume from a specific epoch number?

Thanks in advance.

For both --resume and --pretrained-ssd, you need to specify the path to a .pth checkpoint (not the model directory). Generally you pick the last checkpoint (the one with the highest epoch) or the one with the lowest loss.
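For example (the checkpoint filename below is made up for illustration; use whichever .pth your run actually produced, and keep the dataset and batch-size flags matching your original training command):

python3 train_ssd.py --dataset-type=voc --data=data/birds --model-dir=models/bird-resumed --resume=models/bird/mb1-ssd-Epoch-15-Loss-2.87.pth --batch-size=4 --epochs=30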