How to build a resnet10 equivalent with TLT for DeepStream

I use the standard DeepStream resnet10 model with up to 8 sources.

However, I need to fine-tune the model.

As NVIDIA does not allow this, can you please suggest the best course of action using TLT?

  • which model from TLT to base it on
  • performance - can it handle 8 streams in a DeepStream app
  • performance - accuracy of detections compared to the resnet10

Which resnet10 model do you mean? Is it /opt/nvidia/deepstream/deepstream-5.0/samples/models/Primary_Detector/resnet10.caffemodel?

Yes, that’s the one, @Morganh. I’ve been told elsewhere on the forums that this one is proprietary and cannot be retrained or fine-tuned with TLT. I wish it could, as it does almost everything I need. I just want to fine-tune it for low-light situations.

Unfortunately, the resnet10.caffemodel is not a TLT model, so it cannot be retrained or fine-tuned with TLT.

I suggest using the unpruned TrafficCamNet or DashCamNet for fine-tuning/pruning/retraining.

See the Transfer Learning Toolkit — Transfer Learning Toolkit 3.0 documentation.
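For reference, here is a rough sketch of that fine-tune/prune/retrain/export flow with the TLT 3.0 launcher, assuming the unpruned TrafficCamNet as the base. The NGC model version, the `tlt_encode` key, the spec file names, the output paths and the pruning threshold are assumptions you would adjust for your own setup:

```bash
# Sketch only: model version, key, spec files and threshold are assumptions.
# 1. Download the unpruned TrafficCamNet weights from NGC (requires the ngc CLI).
ngc registry model download-version "nvidia/tlt_trafficcamnet:unpruned_v1.0"

# 2. Fine-tune on your own (e.g. low-light) dataset with a detectnet_v2 spec
#    that references the downloaded .tlt file as the pretrained model.
tlt detectnet_v2 train -e train_spec.txt -r results/ -k tlt_encode --gpus 1

# 3. Prune the fine-tuned model to recover inference speed
#    (the threshold is a starting guess; tune it against your accuracy target).
tlt detectnet_v2 prune -m results/weights/model.tlt -o model_pruned.tlt \
    -k tlt_encode -pth 0.3

# 4. Retrain the pruned model to recover accuracy, then export to .etlt for DeepStream.
tlt detectnet_v2 train -e retrain_spec.txt -r results_retrain/ -k tlt_encode
tlt detectnet_v2 export -m results_retrain/weights/model.tlt -o model_final.etlt \
    -e retrain_spec.txt -k tlt_encode
```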

You can also directly use the pruned version to run inference.
https://ngc.nvidia.com/catalog/models/nvidia:tlt_trafficcamnet/files?version=pruned_v1.0
https://ngc.nvidia.com/catalog/models/nvidia:tlt_dashcamnet/files?version=pruned_v1.0
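If you take the pruned route, the .etlt plugs into nvinfer as the primary detector. Below is a minimal config sketch, assuming DeepStream 5.0 and the pruned TrafficCamNet; the file names, the `tlt_encode` key and the batch size are assumptions, and in practice you would start from the sample TLT configs shipped with DeepStream:

```bash
# Sketch of a primary-GIE config for the pruned TrafficCamNet .etlt (DeepStream 5.0).
# File names, key and batch size are assumptions; adapt them to your paths.
cat > config_infer_primary_trafficcamnet.txt <<'EOF'
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
tlt-model-key=tlt_encode
tlt-encoded-model=trafficcamnet_pruned.etlt
labelfile-path=labels_trafficcamnet.txt
input-dims=3;544;960;0
uff-input-blob-name=input_1
output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd
# One batch slot per stream, matching an 8-source pipeline.
batch-size=8
# network-mode: 0=fp32, 1=int8, 2=fp16
network-mode=2
num-detected-classes=4
interval=0
gie-unique-id=1
EOF
```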

Thank you, @Morganh. From what I can see, inference speeds are far slower with these TLT models, so it would seem we have no alternative to the standard resnet10 model if we want the kind of performance needed to handle 8 streams.
Can you confirm this?

I.e. the TLT documentation says TrafficCamNet runs at 17.7 fps on a Jetson Nano.
This means that in DeepStream it could not even handle one camera stream at 30 fps?

Yes, that is the result at batch-size 1 in fp16 mode.
But it is not an apples-to-apples comparison.

For TrafficCamNet, the input resolution is 960x544 and the backbone is resnet18.
For resnet10.caffemodel, the input resolution is 640x368 and the backbone is resnet10.

I will sync with the internal team about your request.

Thanks. It would be good to know if there is a solution allowing multiple cameras on a Jetson Nano with a TLT model. Otherwise we are stuck with the resnet10, which we can’t fine-tune.

As for “if there is a solution allowing multiple cameras on a Jetson Nano”, this is a question for DeepStream. Please search/ask in the DeepStream forum.

OK, I have asked the question in the DeepStream forum, but please note I was originally asked to move it to the TLT forum. ;-)

For your question “if there is a solution allowing multiple cameras on a Jetson Nano”, below is a search of the DeepStream forum. You can have a look.
https://forums.developer.nvidia.com/search?q=multiple%20cameras%20%20%23intelligent-video-analytics%3Adeepstream-sdk%20

That’s not what I asked. I already know how to get the Jetson Nano to work with multiple cameras and DeepStream.

The problem is that NVIDIA advertises DeepStream as being able to handle 8 cameras at 30 fps, but that is with the resnet10 model, which cannot be fine-tuned for a specific purpose.
If you want to use a TLT model, the performance will be nowhere near as good, and from the published performance figures it seems TrafficCamNet won’t even be able to handle one stream at 30 fps. (The TrafficCamNet model card states it runs at 17.7 fps on a Jetson Nano.)

OK, as I mentioned above, I will sync with the internal team about your request. In my opinion, if NGC can provide an unpruned 640x368 resnet10 pretrained model, then you can use it to train/prune/retrain in TLT. And if a 640x368 resnet10 pruned .etlt model is made available, you can use it directly to run inference.
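If such a pretrained model does show up on NGC, the backbone and resolution would be selected in the detectnet_v2 training spec. A hypothetical excerpt along those lines is below; the pretrained_model_file path is a placeholder for a model that does not exist on NGC today, and the values are only illustrative:

```bash
# Hypothetical detectnet_v2 spec excerpt targeting resnet10 at 640x368.
# The pretrained_model_file path is a placeholder, not a real NGC asset.
cat >> train_spec.txt <<'EOF'
model_config {
  arch: "resnet"
  num_layers: 10
  pretrained_model_file: "/workspace/pretrained/resnet10_640x368_unpruned.tlt"
  use_batch_norm: true
  all_projections: true
}
augmentation_config {
  preprocessing {
    output_image_width: 640
    output_image_height: 368
    output_image_channel: 3
  }
}
EOF
```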