Hey,
Could I get more information about the “primary detector” of Deepstream. It is a resnet10 caffemodel and I figured that it is much faster compared to standard approaches like Yolo or SSD (3-4 times faster).
I am curious how the model was trained and about the performance. Is there any documentation?
Thanks for the information!
I still don’t really understand with which algorithm it is trained as it doesn’t seem to go with SSD/FasterRCNN/DetectNetv2. What is the benefit of it compared to ResNet10-SSD, ResNEt10-FasterRCNN, …
the hdf5 file for the resnet10_detectnet_v2 model which you can download from inside the TLT-Docker using this command:
ngc registry model download-version
is ~40 MB
I am guessing the mode from #1 is NOT trained using DetectNet.
PS: I have fine-tuned
resnet10-detectnet,
resnet18-detectnet and
resnet50-detectnet
All models were trained on the SAME dataset, with SAME iou, SAME learning rate and the SAME number of epochs.
All models were pruned and then re-trained using the same parameters.
When running them on the NANO in FP16 mode (yes, I exported the models in FP16 mode too), I have found NO difference in performance (speed or accuracy) across the models.
Note that I am running 6x video-streams from .mp4 files on the sd-card on the Nano. The resolution of each of these streams is 640x480 (a far cry from the 1080p resolutions in the benchmark).
Even at these super low resolution, NONE of the models were able to perform inference at 30fps (there were at 4-5 fps). (jetson_clocks was on, power mode = MAXN)
Interestingly, the only way I could increase the inference speed in DeepStream on the NANO for these models was to change the inference-interval value in the config file to 6 (i.e., run the inference on every 6th frame).
Kinda, disappointing so far.
I’d love to get some tips on how to reach the speed of the resnet10 model they use in their demo DeepStream app(s).
The model in official DS sample is essentially a DetectNet_v2 model with Resnet10 as backbone.
But for TLT, the pre-trained model is not a copy of Deepstream’s model.
TLT can help users train on their own dataset,but DS cannot. Users can use those TLT pre-trained models as starting points for transfer learning and training.
What I need to know is with enough tweaking and training will the TLT “DetectNet_v2 model with Resnet10”
achieve the same performance as the DS sample?
My focus is on the getting the performance out of the nano.
If I’m not going to be able to match the performance of the DS sample using the TLT “DetectNet_v2 model with Resnet10” then It would be nice to know so I can focus my attention elsewhere on achieving the performance I need.
Im not asking to be spoon-fed but to show us the amazing potential of the nano performance and to not be more forthright
about how we can achieve this with our own data is Bad Form.
Actually I have also no answer since I have never run the experiment — to reproduce the DS model with a tlt resnet10 detectnet_v2 pretrained model.
But I have an idea that you can generate a new tlt pretrained model and replace the one from ngc.
You can transfer the DS resenet10 caffe model into h5 via Keras.
Just an idea for your reference. To change the starting points of TLT.
Hi maniKTL,
No, I do not mean that.
There should be no issue for converting “DS resenet10 Caffe model into h5 via Keras”.
But for tlt, officially mentioned in tlt doc, only the pre-trained models inside ngc.nvidia.com can be used to train.
Hi Morganh,
I finally found time to train a custom model (DetectNet_v2 + Resnet10) to reproduce the benchmark performance of that “primary detector” in deepstream. I must say, I didn’t get even close.
On 5 streams I achieve with the primary detector 28 fps, and with custom one 10 fps.
Here some information.
In deepstream I used the deepstream-app and for comparing I used the exact same config, only in the [primary-gie] the config-file of the model is switched out.
Here the two different config files of the models:
In the tlt I built on the example script detectnet_v2.ipynb, prepared my own data and switched resnet18 to resnet10. It trains successfully and I can detect the objects after training (as well in deepstream). After pruning there are 19368 weights left.
Can you tell me, why I get such a bad performance?
Hi rog07o4z,
Sorry for late reply, could you please file any tlt topic into TLT forum instead of DS forum in future? Since I will always check TLT forum topics but may miss some notifications from DS forum.
For your case, could you please generate int8 TRT engine with tlt-converter and config it into custom_detectnet_v2_resnet10.txt to retry?
For trt engine test, an example is as below.
$ /usr/src/tensorrt/bin/trtexec --int8 --loadEngine= --calib= --batch=1 --iterations=20 --output=output_cov/Sigmoid,output_bbox/BiasAdd --useSpinWait