Resnet10 "primary detector"

Hey,
Could I get more information about the “primary detector” of DeepStream? It is a resnet10 caffemodel and I found that it is much faster than standard approaches like YOLO or SSD (3-4 times faster).

I am curious how the model was trained and about the performance. Is there any documentation?

Thanks


Hi,

You can find some model information on our NGC website:
https://ngc.nvidia.com/catalog/models/nvidia:tlt_iva_object_detection_resnet10

This model can now be re-trained with the Transfer Learning Toolkit:
https://ngc.nvidia.com/catalog/containers/nvidia:tlt-streamanalytics
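For example, the container can be pulled with Docker once you are logged in to nvcr.io (the tag below is a placeholder; the available tags are listed on the container page above):

$ docker pull nvcr.io/nvidia/tlt-streamanalytics:<tag>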

You can also find some performance data in our document:
https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html#page/DeepStream_Development_Guide%2Fdeepstream_performance.html

Thanks.

Thanks for the information!
I still don’t really understand which algorithm it was trained with, as it doesn’t seem to be SSD/FasterRCNN/DetectNetv2. What is the benefit of it compared to ResNet10-SSD, ResNet10-FasterRCNN, …

Furthermore, how can I re-train it?
https://docs.nvidia.com/metropolis/TLT/tlt-getting-started-guide/index.html#training_models
only addresses SSD, FasterRCNN, DetectNetv2.

What about performance regarding accuracy (precision, recall, or similar)?

I have asked the same question and got the same answer.

We all want to know what you trained the “resnet10 caffemodel” model with: SSD? FasterRCNN? DetectNetv2?

Is it trained with a special NVIDIA Secret Squirrel algorithm?

Is there an Easter egg in the Transfer Learning Toolkit that gives us a key to unlock the answer?

For the time being I have started retraining the examples in the Jupyter notebook with the resnet10 model and running them on the NANO.

Hopefully one of them can match the speed of the “resnet10 caffemodel” example.

adventuredaisy: I would appreciate it if you could share your experiences with the different models you try out.

Here’s what I have found so far:

  1. the hdf5 file inside this docker container (https://ngc.nvidia.com/catalog/models/nvidia:tlt_iva_object_detection_resnet10) is ~ 19 MB

  2. the hdf5 file for the resnet10_detectnet_v2 model which you can download from inside the TLT-Docker using this command:

ngc registry model download-version

is ~40 MB
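(The download command above is abbreviated; its general form is roughly

ngc registry model download-version <org>/<model_name>:<version>

where the exact model path and version number are shown on the model’s NGC page.)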

I am guessing the model from #1 is NOT trained using DetectNet.

PS: I have fine-tuned

  1. resnet10-detectnet,
  2. resnet18-detectnet and
  3. resnet50-detectnet

All models were trained on the SAME dataset, with SAME iou, SAME learning rate and the SAME number of epochs.

All models were pruned and then re-trained using the same parameters.

When running them on the NANO in FP16 mode (yes, I exported the models in FP16 mode too), I have found NO difference in performance (speed or accuracy) across the models.

Note that I am running 6x video-streams from .mp4 files on the sd-card on the Nano. The resolution of each of these streams is 640x480 (a far cry from the 1080p resolutions in the benchmark).

Even at this super low resolution, NONE of the models were able to perform inference at 30 fps (they were at 4-5 fps). (jetson_clocks was on, power mode = MAXN)

Interestingly, the only way I could increase the inference speed in DeepStream on the NANO for these models was to change the inference-interval value in the config file to 6 (i.e., run the inference on every 6th frame).
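For reference, this is the same interval key that appears as interval=0 in the nvinfer config files quoted later in this thread; a minimal change (leaving the rest of the [property] group untouched) looks like:

[property]
# run the detector only on every 6th frame
interval=6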

Kinda disappointing so far.

I’d love to get some tips on how to reach the speed of the resnet10 model they use in their demo DeepStream app(s).

Thank you for sharing your experience. That saves some investigation time. Let’s wait for some answers from NVIDIA :)

The model in the official DS sample is essentially a DetectNet_v2 model with ResNet10 as the backbone.
But the TLT pre-trained model is not a copy of DeepStream’s model.

TLT can help users train on their own dataset, but DS cannot. Users can use those TLT pre-trained models as starting points for transfer learning and training.

Thanks for the info.

What I need to know is: with enough tweaking and training, will the TLT “DetectNet_v2 model with Resnet10” achieve the same performance as the DS sample?

My focus is on getting the most performance out of the Nano.

If I’m not going to be able to match the performance of the DS sample using the TLT “DetectNet_v2 model with Resnet10”, then it would be nice to know, so I can focus my attention elsewhere to achieve the performance I need.

I’m not asking to be spoon-fed, but showing us the amazing potential of the Nano’s performance and then not being more forthright about how we can achieve it with our own data is Bad Form.

Actually, I have no answer either, since I have never run the experiment of reproducing the DS model from a TLT resnet10 detectnet_v2 pretrained model.
But I have an idea: you can generate a new TLT pretrained model and replace the one from NGC.
You can convert the DS resnet10 Caffe model into h5 via Keras.
Just an idea for your reference, to change the starting point of TLT.

Thanks for the tip

If anybody is interested here is a link to a caffemodel to HDF5 converter:
https://github.com/pierluigiferrari/caffe_weight_converter
I like this one because it only converts the weights.
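If I remember its README correctly, it is driven from the command line with the output name, the deploy prototxt and the caffemodel as positional arguments, roughly like this (treat it as a sketch and check the repo’s README for the exact arguments and the flag that selects HDF5 output):

$ python caffe_weight_converter.py resnet10_weights \
      Primary_Detector/resnet10.prototxt \
      Primary_Detector/resnet10.caffemodel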

Here is also a link to a nifty little .h5/HDF5 viewer:
https://www.neonscience.org/explore-data-hdfview
I have only tried the Windows version.
It works pretty well and it’s simple.

Hi Morganh,
What’s the point of converting the “DS resnet10 Caffe model into h5 via Keras”? Are you saying we should train the converted Keras model in TLT?
But TLT doesn’t support Keras models, as mentioned here:
https://devtalk.nvidia.com/default/topic/1064782/transfer-learning-toolkit/how-to-convert-caffe-model-to-tlt-model-/post/5393763/#5393763

Hi maniKTL,
No, I do not mean that.
There should be no issue converting the “DS resnet10 Caffe model into h5 via Keras”.
But for TLT, as officially mentioned in the TLT doc, only the pre-trained models on ngc.nvidia.com can be used for training.

Hi Morganh,
Thanks for clearing that up. But the question we are all seeking an answer to still remains.

  1. I am not going after accuracy right now, since NVIDIA trained with its private dataset. I just want the same speed as the DS DetectNet_v2.
  2. If the DS DetectNet_v2 isn’t trained with TLT, is it trained with DIGITS?
  3. Can you ask a favor of the team that developed DetectNet_v2, or connect them to this thread?

Hi Morganh,
I finally found time to train a custom model (DetectNet_v2 + Resnet10) to reproduce the benchmark performance of that “primary detector” in DeepStream. I must say, I didn’t even get close.
On 5 streams I achieve 28 fps with the primary detector and 10 fps with the custom one.

Here is some information.
In DeepStream I used the deepstream-app, and for the comparison I used the exact same config; only the model’s config-file in the [primary-gie] section is switched out.

Here are the two config files of the models:

config_infer_primary.txt (Benchmark primary detector):

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-file=../../models/Primary_Detector/resnet10.caffemodel
proto-file=../../models/Primary_Detector/resnet10.prototxt
model-engine-file=../../models/Primary_Detector/resnet10.caffemodel_b30_int8.engine
labelfile-path=../../models/Primary_Detector/labels.txt
int8-calib-file=../../models/Primary_Detector/cal_trt.bin
batch-size=5
process-mode=1
model-color-format=0
network-mode=1
num-detected-classes=4
interval=0
gie-unique-id=1
output-blob-names=conv2d_bbox;conv2d_cov/Sigmoid

custom_detectnet_v2_resnet10.txt (trained and pruned with tlt):

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
int8-calib-file=/detectnet_v2_resnet_10/calibration.bin
labelfile-path=/detectnet_v2_resnet_10/classes.txt
tlt-encoded-model=/detectnet_v2_resnet_10/resnet10_detector.etlt
tlt-model-key=blablabla
batch-size=5
uff-input-blob-name=input_1
uff-input-dims=3;608;608;0
process-mode=1
model-color-format=0
network-mode=1
num-detected-classes=2
interval=0
gie-unique-id=1
output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd

In TLT I built on the example notebook detectnet_v2.ipynb, prepared my own data and switched resnet18 to resnet10. It trains successfully and I can detect the objects after training (in DeepStream as well). After pruning there are 19368 weights left.
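For reference, switching the backbone only touched the model_config block of the training spec, roughly like this (the field names are from my memory of the detectnet_v2 spec and the pretrained file path is a placeholder, so take it as a sketch):

model_config {
  pretrained_model_file: "<path_to_resnet10_pretrained_weights>"
  arch: "resnet"
  num_layers: 10
}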

Can you tell me why I get such bad performance?


Hi rog07o4z,
Sorry for the late reply. Could you please file any TLT topics in the TLT forum instead of the DS forum in the future? I always check TLT forum topics, but I may miss some notifications from the DS forum.

For your case, could you please generate an int8 TRT engine with tlt-converter, configure it in custom_detectnet_v2_resnet10.txt, and retry?

For the process, please refer to https://devtalk.nvidia.com/default/topic/1065558/transfer-learning-toolkit/trt-engine-deployment/
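A rough sketch of that step is below; the key, file paths and batch size are placeholders, and the input dims / output blob names are taken from your custom_detectnet_v2_resnet10.txt above, so please adapt them to your model:

$ tlt-converter resnet10_detector.etlt \
      -k <your_model_key> \
      -d 3,608,608 \
      -o output_cov/Sigmoid,output_bbox/BiasAdd \
      -c calibration.bin \
      -t int8 \
      -m 5 \
      -e resnet10_detector_int8.engine

Then point model-engine-file in custom_detectnet_v2_resnet10.txt at the generated engine (network-mode=1 already selects int8).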

For a TRT engine test, an example is shown below.
$ /usr/src/tensorrt/bin/trtexec --int8 --loadEngine= --calib= --batch=1 --iterations=20 --output=output_cov/Sigmoid,output_bbox/BiasAdd --useSpinWait