Resnet10 "primary detector"

Hey,
Could I get more information about the “primary detector” of DeepStream? It is a resnet10 caffemodel and I found that it is much faster than standard approaches like YOLO or SSD (3-4 times faster).

I am curious how the model was trained and about the performance. Is there any documentation?

Thanks


Hi,

You can find some model information on our NGC website:
https://ngc.nvidia.com/catalog/models/nvidia:tlt_iva_object_detection_resnet10

This model can now be re-trained with the Transfer Learning Toolkit:
https://ngc.nvidia.com/catalog/containers/nvidia:tlt-streamanalytics
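For example, the container can be pulled with Docker once you are logged in to nvcr.io (the tag below is a placeholder; the available tags are listed on the container page above):

$ docker pull nvcr.io/nvidia/tlt-streamanalytics:<tag>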

You can also find some performance data in our document:
https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html#page/DeepStream_Development_Guide%2Fdeepstream_performance.html

Thanks.

Thanks for the information!
I still don’t really understand which algorithm it was trained with, as it doesn’t seem to be SSD/FasterRCNN/DetectNetv2. What is the benefit of it compared to ResNet10-SSD, ResNet10-FasterRCNN, …

Furthermore, how can I re-train it?
https://docs.nvidia.com/metropolis/TLT/tlt-getting-started-guide/index.html#training_models
only addresses SSD, FasterRCNN, DetectNetv2.

What about performance regarding accuracy (precision, recall, or similar)?

I have asked the same question and got the same answer.

We all want to know what you trained the “resnet10 caffemodel” model with: SSD? FasterRCNN? DetectNetv2?

Is it trained with a special NVIDIA Secret Squirrel algorithm?

Is there an Easter egg in the Transfer Learning Toolkit that gives us a key to unlock the answer?

For the time being I have started retraining the examples in the Jupyter notebook with the resnet10 model and running them on the NANO.

Hopefully one of them can match the speed of the “resnet10 caffemodel” example.

adventuredaisy: I would appreciate it if you could share your experiences with the different models you try out.

Here’s what I have found so far:

  1. the hdf5 file inside this docker container (https://ngc.nvidia.com/catalog/models/nvidia:tlt_iva_object_detection_resnet10) is ~ 19 MB

  2. the hdf5 file for the resnet10_detectnet_v2 model which you can download from inside the TLT-Docker using this command:

ngc registry model download-version

is ~40 MB
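(The download command above is abbreviated; its general form is roughly

ngc registry model download-version <org>/<model_name>:<version>

where the exact model path and version number are shown on the model’s NGC page.)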

I am guessing the model from #1 is NOT trained using DetectNet.

PS: I have fine-tuned

  1. resnet10-detectnet,
  2. resnet18-detectnet and
  3. resnet50-detectnet

All models were trained on the SAME dataset, with SAME iou, SAME learning rate and the SAME number of epochs.

All models were pruned and then re-trained using the same parameters.

When running them on the NANO in FP16 mode (yes, I exported the models in FP16 mode too), I have found NO difference in performance (speed or accuracy) across the models.

Note that I am running 6x video-streams from .mp4 files on the sd-card on the Nano. The resolution of each of these streams is 640x480 (a far cry from the 1080p resolutions in the benchmark).

Even at this super low resolution, NONE of the models were able to perform inference at 30 fps (they were at 4-5 fps). (jetson_clocks was on, power mode = MAXN)

Interestingly, the only way I could increase the inference speed in DeepStream on the NANO for these models was to change the inference-interval value in the config file to 6 (i.e., run the inference on every 6th frame).
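For reference, this is the same interval key that appears as interval=0 in the nvinfer config files quoted later in this thread; a minimal change (leaving the rest of the [property] group untouched) looks like:

[property]
# run the detector only on every 6th frame
interval=6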

Kinda disappointing so far.

I’d love to get some tips on how to reach the speed of the resnet10 model they use in their demo DeepStream app(s).

Thank you for sharing your experience. That saves some investigation time. Let’s wait for some answers from NVIDIA :)

The model in the official DS sample is essentially a DetectNet_v2 model with ResNet10 as the backbone.
But the TLT pre-trained model is not a copy of DeepStream’s model.

TLT can help users train on their own dataset, but DS cannot. Users can use those TLT pre-trained models as starting points for transfer learning and training.

Thanks for the info.

What I need to know is: with enough tweaking and training, will the TLT “DetectNet_v2 model with Resnet10” achieve the same performance as the DS sample?

My focus is on getting the most performance out of the Nano.

If I’m not going to be able to match the performance of the DS sample using the TLT “DetectNet_v2 model with Resnet10”, then it would be nice to know, so I can focus my attention elsewhere to achieve the performance I need.

I’m not asking to be spoon-fed, but showing us the amazing potential of the Nano’s performance and then not being more forthright about how we can achieve it with our own data is Bad Form.

Actually, I have no answer either, since I have never run the experiment of reproducing the DS model from a TLT resnet10 detectnet_v2 pretrained model.
But I have an idea: you can generate a new TLT pretrained model and replace the one from NGC.
You can convert the DS resnet10 Caffe model into h5 via Keras.
Just an idea for your reference, to change the starting point of TLT.

Thanks for the tip

If anybody is interested here is a link to a caffemodel to HDF5 converter:
https://github.com/pierluigiferrari/caffe_weight_converter
I like this one because it only converts the weights.
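If I remember its README correctly, it is driven from the command line with the output name, the deploy prototxt and the caffemodel as positional arguments, roughly like this (treat it as a sketch and check the repo’s README for the exact arguments and the flag that selects HDF5 output):

$ python caffe_weight_converter.py resnet10_weights \
      Primary_Detector/resnet10.prototxt \
      Primary_Detector/resnet10.caffemodel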

Here is also a link to a nifty little .h5/HDF5 viewer:
https://www.neonscience.org/explore-data-hdfview
I have only tried the Windows version.
It works pretty well and it’s simple.

Hi Morganh,
What’s the point of converting the “DS resnet10 Caffe model into h5 via Keras”? Are you saying we should train the converted Keras model in TLT?
But TLT doesn’t support Keras models, as mentioned here:
https://devtalk.nvidia.com/default/topic/1064782/transfer-learning-toolkit/how-to-convert-caffe-model-to-tlt-model-/post/5393763/#5393763

Hi maniKTL,
No, I do not mean that.
There should be no issue converting the “DS resnet10 Caffe model into h5 via Keras”.
But for TLT, as officially mentioned in the TLT doc, only the pre-trained models on ngc.nvidia.com can be used for training.

Hi Morganh,
Thanks for clearing that up. But the question we are all seeking an answer to still remains.

  1. I am not going after accuracy right now, since NVIDIA trained with its private dataset. I just want the same speed as the DS DetectNet_v2.
  2. If the DS DetectNet_v2 isn’t trained with TLT, is it trained with DIGITS?
  3. Can you ask a favor of the team that developed DetectNet_v2, or connect them to this thread?

Hi Morganh,
I finally found time to train a custom model (DetectNet_v2 + Resnet10) to reproduce the benchmark performance of that “primary detector” in DeepStream. I must say, I didn’t even get close.
On 5 streams I achieve 28 fps with the primary detector and 10 fps with the custom one.

Here is some information.
In DeepStream I used the deepstream-app, and for the comparison I used the exact same config; only the model’s config-file in the [primary-gie] section is switched out.

Here are the two config files of the models:

config_infer_primary.txt (Benchmark primary detector):

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-file=../../models/Primary_Detector/resnet10.caffemodel
proto-file=../../models/Primary_Detector/resnet10.prototxt
model-engine-file=../../models/Primary_Detector/resnet10.caffemodel_b30_int8.engine
labelfile-path=../../models/Primary_Detector/labels.txt
int8-calib-file=../../models/Primary_Detector/cal_trt.bin
batch-size=5
process-mode=1
model-color-format=0
network-mode=1
num-detected-classes=4
interval=0
gie-unique-id=1
output-blob-names=conv2d_bbox;conv2d_cov/Sigmoid

custom_detectnet_v2_resnet10.txt (trained and pruned with tlt):

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
int8-calib-file=/detectnet_v2_resnet_10/calibration.bin
labelfile-path=/detectnet_v2_resnet_10/classes.txt
tlt-encoded-model=/detectnet_v2_resnet_10/resnet10_detector.etlt
tlt-model-key=blablabla
batch-size=5
uff-input-blob-name=input_1
uff-input-dims=3;608;608;0
process-mode=1
model-color-format=0
network-mode=1
num-detected-classes=2
interval=0
gie-unique-id=1
output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd

In TLT I built on the example notebook detectnet_v2.ipynb, prepared my own data and switched resnet18 to resnet10. It trains successfully and I can detect the objects after training (in DeepStream as well). After pruning there are 19368 weights left.
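For reference, switching the backbone only touched the model_config block of the training spec, roughly like this (the field names are from my memory of the detectnet_v2 spec and the pretrained file path is a placeholder, so take it as a sketch):

model_config {
  pretrained_model_file: "<path_to_resnet10_pretrained_weights>"
  arch: "resnet"
  num_layers: 10
}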

Can you tell me why I get such bad performance?


Hi rog07o4z,
Sorry for the late reply. Could you please file any TLT topics in the TLT forum instead of the DS forum in the future? I always check TLT forum topics, but I may miss some notifications from the DS forum.

For your case, could you please generate an int8 TRT engine with tlt-converter, configure it in custom_detectnet_v2_resnet10.txt, and retry?

For the process, please refer to https://devtalk.nvidia.com/default/topic/1065558/transfer-learning-toolkit/trt-engine-deployment/
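A rough sketch of that step is below; the key, file paths and batch size are placeholders, and the input dims / output blob names are taken from your custom_detectnet_v2_resnet10.txt above, so please adapt them to your model:

$ tlt-converter resnet10_detector.etlt \
      -k <your_model_key> \
      -d 3,608,608 \
      -o output_cov/Sigmoid,output_bbox/BiasAdd \
      -c calibration.bin \
      -t int8 \
      -m 5 \
      -e resnet10_detector_int8.engine

Then point model-engine-file in custom_detectnet_v2_resnet10.txt at the generated engine (network-mode=1 already selects int8).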

For a TRT engine test, an example is shown below.
$ /usr/src/tensorrt/bin/trtexec --int8 --loadEngine= --calib= --batch=1 --iterations=20 --output=output_cov/Sigmoid,output_bbox/BiasAdd --useSpinWait