Xavier: YOLOv3 ~15 fps << ResNet10 ~120 fps?


(Jetson Xavier, DeepStream 3.0, TensorRT 5)

I get very good results when running the SDK samples with ResNet10 (~120 fps for a single stream).
I then downloaded, built, and ran the YOLOv3 sample modified for DeepStream and TensorRT (https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps).

I get ~15 fps at best.

In both cases I used the video supplied with the SDK sample (sample_720p.mp4 for the SDK sample, sample_720p.h264 for YOLOv3).

Are these the expected results, or am I doing something completely wrong here?
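Converting the reported throughputs into a per-frame time budget makes the gap easier to reason about: any extra per-frame work, such as a memory copy, has to fit inside that budget. A minimal sketch using only the two numbers reported above (the helper name is illustrative):

```python
def frame_budget_ms(fps: float) -> float:
    """Average wall-clock time available per frame at a given throughput."""
    return 1000.0 / fps

# Throughput figures reported above for a single 720p stream on Jetson Xavier.
resnet10_fps = 120.0
yolov3_fps = 15.0

resnet10_ms = frame_budget_ms(resnet10_fps)
yolov3_ms = frame_budget_ms(yolov3_fps)

print(f"ResNet10: {resnet10_ms:.1f} ms/frame")
print(f"YOLOv3:   {yolov3_ms:.1f} ms/frame")
print(f"Slowdown: {yolov3_ms / resnet10_ms:.0f}x")
```

Part of that difference is simply that YOLOv3 is a much heavier network than ResNet10, so not all of it can be recovered by pipeline tuning.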


The sample is meant to demonstrate how to enable a custom plugin in DeepStream, and it still has room for optimization.
For example, the memory copy between the video stream and TensorRT.
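To get a feel for what one extra copy per frame costs, the traffic it generates is bytes per frame times frames per second. The sketch below uses illustrative, assumed sizes (an 8-bit RGBA 720p frame and a 416x416 float32 CHW tensor); the reference app's actual formats, resolutions, and number of copies may differ. On Xavier the CPU and GPU share the same physical DRAM, so such copies also compete with inference for memory bandwidth.

```python
def planar_tensor_bytes(width, height, channels=3, bytes_per_elem=4):
    """Bytes in one CHW input tensor (float32 elements by default)."""
    return width * height * channels * bytes_per_elem

def copy_traffic_mb_per_s(frame_bytes, fps, copies_per_frame=1):
    """Megabytes per second moved by the given number of copies of every frame."""
    return frame_bytes * copies_per_frame * fps / 1e6

# Assumed sizes for illustration only.
decoded_rgba_720p = 1280 * 720 * 4            # one 8-bit RGBA 720p frame
yolo_input_416 = planar_tensor_bytes(416, 416)  # hypothetical network input size

for name, size in [("720p RGBA frame", decoded_rgba_720p),
                   ("416x416 float32 input", yolo_input_416)]:
    mb_s = copy_traffic_mb_per_s(size, 30)
    print(f"{name}: {size} bytes, {mb_s:.0f} MB/s per copy at 30 fps")
```

Bandwidth is only part of the cost: a copy that blocks the pipeline until it completes also adds latency to every frame.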

We will try to optimize it this week and will update you later.


Thanks for the answer, and good luck with the optimization.

I noticed that “ResNet10” and “ResNet18” seem very fast, but I could not find any documentation on how to train them myself. Is there any documentation about that?

Also, what is NVIDIA's “textbook solution” for an object detection network that I can train myself, on my own classes and training dataset, and that runs fast on the Jetson/TensorRT platform (more specifically, Xavier)?


“ResNet10” and “ResNet18” are customized versions of the standard ResNet.
You can use DIGITS to train them for your use case.
Here is the DIGITS tutorial: https://github.com/NVIDIA/DIGITS/blob/master/examples/object-detection/README.md
You can find the pretrained model in ${DeepStream}/samples/models/Primary_Detector.

Another recommendation is DetectNet.
You can find more information in our tutorial:



The picture is still a bit blurry for me.

When I enter ResNet18 (or ResNet10) directly into DIGITS, it complains. I am also not sure what other parameters to set for training (loss function, etc.).

I agree that the better option, as you suggested, is to use DetectNet. I took the dog detection example, and it runs fine on images. But when I try to integrate it with DeepStream, nothing is detected (I did my best to find good dog videos :-). I tried playing with the parameters (for example net-scale-factor, resolution, etc.), as well as using different bounding-box parse functions, with no success. I added some debug printing to my bounding-box parse functions, and the coverage numbers do not make sense, so the problem is either in the network or in the stream definition.
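One thing worth checking is the input normalization. DeepStream's nvinfer preprocesses each pixel as y = net-scale-factor * (x - mean) before inference, so net-scale-factor has to match the preprocessing the network saw during training; a mismatch can produce meaningless coverage values. The sketch below applies that formula to a few pixel values. The function name and the mean of 127 are hypothetical, and whether a given DetectNet model expects 0-255, [0, 1], or mean-subtracted input is an assumption to verify against its training setup and prototxt.

```python
def nvinfer_style_normalize(pixels, net_scale_factor, mean=0.0):
    """Apply y = net_scale_factor * (x - mean) to each 8-bit pixel value."""
    return [net_scale_factor * (float(p) - mean) for p in pixels]

# A few 8-bit pixel values from a hypothetical frame.
pixels = [0, 64, 128, 255]

# The ResNet10 sample config uses net-scale-factor = 1/255 (inputs in [0, 1]).
unit_range = nvinfer_style_normalize(pixels, 1.0 / 255.0)

# A model trained on raw 0-255 pixels would instead need a factor of 1.0.
raw_range = nvinfer_style_normalize(pixels, 1.0)

# A model trained on mean-subtracted input (hypothetical mean of 127).
centered = nvinfer_style_normalize(pixels, 1.0, mean=127.0)

print("scaled by 1/255:", [round(v, 3) for v in unit_range])
print("unscaled:       ", raw_range)
print("mean-subtracted:", centered)
```

Feeding a network that was trained on 0-255 input with values in [0, 1] shrinks every activation in the first layer, which is one plausible way to end up with near-zero coverage everywhere.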

Any ideas? Is there a working sample of how to train an object detection network and then run it in DeepStream? I also tried using some parameters from https://github.com/AastaNV/DeepStream, but it looks a bit old and does not work smoothly.

Thanks, Best


Please update the bounding-box parser, since the outputs of DetectNet and ResNet have different semantic meanings.
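To make the semantic difference concrete, here is a minimal Python sketch that decodes the same raw grid outputs in two ways. The DetectNet convention assumed here follows the DIGITS clustering layer (each cell's four bbox channels are pixel offsets from the cell's top-left corner). The ResNet convention is modeled on DeepStream's sample ResNet bounding-box parser (values normalized by a bbox_norm of 35.0 and measured against grid-cell centers); verify both against the sources for your versions. The function names, single class, stride of 16, and toy tensors are hypothetical, and a real DeepStream parser would be a C++ function compiled into the custom library.

```python
from dataclasses import dataclass

@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float

def decode_detectnet_cell(raw, gx, gy, stride):
    """Assumed DetectNet convention: pixel offsets from the cell's top-left corner."""
    ox, oy = gx * stride, gy * stride
    return Box(raw[0] + ox, raw[1] + oy, raw[2] + ox, raw[3] + oy)

def decode_resnet_cell(raw, gx, gy, stride, bbox_norm=35.0):
    """Assumed ResNet-sample convention: values normalized by bbox_norm,
    relative to the grid-cell center."""
    cx = (gx * stride + 0.5) / bbox_norm
    cy = (gy * stride + 0.5) / bbox_norm
    return Box((raw[0] - cx) * -bbox_norm,
               (raw[1] - cy) * -bbox_norm,
               (raw[2] + cx) * bbox_norm,
               (raw[3] + cy) * bbox_norm)

def decode_grid(cov, bbox, stride, threshold, decode_cell):
    """Return (box, coverage) for each cell whose coverage passes the threshold.
    cov is [rows][cols]; bbox is [4][rows][cols] holding x1, y1, x2, y2 channels."""
    hits = []
    for gy, row in enumerate(cov):
        for gx, c in enumerate(row):
            if c >= threshold:
                raw = [bbox[k][gy][gx] for k in range(4)]
                hits.append((decode_cell(raw, gx, gy, stride), c))
    return hits

# Toy 2x2 grid; only cell (gx=1, gy=0) passes a 0.5 coverage threshold.
cov = [[0.1, 0.9],
       [0.0, 0.2]]
bbox = [[[0.0, 2.0], [0.0, 0.0]],    # x1 channel
        [[0.0, 3.0], [0.0, 0.0]],    # y1 channel
        [[0.0, 30.0], [0.0, 0.0]],   # x2 channel
        [[0.0, 20.0], [0.0, 0.0]]]   # y2 channel

for name, fn in [("DetectNet", decode_detectnet_cell), ("ResNet", decode_resnet_cell)]:
    for box, c in decode_grid(cov, bbox, 16, 0.5, fn):
        print(name, box, c)
```

Running DetectNet outputs through the ResNet-style decoding yields boxes with negative or far out-of-frame coordinates, which is the kind of nonsensical result described above. In either convention the boxes are in network-input pixels and still need scaling to the frame resolution.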
Try compiling the parser code in #5, and update the library name and path in the config file.

For example: