Xavier: YOLOv3 ~15 fps vs. ResNet10 ~120 fps?

dannykario · March 13, 2019, 10:36am

Hi,

(Jetson Xavier, DeepStream 3, TensorRT 5)

I get very good results when running the SDK samples with ResNet10 (~120 fps for a single stream).
I then downloaded, built, and ran the YOLOv3 sample modified for DeepStream and TensorRT (GitHub - NVIDIA-AI-IOT/deepstream_reference_apps: Samples for TensorRT/Deepstream for Tesla & Jetson).

With the YOLOv3 sample I get ~15 fps at best.

In both cases I used the video supplied with the SDK samples (sample_720p.mp4 for the SDK sample, sample_720p.h264 for YOLOv3).

Are these the expected results, or am I doing something completely wrong here?
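
To put the two figures side by side, the frame rates can be converted to a per-frame time budget. This is only arithmetic on the numbers quoted above (~120 fps and ~15 fps); the helper name `frame_time_ms` is made up for illustration.

```python
def frame_time_ms(fps: float) -> float:
    """Return the average time spent per frame, in milliseconds."""
    return 1000.0 / fps


# Frame rates reported in the post above.
resnet10_ms = frame_time_ms(120.0)
yolov3_ms = frame_time_ms(15.0)

# Ratio of per-frame cost between the two pipelines.
slowdown = yolov3_ms / resnet10_ms

print(f"ResNet10: {resnet10_ms:.1f} ms/frame")
print(f"YOLOv3:   {yolov3_ms:.1f} ms/frame")
print(f"YOLOv3 sample is {slowdown:.0f}x slower per frame")
```

The per-frame time covers the whole pipeline (decode, copies, inference, parsing), not just the network itself.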

AastaLLL · March 14, 2019, 5:26am

Hi,

The sample is meant to demonstrate how to enable a plugin for DeepStream, and it still has some room for optimization.
One example is the memory copy between the video stream and TensorRT.

We will try to optimize it this week and will update you later.
Thanks.

dannykario · March 14, 2019, 7:34am

Hi,

Thanks for the answer, and good luck with the optimisation.

I noticed that “ResNet10” and “ResNet18” seem very fast, but I could not find any documentation on how to train them myself. Is there any documentation about that?

Also, what is the “textbook solution” from NVIDIA for an object detection network that I can train myself, on my own classes and training dataset, and that runs fast on the Jetson/TensorRT platform (more specifically, Xavier)?

AastaLLL · March 19, 2019, 5:46am

Hi,

“ResNet10” and “ResNet18” are customized versions of the standard ResNet.
You can use DIGITS to train them for your use case.
Here is the DIGITS tutorial: https://github.com/NVIDIA/DIGITS/blob/master/examples/object-detection/README.md
You can find the pretrained model in ${DeepStream}/samples/models/Primary_Detector.

Another recommendation is detectNet.
You can find more information in our tutorial:

https://github.com/dusty-nv/jetson-inference/blob/master/docs/detectnet-console-2.md


# Locating Objects with DetectNet
The previous recognition examples output class probabilities representing the entire input image.  Next we're going to focus on **object detection**, and finding where in the frame various objects are located by extracting their bounding boxes.  Unlike image classification, object detection networks are capable of detecting many different objects per frame.

<img src="https://github.com/dusty-nv/jetson-inference/raw/dev/docs/images/detectnet.jpg" >

The [`detectNet`](../c/detectNet.h) object accepts an image as input, and outputs a list of coordinates of the detected bounding boxes along with their classes and confidence values.  [`detectNet`](../c/detectNet.h) is available to use from [Python](https://rawgit.com/dusty-nv/jetson-inference/python/docs/html/python/jetson.inference.html#detectNet) and [C++](../c/detectNet.h).  See below for various [pre-trained detection models](#pre-trained-detection-models-available)  available for download.  The default model used is a [91-class](../data/networks/ssd_coco_labels.txt) SSD-Mobilenet-v2 model trained on the MS COCO dataset, which achieves realtime inferencing performance on Jetson with TensorRT. 

As examples of using the `detectNet` class, we provide sample programs for C++ and Python:

- [`detectnet.cpp`](../examples/detectnet/detectnet.cpp) (C++) 
- [`detectnet.py`](../python/examples/detectnet.py) (Python) 

These samples are able to detect objects in images, videos, and camera feeds.  For more info about the various types of input/output streams supported, see the [Camera Streaming and Multimedia](aux-streaming.md) page.

### Detecting Objects from Images


Thanks.

dannykario · March 20, 2019, 5:41am

Hi,

The picture is still a bit blurry to me.

When I enter ResNet18 (or ResNet10) directly into DIGITS, it complains. I am also not sure what other parameters to set for training (loss function, etc.).

I agree that the better option, as you suggested, is to use DetectNet. I took the dog detection example, and it runs fine on images. But when I try to integrate it with DeepStream, nothing is detected (I did my best to find good dog videos :-). I tried adjusting the parameters (for example net-scale-factor, resolution, etc.), as well as using different bounding box parse functions, with no success. I put some debug printing in my bounding box parse functions, and the coverage values do not make sense, so the problem is in either the network or the stream definition.

Any ideas? Is there a working sample of how to train an object detection network and then run it in DeepStream? I also tried to use some parameters from GitHub - AastaNV/DeepStream, but it looks a bit old and does not work smoothly.
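
For context on the net-scale-factor setting: as far as the DeepStream nvinfer documentation describes it, each input pixel is transformed as y = net-scale-factor * (x - mean) before inference. The sketch below only illustrates that arithmetic in plain Python. The `preprocess` helper and the sample values are hypothetical, and the correct scale and mean for a DIGITS-trained DetectNet depend on how the network was trained.

```python
def preprocess(pixels, net_scale_factor=1.0, offsets=(0.0, 0.0, 0.0)):
    """Apply y = net_scale_factor * (x - mean) to each channel of each pixel.

    pixels  : list of (r, g, b) tuples with values in 0..255
    offsets : per-channel mean values subtracted before scaling
    """
    return [
        tuple(net_scale_factor * (value - mean) for value, mean in zip(px, offsets))
        for px in pixels
    ]


sample = [(255, 128, 0)]  # one hypothetical pixel

# Scale of 1.0: the network sees raw 0..255 values.
raw_range = preprocess(sample, net_scale_factor=1.0)

# Scale of 1/255: the network sees values in 0..1.
unit_range = preprocess(sample, net_scale_factor=1.0 / 255.0)

print(raw_range)
print(unit_range)
```

If the range fed at inference time differs from the range used in training, the coverage output can look meaningless even when the parser is correct, so this is one setting worth checking alongside the parse function.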

Thanks, Best
Danny

AastaLLL · March 22, 2019, 8:19am

Hi,

Please update the bounding box parser, since the output semantics of detectNet and ResNet are different.
Try compiling the parser code in #5, then update the library name and path in the config file.

For example:
DeepStream/DetectNet.txt at master · AastaNV/DeepStream · GitHub
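
As a rough illustration of "update the library name and path", the snippet below renders the two nvinfer settings that usually point at a custom parser, `parse-bbox-func-name` and `custom-lib-path`. The key names are given as I understand them from the nvinfer plugin documentation, so check them against your DeepStream release. The library path and function name are placeholders, not the actual names from the linked repository.

```python
import configparser
import io


def render_parser_settings(lib_path: str, func_name: str) -> str:
    """Return a [property] fragment selecting a custom bounding box parser."""
    cfg = configparser.ConfigParser()
    cfg["property"] = {
        # Name of the exported parse function inside the compiled library.
        "parse-bbox-func-name": func_name,
        # Path to the compiled parser library (.so).
        "custom-lib-path": lib_path,
    }
    buf = io.StringIO()
    cfg.write(buf, space_around_delimiters=False)
    return buf.getvalue()


# Placeholder values: substitute the library you built and its function name.
text = render_parser_settings(
    "/path/to/libnvparsebbox_detectnet.so",
    "parse_bbox_custom_detectnet",
)
print(text)
```

The printed fragment is meant to be merged into the existing `[property]` section of the inference config rather than used as a complete file.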

Thanks.