How to run GoogleNet example in Multimedia API


I am interested in running the GoogleNet car detection sample that ships with the Multimedia API. I would like to test it on a real-time video stream, as detailed here:

What optimizations do I need to use? And how do I translate the GoogleNet output into a bounding box? It is unclear to me how to do this from just the .prototxt and .caffemodel files. Could somebody share an example program?


Here is the sample for using DetectNet on Jetson TX1.


I think you misunderstood me. In the link I posted I am referring to the following excerpt.

“In contrast to core image recognition, object detection provides bounding locations within the image in addition to the classification, making it useful for tracking and obstacle avoidance. The Multimedia API sample network is derived from GoogleNet with additional layers for extracting the bounding boxes. At 960×540 half-HD input resolution, the object detection network captures at higher resolution than the original GoogleNet, while retaining real-time performance on Jetson TX1 using TensorRT.”

I attended the Jetson dev meetup a few months back, and I saw this net being deployed in real time at 30 fps on video. I would like to test out this functionality for myself. Can you please guide me?

DetectNet uses GoogleNet as its classification model and also outputs detection bounding boxes.

Please check the jetson-inference page mentioned in #2:

This sample shows how to use a camera as TensorRT input and how to translate the TensorRT output into meaningful bounding boxes.
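
For anyone looking for the general shape of that translation step, here is a minimal Python sketch. It assumes a DetectNet-style output: a coverage grid of per-cell confidences and a 4-channel bounding-box grid whose values are pixel offsets from each cell's top-left corner. The function names, the 16-pixel cell size, and the 0.5 threshold are illustrative assumptions, not values taken from the GoogleNet-modified model or from the sample source.

```python
def decode_boxes(coverage, bboxes, cell_w=16, cell_h=16, threshold=0.5):
    """Turn grid outputs into (x1, y1, x2, y2, confidence) boxes.

    coverage: grid_h x grid_w nested list of confidences for one class.
    bboxes:   4 x grid_h x grid_w nested list of (x1, y1, x2, y2) offsets,
              assumed to be in input pixels relative to the cell origin.
    """
    boxes = []
    for gy, row in enumerate(coverage):
        for gx, conf in enumerate(row):
            if conf < threshold:
                continue  # cell does not contain an object
            ox, oy = gx * cell_w, gy * cell_h  # cell origin in pixels
            boxes.append((bboxes[0][gy][gx] + ox,
                          bboxes[1][gy][gx] + oy,
                          bboxes[2][gy][gx] + ox,
                          bboxes[3][gy][gx] + oy,
                          conf))
    return boxes


def overlaps(a, b):
    """True if two (x1, y1, x2, y2, ...) boxes intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def merge_boxes(boxes):
    """Greedily merge overlapping boxes into their common bounding box."""
    merged = []
    for box in boxes:
        for i, m in enumerate(merged):
            if overlaps(box, m):
                merged[i] = (min(box[0], m[0]), min(box[1], m[1]),
                             max(box[2], m[2]), max(box[3], m[3]),
                             max(box[4], m[4]))
                break
        else:
            merged.append(box)
    return merged


# Toy 2x3 grid: two neighbouring cells fire on the same object.
coverage = [[0.1, 0.9, 0.8],
            [0.0, 0.0, 0.0]]
offsets = [[[v] * 3 for _ in range(2)] for v in (-4, -2, 20, 18)]
detections = merge_boxes(decode_boxes(coverage, offsets))
print(detections)
```

Several neighbouring cells usually fire on the same object, which is why the merge step is there. A real network's layer names, grid stride, and box scaling have to be read from its .prototxt.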

I ran this and it definitely doesn’t run anywhere close to 30 fps. Is there anything I am missing?

I also don’t see anything specific to car detection.

The network I saw being demoed was running at 30 fps on HD video and could detect cars in real time. What was that, and how do I run it?

For a car detector, please follow this page to train a network for your own use case.

For better performance, the page you mentioned uses TensorRT and MMAPI.
Both can be installed with JetPack, which also includes samples that show how to use them.

I believe the DetectNet demos in the repo you linked use TensorRT and MMAPI and do not run anywhere close to 30 fps. How did the demo shown by the Jetson/NVIDIA team at the Jetson meetup I attended run at 30 fps on an HD video stream?


DetectNet can reach about 11 fps and gives you an overview of how to work with DIGITS/TensorRT/MMAPI.
If you care more about performance, we recommend replacing GoogleNet, which is embedded in DetectNet, with a lighter-weight model.

We provide a detailed tutorial on how to quickly train and deploy your own model on our GPUs.

Then how did the demo at the Jetson dev meetup run at 30 fps and detect cars in real time? They said they were running GoogleNet.

Not all of our samples are available to forum users, but the hardware is the same.

DetectNet will show you how to make good use of NVIDIA's powerful GPUs.

Would you be able to offer any hints as to how to optimize GoogleNet specifically?

11 fps on DetectNet is a significant drop in performance compared to 30 fps GoogleNet. One is barely real time and pretty much unusable in any application, while the other is extremely powerful. That is a pretty big discrepancy.
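
To put numbers on that gap, here is a small Python sketch of the per-frame time budget. It is plain arithmetic on the two frame rates quoted in this thread; the function names are made up for illustration.

```python
def frame_budget_ms(fps):
    """Time available to process one frame at a given rate, in milliseconds."""
    return 1000.0 / fps


def required_speedup(current_fps, target_fps):
    """Factor by which per-frame processing time must shrink."""
    return target_fps / current_fps


print(round(frame_budget_ms(30), 1))       # 33.3 ms per frame at 30 fps
print(round(frame_budget_ms(11), 1))       # 90.9 ms per frame at 11 fps
print(round(required_speedup(11, 30), 2))  # 2.73x faster needed
```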

There are also two TensorRT samples in the Multimedia API package.
Please remember to install JetPack with the Multimedia API package.

./04_video_dec_gie/video_dec_gie ../data/video/sample_outdoor_car_1080p_10fps.h264 H264 \
                --gie-deployfile ../data/model/GoogleNet-modified.prototxt \
                --gie-modelfile ../data/model/GoogleNet-modified-online_iter_30000.caffemodel \
                --gie-forcefp32 0 --gie-enable-perf 1

./backend/backend 1 ../data/video/sample_outdoor_car_1080p_10fps.h264 H264 \
                --gie-deployfile ../data/model/GoogleNet-modified.prototxt \
                --gie-modelfile ../data/model/GoogleNet-modified-online_iter_30000.caffemodel \
                --gie-forcefp32 0 --gie-proc-interval 1 -fps 10

You can set --gie-proc-interval to 3, which forces the application to run prediction only every 3 frames.
This gives you a 30 fps display rate with a 10 fps detection rate.
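
The relationship between those two rates can be sketched as follows. This is an illustration only: it assumes inference runs on every Nth decoded frame starting from frame 0, which may not match how the sample schedules work internally, and the function names are hypothetical.

```python
def detection_rate(display_fps, proc_interval):
    """Detections per second when inference runs once every proc_interval frames."""
    if proc_interval < 1:
        raise ValueError("proc_interval must be at least 1")
    return display_fps / proc_interval


def frames_to_infer(num_frames, proc_interval):
    """Indices of frames that get inference; the rest reuse the last result."""
    return [i for i in range(num_frames) if i % proc_interval == 0]


print(detection_rate(30, 3))   # 10.0
print(frames_to_infer(7, 3))   # [0, 3, 6]
```

With an interval of 1, every frame is processed and the detection rate equals the display rate, provided inference itself keeps up.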

Thanks, but that doesn’t really answer my question. These examples run at a 10 fps detection rate. My question was how to achieve a 30 fps detection rate like the example I referred to from the dev meetup, or at least to get hints other than "use TensorRT and the Multimedia API" or "use a shallower network", as all the examples you’ve shown do exactly that but don’t run in real time.

Not all of our samples are available to forum users.
But we provide lots of samples that demonstrate how to make good use of the GPU.

Optimization results vary from case to case, so just give it a try.
The backend sample should be a good starting point for you, since it focuses on the car detection problem.