Deploying Deep Neural Networks with NVIDIA TensorRT

Originally published at: https://developer.nvidia.com/blog/deploying-deep-learning-nvidia-tensorrt/

Figure 1: NVIDIA TensorRT provides 16x higher energy efficiency for neural network inference with FP16 on Tesla P100.

Editor’s Note: An updated version of this post, with additional tutorial content, is now available. See “How to Speed Up Deep Learning Using TensorRT”.

NVIDIA TensorRT is a high-performance deep learning inference library for production environments. Power…


Hi all, I would like to learn more about how to use TensorRT with Faster R-CNN using a ZF/VGG16 model. I'm trying to perform real-time object detection with Faster R-CNN on a Jetson TX1. I know DetectNet would be more convenient, but I was assigned to use the Faster R-CNN framework. With ./jetson_clocks.sh, the fastest detection time was 0.48 s for 300 object proposals, so I would like to use TensorRT to reduce the detection time.

I have researched and read many forum threads, including https://github.com/dusty-nv..., but I'm still confused about how to apply TensorRT to a Faster R-CNN Caffe model. I ran ./giexec --model=/usr/src/gie_samples/samples/data/samples/googlenet/googlenet.caffemodel --deploy=/usr/src/gie_samples/samples/data/samples/googlenet/googlenet.prototxt --output=prob --half2=true --batch=12 and got around 63 ms (with --batch=2, I get around 14 ms). Thus, I would like to use a small net for my detection task.
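The two giexec figures above trade latency against throughput. A quick worked check of the arithmetic, assuming each figure is the total time to run one batch (the helper names below are illustrative only):

```python
# giexec timings reported above for GoogLeNet with --half2=true.
# Assumption: each value is the total time for one batch, in milliseconds.
timings_ms = {2: 14.0, 12: 63.0}


def per_image_ms(batch, total_ms):
    """Average time spent on each image within one batch."""
    return total_ms / batch


def images_per_second(batch, total_ms):
    """Throughput if batches of this size are run back to back."""
    return batch / (total_ms / 1000.0)


for batch, total in sorted(timings_ms.items()):
    print(f"batch={batch:2d}: {per_image_ms(batch, total):.2f} ms/image, "
          f"{images_per_second(batch, total):.0f} images/s")
```

Batch 12 gives better per-image throughput (5.25 ms versus 7 ms per image), but each batch takes longer to return, and a single-camera real-time detector usually cares about the latency of one frame rather than aggregate throughput.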

I have trouble understanding and following the steps on this page and on the dusty-nv/jetson-inference page, as I don't know which files to edit, which parts to edit, and so on. Can you recommend any more comprehensive guides or websites?

Alternatively, I tried to run demo.py (I'm using py-faster-rcnn) with googlenet.caffemodel, but I ran into "Check failed: K_ == new_K (1024 vs. 281600) Input size incompatible with inner product parameters."
Also, how do I enable TensorRT when running detection with VGG16 or ZF?

Thank you, and I would really appreciate any help!

Hi all, I would really like to know how to use TensorRT to run a detector in the Faster R-CNN framework. Do you have any ideas?

When I use rndint8calibrator to run VGG19 on ILSVRC2015, why can't I get the same classification accuracy? Accuracy drops by about 5%.

Try the new release, version 2.1.

Hi,
Can you share more details on how to use it?

TensorRT 2.1 provides a series of samples, one of which is Faster R-CNN. The ROI pooling and Faster R-CNN-style reshape layers are provided as IPlugin layers.

Charles, that calibrator is intended only for cases where you want to test the speed of a network in INT8 without regard to accuracy; "rnd" stands for "random". You want to use Int8EntropyCalibrator. See the user's guide for details.
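A hedged illustration of why the calibrator choice matters for accuracy: the sketch below uses plain symmetric linear INT8 quantization, where a threshold T maps [-T, T] onto the integer range [-127, 127]. It is not TensorRT's entropy calibrator (which derives thresholds from activation statistics gathered on representative data); the activation values and thresholds are made up purely to show how a badly chosen scale inflates error.

```python
def quantize_int8(values, threshold):
    """Symmetric linear quantization: map [-threshold, threshold] onto [-127, 127].
    Values outside the threshold are clipped."""
    scale = threshold / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map INT8 codes back to real values."""
    return [x * scale for x in q]


def quantization_error(values, threshold):
    """Mean squared error between the originals and their INT8 round trip."""
    q, scale = quantize_int8(values, threshold)
    restored = dequantize(q, scale)
    return sum((a - b) ** 2 for a, b in zip(values, restored)) / len(values)


# Made-up activations spread evenly over [-3, 3].
activations = [0.01 * i for i in range(-300, 301)]

calibrated = max(abs(v) for v in activations)  # threshold taken from the data itself
too_small = 0.5     # clips most of the range
too_large = 300.0   # uses only a handful of the 255 levels

for name, t in [("calibrated", calibrated), ("too small", too_small), ("too large", too_large)]:
    print(f"{name:>10}: T={t:7.2f}  MSE={quantization_error(activations, t):.6f}")
```

A threshold that ignores the data, whether it clips too much or spreads the levels too thinly, gives a far larger round-trip error than one derived from the data, which is why a random calibrator is only useful for speed measurements.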

Hi all,
I can now use TensorRT to run any model, but only one model at a time. Can TensorRT run multiple models at the same time?

There are multiple ways you can do this. You can create more than one engine and switch back and forth between them within your code. Additionally, TensorRT can make use of CUDA concepts like contexts and streams to let you coordinate work for two or more DL models on the same GPU. This helps you overlap communication and computation. See the CUDA documentation for details.

Hi Leon,

I am working on a project similar to yours, detecting and tracking players on a soccer field, though I have used different techniques. I can see you are using YOLO for the same task, which seems quite interesting. Would it be possible to contact you by email so we can discuss the steps I have taken so far, and yours? As I said, your approach seems quite promising.

But if you prefer, we can open a discussion here in this forum; if you have any other suggestions, please let me know.

I look forward to hearing from you.

Kind regards,

Hemecha

Hi Chris! I have a similar question. I can see that TensorRT lets us use CUDA concepts like contexts and streams to run multiple models on the same GPU.

I'm curious whether this is an advisable approach. How would you expect it to affect throughput and overall latency? Does it matter whether the models are run from the same host thread or from different host threads?

I called the same caffeToGIEModel function twice, each time with a different output stream, but the second call crashed with a "segmentation fault"; I traced it to model parsing.
Solved now.
If you load multiple models, note that the ShutdownProtobufLibrary() call in the last line of caffeToGIEModel must be made only once.

Chris, what kinds of DNNs require FP16 for inference rather than INT8? Do you think apps that require FP16 could be refactored to take advantage of INT8's efficiency? These are promising improvements you reported!

Hi Chris & Team!

My company is trying to deploy a DNN and we want to optimize it with TensorRT; however, we need to deploy it within the client application, which means a Windows 10 / C++ environment. The NVIDIA site states that you can use TensorRT to "deploy fast AI services to platforms running Linux, Microsoft Windows, BlackBerry QNX or Android operating systems". I have downloaded TRT 3 RC1 but do not see an inference library for Windows. Is Windows supported? Do I need to install everything on an Ubuntu system first to get access to the Windows library? And if it is not supported, do you know where Windows support is on the roadmap?

Thanks!
Brian

Brian,

Thanks for writing. We are working on adding Windows support to TensorRT, but it will roll out to customers in a 3.x or 4.0 release early next year.*

*: current estimate, all plans and estimates subject to change at any time without notice.

I will work with my colleagues to find the potentially misleading comment about Windows and remove it.

Thanks,
Chris

Hi Chris,

I am trying to compare GoogLeNet in TensorRT and Caffe. When I run the GoogLeNet sample in TensorRT, I find there is no real input data, so I'd like to know: do you have official sample code for GoogLeNet that handles real input data, or what did you use to train your GoogLeNet?
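Feeding real data mostly comes down to reproducing the preprocessing used at training time. A minimal sketch, assuming the sample's model is BVLC GoogLeNet (trained in Caffe with per-channel BGR mean values 104, 117, 123 on 224x224 crops) and that the image has already been decoded and resized to the network's input size. The function name is hypothetical, and resizing and cropping are left out:

```python
# Per-channel means in BGR order from the BVLC GoogLeNet Caffe training config
# (assumption: the TensorRT sample uses that same model).
MEAN_BGR = (104.0, 117.0, 123.0)


def hwc_rgb_to_chw_bgr(pixels, height, width):
    """Convert an interleaved RGB image (flat list, row-major HWC, values 0-255)
    into a planar, mean-subtracted BGR buffer (flat list, CHW), the layout a
    Caffe-trained network expects. Hypothetical helper for illustration."""
    if len(pixels) != height * width * 3:
        raise ValueError("expected height * width * 3 values")
    plane = height * width
    out = [0.0] * (3 * plane)
    for y in range(height):
        for x in range(width):
            idx = y * width + x   # position within one channel plane
            base = idx * 3        # position of this pixel's RGB triple
            r, g, b = pixels[base], pixels[base + 1], pixels[base + 2]
            out[0 * plane + idx] = b - MEAN_BGR[0]  # B plane first (Caffe uses BGR)
            out[1 * plane + idx] = g - MEAN_BGR[1]
            out[2 * plane + idx] = r - MEAN_BGR[2]
    return out


# A 1x2 image: one red pixel, then one green pixel.
print(hwc_rgb_to_chw_bgr([255, 0, 0, 0, 255, 0], 1, 2))
```

The result holds one full B plane, then G, then R, which is the CHW layout a Caffe-derived input typically expects for a single image. Copying such a buffer into the input binding and comparing the top-1 class against Caffe on the same image gives a like-for-like check.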