Production Deep Learning with NVIDIA GPU Inference Engine

Originally published at:

Figure 1. NVIDIA GPU Inference Engine (GIE) provides even higher efficiency and performance for neural network inference. Tests performed using GoogLeNet. CPU-only: single-socket Intel Xeon (Haswell) E5-2698 v3 @ 2.3 GHz with HT. GPU: NVIDIA Tesla M4 + cuDNN 5 RC. GPU + GIE: NVIDIA Tesla M4 + GIE. [Update September 13, 2016: GPU Inference Engine is now TensorRT] Today…

Will the new fused kernels be available in cuDNN?

The fusion of layers is dynamic, so the fused kernels are generated by TensorRT at optimization time.
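To make the benefit of fusion concrete, here is a toy sketch (not the kernels TensorRT actually generates, and plain CPU loops rather than CUDA): fusing two element-wise layers means one pass over memory instead of two, which is why these kernels are produced per-network at optimization time rather than shipped as a fixed library.

```cpp
#include <cmath>
#include <vector>

// Unfused: bias add, then ReLU -- two passes over the data,
// i.e. two separate "kernels" reading and writing memory.
void bias_then_relu(std::vector<float>& x, float bias) {
    for (float& v : x) v += bias;              // pass 1: bias
    for (float& v : x) v = std::fmax(v, 0.f);  // pass 2: ReLU
}

// Fused: one pass, one "kernel" -- each element is read and
// written exactly once, halving the memory traffic.
void fused_bias_relu(std::vector<float>& x, float bias) {
    for (float& v : x) v = std::fmax(v + bias, 0.f);
}
```

Both functions produce identical results; the fused version simply touches memory once, which on a GPU translates to fewer kernel launches and less DRAM bandwidth.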

Thanks for your rapid reply! Running nvprof on GIE shows kernels like `maxwell_scudnn_winograd_128x128_mobile_relu_tile148t_nt`, and for normal developers without access to the cuDNN source code it would be difficult to implement such kernel fusion. The problem is that we have our own inference framework that needs to be deployed on all platforms, not only GPUs, so completely migrating to TensorRT is not really feasible. Would it be possible for TensorRT to provide a lower-level API, like cuDNN's, that lets users call the fused kernels directly? Thanks a lot!

Thank you for the great article.

I have just tested the Listing 1 code, specifying the paths to the Caffe model and the prototxt file. Unfortunately I get a segfault in the CUDA engine build (the last line). How can I debug this?
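A common cause of a crash at that point is an unchecked null return from an earlier step (for example, when a model path is wrong) being dereferenced during the engine build. Here is a toy sketch of the check pattern, using a hypothetical `loadModel` stand-in rather than the real GIE parser calls:

```cpp
#include <cstdio>
#include <fstream>
#include <string>

struct Model { std::string path; };

// Stand-in for a parser call: returns nullptr when the file does
// not exist, which is how such APIs typically signal failure.
Model* loadModel(const std::string& path) {
    std::ifstream f(path);
    if (!f.good()) return nullptr;
    return new Model{path};
}

// Check every return value before the engine-build step; a nullptr
// dereferenced inside the build is a typical source of a segfault.
bool buildEngineChecked(const std::string& deploy, const std::string& model) {
    Model* net = loadModel(deploy);
    if (!net) {
        std::fprintf(stderr, "could not open %s\n", deploy.c_str());
        return false;
    }
    Model* weights = loadModel(model);
    if (!weights) {
        std::fprintf(stderr, "could not open %s\n", model.c_str());
        delete net;
        return false;
    }
    // ... build the engine here, only once both loads succeeded ...
    delete net;
    delete weights;
    return true;
}
```

Verifying each pointer (and that both files are actually readable from the working directory) before the final build call usually pinpoints where the failure originates.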

Can someone tell me where to find the documentation or example code for TensorRT? It's so weird.

Hi Kai. We are working on including the fused kernels in a future release of cuDNN. Thanks!

Sign up for the release candidate testing at and when you get the download bundle it will contain the header files, a full set of documentation, and three example programs that you can examine, build and run.

We will be releasing fused kernels in a future release of cuDNN.

Really glad to hear that! Thanks very much for your effort :)

This article mentions that GIE supports networks trained in TensorFlow and other frameworks. However, the release candidate examples focus specifically on Caffe, and it only has a parser for Caffe network definition files. Will there be a parser or some examples for TensorFlow in the first release?

What operating systems are supported for deployment? Does this only work on Linux? Is Windows 7 or 10 supported as a deployment OS?

Linux right now. We are looking at Windows support in a future release.

GIE (TensorRT) has a documented API that you can use to describe a network that you trained in any framework. Right now it has a parser that makes it especially easy to import a model from Caffe. The bundle will include example code that shows how to use the API to express a network.

Djeb, sorry for the delay in responding. Is this still an issue or have you resolved it?

Hi Chris, yeah, still having this issue. Any thoughts? I have cuDNN 5.1 with a GTX 1070.

How do I benchmark GoogLeNet performance with TensorRT? I have JetPack 2.3 installed on a TX1; I can run the ImageNet classification sample, but I cannot find the images/sec performance.
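One way to get an images/sec number is to time N batched inference calls and divide. The sketch below is a minimal host-side timing harness; the `runBatch` callable stands in for the real inference call (an assumption here), and on a real device you would also synchronize the GPU before stopping the clock so queued work is included.

```cpp
#include <chrono>

// Throughput in images/sec given batch size, the number of timed
// inference iterations, and elapsed wall-clock seconds.
double imagesPerSec(int batchSize, int iterations, double seconds) {
    return batchSize * iterations / seconds;
}

// Time `iters` invocations of a workload and report images/sec.
// `runBatch` is a placeholder for the actual per-batch inference call.
template <typename F>
double benchmark(int batchSize, int iters, F runBatch) {
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) runBatch();
    double secs = std::chrono::duration<double>(
        std::chrono::steady_clock::now() - t0).count();
    return imagesPerSec(batchSize, iters, secs);
}
```

Running a few warm-up batches before starting the timer gives more stable numbers, since the first iterations include one-time initialization costs.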

I tried to test your TensorRT samples with my Caffe nets, and I received the following messages.

1) If my net contains an Eltwise max layer, I get the error:
"cudnnElementWiseLayer.cpp:51: virtual void nvinfer1::cudnn::ElementWiseLayer::execute(const nvinfer1::cudnn::CommonContext&): Assertion `mParams.operation == ElementWiseOperation::kSUM' failed."

2) If my net contains a TanH layer, I get the error:
"could not parse layer type TanH
Engine could not be created".

Here ( it is written that "GIE supports the following layer types:
- Activation: ReLU, tanh and sigmoid
- ElementWise: sum, product or max of two tensors"
These are exactly my cases.
Are these two layers really not supported in the first release, or is this a bug on my end?

Thank you.

Thanks very much for reporting this! There is a bug with eltwise and a gap in the parser for tanh. We have bugs filed for each of these in our tracking system.