Unable to generate TensorRT graph on RTX 2080 Ti

I generated a TensorRT inference graph from a TensorFlow YOLO model using TF-TRT, but I cannot find any ‘TRTEngineOp’ nodes in the generated graph, which means TF-TRT is falling back to native TensorFlow. The program runs without any error, and there is no ‘TensorRTOptimizer’ call in the log.
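For reference, here is a minimal sketch of the conversion and the TRTEngineOp check I am describing (the frozen-graph path and the output node name below are placeholders, not my actual values):

# TF-TRT conversion sketch for TF 1.14; path and node name are placeholders.
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Load the frozen TensorFlow graph.
with tf.io.gfile.GFile('yolo_frozen.pb', 'rb') as f:
    graph_def = tf.compat.v1.GraphDef()
    graph_def.ParseFromString(f.read())

# Run the TF-TRT conversion.
trt_graph = trt.create_inference_graph(
    input_graph_def=graph_def,
    outputs=['output_boxes'],            # placeholder output node name
    max_batch_size=1,
    max_workspace_size_bytes=1 << 30,
    precision_mode='FP16')

# Count TRTEngineOp nodes; 0 means nothing was converted to TensorRT.
num_engines = sum(1 for n in trt_graph.node if n.op == 'TRTEngineOp')
print('TRTEngineOp count:', num_engines)

The printed count is 0 in my case.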
Here is the TensorRT graph generation log.

2019-09-18 11:28:03.893139: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:716] Optimization results for grappler item: tf_graph
2019-09-18 11:28:03.893229: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:718]   constant folding: Graph size after: 113 nodes (0), 114 edges (0), time = 31.321ms.
2019-09-18 11:28:03.893271: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:718]   layout: Graph size after: 141 nodes (28), 166 edges (52), time = 15.636ms.
2019-09-18 11:28:03.893299: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:718]   constant folding: Graph size after: 141 nodes (0), 166 edges (0), time = 17.429ms.

2019-09-18 11:28:04.171328: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545

Here are my CUDA stack versions:
CUDA : 10.0
cudnn : 7.6.2.24-1+cuda10.0
tensorflow : 1.14.0
tensorrt : 5.1.5-1+cuda10.0
OS : ubuntu 16.04
GPU : RTX 2080 Ti

Earlier, I was able to run my models with TensorRT on a GTX 1080 Ti with the following configuration:
CUDA : 9.0
cudnn : 7.5.0+cuda9.0
tensorflow : 1.11.0
tensorrt : 4.0.1.6-1+cuda9.0
OS : ubuntu 16.04
GPU : GTX 1080 Ti

I upgraded to these higher versions because the same configuration does not work on the RTX 2080 Ti.
Please help me generate a proper inference graph using TensorRT.

Hi,

You can use nvprof as below to check whether your algorithm is using Tensor Cores; in the profiler output, Tensor Core kernels typically have ‘884’ (Volta) or ‘1688’ (Turing) in their names:
nvprof python run_inference.py

If your algorithm is not using Tensor Cores, you can do a few things to debug and understand why:

  • Use the nvidia-smi command on the command line to confirm that the current hardware is a Volta or Turing GPU.
  • Operators such as Fully Connected, MatMul, and Conv can use Tensor Cores. Make sure that all dimensions in these ops are multiples of 8 to trigger Tensor Core usage.
    For matrix multiplication: the M, N, and K sizes must be multiples of 8. Fully-connected layers should use multiple-of-8 dimensions. If possible, pad input/output dimensions to multiples of 8 (see the sketch after this list).
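As an illustrative sketch (not taken from your model), this is one way to zero-pad a matmul’s K and N dimensions up to the next multiple of 8 and slice the padding back off afterwards:

# Illustrative sketch: pad matmul dimensions to multiples of 8 for Tensor Cores.
import numpy as np

def pad_to_multiple_of_8(x, axis):
    # Zero-pad array x along `axis` up to the next multiple of 8.
    pad = (-x.shape[axis]) % 8
    if pad == 0:
        return x
    widths = [(0, 0)] * x.ndim
    widths[axis] = (0, pad)
    return np.pad(x, widths, mode='constant')

a = np.ones((8, 300), np.float16)     # M=8, K=300
b = np.ones((300, 10), np.float16)    # K=300, N=10
a_p = pad_to_multiple_of_8(a, axis=1)                                 # K: 300 -> 304
b_p = pad_to_multiple_of_8(pad_to_multiple_of_8(b, axis=0), axis=1)   # K and N padded
c = (a_p @ b_p)[:, :10]   # zero padding in K is exact; drop the padded N columns

Zero-padding K leaves the matmul result unchanged (the extra products are zero), and the padded N columns are simply sliced off.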

Note that in some cases TensorRT might select alternative algorithms not based on Tensor Cores if they perform faster for the chosen data and operations.
Please refer to the blog below for more details:

Can you please share the verbose log using the command below, along with the script & model to reproduce the issue?
TF_CPP_VMODULE=segment=2,convert_graph=2,convert_nodes=2,trt_engine=1,trt_logger=2 python …

Meanwhile, please refer to the examples in the link below:

Thanks