TensorRT 5 and TensorRT 7 conversion discrepancy

Provide details on the platforms you are using:

Linux distro and version - Ubuntu 18.04.4 LTS
GPU type - Tesla T4 (AWS g4dn.xlarge) and Volta V100 (AWS p3.2xlarge)
Nvidia driver version - 450.36.06
CUDA version - 10.0
CUDNN version - 7.6
Python version [if using python] - 3.6
Tensorflow version - 1.14
TensorRT version - 7

Describe the problem

Using TensorRT 5 (TF 1.15) for conversion, results:
numb. of all_nodes in frozen graph: 3408
numb. of trt_engine_nodes in TensorRT graph: 7
numb. of all_nodes in TensorRT graph: 892

Using TensorRT 7.0.0 (TF 1.14) for conversion, results:
numb. of all_nodes in frozen graph: 3408
numb. of trt_engine_nodes in TensorRT graph: 0
numb. of all_nodes in TensorRT graph: 1789

Files

Code used for conversion:
import tensorflow as tf
from tensorflow.python.platform import gfile
from tensorflow.python.compiler.tensorrt import trt_convert as trt

frozen_graph = '/home/ubuntu/Tensorrt/frozen_model.pb'

# Load the frozen graph
with open(frozen_graph, 'rb') as f:
    frozen_graph_gd = tf.GraphDef()
    frozen_graph_gd.ParseFromString(f.read())

if tf.test.gpu_device_name():
    print('Default GPU Device:{}'.format(tf.test.gpu_device_name()))
else:
    print("Please install GPU version of TF")

# Convert the frozen graph with TF-TRT
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph_gd,
    is_dynamic_op=True,
    outputs=['num_detections:0', 'detection_boxes:0', 'detection_scores:0', 'detection_classes:0'],
    max_batch_size=32,
    max_workspace_size_bytes=2 * (10 ** 9),
    precision_mode="FP16")

with gfile.FastGFile("/home/ubuntu/Tensorrt/TensorRT_model.pb", 'wb') as f:
    f.write(trt_graph.SerializeToString())
    print("TensorRT model is successfully stored!")

all_nodes = len([1 for n in frozen_graph_gd.node])
print("numb. of all_nodes in frozen graph:", all_nodes)

# Check how many ops were converted into TensorRT engines
trt_engine_nodes = len([1 for n in trt_graph.node if str(n.op) == 'TRTEngineOp'])
print("numb. of trt_engine_nodes in TensorRT graph:", trt_engine_nodes)
all_nodes = len([1 for n in trt_graph.node])
print("numb. of all_nodes in TensorRT graph:", all_nodes)

Include any logs, source, models (.uff, .pb, etc.) that would be helpful to diagnose the problem.

Cannot share .pb due to infosec.
No errors are reported in the logs; they are identical in both cases.

Reproducibility

The issue is reproducible with the above code and any detection model.

Hi @mishall.swain,
Could you please share your model and verbose logs?
Also, is this related to TF-TRT conversion?

Thanks!

I am able to reproduce the same issue with this SSD.pb

Is there any specific command or param to get these logs?

2020-09-22 16:00:31.873374: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2020-09-22 16:00:31.873390: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2020-09-22 16:00:33.137892: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:716] Optimization results for grappler item: tf_graph
2020-09-22 16:00:33.137946: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:718] constant folding: Graph size after: 6823 nodes (1), 11475 edges (2), time = 547.209ms.
2020-09-22 16:00:33.137961: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:718] layout: layout did nothing. time = 4.224ms.
2020-09-22 16:00:33.137972: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:718] constant folding: Graph size after: 6823 nodes (0), 11475 edges (0), time = 196.495ms.
WARNING:tensorflow:From convert1.py:28: FastGFile.__init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.gfile.GFile.
numb. of all_nodes in frozen graph: 6822
numb. of trt_engine_nodes in TensorRT graph: 0
numb. of all_nodes in TensorRT graph: 6823

For this model, TRT 7 is unable to create any trt_engine_nodes, and there is almost no reduction in all_nodes.
TRT 5 is able to create engine nodes and also reduces the total node count.

Yes

Please check the below link

Thanks!
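
In case it helps, here is a rough sketch of one way to enable more verbose conversion logs by setting TensorFlow's logging environment variables before importing it (the TF_CPP_VMODULE module names below are a guess based on the TF-TRT source files and may differ between TF versions):

import os

# Raise TensorFlow's C++ VLOG level so the TF-TRT converter prints details.
# These variables must be set before the TensorFlow shared library is loaded.
os.environ["TF_CPP_MIN_VLOG_LEVEL"] = "2"
# Optionally restrict verbose output to the conversion-related modules
# (module names are assumptions and may vary across TF versions).
os.environ["TF_CPP_VMODULE"] = "convert_graph=2,convert_nodes=2,segment=2,trt_engine_op=2"

import tensorflow as tf  # import only after the variables are set
print(tf.__version__)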

Hi @AakankshaS, thanks to the verbose logging I realized that my code was not using the GPU at all.

I was using the deepstream:20.07-triton docker image with CUDA 10.2 and TRT 7.0 installed for the conversion. I had installed TF-GPU 1.15 myself inside the container, but due to incompatibility TF was not utilizing the GPU: it installs correctly yet does not use the GPU because a few libraries are incompatible.
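
For anyone hitting the same thing, this is the quick sanity check I now run before converting (plain TF 1.x APIs, nothing specific to my setup):

import tensorflow as tf
from tensorflow.python.client import device_lib

# List the devices TensorFlow can actually see; if no 'GPU' entry shows up,
# the TF build and the CUDA/cuDNN libraries in the container are incompatible.
print(device_lib.list_local_devices())
print("GPU available:", tf.test.is_gpu_available(cuda_only=True))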

I also tried using the tensorrt:20.01 docker image with CUDA 10.2 and TRT 7.0, but I'm facing a similar issue when installing and using TF-GPU there.

[For TRT-5] I had installed TF and TRT 5 directly on the base instance, hence it was able to use the GPU.

Is there any easy way to test TF-TRT conversion using TRT 7.0? I intend to use Triton Inference Server for my ML pipeline, and I already have trained TF .pb files.
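
In the meantime, this is the variant I plan to try inside one of the NGC TensorFlow containers (where the TF build and the linked TensorRT version are matched), using the TrtGraphConverter API from TF 1.14+. It is only a sketch with the same paths and parameters as my script above, not yet verified in my environment:

import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Load the same frozen graph as in my script above.
with open('/home/ubuntu/Tensorrt/frozen_model.pb', 'rb') as f:
    frozen_graph_gd = tf.GraphDef()
    frozen_graph_gd.ParseFromString(f.read())

# TrtGraphConverter replaces create_inference_graph in TF 1.14+;
# the parameters mirror the ones I used above.
converter = trt.TrtGraphConverter(
    input_graph_def=frozen_graph_gd,
    nodes_blacklist=['num_detections', 'detection_boxes',
                     'detection_scores', 'detection_classes'],
    max_batch_size=32,
    max_workspace_size_bytes=2 * (10 ** 9),
    precision_mode='FP16',
    is_dynamic_op=True)
trt_graph = converter.convert()

print('TRTEngineOp nodes:',
      len([1 for n in trt_graph.node if n.op == 'TRTEngineOp']))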