Linux distro and version: Ubuntu 16.04.5 LTS
GPU type: GTX 1080TI / Tesla V100-SXM2-16GB
nvidia driver version: 396.26 / 396.54
CUDA version: not sure — whichever one is installed in the nvcr.io/nvidia/tensorrt:18.08-py3 container
CUDNN version: V9.0.176
Python version: 3.5.2
Tensorflow version: built from source, branch r1.11
TensorRT version: 4.0.1.6
I have been testing TensorRT optimization of SSD Inception V2. I have a GTX 1080 Ti locally and have run some tests on my own datasets, using the same code for converting and running the model; everything runs inside a Docker container. On my local PC with the GTX 1080 Ti I get 75% mAP, but on the V100 it drops to 55% mAP. Speed also got worse: from 0.038 s per frame on the GTX 1080 Ti to 0.05 s on the V100. I have also tried saving the converted network from my local 1080 Ti and launching it on the V100, but that throws an error mentioning customWinograd… From what I have read, TensorRT builds engines for a specific GPU architecture, so they are not portable between cards.
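If it helps anyone hitting the same customWinograd error: a serialized TensorRT engine is tied to the compute capability it was built for, and the two cards differ (the GTX 1080 Ti is Pascal, sm_61; the V100 is Volta, sm_70). Below is a minimal sketch of the sanity check I mean — the lookup table and helper name are my own illustration, not a TensorRT API:

```python
# Hypothetical helper: refuse to reuse a cached TensorRT engine that was
# built on a GPU with a different compute capability.
COMPUTE_CAPABILITY = {
    'GeForce GTX 1080 Ti': (6, 1),   # Pascal, sm_61
    'Tesla V100-SXM2-16GB': (7, 0),  # Volta, sm_70
}

def engine_is_portable(built_on, running_on):
    """Return True only if both GPUs share the same compute capability."""
    try:
        return COMPUTE_CAPABILITY[built_on] == COMPUTE_CAPABILITY[running_on]
    except KeyError:
        return False  # unknown GPU: rebuild the engine to be safe

# An engine built on the 1080 Ti must be rebuilt for the V100.
print(engine_is_portable('GeForce GTX 1080 Ti', 'Tesla V100-SXM2-16GB'))  # False
```

So the expected workflow seems to be: run the conversion once per target GPU rather than copying the converted graph between machines.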
I am also converting the networks using examples from this repository: https://github.com/NVIDIA-Jetson/tf_trt_models
The code for launching the network is below:
import numpy as np
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt
from tf_trt_models.detection import download_detection_model, build_detection_graph

# MODEL is the detection model name string from the tf_trt_models examples.
config_path, checkpoint_path = download_detection_model(MODEL, 'data')

# Freeze the detection graph from the pipeline config and checkpoint.
frozen_graph, input_names, output_names = build_detection_graph(
    config=config_path,
    checkpoint=checkpoint_path,
)

# Replace supported subgraphs with TensorRT engines.
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=8,
    minimum_segment_size=50,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP32',
    is_dynamic_op=False,
)

tf_config = tf.ConfigProto()
tf_config.gpu_options.allow_growth = True
tf_sess = tf.Session(config=tf_config)
tf.import_graph_def(trt_graph, name='')

tf_input = tf_sess.graph.get_tensor_by_name(input_names[0] + ':0')
tf_scores = tf_sess.graph.get_tensor_by_name('scores:0')
tf_boxes = tf_sess.graph.get_tensor_by_name('boxes:0')
tf_classes = tf_sess.graph.get_tensor_by_name('classes:0')

# frames is a batch of preprocessed input images.
scores, boxes, classes = tf_sess.run(
    [tf_scores, tf_boxes, tf_classes],
    feed_dict={tf_input: frames},
)