Don't get any 'TRTEngineOp' after optimizing model via TensorRT on Jetson TX2

ardianumam · January 4, 2019, 5:23am

Hi,

From my frozen model (https://drive.google.com/file/d/1tWFt7RZl4dlpud0ywy1XwetobY5qHOYO/view?usp=sharing), I get 5 'TRTEngineOp' nodes after converting it with TensorRT on a desktop GPU (GeForce 1060 6GB), and inference is faster as a result. However, I get 0 'TRTEngineOp' nodes on a Jetson TX2 using exactly the same code below. As a result, on the TX2 the tensorrt_graph has a longer inference time than the native frozen_graph. How can I fix this so that inference time also improves on the TX2?

Thanks.

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt
from tensorflow.python.platform import gfile

# function to read frozen model
def read_pb_graph(model):
  with gfile.FastGFile(model,'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
  return graph_def

graph = tf.Graph()
with graph.as_default():
    with tf.Session() as sess:
        # read frozen model
        frozen_graph = read_pb_graph('frozen_model_depth_estimation.pb')

# count how many TRTEngineOp nodes are in the frozen model
trt_engine_ops = len([1 for n in frozen_graph.node if str(n.op) == 'TRTEngineOp'])
print("numb. of trt_engine_ops in frozen_graph:", trt_engine_ops)
all_ops = len([1 for n in frozen_graph.node])
print("numb. of all_ops in frozen_graph:", all_ops)

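# conversion parameters (values given in a later reply in this thread)
BATCH_SIZE = 3
MODE = "FP32"

# convert to trt_graph
your_outputs = ["g_conv14/Conv2D"]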
trt_graph = trt.create_inference_graph(
        input_graph_def=frozen_graph,
        outputs=your_outputs,
        max_batch_size=BATCH_SIZE,
        max_workspace_size_bytes=2,
        precision_mode=MODE)

#count how many ops in trt_graph
trt_engine_ops = len([1 for n in trt_graph.node if str(n.op)=='TRTEngineOp'])
print("numb. of trt_engine_ops in trt_graph", trt_engine_ops)
all_ops = len([1 for n in trt_graph.node])
print("numb. of all_ops in in trt_graph:", all_ops)

Output in Geforce 1060

numb. of trt_engine_ops in frozen_graph: 5
numb. of all_ops in frozen_graph: 1251
INFO:tensorflow:Running against TensorRT version 4.0.1
numb. of trt_engine_ops in trt_graph: 5
numb. of all_ops in in trt_graph: 16

Output in Jetson TX2

numb. of trt_engine_ops in frozen_graph: 0
numb. of all_ops in frozen_graph: 1251
INFO:tensorflow:Running against TensorRT version 4.0.1
numb. of trt_engine_ops in trt_graph: 0
numb. of all_ops in in trt_graph: 673
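
The node counting in the script above can also be written as a small reusable helper. The following is only a sketch: `count_ops`, `summarize`, and the `fake_graph` stand-in are hypothetical names introduced here so the example runs without TensorFlow. It assumes the object passed in exposes a `.node` sequence whose items have an `.op` string, as the `frozen_graph` and `trt_graph` objects in the script do.

```python
from collections import Counter
from types import SimpleNamespace


def count_ops(graph_def):
    """Return a Counter mapping op type -> number of nodes of that type."""
    return Counter(node.op for node in graph_def.node)


def summarize(graph_def, label):
    """Collect the same two numbers the script prints for a graph."""
    ops = count_ops(graph_def)
    return {
        "label": label,
        "trt_engine_ops": ops["TRTEngineOp"],  # Counter returns 0 if absent
        "all_ops": sum(ops.values()),
    }


# Hypothetical stand-in for a GraphDef, used only so the sketch runs
# without TensorFlow installed.
fake_graph = SimpleNamespace(node=[
    SimpleNamespace(op="Placeholder"),
    SimpleNamespace(op="TRTEngineOp"),
    SimpleNamespace(op="TRTEngineOp"),
    SimpleNamespace(op="Conv2D"),
])

summary = summarize(fake_graph, "fake_graph")
print(summary)
```

Calling `summarize(frozen_graph, "frozen_graph")` and `summarize(trt_graph, "trt_graph")` on both machines would give directly comparable numbers, and `count_ops` additionally shows which op types remain unconverted.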

NVES · January 4, 2019, 4:42pm

Hello,

Might be unrelated, but what is your BATCH_SIZE in the script? and MODE?

NVES · January 4, 2019, 5:11pm

Also, as far as I know, Python API is not available for Jetson platform yet. How did you run your python script on tx2?

ardianumam · January 5, 2019, 1:08am

Sorry, I forgot to include BATCH_SIZE and MODE:
BATCH_SIZE = 3
MODE = "FP32"

There are two options for optimizing a deep learning model with NVIDIA tools: (i) TF-TRT and (ii) TensorRT.
Option (i) can be run on the Jetson TX2, whereas option (ii), TensorRT itself, cannot (it must be used through the C++ API). In this case, I use option (i).

Looking forward to hearing from you.

Thanks.

NVES · January 8, 2019, 5:28pm

Hello, can you provide details on the TWO platforms you are using?

Linux distro and version
GPU type
nvidia driver version
CUDA version
CUDNN version
Python version [if using python]
Tensorflow version
TensorRT version

ardianumam · January 9, 2019, 1:31am

Here is the env:

Platform 1 (Desktop)
Linux distro and version: Ubuntu 16.04 64bit
GPU type: Geforce 1060 6Gb
nvidia driver version: 384.130
CUDA version: 9
CUDNN version: 7
Python version : 3.5.6
Tensorflow version: 1.9.0
TensorRT version: 4.0.1.6

Platform 2 (Jetson TX2)
Linux distro and version: Ubuntu 16.04 64bit
GPU type: Nvidia tegra (Jetson TX2)
nvidia driver version: installed when flashing with JetPack 3.3
CUDA version: 9.0
CUDNN version: 7
Python version : 3.5.2
Tensorflow version: 1.9.0
TensorRT version: 4.1.3
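
With the two environments listed side by side, a small comparison can point out which components differ. This is a minimal sketch using only the version strings reported above; `parse_version` and `diff_versions` are hypothetical helpers, and the normalization rule (dropping trailing zeros so that "9" and "9.0" compare equal) is an assumption made for this example.

```python
def parse_version(text):
    """Turn '4.0.1.6' into (4, 0, 1, 6); trailing zeros are dropped so '9' == '9.0'."""
    parts = [int(p) for p in text.split(".")]
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def diff_versions(a, b):
    """Return {component: (version_a, version_b)} for components that differ."""
    return {
        name: (a[name], b[name])
        for name in sorted(a.keys() & b.keys())
        if parse_version(a[name]) != parse_version(b[name])
    }


# Version strings copied from the two platform listings above.
desktop = {"CUDA": "9", "cuDNN": "7", "Python": "3.5.6",
           "TensorFlow": "1.9.0", "TensorRT": "4.0.1.6"}
tx2 = {"CUDA": "9.0", "cuDNN": "7", "Python": "3.5.2",
       "TensorFlow": "1.9.0", "TensorRT": "4.1.3"}

mismatches = diff_versions(desktop, tx2)
print(mismatches)
```

On these inputs only the Python patch version and the TensorRT package version differ. Note that the TensorFlow log on both machines reports "Running against TensorRT version 4.0.1", so the installed package version and the version TensorFlow reports are worth checking separately.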

NVES · January 9, 2019, 7:42pm

Hello per engineering:

We don't expect any difference between platforms as long as TensorFlow and its dependencies (incl. TRT) are the same version and the same parameters are used in the TF-TRT API.

That being said, the amount of GPU memory can affect the TF-TRT conversion.
Note the TX2 has only 4GB of memory. Can you try using per_process_gpu_memory_fraction to reduce the amount of memory used by TF?

Could you attach the log of TX2 conversion?

ardianumam · January 10, 2019, 6:25am

Hi,

Please find the log of the TRT conversion on the TX2 below. Further inquiries:

The log shows "Number of eligible GPUs (core count >= 8): 0". What does it mean? I got 'zero' there too when converting another model on the TX2, but that model was converted successfully, i.e., it got some 'TensorRTEngine' nodes.
I already tried several values for 'per_process_gpu_memory_fraction', but the result is the same: zero TensorRTEngine nodes, even though my frozen model is only 1.4MB (attached in the first post of this discussion). Based on the documentation, 'per_process_gpu_memory_fraction' sets how much memory is allocated to TF, and the rest is left for TensorRT, right? On platform 1 (desktop with GeForce 1060), even when I set it to 0.99, meaning that TensorRT only gets 0.01 of the memory, and set "max_workspace_size_bytes" to only '10', the conversion still succeeds (I get some TensorRTEngine nodes).

2018-12-22 01:12:54.563860: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:864] ARM64 does not support NUMA - returning NUMA node zero
2018-12-22 01:12:54.564063: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties: 
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 4.41GiB
2018-12-22 01:12:54.564141: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2018-12-22 01:12:57.573095: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-22 01:12:57.573204: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958]      0 
2018-12-22 01:12:57.573246: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   N 
2018-12-22 01:12:57.573453: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1570 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2018-12-22 01:12:59.197879: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 0
2018-12-22 01:13:00.895557: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:438] MULTIPLE tensorrt candidate conversion: 23
2018-12-22 01:13:00.897676: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: ../builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2018-12-22 01:13:00.897790: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:0 due to: "Invalid argument: Failed to create Input layer" SKIPPING......( 20 nodes)
2018-12-22 01:13:00.898533: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: ../builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2018-12-22 01:13:00.898614: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:1 due to: "Invalid argument: Failed to create Input layer" SKIPPING......( 44 nodes)
2018-12-22 01:13:00.899103: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: ../builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2018-12-22 01:13:00.899162: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:2 due to: "Invalid argument: Failed to create Input layer" SKIPPING......( 24 nodes)
2018-12-22 01:13:00.899698: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: ../builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2018-12-22 01:13:00.899759: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:3 due to: "Invalid argument: Failed to create Input layer" SKIPPING......( 22 nodes)
2018-12-22 01:13:00.900325: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: ../builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2018-12-22 01:13:00.900391: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:4 due to: "Invalid argument: Failed to create Input layer" SKIPPING......( 44 nodes)
2018-12-22 01:13:00.901064: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: ../builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2018-12-22 01:13:00.901136: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:5 due to: "Invalid argument: Failed to create Input layer" SKIPPING......( 44 nodes)
2018-12-22 01:13:00.901528: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: ../builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2018-12-22 01:13:00.901583: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:6 due to: "Invalid argument: Failed to create Input layer" SKIPPING......( 3 nodes)
2018-12-22 01:13:00.901891: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: ../builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2018-12-22 01:13:00.901943: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:7 due to: "Invalid argument: Failed to create Input layer" SKIPPING......( 3 nodes)
2018-12-22 01:13:00.902392: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: ../builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2018-12-22 01:13:00.902449: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:8 due to: "Invalid argument: Failed to create Input layer" SKIPPING......( 22 nodes)
2018-12-22 01:13:00.904617: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:9 due to: "Invalid argument: Output node 'model_3/block_10_expand_relu/Const' is weights not tensor" SKIPPING......( 65 nodes)
2018-12-22 01:13:00.905360: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:10 due to: "Invalid argument: Output node 'concatenate_4/concat-2-LayoutOptimizer' is weights not tensor" SKIPPING......( 4 nodes)
2018-12-22 01:13:00.905783: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: ../builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2018-12-22 01:13:00.905841: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:11 due to: "Invalid argument: Failed to create Input layer" SKIPPING......( 22 nodes)
2018-12-22 01:13:00.906375: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: ../builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2018-12-22 01:13:00.906434: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:12 due to: "Invalid argument: Failed to create Input layer" SKIPPING......( 44 nodes)
2018-12-22 01:13:00.906828: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: ../builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2018-12-22 01:13:00.906881: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:13 due to: "Invalid argument: Failed to create Input layer" SKIPPING......( 22 nodes)
2018-12-22 01:13:00.907259: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: ../builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2018-12-22 01:13:00.907320: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:14 due to: "Invalid argument: Failed to create Input layer" SKIPPING......( 22 nodes)
2018-12-22 01:13:00.907749: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: ../builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2018-12-22 01:13:00.907806: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:15 due to: "Invalid argument: Failed to create Input layer" SKIPPING......( 22 nodes)
2018-12-22 01:13:00.908170: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: ../builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2018-12-22 01:13:00.908244: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:16 due to: "Invalid argument: Failed to create Input layer" SKIPPING......( 22 nodes)
2018-12-22 01:13:00.908630: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: ../builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2018-12-22 01:13:00.908684: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:17 due to: "Invalid argument: Failed to create Input layer" SKIPPING......( 22 nodes)
2018-12-22 01:13:00.909112: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: ../builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2018-12-22 01:13:00.909168: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:18 due to: "Invalid argument: Failed to create Input layer" SKIPPING......( 42 nodes)
2018-12-22 01:13:00.909637: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: ../builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2018-12-22 01:13:00.909721: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:19 due to: "Invalid argument: Failed to create Input layer" SKIPPING......( 42 nodes)
2018-12-22 01:13:00.910174: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: ../builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2018-12-22 01:13:00.910233: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:20 due to: "Invalid argument: Failed to create Input layer" SKIPPING......( 22 nodes)
2018-12-22 01:13:00.910666: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: ../builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2018-12-22 01:13:00.910747: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:21 due to: "Invalid argument: Failed to create Input layer" SKIPPING......( 42 nodes)
2018-12-22 01:13:00.911140: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: ../builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2018-12-22 01:13:00.911193: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:22 due to: "Invalid argument: Failed to create Input layer" SKIPPING......( 22 nodes)
TRT model is stored!
trt_engine_numb: 0
all ops: 673
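
A log like the one above is easier to read once the skipped subgraphs are grouped by reason. The sketch below is an illustration only: `SKIP_RE` and `summarize_skips` are hypothetical names, and the regular expression is written against the warning format seen in this particular log, which may differ in other TensorFlow versions. The sample text consists of three lines copied from the log.

```python
import re
from collections import Counter

# Matches the "subgraph conversion error" warnings as they appear in this log.
SKIP_RE = re.compile(
    r'subgraph conversion error for subgraph_index:(\d+) due to: '
    r'"Invalid argument: (.*?)" SKIPPING\.+\(\s*(\d+) nodes\)'
)


def summarize_skips(log_text):
    """Return (Counter of failure reasons, total number of skipped nodes)."""
    reasons = Counter()
    skipped_nodes = 0
    for _index, reason, nodes in SKIP_RE.findall(log_text):
        # Replace quoted node names so node-specific messages share one bucket.
        key = re.sub(r"'[^']*'", "'<node>'", reason)
        reasons[key] += 1
        skipped_nodes += int(nodes)
    return reasons, skipped_nodes


sample = """\
2018-12-22 01:13:00.897790: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:0 due to: "Invalid argument: Failed to create Input layer" SKIPPING......( 20 nodes)
2018-12-22 01:13:00.898614: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:1 due to: "Invalid argument: Failed to create Input layer" SKIPPING......( 44 nodes)
2018-12-22 01:13:00.904617: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:9 due to: "Invalid argument: Output node 'model_3/block_10_expand_relu/Const' is weights not tensor" SKIPPING......( 65 nodes)
"""

reasons, skipped_nodes = summarize_skips(sample)
print(reasons)
print("skipped nodes:", skipped_nodes)
```

Run over the full log, this kind of summary makes it clear that nearly every candidate subgraph fails for the same reason ("Failed to create Input layer"), which points at one underlying problem rather than many unrelated ones.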

NVES · January 10, 2019, 5:51pm

Hello,

Per engineering,

Can you try calling create_inference_graph with is_dynamic_op set to True?
Some of those messages look like they came from bugs that have been fixed in newer versions of TensorFlow. Can you try JetPack 4.1.1 Developer Preview, or update to a newer TensorFlow?
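
Since the suggestion above involves an argument that may not exist in every TensorFlow release, one cautious way to try it is to pass only the keyword arguments the installed function declares. The sketch below is a hedged illustration: `supported_kwargs` is a hypothetical helper, and the two `converter_*` functions are simplified stand-ins invented for this example, not the real TF-TRT signatures.

```python
import inspect


def supported_kwargs(func, **kwargs):
    """Keep only the keyword arguments that `func` declares in its signature."""
    params = inspect.signature(func).parameters
    # A function that accepts **kwargs takes everything.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}


# Hypothetical stand-ins for an older and a newer converter function.
def converter_without_dynamic_op(input_graph_def, outputs,
                                 max_batch_size=1, precision_mode="FP32"):
    return "converted"


def converter_with_dynamic_op(input_graph_def, outputs,
                              max_batch_size=1, precision_mode="FP32",
                              is_dynamic_op=False):
    return "converted"


wanted = dict(max_batch_size=3, precision_mode="FP32", is_dynamic_op=True)
old_kwargs = supported_kwargs(converter_without_dynamic_op, **wanted)
new_kwargs = supported_kwargs(converter_with_dynamic_op, **wanted)
print(old_kwargs)
print(new_kwargs)
```

In the script from the first post, this could be used as `trt.create_inference_graph(input_graph_def=frozen_graph, outputs=your_outputs, **supported_kwargs(trt.create_inference_graph, ...))`, assuming `inspect.signature` can introspect that function in the installed release. If `is_dynamic_op` is silently dropped, that is itself a sign the installed TensorFlow predates the argument.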

ardianumam · January 11, 2019, 3:09am

Hi,

I just upgraded from TF 1.9.0 to TF 1.11 on the Jetson TX2, and yes, that solves the issue. Here is the log output. By the way, is there any performance difference between (i) TF-TRT and (ii) the TensorRT C++ API? If so, I will consider the TensorRT C++ API for further work.

2019-01-11 02:57:50.213044: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:931] ARM64 does not support NUMA - returning NUMA node zero
2019-01-11 02:57:50.213336: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 0 with properties: 
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 3.21GiB
2019-01-11 02:57:50.213443: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1490] Adding visible gpu devices: 0
2019-01-11 02:57:53.465184: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-11 02:57:53.465309: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977]      0 
2019-01-11 02:57:53.465351: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0:   N 
2019-01-11 02:57:53.465584: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1570 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2019-01-11 02:57:58.318443: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 0
2019-01-11 02:57:58.319695: I tensorflow/core/grappler/clusters/single_machine.cc:359] Starting new session
2019-01-11 02:57:58.322288: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1490] Adding visible gpu devices: 0
2019-01-11 02:57:58.322366: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-11 02:57:58.322402: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977]      0 
2019-01-11 02:57:58.322425: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0:   N 
2019-01-11 02:57:58.322587: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1570 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2019-01-11 02:57:59.880434: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:853] MULTIPLE tensorrt candidate conversion: 5
2019-01-11 02:57:59.883557: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2952] Segment @scope '', converted to graph
2019-01-11 02:57:59.883671: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-01-11 02:57:59.907406: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2952] Segment @scope '', converted to graph
2019-01-11 02:57:59.907532: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-01-11 02:57:59.916611: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2952] Segment @scope '', converted to graph
2019-01-11 02:57:59.916748: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-01-11 02:57:59.927615: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2952] Segment @scope '', converted to graph
2019-01-11 02:57:59.927733: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-01-11 02:57:59.942567: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2952] Segment @scope 'model_3/', converted to graph
2019-01-11 02:57:59.942766: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-01-11 02:58:17.536772: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine my_trt_op_0 creation for segment 0, composed of 113 nodes succeeded.
2019-01-11 02:58:27.695070: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine my_trt_op_1 creation for segment 1, composed of 85 nodes succeeded.
2019-01-11 02:58:37.175154: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine my_trt_op_2 creation for segment 2, composed of 86 nodes succeeded.
2019-01-11 02:58:45.060914: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine my_trt_op_3 creation for segment 3, composed of 55 nodes succeeded.
2019-01-11 02:59:55.515757: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine model_3/my_trt_op_4 creation for segment 4, composed of 469 nodes succeeded.
2019-01-11 02:59:55.786882: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2019-01-11 02:59:55.835138: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2019-01-11 02:59:55.943941: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2019-01-11 02:59:55.987589: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2019-01-11 02:59:56.159174: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2019-01-11 02:59:56.242725: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2019-01-11 02:59:56.357663: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2019-01-11 02:59:56.400123: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2019-01-11 02:59:56.505805: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2019-01-11 02:59:56.550536: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2019-01-11 02:59:56.582306: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:404] Optimization results for grappler item: tf_graph
2019-01-11 02:59:56.582462: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:406]   constant folding: Graph size after: 815 nodes (-436), 836 edges (-436), time = 212.766ms.
2019-01-11 02:59:56.582493: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:406]   layout: Graph size after: 822 nodes (7), 840 edges (4), time = 115.338ms.
2019-01-11 02:59:56.582676: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:406]   TensorRTOptimizer: Graph size after: 19 nodes (-803), 18 edges (-822), time = 116121.641ms.
2019-01-11 02:59:56.582723: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:406]   constant folding: Graph size after: 16 nodes (-3), 18 edges (0), time = 83.763ms.
2019-01-11 02:59:56.582746: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:406]   TensorRTOptimizer: Graph size after: 16 nodes (0), 18 edges (0), time = 71.174ms.
2019-01-11 02:59:56.582772: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:404] Optimization results for grappler item: my_trt_op_0_native_segment
2019-01-11 02:59:56.582795: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:406]   constant folding: Graph size after: 115 nodes (0), 117 edges (0), time = 34.937ms.
2019-01-11 02:59:56.582815: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:406]   layout: Invalid argument: The graph is already optimized by layout optimizer.
2019-01-11 02:59:56.582839: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:406]   TensorRTOptimizer: Graph size after: 115 nodes (0), 117 edges (0), time = 6.598ms.
2019-01-11 02:59:56.582861: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:406]   constant folding: Graph size after: 115 nodes (0), 117 edges (0), time = 39.01ms.
2019-01-11 02:59:56.582885: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:406]   TensorRTOptimizer: Graph size after: 115 nodes (0), 117 edges (0), time = 10.602ms.
2019-01-11 02:59:56.582907: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:404] Optimization results for grappler item: my_trt_op_1_native_segment
2019-01-11 02:59:56.582931: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:406]   constant folding: Graph size after: 87 nodes (0), 88 edges (0), time = 34.926ms.
2019-01-11 02:59:56.582954: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:406]   layout: Invalid argument: The graph is already optimized by layout optimizer.
2019-01-11 02:59:56.582976: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:406]   TensorRTOptimizer: Graph size after: 87 nodes (0), 88 edges (0), time = 7.21ms.
2019-01-11 02:59:56.583001: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:406]   constant folding: Graph size after: 87 nodes (0), 88 edges (0), time = 36.272ms.
2019-01-11 02:59:56.583023: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:406]   TensorRTOptimizer: Graph size after: 87 nodes (0), 88 edges (0), time = 7.068ms.
2019-01-11 02:59:56.583045: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:404] Optimization results for grappler item: model_3/my_trt_op_4_native_segment
2019-01-11 02:59:56.583080: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:406]   constant folding: Graph size after: 470 nodes (0), 479 edges (0), time = 68.504ms.
2019-01-11 02:59:56.583104: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:406]   layout: Graph size after: 470 nodes (0), 479 edges (0), time = 52.992ms.
2019-01-11 02:59:56.583220: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:406]   TensorRTOptimizer: Graph size after: 470 nodes (0), 479 edges (0), time = 9.753ms.
2019-01-11 02:59:56.583245: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:406]   constant folding: Graph size after: 470 nodes (0), 479 edges (0), time = 73.608ms.
2019-01-11 02:59:56.583267: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:406]   TensorRTOptimizer: Graph size after: 470 nodes (0), 479 edges (0), time = 10.241ms.
2019-01-11 02:59:56.583290: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:404] Optimization results for grappler item: my_trt_op_3_native_segment
2019-01-11 02:59:56.583370: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:406]   constant folding: Graph size after: 56 nodes (0), 56 edges (0), time = 32.06ms.
2019-01-11 02:59:56.583428: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:406]   layout: Graph size after: 56 nodes (0), 56 edges (0), time = 23.991ms.
2019-01-11 02:59:56.583455: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:406]   TensorRTOptimizer: Graph size after: 56 nodes (0), 56 edges (0), time = 6.843ms.
2019-01-11 02:59:56.583521: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:406]   constant folding: Graph size after: 56 nodes (0), 56 edges (0), time = 35.446ms.
2019-01-11 02:59:56.583547: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:406]   TensorRTOptimizer: Graph size after: 56 nodes (0), 56 edges (0), time = 7.067ms.
2019-01-11 02:59:56.583566: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:404] Optimization results for grappler item: my_trt_op_2_native_segment
2019-01-11 02:59:56.583680: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:406]   constant folding: Graph size after: 88 nodes (0), 90 edges (0), time = 34.02ms.
2019-01-11 02:59:56.583718: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:406]   layout: Invalid argument: The graph is already optimized by layout optimizer.
2019-01-11 02:59:56.583756: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:406]   TensorRTOptimizer: Graph size after: 88 nodes (0), 90 edges (0), time = 7.201ms.
2019-01-11 02:59:56.583780: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:406]   constant folding: Graph size after: 88 nodes (0), 90 edges (0), time = 36.758ms.
2019-01-11 02:59:56.583803: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:406]   TensorRTOptimizer: Graph size after: 88 nodes (0), 90 edges (0), time = 7.879ms.
TRT model is stored!
trt_engine opss: 5
all ops: 16

ardianumam · January 18, 2019, 2:53am

Hi All,

After I started trying TensorRT optimization, I ran into difficulties here and there, so I decided to make a video tutorial on how to optimize deep learning models built with Keras and TensorFlow. I also demonstrate how to optimize YOLOv3. I hope it helps those who are beginning to use TensorRT, so that you don't run into the same difficulties I did.

Optimizing Tensorflow to TensorRT:
01 Optimizing Tensorflow Model Using TensorRT with 3.7x Faster Inference Time - YouTube
Visualizing model graph before and after TensorRT optimization:
02 Visualizing Deep Learning Graph Before and After TensorRT Optimization - YouTube
Optimizing Keras model to TensorRT:
03 Optimizing Keras Model to TensorRT - YouTube
Optimizing YOLOv3:
06 Optimizing YOLO version 3 Model using TensorRT with 1.5x Faster Inference Time - YouTube
YOLOv3 sample result, before and after TensorRT optimization:
07 Another YOLOv3 Detection Result (Native Tensorflow vs TensorRT optimized) - YouTube

saurabh.agrawalmlc27 · March 5, 2019, 3:51am

Hi ardianumam,

I am following your video on YOLOv3 optimization using TensorRT on a TX2 and a LambdaLabs deep learning workstation. I am getting the same error as in your post number 8 above.

Can you please post how you solved this issue?

Thanks
Saurabh

saurabh.agrawalmlc27 · March 5, 2019, 3:56am

Output Log:

2019-03-05 09:22:57.897641: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:864] ARM64 does not support NUMA - returning NUMA node zero
2019-03-05 09:22:57.897847: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 0
2019-03-05 09:23:08.293207: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:438] MULTIPLE tensorrt candidate conversion: 8
2019-03-05 09:23:10.659763: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: …/builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2019-03-05 09:23:10.659928: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:0 due to: “Invalid argument: Failed to create Input layer” SKIPPING…( 3 nodes)
2019-03-05 09:23:10.660836: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: …/builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2019-03-05 09:23:10.660905: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:1 due to: “Invalid argument: Failed to create Input layer” SKIPPING…( 6 nodes)
2019-03-05 09:23:10.661388: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: …/builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2019-03-05 09:23:10.661453: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:2 due to: “Invalid argument: Failed to create Input layer” SKIPPING…( 6 nodes)
2019-03-05 09:23:10.662183: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: …/builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2019-03-05 09:23:10.662248: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:3 due to: “Invalid argument: Failed to create Input layer” SKIPPING…( 62 nodes)
2019-03-05 09:23:10.662693: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: …/builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2019-03-05 09:23:10.662760: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:4 due to: “Invalid argument: Failed to create Input layer” SKIPPING…( 6 nodes)
2019-03-05 09:23:14.515738: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:5 due to: “Invalid argument: Output node ‘yolov3/yolo-v3/Conv_5/LeakyRelu/alpha’ is weights not tensor” SKIPPING…( 506 nodes)
2019-03-05 09:23:14.517134: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: …/builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2019-03-05 09:23:14.517232: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:6 due to: “Invalid argument: Failed to create Input layer” SKIPPING…( 8 nodes)
2019-03-05 09:23:14.517855: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: …/builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2019-03-05 09:23:14.517919: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:7 due to: “Invalid argument: Failed to create Input layer” SKIPPING…( 53 nodes)
Traceback (most recent call last):
File “/home/nvidia/py3tf/lib/python3.5/site-packages/tensorflow/python/framework/importer.py”, line 418, in import_graph_def
graph._c_graph, serialized, options) # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: NodeDef mentions attr ‘Truncate’ not in Op<name=Cast; signature=x:SrcT → y:DstT; attr=SrcT:type; attr=DstT:type>; NodeDef: import/Cast = CastDstT=DT_FLOAT, SrcT=DT_INT32, Truncate=false. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “yolo.py”, line 64, in
[“Placeholder:0”, “concat_9:0”, “mul_9:0”])
File “/home/nvidia/Desktop/saurabh/Tensorflow-TensorRT/YOLOv3/utils.py”, line 231, in read_pb_return_tensors
return_elements=return_elements)
File “/home/nvidia/py3tf/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py”, line 432, in new_func
return func(*args, **kwargs)
File “/home/nvidia/py3tf/lib/python3.5/site-packages/tensorflow/python/framework/importer.py”, line 422, in import_graph_def
raise ValueError(str(e))
ValueError: NodeDef mentions attr ‘Truncate’ not in Op<name=Cast; signature=x:SrcT → y:DstT; attr=SrcT:type; attr=DstT:type>; NodeDef: import/Cast = CastDstT=DT_FLOAT, SrcT=DT_INT32, Truncate=false. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
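
The conversion warnings in the log above can be tallied to see how much of the graph TF-TRT gave up on. The following is only a sketch: the regular expression is written against the warning format pasted in this post, and the names `SKIP_RE` and `summarize_skipped` are made up for illustration, not part of TensorFlow or TensorRT.

```python
import re
from collections import Counter

# Matches the TF-TRT warning format seen in this thread, for example:
#   ... subgraph_index:0 due to: "Invalid argument: ..." SKIPPING...( 3 nodes)
# (assumption: other TensorFlow versions may word this line differently)
SKIP_RE = re.compile(
    r"subgraph_index:(\d+) due to: (.*?)\s*SKIPPING[^(]*\(\s*(\d+)\s*nodes\)"
)


def summarize_skipped(log_lines):
    """Return (total skipped nodes, Counter mapping reason -> skipped nodes)."""
    total = 0
    by_reason = Counter()
    for line in log_lines:
        match = SKIP_RE.search(line)
        if match is None:
            continue  # not a "subgraph conversion error" line
        # Drop the straight or curly quotes the logger puts around the reason.
        reason = match.group(2).strip("\u201c\u201d\"' ")
        nodes = int(match.group(3))
        total += nodes
        by_reason[reason] += nodes
    return total, by_reason


if __name__ == "__main__":
    # One sample line, with the quotes simplified to ASCII.
    sample = [
        'W convert_graph.cc:515] subgraph conversion error for '
        'subgraph_index:0 due to: "Invalid argument: Failed to create '
        'Input layer" SKIPPING...( 3 nodes)',
    ]
    total, by_reason = summarize_skipped(sample)
    print("skipped nodes:", total)
    for reason, nodes in by_reason.most_common():
        print(nodes, "nodes:", reason)
```

In the log above, all 8 candidate subgraphs are skipped (650 nodes in total, 506 of them in subgraph 5), which is consistent with ending up with no TRTEngineOp at all.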

Python Code:

# Import the needed libraries
import cv2
import time
import numpy as np
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt
from tensorflow.python.platform import gfile
from PIL import Image
from YOLOv3 import utils 

print("Import Done!")
# function to read a ".pb" model 
# (can be used to read frozen model or TensorRT model)
def read_pb_graph(model):
  with gfile.FastGFile(model,'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
  return graph_def

frozen_graph = read_pb_graph("./YOLOv3/yolov3_gpu_nms.pb")
your_outputs = ["Placeholder:0", "concat_9:0", "mul_9:0"]

print("PB Read Done!")
# convert (optimize) frozen model to TensorRT model
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,# frozen model
    outputs=your_outputs,
    max_batch_size=2,# specify your max batch size
    max_workspace_size_bytes=2*(10**9),# specify the max workspace
    precision_mode="FP32") # precision, can be "FP32" (32 floating point precision) or "FP16"

print("Convert/optimize TRT Done!")
#write the TensorRT model to be used later for inference
with gfile.FastGFile("./YOLOv3/TensorRT_YOLOv3_2.pb", 'wb') as f:
    f.write(trt_graph.SerializeToString())
print("TensorRT model is successfully stored!")


print("Write PB Done!")
# count all nodes in the original frozen model
all_nodes = len([1 for n in frozen_graph.node])
print("numb. of all_nodes in frozen graph:", all_nodes)

# count how many nodes were converted to TensorRT engines
trt_engine_nodes = len([1 for n in trt_graph.node if str(n.op) == 'TRTEngineOp'])
print("numb. of trt_engine_nodes in TensorRT graph:", trt_engine_nodes)
all_nodes = len([1 for n in trt_graph.node])
print("numb. of all_nodes in TensorRT graph:", all_nodes)

# config
SIZE = [416, 416] #input image dimension
# video_path = 0 # if you use camera as input
video_path = "./dataset/demo_video/road2.mp4" # path for video input
classes = utils.read_coco_names('./YOLOv3/coco.names')
num_classes = len(classes)
GIVEN_ORIGINAL_YOLOv3_MODEL = "./YOLOv3/yolov3_gpu_nms.pb" # to use given original YOLOv3
TENSORRT_YOLOv3_MODEL = "./YOLOv3/TensorRT_YOLOv3_2.pb" # to use the TensorRT optimized model

# get input-output tensor
input_tensor, output_tensors = \
utils.read_pb_return_tensors(tf.get_default_graph(),
                             TENSORRT_YOLOv3_MODEL,
                             ["Placeholder:0", "concat_9:0", "mul_9:0"])

# perform inference
with tf.Session(config=tf.ConfigProto(gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.5))) as sess:
    vid = cv2.VideoCapture(video_path) # must use opencv >= 3.3.1 (install it by 'pip install opencv-python')
    while True:
        return_value, frame = vid.read()
        if return_value == False:
            print('ret:', return_value)
            vid = cv2.VideoCapture(video_path)
            return_value, frame = vid.read()
        if return_value:
            image = Image.fromarray(frame)
        else:
            raise ValueError("No image!")
            
        img_resized = np.array(image.resize(size=tuple(SIZE)), 
                               dtype=np.float32)
        img_resized = img_resized / 255.
        prev_time = time.time()

        boxes, scores = sess.run(output_tensors, 
                                 feed_dict={input_tensor: 
                                            np.expand_dims(
                                                img_resized, axis=0)})
        boxes, scores, labels = utils.cpu_nms(boxes, 
                                              scores, 
                                              num_classes, 
                                              score_thresh=0.4, 
                                              iou_thresh=0.5)
        image = utils.draw_boxes(image, boxes, scores, labels, 
                                 classes, SIZE, show=False)

        curr_time = time.time()
        exec_time = curr_time - prev_time
        result = np.asarray(image)
        info = "time:" + str(round(1000*exec_time, 2)) + " ms, FPS: " + str(round((1000/(1000*exec_time)),1))
        cv2.putText(result, text=info, org=(50, 70), 
                    fontFace=cv2.FONT_HERSHEY_SIMPLEX,
                    fontScale=1, color=(255, 0, 0), thickness=2)
        #cv2.namedWindow("result", cv2.WINDOW_AUTOSIZE)
        cv2.imshow("result", result)
        if cv2.waitKey(10) & 0xFF == ord('q'): break

ardianumam · March 5, 2019, 5:39am

Hi Saurabh,

As I said in post #10, my problem was solved by updating TensorFlow from 1.9 to 1.11 (1.11 was the newest version available for the Jetson TX2 at that time).
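
A quick way to guard against running with a TensorFlow build that is too old is to compare version numbers before converting. This is a minimal sketch: the 1.11 threshold is taken from the post above, and `parse_version` and `is_at_least` are hypothetical helper names. Comparing the raw strings would be wrong here, because "1.9.0" sorts after "1.11.0" as text.

```python
def parse_version(version):
    """Turn a release string such as '1.11.0' or '1.11.0-rc1' into a tuple of ints."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as '-rc1'
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_at_least(installed, required):
    """True if the installed version is the same as or newer than the required one."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so that '1.11' equals '1.11.0'
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    for installed in ("1.9.0", "1.11.0"):
        print(installed, "is new enough:", is_at_least(installed, "1.11.0"))
```

In a real script the first argument would be `tf.__version__`, for example `is_at_least(tf.__version__, "1.11.0")`.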

kamatrohan13 · July 18, 2019, 5:31pm

I'm using an NVIDIA GTX 1080 GPU, CUDA 10.0, tensorflow-gpu 1.14.0, cuDNN 7.4.2, and TensorRT 5.1.5.0.
I have 2 issues:

I'm using the ardianumam/Tensorflow-TensorRT GitHub repository to try out TRT.
I believe that even after using TRT, both graphs (the optimized one and the original TF graph) are the same. There is no improvement in performance, and I do not see any TRT engine nodes in TensorBoard.
a. Is there a way to verify that the generated graph is actually a TRT graph?
b. Does TRT provide any logs? I have used nvprof, but I do not find any TRT processes.
c. I read somewhere that the NVIDIA 1080 does not support FP16, yet I used it and got no errors. The issue is that the results are the same with any precision mode.

Thank you in advance!
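
For question (a), one option is to count op types in the converted graph, as the scripts earlier in this thread do for 'TRTEngineOp'. The sketch below is only illustrative: `op_histogram` and `has_trt_engines` are made-up helper names, and a stand-in object replaces a real GraphDef so that the example runs without TensorFlow installed.

```python
from collections import Counter
from types import SimpleNamespace


def op_histogram(graph_def):
    """Count nodes per op type for any object whose .node items expose .op."""
    return Counter(str(node.op) for node in graph_def.node)


def has_trt_engines(graph_def):
    """True if at least one node was replaced by a TensorRT engine."""
    return op_histogram(graph_def)["TRTEngineOp"] > 0


if __name__ == "__main__":
    # Stand-in for a real GraphDef (assumption: same .node / .op layout).
    fake_graph = SimpleNamespace(node=[
        SimpleNamespace(op="Placeholder"),
        SimpleNamespace(op="TRTEngineOp"),
        SimpleNamespace(op="Conv2D"),
    ])
    print(op_histogram(fake_graph).most_common())
    print("has TRT engines:", has_trt_engines(fake_graph))
```

With the `read_pb_graph` helper posted earlier, the same check would be `has_trt_engines(read_pb_graph("./YOLOv3/TensorRT_YOLOv3_2.pb"))`. If it returns False, the saved graph is effectively the original TensorFlow graph.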

652209287 · July 22, 2019, 8:15am

Same question: how can I check whether a graph generated by create_inference_graph is actually an optimized graph?
BTW, how do I use the C/C++ API for tensorflow-tensorrt?

Thx

kamatrohan13 · July 22, 2019, 3:15pm

It worked for me; try out the NGC container.
