No "…TensorRTOptimizer: Graph size after…" line at end of conversion

Hi,

When I’m converting some models to TensorRT, I’m not getting the last line:
“…TensorRTOptimizer: Graph size after: …”
On these models the conversion obviously wasn’t successful, while on others the last line appears and the conversion succeeds.
Any thoughts on why this could be happening would be much appreciated.

My output when it fails is:

2019-12-09 13:41:58.823081: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-09 13:41:58.823238: I tensorflow/core/grappler/devices.cc:55] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 1
2019-12-09 13:41:58.823436: I tensorflow/core/grappler/clusters/single_machine.cc:359] Starting new session
2019-12-09 13:41:58.823736: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-09 13:41:58.823870: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1060 with Max-Q Design major: 6 minor: 1 memoryClockRate(GHz): 1.48
pciBusID: 0000:01:00.0
2019-12-09 13:41:58.823898: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-12-09 13:41:58.823908: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-12-09 13:41:58.823916: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-12-09 13:41:58.823924: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-12-09 13:41:58.823932: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-12-09 13:41:58.823940: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-12-09 13:41:58.823948: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-12-09 13:41:58.823979: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-09 13:41:58.824127: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-09 13:41:58.824249: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-12-09 13:41:58.824264: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-09 13:41:58.824268: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-12-09 13:41:58.824271: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-12-09 13:41:58.824316: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-09 13:41:58.824477: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-09 13:41:58.824602: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4949 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 with Max-Q Design, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-12-09 13:41:58.926672: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:716] Optimization results for grappler item: tf_graph
2019-12-09 13:41:58.926709: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:718] constant folding: Graph size after: 96 nodes (-57), 95 edges (-57), time = 49.648ms.
2019-12-09 13:41:58.926725: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:718] layout: Graph size after: 122 nodes (26), 143 edges (48), time = 11.707ms.
2019-12-09 13:41:58.926728: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:718] constant folding: Graph size after: 122 nodes (0), 143 edges (0), time = 19.06ms.

Maybe this will help too; here is the code that runs the conversion:

import tensorflow as tf
# create_inference_graph lives here in TF 1.14/1.15
from tensorflow.python.compiler.tensorrt import trt_convert as trt

max_work_space = 1 << 25  # same result for higher values; run on a GeForce GTX 1060
tensorRT_precision = "FP16"  # same result for "FP32"

# Freeze all variables except those the caller asked to keep
freeze_var_names = list(
    set(v.op.name for v in tf.global_variables()).difference(keep_var_names or []))
frozen_graph = tf.graph_util.convert_variables_to_constants(
    session, input_graph_def, output_names, freeze_var_names)

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=model_out_orig,
    max_batch_size=1,
    max_workspace_size_bytes=max_work_space,
    precision_mode=tensorRT_precision)

Hi,

The GeForce GTX 1060 has CUDA compute capability 6.1, and FP16 is not supported on devices with compute capability 6.1.
Please refer to the link below:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-archived/tensorrt-601/tensorrt-support-matrix/index.html#hardware-precision-matrix
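As a quick sanity check, the hardware precision matrix linked above can be encoded in a small helper. The capability set below is taken from the TensorRT 6 support matrix; treat it as an illustration and check the matrix for your TensorRT version:

```python
# Compute capabilities with native FP16 support, per the TensorRT 6
# hardware precision matrix (illustrative; verify against the linked
# matrix for your TensorRT version).
FP16_CAPABLE = {(5, 3), (6, 0), (6, 2), (7, 0), (7, 2), (7, 5)}

def supports_fp16(major, minor):
    """Return True if a GPU with this compute capability has native FP16."""
    return (major, minor) in FP16_CAPABLE

# GTX 1060 is compute capability 6.1 -> no native FP16
print(supports_fp16(6, 1))  # False
print(supports_fp16(7, 0))  # True (e.g. Tesla V100)
```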

For supported precision types, there are various ways to check which operators were or were not converted.
Try verbose logging of the model optimization for further debugging:
https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#verbose
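For reference, the verbose-logging section of that guide turns on extra TF-TRT log output via an environment variable; something along these lines (the module list follows the guide, and the script name is a placeholder):

```shell
# Enable verbose TF-TRT logging before running the conversion
# (module list per the TF-TRT user guide; may differ between TF versions).
export TF_CPP_VMODULE=segment=2,convert_graph=2,convert_nodes=2,trt_engine_op=2
python convert_model.py   # hypothetical name of your conversion script
```

With this set, the log shows which subgraphs were segmented and which ops fell back to TensorFlow.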

Please refer to the link below for more options:
https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#uncon-tf-op

Subgraphs with fewer nodes than minimum_segment_size are not converted to TensorRT.

Thanks

SunilJB,

Thanks for the reply, but I haven’t been able to make any progress.
The outcome is the same when using FP32.
I also tried various minimum_segment_size values and nothing helped.

Further suggestions are most appreciated, and I’ll share successful results, if any.

Best

Hi,

Can you please share the sample script and model file to reproduce the issue?

Thanks

Hi,

Sadly I cannot share the model.
The code above is the one I have used - not much else.

Best

Hi,

Can you please share complete verbose log of the model optimization process?

Meanwhile, could you please try the “trtexec” command to test the model?
“trtexec” is useful for benchmarking networks and makes it faster and easier to debug the issue:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
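A minimal trtexec run for a TensorFlow model might look like the following; the file names, tensor names, and shapes are placeholders for your model:

```shell
# Convert the frozen TF graph to UFF first (convert-to-uff ships with the
# TensorRT Python packages), then build and time an engine with trtexec.
# All file/tensor names and shapes below are placeholders.
convert-to-uff frozen_graph.pb -o model.uff -O logits
trtexec --uff=model.uff \
        --uffInput=input,3,224,224 \
        --output=logits \
        --workspace=256
```

If trtexec builds an engine successfully, the problem is likely in the TF-TRT integration rather than in the model itself.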

Another alternative is to convert your model to ONNX using tf2onnx and then convert it to TensorRT using the ONNX parser. Any layers that are not supported need to be replaced by a custom plugin.
https://github.com/onnx/tensorflow-onnx
https://github.com/onnx/onnx-tensorrt/blob/master/operators.md
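The ONNX route could be sketched as below; the tensor names and file names are placeholders for your model:

```shell
# Convert the frozen TensorFlow graph to ONNX (tensor names are placeholders).
python -m tf2onnx.convert --graphdef frozen_graph.pb \
        --inputs input:0 --outputs logits:0 \
        --output model.onnx
# Then build a TensorRT engine from the ONNX file, e.g. with trtexec:
trtexec --onnx=model.onnx --workspace=256
```

Any op that the ONNX parser rejects will be named in the error output, which makes it easier to spot which layer needs a custom plugin.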

Thanks

Many thanks SunilJB

Will try!

SunilJB,

Sorry for the late reply, but I was able to convert my model using “trtexec” on the laptop - many thanks for that!

Happy new 2020