TensorFlow RNN UFF conversion not yet supported?

Hello, I am testing some TensorFlow RNN code for use with TensorRT, but converting it to UFF fails.

import tensorflow as tf
import numpy as np

# Build a one-step static RNN around a single-unit BasicRNNCell (TensorFlow 1.x API).
with tf.name_scope("Operation"):
    batch_size = 10
    rnn_cell = tf.nn.rnn_cell.BasicRNNCell(1)
    initial_state = rnn_cell.zero_state(batch_size, dtype=tf.float32)

    # static_rnn expects a list with one tensor per time step.
    inputs = [tf.placeholder(tf.float32, shape=[batch_size, 1])]

    outputs, state = tf.nn.static_rnn(rnn_cell, inputs,
                                      initial_state=initial_state,
                                      dtype=tf.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # write_graph serializes the graph definition only; variable values are not frozen into it.
    tf.train.write_graph(sess.graph, "", "output.pb", as_text=False)

    for i in range(1000):
        feed_dict = {inputs[0]: np.repeat([[1]], batch_size, axis=0)}
        sess.run([outputs, state], feed_dict=feed_dict)

    print("Done")

This is my sample code. I used convert-to-uff to convert output.pb to output.uff, but it produces the following error:

Traceback (most recent call last):
  File "/home/skyser2003/.local/bin/convert-to-uff", line 11, in <module>
    sys.exit(main())
  File "/home/skyser2003/.local/lib/python3.4/site-packages/uff/bin/convert_to_uff.py", line 79, in main
    output_filename=args.output
  File "/home/skyser2003/.local/lib/python3.4/site-packages/uff/converters/tensorflow/conversion_helpers.py", line 159, in from_tensorflow_frozen_model
    return from_tensorflow(graphdef, output_nodes, preprocessor, **kwargs)
  File "/home/skyser2003/.local/lib/python3.4/site-packages/uff/converters/tensorflow/conversion_helpers.py", line 132, in from_tensorflow
    name="main")
  File "/home/skyser2003/.local/lib/python3.4/site-packages/uff/converters/tensorflow/converter.py", line 77, in convert_tf2uff_graph
    uff_graph, input_replacements)
  File "/home/skyser2003/.local/lib/python3.4/site-packages/uff/converters/tensorflow/converter.py", line 64, in convert_tf2uff_node
    op, name, tf_node, inputs, uff_graph, tf_nodes=tf_nodes)
  File "/home/skyser2003/.local/lib/python3.4/site-packages/uff/converters/tensorflow/converter.py", line 43, in convert_layer
    return cls.registry_[op](name, tf_node, inputs, uff_graph, **kwargs)
  File "/home/skyser2003/.local/lib/python3.4/site-packages/uff/converters/tensorflow/converter_functions.py", line 375, in convert_bias_add
    kwargs["tf_nodes"][biases_name])
  File "/home/skyser2003/.local/lib/python3.4/site-packages/uff/converters/tensorflow/converter.py", line 136, in convert_tf2numpy_const_node
    return array.reshape(shape)
ValueError: cannot reshape array of size 0 into shape ()

I looked into the convert_bias_add method, and it seems that the bias tensor used in BasicRNNCell is empty when the converter reads it, which is why it raises the "array of size 0" error. This suggests that TensorRT does not support BasicRNNCell, contrary to what the official documentation says.
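For reference, the final ValueError in the traceback can be reproduced with NumPy alone: a shape of () holds exactly one element, so an array with no elements cannot be reshaped into it. The sketch below only illustrates that last step; reshape_to_scalar is a hypothetical helper, not part of the uff package, and why the converter ended up with an empty array is not shown here.

```python
import numpy as np


def reshape_to_scalar(values):
    """Try to reshape `values` into a 0-d array; return the array or the ValueError."""
    array = np.array(values, dtype=np.float32)
    try:
        # Shape () is a scalar shape and needs exactly one element.
        return array.reshape(())
    except ValueError as err:
        return err


ok = reshape_to_scalar([0.5])   # one element: reshape succeeds
failed = reshape_to_scalar([])  # zero elements: reshape raises ValueError
print(ok)
print(failed)
```

The second call fails in the same way as the last frame of the traceback, which is consistent with the converter finding no data for the bias node.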

So my questions are:

  1. If tf.nn.static_rnn is currently supported, am I doing something wrong?
  2. If it isn’t supported, is there any plan to support it in the near future?

Tested with both TensorRT 4 and the TensorRT 5 RC.

Thanks.

Hello,

I’m able to run your code without error on both TRT 4 and TRT 5. Am I missing something?

root@c2e1e8e462ec:/mnt# dpkg -l | grep nvinfer
ii  libnvinfer-dev              4.1.2-1+cuda9.0                       amd64        TensorRT development libraries and headers
ii  libnvinfer-samples          4.1.2-1+cuda9.0                       amd64        TensorRT samples and documentation
ii  libnvinfer4                 4.1.2-1+cuda9.0                       amd64        TensorRT runtime libraries
ii  python3-libnvinfer          4.1.2-1+cuda9.0                       amd64        Python 3 bindings for TensorRT
ii  python3-libnvinfer-dev      4.1.2-1+cuda9.0                       amd64        Python 3 development package for TensorRT
ii  python3-libnvinfer-doc      4.1.2-1+cuda9.0                       amd64        Documention and samples of python bindings for TensorRT
root@c2e1e8e462ec:/mnt# python test.py
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
2018-10-12 03:18:35.157804: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-10-12 03:18:35.988647: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 0 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:06:00.0
totalMemory: 31.72GiB freeMemory: 31.31GiB
2018-10-12 03:18:36.355028: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 1 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:07:00.0
totalMemory: 31.72GiB freeMemory: 31.31GiB
2018-10-12 03:18:36.725928: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 2 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:0a:00.0
totalMemory: 31.72GiB freeMemory: 31.31GiB
2018-10-12 03:18:37.104838: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 3 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:0b:00.0
totalMemory: 31.72GiB freeMemory: 31.31GiB
2018-10-12 03:18:37.520510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 4 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:85:00.0
totalMemory: 31.72GiB freeMemory: 31.31GiB
2018-10-12 03:18:37.954920: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 5 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:86:00.0
totalMemory: 31.72GiB freeMemory: 31.31GiB
2018-10-12 03:18:38.395103: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 6 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:89:00.0
totalMemory: 31.72GiB freeMemory: 31.31GiB
2018-10-12 03:18:38.845147: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 7 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:8a:00.0
totalMemory: 31.72GiB freeMemory: 31.31GiB
2018-10-12 03:18:38.845597: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1490] Adding visible gpu devices: 0, 1, 2, 3, 4, 5, 6, 7
2018-10-12 03:18:41.578250: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-12 03:18:41.578307: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977]      0 1 2 3 4 5 6 7
2018-10-12 03:18:41.578316: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0:   N Y Y Y Y N N N
2018-10-12 03:18:41.578321: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 1:   Y N Y Y N Y N N
2018-10-12 03:18:41.578326: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 2:   Y Y N Y N N Y N
2018-10-12 03:18:41.578332: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 3:   Y Y Y N N N N Y
2018-10-12 03:18:41.578338: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 4:   Y N N N N Y Y Y
2018-10-12 03:18:41.578362: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 5:   N Y N N Y N Y Y
2018-10-12 03:18:41.578385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 6:   N N Y N Y Y N Y
2018-10-12 03:18:41.578391: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 7:   N N N Y Y Y Y N
2018-10-12 03:18:41.582431: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 30377 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0)
2018-10-12 03:18:42.021879: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 30377 MB memory) -> physical GPU (device: 1, name: Tesla V100-SXM2-32GB, pci bus id: 0000:07:00.0, compute capability: 7.0)
2018-10-12 03:18:42.451782: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 30377 MB memory) -> physical GPU (device: 2, name: Tesla V100-SXM2-32GB, pci bus id: 0000:0a:00.0, compute capability: 7.0)
2018-10-12 03:18:42.885935: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 30377 MB memory) -> physical GPU (device: 3, name: Tesla V100-SXM2-32GB, pci bus id: 0000:0b:00.0, compute capability: 7.0)
2018-10-12 03:18:43.313926: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:4 with 30377 MB memory) -> physical GPU (device: 4, name: Tesla V100-SXM2-32GB, pci bus id: 0000:85:00.0, compute capability: 7.0)
2018-10-12 03:18:43.742701: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:5 with 30377 MB memory) -> physical GPU (device: 5, name: Tesla V100-SXM2-32GB, pci bus id: 0000:86:00.0, compute capability: 7.0)
2018-10-12 03:18:44.177776: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:6 with 30377 MB memory) -> physical GPU (device: 6, name: Tesla V100-SXM2-32GB, pci bus id: 0000:89:00.0, compute capability: 7.0)
2018-10-12 03:18:44.606688: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:7 with 30377 MB memory) -> physical GPU (device: 7, name: Tesla V100-SXM2-32GB, pci bus id: 0000:8a:00.0, compute capability: 7.0)
Done

Hello, I am using the following code to convert the model and test it with TensorRT 4.

import pycuda.driver as cuda
import pycuda.autoinit
 
from tensorrt.parsers import uffparser
import tensorrt as trt
import uff
import numpy as np
 
output_name = "Operation/basic_rnn_cell/Tanh"
uff_model = uff.from_tensorflow_frozen_model("output.pb", [output_name])
 
G_LOGGER = trt.infer.ConsoleLogger(trt.infer.LogSeverity.ERROR)
parser = uffparser.create_uff_parser()
parser.register_input("Test/x_holder", (1, 1, 1), 0)
parser.register_output(output_name)
engine = trt.utils.uff_to_trt_engine(G_LOGGER, uff_model, parser, 1, 1 << 20)
parser.destroy()
 
runtime = trt.infer.create_infer_runtime(G_LOGGER)
context = engine.create_execution_context()
 
inp = np.ones((1,1,1), dtype=np.float32)
output = np.empty(1, dtype=np.float32)
 
d_input = cuda.mem_alloc(inp.nbytes)
d_output = cuda.mem_alloc(output.nbytes)
 
bindings = [int(d_input), int(d_output)]
 
stream = cuda.Stream()
 
cuda.memcpy_htod_async(d_input, inp.tobytes(), stream)
context.enqueue(1, bindings, stream.handle, None)
cuda.memcpy_dtoh_async(output, d_output, stream)
stream.synchronize()
 
print(output)
 
context.destroy()
engine.destroy()
runtime.destroy()

The line ‘uff_model = uff.from_tensorflow_frozen_model("output.pb", [output_name])’ raises the same error.

Also, when I run the ‘dpkg -l | grep nvinfer’ command, I don’t see any results.
I installed TensorRT 4 from the tar package; maybe that’s the problem?

That’s expected: if you installed TRT from the tar package, it won’t show up in the dpkg listing.
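Since dpkg won’t list a tar install, one way to confirm that the Python packages are visible to your interpreter is to check for them directly. This is a minimal sketch using only the standard library; module_report is a hypothetical helper, and the module names are simply the ones imported elsewhere in this thread.

```python
import importlib
import importlib.util


def module_report(names):
    """Map each module name to its version string, or None if it cannot be found."""
    report = {}
    for name in names:
        if importlib.util.find_spec(name) is None:
            report[name] = None  # not visible on this interpreter's import path
            continue
        try:
            module = importlib.import_module(name)
        except Exception as err:  # found, but failed to load (e.g. a missing shared library)
            report[name] = "import failed: {}".format(err)
            continue
        report[name] = getattr(module, "__version__", "unknown version")
    return report


print(module_report(["tensorrt", "uff", "tensorflow"]))
```

A None entry means the package is not on the import path of the Python you are running, which is worth ruling out before looking at the converter itself.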

It would help us debug the symptoms if you could share a small package containing the source, model, and dataset that reproduce the error. You can DM me if you don’t want to share it publicly.

Currently I cannot even run the test code I wrote above, so I am not testing any existing models.
May I ask which versions of Python and TensorFlow you used to test my code?

Hello,

I’m using

root@9df53bfc4c73:/mnt# python -c 'import tensorflow as tf; print(tf.__version__)'
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
1.11.0
root@9df53bfc4c73:/mnt# python --version
Python 3.5.2

Hello,
I looked at your initial answer again, and I think it is just raw TensorFlow output,
not TensorRT output.

https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#import_tf_python

I am trying to follow these instructions. If I understand correctly,
I need to convert a frozen .pb file to a UFF file
and then use that UFF file to run inference. Is that right?

The step I’m having trouble with is the conversion to UFF.
The documentation says you can convert either with the ‘convert-to-uff’ command-line tool or with the Python uff parser module,
but both produce the same error I posted initially.
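Because from_tensorflow_frozen_model, as its name suggests, works on a frozen graph, it may be worth ruling out that output.pb still contains variable nodes, since tf.train.write_graph on its own does not convert variables to constants. This is not confirmed as the cause anywhere in this thread. A minimal sketch, assuming TensorFlow 1.x (tf.GraphDef); find_variable_nodes and variable_nodes_in_pb are hypothetical helpers, and the example node list is made up for illustration.

```python
# Op types that TensorFlow 1.x uses for variables in a GraphDef.
VARIABLE_OPS = {"Variable", "VariableV2", "VarHandleOp"}


def find_variable_nodes(nodes):
    """Return the names of variable nodes; `nodes` is an iterable of (name, op) pairs."""
    return [name for name, op in nodes if op in VARIABLE_OPS]


def variable_nodes_in_pb(path):
    """Load a binary GraphDef (assumes TensorFlow 1.x) and list its variable nodes."""
    import tensorflow as tf  # imported lazily so the helper above runs without TensorFlow

    graph_def = tf.GraphDef()
    with open(path, "rb") as f:
        graph_def.ParseFromString(f.read())
    return find_variable_nodes((node.name, node.op) for node in graph_def.node)


# Illustrative node list, not taken from the real output.pb.
example = [("x", "Placeholder"), ("rnn/bias", "VariableV2"), ("rnn/Tanh", "Tanh")]
print(find_variable_nodes(example))
```

If variable_nodes_in_pb("output.pb") returns a non-empty list, the graph is not frozen. In TensorFlow 1.x the usual way to freeze it before writing the file is tf.graph_util.convert_variables_to_constants.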

Also, I tried converting a simple CNN model, and it worked perfectly.
So I think the problem is that TensorRT does not yet support RNNs.

I upgraded TensorFlow from 1.10.0 to 1.11.0, and the original problem was solved, but other errors started to appear. I’ll update again if I cannot solve the new errors.

Thanks.