TensorFlow 1.7 with TensorRT fails

Hello everyone,
I installed TensorFlow 1.7 with TensorRT integration from https://devtalk.nvidia.com/default/topic/1031300/jetson-tx2/tensorflow-1-7-wheel-with-jetpack-3-2-/2#reply. I tried running some simple code, but it fails with

2018-04-05 11:48:57.688753: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:865] ARM64 does not support NUMA - returning NUMA node zero
2018-04-05 11:48:57.688893: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 0
terminate called after throwing an instance of 'std::out_of_range'
  what():  basic_string::substr: __pos (which is 13) > this->size() (which is 12)
Aborted (core dumped)

For clarification, here is the code that should load the model.

self.__graph_def = self.load_graphdef(graphfilename)
self.__graph = trt.create_inference_graph(self.__graph_def,
        outputs=["Argmax_Image:0"],
        max_batch_size=1,
        max_workspace_size_bytes=2000000000,
        precision_mode="FP16")
self.__x = self.__graph.get_tensor_by_name("Input:0")
self.__y = self.__graph.get_tensor_by_name("Argmax_Image:0")

gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.90)
self.__sess = tf.Session(graph=self.__graph, config=tf.ConfigProto(gpu_options=gpu_options))
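
(load_graphdef is not shown above; a minimal sketch of what it does, assuming a standard frozen TensorFlow 1.x .pb file:)

import tensorflow as tf

def load_graphdef(graphfilename):
    # Read a frozen GraphDef (.pb) from disk; assumes the model was
    # frozen, e.g. with tf.graph_util.convert_variables_to_constants.
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(graphfilename, "rb") as f:
        graph_def.ParseFromString(f.read())
    return graph_def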

Has anyone had a similar issue, or an idea of how to resolve it?

Hi lil_scorpion, our guess is that your max_workspace_size_bytes is set too high. Please see the tftrt test script, which uses a lower setting:

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/tensorrt/test/test_tftrt.py

trt_graph = trt.create_inference_graph(
      input_graph_def=orig_graph,
      outputs=["output"],
      max_batch_size=inp_dims[0],
      max_workspace_size_bytes=1 << 25,
      precision_mode="FP32",  # TRT Engine precision "FP32","FP16" or "INT8"
      minimum_segment_size=2  # minimum number of nodes in an engine
  )

(note that 1 << 25 == 33554432 bytes, i.e. 32 MB, much smaller than 2000000000 bytes, roughly 2 GB)
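
Applied to your snippet, the call would look like this (a sketch that only lowers the workspace size and keeps your other settings):

self.__graph = trt.create_inference_graph(self.__graph_def,
        outputs=["Argmax_Image:0"],
        max_batch_size=1,
        max_workspace_size_bytes=1 << 25,  # 32 MB instead of ~2 GB
        precision_mode="FP16")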

Hi dusty_nv, unfortunately a smaller workspace size did not help; the same error occurs.
Any further ideas?

Hi,

Could you set precision_mode="FP32" for a test?
Also, does your model have any conv2d layers with padding='SAME'?
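
A quick way to check is to scan the GraphDef for Conv2D ops (a sketch; graph_def stands for your loaded GraphDef):

# Print the padding attribute of every Conv2D node in the graph.
for node in graph_def.node:
    if node.op == "Conv2D":
        print(node.name, node.attr["padding"].s)  # b'SAME' or b'VALID'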

Thanks.

Hi lil_scorpion,

Have you tried the suggestion in comment #4?
Does it help? Is there any result you can share?

Thanks

Hi kaycc and AastaLLL,
unfortunately the system went through a hardware revision and I got sick; I hope I can test the suggestion and share the result on Friday.

Hi,
I was now able to verify the behaviour with FP32, but the exact same error occurs as with the other precisions.
Yes, my model has convolutions with padding="SAME"; more precisely, it is a GoogLeNet FCN model.

Hi,

We have a dimension bug in the convolution op with padding='SAME'.
Check this topic for details: https://devtalk.nvidia.com/default/topic/1028045

We have already fixed this issue in TensorRT 4.0, and the fix will be included in our next JetPack release.
If possible, please use padding='VALID' to avoid this bug.
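
As an illustration of the workaround, a SAME convolution can often be emulated with explicit zero padding followed by padding='VALID' (a sketch for a stride-1 3x3 convolution; pad widths must be adjusted for other kernel sizes and strides):

import tensorflow as tf

x = tf.placeholder(tf.float32, [1, 224, 224, 3])  # example NHWC input
w = tf.get_variable("w", shape=[3, 3, 3, 64])     # 3x3 kernel

# For a 3x3 kernel with stride 1, SAME pads one zero pixel per side, so
# padding explicitly and convolving with VALID yields the same output shape.
x_pad = tf.pad(x, [[0, 0], [1, 1], [1, 1], [0, 0]])
y = tf.nn.conv2d(x_pad, w, strides=[1, 1, 1, 1], padding="VALID")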

Thanks

Hi, I want to know: what is the corresponding name of the Python function "trt.create_inference_graph" in TensorRT C++ inference?

Hi,

trt.create_inference_graph is a Python wrapper that creates the TensorRT engine; the underlying implementation is in C++.
You can find the detailed implementation here:
[url]https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/tensorrt/python/trt_convert.py[/url]

Thanks.

Hello AastaLLL, thanks for the response. I now realize I can use TensorRT C++ inference to get faster speed by loading the model in UFF format. However, I want TensorRT C++ inference to load the model from a *.pb file, just as trt.create_inference_graph does in Python. Is there a corresponding function in TensorRT C++ inference?

Hi,

Here is a sample for your reference:
[url]https://github.com/AastaNV/ChatBot/blob/master/src/tensorNet.cpp#L55[/url]
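
In practice, the frozen .pb is usually converted to UFF offline in Python and then loaded in C++ with the UFF parser. A minimal conversion sketch, assuming the TensorRT uff Python package is installed; "model.pb" and the node name are placeholders:

import uff

# Convert a frozen TensorFlow graph (.pb) to UFF so the C++ UFF parser
# (nvuffparser) can load it.
uff.from_tensorflow_frozen_model(
    "model.pb",
    output_nodes=["Argmax_Image"],
    output_filename="model.uff")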

Thanks.

Please write it as follows, instead of using "Argmax_Image:0" in outputs (create_inference_graph expects plain node names; the ":0" tensor-output suffix is what triggers the C++ string substr error):

self.__graph = trt.create_inference_graph(self.__graph_def, 
    outputs=["Argmax_Image"], 
    max_batch_size=100,
    max_workspace_size_bytes=1 << 25,
    precision_mode="FP32")