TF-TRT issue

Hi,
I’m trying to use the repository to test TF-TRT on the Jetson TX2. I successfully installed TensorFlow 1.9 with pip3, as described in this link (https://devtalk.nvidia.com/default/topic/1038957/jetson-tx2/tensorflow-for-jetson-tx2-/)

Then I followed this procedure, which all went well:
https://github.com/NVIDIA-AI-IOT/tf_trt_models

But when I try to run the sample with python3:

from tf_trt_models.detection import download_detection_model

I get the error

ImportError: No module named 'object_detection'

Does anyone have an idea?

Thanks in advance

Hi,

Did you run this installation script first?

./install.sh python3
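
A quick way to confirm that the installer targeted the same interpreter you run the sample with is to ask that interpreter which modules it can find. A minimal sketch, assuming Python 3.4+ (for importlib.util.find_spec); the module list is simply taken from the import errors reported in this thread:

```python
import importlib.util
import sys

# Module names taken from the ImportErrors reported in this thread.
MODULES = ["object_detection", "google.protobuf", "tf_trt_models"]


def missing_modules(names):
    """Return the names in `names` that the running interpreter cannot locate."""
    missing = []
    for name in names:
        try:
            # For dotted names, find_spec imports the parent package first.
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package is absent or a module has no spec.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    for name in missing_modules(MODULES):
        print("missing:", name)
```

Run it with the same command you use for the sample (for example python3 check_modules.py, a hypothetical file name); any module listed as missing was installed for a different interpreter, or not installed at all.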

Thanks.

Hi,

I only ran:

./install.sh

without specifying a Python version.

If I use your command, I get these errors during installation:

warning: no files found matching 'Doc/*'
warning: no files found matching '*.pyx' under directory 'Cython/Debugger/Tests'
warning: no files found matching '*.pxd' under directory 'Cython/Debugger/Tests'
warning: no files found matching '*.pxd' under directory 'Cython/Utility'
warning: no files found matching 'pyximport/README'

even though I do have Cython installed…

And when I use Python 2.7 to import the TF-TRT module (after a full installation that completed without errors), I get this error:

ImportError: No module named google.protobuf

I don’t understand what’s missing…

I finally managed to install it correctly by reinstalling Cython. I wonder whether I’ll have to reinstall it every time…

Anyway, I tried to create my own Python script from this link (https://github.com/NVIDIA-AI-IOT/tf_trt_models/blob/master/examples/detection/detection.ipynb).
I basically copied all the commands and executed the script:

#!/usr/bin/env python

from PIL import Image
import sys
import os
import urllib
import tensorflow.contrib.tensorrt as trt
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import tensorflow as tf
import numpy as np
import time
from tf_trt_models.detection import download_detection_model, build_detection_graph

MODEL = 'ssd_mobilenet_v1_coco'
DATA_DIR = './data/'
CONFIG_FILE = MODEL + '.config'   # ./data/ssd_mobilenet_v1_coco.config
CHECKPOINT_FILE = 'model.ckpt'    # ./data/ssd_mobilenet_v1_coco/model.ckpt
IMAGE_PATH = './data/huskies.jpg'

# Download the pretrained model #
config_path, checkpoint_path = download_detection_model(MODEL, 'data')

# Build the frozen graph #
frozen_graph, input_names, output_names = build_detection_graph(
    config=config_path,
    checkpoint=checkpoint_path,
    score_threshold=0.3,
    batch_size=1
)

# Optimize the model with TensorRT #
print(output_names)

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP16',
    minimum_segment_size=20
)

with open(DATA_DIR + MODEL + '_trt.pb', 'wb') as f:  # e.g. ./data/ssd_mobilenet_v1_coco_trt.pb
    f.write(trt_graph.SerializeToString())

# Create session and load graph #

tf_config = tf.ConfigProto()
tf_config.gpu_options.allow_growth = True

tf_sess = tf.Session(config=tf_config)

tf.import_graph_def(trt_graph, name='')

tf_input = tf_sess.graph.get_tensor_by_name(input_names[0] + ':0')
tf_scores = tf_sess.graph.get_tensor_by_name('detection_scores:0')
tf_boxes = tf_sess.graph.get_tensor_by_name('detection_boxes:0')
tf_classes = tf_sess.graph.get_tensor_by_name('detection_classes:0')
tf_num_detections = tf_sess.graph.get_tensor_by_name('num_detections:0')

# Load and Preprocess Image #
image = Image.open(IMAGE_PATH)

plt.imshow(image)

image_resized = np.array(image.resize((300, 300)))
image = np.array(image)

# Run network on Image #
scores, boxes, classes, num_detections = tf_sess.run([tf_scores, tf_boxes, tf_classes, tf_num_detections], feed_dict={
    tf_input: image_resized[None, ...]
})

boxes = boxes[0] # index by 0 to remove batch dimension
scores = scores[0]
classes = classes[0]
num_detections = int(num_detections[0])  # range() below requires an int, not a float

# Display Results #
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

ax.imshow(image)

# plot boxes exceeding score threshold
for j in range(num_detections):
    # scale box to image coordinates
    box = boxes[j] * np.array([image.shape[0], image.shape[1], image.shape[0], image.shape[1]])

    # display rectangle
    patch = patches.Rectangle((box[1], box[0]), box[3] - box[1], box[2] - box[0], color='g', alpha=0.3)
    ax.add_patch(patch)

    # display class index and score
    plt.text(x=box[1] + 10, y=box[2] - 10, s='%d (%0.2f) ' % (classes[j], scores[j]), color='w')

plt.savefig(DATA_DIR + 'detections.png')  # the Agg backend cannot open a window, so save the figure instead

# Benchmark #
num_samples = 50

t0 = time.time()
for i in range(num_samples):
    scores, boxes, classes, num_detections = tf_sess.run([tf_scores, tf_boxes, tf_classes, tf_num_detections], feed_dict={
        tf_input: image_resized[None, ...]
    })
t1 = time.time()
print('Average runtime: %f seconds' % (float(t1 - t0) / num_samples))

# Close session to release resources #
tf_sess.close()
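
The benchmark above times every run after a single earlier inference; the first calls of a session can include one-time costs (memory allocation, kernel selection), so a few untimed warm-up calls tend to give a steadier average. A minimal sketch of such a timing helper; benchmark and warmup are illustrative names, and in the script you would pass a lambda wrapping the tf_sess.run(...) call:

```python
import time


def benchmark(fn, num_samples=50, warmup=5):
    """Return the mean wall-clock time of fn() in seconds, excluding warm-up calls."""
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    # Untimed calls absorb one-time costs such as allocation or kernel selection.
    for _ in range(warmup):
        fn()
    # perf_counter is a monotonic, high-resolution clock suited to interval timing.
    start = time.perf_counter()
    for _ in range(num_samples):
        fn()
    return (time.perf_counter() - start) / num_samples


if __name__ == "__main__":
    # Stand-in workload; in the script this would be something like
    # lambda: tf_sess.run([tf_scores, tf_boxes, tf_classes, tf_num_detections],
    #                     feed_dict={tf_input: image_resized[None, ...]})
    avg = benchmark(lambda: sum(range(10000)), num_samples=20, warmup=2)
    print("Average runtime: %f seconds" % avg)
```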

Here is the error I get every time (always in the trt.create_inference_graph call):

python3 detection_tf_trt.py 
2018-11-28 14:06:23.109193: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:864] ARM64 does not support NUMA - returning NUMA node zero
2018-11-28 14:06:23.109418: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties: 
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.66GiB freeMemory: 2.28GiB
2018-11-28 14:06:23.109482: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2018-11-28 14:06:24.274927: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-28 14:06:24.275049: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958]      0 
2018-11-28 14:06:24.275077: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   N 
2018-11-28 14:06:24.275254: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1774 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
WARNING:tensorflow:From /home/nvidia/.local/lib/python3.5/site-packages/object_detection-0.1-py3.5.egg/object_detection/exporter.py:356: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
2018-11-28 14:07:40.012185: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2018-11-28 14:07:40.012332: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-28 14:07:40.012377: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958]      0 
2018-11-28 14:07:40.012432: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   N 
2018-11-28 14:07:40.012569: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1774 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2018-11-28 14:08:33.326804: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2018-11-28 14:08:33.326981: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-28 14:08:33.327012: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958]      0 
2018-11-28 14:08:33.327044: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   N 
2018-11-28 14:08:33.327152: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1774 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2018-11-28 14:08:46.371327: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2018-11-28 14:08:46.371478: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-28 14:08:46.371510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958]      0 
2018-11-28 14:08:46.371535: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   N 
2018-11-28 14:08:46.371636: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1774 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
['detection_boxes', 'detection_classes', 'detection_scores', 'num_detections']
2018-11-28 14:09:41.088915: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 0
2018-11-28 14:09:53.230506: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:438] MULTIPLE tensorrt candidate conversion: 7
Segmentation fault (core dumped)

Note that I installed TensorFlow 1.9.
I don’t understand why it says:

I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 0

Also, how can I free up some RAM?

totalMemory: 7.66GiB freeMemory: 2.28GiB
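
On the TX2 the CPU and GPU share the same physical RAM, so the freeMemory value TensorFlow reports is likely bounded by whatever the rest of the system leaves free; stopping other processes (desktop session, browsers, other Python jobs) is the usual way to raise it. A minimal sketch that reads the standard Linux /proc/meminfo file to check how much memory is available before starting the script:

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value.

    Most fields are in kB; a few counters (e.g. HugePages_Total) have no unit.
    """
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        parts = value.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def available_mib(path="/proc/meminfo"):
    """Return MemAvailable in MiB, falling back to MemFree on very old kernels."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    kb = info.get("MemAvailable", info.get("MemFree", 0))
    return kb / 1024.0


if __name__ == "__main__":
    if os.path.exists("/proc/meminfo"):
        print("Available memory: %.0f MiB" % available_mib())
```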

Thank you in advance

Hi,

You have duplicate TensorRT versions:

2018-11-28 14:09:53.230506: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:438] MULTIPLE tensorrt candidate conversion: 7

And based on the TensorFlow log, your CUDA library may also have an issue.
It’s recommended to reflash your device with JetPack to get a clean environment.

Thanks.

Hi,
it’s not really possible for me to reflash completely: I have done too much work on it and gone through lots of installation steps.

Is it possible to completely purge TensorRT and reinstall it properly?

Concerning CUDA, do you think the problem could be related to OpenCV 3.4.1, which I installed manually? Again, is it possible to purge CUDA and reinstall it properly?

Thanks in advance

EDIT: I removed TensorRT and reinstalled it (removed libnvinfer etc…), but now I need the “uff-converter-tf” module, which neither apt-get install nor pip install can find… I don’t know what to do…

Hi,
I’m still struggling with these errors. Any ideas?

I saw that the Python API isn’t available on the Jetson, so how am I supposed to use TensorRT?

Thanks in advance

Hi,

Try this command to check the TensorRT version in your environment:

$ dpkg -l | grep nvinfer

Please remember that you need to use the CUDA/cuDNN/TensorRT packages from the JetPack installer.
The packages shared on the website are for desktop environments and are not compatible with Jetson.

The TensorRT Python API doesn’t support Jetson because PyCUDA is not available on the ARM platform.
However, the Python interface for the UFF parser works well on Jetson.

This sample uses the Python parser and runs inference directly through the C++ TensorRT API (with a Python wrapper).
So it works on Jetson.

Thanks.

Hi AastaLLL,
thanks for the answer.

Running your command, I get this:

ii  libnvinfer-dev                                           4.0.4-1+cuda9.0                              arm64        TensorRT development libraries and headers
ii  libnvinfer-samples                                       4.0.4-1+cuda9.0                              arm64        TensorRT samples and documentation
ii  libnvinfer4                                              4.0.4-1+cuda9.0                              arm64        TensorRT runtime libraries

I also executed

$ dpkg -l | grep cuda

to get the CUDA version, and here is what I get:

ii  cuda-command-line-tools-9-0                              9.0.252-1                                    arm64        CUDA command-line tools
ii  cuda-core-9-0                                            9.0.252-1                                    arm64        CUDA core tools
ii  cuda-cublas-9-0                                          9.0.252-1                                    arm64        CUBLAS native runtime libraries
ii  cuda-cublas-dev-9-0                                      9.0.252-1                                    arm64        CUBLAS native dev links, headers
ii  cuda-cudart-9-0                                          9.0.252-1                                    arm64        CUDA Runtime native Libraries
ii  cuda-cudart-dev-9-0                                      9.0.252-1                                    arm64        CUDA Runtime native dev links, headers
ii  cuda-cufft-9-0                                           9.0.252-1                                    arm64        CUFFT native runtime libraries
ii  cuda-cufft-dev-9-0                                       9.0.252-1                                    arm64        CUFFT native dev links, headers
ii  cuda-curand-9-0                                          9.0.252-1                                    arm64        CURAND native runtime libraries
ii  cuda-curand-dev-9-0                                      9.0.252-1                                    arm64        CURAND native dev links, headers
ii  cuda-cusolver-9-0                                        9.0.252-1                                    arm64        CUDA solver native runtime libraries
ii  cuda-cusolver-dev-9-0                                    9.0.252-1                                    arm64        CUDA solver native dev links, headers
ii  cuda-cusparse-9-0                                        9.0.252-1                                    arm64        CUSPARSE native runtime libraries
ii  cuda-cusparse-dev-9-0                                    9.0.252-1                                    arm64        CUSPARSE native dev links, headers
ii  cuda-documentation-9-0                                   9.0.252-1                                    arm64        CUDA documentation
ii  cuda-driver-dev-9-0                                      9.0.252-1                                    arm64        CUDA Driver native dev stub library
ii  cuda-libraries-dev-9-0                                   9.0.252-1                                    arm64        CUDA Libraries 9.0 development meta-package
ii  cuda-license-9-0                                         9.0.252-1                                    arm64        CUDA licenses
ii  cuda-misc-headers-9-0                                    9.0.252-1                                    arm64        CUDA miscellaneous headers
ii  cuda-npp-9-0                                             9.0.252-1                                    arm64        NPP native runtime libraries
ii  cuda-npp-dev-9-0                                         9.0.252-1                                    arm64        NPP native dev links, headers
ii  cuda-nvgraph-9-0                                         9.0.252-1                                    arm64        NVGRAPH native runtime libraries
ii  cuda-nvgraph-dev-9-0                                     9.0.252-1                                    arm64        NVGRAPH native dev links, headers
ii  cuda-nvml-dev-9-0                                        9.0.252-1                                    arm64        NVML native dev links, headers
ii  cuda-nvrtc-9-0                                           9.0.252-1                                    arm64        NVRTC native runtime libraries
ii  cuda-nvrtc-dev-9-0                                       9.0.252-1                                    arm64        NVRTC native dev links, headers
ii  cuda-repo-l4t-9-0-local                                  9.0.252-1                                    arm64        cuda repository configuration files
ii  cuda-samples-9-0                                         9.0.252-1                                    arm64        CUDA example applications
ii  cuda-toolkit-9-0                                         9.0.252-1                                    arm64        CUDA Toolkit 9.0 meta-package
ii  libcudnn7                                                7.0.5.15-1+cuda9.0                           arm64        cuDNN runtime libraries
ii  libcudnn7-dev                                            7.0.5.15-1+cuda9.0                           arm64        cuDNN development libraries and headers
ii  libcudnn7-doc                                            7.0.5.15-1+cuda9.0                           arm64        cuDNN documents and samples
ii  libnvinfer-dev                                           4.0.4-1+cuda9.0                              arm64        TensorRT development libraries and headers
ii  libnvinfer-samples                                       4.0.4-1+cuda9.0                              arm64        TensorRT samples and documentation
ii  libnvinfer4                                              4.0.4-1+cuda9.0                              arm64        TensorRT runtime libraries
ii  nv-tensorrt-repo-ubuntu1604-ga-cuda9.0-trt3.0.4-20180208 1-1                                          arm64        nv-tensorrt repository configuration files
ii  tensorrt                                                 3.0.4-1+cuda9.0                              arm64        Meta package of TensorRT

From what I see, I have CUDA 9.0, cuDNN 7.0, libnvinfer 4.0.4, but also a tensorrt 3.0.4 meta package… I suppose this is where the problem comes from, right?
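
To compare versions like this without scrolling through the full listing, the installed rows of dpkg -l can be filtered programmatically. A minimal sketch; the keywords are illustrative. In each row the first column after ii is the package name (what apt-get expects) and the second is the version:

```python
import shutil
import subprocess

# Illustrative keywords for the packages discussed in this thread.
KEYWORDS = ("nvinfer", "tensorrt", "cudnn")


def parse_dpkg_list(text, keywords=()):
    """Return {package: version} for installed ('ii') rows of `dpkg -l` output.

    If keywords are given, keep only packages whose name contains one of them.
    """
    packages = {}
    for line in text.splitlines():
        fields = line.split()
        # Installed rows look like: ii  <name>  <version>  <arch>  <description...>
        if len(fields) < 3 or fields[0] != "ii":
            continue
        name, version = fields[1], fields[2]
        if not keywords or any(k in name for k in keywords):
            packages[name] = version
    return packages


if __name__ == "__main__":
    if shutil.which("dpkg"):
        out = subprocess.check_output(["dpkg", "-l"]).decode("utf-8", "replace")
        for name, version in sorted(parse_dpkg_list(out, KEYWORDS).items()):
            print(name, version)
```

On a machine without dpkg the script simply prints nothing.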

Hi,

Yes. Could you try removing these two packages?

ii  nv-tensorrt-repo-ubuntu1604-ga-cuda9.0-trt3.0.4-20180208 1-1                                          arm64        nv-tensorrt repository configuration files
ii  tensorrt                                                 3.0.4-1+cuda9.0                              arm64        Meta package of TensorRT

Like this:

sudo apt-get purge nv-tensorrt-repo-ubuntu1604-ga-cuda9.0-trt3.0.4-20180208
sudo apt-get purge tensorrt

Thanks.

Hi,
thanks for answering.

I purged them as you said… but I still get the same error log (MULTIPLE tensorrt candidate conversion: 7).

Any other idea ?

Hi,

We found that your issue may not come from a duplicate TensorRT package.
Could you update your script with this setting:

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP16',
    minimum_segment_size=50
)

Thanks.

Hi,

sorry for the delay. In the end I followed your first advice and completely re-flashed the Jetson; I’m currently reinstalling all the Python libraries etc…

I’ll let you know when I’m done and have tried your modification.

Hi guys,
I’m facing another issue concerning the use of TensorFlow on the Jetson.

After re-flashing the Jetson, I’m now using virtualenv and virtualenvwrapper to isolate all the libraries and avoid conflicts.

I managed to install all the necessary libraries, but when I run a script based on a .pb file generated with TensorFlow 1.12, I get two different errors depending on whether I use TF 1.8 or 1.10:

With TF 1.8:

GPU is available!
[_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 268435456), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:0, GPU, 112689152)]
Model path: trained_models/mask_rcnn_plantule_V0_epoch5.pb
<BEGIN Loading Graph>
Traceback (most recent call last):
  File "detect_instances.py", line 596, in <module>
    main(sys.argv)
  File "detect_instances.py", line 440, in main
    tf.import_graph_def(graph_def, name="")
  File "/home/nvidia/PythonEnv/InstallationEnv/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 432, in new_func
    return func(*args, **kwargs)
  File "/home/nvidia/PythonEnv/InstallationEnv/lib/python3.5/site-packages/tensorflow/python/framework/importer.py", line 489, in import_graph_def
    graph._c_graph, serialized, options)  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.NotFoundError: Op type not registered 'NonMaxSuppressionV3' in binary running on tegra-ubuntu. Make sure the Op and Kernel are registered in the binary running in this process.

With TF 1.10 (and TF 1.9):

GPU is available!
[_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 268435456), _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:0, GPU, 1205403648)]
Model path: trained_models/mask_rcnn_plantule_V0_epoch5.pb
<BEGIN Loading Graph>
Traceback (most recent call last):
  File "/home/nvidia/PythonEnv/InstallationEnv/lib/python3.5/site-packages/tensorflow/python/framework/importer.py", line 418, in import_graph_def
    graph._c_graph, serialized, options)  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: NodeDef mentions attr 'T' not in Op<name=NonMaxSuppressionV3; signature=boxes:float, scores:float, max_output_size:int32, iou_threshold:float, score_threshold:float -> selected_indices:int32>; NodeDef: ROI_1/rpn_non_max_suppression/NonMaxSuppressionV3 = NonMaxSuppressionV3[T=DT_FLOAT](ROI_1/strided_slice_21, ROI_1/strided_slice_22, ROI_1/rpn_non_max_suppression/NonMaxSuppressionV3/max_output_size, ROI_1/rpn_non_max_suppression/iou_threshold, ROI_1/rpn_non_max_suppression/score_threshold). (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).

I’ve been searching the forums for a while and can’t find any solution, except perhaps that the .pb was generated with a newer version of TF than the one I’m using for inference… What do you think?
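
The two tracebacks are consistent with that explanation: under TF 1.8 the NonMaxSuppressionV3 op is not registered at all, and under TF 1.9/1.10 it is registered but without the T attribute that the 1.12-generated node sets. A minimal sketch that lists such mismatches before importing the graph. It takes a plain registry dict (op name to declared attribute names) and any GraphDef-like object; with TensorFlow 1.x the graph could be read with tf.GraphDef() and ParseFromString, and the registry could be built from the internal tensorflow.python.framework.op_def_registry.get_registered_ops(), which is an assumption about a non-public API:

```python
def check_graph_ops(graph_def, registry):
    """Report ops/attrs in graph_def that the given op registry does not know.

    registry: dict mapping op name -> set of attribute names declared by its OpDef.
    Assumed TF 1.x construction (internal API):
      {n: {a.name for a in op.attr} for n, op in op_def_registry.get_registered_ops().items()}
    Returns (unknown_ops, unknown_attrs), where unknown_attrs maps node name -> attr names.
    """
    unknown_ops = set()
    unknown_attrs = {}
    for node in graph_def.node:
        if node.op not in registry:
            unknown_ops.add(node.op)
            continue
        # Attributes starting with "_" are internal annotations, not declared in the OpDef.
        extra = sorted(a for a in node.attr
                       if not a.startswith("_") and a not in registry[node.op])
        if extra:
            unknown_attrs[node.name] = extra
    return unknown_ops, unknown_attrs


if __name__ == "__main__":
    from types import SimpleNamespace

    # Toy registry mirroring the TF 1.10 message: NonMaxSuppressionV3 declares no "T" attr.
    registry = {"NonMaxSuppressionV3": set()}
    graph = SimpleNamespace(node=[SimpleNamespace(
        name="ROI_1/rpn_non_max_suppression/NonMaxSuppressionV3",
        op="NonMaxSuppressionV3", attr={"T": None})])
    print(check_graph_ops(graph, registry))
```

An empty result for both values does not guarantee the import will succeed, since attribute types and defaults can also differ between versions.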

For reference, I installed the different TF versions using this link:

https://devtalk.nvidia.com/default/topic/1031300/jetson-tx2/tensorflow-1-8-wheel-with-jetpack-3-2-/

Thank you in advance

I have the same problem. Has your problem been solved?

Nope, still facing those errors…

I even tried running it on my host, but I get a different kind of issue, even though this time I have TensorFlow 1.12:

GPU is not recognize!
[_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 268435456, 1971038397091143842), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 16963126580009595096)]
Model path: trained_models/mask_rcnn_aphid_V0_epoch20.pb
<BEGIN Loading Graph>
<END Loading Graph>
<BEGIN DETECTION IN VIDEO>
[ERROR:0] VIDEOIO(createMotionJpegWriter(filename, fourcc, fps, frameSize, isColor)): raised OpenCV exception:

OpenCV(3.4.2) /io/opencv/modules/videoio/src/cap_mjpeg_encoder.cpp:427: error: (-215:Assertion failed) fps >= 1 in function 'open'


Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
<END DETECTION IN VIDEO>

From what I can see, I’d say it’s a codec error or something similar, but even after updating, upgrading and installing several libraries, the error still occurs…
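
The assertion in that log concerns the frame rate passed to the writer (fps >= 1), so the FPS value may be worth checking alongside the codec: if the input video's rate cannot be read, VideoCapture.get(cv2.CAP_PROP_FPS) can return 0, and passing that to cv2.VideoWriter would trip exactly this check. A minimal sketch of a guard, assuming the script takes the writer's FPS from the input capture; safe_fps and the 30 FPS fallback are hypothetical choices:

```python
import math


def safe_fps(reported_fps, fallback=30.0):
    """Return a frame rate acceptable to cv2.VideoWriter, which asserts fps >= 1."""
    try:
        fps = float(reported_fps)
    except (TypeError, ValueError):
        return fallback
    # A capture that cannot read its rate may report 0; NaN and inf are rejected too.
    if not math.isfinite(fps) or fps < 1:
        return fallback
    return fps


if __name__ == "__main__":
    # Hypothetical usage with OpenCV:
    #   fps = safe_fps(capture.get(cv2.CAP_PROP_FPS))
    #   writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
    for value in (0.0, 25.0, float("nan"), None):
        print(value, "->", safe_fps(value))
```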

2018-11-28 14:09:41.088915: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 0
2018-11-28 14:09:53.230506: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:438] MULTIPLE tensorrt candidate conversion: 7
Segmentation fault (core dumped)

Hi, I want to know whether this problem has been solved.

Nope, but you can try the modifications proposed by AastaLLL here.
Actually, I reflashed the Jetson entirely, but I haven’t tried the TF-TRT scripts again.

I’ve tried, but it still doesn’t work.

Hi,

It looks like you have already filed another topic for the new issue after re-flashing:
https://devtalk.nvidia.com/default/topic/1046423/jetson-tx2/tensorflow-issue-nonmaxsuppressionv3-in-binary/post/5309955/#5309955

Let’s track the TensorFlow InvalidArgumentError issue in topic 1046423 directly.

Thanks.