Problem testing TensorRT optimized model

marcoa.portocarrero · June 20, 2019, 3:26pm

I’ve recently managed to train a custom object detection model in TensorFlow using ssd_mobilenet_v1_coco_11_06_2017. I used TensorRT to get an optimized model, and I ran into this problem while trying to run inference with the optimized model:

tegra@nvidia:~/marco/Object_detection$ python3 TRT_object_detection.py 
2019-06-20 10:12:14.683101: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2019-06-20 10:12:14.684393: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x3633da60 executing computations on platform Host. Devices:
2019-06-20 10:12:14.684769: I tensorflow/compiler/xla/service/service.cc:168]   StreamExecutor device (0): <undefined>, <undefined>
2019-06-20 10:12:14.844122: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:965] ARM64 does not support NUMA - returning NUMA node zero
2019-06-20 10:12:14.844629: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x363a7ef0 executing computations on platform CUDA. Devices:
2019-06-20 10:12:14.844699: I tensorflow/compiler/xla/service/service.cc:168]   StreamExecutor device (0): NVIDIA Tegra X2, Compute Capability 6.2
2019-06-20 10:12:14.845794: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.02
pciBusID: 0000:00:00.0
totalMemory: 7.68GiB freeMemory: 5.47GiB
2019-06-20 10:12:14.845874: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-06-20 10:12:17.879692: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-20 10:12:17.879818: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-06-20 10:12:17.879886: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-06-20 10:12:17.880243: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4730 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2019-06-20 10:12:19.962506: W ./tensorflow/core/grappler/optimizers/graph_optimizer_stage.h:241] Failed to run optimizer ArithmeticOptimizer, stage RemoveStackStridedSliceSameAxis node Preprocessor/ResizeToRange/strided_slice_3. Error: Pack node (Preprocessor/ResizeToRange/stack_2) axis attribute is out of bounds: 0
2019-06-20 10:12:22.387329: W ./tensorflow/core/grappler/optimizers/graph_optimizer_stage.h:241] Failed to run optimizer ArithmeticOptimizer, stage RemoveStackStridedSliceSameAxis node Preprocessor/ResizeToRange/strided_slice_3. Error: Pack node (Preprocessor/ResizeToRange/stack_2) axis attribute is out of bounds: 0
Traceback (most recent call last):
  File "/home/tegra/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/tegra/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/tegra/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: {{node BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/ClipToWindow/Gather/GatherV2_1}} has inputs from different frames. The input {{node BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/Gather_1/GatherV2/axis}} is in frame 'BatchMultiClassNonMaxSuppression/map/while/while_context'. The input {{node BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/FilterGreaterThan/Gather/GatherV2_1}} is in frame ''.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "TRT_object_detection.py", line 106, in <module>
    feed_dict={image_tensor: frame_expanded})
  File "/home/tegra/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/tegra/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/tegra/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/tegra/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: node BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/ClipToWindow/Gather/GatherV2_1 (defined at TRT_object_detection.py:70)  has inputs from different frames. The input node BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/Gather_1/GatherV2/axis (defined at TRT_object_detection.py:70)  is in frame 'BatchMultiClassNonMaxSuppression/map/while/while_context'. The input node BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/FilterGreaterThan/Gather/GatherV2_1 (defined at TRT_object_detection.py:70)  is in frame ''.

I think it is caused by the lines I used to get the input and output tensors of the model in my inference code:

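import cv2
import numpy as np

# detection_graph, sess and PATH_TO_IMAGE are assumed to be defined earlier in the script
# (the loaded graph, a tf.Session on that graph, and the path to a test image)

# Define input and output tensors (i.e. data) for the object detection classifier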

# Input tensor is the image
image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')

# Output tensors are the detection boxes, scores, and classes
# Each box represents a part of the image where a particular object was detected
detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')

# Each score represents the level of confidence for each of the objects.
# The score is shown on the result image, together with the class label.
detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')

# Number of objects detected
num_detections = detection_graph.get_tensor_by_name('num_detections:0')

# Load the image using OpenCV and
# expand its dimensions to shape [1, None, None, 3],
# i.e. a batch containing one H x W x 3 image (OpenCV loads pixels in BGR order)
image = cv2.imread(PATH_TO_IMAGE)
image_expanded = np.expand_dims(image, axis=0)

# Perform the actual detection by running the model with the image as input
(boxes, scores, classes, num) = sess.run(
    [detection_boxes, detection_scores, detection_classes, num_detections],
    feed_dict={image_tensor: image_expanded})
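The shape handling described in the comments above can be illustrated with NumPy alone. This is a minimal sketch, not part of the original script: the helper name to_detector_batch is hypothetical, and the BGR-to-RGB flip assumes the detector was trained on RGB images (models exported by the TensorFlow Object Detection API typically are), whereas cv2.imread returns channels in BGR order. It does not address the "different frames" error; it only makes the expected input layout explicit.

```python
import numpy as np


def to_detector_batch(image_bgr: np.ndarray) -> np.ndarray:
    """Convert one BGR image of shape (H, W, 3) into an RGB batch of shape (1, H, W, 3)."""
    if image_bgr.ndim != 3 or image_bgr.shape[2] != 3:
        raise ValueError(f"expected an (H, W, 3) image, got shape {image_bgr.shape}")
    # Reverse the channel axis: BGR (OpenCV's order) -> RGB.
    # ascontiguousarray copies the reversed view into a normal C-ordered array.
    image_rgb = np.ascontiguousarray(image_bgr[..., ::-1])
    # Add a leading batch axis: (H, W, 3) -> (1, H, W, 3).
    return np.expand_dims(image_rgb, axis=0)


# Synthetic 2x3 image standing in for cv2.imread(PATH_TO_IMAGE).
dummy = np.zeros((2, 3, 3), dtype=np.uint8)
dummy[..., 0] = 255  # channel 0 is blue in OpenCV's BGR order
batch = to_detector_batch(dummy)
print(batch.shape)               # (1, 2, 3, 3)
print(batch[0, 0, 0].tolist())   # [0, 0, 255]
```

The resulting array can be fed the same way as image_expanded, e.g. feed_dict={image_tensor: batch}.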

How can I get the correct names of the input and output tensors for the new model? If that’s not the problem, what could it be?
Thanks in advance for any help or advice.
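On the question of tensor names: if a name were wrong, get_tensor_by_name would normally raise a KeyError before sess.run is reached, and the traceback above fails inside session.run, which suggests the names themselves were found. To double-check which names a .pb file actually contains, the GraphDef can be inspected directly. Below is a minimal, duck-typed sketch (the helper names are hypothetical): it lists Placeholder nodes as candidate inputs and nodes that nothing else consumes as candidate outputs. It should work on a tf.GraphDef (for example one returned by read_pb_graph in the conversion script later in this thread) as well as on the simple stand-ins used here; the output list is a heuristic and may include extra dangling nodes.

```python
from types import SimpleNamespace


def producer_name(input_name: str) -> str:
    """Map a NodeDef input string to the name of the node that produces it."""
    # Inputs look like "node", "node:1" (second output) or "^node" (control dependency).
    return input_name.lstrip("^").split(":")[0]


def find_io_nodes(graph_def):
    """Return (candidate_inputs, candidate_outputs) for a GraphDef-like object."""
    consumed = {producer_name(i) for node in graph_def.node for i in node.input}
    candidate_inputs = [n.name for n in graph_def.node if n.op == "Placeholder"]
    # Nodes nobody reads from are likely outputs; constants and no-ops are skipped.
    candidate_outputs = [
        n.name for n in graph_def.node
        if n.name not in consumed and n.op not in ("Const", "NoOp")
    ]
    return candidate_inputs, candidate_outputs


def make_node(name, op, inputs=()):
    """Hypothetical stand-in for a NodeDef with just the fields used above."""
    return SimpleNamespace(name=name, op=op, input=list(inputs))


toy_graph = SimpleNamespace(node=[
    make_node("image_tensor", "Placeholder"),
    make_node("Preprocessor/sub/y", "Const"),
    make_node("Preprocessor/sub", "Sub", ["image_tensor", "Preprocessor/sub/y"]),
    make_node("detection_boxes", "Identity", ["Preprocessor/sub:0"]),
    make_node("num_detections", "Identity", ["Preprocessor/sub:1", "^Preprocessor/sub/y"]),
])
inputs, outputs = find_io_nodes(toy_graph)
print([name + ":0" for name in inputs])   # ['image_tensor:0']
print(outputs)                            # ['detection_boxes', 'num_detections']
# With TensorFlow 1.x: inputs, outputs = find_io_nodes(read_pb_graph("model_trt.pb"))
```

Appending ":0" to a node name gives the name of its first output tensor, which is the form get_tensor_by_name expects.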

AastaLLL · June 21, 2019, 6:37am

Hi,

May I know how you optimized ssd_mobilenet_v1_coco_11_06_2017 with TensorRT?
Did you use TF-TRT or this sample?

Your error comes from the TensorFlow framework.
But the sample shared above is a standalone app that does not depend on TensorFlow.
It is strange to see a TensorFlow error with the TRT_object_detection sample.

Thanks.

marcoa.portocarrero · June 25, 2019, 8:49pm

I followed the instructions at this link and adapted them for my model: Tensorflow-TensorRT/7_optimizing_YOLOv3_using_TensorRT.ipynb at master · ardianumam/Tensorflow-TensorRT · GitHub

This is the code I used to optimize the model:

import cv2
import time
import numpy as np
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt
from tensorflow.python.platform import gfile
from PIL import Image
from tf_trt_models.detection import build_detection_graph

# function to read a ".pb" model 
# (can be used to read frozen model or TensorRT model)
def read_pb_graph(model):
  with gfile.FastGFile(model,'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
  return graph_def

config_path = '/home/tegra/marco/Object_detection/training/faster_rcnn_inception_v2_pets.config'
checkpoint_path = '/home/tegra/marco/Object_detection/training/model.ckpt-46958'

frozen_graph, input_names, output_names = build_detection_graph(
    config=config_path,
    checkpoint=checkpoint_path
)

# convert (optimize) the frozen model to a TensorRT model
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP16',  # precision, can be "FP32" (32-bit floating point) or "FP16"
    minimum_segment_size=50
)

# write the TensorRT model to be used later for inference
with gfile.FastGFile("/home/tegra/marco/Object_detection/model_trt.pb", 'wb') as f:
    f.write(trt_graph.SerializeToString())
print("TensorRT model is successfully stored!")

# count the ops in the original frozen model
all_nodes = len(frozen_graph.node)
print("number of all_nodes in frozen graph:", all_nodes)

# count the ops that were converted to TensorRT engines
trt_engine_nodes = len([1 for n in trt_graph.node if str(n.op) == 'TRTEngineOp'])
print("number of trt_engine_nodes in TensorRT graph:", trt_engine_nodes)
all_nodes = len(trt_graph.node)
print("number of all_nodes in TensorRT graph:", all_nodes)
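All of the error messages in this thread name nodes under BatchMultiClassNonMaxSuppression/map/while, and each complains that a node inside that while loop receives one input from the loop's frame and another from frame '' (outside any loop). One hypothesis worth checking, not confirmed anywhere in this thread, is that the conversion placed a TensorRT engine across that loop boundary. The sketch below is a hypothetical diagnostic, not part of TF-TRT: given a GraphDef-like object such as trt_graph, it lists TRTEngineOp nodes that consume from or feed into nodes under a name scope. Because it matches on names rather than on real frame information, it can only point to where to look; the engine names in the toy graph are made up.

```python
from types import SimpleNamespace


def producer_name(input_name: str) -> str:
    # NodeDef inputs look like "node", "node:1" or "^node"; keep only the node name.
    return input_name.lstrip("^").split(":")[0]


def trt_ops_touching_scope(graph_def, scope):
    """Sorted names of TRTEngineOp nodes that read from, or feed, nodes under `scope`.

    Heuristic: it relies on TensorFlow name scopes, not on real while-loop frame info.
    """
    prefix = scope.rstrip("/") + "/"
    engines = {n.name for n in graph_def.node if n.op == "TRTEngineOp"}
    hits = set()
    for node in graph_def.node:
        producers = {producer_name(i) for i in node.input}
        # Case 1: an engine consumes a tensor produced inside the scope.
        if node.name in engines and any(p.startswith(prefix) for p in producers):
            hits.add(node.name)
        # Case 2: a node inside the scope consumes an engine's output.
        if node.name.startswith(prefix):
            hits.update(producers & engines)
    return sorted(hits)


def make_node(name, op, inputs=()):
    """Hypothetical stand-in for a NodeDef with just the fields used above."""
    return SimpleNamespace(name=name, op=op, input=list(inputs))


NMS_LOOP = "BatchMultiClassNonMaxSuppression/map/while"
toy_trt_graph = SimpleNamespace(node=[
    make_node("TRTEngineOp_0", "TRTEngineOp", ["image_tensor"]),
    make_node(NMS_LOOP + "/Reshape", "Reshape", ["TRTEngineOp_0:0"]),
    make_node("TRTEngineOp_1", "TRTEngineOp", [NMS_LOOP + "/Reshape"]),
    make_node("TRTEngineOp_2", "TRTEngineOp", ["Postprocessor/x"]),
])
print(trt_ops_touching_scope(toy_trt_graph, NMS_LOOP))  # ['TRTEngineOp_0', 'TRTEngineOp_1']
# In the conversion script: print(trt_ops_touching_scope(trt_graph, NMS_LOOP))
```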

kayccc · July 4, 2019, 3:13am

Hi marcoa.portocarrero,

Have you managed to get it working?
Can you share any results?

Thanks

cpchiu · January 22, 2020, 6:01am

I encountered the same problem when trying to run inference on the TensorRT-converted graph, but in my case it happened with Faster RCNN and Mask RCNN instead of MobileNet.

hillabar · February 23, 2020, 1:30pm

I encountered the same problem as cpchiu. I used TF-TRT to run a Mask RCNN model and received the following error message:

InvalidArgumentError: node BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/FilterGreaterThan/Greater (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748)  has inputs from different frames. The input node BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/FilterGreaterThan/Greater/y (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748)  is in frame 'BatchMultiClassNonMaxSuppression/map/while/while_context'. The input node BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/Reshape (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748)  is in frame ''.
