Output Log:
2019-03-05 09:22:57.897641: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:864] ARM64 does not support NUMA - returning NUMA node zero
2019-03-05 09:22:57.897847: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 0
2019-03-05 09:23:08.293207: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:438] MULTIPLE tensorrt candidate conversion: 8
2019-03-05 09:23:10.659763: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: …/builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2019-03-05 09:23:10.659928: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:0 due to: "Invalid argument: Failed to create Input layer" SKIPPING…( 3 nodes)
2019-03-05 09:23:10.660836: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: …/builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2019-03-05 09:23:10.660905: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:1 due to: "Invalid argument: Failed to create Input layer" SKIPPING…( 6 nodes)
2019-03-05 09:23:10.661388: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: …/builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2019-03-05 09:23:10.661453: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:2 due to: "Invalid argument: Failed to create Input layer" SKIPPING…( 6 nodes)
2019-03-05 09:23:10.662183: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: …/builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2019-03-05 09:23:10.662248: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:3 due to: "Invalid argument: Failed to create Input layer" SKIPPING…( 62 nodes)
2019-03-05 09:23:10.662693: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: …/builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2019-03-05 09:23:10.662760: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:4 due to: "Invalid argument: Failed to create Input layer" SKIPPING…( 6 nodes)
2019-03-05 09:23:14.515738: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:5 due to: "Invalid argument: Output node 'yolov3/yolo-v3/Conv_5/LeakyRelu/alpha' is weights not tensor" SKIPPING…( 506 nodes)
2019-03-05 09:23:14.517134: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: …/builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2019-03-05 09:23:14.517232: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:6 due to: "Invalid argument: Failed to create Input layer" SKIPPING…( 8 nodes)
2019-03-05 09:23:14.517855: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: …/builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2019-03-05 09:23:14.517919: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:7 due to: "Invalid argument: Failed to create Input layer" SKIPPING…( 53 nodes)
Traceback (most recent call last):
  File "/home/nvidia/py3tf/lib/python3.5/site-packages/tensorflow/python/framework/importer.py", line 418, in import_graph_def
    graph._c_graph, serialized, options) # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: NodeDef mentions attr 'Truncate' not in Op<name=Cast; signature=x:SrcT -> y:DstT; attr=SrcT:type; attr=DstT:type>; NodeDef: import/Cast = CastDstT=DT_FLOAT, SrcT=DT_INT32, Truncate=false. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "yolo.py", line 64, in <module>
    ["Placeholder:0", "concat_9:0", "mul_9:0"])
  File "/home/nvidia/Desktop/saurabh/Tensorflow-TensorRT/YOLOv3/utils.py", line 231, in read_pb_return_tensors
    return_elements=return_elements)
  File "/home/nvidia/py3tf/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 432, in new_func
    return func(*args, **kwargs)
  File "/home/nvidia/py3tf/lib/python3.5/site-packages/tensorflow/python/framework/importer.py", line 422, in import_graph_def
    raise ValueError(str(e))
ValueError: NodeDef mentions attr 'Truncate' not in Op<name=Cast; signature=x:SrcT -> y:DstT; attr=SrcT:type; attr=DstT:type>; NodeDef: import/Cast = CastDstT=DT_FLOAT, SrcT=DT_INT32, Truncate=false. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
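
To see at a glance how much of the graph TensorRT skipped, the "subgraph conversion error" warnings in the log can be tallied per reason. Below is a minimal sketch using only the standard library; `SKIP_RE` and `summarize_skips` are hypothetical names, not part of TensorFlow or TensorRT, and the pattern assumes the warning format shown in the log above (it accepts both straight and typographic quotes, since forum software often rewrites them).

```python
import re
from collections import Counter

# Hypothetical helper: matches the "subgraph conversion error" warnings
# in the format shown in the log above.
SKIP_RE = re.compile(
    r'subgraph_index:(?P<index>\d+) due to: '
    r'[“"](?P<reason>.*?)[”"] '
    r'SKIPPING\W*?\(\s*(?P<nodes>\d+) nodes\)'
)


def summarize_skips(log_text):
    """Return (skipped node count per reason, total skipped node count)."""
    per_reason = Counter()
    for match in SKIP_RE.finditer(log_text):
        per_reason[match.group("reason")] += int(match.group("nodes"))
    return per_reason, sum(per_reason.values())


if __name__ == "__main__":
    import sys

    # Usage: python summarize_skips.py < output.log
    reasons, total = summarize_skips(sys.stdin.read())
    for reason, nodes in reasons.most_common():
        print(nodes, "nodes skipped:", reason)
    print("total skipped nodes:", total)
```

Adding up the node counts in the warnings above gives 650 skipped nodes across the 8 candidate subgraphs, 506 of them in subgraph 5 alone.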
Python Code:
# Import the needed libraries
import cv2
import time
import numpy as np
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt
from tensorflow.python.platform import gfile
from PIL import Image
from YOLOv3 import utils
print("Import Done!")
# function to read a ".pb" model
# (can be used to read frozen model or TensorRT model)
def read_pb_graph(model):
    with gfile.FastGFile(model,'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    return graph_def
frozen_graph = read_pb_graph("./YOLOv3/yolov3_gpu_nms.pb")
your_outputs = ["Placeholder:0", "concat_9:0", "mul_9:0"]
print("PB Read Done!")
# convert (optimize) frozen model to TensorRT model
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,  # frozen model
    outputs=your_outputs,
    max_batch_size=2,  # specify your max batch size
    max_workspace_size_bytes=2*(10**9),  # specify the max workspace
    precision_mode="FP32")  # precision, can be "FP32" (32-bit floating point) or "FP16"
print("Convert/optimize TRT Done!")
#write the TensorRT model to be used later for inference
with gfile.FastGFile("./YOLOv3/TensorRT_YOLOv3_2.pb", 'wb') as f:
    f.write(trt_graph.SerializeToString())
print("TensorRT model is successfully stored!")
print("Write PB Done!")
# count the ops in the original frozen model
all_nodes = len(frozen_graph.node)
print("numb. of all_nodes in frozen graph:", all_nodes)
# count the ops that were converted to TensorRT engines
trt_engine_nodes = len([1 for n in trt_graph.node if str(n.op) == 'TRTEngineOp'])
print("numb. of trt_engine_nodes in TensorRT graph:", trt_engine_nodes)
all_nodes = len(trt_graph.node)
print("numb. of all_nodes in TensorRT graph:", all_nodes)
# config
SIZE = [416, 416] #input image dimension
# video_path = 0 # if you use camera as input
video_path = "./dataset/demo_video/road2.mp4" # path for video input
classes = utils.read_coco_names('./YOLOv3/coco.names')
num_classes = len(classes)
GIVEN_ORIGINAL_YOLOv3_MODEL = "./YOLOv3/yolov3_gpu_nms.pb" # to use given original YOLOv3
TENSORRT_YOLOv3_MODEL = "./YOLOv3/TensorRT_YOLOv3_2.pb" # to use the TensorRT optimized model
# get input-output tensor
input_tensor, output_tensors = \
    utils.read_pb_return_tensors(tf.get_default_graph(),
                                 TENSORRT_YOLOv3_MODEL,
                                 ["Placeholder:0", "concat_9:0", "mul_9:0"])
# perform inference
with tf.Session(config=tf.ConfigProto(gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.5))) as sess:
    vid = cv2.VideoCapture(video_path) # must use opencv >= 3.3.1 (install it by 'pip install opencv-python')
    while True:
        return_value, frame = vid.read()
        if not return_value:
            # end of video (or read failure): reopen the source and retry once
            print('ret:', return_value)
            vid = cv2.VideoCapture(video_path)
            return_value, frame = vid.read()
        if return_value:
            image = Image.fromarray(frame)
        else:
            raise ValueError("No image!")
        img_resized = np.array(image.resize(size=tuple(SIZE)),
                               dtype=np.float32)
        img_resized = img_resized / 255.
        prev_time = time.time()
        boxes, scores = sess.run(output_tensors,
                                 feed_dict={input_tensor:
                                            np.expand_dims(
                                                img_resized, axis=0)})
        boxes, scores, labels = utils.cpu_nms(boxes,
                                              scores,
                                              num_classes,
                                              score_thresh=0.4,
                                              iou_thresh=0.5)
        image = utils.draw_boxes(image, boxes, scores, labels,
                                 classes, SIZE, show=False)
        curr_time = time.time()
        exec_time = curr_time - prev_time
        result = np.asarray(image)
        info = "time:" + str(round(1000*exec_time, 2)) + " ms, FPS: " + str(round(1/exec_time, 1))
        cv2.putText(result, text=info, org=(50, 70),
                    fontFace=cv2.FONT_HERSHEY_SIMPLEX,
                    fontScale=1, color=(255, 0, 0), thickness=2)
        #cv2.namedWindow("result", cv2.WINDOW_AUTOSIZE)
        cv2.imshow("result", result)
        if cv2.waitKey(10) & 0xFF == ord('q'): break
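
The FPS shown in the overlay is computed from a single frame's inference time, so it can jump around from frame to frame. A minimal sketch of a moving-average meter is below, assuming a smoothed number is more useful when comparing the original and TensorRT models; `FpsMeter` and its `window` parameter are hypothetical names and not part of the script or of `utils`.

```python
from collections import deque


class FpsMeter:
    """Moving-average FPS over the most recent `window` frame durations."""

    def __init__(self, window=30):
        # deque with maxlen drops the oldest duration automatically
        self.durations = deque(maxlen=window)

    def update(self, seconds):
        """Record one frame duration in seconds and return the smoothed FPS."""
        self.durations.append(seconds)
        total = sum(self.durations)
        # Guard against a zero total so the first frames cannot divide by zero
        return len(self.durations) / total if total > 0 else 0.0
```

In the loop above this could be used as `fps = meter.update(exec_time)` with `meter = FpsMeter()` created once before the `while`.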