The first inference is extremely slow

When I convert my .pb model to a TensorRT model, the first run takes almost 3 hours; after that it drops to 0.17 s.
I use tensorflow-gpu 2.0 and CUDA 10.0.
The code is below:
import os
import time

import numpy as np
import tensorflow.compat.v1 as tf

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "6"

location = False

if location:
    trtfilepath = "/home/andy/models_trt/densenet/densenet_rt8.pb"
else:
    trtfilepath = "/home/dongdong/qian/models_trt/densenet/densenet_rt8.pb"
    # trtfilepath = "/home/dongdong/qian/models/densenet/loss_min_threeN_GPU16.pb"

input_x = "0:0"
outimg1 = "Sigmoid:0"
outimg2 = "Sigmoid_1:0"
outimg3 = "Sigmoid_2:0"

shape = [1, 1, 256, 256]
features = np.random.random(shape).astype(np.float32)

with tf.Session() as sess:
    # Load the frozen (TF-TRT converted) graph and import it into the session.
    with tf.gfile.GFile(trtfilepath, "rb") as f:
        frozen_graph = tf.GraphDef()
        frozen_graph.ParseFromString(f.read())
    tf.import_graph_def(frozen_graph, name="")

    tf_input = sess.graph.get_tensor_by_name(input_x)
    tf_output1 = sess.graph.get_tensor_by_name(outimg1)
    tf_output2 = sess.graph.get_tensor_by_name(outimg2)
    tf_output3 = sess.graph.get_tensor_by_name(outimg3)

    while True:
        t1 = time.time()
        output1, output2, output3 = sess.run(
            [tf_output1, tf_output2, tf_output3],
            feed_dict={tf_input: features})
        t2 = time.time()
        print(t2 - t1)

What's wrong with it?
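For timing, the usual pattern is to do one warm-up run before the timed loop, so that a slow first call (such as a lazy TF-TRT engine build) is excluded from the measurements. A toy sketch of that pattern, where ToyModel is just a stand-in for sess.run and not real TensorFlow API:

```python
import time

class ToyModel:
    """Toy stand-in for sess.run: the first call simulates a one-time
    engine build; later calls are fast."""
    def __init__(self):
        self.engine_built = False

    def run(self):
        if not self.engine_built:
            time.sleep(0.2)  # stands in for the slow first-call build
            self.engine_built = True

model = ToyModel()

# Warm-up call: pay the one-time build cost before the timed loop.
model.run()

# Time only steady-state inference.
timings = []
for _ in range(5):
    t1 = time.time()
    model.run()
    timings.append(time.time() - t1)

print(all(t < 0.1 for t in timings))
```

With the warm-up in place, the timed loop measures only steady-state latency.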


Any updates on this? I’m having a similar problem.

Any update on this? It seems like the whole tech stack for edge devices is still at an early stage. So many bugs and inefficiencies around.