I am using TF-TRT for inference (as it is more or less the only performance option available without writing plugins). My code has the following segment:
import time
import tensorflow as tf

# Load the previously converted TF-TRT frozen graph from disk.
trt_graph = tf.GraphDef()
with tf.gfile.GFile('./ssd_mobilenet_v1_coco_trt.pb', 'rb') as pf:
    trt_graph.ParseFromString(pf.read())
print("#3", time.time())
input_names = ['image_tensor']
output_names = ['detection_boxes', 'detection_classes', 'detection_scores', 'num_detections']
which takes a very long time to execute. My guess is that this is where an engine gets created. Is there a way of saving and loading this engine so that loading is quicker? When using TensorRT directly, as in the /usr/src/tensorrt/samples/python/uff_ssd sample, these functions are used to save and load the engine:
def save_engine(engine, engine_dest_path):
    buf = engine.serialize()
    with open(engine_dest_path, 'wb') as f:
        f.write(buf)

def load_engine(trt_runtime, engine_path):
    with open(engine_path, 'rb') as f:
        engine_data = f.read()
    engine = trt_runtime.deserialize_cuda_engine(engine_data)
    return engine
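For reference, in that sample the flow is roughly the following (from memory, so details such as the logger severity and the engine file name are mine, not from the sample):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
trt_runtime = trt.Runtime(TRT_LOGGER)

# First run: build the engine once, then serialize it to disk.
save_engine(engine, './ssd_mobilenet_v1_coco.engine')
# Later runs: skip building entirely and just deserialize the saved engine.
engine = load_engine(trt_runtime, './ssd_mobilenet_v1_coco.engine')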
but I can't see how to access the underlying engine from the TF-TRT API.
Currently, on my 5W Nano, ParseFromString takes about 3 minutes, and later tf.import_graph_def(trt_graph, name='') takes another minute or so. That is a long time…
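What I was hoping is possible is something along these lines, i.e. pulling the serialized engines out of the converted graph and reloading them directly. This is only a sketch, and I am not sure the 'TRTEngineOp' op type and 'serialized_segment' attribute are actually where TF-TRT keeps them:

# Sketch only: my guess at where TF-TRT might store the serialized engines.
# 'TRTEngineOp' / 'serialized_segment' are assumptions on my part.
for node in trt_graph.node:
    if node.op == 'TRTEngineOp':
        engine_bytes = node.attr['serialized_segment'].s
        out_path = node.name.replace('/', '_') + '.engine'
        with open(out_path, 'wb') as f:
            f.write(engine_bytes)

Is something like this supported, or is there an official way to cache the engines so they don't have to be rebuilt or re-parsed on every start?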