TensorRT across different platforms using TF-TRT

Hi,

I want to run inference on my Xavier.

I did the following:

On my workstation:

  • Build a frozen graph (with tensorflow.tools.graph_transforms).
  • Generate a UFF file (with uff.from_tensorflow).
  • Transfer this UFF file from my workstation to the Xavier.
  • On the Xavier, convert the UFF file to a plan by parsing it (C++):
parser->parse(uffFilename.c_str(), *network, dataType)
  • Then build the plan file/engine (here is the relevant part of the code):
  if (!parser->parse(uffFilename.c_str(), *network, dataType))
  {
    cout << "Failed to parse UFF\n";
    builder->destroy();
    parser->destroy();
    network->destroy();
    return 1;
  }

  /* build engine */
  if (dataType == DataType::kHALF)
    builder->setHalf2Mode(true);

  builder->setMaxBatchSize(maxBatchSize);
  builder->setMaxWorkspaceSize(maxWorkspaceSize);
  ICudaEngine *engine = builder->buildCudaEngine(*network);

  /* serialize engine and write to file */
  ofstream planFile;
  planFile.open(planFilename, ios::binary); // binary mode: the serialized engine must be written byte-for-byte
  IHostMemory *serializedEngine = engine->serialize();
  planFile.write((char *)serializedEngine->data(), serializedEngine->size());
  planFile.close();

  /* clean up */
  builder->destroy();
  parser->destroy();
  network->destroy();
  engine->destroy();
  serializedEngine->destroy();

I have understood that UFF will be deprecated.
My question is how to build the engine now with the TF-TRT library: converting the graph on my workstation and building the engine on the Xavier.

I have tried the following:

From my workstation:

with tf.Graph().as_default() as tf_graph:
    with my_sess as tf_sess:
        graph_size = len(self.frozen.SerializeToString())
        num_nodes = len(self.frozen.node)
        frozen_graph = trt.create_inference_graph(
            input_graph_def=self.frozen,
            outputs=self.output_layers_name,
            max_batch_size=1,
            max_workspace_size_bytes=0,
            precision_mode='FP32',
            minimum_segment_size=2,
            is_dynamic_op=True,
            maximum_cached_engines=100)

for n in frozen_graph.node:
    if n.op == "TRTEngineOp":
        print("Node: %s, %s" % (n.op, n.name.replace("/", "_")))
        with tf.gfile.GFile("%s.plan" % (n.name.replace("/", "_")), 'wb') as f:
            f.write(n.attr["serialized_segment"].s)
    else:
        print("Exclude Node: %s, %s" % (n.op, n.name.replace("/", "_")))

Now that I have this ".plan" file, how do I build the TensorRT engine on the Xavier?
I have read in the documentation that the data in the .plan file can then be provided to IRuntime::deserializeCudaEngine to use the engine in TensorRT.
Can you please walk me through the steps to use the engine in TensorRT?

Thank you.

Hello,

If you are looking into deserializing the model using the C++ API, check here: https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#serial_model_c
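In case a concrete starting point helps, the linked section boils down to roughly the following sketch (TensorRT 5.x-era C++ API; the logger class, the hard-coded "model.plan" path, and the omitted buffer setup are assumptions for illustration, not a complete inference program):

```cpp
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include "NvInfer.h"

using namespace nvinfer1;

// Minimal logger required by createInferRuntime (placeholder implementation)
class Logger : public ILogger
{
    void log(Severity severity, const char *msg) override
    {
        if (severity <= Severity::kWARNING)
            std::cout << msg << std::endl;
    }
} gLogger;

int main()
{
    const char *planFilename = "model.plan"; // placeholder path to your plan file

    /* read the serialized engine from disk, in binary mode */
    std::ifstream planFile(planFilename, std::ios::binary);
    std::stringstream planBuffer;
    planBuffer << planFile.rdbuf();
    std::string plan = planBuffer.str();

    /* deserialize the engine and create an execution context */
    IRuntime *runtime = createInferRuntime(gLogger);
    ICudaEngine *engine =
        runtime->deserializeCudaEngine((void *)plan.data(), plan.size(), nullptr);
    IExecutionContext *context = engine->createExecutionContext();

    /* ... allocate and bind input/output device buffers, then run
       context->execute() or context->enqueue() ... */

    context->destroy();
    engine->destroy();
    runtime->destroy();
    return 0;
}
```

One caveat worth checking in your setup: a serialized TensorRT engine is specific to the GPU architecture and TensorRT version it was built with, so a plan serialized on a workstation GPU generally cannot be deserialized on the Xavier; the plan should be built on the target device.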

Thanks.

Hi NVESJ,

Ok thank you.

  1. So if I don’t have TensorFlow on the Xavier (whose GPU is used for inference), I can’t use TF-TRT offline (on my workstation) to build the engine?

  2. Does the following part of the code create an engine (with TensorFlow)?

for n in frozen_graph.node:
    if n.op == "TRTEngineOp":
        print("Node: %s, %s" % (n.op, n.name.replace("/", "_")))
        with tf.gfile.GFile("%s.plan" % (n.name.replace("/", "_")), 'wb') as f:
            f.write(n.attr["serialized_segment"].s)
    else:
        print("Exclude Node: %s, %s" % (n.op, n.name.replace("/", "_")))

Thanks!