Convert TensorFlow frozen graph to UFF

Could you please let me know how to specify the dimensions?

I am converting a TensorFlow checkpoint trained with LaneNet (https://github.com/MaybeShewill-CV/lanenet-lane-detection) to UFF so I can run inference with TensorRT.

There is an output node in the original TensorFlow model, lanenet_model/vgg_backend/binary_seg/ArgMax, whose shape is [1, 256, 512] with type int64. However, after I convert the TensorFlow checkpoint to UFF and load it with my C++ TensorRT code, the node becomes [1, 512] with type float32.

I do not know what's wrong. Can anyone help?

Hi,

Here are some examples that you can refer to:

/usr/src/tensorrt/samples/sampleUffMNIST
/usr/src/tensorrt/samples/sampleUffSSD
/usr/src/tensorrt/samples/sampleUffFasterRCNN
/usr/src/tensorrt/samples/sampleUffMaskRCNN
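
On the dimension question specifically: the UFF samples register the input shape and the output node themselves before parsing. Below is a minimal C++ sketch in that style; the input name "input_tensor" and the 3x256x512 CHW shape are assumptions for illustration and must be replaced with the actual values from your frozen graph (the output node name is the one mentioned in this thread).

#include "NvInfer.h"
#include "NvUffParser.h"

using namespace nvinfer1;
using namespace nvuffparser;

// Sketch only: "input_tensor" and the 3x256x512 CHW shape are assumptions;
// replace them with the real input name and shape of your frozen graph.
bool parseUffModel(const char* uffFile, INetworkDefinition& network)
{
    IUffParser* parser = createUffParser();
    parser->registerInput("input_tensor", Dims3(3, 256, 512), UffInputOrder::kNCHW);
    parser->registerOutput("lanenet_model/vgg_backend/binary_seg/ArgMax");

    // Weights are parsed as FP32 here; the inference precision is chosen later by the builder.
    bool ok = parser->parse(uffFile, network, DataType::kFLOAT);
    parser->destroy();
    return ok;
}

Whatever is registered with registerOutput is what ends up as an output binding of the engine, so if the binding shape or type you see in C++ looks wrong, the registered output node is the first thing to double-check.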

Thanks.

Following the sample code, I was able to make my program work. However, I found it takes about 5 minutes to load my 140 MB UFF model. Is this normal? Is there any way to make it faster?

Hello,
Would you mind sharing the code you used to freeze the graph? I have managed to convert the file, but it seems that it is not the right way to do it.
Thank you,
Luis

See https://github.com/VasinPA/lanenet-lane-detection/tree/master/tools and check test_lanenet_and_freeze.py.

This script actually runs the inference part of LaneNet. While doing that, it also freezes the trained model.

After freezing the model, you need to convert it to UFF (the convert-to-uff utility or the uff Python package that ships with TensorRT can do this).

Hi,

To convert a UFF file into an engine, TensorRT profiles each possible kernel implementation and picks the best one.

If the model doesn’t change, you don’t need to do this each time.
You can serialize the TensorRT engine and de-serialize it directly next time.
https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_cuda_engine.html#a230a2f79e79f1d31bf2b64855f3b0ff9
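
For example, a rough sketch of caching the engine on disk could look like the following; the file path and the helper function names here are placeholders, not code from the samples.

#include "NvInfer.h"
#include <fstream>
#include <iterator>
#include <vector>

using namespace nvinfer1;

// Build the engine once, then cache the serialized blob on disk.
void saveEngine(ICudaEngine& engine, const char* path)
{
    IHostMemory* blob = engine.serialize();
    std::ofstream out(path, std::ios::binary);
    out.write(static_cast<const char*>(blob->data()), blob->size());
    blob->destroy();
}

// On later runs, deserialize the cached engine instead of re-parsing the UFF model.
ICudaEngine* loadEngine(IRuntime& runtime, const char* path)
{
    std::ifstream in(path, std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
    return runtime.deserializeCudaEngine(blob.data(), blob.size(), nullptr);
}

On the first run you still build from UFF and call saveEngine once; on later runs loadEngine skips the builder entirely, which is where most of the load time goes.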

Thanks.

@AastaLLL Below is my code to create the runtime and engine from the UFF model. After reading your post, I realized I may be doing something silly here.

Could you take a look and let me know how to make it efficient?

// Parse the UFF file and build an engine, keeping the serialized engine in memory.
IHostMemory* trtModelStream{ nullptr };

ICudaEngine* tmpEngine = loadModelAndCreateEngine(filename.c_str(), parser, trtModelStream);
if (tmpEngine == nullptr) {
    cout << "Cannot make temp engine" << endl;
    return;
}

if (trtModelStream == nullptr) {
    cout << "Cannot make stream" << endl;
    return;
}

// The temporary engine is only needed to produce the serialized stream.
tmpEngine->destroy();

// Deserialize the engine from the in-memory stream.
_runtime = createInferRuntime(gLogger.getTRTLogger());
if (_runtime == nullptr) {
    cout << "Cannot create runtime" << endl;
    return;
}

_engine = _runtime->deserializeCudaEngine(trtModelStream->data(), trtModelStream->size(), nullptr);
if (_engine == nullptr) {
    cout << "Cannot make engine" << endl;
    return;
}
trtModelStream->destroy();

_context = _engine->createExecutionContext();
if (_context == nullptr) {
    cout << "Cannot make context" << endl;
    return;
}

Hi @AastaLLL I was able to do this and it’s much faster now. Thanks.

A general question: is this approach (converting the TensorFlow checkpoint to UFF and then running the UFF model with TensorRT) the best one in terms of inference speed?

You mentioned that when converting a UFF file to a runtime engine, TensorRT tries to find the best implementation. By implementation, do you mean INT16 or INT8? Is there a way to find out which implementation TensorRT used?