I’ve been working with CUDA for the last couple of years on desktops/servers.
I’ve been looking for quite some time now for a TensorFlow-to-C++ demo/tutorial: taking a TensorFlow model, running the inference in a C++ app, optimizing it, etc.
I’ve looked at the Hello AI demo, but it doesn’t show this as far as I could tell.
Any pointers?
Also, this is something else I’ve not fully understood. Once I have a trained net in TF, must I convert it to UFF/ONNX and then somehow to NVIDIA’s TensorRT plan? Why so many error-prone steps? Isn’t there a simpler way to take a trained net in TF and run the inference with C++ TensorRT?
These are two different frameworks: TensorFlow and TensorRT.
It’s possible to run inference on a TensorFlow model through the C++ interface without converting the model.
You will need a TensorFlow C++ library. Please check this topic for some information:
However, we recommend converting your model to TensorRT, which is an optimizer for GPU-based inference.
The first step is to check whether all the operations used are supported by TensorRT:
If yes, you will need to convert the TensorFlow model into an intermediate format (UFF/ONNX) that TensorRT accepts as input.
I’ve looked a bit more and if I understand this correctly, option 3 that you mention is the direct TF-TRT path?
I.e. convert the graph to an optimized TensorRT plan file ready to run in a C++ application? The link you’ve put in option 3 doesn’t seem relevant?
Also, if I do manage to create a .plan file for TensorRT from within TensorFlow on, say, a desktop with a GTX card, isn’t there an issue with taking it to a different platform (for example Xavier)? Isn’t the conversion platform dependent?
I’d expect there to be a corresponding function in the C++ interface, although the tutorial is Python-based.
The plan file is not portable.
TensorRT will choose an optimal algorithm based on the GPU architecture when creating the engine.
This prevents you from using a plan file created on a different platform.
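One way to avoid loading a plan on the wrong device is to encode the build platform in the plan's file name. The following is only a minimal sketch using the standard library: `planFileFor` is a hypothetical helper, and the GPU name and TensorRT version strings are assumed to be supplied by the caller (for example from `cudaGetDeviceProperties` and the TensorRT version macros).

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not part of TensorRT): builds a plan file name that
// encodes the platform the plan was built on, so a plan created on one GPU
// is not picked up on another.
// Assumption: gpuName and trtVersion are provided by the caller.
std::string planFileFor(const std::string& model,
                        const std::string& gpuName,
                        const std::string& trtVersion) {
    std::string tag;
    for (unsigned char c : gpuName) {
        // Keep letters and digits; replace everything else with '_'.
        tag += std::isalnum(c) ? static_cast<char>(c) : '_';
    }
    return model + "." + tag + ".trt" + trtVersion + ".plan";
}
```

With this naming, a plan built on a desktop GTX card and a plan built on Xavier get different file names, so the application on each device only looks for its own plan.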
However, a UFF file is portable.
It is an intermediate description of the model that is independent of the GPU architecture.
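In practice this suggests copying the portable file to the target and building the plan there once, then reusing it. Below is a rough sketch of that load-or-build pattern, not an official sample. `loadOrBuildPlan` and the `buildFromPortableModel` callback are hypothetical names; the callback is assumed to wrap the TensorRT parser and builder on the target device and return the serialized engine bytes.

```cpp
#include <fstream>
#include <functional>
#include <iterator>
#include <string>
#include <vector>

using Blob = std::vector<char>;

// Returns the serialized plan for this device.
// If planPath already exists, its bytes are returned as-is. Otherwise the
// hypothetical buildFromPortableModel callback is invoked (assumed to run
// the TensorRT parser/builder on this device) and its result is cached.
Blob loadOrBuildPlan(const std::string& planPath,
                     const std::function<Blob()>& buildFromPortableModel) {
    std::ifstream in(planPath, std::ios::binary);
    if (in) {
        // Cached plan found: read the whole file into memory.
        return Blob(std::istreambuf_iterator<char>(in),
                    std::istreambuf_iterator<char>());
    }
    // No cached plan: build on this device, then store it for the next run.
    Blob plan = buildFromPortableModel();
    std::ofstream out(planPath, std::ios::binary);
    out.write(plan.data(), static_cast<std::streamsize>(plan.size()));
    return plan;
}
```

The returned bytes are what you would hand to the TensorRT runtime (`IRuntime::deserializeCudaEngine`) to get an engine. The slow build step then runs only on the first launch on each device.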
Thanks again for the answers.
I guess I’m still missing something.
Assuming the development work is NOT done on Xavier, how would I run an optimized plan in C++ on the Xavier itself?
If I understood correctly from what you say, I have only two options?
Develop, build the TF net, and save the TF output to TRT, ALL on the target machine (Xavier).
Develop on whatever platform I’d like (not Xavier) and convert for Xavier via ONNX/UFF.
Is that correct? If so, this is extremely cumbersome :(