Loading of the tensorRT Engine in C++ API

saikumar.gadde · February 15, 2018, 2:34pm

Hi,

I have built a deep network manually with the TensorRT Python API and saved the engine to a *.engine file. I want to load this engine in C++, but I am unable to find the function that loads a saved engine file. Can this be done?

Thank you

dusty_nv · February 15, 2018, 4:33pm

Hi saikumar.gadde, the function is nvinfer1::IRuntime::deserializeCudaEngine(). Please see the code that loads it here:

jetson-inference/tensorNet.cpp at e12e6e64365fed83e255800382e593bf7e1b1b1a · dusty-nv/jetson-inference · GitHub

Note that the engine should be created on the actual platform (Jetson TX1), because TensorRT runs device-specific profiling during the optimization phase. Since the Python API isn’t supported on Jetson at this time, it would seem that you are creating the optimized engine on a different platform (like a PC with another GPU).

ljstrnadiii · February 16, 2018, 6:29pm

Is there an example of loading a TensorFlow TensorRT engine with the C++ API?

Thanks!

dusty_nv · February 16, 2018, 6:43pm

Hi ljstrnadiii, please see the SampleUffMNIST sample from TensorRT.

AastaNV also has an example on GitHub with it here: https://github.com/AastaNV/ChatBot

ljstrnadiii · February 16, 2018, 6:54pm

Thanks, not sure I see what I am looking for.

Assuming I have built a TensorRT engine from my frozen TensorFlow model, how can I load the engine and run inference in C++, like this Python example:

from tensorrt.lite import Engine
from tensorrt.infer import LogSeverity
import tensorrt

# Create a runtime engine from plan file using TensorRT Lite API 
engine_single = Engine(PLAN="keras_vgg19_b1_FP32.engine",
                       postprocessors={"dense_2/Softmax":analyze})

images_trt, images_tf = load_and_preprocess_images()

results = []
for image in images_trt:
    result = engine_single.infer(image) # Single function for inference
    results.append(result)

which is found here: https://devblogs.nvidia.com/tensorrt-3-faster-tensorflow-inference

Thanks again. (just another non-cs data-scientist… haha)

dusty_nv · February 16, 2018, 9:49pm

You shouldn’t be running an optimized TensorRT engine that was built from Python on another machine, because TensorRT performs device-specific profiling and optimizations when building the engine. As I noted above, the engine should be created on the actual platform it will run on.

What should happen on the Jetson is loading of the UFF file like in the samples, which has been converted from frozen .pb on another machine with Python. The TensorRT engine is then still created properly for the Jetson TX1’s GPU. You can copy the serialized TensorRT engine between different Jetson TX1 boards (if you are moving to Jetson TX2, you should re-create the engine because it’s a different GPU).
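Since an engine built for one GPU should not be reused on another, one way to avoid mixing them up is to put the device into the cache file name. The following is only a minimal sketch of that idea: `engineCachePath` is a hypothetical helper (it is not part of TensorRT or jetson-inference), and the device-name string is assumed to come from the CUDA runtime, for example the `name` field filled in by `cudaGetDeviceProperties()`.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: build a per-device cache file name so that an engine
// serialized on one GPU is not picked up on a different one.
// deviceName is assumed to be supplied by the caller (e.g. from the CUDA
// runtime); this function only does string handling.
std::string engineCachePath(const std::string& modelPath,
                            const std::string& deviceName,
                            int maxBatchSize)
{
    assert(maxBatchSize > 0);

    std::string tag = deviceName;
    for (char& c : tag) {
        // Keep the name filesystem-friendly: anything that is not
        // alphanumeric becomes an underscore.
        if (!std::isalnum(static_cast<unsigned char>(c))) {
            c = '_';
        }
    }
    return modelPath + "." + tag + ".b" + std::to_string(maxBatchSize) + ".engine";
}
```

With this naming, a TX1 and a TX2 sharing the same model directory would each look for (and build) their own engine file instead of loading the other board's.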

ljstrnadiii · February 16, 2018, 10:21pm

Agreed. I have read that it is necessary to build on the TX1 if we want to perform inference on a TX1.

So, assuming I built the engine on the TX1, how can I run inference with C++ code like the very simple Python example I posted?

I don’t code in C++, but I do code in python. The issue is that a user only uses C++ and I need to find an example to do inference in C++ like the very simple example of the python code above. I can’t seem to find an example and can’t read C++ well enough to read the docs… :/

Thanks!

dusty_nv · February 16, 2018, 11:18pm

That is what these samples do. If you are looking for a more general sample of performing inference with TensorRT C++ API, see this code:

http://github.com/dusty-nv/jetson-inference

The TensorRT Python API isn’t currently supported/available for arm64 on Jetson, so the inferencing is performed through the C++ API.

ljstrnadiii · February 18, 2018, 2:11am

I am starting to see. Thanks for repeating yourself. I guess I was hoping for it to be as simple as the python code example.

dusty_nv · February 18, 2018, 2:40pm

If the networks you are using are for image recognition (Alexnet, Googlenet, Resnet, etc.), object detection (DetectNet), or segmentation (Segnet, FCN-Alexnet), then you may be able to use or adapt these higher-level samples for your purposes:

These are essentially wrappers around TensorRT which implement the deep-learning vision primitives. They are currently set up to load .caffemodel files.

ljstrnadiii · February 19, 2018, 11:43pm

I have a very simple TensorFlow model with one input and one output: “prefix/inputs” and “prefix/yhat”. I really just need to create a UFF file from the frozen graph to send to someone using a TX1/2. The intention is to have them build the TensorRT engine on the TX1/2 to run inference on video in real time. (The engine must be built on the hardware it will run inference on, but does building the UFF file require the same thing?)

I actually use Docker to train and freeze graphs. Is there a simple way to build a UFF model without grabbing all the TensorRT stuff?

Or do you know of an image that has TensorRT built in, so that I can just attach the frozen graph to build a UFF file for the TX users?

Thanks again!

ljstrnadiii · February 20, 2018, 4:47am

Never mind, I found this:

docker pull nvcr.io/nvidia/tensorrt:17.12

However, I try to import tensorrt in python and it says to make sure pycuda is installed…hm.

update: I wasn’t using nvidia-docker to run the image! Looks like I now have tensorrt and uff!!

ljstrnadiii · February 24, 2018, 11:11pm

The plan is to build the UFF model file in Python in the TensorRT Docker image found here: https://devblogs.nvidia.com/tensorrt-container/

Solution:

import uff

# output_node_name holds the output node name(s) of the frozen graph
uff_model = uff.from_tensorflow_frozen_model(frozen_model_file_path, output_node_name, output_filename="uff_filename.uff")

Then, I plan to use the examples @dusty_nv suggested above to deploy the UFF model on a Jetson with the C++ examples here: https://github.com/AastaNV/ChatBot/tree/master/src

reachtorameshmail · February 27, 2018, 9:20am

Hi ljstrnadiii, Thank you very much for initiating this discussion.

I just started to use the Jetson TX2 device (JetPack 3.2). I have created a UFF file from a pre-trained TensorFlow model (argman/EAST on GitHub, a TensorFlow implementation of the EAST text detector) using Python on the host machine (Intel). This model is based on ResNet v1 50 and is used to detect text segments in images. I would like to create an optimized inference engine from the available UFF file using the TensorRT C++ API, without using any wrapper as in the ChatBot example.

Hi dusty_nv, is it a good idea to adapt the imagenet code (https://github.com/dusty-nv/jetson-inference/blob/master/imageNet.h) to resolve my issue? Please advise.

dhingratul · May 7, 2018, 4:44pm

@dusty_nv I checked the samples, but they assume that you create the engine within the same script. What is the way to load a .engine file that I have on disk? nvinfer1::ICudaEngine* nvinfer1::IRuntime::deserializeCudaEngine asks for the memory that holds the engine, and I am unsure of the correct way to read the .engine file into memory.

dusty_nv · May 8, 2018, 7:52pm

See these lines of code where the previously-saved engine is loaded from file on disk:

That codebase checks whether the cached engine already exists on disk. If it does, it loads it; otherwise it runs the TensorRT optimizations and then saves the engine for next time.
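The load-if-cached, otherwise build-and-save pattern described above can be sketched without any TensorRT types. This is not the jetson-inference code: `readBinaryFile`, `writeBinaryFile`, `loadOrBuild` and the `build` callback are hypothetical names, and `build` stands in for the optimization step followed by `engine->serialize()`. The returned bytes are what would be passed to `deserializeCudaEngine(blob.data(), blob.size(), NULL)`.

```cpp
#include <cassert>
#include <fstream>
#include <functional>
#include <iterator>
#include <string>
#include <vector>

using Blob = std::vector<char>;

// Read a whole file in binary mode. Returns an empty blob if the file
// cannot be opened, so the caller can test for failure before use.
Blob readBinaryFile(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return {};
    }
    return Blob(std::istreambuf_iterator<char>(in),
                std::istreambuf_iterator<char>());
}

// Write a blob in binary mode. Returns false if the file could not be
// opened or the write did not complete.
bool writeBinaryFile(const std::string& path, const Blob& blob)
{
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) {
        return false;
    }
    out.write(blob.data(), static_cast<std::streamsize>(blob.size()));
    out.flush();
    return static_cast<bool>(out);
}

// Return the cached bytes if the file exists and is non-empty; otherwise
// call build(), save its result for next time, and return it.
Blob loadOrBuild(const std::string& cachePath,
                 const std::function<Blob()>& build)
{
    assert(!cachePath.empty());

    Blob blob = readBinaryFile(cachePath);
    if (!blob.empty()) {
        return blob;  // cache hit
    }
    blob = build();   // cache miss: stand-in for the TensorRT build step
    if (!blob.empty()) {
        writeBinaryFile(cachePath, blob);
    }
    return blob;
}
```

Checking `blob.empty()` before deserializing means a missing or unreadable file shows up as an explicit failure, so a null or zero-sized buffer is never handed to the runtime.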

dhingratul · May 8, 2018, 8:22pm

Update: Fixed after changing the way that engine is saved

p.write((const char*)tensorRTModelStream->data(), tensorRTModelStream->size());

I tried this, and it throws a segmentation fault:

ERROR: Parameter check failed at: Infer.cpp::deserializeCudaEngine::154, condition: (blob) != NULL
Segmentation fault (core dumped)

This is a snippet of how my .engine file was saved:

IHostMemory *&tensorRTModelStream;
tensorRTModelStream = engine->serialize();
std::ofstream p("../output/xyz.engine");
p.write(reinterpret_cast<const char*>(tensorRTModelStream->data()), tensorRTModelStream->size());

This is from the code you shared:

std::stringstream gieModelStream;
gieModelStream.seekg(0, gieModelStream.beg);
std::ifstream cache( "../output/xyz.engine" );
gieModelStream << cache.rdbuf();
cache.close();
IRuntime* runtime = createInferRuntime(gLogger);
gieModelStream.seekg(0, std::ios::end);
const int modelSize = gieModelStream.tellg();
gieModelStream.seekg(0, std::ios::beg);
void* modelMem = malloc(modelSize);
gieModelStream.read((char*)modelMem, modelSize);
nvinfer1::ICudaEngine* engine = runtime->deserializeCudaEngine(modelMem, modelSize, NULL);
free(modelMem);

Do I need to change the way the engine is saved, or is there some other error?

AastaLLL · May 11, 2018, 8:06am

Hi,

Could you check whether the serialized engine is intact, without corruption?
It would also help if you could run the jetson_inference sample to check that everything is good in your environment.

Thanks.
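One simple way to check a saved engine file for corruption is to read it back and compare it byte for byte with the buffer that was serialized. The sketch below assumes that approach; `fileMatchesBuffer` is a hypothetical helper, not a TensorRT function, and it should be called only after the output stream has been closed or flushed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Returns true only if the file at `path` can be opened and contains
// exactly the `size` bytes found at `data`.
bool fileMatchesBuffer(const std::string& path,
                       const void* data,
                       std::size_t size)
{
    assert(data != nullptr || size == 0);

    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return false;  // missing or unreadable file
    }
    const std::vector<char> onDisk((std::istreambuf_iterator<char>(in)),
                                   std::istreambuf_iterator<char>());
    if (onDisk.size() != size) {
        return false;  // truncated or padded file
    }
    return size == 0 || std::memcmp(onDisk.data(), data, size) == 0;
}
```

With the snippet posted earlier in the thread, the call would look like `fileMatchesBuffer("../output/xyz.engine", tensorRTModelStream->data(), tensorRTModelStream->size())`.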

dhingratul · May 11, 2018, 4:01pm

This was fixed after I used the same typecasting for serialization and deserialization.

aaron7m9hn · June 15, 2018, 12:09pm

Hi dhingratul!

I was wondering if you could elaborate on the typecasting issues you had? I am getting the same error and segfault that you had before. Thanks!

[Update]
I didn’t see your update. I think I got it, thanks!