Load TensorRT engine and deserialize in C++

edit_or · September 23, 2020, 9:34am

Where can I find a C++ sample that loads a TensorRT engine file and deserializes it for inference?

This is in Python and I’m looking for the C++ version.

with open("sample.engine", "wb") as f:
    f.write(engine.serialize())

Read the engine from the file and deserialize:

with open("sample.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

AakankshaS · September 23, 2020, 3:46pm

Hi @edit_or,

Kindly refer to the below links

github.com

NVIDIA/TensorRT/blob/9a9cae75e7155b2114454f37ccc49eca9d3352dc/samples/opensource/sampleMovieLensMPS/sampleMovieLensMPS.cpp#L674


      
parser->registerInput(USER_BLOB_NAME, inputIndices, UffInputOrder::kNCHW);
parser->registerInput(ITEM_BLOB_NAME, inputIndices, UffInputOrder::kNCHW);
parser->registerOutput(UFF_OUTPUT_NODE);

auto engine = loadModelAndCreateEngine(args.uffFile.c_str(), parser.get(), args);
if (engine.get() == nullptr)
{
    throw std::runtime_error("Failed to create engine.");
}

auto modelStream = samplesCommon::infer_object(engine->serialize());

size_t modelStreamSize = modelStream->size();
// Create a shared buffer for the modelStream.
int fd = shm.open_rw();

fallocate(fd, 0, 0, modelStreamSize);
void* modelStreamData = mmap(NULL, modelStreamSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
// Copy modelStream to the shared buffer.
std::memcpy(modelStreamData, modelStream->data(), modelStreamSize);
// Clean up.

Thanks!

edit_or · September 24, 2020, 2:54am

That sample loads a UFF model and builds the engine.
I would like to load a TensorRT engine file (detect.engine) directly in C++.
The engine was built on the same system, so I don’t need to rebuild it; I can use the engine file directly.
How do I load and deserialize it in C++?

AakankshaS · November 29, 2020, 9:00pm

Hi @edit_or
You can use the trtexec command to load the engine:
trtexec --loadEngine=g1.trt --batch=1
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

Thanks!

armand.zampierizn4wa · February 8, 2021, 2:12pm

Hello AakankshaS,

I don’t think this really answers the problem we have here.

I am facing the same issue as edit_or, and the point is about loading a .engine model in C++ (loading from the terminal works).
I couldn’t find an answer to the problem, only many suggestions, such as working from the DeepStream sample (nvdsinfer_custom_impl_yolo) and modifying its source. Still, loading an engine in a custom C++ app remains very mysterious.

I am not sure what the .engine file is really meant for, and the documentation seems poor on this topic. Does your answer mean we are supposed to include, and possibly modify, the trtexec source in our app to run a model? Isn’t there some sort of C++ interface/library for doing so?

Thanks for your attention.

matesanz.cuadrado · April 30, 2021, 11:56am

It amazes me that Nvidia is always trying to empower developers to do things with its libraries and frameworks, yet it is practically impossible to find out how to load an .engine model and run inference with it in simple, plain C++ code.

This is all they show (barely 7 lines, out of context). So you have to dig through Dustin Franklin’s Jetson-Inference code for hours just to understand a little of how this works.

→ Could you please show a simple, well-explained example of how to work with .engine files to perform inference?

I think this will help us build better solutions using Nvidia frameworks. :)
Kind regards.

NVES · May 3, 2021, 7:23am

Hi,
Please share the model, script, profiler, and performance output, if not shared already, so that we can help you better.
Alternatively, you can try running your model with the trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

When measuring model performance, make sure you consider the latency and throughput of the network inference, excluding the data pre- and post-processing overhead.
Please refer to the link below for more details:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-722/best-practices/index.html#measure-performance

Thanks!

maazullah · July 7, 2021, 8:00am

@matesanz.cuadrado did you find a solution?
I am facing the same problem.

matesanz.cuadrado · July 7, 2021, 9:18am

No, unfortunately the docs are still far from what one would expect from a serious product.

If you figure it out, please let me know.

edit_or · July 8, 2021, 1:39am

Hello,
Is this GitHub repo helpful for you?

armand.zampierizn4wa · July 13, 2021, 1:16pm

Hello, here is the code that got it working for me:

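#include <fstream>
#include <iostream>
#include <memory>
#include <sstream>
#include <string>
#include <NvInfer.h>
#include <NvInferPlugin.h>
#include <NvInferPluginUtils.h>
#include <NvInferRuntime.h>
#include <NvInferRuntimeCommon.h>

#include "logger.h"  // provides gLogger (see logger.h below)

// Deleter so that std::unique_ptr releases TensorRT objects via destroy().
// destroy() is the TensorRT 7-era call; TensorRT 8 deprecates it in favor of delete.
struct TRTDestroy {
    template<class T>
    void operator()(T* obj) const {
        obj->destroy();
    }
};

class Logger : public nvinfer1::ILogger {
    // noexcept is required by TensorRT 8 and is also accepted by older versions.
    void log(Severity severity, const char* msg) noexcept override {
        if (severity != Severity::kINFO) {
            std::cout << msg << std::endl;
        }
    }
};

template< class T >
using TRTUniquePtr = std::unique_ptr< T, TRTDestroy >;

int main() {
    // Read the whole serialized engine (plan) into memory; binary mode, since it is raw bytes.
    std::ifstream planFile("<path_to_engine_file>", std::ios::binary);
    std::stringstream planBuffer;
    planBuffer << planFile.rdbuf();
    std::string plan = planBuffer.str();

    TRTUniquePtr< nvinfer1::IRuntime > runtime {nullptr};
    TRTUniquePtr< nvinfer1::ICudaEngine > engine {nullptr};
    TRTUniquePtr< nvinfer1::IExecutionContext > context {nullptr};

    // Runtime -> engine -> execution context, each owned by a unique_ptr.
    runtime.reset(nvinfer1::createInferRuntime(gLogger));
    engine.reset(runtime->deserializeCudaEngine(plan.data(), plan.size(), nullptr));
    context.reset(engine->createExecutionContext());
}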

// logger.h
#pragma once

#include <iostream>
#include <NvInfer.h>

// Prints every TensorRT message except those of severity kINFO.
class logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity != Severity::kINFO) {
            std::cout << msg << std::endl;
        }
    }
} gLogger;

// One of the two loggers is probably unnecessary since it is overridden, but I didn't experiment without it, so here it is.
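
The file-reading step above can also be written as a small helper that fails loudly when the engine file is missing or empty. This is only a minimal sketch using the standard library; readPlanFile is a hypothetical name and not part of TensorRT.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not a TensorRT API): load a serialized engine file into memory.
std::vector<char> readPlanFile(const std::string& path) {
    // Binary mode because a plan is raw bytes; ios::ate opens positioned at the end.
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file) {
        throw std::runtime_error("Cannot open engine file: " + path);
    }

    // Since we are at the end, tellg() gives the file size in bytes.
    const std::streamsize size = file.tellg();
    if (size <= 0) {
        throw std::runtime_error("Engine file is empty or unreadable: " + path);
    }

    std::vector<char> plan(static_cast<std::size_t>(size));
    file.seekg(0, std::ios::beg);
    if (!file.read(plan.data(), size)) {
        throw std::runtime_error("Failed to read engine file: " + path);
    }
    assert(file.gcount() == size);  // a successful read delivers every byte
    return plan;
}
```

The returned buffer can then be handed to runtime->deserializeCudaEngine(...) as plan.data() and plan.size(), in place of the std::string used above.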

From there you can start to populate your input (cudaMalloc / cudaMemcpy) and set the pointers in a std::vector<void*>, for example,

and request execution with, for example:

context->enqueue(batch_size, buffers.data(), cuda_stream /* 0 for example */, nullptr);

where buffers is the std::vector<void*> holding the pointers to the input and output buffers in GPU memory (from cudaMalloc).
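
To size each cudaMalloc call you need the byte count of every binding. Here is a minimal sketch, assuming float32 tensors and an implicit-batch engine (the kind that enqueue(batch_size, ...) is meant for). bindingBytes is a hypothetical helper; in a real app the dims would presumably come from engine->getBindingDimensions(i) rather than from a hand-written vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a TensorRT API): bytes needed for one binding.
// dims      : per-sample dimensions, e.g. {3, 224, 224} (batch dimension excluded)
// batchSize : the batch size later passed to enqueue()
// elemSize  : bytes per element; float32 is an assumption, check your engine's data type
std::size_t bindingBytes(const std::vector<int>& dims,
                         int batchSize,
                         std::size_t elemSize = sizeof(float)) {
    assert(batchSize > 0);
    std::size_t count = static_cast<std::size_t>(batchSize);
    for (int d : dims) {
        assert(d > 0);  // dynamic (-1) dimensions must be resolved before sizing
        count *= static_cast<std::size_t>(d);
    }
    return count * elemSize;
}
```

Each entry of buffers would then be allocated with cudaMalloc(&buffers[i], bindingBytes(dims_i, batch_size)), where dims_i stands for the dimensions of binding i.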

I can’t remember where I found this, so sorry for not giving credit.

I may go into a little more detail on the second part later.
Good luck.

edit_or · July 14, 2021, 4:13am

Nice that helps
