It loads a UFF model and creates an engine.
In my case, I would like to load the TensorRT engine file (detect.engine) directly in C++.
Since the TensorRT engine was created on the same system, I don’t need to rebuild it; I can use the engine directly.
How do I load and deserialize it in C++?
I don’t think this really answers the problem we have here.
I am facing the same issue as edit_or, and the point is about loading a .engine model in C++ (loading from the terminal works).
I couldn’t find an answer to the problem, only suppositions such as working from the DeepStream sample (nvdsinfer_custom_impl_yolo) and modifying its source. But loading an engine in a custom C++ app remains very mysterious.
I am not sure what the .engine file is really for, and the documentation seems poor on this topic. From your answer, does it mean we are supposed to include, and possibly modify, the source of trtexec in our app to run a model? Isn’t there some sort of C++ interface/library for doing so?
It seems amazing to me that Nvidia is always trying to empower developers to do things with its libraries and frameworks, yet it’s practically impossible just to find out how to load a .engine model and perform inference with it (in simple, plain C++ code).
This is all they show (barely seven out-of-context lines), so you have to dive into Dustin Franklin’s jetson-inference code for hours just to understand a little bit of how this works.
→ Could you please show a simple, explained script on how to work with .engine files to perform inference?
I think this will help us build better solutions using Nvidia frameworks. :)
Kind regards.
// logger.h
#pragma once
#include <iostream>
#include "NvInfer.h"

// Logger required by the TensorRT runtime; prints everything except INFO.
// (noexcept is required from TensorRT 8 on; drop it for older versions.)
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity != Severity::kINFO) {
            std::cout << msg << std::endl;
        }
    }
} gLogger;
// One of the loggers is probably not needed since it gets overridden, but I didn't experiment without it, so here it is.
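For completeness, here is a minimal sketch of the load/deserialize step itself, assuming TensorRT 8.x (older versions pass an extra IPluginFactory argument to deserializeCudaEngine and clean up with destroy() rather than delete). The file name detect.engine comes from the question above; everything else is just example naming:

// main.cpp -- minimal deserialization sketch, reuses gLogger from logger.h
#include <fstream>
#include <vector>
#include "NvInfer.h"
#include "logger.h"

int main() {
    // Read the serialized engine from disk into a byte buffer
    std::ifstream file("detect.engine", std::ios::binary | std::ios::ate);
    std::streamsize size = file.tellg();
    file.seekg(0, std::ios::beg);
    std::vector<char> engineData(size);
    file.read(engineData.data(), size);

    // Deserialize with the runtime, using the gLogger defined above
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(gLogger);
    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(engineData.data(), engineData.size());

    // The execution context holds the per-inference state
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();

    // ... allocate buffers, copy input, enqueue (see the next sketch) ...

    delete context;  // TensorRT 8: plain delete; older versions: ->destroy()
    delete engine;
    delete runtime;
    return 0;
}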
From there you can start to populate your input (cudaMalloc / cudaMemcpy), store the device pointers in a std::vector<void*>, and request execution with, for example:

context->enqueue(batch_size, buffers.data(), cuda_stream /* 0 for example */, nullptr);

where buffers is the std::vector<void*> holding the pointers to the input and output in GPU memory (from cudaMalloc). A sketch of this second part follows.
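Here is a rough sketch of that, under the assumption of a single float input binding and a single float output binding (the runInference name and the binding order are mine; query engine->getBindingIndex() / getBindingDimensions() for the real layout of your engine):

// infer.cpp -- buffer setup and enqueue as described above (sketch)
#include <vector>
#include <cuda_runtime_api.h>
#include "NvInfer.h"

std::vector<float> runInference(nvinfer1::IExecutionContext* context,
                                const std::vector<float>& hostInput,
                                size_t outputCount) {
    // Allocate device memory for the two bindings
    void* dInput = nullptr;
    void* dOutput = nullptr;
    cudaMalloc(&dInput, hostInput.size() * sizeof(float));
    cudaMalloc(&dOutput, outputCount * sizeof(float));

    // Copy the input to the GPU
    cudaMemcpy(dInput, hostInput.data(),
               hostInput.size() * sizeof(float), cudaMemcpyHostToDevice);

    // One pointer per engine binding, in binding-index order
    std::vector<void*> buffers = {dInput, dOutput};

    // Run inference on the default stream (implicit-batch API)
    context->enqueue(1 /* batch size */, buffers.data(), 0, nullptr);
    cudaStreamSynchronize(0);

    // Copy the result back to the host
    std::vector<float> hostOutput(outputCount);
    cudaMemcpy(hostOutput.data(), dOutput,
               outputCount * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dInput);
    cudaFree(dOutput);
    return hostOutput;
}

Note that enqueue() is the implicit-batch API; for engines built with an explicit batch dimension you would call context->enqueueV2(buffers.data(), stream, nullptr) instead.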
I can’t remember where I found this, so apologies for the missing credit.
I may detail the second part a little more later.
good luck