When calling:
_engine = std::shared_ptr<nvinfer1::ICudaEngine>(builder->buildEngineWithConfig(*network, *config), InferDeleter());
It always takes quite some time to optimize the network (around 3 minutes).
Is there some config option to skip the optimization during development, or at least speed it up at the cost of some runtime performance, for when I just want to try things out? Currently it's quite annoying to test anything, because after every code change I have to wait 3 minutes to run it.
I already tried to add:
config->setAvgTimingIterations(1);
config->setMinTimingIterations(0);
But it does not seem to change anything.
Using TensorRT v7.0.0
Turns out, it is expected that engine creation takes a significant amount of time. That is why the engine->serialize() method exists: the engine can be saved in memory or to disk, so it only has to be built once per model and machine.
I ended up checking if the engine file exists on disk, if not → create it and save it to disk, if yes → load engine from disk. Here is some code:
std::ifstream file("/path/to/model.engine", std::ios::binary); // binary mode: the engine is a binary blob
if (file.good()) {
    std::cout << "** Found Engine on Disk, loading... **" << std::endl;
    // Determine the file size, then read the whole blob into memory.
    file.seekg(0, file.end);
    auto size = static_cast<size_t>(file.tellg());
    file.seekg(0, file.beg);
    std::vector<char> engineStream(size);
    file.read(engineStream.data(), size);
    file.close();
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(getLogger());
    auto engine = runtime->deserializeCudaEngine(engineStream.data(), size);
    runtime->destroy();
}
else {
    std::cout << "** No Engine found on Disk, creating... **" << std::endl;
    // create engine here...
    nvinfer1::IHostMemory* serializedModel = engine->serialize();
    std::ofstream p("/path/to/model.engine", std::ios::binary); // same path as the read above
    p.write(static_cast<const char*>(serializedModel->data()), serializedModel->size());
    p.close();
    serializedModel->destroy();
}