I want to build engine(mBuilder->buildEngineWithConfig(*mNetwork, *mConfig)) with onnx from the network backbone of yolov4.It works and runs successfully.
But it takes too much time(49mins) to finish the buildEngineWithConfig if I set Flag as FP16(mConfig->setFlag(nvinfer1::BuilderFlag::kFP16)).
Is there a way to speed up?
Environment
TensorRT Version: 7.2.2.3 GPU Type: 3060 Nvidia Driver Version: 471.41 CUDA Version: 11.1 CUDNN Version: 8.0.4.30 Operating System + Version: Windows 10 21H1 Python Version (if applicable): None TensorFlow Version (if applicable): None PyTorch Version (if applicable): None Baremetal or Container (if container which image + tag): None
Relevant Files
if (mRunMode == 1)
{
spdlog::info(“setFp16Mode”);
if (!mBuilder->platformHasFastFp16()) {
spdlog::warn(“the platform do not has fast for fp16”);
}
mBuilder->setFp16Mode(true);
mConfig->setFlag(nvinfer1::BuilderFlag::kFP16);
}
mBuilder->setMaxBatchSize(mBatchSize);
// set the maximum GPU temporary memory which the engine can use at execution time.
mConfig->setMaxWorkspaceSize(10 << 20);
spdlog::info(“fp16 support: {}”,mBuilder->platformHasFastFp16 ());
spdlog::info(“int8 support: {}”,mBuilder->platformHasFastInt8 ());
spdlog::info(“Max batchsize: {}”,mBuilder->getMaxBatchSize());
spdlog::info(“Max workspace size: {}”,mConfig->getMaxWorkspaceSize());
spdlog::info(“Number of DLA core: {}”,mBuilder->getNbDLACores());
spdlog::info(“Max DLA batchsize: {}”,mBuilder->getMaxDLABatchSize());
spdlog::info(“Current use DLA core: {}”,mConfig->getDLACore()); // TODO: set DLA core
spdlog::info(“build engine…”);
mEngine = mBuilder->buildEngineWithConfig(*mNetwork, *mConfig);
Thank you for your reply.
The size of my onnx model and pth model are 251Mb and I can’t upload my models to this.
Is there another way to share my onnx model and pth model?Google Cloud Drive or Baidu Netdisk?
Could you please try on latest TensorRT version 8.2 EA.
If you still face this issue, we recommend you to please share trtexec --verbose logs, issue repro script and onnx model via google drive to try from our end.
Thank you for your reply.
I have tried the latest TensorRT version 8.2EA.It still takes too much time(42mins) to build engine with onnx.
My trtexc below is modified on the basis of the sampleOnnxMNIST.cpp in the sample_onnx_mnist.sln. test_my_onnx.cpp (12.0 KB) class_timer.hpp (645 Bytes)
My logtxt: engine log.txt (3.2 KB)
My google drive of onnx model :
Currently, we don’t have a real good solution yet, but we can try using the TacticSources feature and disabling cudnn, cublas, and cublasLt. That should speed up the network building.
Also, we can speed up the build by setting the precision of each layer to FP16 and selecting kOBEY_PRECISION, this will disable FP32 layers, but it will fail if there are no FP16 implementations.
Another thing we can do is use the global timing cache, so it will only be slow the first time they build on a per-release basis, but will be faster for each subsequent build.
Also in future release TRT has improvement in build speeds, we suggest moving to new version when it is released.
I got same problem. The situation is hard to analyze.
I built engine from using tensorrt api on RTX 3060 → 5 to 10 mins but on RTX 3080 took over 30 mins.
I try to find the difference in hardware as CPU model but cannt find it out.