Description
I am currently researching ways for speeding up building TensorRT’s CUDA engines in the runtime from ONNX models using the C++ API. That is, I am loading the ONNX file, building a CUDA engine from it and then using the latter for inference.
For some specific reasons it is not possible for my application neither to prebuild engines before running the application nor to store those on a filesystem for a future usage. Thus, I am looking for ways to improve CUDA engine building speed. I have explored all the available API options, and this exploration resulted in the Timing Cache being the best option for me.
Here is the rough explanation of the approach I came up with:
- Obtain several Timing Caches using several GPUs of architectures / CCs I want my application to support.
- Combine those caches into a single one.
- Provide the combined cache to a builder while building a CUDA engine from the ONNX file.
- Build the CUDA engine fast, because the provided Timing Cache already contains needed profiling metrics.
Unfortunately, the last step fails in some specific environments with the following error message:
[TensorRT] 2: [caskBuilderUtils.cpp::getCaskHandle::408] Error Code 2: Internal Error (Assertion !isGroupConv || (findConvShaderByHandle(klib, caskHandle) || cask_trt::availableLinkableConvShaders()->findByHandle(caskHandle)) failed. )
[TensorRT] 2: [builder.cpp::buildSerializedNetwork::636] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
I suspect that the builder is picking some tactic from the provided cache, but the tactic is not available on the current GPU device. Assuming this guess is right, it does not seem to be an inevitable error since the incompatible tactic could be skipped (I tried setting the ignoreMismatch
flag with no luck). So the feature request is essentially to make the build skip the cache entry and proceed instead of throwing a hard error.
P.S. I am aware that timing cache is not recommended to be reused across devices, but my benchmarks shows that it’s harmless.
Environment
TensorRT Version:
NVIDIA GPU: 8.4.1
NVIDIA Driver Version:
CUDA Version: Any
CUDNN Version: Any
Operating System: Any