[Feature request] Make using incompatible timing caches for building CUDA engines not a hard error

Description

I am currently researching ways to speed up building TensorRT CUDA engines at runtime from ONNX models using the C++ API. That is, I load the ONNX file, build a CUDA engine from it, and then use that engine for inference.
For application-specific reasons it is possible neither to prebuild engines before running the application nor to store them on a filesystem for future use. Thus, I am looking for ways to improve CUDA engine building speed. I have explored all the available API options, and this exploration showed the Timing Cache to be the best option for me.
Here is a rough outline of the approach I came up with (a code sketch follows the list):

  1. Obtain several timing caches on GPUs of the architectures / compute capabilities (CCs) my application should support.
  2. Combine those caches into a single one.
  3. Provide the combined cache to a builder while building a CUDA engine from the ONNX file.
  4. Build the CUDA engine fast, because the provided Timing Cache already contains needed profiling metrics.
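
Roughly, this is what steps 2–4 look like with the TensorRT 8.x C++ API. This is only a minimal sketch: the ONNX parsing / network setup, error checks and object cleanup are omitted, and buildWithCombinedCache / cacheBlobs are illustrative names only.

#include <NvInfer.h>
#include <vector>

using namespace nvinfer1;

// cacheBlobs holds the serialized per-GPU timing caches obtained in step 1.
IHostMemory* buildWithCombinedCache(IBuilder& builder, INetworkDefinition& network,
                                    const std::vector<std::vector<char>>& cacheBlobs)
{
    IBuilderConfig* config = builder.createBuilderConfig();

    // Step 2: merge the per-GPU caches into a single one.
    ITimingCache* combined = config->createTimingCache(nullptr, 0); // empty cache
    for (const auto& blob : cacheBlobs)
    {
        ITimingCache* perDeviceCache = config->createTimingCache(blob.data(), blob.size());
        combined->combine(*perDeviceCache, /*ignoreMismatch=*/false);
    }

    // Step 3: attach the combined cache to the builder configuration.
    config->setTimingCache(*combined, /*ignoreMismatch=*/false);

    // Step 4: build the serialized engine; on some devices this is where the
    // internal error quoted below is raised.
    return builder.buildSerializedNetwork(network, *config);
}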

Unfortunately, the last step fails in some specific environments with the following error message:

[TensorRT] 2: [caskBuilderUtils.cpp::getCaskHandle::408] Error Code 2: Internal Error (Assertion !isGroupConv || (findConvShaderByHandle(klib, caskHandle) || cask_trt::availableLinkableConvShaders()->findByHandle(caskHandle)) failed. )
[TensorRT] 2: [builder.cpp::buildSerializedNetwork::636] Error Code 2: Internal Error (Assertion engine != nullptr failed. )

I suspect that the builder is picking a tactic from the provided cache, but that tactic is not available on the current GPU device. Assuming this guess is right, it does not seem to be an inevitable error, since the incompatible tactic could simply be skipped (I tried setting the ignoreMismatch flag with no luck, see the snippet below). So the feature request is essentially to make the build skip the offending cache entry and proceed instead of throwing a hard error.
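
The flag is available both on ITimingCache::combine and on IBuilderConfig::setTimingCache; this is roughly what the attempt looks like (combined and perDeviceCache refer to the objects from the sketch above):

// Attempted workaround: tell TensorRT to ignore mismatching cache entries.
// The engine build still fails with the error quoted above.
combined->combine(*perDeviceCache, /*ignoreMismatch=*/true);
config->setTimingCache(*combined, /*ignoreMismatch=*/true);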

P.S. I am aware that reusing a timing cache across devices is not recommended, but my benchmarks show that it is harmless in my case.

Environment

TensorRT Version: 8.4.1
NVIDIA GPU:
NVIDIA Driver Version:
CUDA Version: Any
CUDNN Version: Any
Operating System: Any

Hi,
Could you please share the ONNX model and the script, if not shared already, so that we can assist you better?
Alongside that, you can try a few things:

  1. Validate your model with the snippet below.

check_model.py

import sys
import onnx

# Usage: python check_model.py <path-to-model.onnx>
filename = sys.argv[1]
model = onnx.load(filename)
onnx.checker.check_model(model)
  2. Try running your model with the trtexec command.

In case you are still facing the issue, please share the trtexec "--verbose" log for further debugging.
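For example (model.onnx is a placeholder for your model file; the redirection just captures the full log):

trtexec --onnx=model.onnx --verbose > trtexec.log 2>&1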
Thanks!

Hi,

Looks like you've already created a GitHub issue: [Feature request] Make using incompatible timing caches for building CUDA engines not a hard error · Issue #2144 · NVIDIA/TensorRT · GitHub. Please allow our team to provide an update there.

Thank you.

Hi,

Thank you. I am not sure I understand the question. There are no updates, since this is not a support request but a feature request.

spolisetty found your GitHub request and will be asking the team to respond to your feature request via GitHub.
Thanks for raising it here too, so others can see it.
I shall mark this thread as solved since it has moved to GitHub, but you can reverse that if you want.
Thanks

Hi,

Thank you! I marked your response as a ‘Solution’.


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.