TensorRT for RTX on RTX 2060: Myelin/cuDNN-Graph errors, but works on RTX 3080Ti

Hello,

Our workflow spans three GPUs:
Training: RTX 3080 Ti
Integration/Development: RTX 2060
Deployment: RTX 5060

On the 3080 Ti (Ampere) and 5060 (Blackwell), TensorRT for RTX 1.1 (with CUDA 12.9, driver 576.02) builds and runs the ONNX → TensorRT engine successfully.
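
For reference, the build path on those GPUs is the standard ONNX parser + builder flow. Below is a minimal, hedged sketch of that flow using the standard TensorRT C++ API (error handling trimmed); it assumes TensorRT for RTX exposes the same nvinfer1/nvonnxparser entry points, so treat header and factory names as illustrative rather than exact:

#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <cstdio>
#include <fstream>
#include <memory>

// Minimal sketch of the ONNX -> engine path (standard TensorRT C++ API;
// TensorRT for RTX is assumed to mirror these entry points).
class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char *msg) noexcept override
    {
        if (severity <= Severity::kWARNING)
            std::printf("%s\n", msg);
    }
} gLogger;

bool buildEngineFromOnnx(const char *onnxPath, const char *planPath)
{
    using namespace nvinfer1;
    std::unique_ptr<IBuilder> builder{createInferBuilder(gLogger)};
    // The explicit-batch flag is required on TensorRT 8.x and deprecated (harmless) on newer releases.
    std::unique_ptr<INetworkDefinition> network{builder->createNetworkV2(
        1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH))};
    std::unique_ptr<nvonnxparser::IParser> parser{nvonnxparser::createParser(*network, gLogger)};
    if (!parser->parseFromFile(onnxPath, static_cast<int>(ILogger::Severity::kWARNING)))
        return false;

    std::unique_ptr<IBuilderConfig> config{builder->createBuilderConfig()};
    std::unique_ptr<IHostMemory> plan{builder->buildSerializedNetwork(*network, *config)};
    if (!plan)
        return false;

    std::ofstream out(planPath, std::ios::binary);
    out.write(static_cast<const char *>(plan->data()), plan->size());
    return out.good();
}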

However, on the RTX 2060 (Turing), where we build engines locally, the same code fails at:
nvinfer1::IExecutionContext *context = engine->createExecutionContext();
with errors like:
Internal Error: MyelinCheckException: cudnn_graph_utils.h:379: CHECK(false) failed. cuDNN graph compilation failed.
[TensorRT] ICudaEngine::createExecutionContext: Error Code 1: Myelin ([myelin_graph.h:1168: attachExceptionMsgToGraph] MyelinCheckException: cudnn_graph_utils.h:379: CHECK(false) failed. cuDNN graph compilation failed. In nvinfer1::rt::MyelinGraphContext::MyelinGraphContext at runtime/myelin/graphContext.cpp:68)

The error persists even if I add the following if/else when building the engine:
if (major >= 8)
{
    config->setFlag(BuilderFlag::kGPU_FALLBACK);
}
else
{
    config->setTacticSources(
        (1U << static_cast<uint32_t>(TacticSource::kCUBLAS)) |
        (1U << static_cast<uint32_t>(TacticSource::kCUDNN)));
}
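
For context, major in the snippet above comes from the device's compute capability, queried roughly like this (the variable names are mine):

#include <cuda_runtime_api.h>

// Query the compute capability of the active GPU.
// major is 7 on the RTX 2060 (SM75), 8 on the RTX 3080 Ti (SM86),
// and 12 on the RTX 5060 (Blackwell, SM120).
int device = 0;
cudaGetDevice(&device);
cudaDeviceProp prop{};
cudaGetDeviceProperties(&prop, device);
int major = prop.major;
int minor = prop.minor;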

Environment
GPU: RTX 2060 / RTX 3080 Ti / RTX 5060
Driver: 576.02 / 581.29
CUDA: 12.9
cuDNN: 12X
TensorRT: RTX 1.1
OS: Windows 10 x64

So, the questions are:

  1. Is TensorRT for RTX 1.1 not fully compatible with Turing-based GPUs (such as RTX 2060)?
  2. Are there recommended workarounds (e.g., disabling Myelin/cuDNN-Graph, generating engines directly on Turing, and so on)?
  3. Does TensorRT 10.8+ support GPUs ranging from the GTX 1650 (Turing) to the RTX 5060 (Blackwell), for static graphs only?
  1. TensorRT-RTX is compatible with Turing GPUs; however, the support surface is not identical to that of Ampere and newer GPUs.
  2. Tactic sources are not supported in TRT-RTX. When building for Turing, specify only SM75 as the target (a sketch of this follows the list): CPU-Only AOT and TensorRT-RTX Engines — NVIDIA TensorRT for RTX Documentation. If you are still facing issues, please share the model so we can take a look.
  3. TensorRT 10.8 should support all Turing and newer GPUs as well.
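
For what it's worth, a hedged sketch of restricting the AOT build to SM75 is shown below. The builder-config calls (setNbComputeCapabilities, setComputeCapability) and the ComputeCapability enum values are assumptions based on a reading of the linked TensorRT for RTX documentation, not verified against the 1.1 headers, so please check them against that page:

// Hedged sketch: restrict the ahead-of-time build to Turing (SM75) only.
// setNbComputeCapabilities / setComputeCapability / ComputeCapability::kSM75 are
// assumed names; verify against the TensorRT for RTX docs before relying on them.
// builder and network are assumed to exist as in the usual build flow above.
std::unique_ptr<nvinfer1::IBuilderConfig> config{builder->createBuilderConfig()};
config->setNbComputeCapabilities(1);
config->setComputeCapability(nvinfer1::ComputeCapability::kSM75, 0);
// ComputeCapability::kCURRENT (if available) would instead target the GPU present at build time.
std::unique_ptr<nvinfer1::IHostMemory> plan{builder->buildSerializedNetwork(*network, *config)};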

Got it, thank you for clarifying.

I have a follow-up question:

When using TensorRT for RTX 1.1, the step
nvinfer1::IExecutionContext *context = engine->createExecutionContext();
takes significantly longer compared to TensorRT 8.x.
· With TensorRT 8.x, context creation takes about 330 ms.
· With TensorRT for RTX 1.1, it takes about 5200 ms.

Even when I create multiple contexts sequentially, the second attempt drops to ~2800 ms and the third is similar, so some partial caching seems to happen, but it is not fully effective.

Additionally, inference still requires a warmup step: the first inference takes 40–50 ms, while subsequent inferences stabilize at ~4 ms.
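
For reference, the measurement and warmup pattern behind those numbers looks roughly like this (a minimal sketch using the standard setTensorAddress/enqueueV3 API from TensorRT 8.5+/10.x; the tensor names and device buffers are placeholders):

#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <chrono>
#include <cstdio>

// Sketch: time context creation, then run a few throwaway inferences to warm up.
// "input"/"output" tensor names and the device buffers are placeholders.
void createAndWarmUp(nvinfer1::ICudaEngine *engine, void *dInput, void *dOutput)
{
    using Clock = std::chrono::steady_clock;

    auto t0 = Clock::now();
    nvinfer1::IExecutionContext *context = engine->createExecutionContext();
    auto t1 = Clock::now();
    std::printf("createExecutionContext: %lld ms\n",
                (long long)std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count());

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    context->setTensorAddress("input", dInput);   // placeholder tensor name
    context->setTensorAddress("output", dOutput); // placeholder tensor name

    // Warmup: the first enqueue triggers lazy/JIT initialization (40-50 ms here);
    // a few throwaway runs bring later launches down to the steady ~4 ms.
    for (int i = 0; i < 5; ++i)
    {
        context->enqueueV3(stream);
        cudaStreamSynchronize(stream);
    }
    cudaStreamDestroy(stream);
}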

So my questions are:

  1. Is this longer createExecutionContext() time expected with TensorRT for RTX 1.1?
  2. Is there a way to cache or reuse initialization results across runs (e.g., by reading a cache file directly)?
  3. Is the warmup cost expected, or is there a recommended method to pre-warm or reduce this overhead?

@cuihanhuan based on your question in the other thread, you are probably already familiar with Runtime Cache: Working with Runtime Cache — NVIDIA TensorRT for RTX Documentation. We can continue the discussion on the other thread.