Help with TensorRT errors when building an engine

Description

Hello,
I’m parsing an ONNX model and building the network using the TensorRT C++ API. During the build I get the errors below, which don’t tell me much, and I was wondering if anyone could help. Thanks,

[WARN ] [] TensorRT warning:  (foreignNode) [l2tc] - VALIDATE FAIL - Graph contains symbolic shape, l2tc doesn't take effect
(the warning above is repeated 16 times)
[ERROR] [] TensorRT error: Error Code: 9: Skipping tactic 0x00000000000003e9 due to exception Assertion g.nodes.size() == 0 failed.
(the error above is repeated 3 times)
[INFO ] [] TensorRT engine written to: model.engine

Please also note that, despite the errors, the engine still gets saved: the serialized plan is not null and the engine file is ~20 MB.
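
For context, here is roughly what my build code does (a simplified sketch, not the actual code; the optimization-profile bounds and file names are illustrative):

#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <fstream>

// Simplified sketch of the build path (TensorRT 10 C++ API). Error handling
// and object cleanup are omitted for brevity.
bool buildEngine(nvinfer1::ILogger& logger)
{
    auto* builder = nvinfer1::createInferBuilder(logger);
    auto* network = builder->createNetworkV2(0);
    auto* parser  = nvonnxparser::createParser(*network, logger);
    if (!parser->parseFromFile("model.onnx",
            static_cast<int>(nvinfer1::ILogger::Severity::kWARNING)))
        return false;

    auto* config = builder->createBuilderConfig();

    // The inputs have dynamic spatial dims (hence the "symbolic shape"
    // warnings above), so a profile is required; these bounds are made up.
    auto* profile = builder->createOptimizationProfile();
    for (char const* name : {"im0", "im1"})
    {
        profile->setDimensions(name, nvinfer1::OptProfileSelector::kMIN,
                               nvinfer1::Dims4{1, 3, 32, 32});
        profile->setDimensions(name, nvinfer1::OptProfileSelector::kOPT,
                               nvinfer1::Dims4{1, 3, 888, 1280});
        profile->setDimensions(name, nvinfer1::OptProfileSelector::kMAX,
                               nvinfer1::Dims4{1, 3, 1088, 1920});
    }
    config->addOptimizationProfile(profile);

    auto* plan = builder->buildSerializedNetwork(*network, *config);
    if (!plan)
        return false;
    std::ofstream("model.engine", std::ios::binary)
        .write(static_cast<char const*>(plan->data()), plan->size());
    return true;
}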

I’ve also tried running inference with this engine, and I get:

[ERROR] [] TensorRT error: IExecutionContext::enqueueV3: Error Code 1: Cask (Cask Pooling Runner Execute Failure)

Environment

TensorRT Version: 10.8
GPU Type: RTX 2060 SUPER
Nvidia Driver Version: 560.35.05
CUDA Version: 12.6
CUDNN Version:
Operating System + Version: Ubuntu 24.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): Baremetal

Steps To Reproduce

By not setting the TilingOptimizationLevel in the builder configuration, I was able to get rid of the first set of errors:

[ERROR] [] TensorRT error: Error Code: 9: Skipping tactic 0x00000000000003e9 due to exception Assertion g.nodes.size() == 0 failed.
(repeated 3 times)

I had set it to MODERATE; without that call, the builder is apparently able to choose tactics that work with my model.
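
Concretely, the change was along these lines (a sketch; `config` is the IBuilderConfig used at build time):

#include <NvInfer.h>

// Sketch of the config change. With raiseTilingLevel == true this reproduces
// my original setup (and the "Skipping tactic" errors); with false the call
// is skipped and the build completes without them.
void applyTilingSetting(nvinfer1::IBuilderConfig& config, bool raiseTilingLevel)
{
    if (raiseTilingLevel)
        config.setTilingOptimizationLevel(
            nvinfer1::TilingOptimizationLevel::kMODERATE);
    // Otherwise the default tiling optimization level is left untouched.
}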


I’m now left with this error, which I have no idea how to debug:

[ERROR] [] TensorRT error: IExecutionContext::enqueueV3: Error Code 1: Cask (Cask Pooling Runner Execute Failure)

What I find suspicious:

  1. I have a Python script that runs inference with the same model; the error doesn’t appear there and the model gives correct results. Tensors are allocated with PyTorch and inference is done with context.execute_v2.

  2. Using the TensorRT C++ API fails with that error. I/O tensors are allocated with cudaMalloc and yes, they are of the correct size (or large enough); I’ve checked this too many times. I’m using enqueueV3 (but I’ve tried executeV2 as well; same error). A simplified sketch of this path follows the trtexec command below.

  3. trtexec shows the same error, which hints that it’s not an API-usage issue:

trtexec --loadEngine=model.engine --shapes=im0:1x3x888x1280,im1:1x3x888x1280 --verbose 
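
For reference, the C++ path from point 2 looks roughly like this (a simplified sketch, assuming a deserialized engine and FP32 tensors; the names im0/im1/disparity_map match the bindings in the logs):

#include <NvInfer.h>
#include <cuda_runtime_api.h>

// Simplified sketch of my C++ inference path (TensorRT 10). Error checks
// are omitted; `engine` is the deserialized nvinfer1::ICudaEngine*.
void runOnce(nvinfer1::ICudaEngine* engine)
{
    nvinfer1::IExecutionContext* ctx = engine->createExecutionContext();

    // Same shapes as passed to trtexec above.
    nvinfer1::Dims4 inShape{1, 3, 888, 1280};
    ctx->setInputShape("im0", inShape);
    ctx->setInputShape("im1", inShape);

    // Device buffers, assuming FP32 I/O; output is 1x1x888x1280 per the log.
    size_t inBytes  = 1ull * 3 * 888 * 1280 * sizeof(float);
    size_t outBytes = 1ull * 1 * 888 * 1280 * sizeof(float);
    void *dIm0, *dIm1, *dOut;
    cudaMalloc(&dIm0, inBytes);
    cudaMalloc(&dIm1, inBytes);
    cudaMalloc(&dOut, outBytes);

    ctx->setTensorAddress("im0", dIm0);
    ctx->setTensorAddress("im1", dIm1);
    ctx->setTensorAddress("disparity_map", dOut);

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    ctx->enqueueV3(stream);   // <-- this is the call that fails with the Cask error
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    cudaFree(dIm0);
    cudaFree(dIm1);
    cudaFree(dOut);
    delete ctx;
}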

The verbose output is essentially the same for all three:

  1. Python (OK):
TensorRT version: 10.8.0.43
[02/21/2025-23:22:34] [TRT] [I] Loaded engine size: 21 MiB
[02/21/2025-23:22:34] [TRT] [V] Deserialization required 43240 microseconds.
[02/21/2025-23:22:34] [TRT] [V] Total per-runner device persistent memory is 146201600
[02/21/2025-23:22:34] [TRT] [V] Total per-runner host persistent memory is 1300272
[02/21/2025-23:22:34] [TRT] [V] Allocated device scratch memory of size 2107987968
[02/21/2025-23:22:34] [TRT] [V] - Runner scratch: 2107987968 bytes
[02/21/2025-23:22:35] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +2, GPU +2150, now: CPU 2, GPU 2302 (MiB)
[02/21/2025-23:22:35] [TRT] [V] CUDA lazy loading is enabled.
  2. C++:
[INFO ] [] TensorRT version: 10.8.0.43
[INFO ] [] Loading engine file: model.engine
[INFO ] [] TensorRT: Loaded engine size: 21 MiB
[DEBUG] [] TensorRT: Deserialization required 43430 microseconds.
[DEBUG] [] TensorRT: Total per-runner device persistent memory is 146201600
[DEBUG] [] TensorRT: Total per-runner host persistent memory is 1274672
[DEBUG] [] TensorRT: Allocated device scratch memory of size 2107987968
[DEBUG] [] TensorRT: - Runner scratch: 2107987968 bytes
[INFO ] [] TensorRT: [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +2, GPU +2150, now: CPU 2, GPU 2302 (MiB)
[DEBUG] [] TensorRT: CUDA lazy loading is enabled.
[ERROR] [] TensorRT: IExecutionContext::executeV2: Error Code 1: Cask (Cask Pooling Runner Execute Failure)
  3. trtexec:
[02/21/2025-23:24:11] [I] TensorRT version: 10.8.0
[02/21/2025-23:24:11] [I] [TRT] Loaded engine size: 21 MiB
[02/21/2025-23:24:11] [V] [TRT] Deserialization required 43383 microseconds.
[02/21/2025-23:24:11] [I] Engine deserialized in 0.0744463 sec.
[02/21/2025-23:24:11] [V] [TRT] Total per-runner device persistent memory is 146201600
[02/21/2025-23:24:11] [V] [TRT] Total per-runner host persistent memory is 1300272
[02/21/2025-23:24:11] [V] [TRT] Allocated device scratch memory of size 2107987968
[02/21/2025-23:24:11] [V] [TRT] - Runner scratch: 2107987968 bytes
[02/21/2025-23:24:11] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +2, GPU +2150, now: CPU 2, GPU 2302 (MiB)
[02/21/2025-23:24:11] [V] [TRT] CUDA lazy loading is enabled.
[02/21/2025-23:24:11] [I] Setting persistentCacheLimit to 0 bytes.
[02/21/2025-23:24:11] [I] Set shape of input tensor im0 to: 1x3x888x1280
[02/21/2025-23:24:11] [I] Set shape of input tensor im1 to: 1x3x888x1280
[02/21/2025-23:24:11] [I] Created execution context with device memory size: 2010.33 MiB
[02/21/2025-23:24:11] [I] Using random values for input im0
[02/21/2025-23:24:12] [I] Input binding for im0 with dimensions 1x3x888x1280 is created.
[02/21/2025-23:24:12] [I] Using random values for input im1
[02/21/2025-23:24:12] [I] Input binding for im1 with dimensions 1x3x888x1280 is created.
[02/21/2025-23:24:12] [I] Output binding for disparity_map with dimensions 1x1x888x1280 is created.
[02/21/2025-23:24:12] [I] Starting inference
[02/21/2025-23:24:12] [E] Error[1]: IExecutionContext::enqueueV3: Error Code 1: Cask (Cask Pooling Runner Execute Failure)
[02/21/2025-23:24:12] [E] Error occurred during inference

Notice the TensorRT version is the same for all 3.
What is the Python API doing differently from the C++ one?

I have found the issue:

My specific model requires its input dimensions to be padded to multiples of 32.

In Python I was passing correctly sized input, while in C++/trtexec I was not (1280 is a multiple of 32; 888 is not).
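
For anyone hitting the same thing, the fix amounts to rounding the spatial dims up to a multiple of 32 before allocating buffers and setting input shapes (the helper below is my own illustration, not from the model’s repo):

#include <cstdint>

// Round a dimension up to the next multiple of m, e.g. padUp(888, 32) == 896
// and padUp(1280, 32) == 1280. Pad the input images to
// padUp(H, 32) x padUp(W, 32) and pass those dims to setInputShape.
constexpr int64_t padUp(int64_t x, int64_t m) noexcept
{
    return ((x + m - 1) / m) * m;
}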
