Help with TensorRT errors when building an engine

Description

Hello,
I’m parsing an ONNX model and building the network using the TensorRT C++ API. During the build I get the errors below, which don’t tell me much, and I was wondering if anyone could help. Thanks,

[WARN ] [] TensorRT warning:  (foreignNode) [l2tc] - VALIDATE FAIL - Graph contains symbolic shape, l2tc doesn't take effect
(the warning above is repeated 16 times)
[ERROR] [] TensorRT error: Error Code: 9: Skipping tactic 0x00000000000003e9 due to exception Assertion g.nodes.size() == 0 failed.
(the error above is repeated 3 times)
[INFO ] [] TensorRT engine written to: model.engine

Please also note that, despite the errors, the engine still gets saved: the serialized plan is not null and the engine file is ~20 MB.
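
For context, here is roughly what my build code does (a simplified sketch, not the actual code; the optimization-profile bounds and file names are illustrative):

#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <fstream>

// Simplified sketch of the build path (TensorRT 10 C++ API). Error handling
// and object cleanup are omitted for brevity.
bool buildEngine(nvinfer1::ILogger& logger)
{
    auto* builder = nvinfer1::createInferBuilder(logger);
    auto* network = builder->createNetworkV2(0);
    auto* parser  = nvonnxparser::createParser(*network, logger);
    if (!parser->parseFromFile("model.onnx",
            static_cast<int>(nvinfer1::ILogger::Severity::kWARNING)))
        return false;

    auto* config = builder->createBuilderConfig();

    // The inputs have dynamic spatial dims (hence the "symbolic shape"
    // warnings above), so a profile is required; these bounds are made up.
    auto* profile = builder->createOptimizationProfile();
    for (char const* name : {"im0", "im1"})
    {
        profile->setDimensions(name, nvinfer1::OptProfileSelector::kMIN,
                               nvinfer1::Dims4{1, 3, 32, 32});
        profile->setDimensions(name, nvinfer1::OptProfileSelector::kOPT,
                               nvinfer1::Dims4{1, 3, 888, 1280});
        profile->setDimensions(name, nvinfer1::OptProfileSelector::kMAX,
                               nvinfer1::Dims4{1, 3, 1088, 1920});
    }
    config->addOptimizationProfile(profile);

    auto* plan = builder->buildSerializedNetwork(*network, *config);
    if (!plan)
        return false;
    std::ofstream("model.engine", std::ios::binary)
        .write(static_cast<char const*>(plan->data()), plan->size());
    return true;
}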

I’ve also tried running inference with this engine, and I get:

[ERROR] [] TensorRT error: IExecutionContext::enqueueV3: Error Code 1: Cask (Cask Pooling Runner Execute Failure)

Environment

TensorRT Version: 10.8
GPU Type: RTX 2060 SUPER
Nvidia Driver Version: 560.35.05
CUDA Version: 12.6
CUDNN Version:
Operating System + Version: Ubuntu 24.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): Baremetal

Steps To Reproduce

By not setting the TilingOptimizationLevel in the builder configuration, I was able to get rid of the first set of errors:

[ERROR] [] TensorRT error: Error Code: 9: Skipping tactic 0x00000000000003e9 due to exception Assertion g.nodes.size() == 0 failed.
(repeated 3 times)

I had set it to MODERATE; without that call, the builder is apparently able to choose tactics that work with my model.
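
Concretely, the change was along these lines (a sketch; `config` is the IBuilderConfig used at build time):

#include <NvInfer.h>

// Sketch of the config change. With raiseTilingLevel == true this reproduces
// my original setup (and the "Skipping tactic" errors); with false the call
// is skipped and the build completes without them.
void applyTilingSetting(nvinfer1::IBuilderConfig& config, bool raiseTilingLevel)
{
    if (raiseTilingLevel)
        config.setTilingOptimizationLevel(
            nvinfer1::TilingOptimizationLevel::kMODERATE);
    // Otherwise the default tiling optimization level is left untouched.
}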


I’m now left with this error, which I have no idea how to debug:

[ERROR] [] TensorRT error: IExecutionContext::enqueueV3: Error Code 1: Cask (Cask Pooling Runner Execute Failure)

What I find suspicious:

  1. I have a Python script that runs inference with the same model; the error doesn’t appear there and the model gives correct results. Tensors are allocated with PyTorch and inference is done with context.execute_v2.

  2. Using the TensorRT C++ API fails with that error. I/O tensors are allocated with cudaMalloc and yes, they are of the correct size (or large enough); I’ve checked this too many times. I’m using enqueueV3 (but I’ve tried executeV2 as well; same error). A simplified sketch of this path follows the trtexec command below.

  3. trtexec shows the same error, which hints that it’s not an API-usage issue:

trtexec --loadEngine=model.engine --shapes=im0:1x3x888x1280,im1:1x3x888x1280 --verbose 
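
For reference, the C++ path from point 2 looks roughly like this (a simplified sketch, assuming a deserialized engine and FP32 tensors; the names im0/im1/disparity_map match the bindings in the logs):

#include <NvInfer.h>
#include <cuda_runtime_api.h>

// Simplified sketch of my C++ inference path (TensorRT 10). Error checks
// are omitted; `engine` is the deserialized nvinfer1::ICudaEngine*.
void runOnce(nvinfer1::ICudaEngine* engine)
{
    nvinfer1::IExecutionContext* ctx = engine->createExecutionContext();

    // Same shapes as passed to trtexec above.
    nvinfer1::Dims4 inShape{1, 3, 888, 1280};
    ctx->setInputShape("im0", inShape);
    ctx->setInputShape("im1", inShape);

    // Device buffers, assuming FP32 I/O; output is 1x1x888x1280 per the log.
    size_t inBytes  = 1ull * 3 * 888 * 1280 * sizeof(float);
    size_t outBytes = 1ull * 1 * 888 * 1280 * sizeof(float);
    void *dIm0, *dIm1, *dOut;
    cudaMalloc(&dIm0, inBytes);
    cudaMalloc(&dIm1, inBytes);
    cudaMalloc(&dOut, outBytes);

    ctx->setTensorAddress("im0", dIm0);
    ctx->setTensorAddress("im1", dIm1);
    ctx->setTensorAddress("disparity_map", dOut);

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    ctx->enqueueV3(stream);   // <-- this is the call that fails with the Cask error
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    cudaFree(dIm0);
    cudaFree(dIm1);
    cudaFree(dOut);
    delete ctx;
}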

The verbose output is essentially the same for all three:

  1. Python (OK):
TensorRT version: 10.8.0.43
[02/21/2025-23:22:34] [TRT] [I] Loaded engine size: 21 MiB
[02/21/2025-23:22:34] [TRT] [V] Deserialization required 43240 microseconds.
[02/21/2025-23:22:34] [TRT] [V] Total per-runner device persistent memory is 146201600
[02/21/2025-23:22:34] [TRT] [V] Total per-runner host persistent memory is 1300272
[02/21/2025-23:22:34] [TRT] [V] Allocated device scratch memory of size 2107987968
[02/21/2025-23:22:34] [TRT] [V] - Runner scratch: 2107987968 bytes
[02/21/2025-23:22:35] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +2, GPU +2150, now: CPU 2, GPU 2302 (MiB)
[02/21/2025-23:22:35] [TRT] [V] CUDA lazy loading is enabled.
  2. C++:
[INFO ] [] TensorRT version: 10.8.0.43
[INFO ] [] Loading engine file: model.engine
[INFO ] [] TensorRT: Loaded engine size: 21 MiB
[DEBUG] [] TensorRT: Deserialization required 43430 microseconds.
[DEBUG] [] TensorRT: Total per-runner device persistent memory is 146201600
[DEBUG] [] TensorRT: Total per-runner host persistent memory is 1274672
[DEBUG] [] TensorRT: Allocated device scratch memory of size 2107987968
[DEBUG] [] TensorRT: - Runner scratch: 2107987968 bytes
[INFO ] [] TensorRT: [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +2, GPU +2150, now: CPU 2, GPU 2302 (MiB)
[DEBUG] [] TensorRT: CUDA lazy loading is enabled.
[ERROR] [] TensorRT: IExecutionContext::executeV2: Error Code 1: Cask (Cask Pooling Runner Execute Failure)
  3. trtexec:
[02/21/2025-23:24:11] [I] TensorRT version: 10.8.0
[02/21/2025-23:24:11] [I] [TRT] Loaded engine size: 21 MiB
[02/21/2025-23:24:11] [V] [TRT] Deserialization required 43383 microseconds.
[02/21/2025-23:24:11] [I] Engine deserialized in 0.0744463 sec.
[02/21/2025-23:24:11] [V] [TRT] Total per-runner device persistent memory is 146201600
[02/21/2025-23:24:11] [V] [TRT] Total per-runner host persistent memory is 1300272
[02/21/2025-23:24:11] [V] [TRT] Allocated device scratch memory of size 2107987968
[02/21/2025-23:24:11] [V] [TRT] - Runner scratch: 2107987968 bytes
[02/21/2025-23:24:11] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +2, GPU +2150, now: CPU 2, GPU 2302 (MiB)
[02/21/2025-23:24:11] [V] [TRT] CUDA lazy loading is enabled.
[02/21/2025-23:24:11] [I] Setting persistentCacheLimit to 0 bytes.
[02/21/2025-23:24:11] [I] Set shape of input tensor im0 to: 1x3x888x1280
[02/21/2025-23:24:11] [I] Set shape of input tensor im1 to: 1x3x888x1280
[02/21/2025-23:24:11] [I] Created execution context with device memory size: 2010.33 MiB
[02/21/2025-23:24:11] [I] Using random values for input im0
[02/21/2025-23:24:12] [I] Input binding for im0 with dimensions 1x3x888x1280 is created.
[02/21/2025-23:24:12] [I] Using random values for input im1
[02/21/2025-23:24:12] [I] Input binding for im1 with dimensions 1x3x888x1280 is created.
[02/21/2025-23:24:12] [I] Output binding for disparity_map with dimensions 1x1x888x1280 is created.
[02/21/2025-23:24:12] [I] Starting inference
[02/21/2025-23:24:12] [E] Error[1]: IExecutionContext::enqueueV3: Error Code 1: Cask (Cask Pooling Runner Execute Failure)
[02/21/2025-23:24:12] [E] Error occurred during inference

Notice the TensorRT version is the same for all 3.
What is the Python API doing differently from the C++ one?

I have found the issue:

My specific model requires its input dimensions to be padded to multiples of 32.

In Python I was passing correctly sized input, while in C++/trtexec I was not (1280 is a multiple of 32; 888 is not).
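
For anyone hitting the same thing, the fix amounts to rounding the spatial dims up to a multiple of 32 before allocating buffers and setting input shapes (the helper below is my own illustration, not from the model’s repo):

#include <cstdint>

// Round a dimension up to the next multiple of m, e.g. padUp(888, 32) == 896
// and padUp(1280, 32) == 1280. Pad the input images to
// padUp(H, 32) x padUp(W, 32) and pass those dims to setInputShape.
constexpr int64_t padUp(int64_t x, int64_t m) noexcept
{
    return ((x + m - 1) / m) * m;
}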
