TensorRT 10.8 on Windows: API Usage Error (Target GPU SM 120 is not supported by this TensorRT release.)

Description

TensorRT (trtexec) fails to convert an ONNX model to a TensorRT engine on Windows with an NVIDIA GeForce RTX 5080.

Environment

TensorRT Version: 10.8.0.43.Windows.win10.cuda-11.8
GPU Type: NVIDIA GeForce RTX 5080 (UUID: GPU-13d9b629-2176-5b05-416a-aae1ae0d41ee)
Nvidia Driver Version: 572.16
CUDA Version: 12.8
CUDNN Version: /
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

https://github.com/NevermindNilas/TAS-Modes-Host/releases/download/main/2x_AniScale2S_Compact_i8_60K-fp32.onnx

Steps To Reproduce

  • Installed TensorRT following the zip-file installation guide: https://docs.nvidia.com/deeplearning/tensorrt/latest/installing-tensorrt/installing.html#zip-file-installation
  • Ran trtexec:

trtexec.exe --onnx=2x_AniScale2S_Compact_i8_60K-fp32.onnx --saveEngine=2x_AniScale2S_Compact_i8_60K-fp32.1920x1088_workspace1024_device0_10.8.0.43.engine --shapes=input:1x3x1088x1920 --fp16 --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw

Error Output:

C:\Users\CCDYT\Downloads\TensorRT-10.8.0.43.Windows.win10.cuda-11.8\TensorRT-10.8.0.43>trtexec.exe --onnx="C:\Users\CCDYT\Downloads\2x_AniScale2S_Compact_i8_60K-fp32.onnx" --memPoolSize=workspace:1024MiB --timingCacheFile="2x-LD-Anime-Compact.onnx.1920x1088_workspace1024_device0_10.8.0.43.cache" --device=0 --saveEngine="E:\Steam\steamapps\common\SVFI\models\sr\TensorRT\models\2x-LD-Anime-Compact.onnx.1920x1088_workspace1024_device0_10.8.0.43.engine" --shapes=input:1x3x1088x1920 --fp16 --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --tacticSources=-CUBLAS,-CUBLAS_LT
&&&& RUNNING TensorRT.trtexec [TensorRT v100800] [b43] # trtexec.exe --onnx=C:\Users\CCDYT\Downloads\2x_AniScale2S_Compact_i8_60K-fp32.onnx --memPoolSize=workspace:1024MiB --timingCacheFile=2x-LD-Anime-Compact.onnx.1920x1088_workspace1024_device0_10.8.0.43.cache --device=0 --saveEngine=E:\Steam\steamapps\common\SVFI\models\sr\TensorRT\models\2x-LD-Anime-Compact.onnx.1920x1088_workspace1024_device0_10.8.0.43.engine --shapes=input:1x3x1088x1920 --fp16 --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --tacticSources=-CUBLAS,-CUBLAS_LT
[02/12/2025-10:42:51] [I] === Model Options ===
[02/12/2025-10:42:51] [I] Format: ONNX
[02/12/2025-10:42:51] [I] Model: C:\Users\CCDYT\Downloads\2x_AniScale2S_Compact_i8_60K-fp32.onnx
[02/12/2025-10:42:51] [I] Output:
[02/12/2025-10:42:51] [I] === Build Options ===
[02/12/2025-10:42:51] [I] Memory Pools: workspace: 0.000976562 MiB, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default, tacticSharedMem: default
[02/12/2025-10:42:51] [I] avgTiming: 8
[02/12/2025-10:42:51] [I] Precision: FP32+FP16
[02/12/2025-10:42:51] [I] LayerPrecisions:
[02/12/2025-10:42:51] [I] Layer Device Types:
[02/12/2025-10:42:51] [I] Calibration:
[02/12/2025-10:42:51] [I] Refit: Disabled
[02/12/2025-10:42:51] [I] Strip weights: Disabled
[02/12/2025-10:42:51] [I] Version Compatible: Disabled
[02/12/2025-10:42:51] [I] ONNX Plugin InstanceNorm: Disabled
[02/12/2025-10:42:51] [I] TensorRT runtime: full
[02/12/2025-10:42:51] [I] Lean DLL Path:
[02/12/2025-10:42:51] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[02/12/2025-10:42:51] [I] Exclude Lean Runtime: Disabled
[02/12/2025-10:42:51] [I] Sparsity: Disabled
[02/12/2025-10:42:51] [I] Safe mode: Disabled
[02/12/2025-10:42:51] [I] Build DLA standalone loadable: Disabled
[02/12/2025-10:42:51] [I] Allow GPU fallback for DLA: Disabled
[02/12/2025-10:42:51] [I] DirectIO mode: Disabled
[02/12/2025-10:42:51] [I] Restricted mode: Disabled
[02/12/2025-10:42:51] [I] Skip inference: Disabled
[02/12/2025-10:42:51] [I] Save engine: E:\Steam\steamapps\common\SVFI\models\sr\TensorRT\models\2x-LD-Anime-Compact.onnx.1920x1088_workspace1024_device0_10.8.0.43.engine
[02/12/2025-10:42:51] [I] Load engine:
[02/12/2025-10:42:51] [I] Profiling verbosity: 0
[02/12/2025-10:42:51] [I] Tactic sources: cublas [OFF], cublasLt [OFF],
[02/12/2025-10:42:51] [I] timingCacheMode: global
[02/12/2025-10:42:51] [I] timingCacheFile: 2x-LD-Anime-Compact.onnx.1920x1088_workspace1024_device0_10.8.0.43.cache
[02/12/2025-10:42:51] [I] Enable Compilation Cache: Enabled
[02/12/2025-10:42:51] [I] Enable Monitor Memory: Disabled
[02/12/2025-10:42:51] [I] errorOnTimingCacheMiss: Disabled
[02/12/2025-10:42:51] [I] Preview Features: Use default preview flags.
[02/12/2025-10:42:51] [I] MaxAuxStreams: -1
[02/12/2025-10:42:51] [I] BuilderOptimizationLevel: -1
[02/12/2025-10:42:51] [I] MaxTactics: -1
[02/12/2025-10:42:51] [I] Calibration Profile Index: 0
[02/12/2025-10:42:51] [I] Weight Streaming: Disabled
[02/12/2025-10:42:51] [I] Runtime Platform: Same As Build
[02/12/2025-10:42:51] [I] Debug Tensors:
[02/12/2025-10:42:51] [I] Input(s): fp16:chw
[02/12/2025-10:42:51] [I] Output(s): fp16:chw
[02/12/2025-10:42:51] [I] Input build shape (profile 0): input=1x3x1088x1920+1x3x1088x1920+1x3x1088x1920
[02/12/2025-10:42:51] [I] Input calibration shapes: model
[02/12/2025-10:42:51] [I] === System Options ===
[02/12/2025-10:42:51] [I] Device: 0
[02/12/2025-10:42:51] [I] DLACore:
[02/12/2025-10:42:51] [I] Plugins:
[02/12/2025-10:42:51] [I] setPluginsToSerialize:
[02/12/2025-10:42:51] [I] dynamicPlugins:
[02/12/2025-10:42:51] [I] ignoreParsedPluginLibs: 0
[02/12/2025-10:42:51] [I]
[02/12/2025-10:42:51] [I] === Inference Options ===
[02/12/2025-10:42:51] [I] Batch: Explicit
[02/12/2025-10:42:51] [I] Input inference shape : input=1x3x1088x1920
[02/12/2025-10:42:51] [I] Iterations: 10
[02/12/2025-10:42:51] [I] Duration: 3s (+ 200ms warm up)
[02/12/2025-10:42:51] [I] Sleep time: 0ms
[02/12/2025-10:42:51] [I] Idle time: 0ms
[02/12/2025-10:42:51] [I] Inference Streams: 1
[02/12/2025-10:42:51] [I] ExposeDMA: Disabled
[02/12/2025-10:42:51] [I] Data transfers: Enabled
[02/12/2025-10:42:51] [I] Spin-wait: Disabled
[02/12/2025-10:42:51] [I] Multithreading: Disabled
[02/12/2025-10:42:51] [I] CUDA Graph: Disabled
[02/12/2025-10:42:51] [I] Separate profiling: Disabled
[02/12/2025-10:42:51] [I] Time Deserialize: Disabled
[02/12/2025-10:42:51] [I] Time Refit: Disabled
[02/12/2025-10:42:51] [I] NVTX verbosity: 0
[02/12/2025-10:42:51] [I] Persistent Cache Ratio: 0
[02/12/2025-10:42:51] [I] Optimization Profile Index: 0
[02/12/2025-10:42:51] [I] Weight Streaming Budget: 100.000000%
[02/12/2025-10:42:51] [I] Inputs:
[02/12/2025-10:42:51] [I] Debug Tensor Save Destinations:
[02/12/2025-10:42:51] [I] === Reporting Options ===
[02/12/2025-10:42:51] [I] Verbose: Disabled
[02/12/2025-10:42:51] [I] Averages: 10 inferences
[02/12/2025-10:42:51] [I] Percentiles: 90,95,99
[02/12/2025-10:42:51] [I] Dump refittable layers:Disabled
[02/12/2025-10:42:51] [I] Dump output: Disabled
[02/12/2025-10:42:51] [I] Profile: Disabled
[02/12/2025-10:42:51] [I] Export timing to JSON file:
[02/12/2025-10:42:51] [I] Export output to JSON file:
[02/12/2025-10:42:51] [I] Export profile to JSON file:
[02/12/2025-10:42:51] [I]
[02/12/2025-10:42:51] [I] === Device Information ===
[02/12/2025-10:42:51] [I] Available Devices:
[02/12/2025-10:42:51] [I]   Device 0: "NVIDIA GeForce RTX 5080" UUID: GPU-13d9b629-2176-5b05-416a-aae1ae0d41ee
[02/12/2025-10:42:51] [I] Selected Device: NVIDIA GeForce RTX 5080
[02/12/2025-10:42:51] [I] Selected Device ID: 0
[02/12/2025-10:42:51] [I] Selected Device UUID: GPU-13d9b629-2176-5b05-416a-aae1ae0d41ee
[02/12/2025-10:42:51] [I] Compute Capability: 12.0
[02/12/2025-10:42:51] [I] SMs: 84
[02/12/2025-10:42:51] [I] Device Global Memory: 16302 MiB
[02/12/2025-10:42:51] [I] Shared Memory per SM: 100 KiB
[02/12/2025-10:42:51] [I] Memory Bus Width: 256 bits (ECC disabled)
[02/12/2025-10:42:51] [I] Application Compute Clock Rate: 2.64 GHz
[02/12/2025-10:42:51] [I] Application Memory Clock Rate: 15.001 GHz
[02/12/2025-10:42:51] [I]
[02/12/2025-10:42:51] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[02/12/2025-10:42:51] [I]
[02/12/2025-10:42:51] [I] TensorRT version: 10.8.0
[02/12/2025-10:42:51] [I] Loading standard plugins
[02/12/2025-10:42:51] [I] [TRT] [MemUsageChange] Init CUDA: CPU +83, GPU +0, now: CPU 12560, GPU 1435 (MiB)
[02/12/2025-10:42:51] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +16, GPU +0, now: CPU 12917, GPU 1435 (MiB)
[02/12/2025-10:42:51] [I] Start parsing network model.
[02/12/2025-10:42:51] [I] [TRT] ----------------------------------------------------------------
[02/12/2025-10:42:51] [I] [TRT] Input filename:   C:\Users\CCDYT\Downloads\2x_AniScale2S_Compact_i8_60K-fp32.onnx
[02/12/2025-10:42:51] [I] [TRT] ONNX IR version:  0.0.8
[02/12/2025-10:42:51] [I] [TRT] Opset version:    17
[02/12/2025-10:42:51] [I] [TRT] Producer name:    pytorch
[02/12/2025-10:42:51] [I] [TRT] Producer version: 2.1.2
[02/12/2025-10:42:51] [I] [TRT] Domain:
[02/12/2025-10:42:51] [I] [TRT] Model version:    0
[02/12/2025-10:42:51] [I] [TRT] Doc string:
[02/12/2025-10:42:51] [I] [TRT] ----------------------------------------------------------------
[02/12/2025-10:42:51] [I] Finished parsing network model. Parse time: 0.0122676
[02/12/2025-10:42:51] [I] Set shape of input tensor input for optimization profile 0 to: MIN=1x3x1088x1920 OPT=1x3x1088x1920 MAX=1x3x1088x1920
[02/12/2025-10:42:51] [W] [TRT] Could not read timing cache from: 2x-LD-Anime-Compact.onnx.1920x1088_workspace1024_device0_10.8.0.43.cache. A new timing cache will be generated and written.
[02/12/2025-10:42:51] [E] Error[9]: IBuilder::buildSerializedNetwork: Error Code 9: API Usage Error (Target GPU SM 120 is not supported by this TensorRT release.)
[02/12/2025-10:42:51] [E] Engine could not be created from network
[02/12/2025-10:42:51] [E] Building engine failed
[02/12/2025-10:42:51] [E] Failed to create engine from model or file.
[02/12/2025-10:42:51] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v100800] [b43] # trtexec.exe --onnx=C:\Users\CCDYT\Downloads\2x_AniScale2S_Compact_i8_60K-fp32.onnx --memPoolSize=workspace:1024MiB --timingCacheFile=2x-LD-Anime-Compact.onnx.1920x1088_workspace1024_device0_10.8.0.43.cache --device=0 --saveEngine=E:\Steam\steamapps\common\SVFI\models\sr\TensorRT\models\2x-LD-Anime-Compact.onnx.1920x1088_workspace1024_device0_10.8.0.43.engine --shapes=input:1x3x1088x1920 --fp16 --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --tacticSources=-CUBLAS,-CUBLAS_LT

Resolution

The TensorRT package used here is the CUDA 11.8 build (TensorRT-10.8.0.43.Windows.win10.cuda-11.8), which does not include support for SM 120 (the RTX 5080 reports Compute Capability 12.0 in the device information above). Switching to the TensorRT 10.8 package built for CUDA 12.x (12.0~12.8) solves this issue.
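For anyone hitting a similar error on a different GPU: the "SM" number in TensorRT error messages is just the CUDA compute capability with the decimal point dropped (major * 10 + minor), so you can match the "Compute Capability: 12.0" line in the trtexec device info against the "SM 120" in the error. A minimal sketch (the function name is mine, not a TensorRT API):

```python
# Map a CUDA compute capability (major, minor) to the "SM" number that
# appears in TensorRT error messages: SM = major * 10 + minor.
def sm_from_compute_capability(major: int, minor: int) -> int:
    return major * 10 + minor

# RTX 5080: Compute Capability 12.0 -> SM 120, matching the error
# "Target GPU SM 120 is not supported by this TensorRT release."
print(sm_from_compute_capability(12, 0))  # → 120

# For comparison, an RTX 4090 (Compute Capability 8.9) would be SM 89.
print(sm_from_compute_capability(8, 9))  # → 89
```

If the SM value of your GPU is newer than anything listed in the support matrix for your TensorRT build, you need a TensorRT package built against a CUDA version that supports that architecture.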