TensorRT 10.8 on Windows: API Usage Error (Target GPU SM 120 is not supported by this TensorRT release.)

Description

TensorRT (trtexec) fails to convert an ONNX model to a TensorRT engine on Windows with an NVIDIA GeForce RTX 5080.

Environment

TensorRT Version: 10.8.0.43.Windows.win10.cuda-11.8
GPU Type: NVIDIA GeForce RTX 5080 (UUID: GPU-13d9b629-2176-5b05-416a-aae1ae0d41ee)
Nvidia Driver Version: 572.16
CUDA Version: 12.8
CUDNN Version: /
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

https://github.com/NevermindNilas/TAS-Modes-Host/releases/download/main/2x_AniScale2S_Compact_i8_60K-fp32.onnx

Steps To Reproduce

  • Installed TensorRT following the zip-file installation guide: https://docs.nvidia.com/deeplearning/tensorrt/latest/installing-tensorrt/installing.html#zip-file-installation
  • Ran trtexec:

trtexec.exe --onnx=2x_AniScale2S_Compact_i8_60K-fp32.onnx --saveEngine=2x_AniScale2S_Compact_i8_60K-fp32.1920x1088_workspace1024_device0_10.8.0.43.engine --shapes=input:1x3x1088x1920 --fp16 --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw

Error Output:

C:\Users\CCDYT\Downloads\TensorRT-10.8.0.43.Windows.win10.cuda-11.8\TensorRT-10.8.0.43>trtexec.exe --onnx="C:\Users\CCDYT\Downloads\2x_AniScale2S_Compact_i8_60K-fp32.onnx" --memPoolSize=workspace:1024MiB --timingCacheFile="2x-LD-Anime-Compact.onnx.1920x1088_workspace1024_device0_10.8.0.43.cache" --device=0 --saveEngine="E:\Steam\steamapps\common\SVFI\models\sr\TensorRT\models\2x-LD-Anime-Compact.onnx.1920x1088_workspace1024_device0_10.8.0.43.engine" --shapes=input:1x3x1088x1920 --fp16 --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --tacticSources=-CUBLAS,-CUBLAS_LT
&&&& RUNNING TensorRT.trtexec [TensorRT v100800] [b43] # trtexec.exe --onnx=C:\Users\CCDYT\Downloads\2x_AniScale2S_Compact_i8_60K-fp32.onnx --memPoolSize=workspace:1024MiB --timingCacheFile=2x-LD-Anime-Compact.onnx.1920x1088_workspace1024_device0_10.8.0.43.cache --device=0 --saveEngine=E:\Steam\steamapps\common\SVFI\models\sr\TensorRT\models\2x-LD-Anime-Compact.onnx.1920x1088_workspace1024_device0_10.8.0.43.engine --shapes=input:1x3x1088x1920 --fp16 --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --tacticSources=-CUBLAS,-CUBLAS_LT
[02/12/2025-10:42:51] [I] === Model Options ===
[02/12/2025-10:42:51] [I] Format: ONNX
[02/12/2025-10:42:51] [I] Model: C:\Users\CCDYT\Downloads\2x_AniScale2S_Compact_i8_60K-fp32.onnx
[02/12/2025-10:42:51] [I] Output:
[02/12/2025-10:42:51] [I] === Build Options ===
[02/12/2025-10:42:51] [I] Memory Pools: workspace: 0.000976562 MiB, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default, tacticSharedMem: default
[02/12/2025-10:42:51] [I] avgTiming: 8
[02/12/2025-10:42:51] [I] Precision: FP32+FP16
[02/12/2025-10:42:51] [I] LayerPrecisions:
[02/12/2025-10:42:51] [I] Layer Device Types:
[02/12/2025-10:42:51] [I] Calibration:
[02/12/2025-10:42:51] [I] Refit: Disabled
[02/12/2025-10:42:51] [I] Strip weights: Disabled
[02/12/2025-10:42:51] [I] Version Compatible: Disabled
[02/12/2025-10:42:51] [I] ONNX Plugin InstanceNorm: Disabled
[02/12/2025-10:42:51] [I] TensorRT runtime: full
[02/12/2025-10:42:51] [I] Lean DLL Path:
[02/12/2025-10:42:51] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[02/12/2025-10:42:51] [I] Exclude Lean Runtime: Disabled
[02/12/2025-10:42:51] [I] Sparsity: Disabled
[02/12/2025-10:42:51] [I] Safe mode: Disabled
[02/12/2025-10:42:51] [I] Build DLA standalone loadable: Disabled
[02/12/2025-10:42:51] [I] Allow GPU fallback for DLA: Disabled
[02/12/2025-10:42:51] [I] DirectIO mode: Disabled
[02/12/2025-10:42:51] [I] Restricted mode: Disabled
[02/12/2025-10:42:51] [I] Skip inference: Disabled
[02/12/2025-10:42:51] [I] Save engine: E:\Steam\steamapps\common\SVFI\models\sr\TensorRT\models\2x-LD-Anime-Compact.onnx.1920x1088_workspace1024_device0_10.8.0.43.engine
[02/12/2025-10:42:51] [I] Load engine:
[02/12/2025-10:42:51] [I] Profiling verbosity: 0
[02/12/2025-10:42:51] [I] Tactic sources: cublas [OFF], cublasLt [OFF],
[02/12/2025-10:42:51] [I] timingCacheMode: global
[02/12/2025-10:42:51] [I] timingCacheFile: 2x-LD-Anime-Compact.onnx.1920x1088_workspace1024_device0_10.8.0.43.cache
[02/12/2025-10:42:51] [I] Enable Compilation Cache: Enabled
[02/12/2025-10:42:51] [I] Enable Monitor Memory: Disabled
[02/12/2025-10:42:51] [I] errorOnTimingCacheMiss: Disabled
[02/12/2025-10:42:51] [I] Preview Features: Use default preview flags.
[02/12/2025-10:42:51] [I] MaxAuxStreams: -1
[02/12/2025-10:42:51] [I] BuilderOptimizationLevel: -1
[02/12/2025-10:42:51] [I] MaxTactics: -1
[02/12/2025-10:42:51] [I] Calibration Profile Index: 0
[02/12/2025-10:42:51] [I] Weight Streaming: Disabled
[02/12/2025-10:42:51] [I] Runtime Platform: Same As Build
[02/12/2025-10:42:51] [I] Debug Tensors:
[02/12/2025-10:42:51] [I] Input(s): fp16:chw
[02/12/2025-10:42:51] [I] Output(s): fp16:chw
[02/12/2025-10:42:51] [I] Input build shape (profile 0): input=1x3x1088x1920+1x3x1088x1920+1x3x1088x1920
[02/12/2025-10:42:51] [I] Input calibration shapes: model
[02/12/2025-10:42:51] [I] === System Options ===
[02/12/2025-10:42:51] [I] Device: 0
[02/12/2025-10:42:51] [I] DLACore:
[02/12/2025-10:42:51] [I] Plugins:
[02/12/2025-10:42:51] [I] setPluginsToSerialize:
[02/12/2025-10:42:51] [I] dynamicPlugins:
[02/12/2025-10:42:51] [I] ignoreParsedPluginLibs: 0
[02/12/2025-10:42:51] [I]
[02/12/2025-10:42:51] [I] === Inference Options ===
[02/12/2025-10:42:51] [I] Batch: Explicit
[02/12/2025-10:42:51] [I] Input inference shape : input=1x3x1088x1920
[02/12/2025-10:42:51] [I] Iterations: 10
[02/12/2025-10:42:51] [I] Duration: 3s (+ 200ms warm up)
[02/12/2025-10:42:51] [I] Sleep time: 0ms
[02/12/2025-10:42:51] [I] Idle time: 0ms
[02/12/2025-10:42:51] [I] Inference Streams: 1
[02/12/2025-10:42:51] [I] ExposeDMA: Disabled
[02/12/2025-10:42:51] [I] Data transfers: Enabled
[02/12/2025-10:42:51] [I] Spin-wait: Disabled
[02/12/2025-10:42:51] [I] Multithreading: Disabled
[02/12/2025-10:42:51] [I] CUDA Graph: Disabled
[02/12/2025-10:42:51] [I] Separate profiling: Disabled
[02/12/2025-10:42:51] [I] Time Deserialize: Disabled
[02/12/2025-10:42:51] [I] Time Refit: Disabled
[02/12/2025-10:42:51] [I] NVTX verbosity: 0
[02/12/2025-10:42:51] [I] Persistent Cache Ratio: 0
[02/12/2025-10:42:51] [I] Optimization Profile Index: 0
[02/12/2025-10:42:51] [I] Weight Streaming Budget: 100.000000%
[02/12/2025-10:42:51] [I] Inputs:
[02/12/2025-10:42:51] [I] Debug Tensor Save Destinations:
[02/12/2025-10:42:51] [I] === Reporting Options ===
[02/12/2025-10:42:51] [I] Verbose: Disabled
[02/12/2025-10:42:51] [I] Averages: 10 inferences
[02/12/2025-10:42:51] [I] Percentiles: 90,95,99
[02/12/2025-10:42:51] [I] Dump refittable layers:Disabled
[02/12/2025-10:42:51] [I] Dump output: Disabled
[02/12/2025-10:42:51] [I] Profile: Disabled
[02/12/2025-10:42:51] [I] Export timing to JSON file:
[02/12/2025-10:42:51] [I] Export output to JSON file:
[02/12/2025-10:42:51] [I] Export profile to JSON file:
[02/12/2025-10:42:51] [I]
[02/12/2025-10:42:51] [I] === Device Information ===
[02/12/2025-10:42:51] [I] Available Devices:
[02/12/2025-10:42:51] [I]   Device 0: "NVIDIA GeForce RTX 5080" UUID: GPU-13d9b629-2176-5b05-416a-aae1ae0d41ee
[02/12/2025-10:42:51] [I] Selected Device: NVIDIA GeForce RTX 5080
[02/12/2025-10:42:51] [I] Selected Device ID: 0
[02/12/2025-10:42:51] [I] Selected Device UUID: GPU-13d9b629-2176-5b05-416a-aae1ae0d41ee
[02/12/2025-10:42:51] [I] Compute Capability: 12.0
[02/12/2025-10:42:51] [I] SMs: 84
[02/12/2025-10:42:51] [I] Device Global Memory: 16302 MiB
[02/12/2025-10:42:51] [I] Shared Memory per SM: 100 KiB
[02/12/2025-10:42:51] [I] Memory Bus Width: 256 bits (ECC disabled)
[02/12/2025-10:42:51] [I] Application Compute Clock Rate: 2.64 GHz
[02/12/2025-10:42:51] [I] Application Memory Clock Rate: 15.001 GHz
[02/12/2025-10:42:51] [I]
[02/12/2025-10:42:51] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[02/12/2025-10:42:51] [I]
[02/12/2025-10:42:51] [I] TensorRT version: 10.8.0
[02/12/2025-10:42:51] [I] Loading standard plugins
[02/12/2025-10:42:51] [I] [TRT] [MemUsageChange] Init CUDA: CPU +83, GPU +0, now: CPU 12560, GPU 1435 (MiB)
[02/12/2025-10:42:51] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +16, GPU +0, now: CPU 12917, GPU 1435 (MiB)
[02/12/2025-10:42:51] [I] Start parsing network model.
[02/12/2025-10:42:51] [I] [TRT] ----------------------------------------------------------------
[02/12/2025-10:42:51] [I] [TRT] Input filename:   C:\Users\CCDYT\Downloads\2x_AniScale2S_Compact_i8_60K-fp32.onnx
[02/12/2025-10:42:51] [I] [TRT] ONNX IR version:  0.0.8
[02/12/2025-10:42:51] [I] [TRT] Opset version:    17
[02/12/2025-10:42:51] [I] [TRT] Producer name:    pytorch
[02/12/2025-10:42:51] [I] [TRT] Producer version: 2.1.2
[02/12/2025-10:42:51] [I] [TRT] Domain:
[02/12/2025-10:42:51] [I] [TRT] Model version:    0
[02/12/2025-10:42:51] [I] [TRT] Doc string:
[02/12/2025-10:42:51] [I] [TRT] ----------------------------------------------------------------
[02/12/2025-10:42:51] [I] Finished parsing network model. Parse time: 0.0122676
[02/12/2025-10:42:51] [I] Set shape of input tensor input for optimization profile 0 to: MIN=1x3x1088x1920 OPT=1x3x1088x1920 MAX=1x3x1088x1920
[02/12/2025-10:42:51] [W] [TRT] Could not read timing cache from: 2x-LD-Anime-Compact.onnx.1920x1088_workspace1024_device0_10.8.0.43.cache. A new timing cache will be generated and written.
[02/12/2025-10:42:51] [E] Error[9]: IBuilder::buildSerializedNetwork: Error Code 9: API Usage Error (Target GPU SM 120 is not supported by this TensorRT release.)
[02/12/2025-10:42:51] [E] Engine could not be created from network
[02/12/2025-10:42:51] [E] Building engine failed
[02/12/2025-10:42:51] [E] Failed to create engine from model or file.
[02/12/2025-10:42:51] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v100800] [b43] # trtexec.exe --onnx=C:\Users\CCDYT\Downloads\2x_AniScale2S_Compact_i8_60K-fp32.onnx --memPoolSize=workspace:1024MiB --timingCacheFile=2x-LD-Anime-Compact.onnx.1920x1088_workspace1024_device0_10.8.0.43.cache --device=0 --saveEngine=E:\Steam\steamapps\common\SVFI\models\sr\TensorRT\models\2x-LD-Anime-Compact.onnx.1920x1088_workspace1024_device0_10.8.0.43.engine --shapes=input:1x3x1088x1920 --fp16 --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --tacticSources=-CUBLAS,-CUBLAS_LT

Resolution

The TensorRT package used here is the CUDA 11.8 build (TensorRT-10.8.0.43.Windows.win10.cuda-11.8), which does not include support for SM 120 (the RTX 5080 reports Compute Capability 12.0 in the device information above). Switching to the TensorRT 10.8 package built for CUDA 12.x (12.0~12.8) solves this issue.
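For anyone hitting a similar error on a different GPU: the "SM" number in TensorRT error messages is just the CUDA compute capability with the decimal point dropped (major * 10 + minor), so you can match the "Compute Capability: 12.0" line in the trtexec device info against the "SM 120" in the error. A minimal sketch (the function name is mine, not a TensorRT API):

```python
# Map a CUDA compute capability (major, minor) to the "SM" number that
# appears in TensorRT error messages: SM = major * 10 + minor.
def sm_from_compute_capability(major: int, minor: int) -> int:
    return major * 10 + minor

# RTX 5080: Compute Capability 12.0 -> SM 120, matching the error
# "Target GPU SM 120 is not supported by this TensorRT release."
print(sm_from_compute_capability(12, 0))  # → 120

# For comparison, an RTX 4090 (Compute Capability 8.9) would be SM 89.
print(sm_from_compute_capability(8, 9))  # → 89
```

If the SM value of your GPU is newer than anything listed in the support matrix for your TensorRT build, you need a TensorRT package built against a CUDA version that supports that architecture.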