Description
TensorRT (trtexec) cannot convert an ONNX model to a TRT engine on Windows with an NVIDIA GeForce RTX 5080; the build fails with "Target GPU SM 120 is not supported by this TensorRT release" (full log below).
Environment
TensorRT Version: 10.8.0.43.Windows.win10.cuda-11.8
GPU Type: "NVIDIA GeForce RTX 5080" UUID: GPU-13d9b629-2176-5b05-416a-aae1ae0d41ee
Nvidia Driver Version: 572.16
CUDA Version: 12.8
CUDNN Version: /
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
Relevant Files
Steps To Reproduce
- Installed TensorRT 10.8.0.43 (Windows, CUDA 11.8 zip) following the zip-file installation guide:
https://docs.nvidia.com/deeplearning/tensorrt/latest/installing-tensorrt/installing.html#zip-file-installation
- Ran trtexec to build the engine:
trtexec.exe --onnx=2x_AniScale2S_Compact_i8_60K-fp32.onnx --saveEngine=2x_AniScale2S_Compact_i8_60K-fp32.1920x1088_workspace1024_device0_10.8.0.43.engine --shapes=input:1x3x1088x1920 --fp16 --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw
Error Output:
C:\Users\CCDYT\Downloads\TensorRT-10.8.0.43.Windows.win10.cuda-11.8\TensorRT-10.8.0.43>trtexec.exe --onnx="C:\Users\CCDYT\Downloads\2x_AniScale2S_Compact_i8_60K-fp32.onnx" --memPoolSize=workspace:1024MiB --timingCacheFile="2x-LD-Anime-Compact.onnx.1920x1088_workspace1024_device0_10.8.0.43.cache" --device=0 --saveEngine="E:\Steam\steamapps\common\SVFI\models\sr\TensorRT\models\2x-LD-Anime-Compact.onnx.1920x1088_workspace1024_device0_10.8.0.43.engine" --shapes=input:1x3x1088x1920 --fp16 --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --tacticSources=-CUBLAS,-CUBLAS_LT
&&&& RUNNING TensorRT.trtexec [TensorRT v100800] [b43] # trtexec.exe --onnx=C:\Users\CCDYT\Downloads\2x_AniScale2S_Compact_i8_60K-fp32.onnx --memPoolSize=workspace:1024MiB --timingCacheFile=2x-LD-Anime-Compact.onnx.1920x1088_workspace1024_device0_10.8.0.43.cache --device=0 --saveEngine=E:\Steam\steamapps\common\SVFI\models\sr\TensorRT\models\2x-LD-Anime-Compact.onnx.1920x1088_workspace1024_device0_10.8.0.43.engine --shapes=input:1x3x1088x1920 --fp16 --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --tacticSources=-CUBLAS,-CUBLAS_LT
[02/12/2025-10:42:51] [I] === Model Options ===
[02/12/2025-10:42:51] [I] Format: ONNX
[02/12/2025-10:42:51] [I] Model: C:\Users\CCDYT\Downloads\2x_AniScale2S_Compact_i8_60K-fp32.onnx
[02/12/2025-10:42:51] [I] Output:
[02/12/2025-10:42:51] [I] === Build Options ===
[02/12/2025-10:42:51] [I] Memory Pools: workspace: 0.000976562 MiB, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default, tacticSharedMem: default
[02/12/2025-10:42:51] [I] avgTiming: 8
[02/12/2025-10:42:51] [I] Precision: FP32+FP16
[02/12/2025-10:42:51] [I] LayerPrecisions:
[02/12/2025-10:42:51] [I] Layer Device Types:
[02/12/2025-10:42:51] [I] Calibration:
[02/12/2025-10:42:51] [I] Refit: Disabled
[02/12/2025-10:42:51] [I] Strip weights: Disabled
[02/12/2025-10:42:51] [I] Version Compatible: Disabled
[02/12/2025-10:42:51] [I] ONNX Plugin InstanceNorm: Disabled
[02/12/2025-10:42:51] [I] TensorRT runtime: full
[02/12/2025-10:42:51] [I] Lean DLL Path:
[02/12/2025-10:42:51] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[02/12/2025-10:42:51] [I] Exclude Lean Runtime: Disabled
[02/12/2025-10:42:51] [I] Sparsity: Disabled
[02/12/2025-10:42:51] [I] Safe mode: Disabled
[02/12/2025-10:42:51] [I] Build DLA standalone loadable: Disabled
[02/12/2025-10:42:51] [I] Allow GPU fallback for DLA: Disabled
[02/12/2025-10:42:51] [I] DirectIO mode: Disabled
[02/12/2025-10:42:51] [I] Restricted mode: Disabled
[02/12/2025-10:42:51] [I] Skip inference: Disabled
[02/12/2025-10:42:51] [I] Save engine: E:\Steam\steamapps\common\SVFI\models\sr\TensorRT\models\2x-LD-Anime-Compact.onnx.1920x1088_workspace1024_device0_10.8.0.43.engine
[02/12/2025-10:42:51] [I] Load engine:
[02/12/2025-10:42:51] [I] Profiling verbosity: 0
[02/12/2025-10:42:51] [I] Tactic sources: cublas [OFF], cublasLt [OFF],
[02/12/2025-10:42:51] [I] timingCacheMode: global
[02/12/2025-10:42:51] [I] timingCacheFile: 2x-LD-Anime-Compact.onnx.1920x1088_workspace1024_device0_10.8.0.43.cache
[02/12/2025-10:42:51] [I] Enable Compilation Cache: Enabled
[02/12/2025-10:42:51] [I] Enable Monitor Memory: Disabled
[02/12/2025-10:42:51] [I] errorOnTimingCacheMiss: Disabled
[02/12/2025-10:42:51] [I] Preview Features: Use default preview flags.
[02/12/2025-10:42:51] [I] MaxAuxStreams: -1
[02/12/2025-10:42:51] [I] BuilderOptimizationLevel: -1
[02/12/2025-10:42:51] [I] MaxTactics: -1
[02/12/2025-10:42:51] [I] Calibration Profile Index: 0
[02/12/2025-10:42:51] [I] Weight Streaming: Disabled
[02/12/2025-10:42:51] [I] Runtime Platform: Same As Build
[02/12/2025-10:42:51] [I] Debug Tensors:
[02/12/2025-10:42:51] [I] Input(s): fp16:chw
[02/12/2025-10:42:51] [I] Output(s): fp16:chw
[02/12/2025-10:42:51] [I] Input build shape (profile 0): input=1x3x1088x1920+1x3x1088x1920+1x3x1088x1920
[02/12/2025-10:42:51] [I] Input calibration shapes: model
[02/12/2025-10:42:51] [I] === System Options ===
[02/12/2025-10:42:51] [I] Device: 0
[02/12/2025-10:42:51] [I] DLACore:
[02/12/2025-10:42:51] [I] Plugins:
[02/12/2025-10:42:51] [I] setPluginsToSerialize:
[02/12/2025-10:42:51] [I] dynamicPlugins:
[02/12/2025-10:42:51] [I] ignoreParsedPluginLibs: 0
[02/12/2025-10:42:51] [I]
[02/12/2025-10:42:51] [I] === Inference Options ===
[02/12/2025-10:42:51] [I] Batch: Explicit
[02/12/2025-10:42:51] [I] Input inference shape : input=1x3x1088x1920
[02/12/2025-10:42:51] [I] Iterations: 10
[02/12/2025-10:42:51] [I] Duration: 3s (+ 200ms warm up)
[02/12/2025-10:42:51] [I] Sleep time: 0ms
[02/12/2025-10:42:51] [I] Idle time: 0ms
[02/12/2025-10:42:51] [I] Inference Streams: 1
[02/12/2025-10:42:51] [I] ExposeDMA: Disabled
[02/12/2025-10:42:51] [I] Data transfers: Enabled
[02/12/2025-10:42:51] [I] Spin-wait: Disabled
[02/12/2025-10:42:51] [I] Multithreading: Disabled
[02/12/2025-10:42:51] [I] CUDA Graph: Disabled
[02/12/2025-10:42:51] [I] Separate profiling: Disabled
[02/12/2025-10:42:51] [I] Time Deserialize: Disabled
[02/12/2025-10:42:51] [I] Time Refit: Disabled
[02/12/2025-10:42:51] [I] NVTX verbosity: 0
[02/12/2025-10:42:51] [I] Persistent Cache Ratio: 0
[02/12/2025-10:42:51] [I] Optimization Profile Index: 0
[02/12/2025-10:42:51] [I] Weight Streaming Budget: 100.000000%
[02/12/2025-10:42:51] [I] Inputs:
[02/12/2025-10:42:51] [I] Debug Tensor Save Destinations:
[02/12/2025-10:42:51] [I] === Reporting Options ===
[02/12/2025-10:42:51] [I] Verbose: Disabled
[02/12/2025-10:42:51] [I] Averages: 10 inferences
[02/12/2025-10:42:51] [I] Percentiles: 90,95,99
[02/12/2025-10:42:51] [I] Dump refittable layers:Disabled
[02/12/2025-10:42:51] [I] Dump output: Disabled
[02/12/2025-10:42:51] [I] Profile: Disabled
[02/12/2025-10:42:51] [I] Export timing to JSON file:
[02/12/2025-10:42:51] [I] Export output to JSON file:
[02/12/2025-10:42:51] [I] Export profile to JSON file:
[02/12/2025-10:42:51] [I]
[02/12/2025-10:42:51] [I] === Device Information ===
[02/12/2025-10:42:51] [I] Available Devices:
[02/12/2025-10:42:51] [I] Device 0: "NVIDIA GeForce RTX 5080" UUID: GPU-13d9b629-2176-5b05-416a-aae1ae0d41ee
[02/12/2025-10:42:51] [I] Selected Device: NVIDIA GeForce RTX 5080
[02/12/2025-10:42:51] [I] Selected Device ID: 0
[02/12/2025-10:42:51] [I] Selected Device UUID: GPU-13d9b629-2176-5b05-416a-aae1ae0d41ee
[02/12/2025-10:42:51] [I] Compute Capability: 12.0
[02/12/2025-10:42:51] [I] SMs: 84
[02/12/2025-10:42:51] [I] Device Global Memory: 16302 MiB
[02/12/2025-10:42:51] [I] Shared Memory per SM: 100 KiB
[02/12/2025-10:42:51] [I] Memory Bus Width: 256 bits (ECC disabled)
[02/12/2025-10:42:51] [I] Application Compute Clock Rate: 2.64 GHz
[02/12/2025-10:42:51] [I] Application Memory Clock Rate: 15.001 GHz
[02/12/2025-10:42:51] [I]
[02/12/2025-10:42:51] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[02/12/2025-10:42:51] [I]
[02/12/2025-10:42:51] [I] TensorRT version: 10.8.0
[02/12/2025-10:42:51] [I] Loading standard plugins
[02/12/2025-10:42:51] [I] [TRT] [MemUsageChange] Init CUDA: CPU +83, GPU +0, now: CPU 12560, GPU 1435 (MiB)
[02/12/2025-10:42:51] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +16, GPU +0, now: CPU 12917, GPU 1435 (MiB)
[02/12/2025-10:42:51] [I] Start parsing network model.
[02/12/2025-10:42:51] [I] [TRT] ----------------------------------------------------------------
[02/12/2025-10:42:51] [I] [TRT] Input filename: C:\Users\CCDYT\Downloads\2x_AniScale2S_Compact_i8_60K-fp32.onnx
[02/12/2025-10:42:51] [I] [TRT] ONNX IR version: 0.0.8
[02/12/2025-10:42:51] [I] [TRT] Opset version: 17
[02/12/2025-10:42:51] [I] [TRT] Producer name: pytorch
[02/12/2025-10:42:51] [I] [TRT] Producer version: 2.1.2
[02/12/2025-10:42:51] [I] [TRT] Domain:
[02/12/2025-10:42:51] [I] [TRT] Model version: 0
[02/12/2025-10:42:51] [I] [TRT] Doc string:
[02/12/2025-10:42:51] [I] [TRT] ----------------------------------------------------------------
[02/12/2025-10:42:51] [I] Finished parsing network model. Parse time: 0.0122676
[02/12/2025-10:42:51] [I] Set shape of input tensor input for optimization profile 0 to: MIN=1x3x1088x1920 OPT=1x3x1088x1920 MAX=1x3x1088x1920
[02/12/2025-10:42:51] [W] [TRT] Could not read timing cache from: 2x-LD-Anime-Compact.onnx.1920x1088_workspace1024_device0_10.8.0.43.cache. A new timing cache will be generated and written.
[02/12/2025-10:42:51] [E] Error[9]: IBuilder::buildSerializedNetwork: Error Code 9: API Usage Error (Target GPU SM 120 is not supported by this TensorRT release.)
[02/12/2025-10:42:51] [E] Engine could not be created from network
[02/12/2025-10:42:51] [E] Building engine failed
[02/12/2025-10:42:51] [E] Failed to create engine from model or file.
[02/12/2025-10:42:51] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v100800] [b43] # trtexec.exe --onnx=C:\Users\CCDYT\Downloads\2x_AniScale2S_Compact_i8_60K-fp32.onnx --memPoolSize=workspace:1024MiB --timingCacheFile=2x-LD-Anime-Compact.onnx.1920x1088_workspace1024_device0_10.8.0.43.cache --device=0 --saveEngine=E:\Steam\steamapps\common\SVFI\models\sr\TensorRT\models\2x-LD-Anime-Compact.onnx.1920x1088_workspace1024_device0_10.8.0.43.engine --shapes=input:1x3x1088x1920 --fp16 --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --tacticSources=-CUBLAS,-CUBLAS_LT
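For context (not part of the original report): the "SM 120" in Error Code 9 is simply the device's compute capability 12.0 written as major*10 + minor, i.e. the RTX 5080 shown under "Device Information" above. A minimal sketch of that mapping, using a hypothetical helper name:

```python
def sm_from_compute_cap(cap: str) -> int:
    """Map a compute-capability string (e.g. "12.0", as printed under
    "Device Information") to the SM number trtexec reports: major*10 + minor."""
    major, minor = (int(part) for part in cap.split("."))
    return major * 10 + minor

# RTX 5080 reports Compute Capability 12.0 -> the "SM 120" in the error.
print(sm_from_compute_cap("12.0"))  # -> 120
```

Given that the installed package is the CUDA 11.8 zip (per the path in the command) while the driver reports CUDA 12.8, it may be worth checking whether this particular TensorRT build ships kernels for SM 120 at all, or whether the CUDA 12.8 build of the same release is required; that is an inference from the version strings above, not something the log itself confirms.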