Error (smClk > 0) when using TensorRT 8.6 with CUDA 12 on RTX 4070


Error when running trtexec on an RTX 4070


TensorRT Version: TensorRT-
GPU Type: RTX4070
Nvidia Driver Version: 535
CUDA Version: 12.0.1_528
CUDNN Version:
Operating System + Version: Win10
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): Baremetal

Relevant Files

Use this to Export yolov8s.onnx
yolov8s.onnx (42.8 MB)

Steps To Reproduce

Export yolov8s.onnx from the ultralytics/ultralytics GitHub repository (YOLOv8) with the command yolo detect export format=onnx

Then copy the file to the directory containing trtexec.exe and run .\trtexec.exe --onnx=yolov8s.onnx. I get the following error: Error[1]: Unexpected exception KTM assertion failure: C:\_src\externals\ktm\src\timingModel.cpp:382 smClk > 0

Here is my log:
error.log (687.2 KB)

Please share the ONNX model and the script, if you have not already, so that we can assist you better.
In the meantime, you can try a few things:

  1. Validate your model with the snippet below:

import onnx

filename = "yourONNXmodel.onnx"  # placeholder: path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)  # raises an exception if the model is invalid

  2. Try running your model with the trtexec command.

If you are still facing the issue, please share the trtexec --verbose log for further debugging.

yolov8s.onnx (42.8 MB)
Here is the ONNX file.

error.log (687.2 KB)


Based on the following log excerpt, it looks like CUDA is reporting the device clock rates incorrectly:

[07/05/2023-09:50:21] [I] === Device Information ===
[07/05/2023-09:50:21] [I] Selected Device: NVIDIA GeForce RTX 4070
[07/05/2023-09:50:21] [I] Compute Capability: 8.9
[07/05/2023-09:50:21] [I] SMs: 46
[07/05/2023-09:50:21] [I] Device Global Memory: 12281 MiB
[07/05/2023-09:50:21] [I] Shared Memory per SM: 100 KiB
[07/05/2023-09:50:21] [I] Memory Bus Width: 192 bits (ECC disabled)
[07/05/2023-09:50:21] [I] Application Compute Clock Rate: 0 GHz
[07/05/2023-09:50:21] [I] Application Memory Clock Rate: 0 GHz
[07/05/2023-09:50:21] [I]
[07/05/2023-09:50:21] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
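
To check other trtexec logs for the same symptom, the clock-rate lines can be scanned automatically. This is only a rough sketch: the function name find_zero_clock_rates and the regular expression are illustrative assumptions based on the log lines quoted above, not part of trtexec.

```python
import re

# Matches trtexec device-info lines such as
# "[07/05/2023-09:50:21] [I] Application Compute Clock Rate: 0 GHz"
CLOCK_LINE = re.compile(
    r"Application (Compute|Memory) Clock Rate:\s*([0-9]*\.?[0-9]+)\s*GHz"
)


def find_zero_clock_rates(log_text):
    """Return the clock kinds ("Compute", "Memory") that the log reports as 0 GHz."""
    zero = []
    for kind, value in CLOCK_LINE.findall(log_text):
        if float(value) == 0.0 and kind not in zero:
            zero.append(kind)
    return zero


# Lines copied from the log excerpt above.
sample_log = """\
[07/05/2023-09:50:21] [I] SMs: 46
[07/05/2023-09:50:21] [I] Application Compute Clock Rate: 0 GHz
[07/05/2023-09:50:21] [I] Application Memory Clock Rate: 0 GHz
"""

print(find_zero_clock_rates(sample_log))  # prints ['Compute', 'Memory']
```

A non-empty result means the log shows the zero clock rate that is suspected here of triggering the smClk > 0 assertion.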

Could you please run the following CUDA sample and confirm whether it works correctly?
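
The sample referred to above is not reproduced in this thread. As a separate cross-check, not a replacement for it, the clock rates can be read straight from the CUDA runtime using Python's ctypes. This is a hedged sketch: the library file names in CANDIDATE_LIBS are guesses that depend on the installed toolkit, the helper names are made up for illustration, and it is an assumption that trtexec gets its numbers from the same runtime attributes.

```python
import ctypes
import ctypes.util

# Attribute IDs from the CUDA runtime's cudaDeviceAttr enum.
CUDA_DEV_ATTR_CLOCK_RATE = 13         # cudaDevAttrClockRate, in kHz
CUDA_DEV_ATTR_MEMORY_CLOCK_RATE = 36  # cudaDevAttrMemoryClockRate, in kHz

# Assumed library names; adjust to match the installed CUDA toolkit.
CANDIDATE_LIBS = ["cudart64_12.dll", "libcudart.so.12", "libcudart.so"]


def load_cudart():
    """Try to load the CUDA runtime; return None if it is not available."""
    names = list(CANDIDATE_LIBS)
    found = ctypes.util.find_library("cudart")
    if found:
        names.insert(0, found)
    for name in names:
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue
    return None


def query_clock_rates_khz(device=0):
    """Return {"sm": kHz, "memory": kHz}, or None if the query fails."""
    cudart = load_cudart()
    if cudart is None:
        return None
    rates = {}
    for key, attr in (("sm", CUDA_DEV_ATTR_CLOCK_RATE),
                      ("memory", CUDA_DEV_ATTR_MEMORY_CLOCK_RATE)):
        value = ctypes.c_int(0)
        # cudaDeviceGetAttribute(int* value, cudaDeviceAttr attr, int device)
        status = cudart.cudaDeviceGetAttribute(ctypes.byref(value), attr, device)
        if status != 0:  # a non-zero cudaError_t means the call failed
            return None
        rates[key] = value.value
    return rates


print(query_clock_rates_khz())
```

If this prints a dictionary with 0 for "sm", the runtime itself is reporting a zero clock, which would be consistent with the log above. If it prints None, the runtime library could not be loaded or the query failed.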

Thank you.