NVIDIA driver hangs with RTX 3080 when running trtexec


When I use TensorRT’s trtexec command to import and test a simple ONNX model, the process gets stuck and the graphics driver hangs; afterwards, nvidia-smi reports that no GPU is found. This only occurs on my RTX 3080 machine; there is no problem on a 1060, 3060, or 3070. I am providing two ONNX models: one causes the process to get stuck during engine building, the other during inference. Logs from both trtexec runs are attached.


TensorRT Version: 7.2.1 OR 8.2.0
GPU Type: NVIDIA GeForce RTX 3080
Nvidia Driver Version: 470.63
CUDA Version: 11.1 OR 11.4
CUDNN Version: 8.0.5 OR 8.2.1
Operating System + Version: Ubuntu 18.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

model: model_crash_during_building.onnx (20.6 KB)
log: trtexec_stuck_when_building (11.8 KB)
model: model_crash_during_inference.onnx (20.6 KB)
log: trtexec_struck_when_inferencing (29.0 KB)

Steps To Reproduce

On the RTX 3080 machine, run one of the following commands:

./trtexec --onnx=model_crash_during_building.onnx --verbose --explicitBatch


./trtexec --onnx=model_crash_during_inference.onnx --verbose --explicitBatch

Please share the model, script, profiler, and performance output if not already shared so that we can help you better.
Alternatively, you can try running your model with the trtexec command.

While measuring the model’s performance, make sure you consider the latency and throughput of network inference alone, excluding data pre- and post-processing overhead.
Please refer to the link below for more details:
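For reference, a timing run isolated to inference might look like the sketch below. It assumes trtexec is on your PATH; the flag values are illustrative, not required settings. trtexec’s summary reports GPU compute time separately from end-to-end host latency, which excludes your own pre/post-processing.

```shell
# Sketch: time inference with trtexec, averaging over many runs.
# --avgRuns and --duration are standard trtexec flags; the values here are illustrative.
CMD="trtexec --onnx=model_crash_during_inference.onnx --explicitBatch --avgRuns=100 --duration=10"
echo "$CMD"                           # inspect the command before running it
if command -v trtexec >/dev/null; then
  $CMD                                # reports GPU compute vs. host latency separately
fi
```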



It looks like the workspace was very small (16 MB). Could you please try increasing the workspace size?
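A minimal sketch of rebuilding with a larger workspace, assuming trtexec is on your PATH. In TensorRT 7.x/8.2, trtexec’s --workspace flag takes a size in MiB; 2048 here is an illustrative value, not a required one.

```shell
# Sketch: rerun the failing build with a larger builder workspace.
WORKSPACE_MB=2048                     # e.g. 2 GiB instead of the small 16 MiB workspace
CMD="trtexec --onnx=model_crash_during_building.onnx --explicitBatch --verbose --workspace=${WORKSPACE_MB}"
echo "$CMD"                           # inspect the command before running it
if command -v trtexec >/dev/null; then
  $CMD
fi
```

If the larger workspace helps, you can tune the value down to whatever your GPU can comfortably spare alongside other processes.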

Thank you.