CUDA driver version is insufficient for CUDA runtime version

Description

I am trying to convert a yolov4-tiny model from darknet to onnx, then from onnx to tensorrt. I am using the nvidia-cuda:tensorrt-21.02-py3 container, with scripts from the Tianxiaomo/pytorch-YOLOv4 repository on GitHub (a PyTorch, ONNX and TensorRT implementation of YOLOv4). I have successfully run the scripts to convert from darknet to onnx and then from onnx to tensorrt (they output labeled images, which are correct) on my local machine, which has a Tesla V100 and NVIDIA driver version 418.165.02. This is the output from nvidia-smi on my local machine:
NVIDIA-SMI 418.165.02 Driver Version: 418.165.02 CUDA Version: 11.2

As I need to deploy the model in the cloud, and the tensorrt conversion needs to take place on the target hardware, I replicated the same process on my AWS AMI, which runs on a Tesla T4. nvidia-smi on the AWS AMI shows:
NVIDIA-SMI 460.73.01 Driver Version: 460.73.01 CUDA Version: 11.2
I was able to make the conversion from darknet to onnx correctly, with the same results as on my local machine. However, when I ran the command to convert the onnx model to tensorrt, exactly as I did locally, I received the error:
cuda failure: cuda driver version is insufficient for cuda runtime version

I am confused. Firstly, isn’t 460.73.01 > 418.165.02, and hence shouldn’t the CUDA driver be sufficient? Secondly, the TensorFlow Release Notes in the NVIDIA Deep Learning Frameworks documentation say that 460.27.04 or later is sufficient, and that on a Tesla T4 I may also use 418.40 (or later R418), 440.33 (or later R440), or 450.51 (or later R450). However, clearly nothing has worked.

In addition, this AMI has been used successfully for inference with PyTorch models before, so I don’t think my NVIDIA driver is incorrectly installed.
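For what it’s worth, one way to check which driver API and runtime versions the failing process actually sees is to query them directly. A minimal sketch, assuming libcuda.so.1 and a libcudart.so are loadable inside the container:

import ctypes

drv = ctypes.c_int()
rt = ctypes.c_int()

# Driver API version, reported by libcuda (installed by the NVIDIA driver on the host)
ctypes.CDLL("libcuda.so.1").cuDriverGetVersion(ctypes.byref(drv))

# Runtime version, reported by libcudart (shipped inside the container;
# the exact soname may differ, e.g. libcudart.so.11.0)
ctypes.CDLL("libcudart.so").cudaRuntimeGetVersion(ctypes.byref(rt))

# Values are encoded as 1000*major + 10*minor, e.g. 11020 == CUDA 11.2.
# The "insufficient" error is raised when the runtime value exceeds the driver value,
# or when libcuda is not visible to the process at all.
print("driver API:", drv.value, "runtime:", rt.value)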

Can someone advise?

Environment

TensorRT Version: 7.2.2
GPU Type: Tesla T4
Nvidia Driver Version: 460.73.01
CUDA Version: 11.2
CUDNN Version: 8.1.0
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8
TensorFlow Version (if applicable): N.A.
PyTorch Version (if applicable): 1.6.0
Baremetal or Container (if container which image + tag): nvidia-cuda:tensorrt-21.02-py3

Hi,
We request you to share the ONNX model and the script, if not shared already, so that we can assist you better.
Meanwhile, you can try a few things:

  1. Validate your model with the snippet below.

check_model.py

import sys
import onnx

filename = sys.argv[1]  # path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)
  2. Try running your model with the trtexec command, for example:
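(Sketch only; substitute the path to your own ONNX model.)

trtexec --onnx=<your_model>.onnx --explicitBatch --verbose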

In case you are still facing the issue, please share the trtexec --verbose log for further debugging.
Thanks!

requirements.txt (103 Bytes)
Let me go and generate the --verbose trtexec logs.

I have some issues pulling the onnx model from the cloud, but I will try to do so. Meanwhile, I will run your model check and generate the results. Will an onnx model from my local machine suffice, if it was generated from identical docker containers? Does the machine doing the conversion matter for onnx? (I know it does for tensorrt.)
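As a sanity check, here is roughly how I would compare the two exports to confirm they behave identically (a sketch using onnxruntime; the file names are placeholders, and the input shape comes from the exported model name):

import numpy as np
import onnxruntime as ort

# Placeholder names: the export from my local machine vs. the one from the AWS AMI
sess_a = ort.InferenceSession("yolov4_local.onnx")
sess_b = ort.InferenceSession("yolov4_aws.onnx")

# 1 x 3 x 416 x 416 input, matching the static shape in the exported model name
x = np.random.rand(1, 3, 416, 416).astype(np.float32)
name = sess_a.get_inputs()[0].name

out_a = sess_a.run(None, {name: x})
out_b = sess_b.run(None, {name: x})

# If the two exports are equivalent, the outputs should match up to float noise
for a, b in zip(out_a, out_b):
    print(np.abs(a - b).max())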

Meanwhile, some files:
Dockerfile (963 Bytes)
demo_darknet2onnx.py (2.3 KB)
darknet2onnx.py (2.7 KB)
darknet2pytorch.py (21.1 KB)

There are some utils files too, but I don’t know if you want all of them. I just followed the instructions from the Tianxiaomo/pytorch-YOLOv4 repository mentioned above exactly.

I have rerun the model, as requested. This time I tested with nvidia-cuda:tensorrt-20.11-py3 instead (it works perfectly on my local machine). I am using a very slightly adjusted requirements.txt (attached, adapted for Python 3.6) and a slightly adjusted Dockerfile, mainly switching to Python 3.6 instead of Python 3.8. I have run trtexec with --verbose, and I have attached the logs as well as my converted models.

I have run onnx.checker.check_model and no exception is thrown. This is expected, because on the AWS AMI the prediction output from the onnx model is exactly the same as on my local machine. The failure is at the trtexec stage.

However, the "cuda driver version is insufficient for cuda runtime version" failure persists with both containers used (CUDA 11.1 and 11.2).

Please advise.
Dockerfile (973 Bytes)
requirements.txt (103 Bytes)
2011-py3_logs.txt (15.9 KB)
yolov4_1_3_416_416_static_2011.onnx (23.2 MB)
yolov4_1_3_416_416_static_2102.onnx (23.2 MB)

Hi,

Could you please share the docker container launch command and the nvidia-smi output from inside the container?
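For reference, a typical launch looks something like this (a sketch; substitute your own image tag, e.g. the nvidia-cuda:tensorrt-21.02-py3 tag from earlier in the thread, and note that --gpus all requires the NVIDIA Container Toolkit):

docker run --gpus all -it --rm nvcr.io/nvidia/tensorrt:21.02-py3
nvidia-smi   # run inside the container; it should report the host driver (460.73.01)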

Thank you.