CUDA driver version is insufficient for CUDA runtime version


I am trying to convert a yolov4-tiny model from darknet to onnx, and then from onnx to tensorrt. I am using the nvidia-cuda:tensorrt-21.02-py3 container, with scripts from the Tianxiaomo/pytorch-YOLOv4 GitHub repo (a PyTorch, ONNX and TensorRT implementation of YOLOv4). On my local machine, which has a Tesla V100 and Nvidia driver version 418.165.02, I have successfully run the scripts to convert from darknet to onnx and from onnx to tensorrt (the output labeled images are correct). This is the output from nvidia-smi on my local machine:
NVIDIA-SMI 418.165.02 Driver Version: 418.165.02 CUDA Version: 11.2

As I need to deploy the model in the cloud, and tensorrt conversion needs to take place on the target hardware, I replicated the same process on my AWS AMI, which runs on a Tesla T4. nvidia-smi on the AWS instance shows:
NVIDIA-SMI 460.73.01 Driver Version: 460.73.01 CUDA Version: 11.2
I was able to make the conversion from darknet to onnx correctly, with the same results as on my local machine. However, when I ran the command to convert the onnx model to tensorrt, exactly as I did locally, I received the error:
cuda failure: cuda driver version is insufficient for cuda runtime version

I am confused. Firstly, isn’t 460.73.01 > 418.165.02, and hence shouldn’t the cuda driver be sufficient? Secondly, the NVIDIA TensorFlow container release notes say that 460.27.04 or later is sufficient, and that on a Tesla T4 I may also use 418.40 (or later R418), 440.33 (or later R440), or 450.51 (or later R450). Clearly, though, none of this has worked.
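The version arithmetic above can be sketched as follows (both numbers hard-coded from the nvidia-smi outputs in this post; CUDA encodes version X.Y as 1000*X + 10*Y, so 11.2 becomes 11020):

```shell
# Driver vs. runtime comparison, values taken from the nvidia-smi output above.
# CUDA encodes version X.Y as 1000*X + 10*Y, so CUDA 11.2 -> 11020.
DRIVER_CUDA=11020    # highest CUDA version the 460.73.01 driver reports supporting
RUNTIME_CUDA=11020   # CUDA runtime shipped in the 21.02 container (assumption)

if [ "$DRIVER_CUDA" -ge "$RUNTIME_CUDA" ]; then
  VERDICT="sufficient"
else
  VERDICT="insufficient"
fi
echo "driver is $VERDICT for this runtime"
```

By this arithmetic the driver is sufficient, which suggests the error is not really a version mismatch; in practice this message often means the CUDA runtime cannot reach the driver at all (for example, a container started without GPU access).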

In addition, this AMI has been successfully used to do inference with torch models before, so I don’t think my Nvidia driver is incorrectly installed.

Can someone advise?


TensorRT Version: 7.2.2
GPU Type: Tesla T4
Nvidia Driver Version: 460.73.01
CUDA Version: 11.2
CUDNN Version: 8.1.0
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8
TensorFlow Version (if applicable): N.A.
PyTorch Version (if applicable): 1.6.0
Baremetal or Container (if container which image + tag): nvidia-cuda:tensorrt-21.02-py3

Request you to share the ONNX model and the script if not shared already so that we can assist you better.
Alongside, you can try a few things:

  1. Validate your model with the below snippet:

import onnx

filename = "yourONNXmodel.onnx"   # path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)   # raises an exception if the model is invalid

  2. Try running your model with the trtexec command.

In case you are still facing the issue, request you to share the trtexec --verbose log for further debugging.
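For reference, a minimal trtexec invocation with verbose logging could look like the following sketch (the model filename is illustrative; trtexec is already on PATH inside the TensorRT NGC containers):

```shell
# Sketch of a trtexec run with verbose logging; substitute your own ONNX file.
MODEL=yourONNXmodel.onnx
TRTEXEC_CMD="trtexec --onnx=$MODEL --verbose"
echo "$TRTEXEC_CMD"
# To capture the log for sharing, the command would be run as:
#   $TRTEXEC_CMD 2>&1 | tee trtexec_verbose.log
```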

requirements.txt (103 Bytes)
Let me go and generate the --verbose trtexec logs.

I have some issues pulling the onnx model from the cloud, but I will try to do so. Meanwhile, I will run your model check and generate the results. Will an onnx model from my local machine suffice, if it was generated from an identical docker container? Does the machine doing the conversion matter for onnx? (I know it does for tensorrt.)

Meanwhile, some files:
Dockerfile (963 Bytes), plus three other files (2.3 KB, 2.7 KB, 21.1 KB)

There are some utils files too, but I don’t know if you want all of them. I just followed the instructions here exactly:

I have rerun the model, as requested. This time I tested with nvidia-cuda:tensorrt-20.11-py3 instead (which works perfectly on my local machine). I used a very slightly adjusted requirements.txt (attached, adjusted for Python 3.6) and a slightly adjusted Dockerfile, mainly switching to Python 3.6 instead of 3.8. I ran trtexec with --verbose and have attached the logs, as well as my converted model.

I have run onnx.checker.check_model, and no exception is thrown. This is expected, because on the AWS AMI the prediction output from the onnx model is exactly the same as on my local machine. The failure is at the trtexec stage.

However, the same failure persists, the cuda driver being insufficient for the cuda runtime, in both containers used (CUDA 11.1 and 11.2).

Please advise.
Dockerfile (973 Bytes)
requirements.txt (103 Bytes)
2011-py3_logs.txt (15.9 KB)
yolov4_1_3_416_416_static_2011.onnx (23.2 MB)
yolov4_1_3_416_416_static_2102.onnx (23.2 MB)


Could you please share the docker container launch command and the nvidia-smi output from inside the container?

Thank you.
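For anyone hitting the same error: a hedged example of what a GPU-enabled launch looks like with Docker 19.03+, using the NGC image tag discussed in this thread (this is not the poster's actual command). If the container is started without GPU access, nvidia-smi fails inside it and CUDA calls report exactly "driver version is insufficient for runtime version".

```shell
# Hypothetical launch command: the key part is handing the GPUs to the
# container with --gpus all (Docker 19.03+) or the legacy --runtime=nvidia.
LAUNCH_CMD="docker run --gpus all -it --rm nvcr.io/nvidia/tensorrt:21.02-py3"
echo "$LAUNCH_CMD"
# Inside the container, `nvidia-smi` should then list the T4; if it errors out,
# the container has no driver access and CUDA will fail as described above.
```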