Unable to run ONNX Runtime with the TensorRT execution provider in Docker based on an NVIDIA image

I am trying to execute an ONNX model with the TensorRT execution provider (from Python). To do this I subscribed to the NVIDIA ‘TensorRT’ container in the AWS Marketplace and set it up per the instructions here: https://docs.nvidia.com/ngc/ngc-deploy-public-cloud/ngc-aws/index.html#introduction-to-using-ngc-aws
The VM I am running on uses the AMI recommended above (NVIDIA Deep Learning AMI), hosted on an AWS p2.xlarge (1 V100).
I have created an image that uses the NVIDIA ‘TensorRT’ image as its base and layers on a few small updates to allow me to create a Python venv. The Dockerfile is simply:

FROM 709825985650.dkr.ecr.us-east-1.amazonaws.com/nvidia/containers/nvidia/tensorrt:22.01-py3

WORKDIR /ebs

RUN apt-get update && apt-get install -y python3.8-venv libsndfile1-dev

where 709825985650.dkr.ecr.us-east-1.amazonaws.com/nvidia/containers/nvidia/tensorrt:22.01-py3 is the local image of ‘TensorRT’

I then create a virtual env (python -m venv …) on attached storage from inside the container to install the various extras I need (such as ONNX Runtime).
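For reference, the venv setup inside the container is roughly the following (the paths here are illustrative, not the exact ones I used):

# Run inside the container; /ebs is the attached storage mounted into it
python3 -m venv /ebs/venv
source /ebs/venv/bin/activate
pip install --upgrade pip
pip install onnxruntime-gpu   # plus the other extras mentioned above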

When my code attempts to create an ONNX inference session requesting the ‘TensorrtExecutionProvider’, it fails with:

RuntimeError: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:122 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:116 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] CUDA failure 35: CUDA driver version is insufficient for CUDA runtime version ; GPU=0 ; hostname=e399d4dbe2d4 ; expr=cudaSetDevice(device_id_);

nvcc --version reports (inside the container):

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Fri_Dec_17_18:16:03_PST_2021
Cuda compilation tools, release 11.6, V11.6.55
Build cuda_11.6.r11.6/compiler.30794723_0

Driver version (from inside the container):

cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  460.73.01  Thu Apr  1 21:40:36 UTC 2021
GCC version:  gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)

Since there is no nvidia-smi in that image I cannot run it inside the container, but on the host it reports:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01    Driver Version: 460.73.01    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:00:1E.0 Off |                    0 |
| N/A   31C    P0    24W / 300W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I am unsure how the driver and runtime could possibly be out of step with this setup, since neither has been (knowingly!) changed from the NVIDIA image. Any ideas?

[Note - not sure if there is a clue here, but PyTorch (installed in the virtual env for CUDA 11) reports that no GPU is available. This is not directly relevant for me, since I am only using PyTorch to load a model and save it to ONNX as an offline activity.]

Update - I have reduced the steps required so that they no longer involve modifying the global Python to support venv, or require PyTorch; hence I am using the NVIDIA image unmodified.

Steps:

  1. Run a shell inside Docker with the NVIDIA TensorRT image (the volume mount provides a test script and a sample ONNX model verified in both the CPU and default CUDA execution providers):
docker run -it -v /ebs:/ebs 709825985650.dkr.ecr.us-east-1.amazonaws.com/nvidia/containers/nvidia/tensorrt:22.01-py3 bash
  2. Install the ONNX runtime globally inside the container (ephemerally, but this is only a test - obviously in a real-world case this would be part of a docker build):
pip install onnxruntime-gpu
  3. Run the test script:
python onnx_load_test.py --onnx /ebs/models/test_model.onnx

which fails with:

Traceback (most recent call last):
  File "onnx_load_test.py", line 37, in <module>
    run(args.onnx)
  File "onnx_load_test.py", line 14, in run
    ort_session = onnxruntime.InferenceSession(onnx,
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 335, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 379, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
RuntimeError: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:122 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:116 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] CUDA failure 35: CUDA driver version is insufficient for CUDA runtime version ; GPU=0 ; hostname=8f2a5528ea84 ; expr=cudaSetDevice(device_id_);

For reference the test script is just:

import os
import argparse
import onnxruntime


def run(onnx):
    if not os.path.exists(onnx):
        raise ValueError("Specified model file not found")

    # Request only the TensorRT execution provider so that any failure is explicit rather than a silent fallback
    ort_session = onnxruntime.InferenceSession(onnx,
                                               providers=['TensorrtExecutionProvider'])
    print(f"ONNX device: {onnxruntime.get_device()}")
    print(f"Session providers: {ort_session.get_providers()}")


if __name__ == "__main__":
    # Parse arguments
    parser = argparse.ArgumentParser()
    parser.add_argument('--onnx', type=str, help='Path to the ONNX model to load with ONNX Runtime')
    parser.add_argument('--dummy')
    args = parser.parse_args()

    run(args.onnx)
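As an additional sanity check (not part of the script above, just a suggestion), one can confirm that the TensorRT provider is actually registered in this onnxruntime-gpu build before creating a session:

import onnxruntime

# Execution providers registered with this onnxruntime build; for onnxruntime-gpu this
# should include 'TensorrtExecutionProvider' and 'CUDAExecutionProvider'.
print(onnxruntime.get_available_providers())
print(onnxruntime.get_device())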

According to the CUDA documentation, CUDA 11.6 requires driver version >= 450.80.02. In this case we have:

cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  460.73.01  Thu Apr  1 21:40:36 UTC 2021
GCC version:  gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)

Since 460 > 450, this seems like it should be OK…?
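One way to see the two versions the error message is comparing is the following minimal diagnostic sketch (not something from the original test setup; it assumes libcuda.so.1 and a libcudart are loadable inside the container, and the exact soname may differ):

import ctypes

def cuda_version(lib_name, symbol):
    # cuDriverGetVersion / cudaRuntimeGetVersion both take an int* out-parameter and
    # encode the version as 1000*major + 10*minor (e.g. 11020 -> 11.2).
    lib = ctypes.CDLL(lib_name)
    version = ctypes.c_int(0)
    status = getattr(lib, symbol)(ctypes.byref(version))
    if status != 0:
        raise RuntimeError(f"{symbol} failed with status {status}")
    return version.value // 1000, (version.value % 1000) // 10

# Highest CUDA version the installed driver supports (driver API).
print("Driver supports CUDA:", cuda_version("libcuda.so.1", "cuDriverGetVersion"))
# CUDA runtime version shipped in the container (soname may be e.g. libcudart.so.11.0).
print("CUDA runtime version:", cuda_version("libcudart.so", "cudaRuntimeGetVersion"))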

Did you ever find a solution to this? I’m getting the same issues.

I did not

CUDA 11.6 nominally requires a 510.xx or newer driver. Refer to table 3 here.

You are referring to table 2. That only applies if you have the compatibility libraries installed in your container. Without them, your driver only supports up to CUDA 11.2 (as shown by the "CUDA Version: 11.2" in your nvidia-smi output).

That is the reason you are getting the mismatch message. The image presumably has a newer CUDA version dependency. You have a few options:

  1. Since you are building your own image, install the compatibility libraries (see the sketch after this list). The instructions are linked around the table 2 you read to conclude that 450.xx was an acceptable driver; or

  2. Update the driver on the machine instance you are using; or

  3. Downgrade the TRT container image you are starting with to one that works with CUDA 11.2. You can find such images on ngc.nvidia.com.
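A rough sketch of option 1 as a Dockerfile (untested; it assumes the NVIDIA CUDA apt repository is already configured in the base image, and the package name cuda-compat-11-6 comes from the CUDA compatibility documentation rather than being verified in this thread):

# Sketch only: add the CUDA 11.6 forward-compatibility package so the 11.6 runtime
# in the image can run on the host's 460.xx (CUDA 11.2) driver.
FROM 709825985650.dkr.ecr.us-east-1.amazonaws.com/nvidia/containers/nvidia/tensorrt:22.01-py3

RUN apt-get update && apt-get install -y cuda-compat-11-6 && rm -rf /var/lib/apt/lists/*

# Make sure the compat libcuda is found ahead of the driver-injected one
ENV LD_LIBRARY_PATH=/usr/local/cuda/compat:${LD_LIBRARY_PATH}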