I am trying to execute an ONNX model with the TensorRT execution provider (from Python). To do this I subscribed to the NVIDIA ‘TensorRT’ container in the AWS Marketplace and set it up as per the instructions here: https://docs.nvidia.com/ngc/ngc-deploy-public-cloud/ngc-aws/index.html#introduction-to-using-ngc-aws
The VM I am running on uses the AMI recommended there (NVIDIA Deep Learning AMI), hosted on an AWS p2.xlarge instance (1 V100).
I have created an image that uses the NVIDIA ‘TensorRT’ image as its base and layers a few small additions so that I can create a Python venv. The Dockerfile is simply:
FROM 709825985650.dkr.ecr.us-east-1.amazonaws.com/nvidia/containers/nvidia/tensorrt:22.01-py3
WORKDIR /ebs
# -y keeps the build non-interactive
RUN apt-get update && apt-get install -y python3.8-venv libsndfile1-dev
where 709825985650.dkr.ecr.us-east-1.amazonaws.com/nvidia/containers/nvidia/tensorrt:22.01-py3
is the local copy of the NVIDIA ‘TensorRT’ image.
I then build a virtual env (python -m venv …) on attached storage from inside the container and install the various extras I need there (such as ONNX Runtime).
When my code attempts to create an ONNX inference session requesting the ‘TensorrtExecutionProvider’, it fails with:
RuntimeError: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:122 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:116 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] CUDA failure 35: CUDA driver version is insufficient for CUDA runtime version ; GPU=0 ; hostname=e399d4dbe2d4 ; expr=cudaSetDevice(device_id_);
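For reference, the session creation is essentially the following minimal sketch (the model path is a placeholder, not my actual file):

import onnxruntime as ort

# Placeholder model path; the failure happens while the providers are
# being initialised, before any inference is attempted.
session = ort.InferenceSession(
    "model.onnx",
    providers=["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"],
)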
nvcc --version reports (inside the container):
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Fri_Dec_17_18:16:03_PST_2021
Cuda compilation tools, release 11.6, V11.6.55
Build cuda_11.6.r11.6/compiler.30794723_0
Driver version (from inside the container):
cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 460.73.01 Thu Apr 1 21:40:36 UTC 2021
GCC version: gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)
Since there is no nvidia-smi in that image I cannot run it inside the container, but on the host it reports:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01 Driver Version: 460.73.01 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:00:1E.0 Off | 0 |
| N/A 31C P0 24W / 300W | 0MiB / 16160MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
I am unsure how the driver and runtime can be out of step with this setup (the outputs above do show a CUDA 11.6 toolkit against a driver that reports CUDA 11.2), since neither has been (knowingly!) updated from the NVIDIA image. Any ideas?
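In case it helps with diagnosis, a quick way to cross-check the version pair CUDA itself sees from inside the venv would be something like this ctypes sketch (the library names are assumptions based on the usual Linux SONAMEs and may need adjusting):

import ctypes

# Assumed library names; inside the container the runtime may only be
# available under a versioned name such as libcudart.so.11.0.
cudart = ctypes.CDLL("libcudart.so")
cuda = ctypes.CDLL("libcuda.so.1")

rt, drv = ctypes.c_int(0), ctypes.c_int(0)
cudart.cudaRuntimeGetVersion(ctypes.byref(rt))
cuda.cuDriverGetVersion(ctypes.byref(drv))

# Versions are encoded as 1000*major + 10*minor, e.g. 11060 for CUDA 11.6.
print("runtime:", rt.value, "driver supports up to:", drv.value)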
[Note: not sure if there is a clue here, but PyTorch (installed in the virtual env for CUDA 11) also reports that no GPU is available. This is not directly relevant for me, since I only use PyTorch to load a model and save it to ONNX as an offline step.]
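(For completeness, the PyTorch observation above amounts to roughly this check inside the venv; the export code itself is omitted since it is not where the failure occurs.)

import torch

# Per the note above: inside the container's venv this reports no usable GPU,
# even though nvidia-smi on the host shows the V100.
print(torch.__version__, torch.version.cuda)
print(torch.cuda.is_available())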