As title says, I am getting this error and can’t seem to get rid of it while starting tensorrt and running everything on the latest available versions.
Environment
TensorRT Version: 7.1.3.4
GPU Type: RTX2070
Nvidia Driver Version: 450.57
CUDA Version: 11.0.2
CUDNN Version: 8.0.1
Operating System + Version: ubuntu 18.04
Python Version (if applicable): 3.7.5
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.5.1
Baremetal or Container (if container which image + tag): VM
Relevant Files
Steps To Reproduce
using the python script trying to convert ONNX file to TensorRT
trtexec --onnx=yolov4_5_3_608_608.onnx --workspace=4096 --saveEngine=yolov4-5 --fp16 --explicitBatch
Note that I also tested with the driver version included in the CUDA toolkit package, with the same outcome. I am also running Caffe and TensorRT models on OpenCV with this same cuDNN, CUDA and driver set without this problem.
Hi @anhmantran,
I tried reproducing your issue with yolov4 model, and it worked fine for me.
Can you share your ONNX model so that I can try it?
Also, please check that you are using compatible versions of CUDA, using the link below.
The ONNX file is the public model YOLO V4. trtexec crashes out before getting to even look for the file so I don’t think it has anything to do with it. I get the same error with any random file name I put in the command.
I also just upgraded cuDNN to 8.0.2 GA and the problem remains. trtexec just refuses to do anything. I also uninstalled and reinstalled TensorRT and the problem is still there. It just doesn't appear to read the CUDA driver version correctly. Is there any way to bypass that check, or at least figure out where it is getting its CUDA or driver version from?
OpenCV, PyTorch and TensorFlow all see the driver correctly; only TensorRT has this problem…
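One way to see which versions the CUDA libraries themselves report is to call their version queries directly. The following is a minimal sketch, not something from TensorRT: it assumes a Linux system where the driver library is loadable as `libcuda.so.1` or `libcuda.so` and the CUDA 11.0 runtime as `libcudart.so.11.0` (these names are assumptions), and it uses the `cuDriverGetVersion` and `cudaRuntimeGetVersion` entry points through `ctypes`. The helper names `decode_cuda_version` and `query_version` are made up for illustration.

```python
import ctypes


def decode_cuda_version(raw):
    """Split CUDA's integer version (1000*major + 10*minor) into (major, minor)."""
    return raw // 1000, (raw % 1000) // 10


def query_version(library_name, function_name):
    """Call an `int f(int*)` style version query in a shared library.

    Returns (major, minor), or None if the library or symbol cannot be
    loaded or the call returns a non-zero status.
    """
    try:
        lib = ctypes.CDLL(library_name)
        func = getattr(lib, function_name)
    except (OSError, AttributeError):
        return None
    func.argtypes = [ctypes.POINTER(ctypes.c_int)]
    func.restype = ctypes.c_int
    value = ctypes.c_int(0)
    if func(ctypes.byref(value)) != 0:
        return None
    return decode_cuda_version(value.value)


if __name__ == "__main__":
    # Library names below are assumptions for a Linux box with CUDA 11.0.
    for lib_name, func_name in [
        ("libcuda.so.1", "cuDriverGetVersion"),
        ("libcuda.so", "cuDriverGetVersion"),
        ("libcudart.so.11.0", "cudaRuntimeGetVersion"),
    ]:
        print(lib_name, func_name, query_version(lib_name, func_name))
```

If `libcuda.so.1` and `libcuda.so` give different answers, or one of them returns `None`, that suggests the two names do not resolve to the same driver library.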
Can you please check your setup against the installation guide for any missing steps?
However, to avoid any system-related dependencies, we recommend using NGC containers.
Thank you for your response. I did follow the installation steps very carefully and as I said, I even tested with the 450.51.5 driver which came embedded in the cuda tool kit. All my other frameworks work even using the RT cores and only tensorRT is problematic. And unfortunately, I am one of the people who hate containers with a passion due to the added layer of management and complication they involve so no, that’s not an option. I will just be using other frameworks for now.
Quick update on this: I tried starting the python binding to tensorrt and am getting a similar error 35.
This is getting really frustrating and please do not tell me it is a cudnn, driver or cuda version mismatch.
Cuda and cudnn have been working fine and my GPU is running 7 different inferences on other frameworks using CUDA.
I suspect that TensorRT is somehow checking in the wrong place; I just don't know where it is looking or what it is looking for. It is possible that some remnants of previous driver installations are still there, though this is a pretty new installation and I have never had any version of CUDA below 11 or a driver below 450.5x.
I have reinstalled the driver, uninstalled the CUDA 11 toolkit, and reinstalled the toolkit from the .run file, making sure that it was first uninstalled from apt/deb, and followed all the documentation.
nvidia-smi shows:
| NVIDIA-SMI 450.57 Driver Version: 450.57 CUDA Version: 11.0 |
What am I missing?
The tensorRT samples pop the same error:
[TRT] CUDA initialization failure with error 35.
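Assuming the number in that message is a CUDA runtime `cudaError_t` value, 35 corresponds to `cudaErrorInsufficientDriver`, i.e. the runtime believes the driver it loaded is older than it requires. The sketch below decodes such a log line using a small hand-copied subset of the runtime error table; the helper names are hypothetical and the table is deliberately incomplete.

```python
import re

# Small subset of cudaError_t values from the CUDA runtime API headers.
CUDA_RUNTIME_ERRORS = {
    0: ("cudaSuccess", "no error"),
    35: ("cudaErrorInsufficientDriver",
         "CUDA driver version is insufficient for CUDA runtime version"),
    100: ("cudaErrorNoDevice", "no CUDA-capable device is detected"),
}


def extract_error_code(log_line):
    """Pull the numeric code out of a TensorRT-style initialization failure line."""
    match = re.search(r"CUDA initialization failure with error (\d+)", log_line)
    return int(match.group(1)) if match else None


def describe_cuda_error(code):
    """Return a readable description for a code, if it is in the subset above."""
    name, text = CUDA_RUNTIME_ERRORS.get(code, ("unknown", "not in this table"))
    return f"{name} ({code}): {text}"


if __name__ == "__main__":
    line = "[TRT] CUDA initialization failure with error 35."
    print(describe_cuda_error(extract_error_code(line)))
```

Since nvidia-smi reports a 450.57 driver with CUDA 11.0 support, an "insufficient driver" result would point at which driver library is being loaded rather than at the installed driver version.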
Meanwhile opencv, ffmpeg, pytorch have been running fine with cuda/cudnn enabled.
I researched this a bit and it seems to be a long standing error across a number of versions where some inconsistencies exist in how the version is checked and how cuda is invoked. I am sure I have some installation issues but I can’t point to any.
As you can see, somehow the libcuda.so library points to a static library within the cuda installation and not to the driver library. I don’t know how it became that way but I deleted the symbolic link and recreated a new one to point to the driver library:
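A minimal sketch of how such links could be inspected, for anyone hitting the same error. It resolves every `libcuda.so*` entry it finds and flags those that end up inside the CUDA toolkit tree instead of the driver's library directory. The search paths and the toolkit root `/usr/local/cuda` are assumptions about a typical Ubuntu layout, not paths taken from this system, and the helper names are hypothetical.

```python
import glob
import os


def resolve_links(patterns):
    """Return {path: fully resolved path} for every file matching the patterns."""
    resolved = {}
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            resolved[path] = os.path.realpath(path)
    return resolved


def is_under(path, root):
    """True if `path` lies inside the directory tree rooted at `root`."""
    path = os.path.abspath(path)
    root = os.path.abspath(root)
    return os.path.commonpath([path, root]) == root


if __name__ == "__main__":
    # Assumed locations; adjust to the actual installation.
    toolkit_root = os.path.realpath("/usr/local/cuda")
    patterns = [
        "/usr/lib/x86_64-linux-gnu/libcuda.so*",
        "/usr/lib64/libcuda.so*",
        "/usr/local/cuda/lib64/libcuda.so*",
    ]
    for path, target in resolve_links(patterns).items():
        note = "  <-- resolves into the toolkit tree" if is_under(target, toolkit_root) else ""
        print(f"{path} -> {target}{note}")
```

Any `libcuda.so` name that the dynamic loader searches and that resolves into the toolkit tree is a candidate for the kind of stale link described above.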