[checkMacros.cpp::catchCudaError::272] Error Code 1: Cuda Runtime (CUDA driver is a stub library)

Hi everyone. First of all, this error occurs when I try to convert an ONNX model trained with PyTorch to a *.engine file.
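For context, the conversion step itself looks roughly like the sketch below. This is a minimal TensorRT 8 sketch, not the actual program from this thread; the file names model.onnx and model.engine are placeholders.

// onnx_to_engine.cpp - minimal ONNX -> serialized engine conversion (TensorRT 8).
// Build (typical paths, adjust to your install):
//   g++ onnx_to_engine.cpp -o onnx_to_engine -I/usr/local/cuda/include \
//       -lnvinfer -lnvonnxparser
#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <fstream>
#include <iostream>
#include <memory>

// TensorRT requires a logger; print warnings and errors only.
class Logger : public nvinfer1::ILogger {
    void log(Severity sev, const char* msg) noexcept override {
        if (sev <= Severity::kWARNING) std::cout << msg << std::endl;
    }
} gLogger;

int main() {
    using namespace nvinfer1;
    auto builder = std::unique_ptr<IBuilder>(createInferBuilder(gLogger));
    const auto flags =
        1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    auto network = std::unique_ptr<INetworkDefinition>(builder->createNetworkV2(flags));
    auto parser = std::unique_ptr<nvonnxparser::IParser>(
        nvonnxparser::createParser(*network, gLogger));
    if (!parser->parseFromFile("model.onnx",
                               static_cast<int>(ILogger::Severity::kWARNING))) {
        std::cerr << "failed to parse ONNX model\n";
        return 1;
    }
    auto config = std::unique_ptr<IBuilderConfig>(builder->createBuilderConfig());
    // buildSerializedNetwork() is where TensorRT first touches the CUDA driver,
    // so a stub libcuda.so surfaces here as Error Code 1.
    auto serialized = std::unique_ptr<IHostMemory>(
        builder->buildSerializedNetwork(*network, *config));
    if (!serialized) {
        std::cerr << "engine build failed\n";
        return 1;
    }
    std::ofstream out("model.engine", std::ios::binary);
    out.write(static_cast<const char*>(serialized->data()), serialized->size());
    return 0;
}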

My environment is as follows:

RTX 3090 / Ubuntu 18.04

I've installed CUDA 11.2 and the TensorRT 8.2 GA version on my computer.

The details are (a quick version-check program is sketched after this list):

Ubuntu 18.04
TensorRT 8.2 GA
onnx-tensorrt built for TensorRT 8 (see the onnx-tensorrt repo)
CUDA 11.2
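To confirm that these are the versions the process actually sees at run time, something like the following sketch can help. It is my own snippet, and it assumes the CUDA runtime and TensorRT headers/libraries are on the include and link paths.

// version_check.cpp - print the CUDA and TensorRT versions visible at run time.
// Build (typical paths, adjust to your install):
//   g++ version_check.cpp -o version_check -I/usr/local/cuda/include \
//       -L/usr/local/cuda/lib64 -lcudart -lnvinfer
#include <cstdio>
#include <cuda_runtime_api.h>
#include <NvInfer.h>  // declares getInferLibVersion()

int main() {
    int runtime = 0, driver = 0;
    cudaRuntimeGetVersion(&runtime);  // version of the libcudart this binary uses
    cudaDriverGetVersion(&driver);    // max CUDA version the installed driver supports
    std::printf("CUDA runtime: %d, driver supports: %d\n", runtime, driver);
    // TensorRT encodes its version as major*1000 + minor*100 + patch, e.g. 8201.
    std::printf("TensorRT: %d\n", getInferLibVersion());
    return 0;
}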

Name: torch
Version: 1.7.0+cu110

Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /home/{Path}/anaconda3/envs/CenterTrack/lib/python3.6/site-packages
Requires: numpy, typing-extensions, future, dataclasses
Required-by: torchvision, torchaudio

RTX 3090

-- The C compiler identification is GNU 7.5.0
-- The CXX compiler identification is GNU 7.5.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Protobuf: /usr/local/lib/libprotobuf.so;-lpthread (found version "3.15.8")

--
-- ******** Summary ********
-- CMake version : 3.22.2
-- CMake command : /snap/cmake/1005/bin/cmake
-- System : Linux
-- C++ compiler : /usr/bin/c++
-- C++ compiler version : 7.5.0
-- CXX flags : -Wall -Wno-deprecated-declarations -Wno-unused-function -Wnon-virtual-dtor
-- Build type : Release
-- Compile definitions : SOURCE_LENGTH=42;ONNX_NAMESPACE=onnx2trt_onnx
-- CMAKE_PREFIX_PATH :
-- CMAKE_INSTALL_PREFIX : /usr/local
-- CMAKE_MODULE_PATH :

-- ONNX version : 1.8.0
-- ONNX NAMESPACE : onnx2trt_onnx
-- ONNX_BUILD_TESTS : OFF
-- ONNX_BUILD_BENCHMARKS : OFF
-- ONNX_USE_LITE_PROTO : OFF
-- ONNXIFI_DUMMY_BACKEND : OFF
-- ONNXIFI_ENABLE_EXT : OFF

-- Protobuf compiler : /usr/local/bin/protoc
-- Protobuf includes : /usr/local/include
-- Protobuf libraries : /usr/local/lib/libprotobuf.so;-lpthread
-- BUILD_ONNX_PYTHON : OFF
-- Found CUDA headers at /usr/local/cuda/include
-- Found TensorRT headers at /usr/include/x86_64-linux-gnu
-- Find TensorRT libs at /usr/lib/x86_64-linux-gnu/libnvinfer.so;/usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so
-- Found TENSORRT: /usr/include/x86_64-linux-gnu
-- Found Threads: TRUE
-- Found CUDA: /usr/local/cuda-11.2 (found version "11.2")
-- Found TensorRT headers at /usr/include/x86_64-linux-gnu

Finally, I built my program successfully on the basis of the above environment. However, when I executed the program to convert the ONNX model to an engine file, I got the following error:

[checkMacros.cpp::catchCudaError::272] Error Code 1: Cuda Runtime (CUDA driver is a stub library)

I've tried searching for this error on Google, but I haven't found any helpful information yet. It refers to a "stub library", which really confused me, so any help or suggestions would be much appreciated! Thanks in advance!

By the way, the only thing I found was

CUDA_ERROR_STUB_LIBRARY = 34

“This indicates that the CUDA driver that the application has loaded is a stub library. Applications that run with the stub rather than a real driver loaded will result in CUDA API returning this error.”

The above description is from the CUDA Toolkit documentation (CUDA Driver API :: CUDA Toolkit Documentation). Please, any hints or guidance would help!
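One way to confirm the same failure outside of TensorRT is to call the CUDA driver API directly. The sketch below is my own minimal check, assuming typical install paths. Note that linking against the stub in /usr/local/cuda/lib64/stubs is fine at build time; the error only appears if the stub is what gets loaded at run time.

// driver_check.cpp - call cuInit() and report whatever the driver returns.
// Build: g++ driver_check.cpp -o driver_check -I/usr/local/cuda/include \
//        -L/usr/local/cuda/lib64/stubs -lcuda   (the stub is fine for *linking*)
#include <cstdio>
#include <cuda.h>

int main() {
    CUresult rc = cuInit(0);
    if (rc != CUDA_SUCCESS) {
        const char* msg = nullptr;
        cuGetErrorString(rc, &msg);  // for error 34 this names the stub-library condition
        std::printf("cuInit failed: %d (%s)\n", static_cast<int>(rc),
                    msg ? msg : "unknown");
        return 1;
    }
    int n = 0;
    cuDeviceGetCount(&n);
    std::printf("cuInit OK, %d CUDA device(s) visible\n", n);
    return 0;
}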

One possible issue is that your LD_LIBRARY_PATH env var is set incorrectly.

Hello, thanks for your reply. I checked the LD_LIBRARY_PATH in my environment; it refers to /usr/local/cuda-11.2/lib64. Is that set correctly?

You’re showing how you set it. That doesn’t indicate what the actual contents are. For example we don’t know what the contents of LD_LIBRARY_PATH are before you execute that export line, and of course that matters.

When you hit the error, that would be the point to inspect the variable, e.g. via:

echo $LD_LIBRARY_PATH

or something equivalent via python, e.g.

import os
os.system('echo $LD_LIBRARY_PATH')  # or: print(os.environ.get('LD_LIBRARY_PATH'))

Hello, thanks for your help. I tried to run the sampleOnnxMNIST sample included in the TensorRT 8.2 toolkit. It compiled successfully, but it produced the same error as my own program. @Robert_Crovella


By the way, it showed that my LD_LIBRARY_PATH is /usr/local/cuda-11.2/lib64.

However, I had to compile the sample with sudo make CUDA_INSTALL_DIR=/usr/local/cuda-11.2, because otherwise the build failed with an error saying I should specify CUDA_INSTALL_DIR.

Any suggestions, please, @Robert_Crovella? Should I go back through the whole TensorRT installation instructions, or are there any references I should check?

You have a corrupted install of some sort.

There is a file called libcuda.so that is in a place it is not supposed to be. This file should be in two places:

  1. Wherever the GPU driver install put it. This is the proper one to use. No I can’t be real specific here, because the actual location of this file varies depending on your OS (and I don’t happen to have the install locations memorized for Ubuntu 18.04). And this might actually be two locations, one corresponding to 32-bit usage and one corresponding to 64-bit usage. For example, on my fresh load of CUDA 11.6.1 on a fresh load of CentOS 7, I find that the GPU driver installer has placed it in /usr/lib (the 32-bit location) and /usr/lib64 (the 64-bit location).
  2. In /usr/local/cuda/lib64/stubs. This one should only be used for linking purposes and should never be discovered by the runtime loader. (A quick way to check which copy your process actually loads is sketched right after this list.)
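As mentioned above, here is a small sketch (my addition, not part of the original reply) that asks the dynamic loader which file it actually resolves for libcuda.so.1. You can also inspect the loader cache with ldconfig -p | grep libcuda.

// which_libcuda.cpp - report the file the dynamic loader resolves for libcuda.so.1.
// Build: g++ which_libcuda.cpp -o which_libcuda -ldl
#include <cstdio>
#include <dlfcn.h>

int main() {
    // Load the driver by its soname, the same way the CUDA runtime does.
    void* handle = dlopen("libcuda.so.1", RTLD_NOW);
    if (!handle) {
        std::printf("dlopen failed: %s\n", dlerror());
        return 1;
    }
    Dl_info info;
    void* sym = dlsym(handle, "cuInit");
    // dladdr() reports which shared object a given symbol came from.
    if (sym && dladdr(sym, &info)) {
        std::printf("libcuda resolved to: %s\n", info.dli_fname);
        // A path ending in .../stubs/libcuda.so (or any stray copy) means the
        // loader is picking up the stub, which produces error 34 at run time.
    }
    dlclose(handle);
    return 0;
}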

I can think of two options:

  1. use a utility like sudo find / -name libcuda.so to locate every single instance of that file on your machine. Remove any that don’t fit the description above.

  2. Remove all aspects of CUDA and GPU driver from your machine, and do a complete reload.

If the machine is a horrible mess, option 2 might really only be achievable by doing a disk wipe and OS reload, first. If option 1 doesn’t seem to work for some reason, then the only suggestion I have left is option 2.

And by all means, make sure that at no point does your LD_LIBRARY_PATH env var include the path /usr/local/cuda/lib64/stubs. And by all means, don’t copy the stub version of libcuda.so anywhere. You shouldn’t ever copy or symlink to libcuda.so under any circumstances.

Also note that it generally should not be necessary to have the GPU driver install location on your LD_LIBRARY_PATH variable. The runtime loader is usually already configured (e.g. by ldconfig or similar) to look in the location that the GPU driver installer places it.

Finally, I note that you have installed PyTorch via Anaconda. If Anaconda has done something unexpected (or something I'm unfamiliar with) in your conda environment, you might still run into trouble here; I don't think that should be the case. When running things from a python/conda environment, a conclusive read of the LD_LIBRARY_PATH variable can only be obtained using the method I already gave, which you don't seem to have done. You don't seem to have given a direct response to my last posting.


@Robert_Crovella, wonderful, thanks for your fully detailed suggestions. I will try both options one by one, and if one works I will post the results here in case they are helpful for someone else.

Cool, it works! I removed every "libcuda.so" in the inappropriate places you mentioned in option 1. Thank you very much; the stub library error has disappeared.
