You have a corrupted install of some sort.
There is a file called `libcuda.so` that is in a place it is not supposed to be. This file should be in two places:

- Wherever the GPU driver install put it. This is the proper one to use. No I can’t be real specific here, because the actual location of this file varies depending on your OS (and I don’t happen to have the install locations memorized for Ubuntu 18.04). And this might actually be two locations, one corresponding to 32-bit usage and one corresponding to 64-bit usage. For example, on my fresh load of CUDA 11.6.1 on a fresh load of CentOS 7, I find that the GPU driver installer has placed it in `/usr/lib` (the 32-bit location) and `/usr/lib64` (the 64-bit location).
- In `/usr/local/cuda/lib64/stubs`. This is one that should only be used for linking purposes and should never be discovered by the runtime loader.

I can think of two options:

1. Use a utility like `sudo find / -name libcuda.so` to locate every single instance of that file on your machine. Remove any that don’t fit the description above.
2. Remove all aspects of CUDA and GPU driver from your machine, and do a complete reload.

If the machine is a horrible mess, option 2 might really only be achievable by doing a disk wipe and OS reload, first. If option 1 doesn’t seem to work for some reason, then the only suggestion I have left is option 2.
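
If you would rather do the search in option 1 from Python, here is a rough sketch. The function names `find_libcuda` and `is_stub` are my own, not part of any tool. Unlike the `find` command above, it also matches versioned names such as `libcuda.so.1`, which is an assumption on my part about what you would want to see. It only lists files and removes nothing.

```python
import os

def find_libcuda(root="/", prefix="libcuda.so"):
    """Return every path under `root` whose file name starts with `prefix`."""
    hits = []
    # onerror is a no-op so unreadable directories are skipped instead of raising
    for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
        for name in filenames:
            if name.startswith(prefix):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)

def is_stub(path):
    """True if the file sits directly inside a directory named 'stubs'."""
    return os.path.basename(os.path.dirname(path)) == "stubs"

# Example use (run as root if you want every directory to be readable):
#   for path in find_libcuda("/"):
#       print("stub" if is_stub(path) else "non-stub", path)
```

Anything reported as non-stub still has to be compared by hand against where your driver installer puts the library, since that location depends on the OS.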
And by all means, make sure that at no point does your `LD_LIBRARY_PATH` env var include the path `/usr/local/cuda/lib64/stubs`. And by all means, don’t copy the stub version of `libcuda.so` anywhere. You shouldn’t ever copy or symlink to `libcuda.so` under any circumstances.
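
A quick way to check this from inside the same Python process that fails is sketched below. The helper name `stub_entries` is made up for illustration, and flagging any entry whose last component is `stubs` is my own heuristic, not an official rule.

```python
import os

def stub_entries(ld_library_path):
    """Return the entries of a colon-separated path list that end in a 'stubs' directory."""
    entries = [e for e in ld_library_path.split(":") if e]
    # normpath drops a trailing slash so ".../stubs/" is also caught
    return [e for e in entries if os.path.basename(os.path.normpath(e)) == "stubs"]

current = os.environ.get("LD_LIBRARY_PATH", "")
print("LD_LIBRARY_PATH =", current or "(not set)")
for entry in stub_entries(current):
    print("should not be on the path:", entry)
```

Run it in the activated conda environment, since that is the environment whose value matters.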
Also note that it generally should not be necessary to have the GPU driver install location on your `LD_LIBRARY_PATH` variable. The runtime loader is usually already configured (e.g. by ldconfig or similar) to look in the location that the GPU driver installer places it.
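
If you want to see which copy the runtime loader actually picks up, something like the following may help. It is a Linux-only sketch: it assumes the driver library is requested under the soname `libcuda.so.1` and that `/proc/self/maps` is readable. The function names are hypothetical.

```python
import ctypes

def mapped_paths(maps_text, needle="libcuda.so"):
    """Extract the unique file paths containing `needle` from /proc/<pid>/maps text."""
    paths = []
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, pathname (pathname may be absent)
        fields = line.split(None, 5)
        if len(fields) == 6 and needle in fields[5] and fields[5] not in paths:
            paths.append(fields[5])
    return paths

def resolved_libcuda(soname="libcuda.so.1"):
    """Load the library by soname and report which file the loader mapped."""
    ctypes.CDLL(soname)  # raises OSError if the loader cannot find the library
    with open("/proc/self/maps") as maps:
        return mapped_paths(maps.read())

# Example use on a machine with the GPU driver installed:
#   print(resolved_libcuda())
```

If the path it prints is under a `stubs` directory, or anywhere other than the driver install location, that is the copy to chase down.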
Finally, I note that you have installed PyTorch via Anaconda. If Anaconda has done something unexpected, or something I am unfamiliar with, in your conda environment, then you might still run into trouble here. I don’t think this should be the case. When running things from a python/conda environment, a conclusive read of the `LD_LIBRARY_PATH` variable can only be ascertained using the method I already gave, which you don’t seem to have done. You don’t seem to have given a directed response to my last posting.