libcublas.so.11 file missing after successful cuDNN install, even though libcublas.so.12 is present

Hi,

I’m trying to install cuDNN on my Ubuntu 22.10 machine. I’ve followed the instructions here on how to install it using the package manager.

The installation worked fine, and I was able to compile the mnistCUDNN example as in step 1.4.

But unfortunately, when I run the compiled mnistCUDNN binary, I get the following error:

Could not load library libcudnn_ops_infer.so.8. Error: libcublas.so.11: cannot open shared object file: No such file or directory

My LD_LIBRARY_PATH environment variable is set to the following path: /usr/local/cuda-12.0/lib64

When I check that directory, there is a file called libcudnn_ops_infer.so.8, as well
as the following versions of libcublas.so:

libcublas.so
libcublas.so.12
libcublas.so.12.0.2.224

So there is a version 12 file, but no libcublas.so.11 as named in the error message.
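One way to narrow this down (just a diagnostic sketch; the paths are the ones from my setup above, so adjust them if yours differ) is to ask ldd which dependencies of libcudnn_ops_infer.so.8 actually resolve:

```shell
# Path from the setup described above; adjust if yours differs.
LIB=/usr/local/cuda-12.0/lib64/libcudnn_ops_infer.so.8
if [ -e "$LIB" ]; then
    # Show which shared objects this library requests and whether they
    # resolve; an unresolved dependency is reported as "not found".
    ldd "$LIB" | grep -i cublas
else
    echo "$LIB not present on this machine"
fi
```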

Also running sudo apt list | grep libcudnn8 gives me the following output:

libcudnn8-dev/unknown,now 8.8.1.3-1+cuda12.0 amd64  [installed]
libcudnn8-samples/unknown,now 8.7.0.84-1+cuda11.8 amd64  [installed]
libcudnn8/unknown,now 8.8.1.3-1+cuda12.0 amd64  [installed]

Now I’m wondering why that file is missing or why it can’t work with the existing libcublas.so.12.

nvidia-smi works fine too and tells me that I have CUDA version 12 running.
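For completeness, you can also ask the dynamic loader cache which libcublas versions it knows about, independent of LD_LIBRARY_PATH (a generic check, not specific to cuDNN):

```shell
# List every libcublas the loader cache has registered; prints only the
# fallback message if none is found.
ldconfig -p 2>/dev/null | grep libcublas || echo "no libcublas in the loader cache"
```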

I don’t know why, but I managed to fix this by copying the contents of the cuDNN tar file, as described here in section 1.3.1, on top of the package manager installation. After doing this, everything works.

$ sudo cp cudnn-*-archive/include/cudnn*.h /usr/local/cuda-12.0/include 
$ sudo cp -P cudnn-*-archive/lib/libcudnn* /usr/local/cuda-12.0/lib64 
$ sudo chmod a+r /usr/local/cuda-12.0/include/cudnn*.h /usr/local/cuda-12.0/lib64/libcudnn*

Make sure to check whether the directory of your CUDA installation differs.
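A quick way to check that (just a small sketch; it lists whatever toolkit directories exist under /usr/local on your machine):

```shell
# List installed CUDA toolkit directories; /usr/local/cuda is often a
# symlink to the default version.
ls -d /usr/local/cuda* 2>/dev/null || echo "no CUDA directory under /usr/local"
```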

The tar file is available for download here.

On a side note: even after copying, there’s no libcublas.so.11 file, so maybe the build distributed with apt has a bug?

libcublas.so seems to be part of the CUDA toolkit, not cuDNN. If you go to the CUDA toolkit downloads page, there’s a link ‘Tarball and Zip Archive Deliverables’ at the bottom of the page:

Upon opening that link, there’ll be an entry for libcublas, which contains tar files for different architectures and CUDA versions.

From there you can download the appropriate version (I suspect you’ll want 11.x, maybe the latest one, 11.11?). You’ll find the libcublas.so.11 files in the lib/ folder of the extracted archive.

I’m facing a similar issue, where I have CUDA 11.7 installed on Ubuntu, but the software I’m working with wants libcublas.so.12; I downloaded libcublas for CUDA 12.1 and moved the .so.12 files to /usr/local/cuda/lib64:

sudo cp <libcublas-extracted-archive-folder>/lib/libcublas.so.12* /usr/local/cuda/lib64/
sudo cp <libcublas-extracted-archive-folder>/lib/libcublasLt.so.12* /usr/local/cuda/lib64/

So far, the software I’m running isn’t throwing any complaints about me not actually having CUDA 12.1 installed, or about not having copied over the contents of the include/ folder.
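As a quick sanity check after copying (paths are the ones from my machine; adjust as needed), you can confirm the .so.12 sonames are actually in place:

```shell
# Directory used in the copy commands above; adjust to your install.
DIR=/usr/local/cuda/lib64
ls -l "$DIR"/libcublas*.so.12* 2>/dev/null || echo "no libcublas .so.12 files in $DIR"
```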

Yes, libcublas is part of the CUDA toolkit.

Copying those files around should never be needed, nor should it be part of an expected machine setup process. This is handled via path settings.

I would generally not suggest installing just cublas either, but rather installing a complete CUDA toolkit (so, in this case, install both CUDA 11.7 and CUDA 12.1). Those can easily live side-by-side on a development system. Any sensible software stack should not need to refer to both at the same time, but of course if you decide that is what you want, it should be handled with appropriate path settings, not by copying CUDA 12 files into a CUDA 11 directory.
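To make the path settings concrete, here is one minimal per-shell sketch (the directory names are assumptions based on default toolkit install locations; adjust them to your machine):

```shell
# Point one shell session at a specific toolkit instead of copying files.
export CUDA_HOME=/usr/local/cuda-12.1   # or /usr/local/cuda-11.7
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "using CUDA from $CUDA_HOME"
```

Switching toolkit versions is then just a matter of re-exporting these variables in a fresh shell, rather than touching the install directories.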

cuDNN is a separate install, and as already indicated, if you want to put the cuDNN files into your CUDA install directory structure, that is acceptable.

You are correct. I was operating under the (very wrong) assumption that PyTorch 1.x only supports CUDA 11.7 and 11.8, but was requesting shared objects from CUDA 12, whereas in reality PyTorch 2.0 is backwards compatible with 1.x and also supports CUDA 12. Sure enough, I switched to CUDA 12 and have no issues now.