CUDA libraries not found for CUDA 11 on Ubuntu 20.04

I am using an NVIDIA GPU VM on Azure with Ubuntu 20.04.
The NVIDIA driver and CUDA were already installed, but when I run my program it still reports that libraries are not found.

nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000001:00:00.0 Off |                  Off |
| N/A   32C    P0    25W /  70W |      0MiB / 16127MiB |      6%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

This is the error I see for multiple cuda libraries:

Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
2022-05-02 05:33:53.131224: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
2022-05-02 05:33:53.131235: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory
2022-05-02 05:33:55.738515: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1592] Cannot dlopen some GPU libraries. 

I am not sure whether this error is caused by the GPG key rotation or by something else. I also tried to install the driver separately, but it kept failing with an error about not locating drivers.

I also tried:

sudo apt-get -y install cuda

Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
cuda : Depends: cuda-11-6 (>= 11.6.2) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

I am using tensorflow-gpu-2.1.3, as that's a requirement for my program.

Please help.
@kmittman

Hi @Eyshika
I hope you do not mind running a few debugging steps. Please try the following to help determine the root cause of the dependency issue.

recurse_apt() {
    # Collect apt-get's "Depends:" lines and the unmet package names.
    getDeps() {
        output=$(timeout 5 sudo apt-get install "$1" 2>&1 | grep "Depends:")
        broken=$(echo "$output" | sed -e 's|but it is not.*||' -e 's|(.*||' | awk '{print $NF}')
    }
    getDeps "$1"
    # Follow the chain until no unmet dependency is reported.
    while [[ -n $broken ]]; do
        for dep in $broken; do
            echo ":: $dep"
            prev="$output"
            getDeps "$dep"
        done
    done
    echo "$prev"
}

$ recurse_apt cuda

Regarding the dynamic library errors:
libnvinfer is part of TensorRT, so you need to make sure to use TensorRT packages that match the installed CUDA toolkit version (i.e. +cuda11.4). Likewise for cuDNN: you need to install the version that matches the toolkit version.
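
To see which of the libraries named in the log are visible to the dynamic loader at all, something along these lines may help. This is only a sketch: `missing_libs` and `check_lib` are hypothetical helper names, and `tf.log` is an assumed file holding the TensorFlow warnings shown above.

```shell
# Hypothetical helper: read a TensorFlow log on stdin and print each
# library name from "Could not load dynamic library '...'" once.
missing_libs() {
    grep -o "Could not load dynamic library '[^']*'" \
        | sed -e "s/^[^']*'//" -e "s/'\$//" \
        | sort -u
}

# Hypothetical helper: report whether the ldconfig cache lists a library.
check_lib() {
    if ldconfig -p 2>/dev/null | grep -q -F "$1"; then
        echo "found:   $1"
    else
        echo "missing: $1"
    fi
}

# Usage (tf.log is an assumed file containing the warnings above):
#   missing_libs < tf.log | while read -r lib; do check_lib "$lib"; done
```

A library reported as missing here can still be picked up through LD_LIBRARY_PATH, so this only checks the system-wide loader cache.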

Can you provide the instructions you used to install this software?

Hello @kmittman, I’m having the same issue as @Eyshika. The recurse_apt cuda command isn’t found.

>> cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.5 LTS"

and

>> cat /etc/nv_tegra_release
# R34 (release), REVISION: 1.1, GCID: 30414990, BOARD: t186ref, EABI: aarch64, DATE: Tue May 17 04:20:55 UTC 2022

Can you please advise?

Hi @aslovacek
I defined the recurse_apt shell function in my previous comment. It is only there to help debug exactly where the dependency breakage is.
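
For context, a shell function only exists in the session where it was defined, so the definition has to be pasted into the terminal (or sourced from a file) before it can be called. A minimal sketch of that behaviour, using `hello_fn` as a made-up stand-in for recurse_apt:

```shell
# hello_fn is a hypothetical stand-in for recurse_apt.
# Before the definition is pasted, the shell cannot find it.
command -v hello_fn >/dev/null 2>&1 || echo "hello_fn: not defined yet"

# Pasting the definition makes it available in this session only.
hello_fn() { echo "called with: $1"; }

# Now the lookup succeeds and the function can be called.
command -v hello_fn >/dev/null 2>&1 && hello_fn cuda
```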

Otherwise, you can just follow the dependency chain manually, i.e.

  1. sudo apt-get install cuda

    cuda : Depends: cuda-11-6 (>= 11.6.2) but it is not going to be installed

  2. sudo apt-get install cuda-11-6

    cuda-toolkit-11-6 : Depends: […] but it is not going to be installed

until the error message is different.
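
Each step above amounts to reading the package name that follows "Depends:" and trying to install that package next. A rough sketch of the extraction, assuming apt-get keeps the message format shown in this thread; `next_deps` is a hypothetical helper name:

```shell
# Hypothetical helper: read apt-get output on stdin and print the
# package name of every unmet "Depends:" entry, one per line.
next_deps() {
    grep "Depends:" \
        | sed -e 's|.*Depends: *||' -e 's|[ (].*||'
}

# Example with the message from step 1; prints: cuda-11-6
echo "cuda : Depends: cuda-11-6 (>= 11.6.2) but it is not going to be installed" | next_deps

# Possible usage on a real system (-s only simulates the install):
#   apt-get -s install cuda 2>&1 | next_deps
```

Repeating this with each printed package name walks the same chain as the manual steps, stopping when apt-get reports something other than an unmet dependency.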