Cannot get MATLAB or Python libraries (TensorFlow, TensorRT, or PyTorch) to recognize GPU

I have a GPU driver installed, and running “nvidia-smi” gives output that suggests it is working:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   38C    P8    11W / 320W |    445MiB / 16376MiB |      9%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1363      G   /usr/lib/xorg/Xorg                252MiB |
|    0   N/A  N/A      1565      G   /usr/bin/gnome-shell               46MiB |
|    0   N/A  N/A    288347      G   ...962700789120202471,131072      113MiB |
|    0   N/A  N/A    302096      G   ...Matlab/bin/glnxa64/MATLAB        3MiB |
|    0   N/A  N/A    302398      G   ...975D29E5EB0A20ACBBF6FD973       12MiB |
+-----------------------------------------------------------------------------+

If I run “lspci -vnn | grep VGA -A 12”, I get:

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation AD103 [GeForce RTX 4080] [10de:2704] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: NVIDIA Corporation Device [10de:167a]
	Flags: bus master, fast devsel, latency 0, IRQ 149
	Memory at a1000000 (32-bit, non-prefetchable) [size=16M]
	Memory at b0000000 (64-bit, prefetchable) [size=256M]
	Memory at c0000000 (64-bit, prefetchable) [size=32M]
	I/O ports at 3000 [size=128]
	Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: nvidia
	Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:22bb] (rev a1)

Yet I am unable to get MATLAB or Python libraries such as TensorFlow, TensorRT, or PyTorch to recognize my GPU. I have even tried Docker images that are supposed to access the GPU; the containers run… but they do not use my GPU.

Here are the errors I get from MATLAB when I attempt to use functions from its Parallel Computing Toolbox:

Error using gpuDevice (line 26)
No supported GPU device was found on this computer. To learn more about supported GPU devices, see
www.mathworks.com/gpudevice.

Here are some of the errors I get from Python when I attempt to use my GPU with TensorFlow, TensorRT, and PyTorch:

TensorFlow errors:

>>> import tensorflow as tf
2023-04-19 15:45:34.614095: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-04-19 15:45:34.614113: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-04-19 15:45:35.291067: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-04-19 15:45:35.291207: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-04-19 15:45:35.291214: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly
>>> tf.config.list_physical_devices('GPU')
2023-04-19 15:51:31.057174: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:267] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
2023-04-19 15:51:31.057202: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: drew-desktop
2023-04-19 15:51:31.057208: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: drew-desktop
2023-04-19 15:51:31.057263: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 525.105.17
2023-04-19 15:51:31.057281: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 525.105.17
2023-04-19 15:51:31.057287: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 525.105.17
[]

PyTorch errors:

>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/drew/.local/lib/python3.10/site-packages/torch/__init__.py", line 229, in <module>
    from torch._C import *  # noqa: F403
ImportError: libcudnn.so.8: cannot open shared object file: No such file or directory

TensorRT errors:

>>> import tensorrt
>>> print(tensorrt.__version__)
8.6.0
>>> assert tensorrt.Builder(tensorrt.Logger())
[04/19/2023-16:02:17] [TRT] [W] Unable to determine GPU memory usage
[04/19/2023-16:02:17] [TRT] [W] Unable to determine GPU memory usage
[04/19/2023-16:02:17] [TRT] [W] CUDA initialization failure with error: 999. Please check your CUDA installation:  http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: pybind11::init(): factory function returned nullptr

Note: I have already tried a lot of things. That’s not to say I’m unwilling to try more, because I really need to get this working. To quickly recap the more obvious steps: rebooting; adding export/LD_LIBRARY_PATH entries for the files and directories mentioned in the error and warning messages; reinstalling and rebooting; Docker; conda virtual environments; and a few different TensorFlow and PyTorch packages that are known to work with GPU support. I am running Ubuntu 22.04.2 LTS.
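For anyone hitting the same symptom: the failed cuInit call that TensorFlow logs above can be reproduced with no framework involved by calling the driver API directly through ctypes. This is a minimal sketch under the assumption of a Linux system, where libcuda.so.1 is installed by the driver itself; if this bare call also fails, the problem sits below the Python stack entirely.

```python
import ctypes

# Reproduce TensorFlow's cuInit failure with no frameworks involved.
# libcuda.so.1 ships with the NVIDIA driver (not the CUDA toolkit), so this
# isolates the driver/device layer from any toolkit or pip-package issues.
try:
    cuda = ctypes.CDLL("libcuda.so.1")
except OSError as exc:
    print("driver library not loadable:", exc)
else:
    # CUresult cuInit(unsigned int Flags); a return of 0 is CUDA_SUCCESS.
    # 999 is CUDA_ERROR_UNKNOWN, the same code TensorRT reports above.
    result = cuda.cuInit(0)
    print("cuInit returned", result, "(0 means success)")
```

If cuInit fails here too, driver-level fixes (e.g. making sure the nvidia_uvm kernel module is loaded, or reinstalling the driver) are worth checking before touching any Python packages.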

The error messages say you have neither CUDA nor cuDNN installed. You also need to install the exact versions your TensorFlow package was built against.
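One quick way to confirm this is to ask the dynamic loader for the exact sonames the errors mention. A small sketch, assuming the versioned sonames from the error messages in this thread (CUDA 11.0 runtime, cuDNN 8, TensorRT 7); substitute whatever your own framework build complains about:

```python
import ctypes

# These sonames are taken from the error messages above; adjust as needed.
# A library that fails to load here is either not installed or not on the
# loader's search path (ldconfig cache / LD_LIBRARY_PATH).
for soname in ["libcudart.so.11.0", "libcudnn.so.8", "libnvinfer.so.7"]:
    try:
        ctypes.CDLL(soname)
        print(soname, "-> loadable")
    except OSError:
        print(soname, "-> NOT loadable")
```

If a soname is missing, installing the matching toolkit/cuDNN version (not just any version) is what resolves the import errors, since the wheels link against those exact sonames.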

Thanks! After taking your advice, I referenced this compatibility table and followed this blog post verbatim. I had been struggling for over a week to get my coding platforms to recognize and use my GPU. Problem solved!
