CUDA-11.8 : bandwidthTest.cu:686 code=46(cudaErrorDevicesUnavailable) "cudaEventCreate(&start)"

Hi,
After installing CUDA Toolkit 11.8 (driver 520.61.05) on my platform with an A100 GPU, deviceQuery and nvidia-smi run properly, but as soon as I try to run the cuda-samples examples or TensorFlow, this error occurs:

./bin/x86_64/linux/release/bandwidthTest 
[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: NVIDIA A100 80GB PCIe
 Quick Mode

CUDA error at bandwidthTest.cu:686 code=46(cudaErrorDevicesUnavailable) "cudaEventCreate(&start)" 

uname -a:
Linux r750xa 5.4.0-144-generic #161~18.04.1-Ubuntu SMP Fri Feb 10 15:55:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

nvidia-smi:

nvidia-smi 
Wed Mar 29 10:32:20 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100 80G...  On   | 00000000:65:00.0 Off |                    0 |
| N/A   33C    P0    43W / 300W |      0MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

deviceQuery:

./bin/x86_64/linux/release/deviceQuery 
./bin/x86_64/linux/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA A100 80GB PCIe"
  CUDA Driver Version / Runtime Version          11.8 / 11.8
  CUDA Capability Major/Minor version number:    8.0
  Total amount of global memory:                 81100 MBytes (85039775744 bytes)
  (108) Multiprocessors, (064) CUDA Cores/MP:    6912 CUDA Cores
  GPU Max Clock rate:                            1410 MHz (1.41 GHz)
  Memory Clock rate:                             1512 Mhz
  Memory Bus Width:                              5120-bit
  L2 Cache Size:                                 41943040 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        167936 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Enabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 101 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.8, CUDA Runtime Version = 11.8, NumDevs = 1
Result = PASS

With tensorflow-gpu==2.6.2, the GPU is not found:

Python 3.6.9 (default, Mar 10 2023, 16:46:00) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.config.list_physical_devices('GPU')
2023-03-29 10:34:11.839810: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.8/lib64
2023-03-29 10:34:11.839837: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1835] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]
>>> tf.__version__
'2.6.2'
>>> print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
Num GPUs Available:  0

Any ideas are welcome.

I know nothing about TensorFlow, but when I follow the link in the error message, the second sentence refers to requirements for older TF versions (you are running 2.6.2).

There it states that you need CUDA 11.2 and cuDNN 8.1. You don't state which version of cuDNN you installed.
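
A quick way to see whether the library named in the warning can be loaded at all is to try to dlopen it yourself. This is only a minimal sketch: the helper name can_load is made up for illustration, and the only library name it checks is the one taken from the log above. It says nothing about whether the installed version matches what TF expects.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the library is missing or cannot be loaded.
        return False
    return True


if __name__ == "__main__":
    # "libcudnn.so.8" is the name reported in the TensorFlow warning.
    name = "libcudnn.so.8"
    print(name, "is loadable" if can_load(name) else "is NOT loadable")
```

If this reports that the library is not loadable, TensorFlow will skip registering the GPU regardless of whether the CUDA runtime itself works.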

Hi,

I don't think the TensorFlow issue is about cuDNN (which is not installed here); I think it is about the availability of the GPU. I can fetch information about the board and its environment, but as soon as I try to actually use the GPU, calls such as cudaEventCreate(&start) fail (cuda-samples/Samples/1_utilities/BandwidthTest and other tests).
As for TensorFlow, because I'm constrained to Ubuntu 18.04 and Python 3.6, the highest version I can use is 2.6.2. I thought CUDA 11.8 would work :(. If CUDA 11.2 is the right version, I will try it.

If you have another idea, do not hesitate.
Thank you.

Issue fixed on Ubuntu 18.04: enable the IOMMU (remove "intel_iommu=off") and remove the "nopat" flag in the GRUB file:

  1. cat /etc/default/grub
  2. Check that intel_iommu=off and nopat have been removed from the GRUB_CMDLINE_LINUX_DEFAULT value
  3. Regenerate the GRUB file if needed (on Ubuntu, sudo update-grub)
  4. Reboot the host
  5. Then check the kernel command line: cat /proc/cmdline
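
For anyone who wants to script the last check, here is a small sketch that looks for the two flags in the running kernel's command line. The function name find_suspect_flags and the SUSPECT_FLAGS tuple are hypothetical names chosen for this example, and the flag spellings are assumed to be exactly the ones given in the steps above.

```python
from pathlib import Path

# Flags that the fix above removes (assumed spellings, taken from this thread).
SUSPECT_FLAGS = ("intel_iommu=off", "nopat")


def find_suspect_flags(cmdline):
    """Return the suspect flags that appear as whole tokens in a kernel command line."""
    tokens = cmdline.split()
    return [flag for flag in SUSPECT_FLAGS if flag in tokens]


if __name__ == "__main__":
    path = Path("/proc/cmdline")
    if path.exists():
        found = find_suspect_flags(path.read_text())
        if found:
            print("Still present on the kernel command line:", ", ".join(found))
        else:
            print("Neither flag is present on the kernel command line.")
    else:
        print("/proc/cmdline not found (not a Linux host?)")
```

Run it after the reboot. If either flag is still listed, the GRUB configuration was probably not regenerated, or the flag is set somewhere other than GRUB_CMDLINE_LINUX_DEFAULT.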
