Conda env: compatible NVIDIA driver, CUDA, and cuDNN versions

Hi, I was wondering what role a conda environment plays with respect to CUDA and cuDNN.

Let me describe my setup:
I'm on Ubuntu 20.04 and have installed the latest NVIDIA driver (535). I then used the CUDA toolkit runfile to install CUDA 11.2.

$ nvidia-smi
Sun Jul 2 13:21:13 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce GTX 1080 Off | 00000000:01:00.0 Off | N/A |
| N/A 34C P8 4W / 200W | 320MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce GTX 1080 Off | 00000000:02:00.0 Off | N/A |
| N/A 38C P8 4W / 200W | 8MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1059 G /usr/lib/xorg/Xorg 84MiB |
| 0 N/A N/A 1395 G /usr/bin/gnome-shell 55MiB |
| 0 N/A N/A 4226 C …/sl/miniconda3/envs/pose/bin/python 176MiB |
| 1 N/A N/A 1059 G /usr/lib/xorg/Xorg 4MiB |
+---------------------------------------------------------------------------------------+

So the NVIDIA driver is installed and working. I can also confirm the CUDA toolkit via:

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0

This shows that version 11.2 was installed successfully.
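
The two outputs above report different things: as far as I understand, the "CUDA Version" printed by nvidia-smi is the highest CUDA version the driver supports, while nvcc reports the installed toolkit. A minimal sketch comparing them, using lines copied from the outputs above (the helper name `parse_version` is my own):

```python
import re

def parse_version(text, pattern):
    """Return (major, minor) from the first match of pattern, or None."""
    match = re.search(pattern, text)
    return (int(match.group(1)), int(match.group(2))) if match else None

# Sample lines copied from the nvidia-smi and nvcc output in this post.
smi_line = "| NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |"
nvcc_line = "Cuda compilation tools, release 11.2, V11.2.152"

driver_max_cuda = parse_version(smi_line, r"CUDA Version:\s*(\d+)\.(\d+)")
toolkit_cuda = parse_version(nvcc_line, r"release (\d+)\.(\d+)")

# Tuples compare element by element, so (11, 2) <= (12, 2) holds.
toolkit_supported = toolkit_cuda <= driver_max_cuda
print(driver_max_cuda, toolkit_cuda, toolkit_supported)
```

Under that assumption, a toolkit version no newer than the driver's reported version should be usable with this driver.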
After that I installed cuDNN by downloading the tar file and extracting it into /usr/local, where CUDA is installed (/usr/local/cuda-11.2), so that the cudnn*.h files and the libcudnn libraries were copied into /usr/local/cuda/{include,lib64}.

I can't confirm the installation with
$ dpkg -l libcudnn*
and
$ cat /usr/include/cudnn_version.h | grep CUDNN_MAJOR -A 2

but I can confirm it with
$ cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 0

#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#endif /* CUDNN_VERSION_H */
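
The same header check can be done from Python. This is only a sketch: the function name `cudnn_version_from_header` is hypothetical, and the sample text simply mirrors the three fields printed above.

```python
import re

def cudnn_version_from_header(header_text):
    """Extract (major, minor, patch) from the text of cudnn_version.h."""
    fields = {}
    for name in ("MAJOR", "MINOR", "PATCHLEVEL"):
        match = re.search(rf"define\s+CUDNN_{name}\s+(\d+)", header_text)
        if match is None:
            raise ValueError(f"CUDNN_{name} not found")
        fields[name] = int(match.group(1))
    return fields["MAJOR"], fields["MINOR"], fields["PATCHLEVEL"]

# Sample text mirroring the grep output in this post.
sample = """
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 0
"""
major, minor, patch = cudnn_version_from_header(sample)
# Same formula as the CUDNN_VERSION macro shown in the header.
version_code = major * 1000 + minor * 100 + patch
print(major, minor, patch, version_code)
```

To run it against the real file, the text could be read with `open("/usr/local/cuda/include/cudnn_version.h").read()`. Note that this only shows which header is installed, not which library is loaded at run time.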

I also added:
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda/bin:$PATH
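
A small sketch to check that the two directories really ended up in the variables. The helper names and the extra example entries are hypothetical; only the two /usr/local/cuda paths come from the export lines above.

```python
import os

def dirs_on_path(value):
    """Split a colon-separated PATH-style string into non-empty directories."""
    return [d for d in value.split(":") if d]

def is_listed(directory, value):
    """True if directory appears in the PATH-style string (trailing slashes ignored)."""
    target = os.path.normpath(directory)
    return any(os.path.normpath(d) == target for d in dirs_on_path(value))

# Values mimicking the export lines, with hypothetical pre-existing entries.
ld_library_path = "/usr/local/cuda/lib64:/opt/example/lib"
path = "/usr/local/cuda/bin:/usr/bin:/bin"

print(is_listed("/usr/local/cuda/lib64", ld_library_path))
print(is_listed("/usr/local/cuda/bin/", path))
print(is_listed("/usr/local/cuda-11.2/lib64", ld_library_path))
```

In a live shell the real values would come from `os.environ.get("LD_LIBRARY_PATH", "")` and `os.environ.get("PATH", "")`. The comparison is textual, so a symlinked directory such as /usr/local/cuda-11.2 is not treated as equal to /usr/local/cuda.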

After that I created a conda environment using conda create -n test python=3.10

The environment was created, but I have no way to test the actual cuDNN version, because /usr/src/samples is not available.

(Is there any way to test cuDNN?)
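
One lightweight check that does not need the samples is to load the library and call `cudnnGetVersion()`, which cuDNN exports and which returns a `size_t`. A minimal sketch, assuming the library is reachable as `libcudnn.so.8` (the helper names are my own, and the decoding assumes the cuDNN 8.x formula from cudnn_version.h):

```python
import ctypes

def query_cudnn_version(lib_name="libcudnn.so.8"):
    """Return the integer from cudnnGetVersion(), or None if the library cannot be loaded."""
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError:
        return None
    lib.cudnnGetVersion.restype = ctypes.c_size_t
    lib.cudnnGetVersion.argtypes = []
    return int(lib.cudnnGetVersion())

def decode_cudnn8_version(code):
    """Split a cuDNN 8.x version code (major*1000 + minor*100 + patch) into parts."""
    return code // 1000, (code % 1000) // 100, code % 100

version_code = query_cudnn_version()
if version_code is None:
    print("libcudnn.so.8 could not be loaded")
else:
    print("cuDNN runtime version:", decode_cudnn8_version(version_code))
```

This only proves that the dynamic loader finds a cuDNN library and which version it is. It does not run any GPU computation.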

Then I searched the net and found some code that can be used to test cuDNN, as below:

CODE:
"""""""""""""""""""""
import numpy as np
import ctypes

# Load CuDNN library using ctypes
cudnn_lib = ctypes.cdll.LoadLibrary('libcudnn.so.8')

# Set CuDNN data types
cudnnDataType_t = ctypes.c_void_p
cudnnTensorDescriptor_t = ctypes.c_void_p
cudnnConvolutionDescriptor_t = ctypes.c_void_p
cudnnFilterDescriptor_t = ctypes.c_void_p
cudnnConvolutionFwdAlgo_t = ctypes.c_void_p

# Define CuDNN data types
cudnnDataType = {
    'float32': 0,
    'float64': 1,
    'float16': 2,
    'int8': 3,
    'int32': 4,
    'int8x4': 5,
    'uint8': 6,
    'uint8x4': 7
}

# Define CuDNN convolution forward algorithms
cudnnConvolutionFwdAlgo = {
    'implicit_gemm': 0,
    'implicit_precomp_gemm': 1,
    'gemm': 2,
    'direct': 3,
    'fft': 4,
    'fft_tiling': 5,
    'winograd': 6,
    'winograd_nonfused': 7,
    'count': 8
}

# Define input and filter dimensions
input_dim = (1, 3, 5, 5)
filter_dim = (2, 3, 3, 3)

# Initialize input and filter arrays
input_data = np.random.randn(*input_dim).astype(np.float32)
filter_data = np.random.randn(*filter_dim).astype(np.float32)

# Initialize CuDNN tensor descriptors
input_desc = cudnnTensorDescriptor_t()
cudnn_lib.cudnnCreateTensorDescriptor(ctypes.byref(input_desc))
cudnn_lib.cudnnSetTensor4dDescriptor(input_desc,                # tensor descriptor
                                     cudnnDataType['float32'],  # data type
                                     input_dim[0],              # batch size
                                     input_dim[1],              # channels
                                     input_dim[2],              # height
                                     input_dim[3])              # width

filter_desc = cudnnFilterDescriptor_t()
cudnn_lib.cudnnCreateFilterDescriptor(ctypes.byref(filter_desc))
cudnn_lib.cudnnSetFilter4dDescriptor(filter_desc,               # filter descriptor
                                     cudnnDataType['float32'],  # data type
                                     filter_dim[0],             # output channels
                                     filter_dim[1],             # input channels
                                     filter_dim[2],             # filter height
                                     filter_dim[3])             # filter width

# Initialize CuDNN convolution descriptor
conv_desc = cudnnConvolutionDescriptor_t()
cudnn_lib.cudnnCreateConvolutionDescriptor(ctypes.byref(conv_desc))
cudnn_lib.cudnnSetConvolution2dDescriptor(conv_desc,                  # convolution descriptor
                                          1,                          # pad height
                                          1,                          # pad width
                                          1,                          # vertical stride
                                          1,                          # horizontal stride
                                          1,                          # dilation height
                                          1,                          # dilation width
                                          1,                          # mode (CuDNN_CONVOLUTION)
                                          cudnnDataType['float32'])   # data type

# Get output dimensions
output_dim = (input_dim[0], filter_dim[0], input_dim[2]-filter_dim[2]+1, input_dim[3]-filter_dim[3]+1)

# Initialize output array
output_data = np.zeros(output_dim, dtype=np.float32)

# Initialize CuDNN tensor descriptor for output array
output_desc = cudnnTensorDescriptor_t()
cudnn_lib.cudnnCreateTensorDescriptor(ctypes.byref(output_desc))
cudnn_lib.cudnnSetTensor4dDescriptor(output_desc,               # tensor descriptor
                                     cudnnDataType['float32'],  # data type
                                     output_dim[0],             # batch size
                                     output_dim[1],             # channels
                                     output_dim[2],             # height
                                     output_dim[3])             # width

# Initialize CuDNN convolution forward algorithm
conv_algo = cudnnConvolutionFwdAlgo_t()
cudnn_lib.cudnnGetConvolutionForwardAlgorithm(ctypes.byref(conv_algo),
                                              input_desc,
                                              filter_desc,
                                              conv_desc,
                                              output_desc,
                                              cudnnConvolutionFwdAlgo['fft'],
                                              0,
                                              0)

# Perform convolution using CuDNN
cudnn_lib.cudnnConvolutionForward(cudnn_lib._handle,                   # CuDNN handle
                                  ctypes.byref(ctypes.c_float(1.0)),   # alpha
                                  input_desc,                          # input tensor descriptor
                                  input_data.ctypes.data,              # input data
                                  filter_desc,                         # filter descriptor
                                  filter_data.ctypes.data,             # filter data
                                  conv_desc,                           # convolution descriptor
                                  conv_algo,                           # convolution algorithm
                                  ctypes.byref(ctypes.c_void_p(0)),    # workspace
                                  0,                                   # workspace size
                                  ctypes.byref(ctypes.c_float(0.0)),   # beta
                                  output_desc,                         # output tensor descriptor
                                  output_data.ctypes.data)             # output data

# Destroy CuDNN descriptors
cudnn_lib.cudnnDestroyTensorDescriptor(input_desc)
cudnn_lib.cudnnDestroyFilterDescriptor(filter_desc)
cudnn_lib.cudnnDestroyConvolutionDescriptor(conv_desc)
cudnn_lib.cudnnDestroyTensorDescriptor(output_desc)

"""""""""""""""""""""

But it produces an error, which I took to mean that cuDNN is not installed:

AttributeError                            Traceback (most recent call last)
Cell In[4], line 97
     95 # Initialize CuDNN convolution forward algorithm
     96 conv_algo = cudnnConvolutionFwdAlgo_t()
---> 97 cudnn_lib.cudnnGetConvolutionForwardAlgorithm(ctypes.byref(conv_algo),
     98                                               input_desc,
     99                                               filter_desc,
    100                                               conv_desc,
    101                                               output_desc,
    102                                               cudnnConvolutionFwdAlgo['fft'],
    103                                               0,
    104                                               0)
    106 # Perform convolution using CuDNN
    107 cudnn_lib.cudnnConvolutionForward(cudnn_lib._handle, # CuDNN handle
    108                                   ctypes.byref(ctypes.c_float(1.0)), # alpha
    109                                   input_desc, # input tensor descriptor
   (…)
    118                                   output_desc, # output tensor descriptor
    119                                   output_data.ctypes.data) # output data

File ~/miniconda3/envs/pose/lib/python3.10/ctypes/__init__.py:387, in CDLL.__getattr__(self, name)
    385 if name.startswith('__') and name.endswith('__'):
    386     raise AttributeError(name)
--> 387 func = self.__getitem__(name)

--> 392 func = self._FuncPtr((name_or_ordinal, self))
    393 if not isinstance(name_or_ordinal, int):
    394     func.__name__ = name_or_ordinal

AttributeError: /usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn.so: undefined symbol: cudnnGetConvolutionForwardAlgorithm
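
Reading the message literally, the library file was found and loaded, and what failed was the lookup of one function name. A minimal sketch for probing whether a loaded library exports a given symbol. The helper `has_symbol` is hypothetical, and `CDLL(None)` (the symbols of the current process on Linux) is used only so the sketch runs without cuDNN:

```python
import ctypes

def has_symbol(lib, name):
    """True if the loaded shared library exports a symbol called name."""
    # ctypes raises AttributeError for a missing symbol; hasattr() turns that into False.
    return hasattr(lib, name)

# CDLL(None) exposes the symbols already loaded into this process (Linux/macOS).
process = ctypes.CDLL(None)
print(has_symbol(process, "printf"))
print(has_symbol(process, "hypothetical_missing_symbol_xyz"))
```

With `lib = ctypes.CDLL("libcudnn.so.8")`, the same helper could be used to compare `cudnnGetConvolutionForwardAlgorithm` with `cudnnGetConvolutionForwardAlgorithm_v7`. As far as I know, the un-suffixed function was removed in cuDNN 8, which would explain the undefined symbol even with cuDNN correctly installed.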

In the end, my goal is to use CUDA 11.2 with cuDNN 8.1.0.77 for a project that requires these specific versions. I would prefer to install them in the conda env instead of the base system.

Can anyone help me to figure this out?

I also have another question, about frameworks.

Since I also want to use PyTorch in my conda env, does the base system need to have the same CUDA version as the one PyTorch uses?

For example, I installed CUDA 11.2, but the latest version of PyTorch installs CUDA 11.8. Do they conflict? Should they be the same? What about the cuDNN version?

I'm not sure how the CUDA and cuDNN versions inside a conda environment relate to the CUDA and cuDNN versions of the base system (i.e. with the conda env deactivated).
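
One way to see which versions a framework inside the env actually uses is to ask the framework itself. A minimal sketch for PyTorch, assuming it is installed in the active env (the function name `torch_cuda_report` is my own; it falls back gracefully if PyTorch is missing):

```python
def torch_cuda_report():
    """Collect the CUDA/cuDNN versions the installed PyTorch build reports."""
    try:
        import torch
    except ImportError:
        return {"torch": None}
    return {
        "torch": torch.__version__,
        "built_with_cuda": torch.version.cuda,      # None for CPU-only builds
        "cudnn": torch.backends.cudnn.version(),    # None if cuDNN is unavailable
        "gpu_available": torch.cuda.is_available(),
    }

report = torch_cuda_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

As far as I understand, the conda and pip builds of PyTorch ship their own CUDA runtime and cuDNN libraries, so the values printed here can differ from what nvcc -V and /usr/local/cuda report on the base system. In that case the driver would be the only shared piece.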

Thanks
Best regards

@AakankshaS @Ming_Q @nadeemm
Can you explain it, please?