If I set up a conda PyTorch environment like this:
conda create -n pytorch-cuda
conda activate pytorch-cuda
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
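As a quick sanity check that the CUDA build of torch actually loads, I run something like:

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"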
That works, at least insofar as I can import torch in Python. If, however, I add cuDNN:
conda install cudnn -c nvidia
Things are no longer warm and fuzzy:
(torch-cuda1) pgoetz@finglas ~$ python --version
Python 3.11.5
(torch-cuda1) pgoetz@finglas ~$ python
Python 3.11.5 (main, Sep 11 2023, 13:54:46) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/lusr/opt/miniconda/envs/torch-cuda1/lib/python3.11/site-packages/torch/__init__.py", line 229, in <module>
from torch._C import * # noqa: F403
^^^^^^^^^^^^^^^^^^^^^^
ImportError: /lusr/opt/miniconda/envs/torch-cuda1/lib/python3.11/site-packages/torch/lib/libc10_cuda.so: undefined symbol: cudaMemPoolSetAttribute, version libcudart.so.11.0
>>>
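One way to confirm where the missing symbol comes from is to check which libcudart the torch extension actually resolves against, and whether that runtime exports the symbol (with the environment active, $CONDA_PREFIX points at its prefix):

# which libcudart does the torch extension resolve against?
ldd $CONDA_PREFIX/lib/python3.11/site-packages/torch/lib/libc10_cuda.so | grep cudart
# does that runtime export the symbol the error complains about? (no output = missing)
nm -D $CONDA_PREFIX/lib/libcudart.so.11.0 | grep cudaMemPoolSetAttribute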
What’s happening is that the cuDNN conda package pulls in an older CUDA 11.1 runtime and points the libcudart.so.11.0 symlink at it, so torch ends up resolving its runtime symbols against a library that (as far as I can tell) predates the memory-pool APIs such as cudaMemPoolSetAttribute, which were added in CUDA 11.2. Here is what is in /miniconda/envs/pytorch-cuda/lib before cuDNN is installed:
# ls -l libcudart*
-rwxr-xr-x 3 root root 695712 Sep 21 2022 libcudart.so.11.8.89
Here is what it looks like after the cudnn package is installed from the nvidia channel:
# ls -l libcudart*
lrwxrwxrwx 1 root root 20 Sep 25 13:12 libcudart.so -> libcudart.so.11.1.74
lrwxrwxrwx 1 root root 20 Sep 25 13:12 libcudart.so.11.0 -> libcudart.so.11.1.74
-rwxr-xr-x 2 root root 554032 Oct 14 2020 libcudart.so.11.1.74
-rwxr-xr-x 3 root root 695712 Sep 21 2022 libcudart.so.11.8.89
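Incidentally, conda's revision history makes it easy to see exactly what the cudnn install changed, and to undo it (the <N> below is whatever revision number predates the cudnn install):

conda list --revisions
conda install --revision <N>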
It looks like something similar is happening with libcusparse.so.11, and possibly other libraries; I didn’t try to track them all down.
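A quick way to look for other libraries in the same state is to scan the environment for multiple versions of the same CUDA library:

ls -l $CONDA_PREFIX/lib/libcu*.so.*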
It looks like the cuDNN packages in the nvidia conda channel are extremely out of date:
(torch-cuda1) pgoetz@finglas ~$ conda search -c nvidia cudnn
Loading channels: done
# Name     Version       Build         Channel
cudnn      7.0.5         cuda8.0_0     pkgs/main
cudnn      7.1.2         cuda9.0_0     pkgs/main
cudnn      7.1.3         cuda8.0_0     pkgs/main
cudnn      7.2.1         cuda9.2_0     pkgs/main
cudnn      7.3.1         cuda10.0_0    pkgs/main
cudnn      7.3.1         cuda9.0_0     pkgs/main
cudnn      7.3.1         cuda9.2_0     pkgs/main
cudnn      7.6.0         cuda10.0_0    nvidia
cudnn      7.6.0         cuda10.0_0    pkgs/main
cudnn      7.6.0         cuda10.1_0    nvidia
cudnn      7.6.0         cuda10.1_0    pkgs/main
cudnn      7.6.0         cuda9.0_0     pkgs/main
cudnn      7.6.0         cuda9.2_0     nvidia
cudnn      7.6.0         cuda9.2_0     pkgs/main
cudnn      7.6.4         cuda10.0_0    pkgs/main
cudnn      7.6.4         cuda10.1_0    pkgs/main
cudnn      7.6.4         cuda9.0_0     pkgs/main
cudnn      7.6.4         cuda9.2_0     pkgs/main
cudnn      7.6.5         cuda10.0_0    pkgs/main
cudnn      7.6.5         cuda10.1_0    pkgs/main
cudnn      7.6.5         cuda10.2_0    pkgs/main
cudnn      7.6.5         cuda9.0_0     pkgs/main
cudnn      7.6.5         cuda9.2_0     pkgs/main
cudnn      8.0.0         cuda10.2_0    nvidia
cudnn      8.0.0         cuda11.0_0    nvidia
cudnn      8.0.4         cuda10.1_0    nvidia
cudnn      8.0.4         cuda10.2_0    nvidia
cudnn      8.0.4         cuda11.0_0    nvidia
cudnn      8.0.4         cuda11.1_0    nvidia
cudnn      8.2.1         cuda11.3_0    pkgs/main
cudnn      8.9.2.26      cuda11_0      pkgs/main
This is likely the source of the problem. If I install cudnn 8.9.2.26 from main, things seem to work; well, at least I can import torch without crashing out.
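For reference, one way to force that install from defaults/main, ignoring whatever channels are configured, is something like:

conda install --override-channels -c defaults cudnn=8.9.2.26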
So, this is kind of a mess. I have to install CUDA from the nvidia channel (it’s not available elsewhere), but I should not use the nvidia channel for cudnn, since the main channel has a much newer (if not the newest) version of these libraries. To make matters worse, the conda-forge channel also includes cudnn packages (through 8.8), and conda resolves packages based on channel priority, so it’s pretty easy to get this wrong. Thoughts on the best strategy for dealing with this?
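One approach I’ve been experimenting with, and I’m not sure it’s the right one, is to pin the channel per package in the environment file (the channel::package spec syntax; I’m less sure the defaults:: spelling is the cleanest way to point at pkgs/main) and turn on strict channel priority for the env. Roughly:

name: pytorch-cuda
channels:
  - pytorch
  - nvidia
  - defaults
dependencies:
  - python=3.11
  - pytorch
  - torchvision
  - torchaudio
  - pytorch-cuda=11.8
  - defaults::cudnn=8.9.2.26   # pin cudnn to main/defaults so the nvidia channel's old builds can't win

plus, with the environment active:

conda config --env --set channel_priority strict

I don’t know yet whether an explicit defaults:: pin always wins under strict priority, which is part of why I’m asking.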