CUDA initialization error in deep learning frameworks (TensorFlow, PyTorch) and deviceQuery

Hi,

I have a problem using deep learning frameworks (TensorFlow, PyTorch) with my GPUs.
This is my setup:

  • 4 x A100-SXM 80 GB
  • NVIDIA driver version: 470.57.02
  • CUDA version: 11.0

I installed the NVIDIA driver and CUDA from the official runfiles; in particular, I installed a different version of the CUDA toolkit for framework compatibility.

Both the nvidia-smi and nvcc -V commands work fine.

However, I get a CUDA initialization error in both frameworks.
When I try to run deviceQuery from the CUDA samples, it fails with the same issue :(
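
For reference, this is the kind of minimal check that reproduces the error on my machine (a rough sketch; it assumes the standard pip wheels of both frameworks are installed):

```python
# Minimal reproduction sketch: both calls fail / report no GPU on my setup.
import torch
import tensorflow as tf

# PyTorch: prints a CUDA initialization warning and returns False when it fails
print("torch.cuda.is_available():", torch.cuda.is_available())

# TensorFlow: returns an empty list when CUDA cannot be initialized
print("GPUs visible to TF:", tf.config.list_physical_devices("GPU"))
```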

Here is a screenshot of the issue.

Please help me solve this issue! 🙏

Hello @line.k0
In the screenshot, the driver’s CUDA version is 11.4 (top right corner of the nvidia-smi output).

Can you downgrade your driver to a version matching CUDA 11.0, or upgrade the frameworks’ packages to builds for CUDA 11.4?

Additionally, the outputs of your nvidia-smi and nvcc commands show different CUDA versions. Did you upgrade your GPU driver recently?
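
A quick way to check which CUDA version your framework builds expect (a rough sketch, assuming the standard pip packages of both frameworks and TensorFlow 2.x):

```python
# Sketch: print the CUDA runtime versions the installed wheels were built
# against, to compare with the driver's CUDA version shown by nvidia-smi.
import torch
import tensorflow as tf

print("PyTorch built for CUDA:", torch.version.cuda)
print("TensorFlow built for CUDA:", tf.sysconfig.get_build_info().get("cuda_version"))
```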

Regards

Hi @mehmetdeniz!
Thank you for your reply.

I tried both of the options you suggested.

  1. It still shows the same issue with the lower NVIDIA driver version and with CUDA 11.4.
    I attach a screenshot of my trial with CUDA 11.4.

I also wonder whether a single driver version can cover multiple CUDA versions. The release notes indicate that the driver only needs to be at or above the minimum version required by the toolkit. https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html

  2. I installed the NVIDIA driver first, then downgraded to CUDA 11.0 for the framework.
    I found that it doesn’t matter that nvidia-smi and nvcc report different CUDA versions (see the sketch after this list).
    https://stackoverflow.com/questions/53422407/different-cuda-versions-shown-by-nvcc-and-nvidia-smi
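
Since the version mismatch does not seem to be the cause, this is the kind of minimal check I run to surface the underlying error text instead of just a False result (a rough sketch; it assumes the standard PyTorch pip wheel, and the exact message depends on the build):

```python
# Force CUDA context creation so the real initialization error is raised
# as an exception rather than swallowed into a warning.
import torch

try:
    torch.cuda.init()                     # forces CUDA initialization
    print(torch.cuda.get_device_name(0))  # should print the A100 name if it works
except RuntimeError as err:
    print("CUDA initialization failed:", err)
```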