Hi,
I am having trouble using deep learning frameworks (TensorFlow, PyTorch) with my GPUs.
This is my setup:
- 4 x A100-SXM 80 GB
- NVIDIA driver version: 470.57.02
- CUDA version: 11.0
I installed the NVIDIA driver and CUDA from the official runfiles; in particular, I installed a different version of the CUDA toolkit for framework compatibility.
Both the nvidia-smi and nvcc -V commands work fine.
However, both frameworks raise a CUDA initialization error.
When I run deviceQuery from the CUDA samples, I get the same error :(
Here is a screenshot of the issue.
Please help me solve this issue! 🙏
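A CUDA initialization error can also be checked outside the frameworks by calling the driver API directly. The following is only a minimal sketch, assuming a Linux system where the driver library is installed as libcuda.so.1. The function names are hypothetical; cuInit and cuDriverGetVersion are entry points of the CUDA driver API.

```python
import ctypes


def decode_cuda_version(raw):
    """Split the integer from cuDriverGetVersion (e.g. 11040) into (major, minor)."""
    return raw // 1000, (raw % 1000) // 10


def probe_cuda_driver(lib_name="libcuda.so.1"):
    """Return (cuInit result code, driver CUDA version), or None if the
    driver library cannot be loaded. A result code of 0 means CUDA_SUCCESS."""
    try:
        libcuda = ctypes.CDLL(lib_name)
    except OSError:
        # Library not found: the driver is missing or not on the loader path.
        return None
    libcuda.cuInit.argtypes = [ctypes.c_uint]
    init_code = libcuda.cuInit(0)  # the flags argument must be 0
    raw_version = ctypes.c_int(0)
    libcuda.cuDriverGetVersion(ctypes.byref(raw_version))
    return init_code, decode_cuda_version(raw_version.value)


if __name__ == "__main__":
    print(probe_cuda_driver())
```

If the first value is non-zero, it is the CUresult error code returned by cuInit, which can be looked up in the driver API documentation.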
Hello @line.k0
In the screenshot, the driver’s CUDA version is 11.4 (top right corner of the nvidia-smi output).
Can you downgrade your driver to a version that matches CUDA 11.0, or upgrade the frameworks’ packages to builds for CUDA 11.4?
Additionally, your nvidia-smi and nvcc outputs show different CUDA versions. Did you upgrade your GPU driver recently?
Regards
Hi @mehmetdeniz !
Thank you for your reply.
I tried both options you suggested.
- The same issue still occurs, both with a lower NVIDIA driver version and with CUDA 11.4.
I have attached a screenshot of my attempt with CUDA 11.4.
I wonder whether only one specific driver version works with a given CUDA version. According to the release notes, the driver only needs to be at or above the minimum required version. https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
- I installed the NVIDIA driver first, and then downgraded to CUDA 11.0 for the frameworks.
I also found that it does not matter if nvidia-smi and nvcc report different CUDA versions.
https://stackoverflow.com/questions/53422407/different-cuda-versions-shown-by-nvcc-and-nvidia-smi
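The version relationship discussed in this thread can be sketched in a few lines of Python. nvidia-smi reports the highest CUDA version the installed driver supports, while nvcc reports the version of the installed toolkit. As a simplified rule, the two may differ as long as the toolkit version is not newer than the driver's and the driver is at or above the minimum listed in the release notes. The helper names below are hypothetical, and the minimum driver value is only a placeholder; the real figure should come from the release notes table.

```python
def parse_version(text):
    """Turn a dotted version string such as '470.57.02' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def toolkit_fits_driver(toolkit_cuda, driver_cuda):
    """True if the toolkit version (from nvcc -V) is not newer than the
    CUDA version the driver reports (top right of nvidia-smi)."""
    return parse_version(toolkit_cuda) <= parse_version(driver_cuda)


def driver_meets_minimum(installed_driver, minimum_driver):
    """True if the installed driver is at or above the required minimum."""
    return parse_version(installed_driver) >= parse_version(minimum_driver)


# Values from this thread: toolkit 11.0, driver 470.57.02 reporting CUDA 11.4.
print(toolkit_fits_driver("11.0", "11.4"))

# "450.00.00" is a placeholder, not a documented minimum; replace it with
# the value from the release notes for the toolkit in use.
print(driver_meets_minimum("470.57.02", "450.00.00"))
```

Comparing tuples of integers avoids the pitfalls of comparing version strings as text, where "11.10" would sort before "11.4".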