Hi, we are having trouble setting up an environment where we can use TensorFlow (with GPU support) on Ubuntu 20.04.
We have tried different guides and versions of CUDA. We currently use nvidia-driver-460, cuDNN 7.6.5, CUDA toolkit 10.1, Python 3.8.5 and tensorflow=2.5.0-dev20201028.
We have also tried switching to driver-455, cuDNN 8.0 and toolkit 11.1, with no more success.
Even though TensorFlow finds the GPU, it fails to use it properly. Importing packages and showing images work fine, but when we try to load a custom model it freezes. No real error appears, and the GPU allocates 22 of its 24 GB of memory, but nothing happens.
All of this worked fine on a GeForce 1050 with the same code.
What could the issue be? Which exact versions of the driver, cuDNN, toolkit, Python and TensorFlow are needed to make this work? If we can get the correct versions, we can install those and report any specific errors for them.
We really hope someone has the time to help us with this; we have been stuck for several days now.
Best regards, Jonatan and Marcus
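A note on the 22/24 GB figure: by default TensorFlow reserves nearly all GPU memory as soon as it initializes the device, so that number alone does not show that anything is wrong. A minimal sketch, assuming TensorFlow 2.x, that asks TensorFlow to grow its allocation on demand instead (the helper name `enable_memory_growth` is made up for illustration):

```python
def enable_memory_growth():
    """Ask TensorFlow to allocate GPU memory on demand.

    Returns the names of the GPUs that were configured. Returns an empty
    list if TensorFlow cannot be imported or no GPU is visible.
    """
    try:
        import tensorflow as tf
    except Exception:
        # Covers a missing package as well as CUDA/cuDNN library load failures.
        return []

    configured = []
    for gpu in tf.config.list_physical_devices("GPU"):
        try:
            tf.config.experimental.set_memory_growth(gpu, True)
            configured.append(gpu.name)
        except RuntimeError:
            # Raised when the GPU has already been initialized;
            # memory growth has to be set before the first op runs.
            pass
    return configured


if __name__ == "__main__":
    print("Memory growth enabled on:", enable_memory_growth())
```

This has to run before any model is built or loaded. It does not fix a version mismatch, but it makes the memory reading in nvidia-smi reflect what the program actually uses.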
I have the same problem with a GeForce 3070. Waiting for an answer.
I’m facing a similar issue on Windows with Conda and an RTX 3090.
I started out with the environment we use for our Titan GPUs (cudaToolkit 10.1, cuDNN 7.6, Python 3.7 and tensorflow-gpu 2.3). It even let us run the training, but the values produced by our hyperparameter optimization were not trustworthy: they ranged from NaN right at the beginning of an epoch to very small values that did not vary in magnitude the way they usually do.
I then read that the new GPU architecture requires CUDA 11 and a suitable cuDNN, which in turn requires TensorFlow 2.4 and Python 3.8 or newer. Over the past three weeks I have tried every available CUDA Toolkit version (11.2, 11.1 and 11.0) and cuDNN version (8.1 and 8.0.4) with TensorFlow 2.5 and 2.4. cudaToolkit versions 11.2 and 11.1 did not work at all: cusolver64_11.dll was found but cusolver64_10.dll was requested, so I renamed it as suggested in another post. After that, cudnn64_8.dll opened successfully, but cudnn_ops_infer64_8.dll, which is in the same folder as cudnn64_8.dll (2 files above it), could not be found. So I ended up with a configuration using CudaToolkit 11.0, cuDNN 8.0.4, TensorFlow 2.4 and Python 3.8. It appears to work technically, but during training the results are sometimes NaN again right from the beginning, or unexpectedly low.
Could you please notify me if you have any new information about this?
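The combinations mentioned in this thread can be checked against the versions TensorFlow's release builds were tested with. The sketch below hard-codes three rows as I recall them from TensorFlow's "tested build configurations" table, so please verify them against that page. The names `TESTED`, `major_minor` and `check_combo` are made up for illustration.

```python
# Tested CUDA/cuDNN versions per TensorFlow release, recalled from the
# TensorFlow "tested build configurations" table (assumption: verify there).
TESTED = {
    "2.3": {"cuda": "10.1", "cudnn": "7.6"},
    "2.4": {"cuda": "11.0", "cudnn": "8.0"},
    "2.5": {"cuda": "11.2", "cudnn": "8.1"},
}


def major_minor(version):
    """Reduce a version such as '8.0.4' to its 'major.minor' part, '8.0'."""
    return ".".join(version.split(".")[:2])


def check_combo(tensorflow, cuda, cudnn):
    """Return a list of mismatches against the table; empty means it matches."""
    expected = TESTED.get(major_minor(tensorflow))
    if expected is None:
        return ["no table entry for TensorFlow " + tensorflow]
    problems = []
    for name, got in (("cuda", cuda), ("cudnn", cudnn)):
        if major_minor(got) != expected[name]:
            problems.append(f"{name} {got} differs from tested {expected[name]}")
    return problems


if __name__ == "__main__":
    # The configuration that ended up loading in the post above.
    print(check_combo("2.4.0", "11.0", "8.0.4"))
    # One of the combinations that failed on the DLL lookup.
    print(check_combo("2.4.0", "11.2", "8.1"))
```

The table only covers release builds, not nightly builds. It is also consistent with the DLL error described above: a TensorFlow 2.4 build made against CUDA 11.0 asks for cusolver64_10.dll, which is the file name CUDA 11.0 ships, while 11.1 and later ship cusolver64_11.dll.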
The issue was solved by installing the latest recommended GPU driver (455) and then creating a conda environment with Python 3.8 and TensorFlow 2.4. When we then simply ran
pip install cudatoolkit
everything installed successfully and TensorFlow worked with GPU support. Every attempt to install cuDNN or a specific version of CUDA BEFORE cudatoolkit made the installation fail for some reason.
I am uploading the conda environment that works for us in case it helps anyone out there with similar problems. Let me know if you understand the issue; unfortunately, I do not think I can help you more than this.
stereo_gpu.yml (10.5 KB)
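To confirm that an environment like this really runs on the GPU, rather than only detecting it, a small smoke test can help. This is a rough sketch, assuming TensorFlow 2.3 or newer (for `tf.sysconfig.get_build_info`); the function name `gpu_smoke_test` is made up for illustration.

```python
def gpu_smoke_test(size=256):
    """Report TensorFlow's build versions and run one small matmul on the GPU.

    Returns a dict with an 'ok' flag. Nothing is raised if TensorFlow is
    missing or no GPU is visible; the reason is reported instead.
    """
    try:
        import tensorflow as tf
    except Exception as exc:
        # Covers a missing package as well as CUDA/cuDNN library load failures.
        return {"ok": False, "reason": f"could not import tensorflow: {exc}"}

    # CUDA/cuDNN versions this TensorFlow binary was built against
    # (the keys are absent on CPU-only builds, hence .get).
    build = tf.sysconfig.get_build_info()
    report = {
        "tensorflow": tf.__version__,
        "cuda": build.get("cuda_version"),
        "cudnn": build.get("cudnn_version"),
    }

    if not tf.config.list_physical_devices("GPU"):
        report.update(ok=False, reason="no GPU visible to TensorFlow")
        return report

    # Force a real computation on the first GPU; a hang here points at the
    # CUDA setup rather than at the model code.
    with tf.device("/GPU:0"):
        a = tf.random.uniform((size, size))
        product = tf.matmul(a, a)

    report.update(ok=True, device=product.device,
                  checksum=float(tf.reduce_sum(product)))
    return report


if __name__ == "__main__":
    for key, value in gpu_smoke_test().items():
        print(f"{key}: {value}")
```

If the matmul returns promptly and `device` names a GPU, the driver, toolkit and cuDNN are at least working together for basic ops. Loading the custom model is then the next thing to test.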