I’ve built a new machine: an AMD Ryzen 7 7700X (8-core) with a GeForce RTX 4080, running Ubuntu 22.04. After installation, CUDA 12 with the most recent CUDA toolkit is installed and functional. Here is my dilemma: I’m trying to install TensorFlow and Keras and have them take advantage of the GPU. I’ve searched the web, tried repeated installations with and without virtual environments, followed the instructions from tensorflow.org, other instructions from anaconda.org, and suggestions from every corner I could find. I’ve even tried building TensorFlow from source on the machine. Every attempt has ended in a different kind of failure. Sometimes code runs (without the GPU), sometimes it fails (when trying to use the GPU). Sometimes libraries are missing; when they aren’t, something else fails. The most entertaining attempt was the one where TensorFlow finally imported without an error, the GPU was recognized, the NUMA error was corrected… and then I got the “Optimization loop failed: Cancelled: Operation was cancelled” error, but NOT on every iteration - sometimes 1 run out of 5, or 1 out of 10 (before you ask: the batch size was small and this GPU’s memory is relatively large, so it isn’t that).
I understand that it will be some time before TensorFlow catches up to CUDA 12. All I ask at this point is that, given this hardware, someone point me toward clear TensorFlow installation instructions (and, if needed, instructions for installing an alternative CUDA version) that will actually work - so that TensorFlow imports without an error and the GPU is available without error. As a point of comparison, I’ve got a Windows 11 machine with an RTX 3080 sitting next to this one - the exact same code runs on it flawlessly. I’ve concluded I’m just too dumb to get the 4080 + TensorFlow + CUDA + cuDNN to function all together.
It’s possible to get this to work, but the results will be disappointing performance-wise if you use a TensorFlow build that doesn’t have direct support for the compute capability 8.9 GPU (the 4080).
One possible approach is to use NGC. The latest NGC TensorFlow containers support your GPU, and setup for NGC is fairly simple: install the OS, install the latest driver for your GPU, install Docker and the NVIDIA container runtime, sign up for NGC, then pull the latest TF container.
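Once you’re inside the container, a quick check along these lines (just the standard TensorFlow device queries, nothing NGC-specific) should confirm the GPU is actually visible to TF:

import tensorflow as tf

print(tf.__version__)
gpus = tf.config.list_physical_devices('GPU')
print(gpus)  # expect one PhysicalDevice entry for the 4080
for gpu in gpus:
    # should report compute_capability (8, 9) on a 4080
    print(tf.config.experimental.get_device_details(gpu))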
Took a bit of tweaking, but that worked. One would like to think that anything you can run in Docker you should, at least in theory, be able to install onto the platform directly, but I’ll take the win. Thank you very much!!
Full disclosure: even the most recent TensorFlow container appears to be running a CUDA version < 12.0 for TensorFlow itself. nvidia-smi reports CUDA 12.0, but if you run a benchmark in Python:
from ai_benchmark import AIBenchmark  # from the ai-benchmark PyPI package
benchmark = AIBenchmark()
results = benchmark.run()
AI-Benchmark-v.0.1.2
Let the AI Games begin…
2022-12-19 18:54:52.344927: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1637] Created device /device:GPU:0 with 12724 MB memory: → device: 0, name: NVIDIA GeForce RTX 4080, pci bus id: 0000:01:00.0, compute capability: 8.9
2022-12-19 18:54:52.345681: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1637] Created device /device:GPU:0 with 12724 MB memory: → device: 0, name: NVIDIA GeForce RTX 4080, pci bus id: 0000:01:00.0, compute capability: 8.9
2022-12-19 18:54:52.346252: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1637] Created device /device:GPU:0 with 12724 MB memory: → device: 0, name: NVIDIA GeForce RTX 4080, pci bus id: 0000:01:00.0, compute capability: 8.9
I’m not complaining - it works, and I can get my work done. I’ll just keep updating the container and sooner or later I’m sure the CUDA build will catch up.
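For anyone who wants to check this on their own container: the CUDA/cuDNN versions that TensorFlow itself was built against (as opposed to the driver-level version nvidia-smi reports) should be readable from the build info, roughly like this:

import tensorflow as tf

# on CUDA-enabled builds this dict should include the toolkit versions
# the wheel was compiled against, which can lag behind the driver
info = tf.sysconfig.get_build_info()
print(info.get('cuda_version'), info.get('cudnn_version'))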
Oh, and for anyone who was curious about the actual summary result:
Device Inference Score: 29853
Device Training Score: 35514
Device AI Score: 65367
For more information and results, please visit AI-Benchmark
As a point of comparison, an NVIDIA TITAN V from 2017, with 5120 CUDA cores, 1.20/1.46 GHz clocks, and 12 GB of memory, running CUDA 10.1 on Ubuntu 18.04, had an inference score of 16,192, a training score of 17,215, and an overall AI score of 33,406.
Not too shabby an improvement, CUDA version notwithstanding :)