I’ve built a new machine: an AMD Ryzen 7 7700X (8-core) with a GeForce RTX 4080, running Ubuntu 22.04. After installation, CUDA 12 with the most recent CUDA toolkit is installed and functional. Here is my dilemma: I’m trying to install TensorFlow and Keras and have them take advantage of the GPU. I’ve searched the web, tried repeated installations with and without virtual environments, followed the instructions from tensorflow.org, other instructions from anaconda.org, and suggestions from every corner I could find. I’ve even tried building TensorFlow from source on the machine. Every attempt has ended in a different kind of failure. Sometimes code runs (without the GPU), sometimes it fails (when trying to use the GPU). Sometimes libraries are missing; when they aren’t, something else fails. The most entertaining attempt was the one where TensorFlow finally imported without an error, the GPU was recognized, the NUMA error was corrected… and then I got the “Optimization loop failed: Cancelled: Operation was cancelled” error, but NOT on every iteration - sometimes 1 run out of 5, or 1 out of 10 (before you ask: the batch size was small and this GPU’s memory is relatively large, so it isn’t that).
I understand that it will be some time before TensorFlow catches up to CUDA 12. All I ask at this point is that, given this hardware, someone point me toward clear TensorFlow installation instructions (and, if needed, instructions for installing an alternative CUDA version) that will actually work - so that TensorFlow imports without an error and the GPU is available without error. As a point of comparison, I’ve got a Windows 11 machine with an RTX 3080 sitting next to this one - the exact same code runs on it flawlessly. I’ve concluded I’m just too dumb to get the 4080 + TensorFlow + CUDA + cuDNN to function all together.
It’s possible to get this to work, but the results will be disappointing performance-wise if you use a TensorFlow build that doesn’t have direct support for the compute capability 8.9 GPU (the 4080).
One possible approach is to use NGC. The latest NGC TensorFlow containers support your GPU, and setup for NGC is fairly simple: install the OS, install the latest driver for your GPU, install Docker and the NVIDIA container runtime, sign up for NGC, then pull the latest TF container.
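Once you’re inside the container, a quick check along these lines (just the standard TensorFlow device queries, nothing NGC-specific) should confirm the GPU is actually visible to TF:

import tensorflow as tf

print(tf.__version__)
gpus = tf.config.list_physical_devices('GPU')
print(gpus)  # expect one PhysicalDevice entry for the 4080
for gpu in gpus:
    # should report compute_capability (8, 9) on a 4080
    print(tf.config.experimental.get_device_details(gpu))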
Took a bit of tweaking, but that worked. One would like to think that anything you can run in Docker you should, at least in theory, be able to install onto the platform directly, but I’ll take the win. Thank you very much!!
Full disclosure: even the most recent TensorFlow container appears to be running a CUDA version < 12.0 for TensorFlow itself. nvidia-smi reports CUDA 12.0, but if you run a benchmark in Python:
from ai_benchmark import AIBenchmark  # from the ai-benchmark PyPI package
benchmark = AIBenchmark()
results = benchmark.run()
AI-Benchmark-v.0.1.2
Let the AI Games begin…
2022-12-19 18:54:52.344927: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1637] Created device /device:GPU:0 with 12724 MB memory: → device: 0, name: NVIDIA GeForce RTX 4080, pci bus id: 0000:01:00.0, compute capability: 8.9
2022-12-19 18:54:52.345681: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1637] Created device /device:GPU:0 with 12724 MB memory: → device: 0, name: NVIDIA GeForce RTX 4080, pci bus id: 0000:01:00.0, compute capability: 8.9
2022-12-19 18:54:52.346252: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1637] Created device /device:GPU:0 with 12724 MB memory: → device: 0, name: NVIDIA GeForce RTX 4080, pci bus id: 0000:01:00.0, compute capability: 8.9
I’m not complaining - it works, and I can get my work done. I’ll just keep updating the container and sooner or later I’m sure the CUDA build will catch up.
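For anyone who wants to check this on their own container: the CUDA/cuDNN versions that TensorFlow itself was built against (as opposed to the driver-level version nvidia-smi reports) should be readable from the build info, roughly like this:

import tensorflow as tf

# on CUDA-enabled builds this dict should include the toolkit versions
# the wheel was compiled against, which can lag behind the driver
info = tf.sysconfig.get_build_info()
print(info.get('cuda_version'), info.get('cudnn_version'))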
Oh, and for anyone who was curious about the actual summary result:
Device Inference Score: 29853
Device Training Score: 35514
Device AI Score: 65367
For more information and results, please visit AI-Benchmark
As a point of comparison, an NVIDIA TITAN V from 2017, with 5120 CUDA cores, 1.20/1.46 GHz clocks, and 12 GB of memory, running CUDA 10.1 on Ubuntu 18.04, had an inference score of 16,192, a training score of 17,215, and an overall AI score of 33,406.
Not too shabby an improvement, CUDA version notwithstanding :)